The RDKit SDF files were washed with MOE and then read back into RDKit 
for 3D structure generation.
As there are only a dozen problem cases out of 1.5 million compounds, I just 
removed them from my main file and downloaded the mol files from ChEMBL to 
double-check the structures.
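
One way to pull a handful of problem records out of a large SD file, sketched in plain Python. This is not necessarily how it was done here. It assumes each record's title line holds the compound ID (as in the CHEMBL592333 block quoted further down) and that every record ends with a `$$$$` line. The name `drop_records` and the file names in the usage comment are hypothetical.

```python
def drop_records(lines, bad_ids):
    """Yield the lines of every SD record whose title line is not in bad_ids.

    `lines` is any iterable of text lines (an open file works), so the
    file is streamed rather than loaded into memory at once.
    A trailing record without a closing $$$$ line is silently dropped.
    """
    record = []
    for line in lines:
        record.append(line)
        if line.rstrip("\r\n") == "$$$$":
            # The first line of a record is its title (assumed to be the ID).
            if record[0].strip() not in bad_ids:
                yield from record
            record = []


# Usage (file names are placeholders):
# with open("main.sdf") as src, open("clean.sdf", "w") as dst:
#     dst.writelines(drop_records(src, {"CHEMBL592333"}))
```

The removed IDs can then be re-downloaded and checked by hand.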

Brian

-----Original Message-----
From: Chris Earnshaw [mailto:cgearns...@gmail.com] 
Sent: Thursday, October 05, 2017 08:46
To: Bennion, Brian <benni...@llnl.gov>
Cc: RDKit Discuss (rdkit-discuss@lists.sourceforge.net) 
<rdkit-discuss@lists.sourceforge.net>
Subject: Re: [Rdkit-discuss] nitrogen valence issues

Hi

There are some interesting differences in behaviour compared with my RDKit 
installation. Using the ChEMBL SMILES (freshly downloaded just now):

[NH-][NH+]=NC[C@H]1O[C@@H]2O[C@@H]3[C@@H](CN=[N+]=[N-])O[C@H](O[C@@H]4[C@@H](CN=[N+]=[N-])O[C@H](O[C@@H]5[C@@H](CN=[N+]=[N-])O[C@H](O[C@@H]6[C@@H](CN=[N+]=[N-])O[C@H](O[C@@H]7[C@@H](CN=[N+]=[N-])O[C@H](O[C@@H]8[C@@H](CN=[N+]=[N-])O[C@H](O[C@H]1[C@H](O)[C@H]2O)[C@H](O)[C@H]8O)[C@H](O)[C@H]7O)[C@H](O)[C@H]6O)[C@H](O)[C@H]5O)[C@H](O)[C@H]4O)[C@H](O)[C@H]3O

The problem atoms are the first two. If I convert this to an SD file (using a 
C++ program based on the RDKit libraries), the resulting SD file contains 
no charge information in the atom block; it's all in M  CHG records (which are 
correct), and the problem atoms are still the first two.
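
For readers following the charge discussion, here is a minimal Python sketch (not part of the original exchange) of how formal charges can be read from a V2000 mol block. It assumes the CTfile conventions as they are usually documented: the atom-block charge field is a code (3 means +1, 5 means -1), and when any M  CHG record is present those records take precedence and unlisted atoms are treated as neutral. The name `v2000_charges` is made up for illustration, and the fixed-column parsing assumes a well-formed block.

```python
# Charge codes in the V2000 atom block (code 4 is a doublet radical, charge 0).
ATOM_BLOCK_CHARGE = {0: 0, 1: 3, 2: 2, 3: 1, 4: 0, 5: -1, 6: -2, 7: -3}


def v2000_charges(mol_block):
    """Return {1-based atom index: formal charge} for non-neutral atoms."""
    lines = mol_block.splitlines()
    n_atoms = int(lines[3][0:3])  # counts line: first three columns

    # Charges encoded in the atom block (columns 37-39 of each atom line).
    atom_block = {}
    for i in range(n_atoms):
        code = int(lines[4 + i][36:39])
        if code:
            atom_block[i + 1] = ATOM_BLOCK_CHARGE[code]

    # Charges from "M  CHG" property lines: count, then (atom, charge) pairs.
    from_chg = {}
    saw_chg = False
    for line in lines[4 + n_atoms:]:
        if line.startswith("M  CHG"):
            saw_chg = True
            fields = line[6:].split()
            for k in range(int(fields[0])):
                from_chg[int(fields[1 + 2 * k])] = int(fields[2 + 2 * k])

    # Assumed precedence rule: any M  CHG line overrides the atom block.
    return from_chg if saw_chg else atom_block
```

Applied to the mol block quoted further down, this rule would leave atoms 86 and 87 neutral, since they carry atom-block codes 3 and 5 but have no M  CHG entry. That may be connected to the valence error, which RDKit appears to report with 0-based atom numbering (atom # 85).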

The bonding information is incorrect: there's a single bond between the two 
nitrogens where it should be double. Editing the first record in the bond block 
from -
  1  2  1  0
to -
  1  2  2  0
fixes the structure for me, and the resulting SD file can be processed by other 
RDKit programs. I've attached the resulting file in case it helps throw any 
light on what's happening.
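
The manual bond-block edit described above can also be scripted. A minimal sketch in plain Python, assuming a well-formed V2000 block with fixed-width columns; `set_bond_order` is a hypothetical helper, not an RDKit function.

```python
def set_bond_order(mol_block, atom_a, atom_b, new_order):
    """Return a copy of a V2000 mol block with one bond order changed.

    Atom numbers are 1-based, as in the bond block itself. The bond is
    matched regardless of the order in which its two atoms are listed.
    """
    lines = mol_block.splitlines()
    n_atoms = int(lines[3][0:3])   # counts line, columns 1-3
    n_bonds = int(lines[3][3:6])   # counts line, columns 4-6
    first_bond = 4 + n_atoms       # 3 header lines + counts line + atoms

    for i in range(first_bond, first_bond + n_bonds):
        line = lines[i]
        a, b = int(line[0:3]), int(line[3:6])
        if {a, b} == {atom_a, atom_b}:
            # Bond order sits in columns 7-9; leave the rest untouched.
            lines[i] = line[:6] + f"{new_order:3d}" + line[9:]
            return "\n".join(lines)
    raise ValueError(f"no bond between atoms {atom_a} and {atom_b}")
```

Calling `set_bond_order(block, 1, 2, 2)` on a block whose first bond record is `  1  2  1  0` would turn that record into `  1  2  2  0`. Note that the result has no trailing newline.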

I'm puzzled as to why the behaviour is significantly different for you...

Chris

On 5 October 2017 at 15:40, Bennion, Brian <benni...@llnl.gov> wrote:
> The SDF is an RDKit reading of the original SMILES string, which, if 
> wrong, would explain the funky charge settings in the mol block for 
> atoms 84 and 85.  I modified these to 5 and 3 respectively to make the 
> correct charge states; however, that did not resolve the issue.  
> Perhaps the bonding info is also incorrect.  The file is on a remote 
> server so I will repost with an attachment if I continue to have problems.
>
> Brian
>
>
> ________________________________
> From: Chris Earnshaw <cgearns...@gmail.com>
> Sent: Thursday, October 5, 2017 12:04:02 AM
> To: Bennion, Brian; RDKit Discuss 
> (rdkit-discuss@lists.sourceforge.net)
> Subject: Re: [Rdkit-discuss] nitrogen valence issues
>
> Hi
>
> Be aware that there is a problem with one of the azide groups in
> CHEMBL592333 - in SMILES it's '-N=[NH+]-[NH-]' rather than '-N=[N+]=[N-]'.
> This doesn't render the structure chemically invalid but it's probably 
> wrong.
>
> What's the provenance of your SD file? It isn't the same as a fresh 
> download of this structure from ChEMBL, which can be processed by 
> RDKit quite happily (allowing for the structure being wrong!). Is it 
> possible that your file has got corrupted by some other processing step?
>
> Regards,
> Chris
>
> On 5 October 2017 at 03:28, Greg Landrum <greg.land...@gmail.com> wrote:
>>
>> Hi Brian,
>>
>> When you pasted that into the email the formatting of the mol block 
>> did end up screwed up, which makes this hard to reproduce.
>> Could you please attach the mol block to the message as a file?
>>
>> -greg
>>
>> On Thu, Oct 5, 2017 at 2:21 AM, Bennion, Brian <benni...@llnl.gov> wrote:
>>>
>>> Hello,
>>>
>>> After looking at the email list and seeing that this error has 
>>> cropped up several times for aromatic/aliphatic heterocyclic 
>>> nitrogens, I still haven’t been able to resolve the valence = 4 error 
>>> for one of the azide groups in a molecule that has 7.  The first 
>>> couple of azide groups seem to be interpreted fine.
>>>
>>> Am I doing something incorrect here or is the mol file not formatted 
>>> properly?
>>>
>>> Thanks
>>>
>>> Brian
>>>
>>>
>>>
>>>
>>>
>>> [16:50:29] Explicit valence for atom # 85 N, 4, is greater than 
>>> permitted
>>>
>>> [16:50:29] ERROR: Could not sanitize molecule ending on line 206
>>>
>>> [16:50:29] ERROR: Explicit valence for atom # 85 N, 4, is greater 
>>> than permitted
>>>
>>>
>>>
>>> CHEMBL592333
>>>
>>>            3D
>>>
>>>
>>>
>>> 91 98  0  0  0  0  0  0  0  0999 V2000
>>>
>>>     8.3826   -4.1789    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     7.6967   -2.8968    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     8.5551   -1.5926    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     9.9817   -1.6449    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    10.7075   -3.0051    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     9.8956   -4.2577    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     9.5145    2.2882    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     9.8798    0.8284    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    11.3118    0.3983    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    12.3905    1.3820    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    12.0187    2.9205    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    10.6272    3.3273    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     9.8204    4.3866    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     7.9490    5.4779    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     8.5179    4.0883    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     4.3822    5.0511    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     7.6673    2.8820    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     6.4150    5.6541    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     5.5680    4.4264    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     6.1262    3.0849    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    11.0958   -0.9118    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     4.5054   -4.0612    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     6.0427   -3.9376    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     6.8866   -5.1186    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     6.3028   -6.4902    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     4.7637   -6.6458    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     3.9029   -5.4182    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     2.7159   -5.9831    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     0.6356   -5.6026    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     1.9599   -4.8851    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     3.1331    3.3333    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     3.0734    4.8222    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     1.8216    2.5373    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -2.1064   -0.8835    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -1.6580    0.6255    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -1.8630   -3.0392    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -1.1133   -1.9528    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     0.5294    3.2743    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     1.7976    5.5400    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     0.4790    4.7418    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -0.1940    0.9129    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -0.2800    2.2297    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     0.3560   -1.6364    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     0.7840   -0.1761    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     2.0400   -3.4362    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -0.6446   -4.7799    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -0.5472   -3.2855    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     0.7300   -2.5970    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     8.1699   -5.4891    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -0.9196    6.9314    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     3.3718   -2.7438    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     3.4375   -1.2457    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    12.2233   -2.9903    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     7.3609   -7.5406    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -1.9851   -5.4581    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     4.4420    2.6027    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    13.8111    0.9358    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -3.5451   -1.3353    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     0.5839   -7.0998    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    10.5449   -5.6259    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     4.2197   -8.0395    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    13.2649    3.7658    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -0.8506    5.4331    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -2.6861    1.6826    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     1.8710    1.0374    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     6.7424    4.5334    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     8.1208    4.6949    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     5.0210   -4.9380    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     4.8553   -6.2914    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     9.6021   -2.6537    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    11.1193   -2.6926    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -0.3366   -0.2945    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -1.8968   -0.3852    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     8.7855    6.7340    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     5.3923    6.7330    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    10.8786    1.4957    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    12.0747    2.2167    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -2.4480   -1.8457    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -3.5497   -2.9029    0.0000 N   0  5  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -2.2517    7.6211    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -3.5805    8.3170    0.0000 N   0  5  0  0  0  0  0  0  0  0  0  0
>>>
>>>     9.2243    5.6060    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
>>>
>>>    10.5994    6.3590    0.0000 N   0  5  0  0  0  0  0  0  0  0  0  0
>>>
>>>    13.4890    2.2193    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
>>>
>>>    15.0494    2.4295    0.0000 N   0  5  0  0  0  0  0  0  0  0  0  0
>>>
>>>    11.6943   -4.1991    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
>>>
>>>    12.8782   -5.1236    0.0000 N   0  5  0  0  0  0  0  0  0  0  0  0
>>>
>>>     3.4537   -7.0177    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
>>>
>>>     2.4539   -8.1103    0.0000 N   0  5  0  0  0  0  0  0  0  0  0  0
>>>
>>>     4.7687   -0.5531    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
>>>
>>>     6.1360    0.0648    0.0000 N   0  5  0  0  0  0  0  0  0  0  0  0
>>>
>>>   7 12  1  0  0  0  0
>>>
>>> 31 33  1  0  0  0  0
>>>
>>>   9  8  1  0  0  0  0
>>>
>>> 63 50  1  0  0  0  0
>>>
>>>   9 10  1  0  0  0  0
>>>
>>> 45 51  1  0  0  0  0
>>>
>>> 34 35  1  0  0  0  0
>>>
>>> 51 52  1  0  0  0  0
>>>
>>> 10 11  1  0  0  0  0
>>>
>>> 36 37  1  0  0  0  0
>>>
>>> 34 37  1  0  0  0  0
>>>
>>> 38 33  1  0  0  0  0
>>>
>>> 12 11  1  0  0  0  0
>>>
>>> 13 12  1  0  0  0  0
>>>
>>> 38 40  1  0  0  0  0
>>>
>>>   5 53  1  0  0  0  0
>>>
>>> 32 39  1  0  0  0  0
>>>
>>> 25 54  1  0  0  0  0
>>>
>>> 39 40  1  0  0  0  0
>>>
>>> 46 55  1  0  0  0  0
>>>
>>> 35 64  1  0  0  0  0
>>>
>>> 41 35  1  0  0  0  0
>>>
>>> 31 56  1  0  0  0  0
>>>
>>>   5  6  1  0  0  0  0
>>>
>>> 10 57  1  0  0  0  0
>>>
>>> 41 42  1  0  0  0  0
>>>
>>> 34 58  1  0  0  0  0
>>>
>>> 38 42  1  0  0  0  0
>>>
>>> 29 59  1  0  0  0  0
>>>
>>> 37 43  1  0  0  0  0
>>>
>>>   6 60  1  0  0  0  0
>>>
>>>   7  8  1  0  0  0  0
>>>
>>> 26 61  1  0  0  0  0
>>>
>>> 43 44  1  0  0  0  0
>>>
>>> 11 62  1  0  0  0  0
>>>
>>> 41 44  1  0  0  0  0
>>>
>>>   9 21  1  0  0  0  0
>>>
>>> 40 63  1  0  0  0  0
>>>
>>> 22 23  1  0  0  0  0
>>>
>>> 13 15  1  0  0  0  0
>>>
>>> 14 15  1  0  0  0  0
>>>
>>> 33 65  1  0  0  0  0
>>>
>>>   1  2  1  0  0  0  0
>>>
>>> 20 66  1  0  0  0  0
>>>
>>> 15 17  1  0  0  0  0
>>>
>>> 66 67  1  0  0  0  0
>>>
>>> 30 45  1  0  0  0  0
>>>
>>> 23 68  1  0  0  0  0
>>>
>>>   1  6  1  0  0  0  0
>>>
>>> 68 69  1  0  0  0  0
>>>
>>>   2  3  1  0  0  0  0
>>>
>>>   3 70  1  0  0  0  0
>>>
>>> 27 22  1  0  0  0  0
>>>
>>> 70 71  1  0  0  0  0
>>>
>>> 23 24  1  0  0  0  0
>>>
>>> 43 72  1  0  0  0  0
>>>
>>> 45 48  1  0  0  0  0
>>>
>>> 72 73  1  0  0  0  0
>>>
>>> 29 46  1  0  0  0  0
>>>
>>> 46 47  1  0  0  0  0
>>>
>>> 47 48  1  0  0  0  0
>>>
>>> 14 74  1  0  0  0  0
>>>
>>> 36 47  1  0  0  0  0
>>>
>>> 18 75  1  0  0  0  0
>>>
>>> 24 25  1  0  0  0  0
>>>
>>> 25 26  1  0  0  0  0
>>>
>>> 26 27  1  0  0  0  0
>>>
>>>   7 76  1  0  0  0  0
>>>
>>> 28 27  1  0  0  0  0
>>>
>>> 76 77  1  0  0  0  0
>>>
>>>   3  4  1  0  0  0  0
>>>
>>> 73 78  2  3  0  0  0
>>>
>>>   4  5  1  0  0  0  0
>>>
>>> 78 79  2  0  0  0  0
>>>
>>> 17 20  1  0  0  0  0
>>>
>>> 50 80  2  3  0  0  0
>>>
>>> 14 18  1  0  0  0  0
>>>
>>> 80 81  2  0  0  0  0
>>>
>>> 28 30  1  0  0  0  0
>>>
>>> 67 82  2  3  0  0  0
>>>
>>> 29 30  1  0  0  0  0
>>>
>>> 82 83  2  0  0  0  0
>>>
>>> 24 49  1  0  0  0  0
>>>
>>> 77 84  2  3  0  0  0
>>>
>>> 18 19  1  0  0  0  0
>>>
>>> 84 85  2  0  0  0  0
>>>
>>>   4 21  1  0  0  0  0
>>>
>>> 71 86  2  0  0  0  0
>>>
>>>   1 49  1  0  0  0  0
>>>
>>> 86 87  2  0  0  0  0
>>>
>>> 19 20  1  0  0  0  0
>>>
>>> 69 88  2  3  0  0  0
>>>
>>> 31 32  1  0  0  0  0
>>>
>>> 88 89  2  0  0  0  0
>>>
>>> 16 32  1  0  0  0  0
>>>
>>> 52 90  2  3  0  0  0
>>>
>>> 16 19  1  0  0  0  0
>>>
>>> 90 91  2  0  0  0  0
>>>
>>> M  CHG  1  78   1
>>>
>>> M  CHG  1  79  -1
>>>
>>> M  CHG  1  80   1
>>>
>>> M  CHG  1  81  -1
>>>
>>> M  CHG  1  82   1
>>>
>>> M  CHG  1  83  -1
>>>
>>> M  CHG  1  84   1
>>>
>>> M  CHG  1  85  -1
>>>
>>> M  CHG  1  88   1
>>>
>>> M  CHG  1  89  -1
>>>
>>> M  CHG  1  90   1
>>>
>>> M  CHG  1  91  -1
>>>
>>> M  END
>>>
>>> >  <chembl_id>  (540484)
>>>
>>> CHEMBL592333
>>>
>>> $$$$
>>>
>>>
>>>
>>> --------------------------------------------------------------------
>>> ---------- Check out the vibrant tech community on one of the 
>>> world's most engaging tech sites, Slashdot.org! 
>>> http://sdm.link/slashdot 
>>> _______________________________________________
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>
>>
>>
>>
>>
>
