Hi all,

Yep, the SMILES parser changed on master and won’t accept invalid SMILES by
default. Notice that Daylight also rejects this one. It should now be the case
that if the CDK rejects a SMILES, Daylight rejects it too (if not, it’s a bug).
The new parser automatically kekulises on load, verifying that bond orders can
be assigned to the aromatic systems. This is much friendlier for the CDK, as you
don’t have molecules with all-single aromatic bonds floating about. When we
added this, it fixed 2 failing unit tests.
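A rough way to picture that verification: every aromatic atom that still needs a double bond must be paired with exactly one aromatic neighbour, i.e. those atoms need a perfect matching over the aromatic bonds. The sketch below is a standalone toy, not the CDK's implementation; the class and method names are hypothetical, it uses plain backtracking (fine for small ring systems, exponential in the worst case), and it assumes you already know which atoms need a double bond.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy kekulisation check (hypothetical helper, not a CDK class). */
public class KekuleCheck {

    /**
     * Tries to place one double bond on every atom that needs one.
     *
     * @param adj     adjacency lists over aromatic bonds only (atoms 0..n-1)
     * @param needsPi true if the atom still needs a double bond (e.g. 'c', or 'n' in pyridine);
     *                false for atoms such as [nH] that contribute a lone pair instead
     * @return partner[i] = j meaning a double bond i=j, or null if no assignment exists
     */
    public static int[] kekulise(List<List<Integer>> adj, boolean[] needsPi) {
        int[] partner = new int[needsPi.length];
        Arrays.fill(partner, -1);
        return match(adj, needsPi, partner, 0) ? partner : null;
    }

    private static boolean match(List<List<Integer>> adj, boolean[] needsPi, int[] partner, int i) {
        // Advance to the first atom that needs a double bond but has none yet.
        while (i < needsPi.length && (!needsPi[i] || partner[i] >= 0)) {
            i++;
        }
        if (i == needsPi.length) {
            return true; // every atom that needed a double bond now has one
        }
        for (int j : adj.get(i)) {
            if (needsPi[j] && partner[j] < 0) {
                partner[i] = j; // tentatively place the double bond i=j
                partner[j] = i;
                if (match(adj, needsPi, partner, i + 1)) {
                    return true;
                }
                partner[i] = -1; // backtrack and try the next neighbour
                partner[j] = -1;
            }
        }
        return false; // atom i cannot receive a double bond
    }

    /** Adjacency for a simple ring of n atoms (closed = true) or an open chain (closed = false). */
    public static List<List<Integer>> ring(int n, boolean closed) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int k = 0; k < n; k++) {
            adj.add(new ArrayList<>());
        }
        for (int k = 0; k + 1 < n; k++) {
            adj.get(k).add(k + 1);
            adj.get(k + 1).add(k);
        }
        if (closed && n > 2) {
            adj.get(0).add(n - 1);
            adj.get(n - 1).add(0);
        }
        return adj;
    }

    public static void main(String[] args) {
        boolean[] benzene = {true, true, true, true, true, true}; // c1ccccc1
        boolean[] pyrrole = {true, true, true, false, true};      // c1cc[nH]c1, atom 3 is the [nH]
        boolean[] noH = {true, true, true, true, true};           // c1ccnc1, hydrogen missing
        System.out.println(kekulise(ring(6, true), benzene) != null); // prints true
        System.out.println(kekulise(ring(5, true), pyrrole) != null); // prints true
        System.out.println(kekulise(ring(5, true), noH) != null);     // prints false
    }
}
```

With the hydrogen written explicitly the five-membered ring kekulises; without it, five atoms compete for double bonds and one is always left over, which is the kind of input the new parser now rejects.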

In this molecule you’re missing a hydrogen on one or more nitrogens; working
out which ones is the problem.

The SMILES should be:
> c4ccc2c(cc1=Nc3[nH]cccc3(Cn12))c4
> 


Some toolkits will fix this by default, but that involves several assumptions
and is nothing more than a hack for broken SMILES input. To fix it you need to
change the molecular formula, which is never a good start. You can still parse
it with the CDK by turning on ‘preserve aromaticity’ (the option needs
renaming), which disables the electron checking, but I strongly discourage
that. The actual fix involves checking every possible combination of hydrogens
on aromatic nitrogen and phosphorus atoms; check out the fixarom code from
http://www.daylight.com/download/contrib/.
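The brute-force idea can be sketched as: try every subset of the candidate aromatic n/p atoms, treat the chosen ones as [nH]/[pH] (so they no longer need a double bond), and keep the subsets for which a Kekulé assignment exists. This is a standalone toy, not fixarom's actual code or any CDK API; the class and method names are hypothetical, and real valence rules (charges, three-connected nitrogens, exocyclic double bonds) are ignored by assuming every listed atom needs a double bond unless protonated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy brute-force search for hydrogen placements (hypothetical, not fixarom or CDK code). */
public class HydrogenPlacement {

    /**
     * @param adj        adjacency lists over aromatic bonds (atoms 0..n-1); every atom is
     *                   assumed to need a double bond unless it is protonated
     * @param candidates indices of aromatic n/p atoms that could carry a hydrogen
     * @return each subset of candidates whose protonation makes the system kekulisable
     */
    public static List<List<Integer>> validPlacements(List<List<Integer>> adj, int[] candidates) {
        List<List<Integer>> result = new ArrayList<>();
        int n = adj.size();
        for (int mask = 0; mask < (1 << candidates.length); mask++) { // all 2^k combinations
            boolean[] needsPi = new boolean[n];
            Arrays.fill(needsPi, true);
            List<Integer> protonated = new ArrayList<>();
            for (int b = 0; b < candidates.length; b++) {
                if ((mask & (1 << b)) != 0) {
                    needsPi[candidates[b]] = false; // [nH]/[pH] donates a lone pair instead
                    protonated.add(candidates[b]);
                }
            }
            int[] partner = new int[n];
            Arrays.fill(partner, -1);
            if (match(adj, needsPi, partner, 0)) {
                result.add(protonated);
            }
        }
        return result;
    }

    // Backtracking perfect matching over the atoms that still need a double bond.
    private static boolean match(List<List<Integer>> adj, boolean[] needsPi, int[] partner, int i) {
        while (i < needsPi.length && (!needsPi[i] || partner[i] >= 0)) {
            i++;
        }
        if (i == needsPi.length) {
            return true;
        }
        for (int j : adj.get(i)) {
            if (needsPi[j] && partner[j] < 0) {
                partner[i] = j;
                partner[j] = i;
                if (match(adj, needsPi, partner, i + 1)) {
                    return true;
                }
                partner[i] = -1;
                partner[j] = -1;
            }
        }
        return false;
    }

    /** Adjacency for a simple closed ring of n atoms. */
    public static List<List<Integer>> ring(int n) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int k = 0; k < n; k++) {
            adj.add(new ArrayList<>());
        }
        for (int k = 0; k < n; k++) {
            int next = (k + 1) % n;
            adj.get(k).add(next);
            adj.get(next).add(k);
        }
        return adj;
    }

    public static void main(String[] args) {
        // c1cncn1 (imidazole written without any [nH]): atoms 2 and 4 are the nitrogens
        System.out.println(validPlacements(ring(5), new int[]{2, 4})); // prints [[2], [4]]
    }
}
```

Both placements come back: the two tautomers are equally valid and nothing in `c1cncn1` says which one was meant, so a toolkit that silently picks one is making an assumption. The search is exponential in the number of candidates, which is tolerable only because real ring systems have a handful of them.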

Now, where these molecules come from is probably more interesting. Most likely
it’s people applying aromaticity models to formats that don’t support them. The
MDL model, for example, doesn’t allow lone-pair contributions. If you have
Marvin Sketch, try loading ‘[nH]1cccc1’ and then generating an MDL mol file.
You’ll notice they have their own non-portable workaround to ensure the
hydrogen is kept. Of course, everyone knows you should never store aromaticity
in a mol file :-).

  Mrv0541 11071317592D          

  5  5  0  0  0  0            999 V2000
    1.2964    0.6723    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    1.9639    0.1874    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.7089   -0.5972    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.8839   -0.5972    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.6290    0.1874    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  4  0  0  0  0
  2  3  4  0  0  0  0
  4  5  4  0  0  0  0
  1  5  4  0  0  0  0
  3  4  4  0  0  0  0
M  STY  1   1 DAT
M  SAL   1  1   1
M  SDT   1 MRV_IMPLICIT_H                                        
M  SDD   1     0.0000    0.0000    DR    ALL  0       0  
M  SED   1 IMPL_H1
M  END
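To make the problem concrete: in the V2000 bond block a bond type of 4 means "aromatic" (as I understand the CTfile documentation, types 4 to 8 are intended for substructure queries), and nothing in the atom or bond lines records that the nitrogen carries a hydrogen; only Marvin's MRV_IMPLICIT_H data S-group does. A minimal sketch with a hypothetical class name, fixed-column V2000 parsing only, no error handling, and the usual three header lines assumed (the blank title line is restored in the example), that lists the type-4 bonds a reader would then have to kekulise:

```java
import java.util.ArrayList;
import java.util.List;

/** Lists aromatic (type 4) bonds in a V2000 molblock; hypothetical helper, fixed columns only. */
public class MolAromaticBonds {

    public static List<int[]> aromaticBonds(String molblock) {
        String[] lines = molblock.split("\n", -1);
        String counts = lines[3]; // after the title, program and comment lines
        int nAtoms = Integer.parseInt(counts.substring(0, 3).trim());
        int nBonds = Integer.parseInt(counts.substring(3, 6).trim());
        List<int[]> aromatic = new ArrayList<>();
        for (int k = 0; k < nBonds; k++) {
            String bond = lines[4 + nAtoms + k]; // bond block follows the atom block
            int first = Integer.parseInt(bond.substring(0, 3).trim());
            int second = Integer.parseInt(bond.substring(3, 6).trim());
            int type = Integer.parseInt(bond.substring(6, 9).trim());
            if (type == 4) {
                aromatic.add(new int[]{first, second}); // 1-based atom numbers, as in the file
            }
        }
        return aromatic;
    }

    /** The Marvin output for [nH]1cccc1 quoted above, without its S-group lines. */
    public static String pyrroleMolblock() {
        return String.join("\n",
            "",
            "  Mrv0541 11071317592D",
            "",
            "  5  5  0  0  0  0            999 V2000",
            "    1.2964    0.6723    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0",
            "    1.9639    0.1874    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
            "    1.7089   -0.5972    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
            "    0.8839   -0.5972    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
            "    0.6290    0.1874    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
            "  1  2  4  0  0  0  0",
            "  2  3  4  0  0  0  0",
            "  4  5  4  0  0  0  0",
            "  1  5  4  0  0  0  0",
            "  3  4  4  0  0  0  0",
            "M  END");
    }

    public static void main(String[] args) {
        for (int[] b : aromaticBonds(pyrroleMolblock())) {
            System.out.println(b[0] + "-" + b[1]); // all five ring bonds are type 4
        }
    }
}
```

Once the S-group is dropped, the reader is left with five atoms joined by "aromatic" bonds and no hydrogen on N, which is exactly the ambiguous `n1cccc1` situation described earlier.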


Oh, some more examples that are now correctly rejected:

> C/1.C/C=C/1
> C-1.C/C=C=1
> ccc
> ccccc
> p1cccc1                       <- generated by older CDK versions!
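The first two examples seem to fail for a different reason from the aromatic ones: the bond symbols written at the two ends of ring closure 1 disagree. My reading (an assumption about the rule, not quoted from any spec or parser): '-' and '=' cannot describe the same bond, and because '/' and '\' are read from the atom they are written next to, a '/' at the opening digit corresponds to a '\' at the closing digit. A minimal sketch of that check, with a hypothetical class name; it treats any other mismatch, including '-' against '/', as a conflict, which may be stricter than real parsers.

```java
/** Toy check of the bond symbols at both ends of one SMILES ring closure (hypothetical helper). */
public class RingClosureCheck {

    /** A directional bond read from the other atom points the other way. */
    static char flip(char bond) {
        if (bond == '/') {
            return '\\';
        }
        if (bond == '\\') {
            return '/';
        }
        return bond;
    }

    /**
     * @param open  symbol written before the digit at the opening atom, or '\0' if none
     * @param close symbol written before the digit at the closing atom, or '\0' if none
     */
    public static boolean consistent(char open, char close) {
        if (open == '\0' || close == '\0') {
            return true; // only one end specified, so nothing can contradict it
        }
        return open == flip(close);
    }

    public static void main(String[] args) {
        System.out.println(consistent('-', '='));  // C-1.C/C=C=1 : prints false
        System.out.println(consistent('/', '/'));  // C/1.C/C=C/1 : prints false
        System.out.println(consistent('/', '\\')); // opposite symbols, same bond : prints true
        System.out.println(consistent('=', '='));  // C=1CCCCC=1 : prints true
        System.out.println(consistent('\0', '=')); // C1CCCCC=1 : prints true
    }
}
```

The last three examples look like kekulisation failures instead: as far as I can tell, each has an odd number of aromatic atoms that all need a double bond (`p1cccc1` would need a `[pH]`), so no Kekulé structure exists.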


Cheers,
J

On 7 Nov 2013, at 16:33, Nina Jeliazkova <jeliazkova.n...@gmail.com> wrote:

> 
> 
> 
> On 7 November 2013 18:26, Nina Jeliazkova <jeliazkova.n...@gmail.com> wrote:
> 
> 
> 
> On 7 November 2013 18:18, Rajarshi Guha <rajarshi.g...@gmail.com> wrote:
> It seems 
> c4ccc2c(cc1=Nc3ncccc3(Cn12))c4
> 
> does not parse using the latest CDK master, but does parse fine using 
> http://apps.ideaconsult.net:8080/ambit2/depict?search=c4ccc2c%28cc1%3DNc3ncccc3%28Cn12%29%29c4&smarts=
> 
> I'm not sure what version ambit is using
> 
> 
> cdk 1.4.11
> 
> There is also a test version using cdk 1.5.3 (Sep 2013) and seems to parse 
> fine 
> http://apps.ideaconsult.net:8080/bioclipse/depict?search=c4ccc2c%28cc1%3DNc3ncccc3%28Cn12%29%29c4&smarts=
>  
> 
> Nina
> 
> Regards,
> Nina 
> but could somebody confirm this issue with the latest master?
> 
> 
> -- 
> Rajarshi Guha | http://blog.rguha.net
> NIH Center for Advancing Translational Science
> 
> ------------------------------------------------------------------------------
> November Webinars for C, C++, Fortran Developers
> Accelerate application performance with scalable programming models. Explore
> techniques for threading, error checking, porting, and tuning. Get the most
> from the latest Intel processors and coprocessors. See abstracts and register
> http://pubads.g.doubleclick.net/gampad/clk?id=60136231&iu=/4140/ostg.clktrk
> _______________________________________________
> Cdk-user mailing list
> Cdk-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/cdk-user