Hi Till,
> Currently, I have just tested things out and it is not used in any release
> versions. However, I am looking into integrating the speed boost. Is there
> any plan for when the 1.6 stable will be released? Does it make sense to use a
> 1.5.x version or are the interfaces too unstable at the moment?
The original plan was the end of this summer :(.
Here are the main things I would like to get done before 1.6. Maven is the
big one, and ideally I don’t want to switch to Maven and then release
straight away, so that we have time to find problems. In my mind the release
will be 1 to 1.5 months after the Maven switch.
- correct perception of tetrahedral and double bond stereochemistry from 2D/3D
  coordinates (in progress)
- convert build system to Maven (depends on go ahead from Egon and having
  existing patches applied)
- SMILES utility including universal SMILES -
  http://www.jcheminf.com/content/4/1/22 (depends on Maven)
Here’s what would be nice to have:
- much faster fingerprint generation (will probably add this in 1.6 as a
  separate module, named like nio in the JDK: ‘nfp’ = new fingerprint)
- depict stereochemistry of double bonds and wedge/hatch placement for
  tetrahedral centres. If anyone knows how or wants to do this, that would be
  a great help.
> Does it make sense to use a 1.5.x version or are the interfaces too unstable
> at the moment?
So I *might* change the stereochemistry interfaces but that only affects you if
you are doing manual manipulation of the IStereoElements. The SMILES changes
will be in a new class and I intend to keep the parser/generator using the same
API (the default functionality might change though). I did want to deprecate
IAtomContainer but I don’t think it’s feasible for this version.
> We need canonical SMILES. Do I understand correctly that the current git
> version’s isomeric SMILES are canonical?
Yes, they are at the moment. But the CDK canonical labeller does not consider
stereochemistry, so generating canonical isomeric SMILES (absolute SMILES)
has always been impossible; it’s just never been indicated as such. The addition
of Universal SMILES (which uses the InChI algorithm) will provide correct
canonical isomeric SMILES.
Might I also suggest the ‘cdk-hash’ module. This will generate 64-bit stereo
specific hash codes for indexing structures and quickly finding identical
structures. The entry point is
http://cdk.github.io/cdk/1.5/docs/api/index.html?org/openscience/cdk/hash/HashGeneratorMaker.html.
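As a rough sketch of how such codes can be used for indexing: group structures
into buckets keyed by their 64-bit code, then confirm candidates with an exact
comparison, since two different structures can occasionally share a code.
HashBucketIndex and its methods are hypothetical names, not CDK classes. The
hash function is passed in as a plain ToLongFunction; in real use it would
presumably wrap the generator built via HashGeneratorMaker. The strings, the
length-based hash and the equality check in the demo are toy stand-ins.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.function.ToLongFunction;

/** Hypothetical helper (not part of CDK): buckets items by a 64-bit hash code. */
final class HashBucketIndex<T> {
    private final Map<Long, List<T>> buckets = new HashMap<>();
    private final ToLongFunction<T> hasher;      // assumption: wraps a molecule hash generator
    private final BiPredicate<T, T> exactMatch;  // assumption: e.g. an identity/isomorphism check

    HashBucketIndex(ToLongFunction<T> hasher, BiPredicate<T, T> exactMatch) {
        this.hasher = hasher;
        this.exactMatch = exactMatch;
    }

    /** Adds an item to the bucket for its hash code. */
    void add(T item) {
        buckets.computeIfAbsent(hasher.applyAsLong(item), k -> new ArrayList<>()).add(item);
    }

    /** Returns stored items that share the query's hash code and pass the exact check. */
    List<T> findIdentical(T query) {
        List<T> candidates =
            buckets.getOrDefault(hasher.applyAsLong(query), Collections.emptyList());
        List<T> hits = new ArrayList<>();
        for (T candidate : candidates) {
            if (exactMatch.test(candidate, query)) {
                hits.add(candidate);
            }
        }
        return hits;
    }
}

class HashBucketIndexDemo {
    public static void main(String[] args) {
        // Toy stand-ins: strings instead of molecules, length as a deliberately weak hash.
        HashBucketIndex<String> index =
            new HashBucketIndex<>(s -> (long) s.length(), String::equals);
        index.add("CCO");
        index.add("CCN");       // lands in the same bucket as "CCO" under the toy hash
        index.add("c1ccccc1");
        System.out.println(index.findIdentical("CCO")); // prints [CCO]
    }
}
```

The exact check only runs on the few candidates in one bucket, which is where
the speed-up over comparing against every stored structure comes from.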
Oh, and the (sub)graph isomorphism testing is also faster now and
stereospecific. This is unreviewed but you can find it in the Maven release I
made the other day. Entry points for the substructure matching are detailed
here:
http://efficientbits.blogspot.co.uk/2013/11/improved-substructure-matching.html
> Will a 1.6 canonical string match a 1.4 canonical string? We use the SMILES
> to match molecules.
The same canonicalisation algorithm is used but the SMILES generated also
depends on other rules such as ring numbering, visiting double bonds first,
aromaticity etc. With the inclusion of Universal SMILES the output will
definitely be different. I think it's safer to presume they are different. It’s
unfortunate but better in the long run.
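If stored strings are used as keys, one defensive option is to record which
generator version produced them and refuse to match across versions,
regenerating the keys instead. A minimal sketch: VersionedSmilesIndex, the
version labels and the molecule IDs are all hypothetical, and the SMILES
strings are supplied by the caller rather than generated here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper (not part of CDK): SMILES lookup tagged with a generator version. */
final class VersionedSmilesIndex {
    private final String generatorVersion;  // illustrative label, e.g. "1.4" or "1.6"
    private final Map<String, String> idBySmiles = new HashMap<>();

    VersionedSmilesIndex(String generatorVersion) {
        this.generatorVersion = generatorVersion;
    }

    /** Stores a molecule ID under a canonical SMILES produced by this index's version. */
    void put(String canonicalSmiles, String moleculeId) {
        idBySmiles.put(canonicalSmiles, moleculeId);
    }

    /** Looks up a SMILES, rejecting queries generated by a different version. */
    Optional<String> find(String canonicalSmiles, String queryGeneratorVersion) {
        if (!generatorVersion.equals(queryGeneratorVersion)) {
            throw new IllegalArgumentException("index built with " + generatorVersion
                + " but query generated with " + queryGeneratorVersion
                + "; regenerate the index keys");
        }
        return Optional.ofNullable(idBySmiles.get(canonicalSmiles));
    }
}

class VersionedSmilesIndexDemo {
    public static void main(String[] args) {
        VersionedSmilesIndex index = new VersionedSmilesIndex("1.6");
        index.put("CCO", "mol-1");
        // Same version: the lookup is allowed.
        System.out.println(index.find("CCO", "1.6").isPresent());
        try {
            // Different version: the strings are presumed incomparable.
            index.find("CCO", "1.4");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```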
Thanks,
J
On 15 Nov 2013, at 12:43, Till Schäfer <till2.schae...@tu-dortmund.de> wrote:
> Hi,
> thanks for the detailed explanations. I have a few further questions:
>
> On Friday, 15 November 2013, 09:58:06, John May wrote:
>> No problem,
>>
>> So it’s still work in progress (before 1.6) and there are a couple of
>> caveats and things to be aware of. Here’s some info that might not be
>> obvious if you’re using the developer version. The current parser and
>> generator will be more of a do-it-yourself SMILES, with another utility class
>> which will do it correctly (i.e. ensure correct aromaticity etc).
> Currently, I have just tested things out and it is not used in any release
> versions. However, I am looking into integrating the speed boost. Is there
> any plan for when the 1.6 stable will be released? Does it make sense to use a
> 1.5.x version or are the interfaces too unstable at the moment?
>>
>> Generator
>> - The 6 seconds is still slow but that is likely due to the
>> canonicalisation. Strictly speaking, isomeric SMILES is non-canonical (that
>> would be absolute SMILES), and in future canonical generation won’t be on by
>> default.
> We need canonical SMILES. Do I understand correctly that the current git
> version’s isomeric SMILES are canonical?
> Will a 1.6 canonical string match a 1.4 canonical string? We use the SMILES
> to match molecules.
>
>> - Aromaticity is no longer redone for SMILES generation. The generator
>> outputs whatever you give it. If you’re using the SMILES for indexing
>> structures they should be aromatised first (see below).
> Good to know! Applying the aromaticity does not seem to add a measurable
> amount of computation time to the SMILES generation.
>
>
> Greetings
> Till
>
>
>> - Tetrahedral and double-bond stereochemistry are now round-tripped between
>> SMILES/InChI (working on MDL and interpreting depictions / 3D coordinates).
>> - implicit hydrogen specification on the organic subset is now correct
>>
>> Parser
>> - molecules read from SMILES have their implicit hydrogen counts all set
>> (depending on what else you use this means you might not need to atom type
>> your structures)
>> - SMILES are kekulised automatically on load - if a molecule could not be
>> kekulised an exception is thrown. The kekulisation is fast enough (< 10 s on
>> 1 million structures) that it’s a good sanity check. If you find a molecule
>> throws an exception, check with Daylight’s DEPICT service. If they accept it
>> then it’s a bug - otherwise the generated SMILES is invalid (normally
>> missing Hs on nitrogens).
>>
>> For the aromaticity, there is a new (faster) class. I still need to go
>> through and replace the existing uses, but here is a summary:
>>
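>> import org.openscience.cdk.aromaticity.Aromaticity;
>> import org.openscience.cdk.aromaticity.ElectronDonation;
>> import org.openscience.cdk.graph.Cycles;
>>
>> // CDK model needs atom types, Daylight model needs hydrogens
>> Aromaticity aromaticity = new Aromaticity(ElectronDonation.daylight(),
>>     // Cycles.all() will time out on fullerenes but I have a fix on the
>>     // patch tracker
>>     Cycles.all());
>>
>> // apply the aromaticity model to the container (removing any previous
>> // specification)
>> aromaticity.apply(molecule);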
>>
>> Cheers,
>> John
>>
>> On 14 Nov 2013, at 18:49, Till Schäfer <till2.schae...@tu-dortmund.de> wrote:
>>
>>> Hi,
>>> the new isomeric SmilesGenerator (today's git) is incredibly fast. For a
>>> small (110 mols) data set with huge molecules the SMILES creation time went
>>> down from 110 seconds (Scaffold Hunter's "optimized" 1.4.19 version) to 6
>>> seconds!
>>>
>>> Below is the largest mol in the data set :-)
>>>
>>> [H]OC1([H])C([H])([H])C([H])(OC1([H])C([H])([H])OP(=O)(O[H])OC2([H])C([H])([H])C([H])(OC2([H])C([H])([H])OP(=O)(O[H])OC3([H])C([H])([H])C([H])(OC3([H])C([H])([H])OP(=O)(O[H])OC4([H])C([H])(O[H])C([H])(OC4([H])C([H])([H])OP(=O)(O[H])OC5([H])C([H])(O[H])C([H])(OC5([H])C([H])([H])OP(=O)(O[H])OC6([H])C([H])(O[H])C([H])(OC6([H])C([H])([H])OP(=O)(O[H])OC7([H])C([H])(O[H])C([H])(OC7([H])C([H])([H])OP(=O)(O[H])OC8([H])C([H])(O[H])C([H])(OC8([H])C([H])([H])OP(=O)(O[H])OC9([H])C([H])(O[H])C([H])(OC9([H])C([H])([H])OP(=O)(O[H])OC%10([H])C([H])(O[H])C([H])(OC%10([H])C([H])([H])OP(=O)(O[H])OC%11([H])C([H])(O[H])C([H])(OC%11([H])C([H])([H])OP(=O)(O[H])OC%12([H])C([H])(O[H])C([H])(OC%12([H])C([H])([H])OP(=O)(O[H])OC%13([H])C([H])(O[H])C([H])(OC%13([H])C([H])([H])OP(=O)(O[H])OC%14([H])C([H])(O[H])C([H])(OC%14([H])C([H])([H])OP(=O)(O[H])OC%15([H])C([H])(O[H])C([H])(OC%15([H])C([H])([H])OP(=O)(O[H])OC%16([H])C([H])(O[H])C([H])(OC%16([H])C([H])([H])OP(=O)(O[H])OC%17([H])C([H])(O[H])C([H])(OC%17([H])C([H])([H])OP(=O)(O[H])OC%18([H])C([H])(O[H])C([H])(OC%18([H])C([H])([H])OP(=O)(O[H])OC%19([H])C([H])(O[H])C([H])(OC%19([H])C([H])([H])OP(=O)(O[H])OC%20([H])C([H])(O[H])C([H])(OC%20([H])C([H])([H])OP(=O)(O[H])OC%21([H])C([H])(O[H])C([H])(OC%21([H])C([H])([H])OP(=O)(O[H])OC%22([H])C([H])(O[H])C([H])(OC%22([H])C([H])([H])OP(=O)(O[H])OC%23([H])C([H])(O[H])C([H])(OC%23([H])C([H])([H])OP(=O)(O[H])OC%24([H])C([H])(O[H])C([H])(OC%24([H])C([H])([H])OP(=O)(O[H])OC%25([H])C([H])(O[H])C([H])(OC%25([H])C([H])([H])OP(=O)(O[H])OC%26([H])C([H])([H])C([H])(OC%26([H])C([H])([H])OP(=O)(O[H])OC%27([H])C([H])([H])C([H])(OC%27([H])C([H])([H])OP(=O)(O[H])OC%28([H])C([H])([H])C([H])(OC%28([H])C([H])([H])OP(=O)(O[H])O[H])N%29C([H])=NC=%30C(=O)N([H])C(=NC%30%29)N([H])[H])N%31C([H])=NC=%32C(=O)N([H])C(=NC%32%31)N([H])[H])N%33C(=O)N=C(C([H])=C%33[H])N([H])[H])N%34C([H])=C([H])C(=O)N([H])C%34=O)N%35C([H])=NC=%36C(=O)N([H])C(=NC%36%35)N([H])[H])N%37C([H])=NC=%38C(=O)N([H])C(=NC%38%37)N([H])[H])N%39C([H])=NC=%40C(=O
)N([H])C(=NC%40%39)N([H])[H])N%41C(=O)N=C(C([H])=C%41[H])N([H])[H])N%42C([H])=NC=%43C(=O)N([H])C(=NC%43%42)N([H])[H])N%44C(=O)N=C(C([H])=C%44[H])N([H])[H])N%45C([H])=NC%46=C(N=C([H])N=C%46%45)N([H])[H])N%47C(=O)N=C(C([H])=C%47[H])N([H])[H])N%48C([H])=C([H])C(=O)N([H])C%48=O)N%49C([H])=C([H])C(=O)N([H])C%49=O)N%50C(=O)N=C(C([H])=C%50[H])N([H])[H])N%51C([H])=NC=%52C(=O)N([H])C(=NC%52%51)N([H])[H])N%53C([H])=NC=%54C(=O)N([H])C(=NC%54%53)N([H])[H])N%55C([H])=C([H])C(=O)N([H])C%55=O)N%56C([H])=NC=%57C(=O)N([H])C(=NC%57%56)N([H])[H])N%58C(=O)N=C(C([H])=C%58[H])N([H])[H])N%59C([H])=NC=%60C(=O)N([H])C(=NC%60%59)N([H])[H])N%61C([H])=NC=%62C(=O)N([H])C(=NC%62%61)N([H])[H])N%63C([H])=C([H])C(=O)N([H])C%63=O)N%64C(=O)N=C(C([H])=C%64[H])N([H])[H])N%65C([H])=NC%66=C(N=C([H])N=C%66%65)N([H])[H])N%67C([H])=NC=%68C(=O)N([H])C(=NC%68%67)N([H])[H])N%69C(=O)N=C(C([H])=C%69[H])N([H])[H])N%70C(=O)N=C(C([H])=C%70[H])N([H])[H]
>>>
>>>
>>> Thanks for the good work
>>> Till Schäfer
>>>
>>>
>>
> --
> Dipl.-Inf. Till Schäfer
> Technische Universität Dortmund
> Chair 11 - Algorithm Engineering
> Otto-Hahn-Str. 14 / Raum 237
> 44227 Dortmund, Germany
>
> e-mail: till.schae...@cs.tu-dortmund.de
> phone: +49(231)755-7706
> fax: +49(231)755-7740
> web: http://ls11-www.cs.uni-dortmund.de/staff/schaefer
> pgp:
> https://keyserver2.pgp.com/vkd/SubmitSearch.event?&&SearchCriteria=0xD84DED79
> Cdk-user mailing list
> Cdk-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/cdk-user