Hi,

thanks for the detailed explanations. I have a few further questions:

On Friday, 15 November 2013, 09:58:06, John May wrote:
> No problem,
>
> So it’s still work in progress (before 1.6) and there are a couple of caveats
> and things to be aware of. Here’s some info that might not be obvious if
> you’re using the developer version. The current parser and generator will be
> more of a do it yourself SMILES with another utility class which will do it
> correct (i.e. ensure correct aromaticity etc).

So far I have only tested things out, and the new code is not used in any release version. However, I am looking into integrating the speed boost. Is there a planned date for the 1.6 stable release? Does it make sense to use a 1.5.x version, or are the interfaces too unstable at the moment?

> Generator
> - The 6 seconds is still slow but that is likely due to the
> canonicalisation, strictly speaking isomeric SMILES is non-canonical (that
> would be absolute SMILES) and in future canonical generation won’t be on by
> default.

We need canonical SMILES. Do I understand correctly that the isomeric SMILES produced by the current git version are canonical? Will a 1.6 canonical string match a 1.4 canonical string? We use the SMILES to match molecules.
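To make the matching use case concrete, here is a minimal sketch of an index keyed on canonical strings. Everything in it is an assumption for illustration: the class name CanonicalSmilesIndex, the version tag, and the toy canonicaliser (a two-entry lookup table standing in for a real canonical SMILES generator) are hypothetical and not part of the CDK. The point is only that matching works when every key comes from the same canonicalisation, and that recording the generator version lets a stored index be checked before it is reused.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Groups structure ids by canonical string key (hypothetical helper, not a CDK class). */
final class CanonicalSmilesIndex {

    // Tag for the toolkit version that produced the keys (assumed format, e.g. "cdk-1.4.19").
    private final String generatorVersion;

    // Stand-in for a real canonical SMILES generator; supplied by the caller.
    private final UnaryOperator<String> canonicaliser;

    private final Map<String, List<String>> idsByKey = new HashMap<>();

    CanonicalSmilesIndex(String generatorVersion, UnaryOperator<String> canonicaliser) {
        this.generatorVersion = generatorVersion;
        this.canonicaliser = canonicaliser;
    }

    /** Stores the id under the canonical form of the given SMILES. */
    void add(String id, String smiles) {
        idsByKey.computeIfAbsent(canonicaliser.apply(smiles), k -> new ArrayList<>()).add(id);
    }

    /** Returns the ids whose canonical form equals that of the query, in insertion order. */
    List<String> match(String smiles) {
        return idsByKey.getOrDefault(canonicaliser.apply(smiles), Collections.emptyList());
    }

    /** True if keys from the other generator version may be compared with this index. */
    boolean isCompatibleWith(String otherVersion) {
        return generatorVersion.equals(otherVersion);
    }

    /** Toy canonicaliser: maps two spellings of ethanol to "CCO", leaves the rest unchanged. */
    static UnaryOperator<String> toyCanonicaliser() {
        Map<String, String> table = new HashMap<>();
        table.put("OCC", "CCO");
        table.put("C(O)C", "CCO");
        return s -> table.getOrDefault(s, s);
    }

    public static void main(String[] args) {
        CanonicalSmilesIndex index = new CanonicalSmilesIndex("toy-1", toyCanonicaliser());
        index.add("m1", "OCC");
        index.add("m2", "C(O)C");
        index.add("m3", "CC");
        System.out.println(index.match("CCO")); // prints [m1, m2]
    }
}
```

If canonical strings turn out to differ between toolkit versions, a check like isCompatibleWith would signal that the index has to be rebuilt rather than compared across versions.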
> - Aromaticity is no longer redone for SMILES generation. The generator
> outputs what ever you give it. If you’re using the SMILES for indexing
> structures they should be aromatised first (see below).

Good to know! Applying the aromaticity does not seem to add a measurable amount of computation time to the SMILES generation.

Greetings
Till

> - Tetrahedral and Double-Bond stereo chemistry are now round tripped between
> SMILES/InChI (working on MDL and interpreting depictions / 3D coordinates).
> - implicit hydrogen specification on the organic subset is now correct
>
> Parser
> - molecules read from SMILES have their implicit hydrogen counts all set
> (depending on what else you use this means you might not need to atom type
> your structures)
> - SMILES are kekulised automatically on load - if a molecule could not be
> kekulised an exception is thrown. The kekulisation is fast enough (< 10 s on 1
> mil structures) that it’s a good sanity check. If you find a molecule throws
> an exception check with Daylight’s DEPICT service. If they accept it then
> it’s a bug - otherwise the generated SMILES is invalid (normally missing Hs
> on nitrogens).
>
> For the aromaticity, there is a new (faster) class. Need to go through and
> replace the existing uses but here is a summary:
>
> Aromaticity aromaticity = new Aromaticity(ElectronDonation.daylight(), // CDK model needs atom types, Daylight model needs hydrogens
>                                           Cycles.all());               // will timeout on fullerenes but I have a fix on the patch tracker
>
> aromaticity.apply(molecule); // apply the aromaticity model to the container (removing any previous specification)
>
> Cheers,
> John
>
> On 14 Nov 2013, at 18:49, Till Schäfer <till2.schae...@tu-dortmund.de> wrote:
>
> > Hi,
> > the new isomeric SmilesGenerator (todays git) is incredible fast. For a
> > small (110 mols) data set with huge molecules the smiles creation time went
> > down from 110 seconds (scaffold hunters "optimized" 1.4.19 version) to 6
> > seconds!
> > > > in the following: the largest mol in the data set :-) > > > > [H]OC1([H])C([H])([H])C([H])(OC1([H])C([H])([H])OP(=O)(O[H])OC2([H])C([H])([H])C([H])(OC2([H])C([H])([H])OP(=O)(O[H])OC3([H])C([H])([H])C([H])(OC3([H])C([H])([H])OP(=O)(O[H])OC4([H])C([H])(O[H])C([H])(OC4([H])C([H])([H])OP(=O)(O[H])OC5([H])C([H])(O[H])C([H])(OC5([H])C([H])([H])OP(=O)(O[H])OC6([H])C([H])(O[H])C([H])(OC6([H])C([H])([H])OP(=O)(O[H])OC7([H])C([H])(O[H])C([H])(OC7([H])C([H])([H])OP(=O)(O[H])OC8([H])C([H])(O[H])C([H])(OC8([H])C([H])([H])OP(=O)(O[H])OC9([H])C([H])(O[H])C([H])(OC9([H])C([H])([H])OP(=O)(O[H])OC%10([H])C([H])(O[H])C([H])(OC%10([H])C([H])([H])OP(=O)(O[H])OC%11([H])C([H])(O[H])C([H])(OC%11([H])C([H])([H])OP(=O)(O[H])OC%12([H])C([H])(O[H])C([H])(OC%12([H])C([H])([H])OP(=O)(O[H])OC%13([H])C([H])(O[H])C([H])(OC%13([H])C([H])([H])OP(=O)(O[H])OC%14([H])C([H])(O[H])C([H])(OC%14([H])C([H])([H])OP(=O)(O[H])OC%15([H])C([H])(O[H])C([H])(OC%15([H])C([H])([H])OP(=O)(O[H])OC%16([H])C([H])(O[H])C([H])(OC%16([H])C([H])([H])OP(=O)(O[H])OC%17([H])C([H])(O[H])C([H])(OC%17([H])C([H])([H])OP(=O)(O[H])OC%18([H])C([H])(O[H])C([H])(OC%18([H])C([H])([H])OP(=O)(O[H])OC%19([H])C([H])(O[H])C([H])(OC%19([H])C([H])([H])OP(=O)(O[H])OC%20([H])C([H])(O[H])C([H])(OC%20([H])C([H])([H])OP(=O)(O[H])OC%21([H])C([H])(O[H])C([H])(OC%21([H])C([H])([H])OP(=O)(O[H])OC%22([H])C([H])(O[H])C([H])(OC%22([H])C([H])([H])OP(=O)(O[H])OC%23([H])C([H])(O[H])C([H])(OC%23([H])C([H])([H])OP(=O)(O[H])OC%24([H])C([H])(O[H])C([H])(OC%24([H])C([H])([H])OP(=O)(O[H])OC%25([H])C([H])(O[H])C([H])(OC%25([H])C([H])([H])OP(=O)(O[H])OC%26([H])C([H])([H])C([H])(OC%26([H])C([H])([H])OP(=O)(O[H])OC%27([H])C([H])([H])C([H])(OC%27([H])C([H])([H])OP(=O)(O[H])OC%28([H])C([H])([H])C([H])(OC%28([H])C([H])([H])OP(=O)(O[H])O[H])N%29C([H])=NC=%30C(=O)N([H])C(=NC%30%29)N([H])[H])N%31C([H])=NC=%32C(=O)N([H])C(=NC%32%31)N([H])[H])N%33C(=O)N=C(C([H])=C%33[H])N([H])[H])N%34C([H])=C([H])C(=O)N([H])C%34=O)N%35C([H])=NC=%36C(=O)N([H])C(=NC%36%35)N([H])[H])N%37C
([H])=NC=%38C(=O)N([H])C(=NC%38%37)N([H])[H])N%39C([H])=NC=%40C(=O)N([H])C(=NC%40%39)N([H])[H])N%41C(=O)N=C(C([H])=C%41[H])N([H])[H])N%42C([H])=NC=%43C(=O)N([H])C(=NC%43%42)N([H])[H])N%44C(=O)N=C(C([H])=C%44[H])N([H])[H])N%45C([H])=NC%46=C(N=C([H])N=C%46%45)N([H])[H])N%47C(=O)N=C(C([H])=C%47[H])N([H])[H])N%48C([H])=C([H])C(=O)N([H])C%48=O)N%49C([H])=C([H])C(=O)N([H])C%49=O)N%50C(=O)N=C(C([H])=C%50[H])N([H])[H])N%51C([H])=NC=%52C(=O)N([H])C(=NC%52%51)N([H])[H])N%53C([H])=NC=%54C(=O)N([H])C(=NC%54%53)N([H])[H])N%55C([H])=C([H])C(=O)N([H])C%55=O)N%56C([H])=NC=%57C(=O)N([H])C(=NC%57%56)N([H])[H])N%58C(=O)N=C(C([H])=C%58[H])N([H])[H])N%59C([H])=NC=%60C(=O)N([H])C(=NC%60%59)N([H])[H])N%61C([H])=NC=%62C(=O)N([H])C(=NC%62%61)N([H])[H])N%63C([H])=C([H])C(=O)N([H])C%63=O)N%64C(=O)N=C(C([H])=C%64[H])N([H])[H])N%65C([H])=NC%66=C(N=C([H])N=C%66%65)N([H])[H])N%67C([H])=NC=%68C(=O)N([H])C(=NC%68%67)N([H])[H])N%69C(=O)N=C(C([H])=C%69[H])N([H])[H])N%70C(=O)N=C(C([H])=C%70[H])N([H])[H]
> >
> > Thanks for the good work
> > Till Schäfer

--
Dipl.-Inf. Till Schäfer
Technische Universität Dortmund
Chair 11 - Algorithm Engineering
Otto-Hahn-Str. 14 / Raum 237
44227 Dortmund, Germany

e-mail: till.schae...@cs.tu-dortmund.de
phone: +49(231)755-7706
fax: +49(231)755-7740
web: http://ls11-www.cs.uni-dortmund.de/staff/schaefer
pgp: https://keyserver2.pgp.com/vkd/SubmitSearch.event?&&SearchCriteria=0xD84DED79
_______________________________________________
Cdk-user mailing list
Cdk-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/cdk-user