I usually take this route:
a) Use the CDK fingerprint to select the molecules whose fingerprints match (i.e. they are at least similar; matching fingerprints do not prove identity).
b) Then run MCS on this subset to filter out those which are not actually identical (i.e. confirm topological identity).
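To make the two steps concrete, here is a minimal sketch of the same idea in plain Python. It is not CDK code: the cheap key built from (element, degree) pairs only stands in for a CDK fingerprint, and a brute-force graph isomorphism test stands in for the MCS step (two molecules are topologically identical when the common substructure covers every atom and bond of both). All function names and the hand-written atom/bond lists are assumptions for illustration; hydrogens and bond orders are ignored.

```python
def adjacency(n_atoms, bonds):
    """Build neighbour sets from a list of (i, j) bonds."""
    adj = [set() for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)
    return adj

def cheap_key(atoms, bonds):
    """Step a) stand-in for a fingerprint: sorted (element, degree) pairs.
    Equal keys are necessary for identity, not sufficient."""
    adj = adjacency(len(atoms), bonds)
    return tuple(sorted((el, len(adj[i])) for i, el in enumerate(atoms)))

def is_isomorphic(atoms_a, bonds_a, atoms_b, bonds_b):
    """Step b) stand-in for MCS: exact check by backtracking.
    Only suitable for small molecules."""
    if len(atoms_a) != len(atoms_b) or len(bonds_a) != len(bonds_b):
        return False
    adj_a = adjacency(len(atoms_a), bonds_a)
    adj_b = adjacency(len(atoms_b), bonds_b)
    mapping, used = {}, set()

    def extend(i):
        if i == len(atoms_a):
            return True
        for j in range(len(atoms_b)):
            if j in used or atoms_a[i] != atoms_b[j]:
                continue
            if len(adj_a[i]) != len(adj_b[j]):
                continue
            # Bonds to already-mapped atoms must agree in both directions.
            if all((k in adj_a[i]) == (mapping[k] in adj_b[j]) for k in mapping):
                mapping[i] = j
                used.add(j)
                if extend(i + 1):
                    return True
                del mapping[i]
                used.discard(j)
        return False

    return extend(0)

def group_identical(molecules):
    """Bucket by cheap key first, then split each bucket by the exact check."""
    buckets = {}
    for name, (atoms, bonds) in molecules.items():
        buckets.setdefault(cheap_key(atoms, bonds), []).append(name)
    groups = []
    for names in buckets.values():
        classes = []
        for name in names:
            for cls in classes:
                if is_isomorphic(*molecules[name], *molecules[cls[0]]):
                    cls.append(name)
                    break
            else:
                classes.append([name])
        groups.extend(classes)
    return groups

# Heavy-atom graphs written out by hand from the SMILES strings.
MOLECULES = {
    "CC(C)C":   (["C"] * 4, [(0, 1), (1, 2), (1, 3)]),                  # from the thread
    "C(C)CC":   (["C"] * 4, [(0, 1), (0, 2), (2, 3)]),                  # from the thread
    "CCCC":     (["C"] * 4, [(0, 1), (1, 2), (2, 3)]),
    "CC(C)CCC": (["C"] * 6, [(0, 1), (1, 2), (1, 3), (3, 4), (4, 5)]),  # extra example
    "CCC(C)CC": (["C"] * 6, [(0, 1), (1, 2), (2, 3), (2, 4), (4, 5)]),  # extra example
}
```

Calling group_identical(MOLECULES) puts "C(C)CC" and "CCCC" in one group and keeps "CC(C)C" separate. The two six-carbon molecules (my own addition, not from the thread) share the same cheap key but fail the exact check, which is why step b) is still needed after step a).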

Best wishes,
Asad

Syed Asad Rahman (PhD, PG, B.Engg)
Research Scientist
s9a...@googlemail.com

On 23 Feb 2010, at 12:03, cdk-user-requ...@lists.sourceforge.net wrote:

> Today's Topics:
>
>   1. Unique identifiers for a molecule (Thomas G. Kristensen)
>   2. Re: Unique identifiers for a molecule (Rajarshi Guha)
>   3. Re: Unique identifiers for a molecule (Thomas G. Kristensen)
>   4. Re: Unique identifiers for a molecule (Rajarshi Guha)
>   5. Re: Unique identifiers for a molecule (Thomas G. Kristensen)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 22 Feb 2010 12:51:43 -0800
> From: "Thomas G. Kristensen" <t...@cs.au.dk>
> Subject: [Cdk-user] Unique identifiers for a molecule
> To: cdk-user@lists.sourceforge.net
> Message-ID:
>       <f0da2721002221251i69cf918fme57e71af7e807...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi all,
>
> I'm writing a program that needs to compare molecule structures to
> infer if they are topologically identical. So far I have been using
> CDKs SmilesGenerator to compare string representations of molecules,
> but I've realised that the SMILES strings are not canonical, even
> though the API states that they are.
>
> I realise that fixing this is a very time consuming task, and I don't
> expect it to be fixed in the near future. Is there another way of
> comparing two molecules to assess if they are topologically identical?
>
> Thanks,
>
> Thomas
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 22 Feb 2010 16:02:48 -0500
> From: Rajarshi Guha <rajarshi.g...@gmail.com>
> Subject: Re: [Cdk-user] Unique identifiers for a molecule
> To: "Thomas G. Kristensen" <t...@cs.au.dk>
> Cc: cdk-user@lists.sourceforge.net
> Message-ID:
>       <773cea9e1002221302j757206efg953dfb5512618...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On Mon, Feb 22, 2010 at 3:51 PM, Thomas G. Kristensen <t...@cs.au.dk> wrote:
>> Hi all,
>>
>> I'm writing a program that needs to compare molecule structures to
>> infer if they are topologically identical. So far I have been using
>> CDKs SmilesGenerator to compare string representations of molecules,
>> but I've realised that the SMILES strings are not canonical, even
>> though the API states that they are.
>
> InChI's?
>
> Also, can you provide some examples where the SMILES output is not canonical?
>
> --
> Rajarshi Guha
> NIH Chemical Genomics Center
>
>
> ------------------------------
>
> Message: 3
> Date: Mon, 22 Feb 2010 13:10:38 -0800
> From: "Thomas G. Kristensen" <t...@cs.au.dk>
> Subject: Re: [Cdk-user] Unique identifiers for a molecule
> To: rajarshi.g...@gmail.com
> Cc: cdk-user@lists.sourceforge.net
> Message-ID:
>       <f0da2721002221310x4f900a8bj11e4d403a330e...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> That was exactly what I was looking for, can't understand I didn't
> remember that!
>
> Regarding the examples, I just want to make sure I'm right about my
> claim. My program outputs both
> "CC(C)C"
> and
> "C(C)CC"
> which I would think is the same molecule. It's in an intermediate
> step, so I don't have it on file, but I can probably write the program
> to output them during a run.
>
> Thomas
>
>
> ------------------------------
>
> Message: 4
> Date: Mon, 22 Feb 2010 17:12:03 -0500
> From: Rajarshi Guha <rajarshi.g...@gmail.com>
> Subject: Re: [Cdk-user] Unique identifiers for a molecule
> To: "Thomas G. Kristensen" <t...@cs.au.dk>
> Cc: cdk-user@lists.sourceforge.net
> Message-ID:
>       <773cea9e1002221412r23b363fcsa295b3b829f7...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On Mon, Feb 22, 2010 at 4:10 PM, Thomas G. Kristensen <t...@cs.au.dk> wrote:
>> That was exactly what I was looking for, can't understand I didn't
>> remember that!
>>
>> Regarding the examples, I just want to make sure I'm right about my
>> claim. My program outputs both
>> "CC(C)C"
>> and
>> "C(C)CC"
>> which I would think is the same molecule.
>
> Actually they are different molecules - since in the first one there
> is a single tertiary carbon and in the second one there are only
> secondary carbons.
>
> (Also, SMILES are read from left to right)
>
> --
> Rajarshi Guha
> NIH Chemical Genomics Center
>
>
> ------------------------------
>
> Message: 5
> Date: Mon, 22 Feb 2010 16:39:38 -0800
> From: "Thomas G. Kristensen" <t...@cs.au.dk>
> Subject: Re: [Cdk-user] Unique identifiers for a molecule
> To: rajarshi.g...@gmail.com
> Cc: cdk-user@lists.sourceforge.net
> Message-ID:
>       <f0da2721002221639u2c225f2fs599cef02e6fb7...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Ahhh! Then there might not be a problem at all. I'll return if I find
> other examples.
>
> Thomas
>
>
> ------------------------------
>
> _______________________________________________
> Cdk-user mailing list
> Cdk-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/cdk-user
>
>
> End of Cdk-user Digest, Vol 45, Issue 9
> ***************************************