On Dec 16, 2016, at 1:55 PM, Stephen Pickett wrote:
> With a 2013 RDkit install we get consistent canonicalization between reaction
> labelled and unlabelled atoms.
>
> >>> mol = Chem.MolFromSmiles('C1CC([*])CCN1')
> >>> Chem.MolToSmiles(mol)
> '[*]C1CCNCC1'
> >>> mol = Chem.MolFromSmiles('C1CC([*:1])CCN1')
> >>> Chem.MolToSmiles(mol)
> '[*:1]C1CCNCC1'

The 2013 RDKit didn't preserve the atom order between labeled and
unlabeled atoms. It looked that way in many cases, but there were a few
cases where the slight change to the initial atom invariants, caused by
the atom label, ended up affecting the SMILES.

I no longer have an older version of RDKit installed. Going through my
notes, here is one of the failure cases:

  core      => Cc1cc2c3c(c1)C[N@]([*])CCN(C)CC[N@@]([*])Cc1cc(C)cc(c1OCCCO3)C[N@@](C)CCN(C)CC[N@](C)C2
  syntax    => Cc1cc2c3c(c1)C[N@]([*:1])CCN(C)CC[N@@]([*:2])Cc1cc(C)cc(c1OCCCO3)C[N@](C)CCN(C)CC[N@@](C)C2
  canonical => Cc1cc2c3c(c1)C[N@]([*:2])CCN(C)CC[N@@]([*:1])Cc1cc(C)cc(c1OCCCO3)C[N@@](C)CCN(C)CC[N@](C)C2

For my project I ended up canonicalizing with unlabeled atoms, using
the _smilesAtomOutputOrder property to figure out where the "*" atoms
were located in the SMILES string, using CanonicalRankAtoms() to figure
out which of them were symmetrical, and then coming up with my own
canonical labeling on top of the canonical unlabeled SMILES.

> I can understand why canonicalization can be different between versions but
> I’m not sure whether this change in behaviour is expected?

While it is possible to generate a canonical labeling which preserves
the same atom order as the canonical unlabeled SMILES (as I did above),
that's more complicated. It's easier to include the label as part of
the atom invariant and use the regular canonicalization mechanism,
which is why the labeled and unlabeled atom orders can now differ.

Cheers,

        Andrew
        da...@dalkescientific.com
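
In case it helps, here is a rough sketch of the "label on top of the unlabeled canonical SMILES" idea described above. It is not the project code, only one simple scheme: number the wildcard atoms in the order they appear in the canonical unlabeled SMILES, and separately record which of them share a symmetry class. The function names label_wildcards() and symmetry_classes() are made up for this illustration. The sketch assumes each wildcard is written as "[*]" or a bare "*", and that the caller already has the atom output order (for example from the _smilesAtomOutputOrder property) and per-atom ranks with ties left unbroken (for example from CanonicalRankAtoms()).

```python
import re

# Assumption: an unlabeled wildcard atom appears either as "[*]" or as a bare "*".
_WILDCARD = re.compile(r"\[\*\]|\*")


def label_wildcards(unlabeled_smiles):
    """Number the wildcard atoms 1, 2, ... in their order of appearance.

    Hypothetical helper: because the input is the canonical *unlabeled*
    SMILES, the atom order is untouched and only the labels are added.
    """
    counter = 0

    def add_label(match):
        nonlocal counter
        counter += 1
        return f"[*:{counter}]"

    return _WILDCARD.sub(add_label, unlabeled_smiles)


def symmetry_classes(output_order, wildcard_atoms, ranks):
    """Give each wildcard, in SMILES output order, a symmetry class number.

    output_order   : atom indices in the order they were written to the SMILES
    wildcard_atoms : set of atom indices that are "*" atoms
    ranks          : per-atom rank where symmetric atoms share the same value

    Wildcards with equal rank get the same class, so [1, 1] means the two
    attachment points are interchangeable and [1, 2] means they are not.
    """
    class_of_rank = {}
    classes = []
    for atom_idx in output_order:
        if atom_idx in wildcard_atoms:
            rank = ranks[atom_idx]
            if rank not in class_of_rank:
                class_of_rank[rank] = len(class_of_rank) + 1
            classes.append(class_of_rank[rank])
    return classes


if __name__ == "__main__":
    print(label_wildcards("[*]C1CCNCC1"))
    # Toy input: atoms 0 and 3 are wildcards with the same rank (symmetric).
    print(symmetry_classes([0, 1, 2, 3], {0, 3}, [0, 1, 1, 0]))
```

With the symmetry classes in hand, any original labels can be mapped onto the new 1, 2, ... numbering, and attachment points in the same class can be treated as equivalent when comparing two labeled cores.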