On Apr 19, 2017, at 12:03, Thilo Bauer <thilo.ba...@fau.de> wrote:
> is converting SMARTS to SMILES a "lossless" operation, or does one lose
> information on doing so?

It is obviously not lossless if you include terms that cannot be
represented in SMILES:

  >>> from rdkit import Chem
  >>> Chem.MolToSmiles(Chem.MolFromSmarts("[C,N]"))
  'C'

or which don't make sense as a molecule:

  >>> Chem.MolToSmiles(Chem.MolFromSmarts("c"))
  'c'
  >>> Chem.MolFromSmiles("c")
  [23:02:24] non-ring atom 0 marked aromatic

It also loses some information which could be represented in SMILES:

  >>> Chem.MolToSmiles(Chem.MolFromSmarts("[NH4+]"))
  'N'
  >>> Chem.MolToSmiles(Chem.MolFromSmarts("C[N+]1(C)CCCCC1"))
  'CN1(C)CCCCC1'
  >>> Chem.MolToSmiles(Chem.MolFromSmarts("[12C]"), isomericSmiles=True)
  'C'

Do be careful if you want to handle aromatic atoms and bonds:

  >>> Chem.MolToSmiles(Chem.MolFromSmarts("[#6]:1:[#6]:[#6]:[#6]:[#6]:[#6]:1"))
  'C1:C:C:C:C:C:1'
  >>> Chem.MolToSmiles(Chem.MolFromSmarts("c=1-c=c-c=c-c=1"))
  'c1=c-c=c-c=c-1'

> Background:
> I've got three different SMARTS strings representing the same structure
> - at least when depicting it. Also all three strings result in the exact
> same SMILES (see code and output below).

It looks like you want SMARTS canonicalization. In general this is
hard, because SMARTS can include boolean expressions and recursive
SMARTS.

If you limit yourself to patterns like '[#6]-1=[#6]-[#6]...', with only
atomic numbers and single/double/triple bonds, then I think RDKit will
do what you want.

        Andrew
        da...@dalkescientific.com

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
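
A minimal sketch of the restriction described above, under stated assumptions: it accepts only patterns built from '[#n]' atoms, explicit '-', '=' and '#' bonds, ring-closure digits and branches, and only then uses the SMARTS -> SMILES round trip as a comparison key. The names is_restricted_smarts and canonical_key are hypothetical helpers, not RDKit functions, and whether the resulting string is a reliable canonical form is the expectation voiced in this reply, not something the sketch proves.

```python
import re

# Tokens allowed in the restricted subset: [#n] atoms, explicit
# single/double/triple bonds, ring closures (digit or %nn), branches.
_TOKENS = re.compile(r"(?:\[#\d+\]|[-=#]|%\d\d|\d|\(|\))+")

# An atom that follows anything other than an explicit bond symbol is
# joined by an implicit bond, which in SMARTS means "single or aromatic".
_IMPLICIT_BOND = re.compile(r"(?<=[^-=#])\[")


def is_restricted_smarts(pattern: str) -> bool:
    """Token-level check only; it does not validate the SMARTS grammar.

    Limitation: a ring closure with no bond symbol at either end still
    has an implicit bond and is not detected here.
    """
    if not _TOKENS.fullmatch(pattern):
        return False
    return _IMPLICIT_BOND.search(pattern) is None


def canonical_key(pattern: str) -> str:
    """Return the SMILES RDKit writes for a restricted SMARTS pattern."""
    # Imported lazily so the pure-Python check above works without RDKit.
    from rdkit import Chem

    if not is_restricted_smarts(pattern):
        raise ValueError("pattern is outside the restricted subset: %r" % pattern)
    query = Chem.MolFromSmarts(pattern)
    if query is None:
        raise ValueError("RDKit could not parse: %r" % pattern)
    return Chem.MolToSmiles(query)
```

With RDKit installed, two spellings of the same restricted pattern, for example canonical_key("[#6]-1=[#6]-[#6]=[#6]-[#6]=[#6]-1") and canonical_key("[#6]=1-[#6]=[#6]-[#6]=[#6]-[#6]=1"), can then be compared as plain strings. Patterns such as "[C,N]" or "c1ccccc1" are rejected up front, since those are the cases shown above where the conversion loses information.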