On Apr 19, 2017, at 12:03, Thilo Bauer <thilo.ba...@fau.de> wrote:
> is converting SMARTS to SMILES a "lossless" operation, or does one lose 
> information on doing so?


It is obviously not lossless if you include terms that cannot be represented in 
SMILES.

>>> from rdkit import Chem
>>> Chem.MolToSmiles(Chem.MolFromSmarts("[C,N]"))
'C'

or which don't make sense as a molecule:

>>> Chem.MolToSmiles(Chem.MolFromSmarts("c"))
'c'
>>> Chem.MolFromSmiles("c")
[23:02:24] non-ring atom 0 marked aromatic


It also loses some information which could be represented in SMILES:

>>> Chem.MolToSmiles(Chem.MolFromSmarts("[NH4+]"))
'N'
>>> Chem.MolToSmiles(Chem.MolFromSmarts("C[N+]1(C)CCCCC1"))
'CN1(C)CCCCC1'
>>> Chem.MolToSmiles(Chem.MolFromSmarts("[12C]"), isomericSmiles=True)
'C'
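
If you want to spot such patterns before converting, one option is to scan the bracket atoms for the kinds of terms dropped in the examples above. This is only a rough sketch in plain Python: lossy_terms is a hypothetical helper of my own naming, not an RDKit function, it skips recursive SMARTS entirely, and whether a given term is actually dropped may depend on your RDKit version.

```python
import re

# Hypothetical helper, not part of RDKit. It only inspects simple,
# non-nested bracket atoms such as [NH4+] or [12C].
_BRACKET = re.compile(r"\[([^\[\]]*)\]")

def lossy_terms(smarts):
    """Return (atom_text, reasons) pairs for bracket atoms that carry an
    isotope, a hydrogen count, or a charge."""
    found = []
    for m in _BRACKET.finditer(smarts):
        body = m.group(1)
        if "$(" in body:
            # Recursive SMARTS: '-' may be a bond here, so do not guess.
            continue
        reasons = []
        if re.match(r"\d+", body):
            reasons.append("isotope")         # leading digits, as in [12C]
        if re.search(r"(?<=.)H\d*", body):
            reasons.append("hydrogen count")  # H after another term, as in [NH4+]
        if re.search(r"[+-]", body):
            reasons.append("charge")          # + or - inside the brackets
        if reasons:
            found.append((m.group(0), reasons))
    return found

for pattern in ["[NH4+]", "C[N+]1(C)CCCCC1", "[12C]", "[C,N]"]:
    print(pattern, lossy_terms(pattern))
```

An empty list only means that none of these three kinds of term were found. It does not mean the conversion is lossless, as the "[C,N]" case shows.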

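Do be careful if you want to handle aromatic atoms and bonds. In these 
examples the output mirrors the atom and bond types written in the pattern 
rather than a normalized aromatic form: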

>>> Chem.MolToSmiles(Chem.MolFromSmarts("[#6]:1:[#6]:[#6]:[#6]:[#6]:[#6]:1"))
'C1:C:C:C:C:C:1'
>>> Chem.MolToSmiles(Chem.MolFromSmarts("c=1-c=c-c=c-c=1"))
'c1=c-c=c-c=c-1'


> Background:
> I've got three different SMARTS strings representing the same structure 
> - at least when depicting it. Also all three strings result in the exact 
> same SMILES (see code and output below).

It looks like you want SMARTS canonicalization.

In general this is hard, because SMARTS can include boolean expressions and 
recursive SMARTS.

If you limit yourself to patterns like '[#6]-1=[#6]-[#6]...', with only atomic 
numbers and single/double/triple bonds, then I think RDKit will do what you 
want.
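
A quick way to check whether a pattern stays inside that restricted subset is a token-level test. This is a minimal sketch in plain Python, not something RDKit provides: is_simple_smarts is a hypothetical name, and the regular expression is my own reading of "only atomic numbers and single/double/triple bonds" plus ring closures and branches. It does not verify that the string is valid SMARTS.

```python
import re

# Allowed tokens (an assumption about what "restricted" should mean):
#   [#6]      atom given only by atomic number
#   - = #     single, double, triple bond
#   1 or %12  ring-closure label
#   ( )       branch
_SIMPLE = re.compile(r"(?:\[#\d+\]|[-=#]|%\d\d|\d|[()])+")

def is_simple_smarts(smarts):
    """True if the string is made up only of the tokens listed above."""
    return _SIMPLE.fullmatch(smarts) is not None

for pattern in ["[#6]-1=[#6]-[#6]=[#6]-[#6]=[#6]-1",
                "[C,N]",
                "[$([#6])]",
                "c=1-c=c-c=c-c=1"]:
    print(pattern, is_simple_smarts(pattern))
```

Note that the test also accepts atoms written next to each other with no bond symbol. SMARTS reads an omitted bond as "single or aromatic", so write the '-' explicitly if you mean a single bond.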




                                Andrew
                                da...@dalkescientific.com


