[Rdkit-discuss] How to preserve stereochemistry in a SMARTS subgraph?

2017-05-05 Thread Thilo Bauer
Dear Mailinglist-members,

In RDKit, when running an MCS search on molecules bearing a chirality 
center, is it possible (and if so, how) to preserve the stereochemical 
information when exporting the subgraph to a SMARTS string?

Consider the following three molecules:

 >>> from rdkit import Chem
 >>> from rdkit.Chem import rdFMCS
 >>> mol_ccw = Chem.MolFromSmiles('C1=C[C@H](Cl)CCC1')
 >>> mol_cw  = Chem.MolFromSmiles('C1=C[C@@H](Cl)CCC1')
 >>> mol_lin = Chem.MolFromSmiles('C=C[C@H](Cl)CCC')

Doing a chirality-sensitive subgraph search leads to the somewhat 
expected result:

 >>> rdFMCS.FindMCS([mol_ccw, mol_cw], matchChiralTag=True).smartsString
'[#6](=[#6])-[#6]-[#6]-[#6]-[#6]-[#17]'
 >>> rdFMCS.FindMCS([mol_ccw, mol_lin], matchChiralTag=True).smartsString
'[#6]=[#6]-[#6](-[#17])-[#6]-[#6]-[#6]'
 >>> rdFMCS.FindMCS([mol_cw, mol_lin], matchChiralTag=True).smartsString
'[#6]-[#6]-[#6]-[#6]-[#6]'

The subgraph over mol_ccw and mol_lin includes the stereocenter. 
Unfortunately, the chirality information is not stored in the SMARTS 
string, so using [#6]=[#6]-[#6](-[#17])-[#6]-[#6]-[#6] for a 
chirality-sensitive substructure match leads, as expected, to the 
pattern made from that SMARTS string matching all three molecules:

 >>> patt = Chem.MolFromSmarts('[#6]=[#6]-[#6](-[#17])-[#6]-[#6]-[#6]')
 >>> [len(m.GetSubstructMatches(patt, useChirality=True))
 ...  for m in (mol_cw, mol_ccw, mol_lin)]
[1, 1, 1]

Manually inserting a lazy @H or a &* at the stereocenter leads, of 
course, to the desired result:

 >>> patt = Chem.MolFromSmarts('[#6]=[#6]-[#6@H](-[#17])-[#6]-[#6]-[#6]')
 >>> [len(m.GetSubstructMatches(patt, useChirality=True))
 ...  for m in (mol_ccw, mol_lin)]
[1, 1]
 >>> len(mol_cw.GetSubstructMatches(patt, useChirality=True))
0

Now, when matching mol_ccw and mol_lin, how do I get the 
stereochemistry-aware SMARTS string 
[#6]=[#6]-[#6@H](-[#17])-[#6]-[#6]-[#6] as the substructure in the 
first place?
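
One possible workaround, sketched here as an assumption and not as a 
built-in rdFMCS feature: map the MCS pattern back onto one of the input 
molecules and write out only the matched atoms and bonds as isomeric 
SMILES with Chem.MolFragmentToSmiles, which should keep the @/@@ tag. 
The SMARTS string is the one reported above, and the helper name 
stereo_fragment_smiles is made up for this illustration. Note that the 
result is a chirality-aware SMILES pattern, not a true SMARTS string.

```python
from rdkit import Chem

mol_ccw = Chem.MolFromSmiles('C1=C[C@H](Cl)CCC1')
mol_cw = Chem.MolFromSmiles('C1=C[C@@H](Cl)CCC1')
mol_lin = Chem.MolFromSmiles('C=C[C@H](Cl)CCC')

# SMARTS as reported above for FindMCS([mol_ccw, mol_lin], matchChiralTag=True)
mcs_smarts = '[#6]=[#6]-[#6](-[#17])-[#6]-[#6]-[#6]'


def stereo_fragment_smiles(mol, smarts):
    """Hypothetical helper: return the part of `mol` matched by `smarts`
    as isomeric SMILES, so chiral tags on the matched atoms are written."""
    patt = Chem.MolFromSmarts(smarts)
    match = mol.GetSubstructMatch(patt)
    if not match:
        return None
    # Keep only the bonds that belong to the pattern; otherwise a
    # ring-closing bond between matched atoms would be included again.
    bond_ids = [
        mol.GetBondBetweenAtoms(match[b.GetBeginAtomIdx()],
                                match[b.GetEndAtomIdx()]).GetIdx()
        for b in patt.GetBonds()
    ]
    return Chem.MolFragmentToSmiles(mol, atomsToUse=list(match),
                                    bondsToUse=bond_ids,
                                    isomericSmiles=True)


frag = stereo_fragment_smiles(mol_ccw, mcs_smarts)
query = Chem.MolFromSmiles(frag)

# Chirality-sensitive match of the fragment against all three molecules.
hits = {name: m.HasSubstructMatch(query, useChirality=True)
        for name, m in [('ccw', mol_ccw), ('cw', mol_cw), ('lin', mol_lin)]}
print(frag, hits)
```

This only works as intended when all neighbours of the stereocenter are 
part of the common substructure, as is the case here.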


Thank you & kind regards,
Thilo


-- 
Dr. Thilo Bauer

Computer-Chemie-Centrum
Friedrich-Alexander-Universität
Nägelsbachstr. 25
91052 Erlangen

+49 170 9738141

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] Information contained in SMARTS and SMILES

2017-04-19 Thread Thilo Bauer
Dear mailinglist-members,

Is converting SMARTS to SMILES a "lossless" operation, or does one lose 
information in doing so?

Background:
I've got three different SMARTS strings representing the same structure 
- at least when depicting it. Also all three strings result in the exact 
same SMILES (see code and output below).

Now, don't get me wrong: I do know the differences between SMARTS and 
SMILES, and I do know what the symbols in SMARTS mean. I just wonder 
whether, when I use either the three SMARTS or the single SMILES as a 
pattern for a substructure match, there is a chance that I get different 
results, or, put differently, whether I would miss substructure 
occurrences by using the single SMILES. I could not come up with a case 
where this happened.


 >>> from rdkit import Chem
 >>> m = Chem.MolFromSmarts('[#6]-1=[#6]-[#6](-[#6]-[#6](-[#6]-1)-[#6])=[#8]')
 >>> Chem.MolToSmiles(m)
'CC1CC=CC(=O)C1'
 >>> m = Chem.MolFromSmarts('[#6]-1-[#6]=[#6]-[#6](-[#6]-[#6]-1-[#6])=[#8]')
 >>> Chem.MolToSmiles(m)
'CC1CC=CC(=O)C1'
 >>> m = Chem.MolFromSmarts('[#6]-1-[#6](-[#6]=[#6]-[#6]-[#6]-1-[#6])=[#8]')
 >>> Chem.MolToSmiles(m)
'CC1CC=CC(=O)C1'
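
A minimal sketch of how such a comparison could be scripted: it runs one 
of the SMARTS patterns above and the resulting SMILES against the same 
targets and flags any disagreement. The target molecules are 
illustrative assumptions, not a real data set, and agreement on them 
does not prove that the two patterns are equivalent in general.

```python
from rdkit import Chem

# Patterns taken from the session above.
q_smarts = Chem.MolFromSmarts('[#6]-1=[#6]-[#6](-[#6]-[#6](-[#6]-1)-[#6])=[#8]')
q_smiles = Chem.MolFromSmiles('CC1CC=CC(=O)C1')

# Hypothetical test targets (assumptions chosen for illustration only).
targets = {
    'query structure': 'CC1CC=CC(=O)C1',
    'ethyl homologue': 'CCC1CC=CC(=O)C1',
    'cyclohexane': 'C1CCCCC1',
    'aromatic analogue': 'Cc1cccc(O)c1',
}


def compare(target_smiles):
    """Return {name: (matched_by_smarts, matched_by_smiles)}."""
    rows = {}
    for name, smi in target_smiles.items():
        mol = Chem.MolFromSmiles(smi)
        rows[name] = (mol.HasSubstructMatch(q_smarts),
                      mol.HasSubstructMatch(q_smiles))
    return rows


results = compare(targets)
for name, (by_smarts, by_smiles) in results.items():
    note = '' if by_smarts == by_smiles else '  <-- differs'
    print(f'{name:20s} SMARTS={by_smarts!s:5s} SMILES={by_smiles!s:5s}{note}')
```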


Thanks a lot in advance!

Thilo




