Re: [Rdkit-discuss] DeleteSubstructs using an MCS object

2019-01-21 Thread Paolo Tosco

Hi Jenke,


The reason why the resulting molecule contains no atoms is that the 
3,5-dichlorophenyl substituent matches twice the 3-chlorophenyl 
substructure. If you look at the deleteSubstructs() code in 
ChemTransforms.cpp you may see that the atoms deleted from the main 
structures are the union of all possible matches (two in this case). 
Therefore, both the 3- and the 5-chloro substituents are deleted, and as 
all other atoms are part of the MCS, you are left with an empty molecule.



If you wish to delete matches one at a time, rather than deleting the 
union of all matches, you may use the DeleteSubstructs() Python 
implementation that I drafted in this gist:


https://gist.github.com/ptosco/1b6bc727ddee32b4d411cf5c2aea7291

(disclaimer: I wrote it very quickly and tested it only with your 
example, so I don't guarantee it is bug-free)


This DeleteSubstructs() implementation returns a list of all structures 
obtained by deleting the matched substructures one at a time. As I see 
you are using 3D structures, the code deletes all orphaned hydrogens and 
caps the fragments remaining after deleting substructures with 
hydrogens. If you don't care about 3D coordinates you may want to 
uniquify results based on canonical SMILES and, for example in the case 
of lig12, keep the remaining HCl fragment only once.


Another observation is that the PDB reader implemented in the RDKit 
autoconnects atoms based on proximity, but does not guess bond orders, 
so your structures are not chemically correct as they are all 
single-bonded even if you have carbonyls, aromatic rings, etc.
In my gist I used Flare to perceive bond orders as I was running the 
code in the Flare Jupyter Notebook; you may also use other 
cheminformatics toolkit for that purpose.


I hope the above helps; please feel free to get back to me off-list if 
something is not clear.


Cheers,
p.

On 01/21/19 15:51, SCHEEN Jenke wrote:


Hi all,


I'm trying to remove the MCS between two molecules (attached, 02.pdb 
and 12.pdb) using rdFMCS.FindMCS and AllChem.DeleteSubstructs using 
the following code:




#

from rdkit import Chem
from rdkit.Chem import AllChem, rdmolfiles, rdFMCS

#

# load molecules:
lig02_pdb = open("02.pdb", 'r').read()
lig12_pdb = open("12.pdb", 'r').read()

lig02_mol = rdmolfiles.MolFromPDBBlock(lig02_pdb)
lig12_mol = rdmolfiles.MolFromPDBBlock(lig12_pdb)

# make list of molecules to map the MCS to:
perturbation_pair = []

perturbation_pair.append(lig02_mol)
perturbation_pair.append(lig12_mol)

MCS_object = rdFMCS.FindMCS(perturbation_pair, completeRingsOnly=True)
MCS_SMARTS = Chem.MolFromSmarts(MCS_object.smartsString)

# remove MCS from each molecule:
lig02_stripped = AllChem.DeleteSubstructs(lig02_mol, MCS_SMARTS)
lig12_stripped = AllChem.DeleteSubstructs(lig12_mol, MCS_SMARTS)

# print SMILES of each stripped molecule:
print("lig02: " + str(Chem.MolToSmiles(lig02_stripped)))
print("lig12: " + str(Chem.MolToSmiles(lig12_stripped)))

#print(Chem.MolToMolBlock(MCS_SMARTS),file=open('./MCS.mol','w+'))

#


I attached the MCS.mol file as well. The lig12.pdb contains an extra 
Cl atom, and lig12_stripped should thus contain a single Cl atom after 
deletion of the MCS substructure (the MCS substructure is equal to 
lig02.pdb). When running the script it actually contains 0 atoms.



I haven't been able to locate the source of the issue in the 
AllChem.DeleteSubstructs documentation, does anyone have a suggestion?



I'm using a conda install of rdkit (2018.09.1.0).



Best,

Jenke
--
University of Edinburgh
David Brewster road
Edinburgh, EH9 3FJ
United Kingdom
-

The University of Edinburgh is a charitable body, registered in 
Scotland, with registration number SC005336.





___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] DeleteSubstructs using an MCS object

2019-01-21 Thread SCHEEN Jenke
Hi all,


I'm trying to remove the MCS between two molecules (attached, 02.pdb and 
12.pdb) using rdFMCS.FindMCS and AllChem.DeleteSubstructs using the following 
code:


#


from rdkit import Chem
from rdkit.Chem import AllChem, rdmolfiles, rdFMCS

#

# load molecules:
lig02_pdb = open("02.pdb", 'r').read()
lig12_pdb = open("12.pdb", 'r').read()

lig02_mol = rdmolfiles.MolFromPDBBlock(lig02_pdb)
lig12_mol = rdmolfiles.MolFromPDBBlock(lig12_pdb)

# make list of molecules to map the MCS to:
perturbation_pair = []

perturbation_pair.append(lig02_mol)
perturbation_pair.append(lig12_mol)

MCS_object = rdFMCS.FindMCS(perturbation_pair, completeRingsOnly=True)
MCS_SMARTS = Chem.MolFromSmarts(MCS_object.smartsString)

# remove MCS from each molecule:
lig02_stripped = AllChem.DeleteSubstructs(lig02_mol, MCS_SMARTS)
lig12_stripped = AllChem.DeleteSubstructs(lig12_mol, MCS_SMARTS)

# print SMILES of each stripped molecule:
print("lig02: " + str(Chem.MolToSmiles(lig02_stripped)))
print("lig12: " + str(Chem.MolToSmiles(lig12_stripped)))

#print(Chem.MolToMolBlock(MCS_SMARTS),file=open('./MCS.mol','w+'))

#


I attached the MCS.mol file as well. The lig12.pdb contains an extra Cl atom, 
and lig12_stripped should thus contain a single Cl atom after deletion of the 
MCS substructure (the MCS substructure is equal to lig02.pdb). When running the 
script it actually contains 0 atoms.


I haven't been able to locate the source of the issue in the 
AllChem.DeleteSubstructs documentation, does anyone have a suggestion?


I'm using a conda install of rdkit (2018.09.1.0).



Best,

Jenke
--
University of Edinburgh
David Brewster road
Edinburgh, EH9 3FJ
United Kingdom
-

The University of Edinburgh is a charitable body, registered in Scotland, with 
registration number SC005336.


12.pdb
Description: 12.pdb


02.pdb
Description: 02.pdb


MCS.pdb
Description: MCS.pdb
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss