Thank you, Sereina.
I understand the importance of adding hydrogens to get reasonable 3D coordinates, but the situation may not be that simple.

1. Adding hydrogens is only required when the template coordinates are supplied from an external file. If the template coordinates are generated by RDKit embedding, it works without adding explicit hydrogens.

2. I found an opposite example where adding hydrogens breaks constrained embedding when custom template coordinates are used. And again, if I generate the template coordinates with RDKit, everything is OK without adding Hs.

These observations suggest that there is some issue with using custom coordinates for constrained embedding.

I have provided the code and output below.

Code:

from rdkit import Chem
from rdkit.Chem import AllChem

data = [('1.mol', 'C[C@@H]1CCCCC1=O', 'C[C@@H]1CC[C@H](O)CC1=O'),
        ('2.mol', 'CCCCCCCC[C@@H](CCC)NC(=O)c1ccc(F)cc1', 'CCCC[C@H](CCC[C@@H](CCC)NC(=O)c1ccc(F)cc1)NC(=O)c1ccco1')]

for i, (mol_fname, smi_template, smi_child) in enumerate(data):

    print('iteration', i)

    mode = 'read template mol file, no AddHs'
    print(mode)
    mol_template = Chem.MolFromMolFile(mol_fname)
    mol_child = Chem.MolFromSmiles(smi_child)
    try:
        mol = AllChem.ConstrainedEmbed(mol_child, mol_template)
        print(mol.GetProp('EmbedRMS'))
    except ValueError as e:
        print(e)

    mode = 'read template mol file, AddHs'
    print(mode)
    mol_template = Chem.MolFromMolFile(mol_fname)
    mol_child = Chem.MolFromSmiles(smi_child)
    try:
        mol = AllChem.ConstrainedEmbed(Chem.AddHs(mol_child), mol_template)
        print(mol.GetProp('EmbedRMS'))
    except ValueError as e:
        print(e)

    mode = 'embed template mol in rdkit, no AddHs'
    print(mode)
    mol_template = Chem.MolFromSmiles(smi_template)
    AllChem.EmbedMolecule(mol_template)
    mol_child = Chem.MolFromSmiles(smi_child)
    try:
        mol = AllChem.ConstrainedEmbed(mol_child, mol_template)
        print(mol.GetProp('EmbedRMS'))
    except ValueError as e:
        print(e)

Output:

iteration 0
read template mol file, no AddHs
Could not embed molecule.
read template mol file, AddHs
0.05014807519735495
embed template mol in rdkit, no AddHs
0.12358989886023371

iteration 1
read template mol file, no AddHs
0.057937898735270194
read template mol file, AddHs
Could not embed molecule.     # <-- here rdkit spends a lot of time but fails
embed template mol in rdkit, no AddHs
0.1012757033705761
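
For reference, the three-way comparison above can be written more compactly. This is only a sketch under a few assumptions: the helper names run_modes and build_modes and the seed argument are hypothetical (not part of RDKit), and fixing the seed is my addition to make repeated runs comparable. RDKit is imported inside build_modes so that the small run_modes helper can be used on its own.

```python
def run_modes(modes):
    """Run each zero-argument embedding callable and collect the outcome.

    Returns a list of (label, result) pairs, where result is the 'EmbedRMS'
    property on success or the ValueError message on failure.
    """
    results = []
    for label, embed in modes.items():
        try:
            mol = embed()
            results.append((label, mol.GetProp('EmbedRMS')))
        except ValueError as e:
            results.append((label, str(e)))
    return results


def build_modes(mol_fname, smi_template, smi_child, seed=42):
    """Build the three embedding modes from the script above (hypothetical helper)."""
    # Imported here so that run_modes() stays usable without RDKit.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    def child():
        return Chem.MolFromSmiles(smi_child)

    def file_template():
        # Template with custom coordinates read from a MOL file.
        return Chem.MolFromMolFile(mol_fname)

    def rdkit_template():
        # Template with coordinates generated by RDKit itself.
        mol = Chem.MolFromSmiles(smi_template)
        AllChem.EmbedMolecule(mol, randomSeed=seed)
        return mol

    return {
        'read template mol file, no AddHs':
            lambda: AllChem.ConstrainedEmbed(child(), file_template(),
                                             randomseed=seed),
        'read template mol file, AddHs':
            lambda: AllChem.ConstrainedEmbed(Chem.AddHs(child()),
                                             file_template(), randomseed=seed),
        'embed template mol in rdkit, no AddHs':
            lambda: AllChem.ConstrainedEmbed(child(), rdkit_template(),
                                             randomseed=seed),
    }
```

Calling run_modes(build_modes('1.mol', smi_template, smi_child)) for each entry of data should give the same kind of table as above, with one (label, result) pair per mode.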


Pavel.



On 07/07/2020 21:41, Sunhwan Jo wrote:
Makes sense :)


On Jul 7, 2020, at 12:35 PM, Sereina Riniker <sereina.rini...@gmail.com> wrote:

Dear Pavel and Sunhwan,

Please note that hydrogens should always be added for the embedding algorithm to work properly (i.e. it’s not a workaround but what should be done). See also the section “Working with 3D Molecules” in https://www.rdkit.org/docs/GettingStartedInPython.html
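
A minimal sketch of that workflow, assuming a plain SMILES as input: add hydrogens, embed, and optionally strip the hydrogens again afterwards. The function name embed_with_hydrogens and the seed argument are hypothetical and only here for illustration; the RDKit calls themselves (AddHs, EmbedMolecule, RemoveHs) are the standard ones.

```python
def embed_with_hydrogens(smiles, seed=42):
    """Return a heavy-atom molecule with one 3D conformer (hypothetical helper)."""
    # Imported inside the function so the definition alone does not need RDKit.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f'could not parse SMILES: {smiles!r}')

    # Embed with explicit hydrogens present.
    mol_h = Chem.AddHs(mol)
    conf_id = AllChem.EmbedMolecule(mol_h, randomSeed=seed)
    if conf_id == -1:
        # EmbedMolecule signals failure by returning -1.
        raise ValueError('Could not embed molecule.')

    # Drop the hydrogens again; the heavy-atom coordinates are kept.
    return Chem.RemoveHs(mol_h)
```

If the hydrogens are needed later (for example for force-field minimization), return mol_h instead of calling RemoveHs.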

Best regards,
Sereina



On 7 Jul 2020, at 21:26, Sunhwan Jo <sunhw...@gmail.com> wrote:


The reason the constrained embedding didn’t work is that the molecule simply can’t be embedded using RDKit’s algorithm.

In [25]: mol_child = Chem.MolFromSmiles('C[C@@H]1CC[C@H](O)CC1=O')

In [26]: AllChem.EmbedMolecule(mol_child)
Out[26]: -1

See more discussion here:
https://github.com/rdkit/rdkit/issues/2996


The SMILES you posted looks valid to me and doesn’t look that complicated, but I think RDKit’s algorithm somehow tripped up and couldn’t finish embedding without some help. I hope someone with more in-depth insight can help here.


Anyway, as a workaround, adding Hs seems to do the trick:

In [39]: mol = AllChem.AddHs(mol_child)

In [40]: AllChem.EmbedMolecule(mol)
Out[40]: 0 # worked

In [41]: AllChem.ConstrainedEmbed(mol, mol_parent)
Out[41]: <rdkit.Chem.rdchem.Mol at 0x7fe8000f6f80> # also worked



Sunhwan




On Jul 7, 2020, at 12:36 AM, Pavel Polishchuk <pavel_polishc...@ukr.net> wrote:

Hi all,

  I have an issue with ConstrainedEmbed and I cannot figure out what exactly causes it. I have a molecule, C[C@@H]1CCCCC1=O, with 3D coordinates in the file 1.mol (attached), and I want to generate coordinates for another structure with this core: C[C@@H]1CC[C@H](O)CC1=O.

  This is the usual way, which causes an embedding issue and the corresponding error.

from rdkit import Chem
from rdkit.Chem import AllChem

mol_parent = Chem.MolFromMolFile('1.mol')
mol_child = Chem.MolFromSmiles('C[C@@H]1CC[C@H](O)CC1=O')
try:
    mol = AllChem.ConstrainedEmbed(mol_child, mol_parent)
except ValueError as e:
    print(e)

  If I add explicit hydrogens, the issue disappears.

mol_parent = Chem.MolFromMolFile('1.mol')
mol_child = Chem.MolFromSmiles('C[C@@H]1CC[C@H](O)CC1=O')
mol = AllChem.ConstrainedEmbed(Chem.AddHs(mol_child), mol_parent)

  If I do not use pre-defined coordinates, everything works well.

mol_parent = Chem.MolFromSmiles('C[C@@H]1CCCCC1=O')
AllChem.EmbedMolecule(mol_parent)
mol_child = Chem.MolFromSmiles('C[C@@H]1CC[C@H](O)CC1=O')
mol = AllChem.ConstrainedEmbed(mol_child, mol_parent)

  Do the ugly coordinates in the 1.mol file cause the embedding issue? Or is the issue caused by some implicit properties of the molecule? How can this be solved properly?

Kind regards,
Pavel.
<1.mol>
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss




Attachment: 1.mol
Description: MOL mdl chemical test

Attachment: 2.mol
Description: MOL mdl chemical test
