Re: [Rdkit-discuss] Problems reading XYZ file

2023-05-03 Thread Gustavo Seabra
Hi Guys,

I'm sorry it took me this long to try it... But I could finally get to it,
and it works well now. Thanks for your help!
--
Gustavo Seabra.


On Tue, Apr 11, 2023 at 3:19 AM Jan Halborg Jensen 
wrote:

> Hi Gustavo
>
> from rdkit import Chem
> from rdkit.Chem import rdDetermineBonds
> raw_mol = Chem.MolFromXYZFile('acetate.xyz')
> mol = Chem.Mol(raw_mol)
> rdDetermineBonds.DetermineBonds(mol,charge=-1)
>
> Best regards, Jan
>
> On 7 Apr 2023, at 22.57, Gustavo Seabra  wrote:
>
> Hi everyone,
>
> I'm having difficulties using RDKit to read molecules from an XYZ file,
> and I would really appreciate some help.
>
> The problem is that whenever I read a molecule from an XYZ file, I get
> just a disconnected clump of atoms, not a molecule. For example, the
> following code:
>
> import rdkit
> from rdkit import Chem
> from rdkit.Chem import AllChem, Draw, rdmolfiles
> mol = Chem.MolFromSmiles('COC1=C(O)C[C@@](O)(CO)CC1=O')
> mol = Chem.AddHs(mol)
> mol
>
>
>
> AllChem.EmbedMolecule(mol)
> Chem.MolToXYZFile(mol, "rdkit_mol.xyz")
> mol2 = Chem.MolFromXYZFile('rdkit_mol.xyz')
> mol2
> 
> Is there a bug on the XYZ code, or am I missing something?
>
> Thanks!
> --
> Gustavo Seabra.
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
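
A short illustration of why the XYZ round trip loses the bonds: the XYZ format only carries an atom count, a comment line, and one element symbol plus Cartesian coordinates per atom. The sketch below is a minimal pure-Python reader written for this note (parse_xyz and the water coordinates are illustrative, not part of RDKit); it shows that there is simply no connectivity or charge in the file, which is why the bonds have to be inferred afterwards with rdDetermineBonds.DetermineBonds, as in Jan's reply.

```python
def parse_xyz(text):
    """Parse one XYZ block into (comment, [(element, x, y, z), ...])."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])          # line 1: number of atoms
    comment = lines[1].strip()       # line 2: free-form comment
    atoms = []
    for line in lines[2:2 + n_atoms]:
        element, x, y, z = line.split()[:4]
        atoms.append((element, float(x), float(y), float(z)))
    return comment, atoms


# Illustrative water geometry (values made up for the example)
WATER_XYZ = """3
water
O  0.000  0.000  0.117
H  0.000  0.757 -0.471
H  0.000 -0.757 -0.471
"""

comment, atoms = parse_xyz(WATER_XYZ)
for element, x, y, z in atoms:
    # Only an element and a position per atom: no bonds, no charges.
    print(f"{element:2s} {x:8.3f} {y:8.3f} {z:8.3f}")
```

Because the total charge is not stored either, it has to be passed explicitly when bonds are perceived for an ion (charge=-1 for acetate in the reply above).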


[Rdkit-discuss] Problems reading XYZ file

2023-04-07 Thread Gustavo Seabra
Hi everyone,

I'm having difficulties using RDKit to read molecules from an XYZ file, and
I would really appreciate some help.

The problem is that whenever I read a molecule from an XYZ file, I get just
a disconnected clump of atoms, not a molecule. For example, the following
code:

import rdkit
from rdkit import Chem
from rdkit.Chem import AllChem, Draw, rdmolfiles
mol = Chem.MolFromSmiles('COC1=C(O)C[C@@](O)(CO)CC1=O')
mol = Chem.AddHs(mol)
mol

[image: image.png]

AllChem.EmbedMolecule(mol)
Chem.MolToXYZFile(mol, "rdkit_mol.xyz")
mol2 = Chem.MolFromXYZFile('rdkit_mol.xyz')
mol2
[image: image.png]
Is there a bug on the XYZ code, or am I missing something?

Thanks!
--
Gustavo Seabra.


Re: [Rdkit-discuss] Generating 3D molecules for docking

2021-07-27 Thread Gustavo Seabra
Hi Francesca,

As far as I know (someone please correct me if I'm wrong), RDKit can read
but cannot write files in Mol2 format. But if you have the files in SDF
format, you can convert them to Mol2 using OpenBabel. The command would be
something like:

$ obabel -isdf sdf_file.sdf -omol2 -Omol2_file.mol2 -m

The -m tells obabel to split the multimolecule file into individual
molecules.
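
If the conversion needs to be driven from a Python script, the same command can be assembled as an argument list. This is only a sketch: obabel_split_command is a hypothetical helper name, the flags are exactly the ones shown above, and actually running it assumes Open Babel is installed and on the PATH.

```python
import shlex


def obabel_split_command(sdf_path, out_prefix, out_format="mol2"):
    """Build the obabel call that writes one output file per molecule."""
    return [
        "obabel",
        "-isdf", sdf_path,                 # input format and input file
        f"-o{out_format}",                 # output format
        f"-O{out_prefix}.{out_format}",    # output file name pattern
        "-m",                              # split into individual molecules
    ]


cmd = obabel_split_command("sdf_file.sdf", "mol2_file")
print(shlex.join(cmd))
```

The list can then be handed to subprocess.run(cmd, check=True) so that a failed conversion raises an error instead of passing silently.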
--
Gustavo Seabra.


On Tue, Jul 27, 2021 at 1:37 PM Francesca Magarotto -
francesca.magarot...@studio.unibo.it 
wrote:

> Hi,
> after a cluster analysis using a dataset of compounds from ZINC15 (in
> smiles format) I have picked a subset for virtual screening.
> However, I have a problem.
> The program Dock6 reads only TRIPOS mol2 format: is it possible to convert
> the molecules I chose for virtual screening with RDKit?
> In ZINC15 the molecules are also provided in mol2 format, but in this case
> I download all of them and not only the ones I selected after cluster
> analysis.
> I don't know what to do.
> Thanks,
> regards.


Re: [Rdkit-discuss] Maximum Common Substructure using SMARTS

2021-07-26 Thread Gustavo Seabra
On Fri, Jul 23, 2021 at 4:53 AM Paolo Tosco 
wrote:

>
> # here there seems to be a bug with the 2D depiction, but that's another
> # story
>
> template
>
> [image: image.png]
>
>
Just a quick thing: I don't know if this is supposed to be a bug or a
feature, but I noticed that this seems to be caused by properties of the
Mol created from SMARTS *not* being set when the mol is created, but only
when they are requested the first time. Right after creating the mol
object with Chem.MolFromSmarts, IsInRing doesn't seem to be set
correctly (or at all), and the comparison operation distorts the molecule.
But if you force the computation of the properties, e.g. by printing them:

for idx, atom in enumerate(template.GetAtoms()):
    print(f"{idx:>4d}  {atom.GetAtomicNum():5d}  {str(atom.IsInRing()):>7}  "
          f"{str(atom.GetIsAromatic()):>5}")

then all seems to work as expected \o/. I don't know if it is by
design that the properties are calculated only when needed?
Gustavo.


Re: [Rdkit-discuss] Maximum Common Substructure using SMARTS

2021-07-23 Thread Gustavo Seabra
Thanks a lot!
--
Gustavo Seabra.


On Fri, Jul 23, 2021 at 12:18 PM Paolo Tosco 
wrote:

> Hi Gustavo,
>
> Chem.Atom.HasQuery() and Chem.Bond.HasQuery() return True when the
> underlying atom (or bond) is an instance of Chem.QueryAtom (or
> Chem.QueryBond).
> Query atoms and bonds can either be defined through SMARTS expressions...
>
> from rdkit import Chem
> from rdkit.Chem import rdqueries
>
> a = Chem.Atom(6)
> a.HasQuery()
> False
>
> mol = Chem.MolFromSmarts("[+1;D3]")
> qa_from_smarts = mol.GetAtomWithIdx(0)
> qa_from_smarts.HasQuery()
> True
>
> qa_from_smarts.DescribeQuery()
> 'AtomAnd\n  AtomFormalCharge 1 = val\n  AtomExplicitDegree 3 = val\n'
>
> ...or be directly instantiated from Python and combined at your leisure;
> through this approach you can actually define very specific queries that
> may not be possible to describe with SMARTS.
> Below I show how to construct the same query atom as from the above SMARTS
> expression:
>
> qa = rdqueries.FormalChargeEqualsQueryAtom(1)
> qa
> 
>
> qa.HasQuery()
> True
>
> qa.DescribeQuery()
> 'AtomFormalCharge 1 = val\n'
>
> qa2 = rdqueries.ExplicitDegreeEqualsQueryAtom(3)
> qa2.DescribeQuery()
> 'AtomExplicitDegree 3 = val\n'
>
> qa.ExpandQuery(qa2)
> qa.DescribeQuery()
> 'AtomAnd\n  AtomFormalCharge 1 = val\n  AtomExplicitDegree 3 = val\n'
>
> Cheers,
> p.
>

Re: [Rdkit-discuss] Maximum Common Substructure using SMARTS

2021-07-23 Thread Gustavo Seabra
This works perfectly!

I could understand most of what you did there ;-), but what does the
".HasQuery()" mean? The RDKit API is not very clear about it: "Returns
whether or not the atom has an associated query". Is this described
anywhere else?

Thank you so much!
--
Gustavo Seabra.


On Fri, Jul 23, 2021 at 4:53 AM Paolo Tosco 
wrote:

> Hi Gustavo,
>
> you should be able to address this with a custom AtomCompare (and
> BondCompare, if you want to use bond queries too) class, that now is also
> supported from Python.
> You can take a look at Code/GraphMol/FMCS/Wrap/testFMCS.py for
> inspiration on how to use it; here's something that seems to work for your
> example:
>
> from rdkit import Chem
> from rdkit.Chem import rdFMCS
>
> template = Chem.MolFromSmarts('[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1')
> # This should give a sulfone connected to an aromatic ring and
> # some other (any) element. Notice that the ring may have
> # any atoms (N,C,O), but for me it is important to have the SO2 group.
>
> template
> [image: image.png]
>
> mol1 = Chem.MolFromSmiles('CS(=O)(=O)c1ccc(C2=C(c3c3)CCN2)cc1')
> # This molecule has the pattern.
>
> mol1
> [image: image.png]
>
> compare = [template, mol1]
> res = rdFMCS.FindMCS(compare,
>                      atomCompare=rdFMCS.AtomCompare.CompareElements,
>                      bondCompare=rdFMCS.BondCompare.CompareAny,
>                      ringMatchesRingOnly=False,
>                      completeRingsOnly=False)
> res.smartsString
> # gives: '[#16](=[#8])=[#8]'
>
> # Let's address the problem with a custom AtomCompare class:
>
> class CompareQueryAtoms(rdFMCS.MCSAtomCompare):
>     def __call__(self, p, mol1, atom1, mol2, atom2):
>         a1 = mol1.GetAtomWithIdx(atom1)
>         a2 = mol2.GetAtomWithIdx(atom2)
>         if ((not a1.HasQuery()) and (not a2.HasQuery()) and
>                 a1.GetAtomicNum() != a2.GetAtomicNum()):
>             return False
>         if (p.MatchValences and
>                 a1.GetTotalValence() != a2.GetTotalValence()):
>             return False
>         if (p.MatchChiralTag and
>                 not self.CheckAtomChirality(p, mol1, atom1, mol2, atom2)):
>             return False
>         if (p.MatchFormalCharge and (not a1.HasQuery()) and
>                 (not a2.HasQuery()) and
>                 not self.CheckAtomCharge(p, mol1, atom1, mol2, atom2)):
>             return False
>         if p.RingMatchesRingOnly:
>             return self.CheckAtomRingMatch(p, mol1, atom1, mol2, atom2)
>         if ((a1.HasQuery() or a2.HasQuery()) and (not a1.Match(a2))):
>             return False
>         return True
>
> params = rdFMCS.MCSParameters()
> params.AtomCompareParameters.RingMatchesRingOnly = False
> params.BondCompareParameters.RingMatchesRingOnly = False
> params.AtomCompareParameters.CompleteRingsOnly = False
> params.BondCompareParameters.CompleteRingsOnly = False
> params.BondTyper = rdFMCS.BondCompare.CompareAny
> params.AtomTyper = CompareQueryAtoms()
>
> compare = [template, mol1]
> res = rdFMCS.FindMCS(compare, params)
> res.smartsString
>
> '[#16](-[#0,#6])(=[#8])(=[#8])-[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:1'
>
>
> # the queryMol returned by MCS will match the template, but the original
> # template query has many more details, so we extract the MCS part of the
> # original template and use that as query instead
>
> def trim_template(template, query):
>     template_mcs_core = Chem.ReplaceSidechains(template, query)
>     for a in template_mcs_core.GetAtoms():
>         if (not a.GetAtomicNum()) and a.GetIsotope():
>             a.SetAtomicNum(1)
>             a.SetIsotope(0)
>     return Chem.RemoveAllHs(template_mcs_core)
>
>
> query_mol = trim_template(template, res.queryMol)
> template.GetSubstructMatch(query_mol)
>
> (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
>
>
> # here there seems to be a bug with the 2D depiction, but that's another
> # story
>
> template
>
> [image: image.png]
>
> mol1.GetSubstructMatches(query_mol)
>
> ((4, 1, 0, 2, 3, 5, 6, 7, 19, 20),)
>
>
> mol1
>
> [image: image.png]
>
>
> mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1')
> compare = [template, mol2]
>
>
> mol2
>
> [image: image.png]
>
>
> res = rdFMCS.FindMCS(compare, params)
> res.smartsString
>
> '[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#7]:[#0,#7]:1'
>
>
> query_mol = trim_template(template, res.queryMol)
>
> query_mol
>
> [image: image.png]
>
>
> mol2.GetSubstructMatches(query_mol)
>
> ((1, 2, 3, 4, 20, 21), (10, 11, 12, 13, 18, 19))
>
>
> mol2
>
> [image: image.png]
>
>
> I hope the above helps, cheers
>
> p.
>
>
> O

Re: [Rdkit-discuss] Maximum Common Substructure using SMARTS

2021-07-22 Thread Gustavo Seabra
Hi,

Thanks a lot for the reply! However, in this case, it looks like I would
have to somehow label the isotope in every query molecule, right? For
example:
```
template = Chem.MolFromSmarts('[c]1(-[2S](=[3O])(=[3O])(-C)):[c]:[c]:[c]:[c]:[c]:1')
mol1 = Chem.MolFromSmiles('CS(=O)(=O)c1ccc(C2=C(c3c3)CCN2)cc1')
compare = [template, mol1]
res = rdFMCS.FindMCS(compare,
                     atomCompare=rdFMCS.AtomCompare.CompareIsotopes,
                     bondCompare=rdFMCS.BondCompare.CompareAny,
                     ringMatchesRingOnly=False,
                     completeRingsOnly=False)
res.smartsString
```
returns: '[0*]1:[0*]:[0*]:[0*]:[0*]:[0*]:1', that is, it only picks the
ring but not the sulfone. I actually want the sulfone to be found, if it is
there. My problem is that I also want flexibility to change the ring atoms
and still find the ring as a match, while considering a match on the
sulfone only if it really is there. (e.g., CF3 should *not* match.) Does it
make sense?

Thanks a lot!
--
Gustavo Seabra.


On Thu, Jul 22, 2021 at 4:52 PM Andrew Dalke 
wrote:

> Hi Gustavo,
>
>
> > template =
> Chem.MolFromSmarts('[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1')
>
> Unless things have changed since I last looked at the algorithm, you can't
> meaningfully pass a SMARTS-based query molecule into the MCS program,
> outside of a few simple cases.
>
> It generates a SMARTS pattern based on the properties of the molecule. You
> asked it to CompareElements, but those [a] terms all have an atomic number
> of 0.
>
>   >>> template =
> Chem.MolFromSmarts('[a#1]1(-[S](-*)(=[O])=[O]):[a#1]:[a#1]:[a#1]:[a#1]:[a#1]:1')
>   >>> [a.GetAtomicNum() for a in template.GetAtoms()]
>   [0, 16, 0, 8, 8, 0, 0, 0, 0, 0]
>
> That's why your CompareAny search returns the #0 terms, like:
>
>
> '[#16,#6](-[#0,#6])(=,-[#8,#9])(=,-[#8,#9])-[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#7]:1'
>
> > I would appreciate some pointers on how it would be possible to find the
> maximum common substructure of 2 molecules, where in the template structure
> some atoms may be *any*, but some other atoms must be fixed.
>
> Perhaps with isotope labelling?
>
> That is, label the "any" atoms as isotope 1, and label your
> -[S](=[O])(=[O])- as -[2S](=[3O])(=[3O])-
>
> Then use rdFMCS.AtomCompare.CompareIsotopes .
>
> If there's anything you don't want to match at all, give each atom a
> unique isotope value.
>
> Best regards,
>
> Andrew
> da...@dalkescientific.com
>
>
>
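
Andrew's point about the atomic numbers can be illustrated without RDKit. The sketch below is a deliberately simplified stand-in for an element-based comparison (it is not how rdFMCS is implemented, and the helper names are made up for this note): it takes the atomic numbers quoted above for the SMARTS template and checks which template atoms could ever be paired with an atom of a given target.

```python
# Atomic numbers reported for the SMARTS template in the message above;
# the [a] and * query atoms all come back as 0.
TEMPLATE_ATOMIC_NUMS = [0, 16, 0, 8, 8, 0, 0, 0, 0, 0]


def elements_equal(z1, z2):
    """Simplified element comparison: atoms pair up only if Z is equal."""
    return z1 == z2


def matchable_template_atoms(template_nums, target_nums):
    """Indices of template atoms whose atomic number occurs in the target."""
    targets = set(target_nums)
    return [i for i, z in enumerate(template_nums)
            if any(elements_equal(z, t) for t in targets)]


# Target containing C, N, O and S (a sulfone-bearing molecule)
sulfone_hits = matchable_template_atoms(TEMPLATE_ATOMIC_NUMS, [6, 7, 8, 16])
# Target containing only C, N and F (the CF3 analogue)
cf3_hits = matchable_template_atoms(TEMPLATE_ATOMIC_NUMS, [6, 7, 9])
print(sulfone_hits, cf3_hits)
```

Under this simplification only the S and the two O atoms of the template can be paired with anything in the first target, and nothing can be paired in the second, which is consistent with the '[#16](=[#8])=[#8]' and empty results reported in the original question.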


[Rdkit-discuss] Maximum Common Substructure using SMARTS

2021-07-22 Thread Gustavo Seabra
Hi all,

I would appreciate some pointers on how it would be possible to find the
maximum common substructure of 2 molecules, where in the template structure
some atoms may be *any*, but some other atoms must be fixed.

Currently, I'm trying to use the rdFMCS module. For example:

from rdkit import Chem
from rdkit.Chem import rdFMCS

template = Chem.MolFromSmarts('[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1')
# This should give a sulfone connected to an aromatic ring and
# some other (any) element. Notice that the ring may have
# any atoms (N,C,O), but for me it is important to have the SO2 group.

mol1 = Chem.MolFromSmiles('CS(=O)(=O)c1ccc(C2=C(c3c3)CCN2)cc1')
# This molecule has the pattern.

# Now, if I try to find a substructure match, I use:
compare = [template, mol1]
res = rdFMCS.FindMCS(compare,
                     atomCompare=rdFMCS.AtomCompare.CompareElements,
                     bondCompare=rdFMCS.BondCompare.CompareAny,
                     ringMatchesRingOnly=False,
                     completeRingsOnly=False)
res.smartsString
# gives: '[#16](=[#8])=[#8]'

# Notice that the only match is the SO2, it does not match the ring.
# However, if I try that with another structure that has a CF3 in place of
# the SO2, I get:
mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1')
compare = [template, mol2]
res = rdFMCS.FindMCS(compare,
                     atomCompare=rdFMCS.AtomCompare.CompareElements,
                     bondCompare=rdFMCS.BondCompare.CompareAny,
                     ringMatchesRingOnly=False,
                     completeRingsOnly=False)
res.smartsString
# Returns: '' (empty string)

# If I change to AtomCompare.CompareAny, now a CF3 will also match
# in the SO2-X:
mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1')
compare = [template, mol2]
res = rdFMCS.FindMCS(compare,
                     atomCompare=rdFMCS.AtomCompare.CompareAny,
                     bondCompare=rdFMCS.BondCompare.CompareAny,
                     ringMatchesRingOnly=False,
                     completeRingsOnly=False)
res.smartsString
# Returns:
# '[#16,#6](-[#0,#6])(=,-[#8,#9])(=,-[#8,#9])-[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#7]:1'

But now the CF3 is counted in place of the SO2. The result I'd like to get
here would be just the ring, as in the case:
new_template = Chem.MolFromSmarts('CS(=O)(=O)c1cnccc1')
mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1')
compare = [new_template, mol2]
res = rdFMCS.FindMCS(compare,
                     atomCompare=rdFMCS.AtomCompare.CompareElements,
                     bondCompare=rdFMCS.BondCompare.CompareAny,
                     ringMatchesRingOnly=False,
                     completeRingsOnly=False)
res.smartsString
# Returns: '[#6]1:[#6]:[#7]:[#6]:[#6]:[#6]:1' (just the ring)

Notice that if I use CompareElements, there seems to be no way to match the
ring with either N or C.

Does anyone have a suggestion on how I can specify flexibility (similar to
AtomCompare.CompareAny) only for a portion of the molecule and still
enforce specific atoms in another portion?

Thank you so much!
--
Gustavo Seabra.


Re: [Rdkit-discuss] Autodock Vina

2021-06-22 Thread Gustavo Seabra
Hi Velik,

I do this on a regular basis for our generators here. Basically what you
will need is to:

1. Generate 3D structures for the molecules (RDKit can do that)
2. Save to SDF files (again, RDKit)
3. Convert to PDBQT (I use OpenBabel: "$ obabel -isdf structures.sdf
-opdbqt -Oname-.pdbqt -m")

Then you'll have the files you need. Of course, you will still need to
build the pdbqt file for the target and the vina_config file, but that you
only need to do once.
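
For the vina_config file mentioned above, a small helper can write the plain "key = value" text that Vina reads. This is only a sketch under assumptions: vina_config is a hypothetical function name, the file names are placeholders, and the box centre and size below are dummy values that have to be replaced with the coordinates of the actual binding site.

```python
def vina_config(receptor, ligand, center, size, out, exhaustiveness=8):
    """Return the text of a minimal AutoDock Vina configuration file."""
    cx, cy, cz = center
    sx, sy, sz = size
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        f"size_x = {sx}",
        f"size_y = {sy}",
        f"size_z = {sz}",
        f"exhaustiveness = {exhaustiveness}",
        f"out = {out}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder file names and a dummy 20 x 20 x 20 Angstrom search box
cfg = vina_config("target.pdbqt", "name-1.pdbqt",
                  center=(0.0, 0.0, 0.0), size=(20.0, 20.0, 20.0),
                  out="name-1_out.pdbqt")
print(cfg)
```

Writing one such file per ligand (or reusing one file and overriding the ligand on the command line) keeps the receptor and box definition in a single place.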

All the best,
--
Gustavo Seabra.


On Tue, Jun 22, 2021 at 4:08 AM Velik Velikov  wrote:

> Dear all,
>
>
>
> I am constructing new molecules (de novo design) that are drug-like with
> RDKit. I have my molecules in SMILES now and I need to check them with
> AutoDock Vina. I have never used it and I have been trying since last week
> but I kind of don’t know where to go from here.
>
> What is my config file, ligand or receptor? Do I need MGL Tools, PyMOL or
> something else?
>
> Also, I couldn’t run it on my mac - Big Sur, I tried with a VirtualBox but
> it didn’t work out either. I am thinking about installing Autodock Vina on
> my old windows laptop now. Appreciate any help with this tool. Thanks in
> advance.
>
>
> Best,
>
> Velik Velikov


Re: [Rdkit-discuss] 2021.03.1 RDKit Release

2021-03-26 Thread Gustavo Seabra
Thanks a lot to Greg and all contributors for the continuing development of
this project!

--
Gustavo Seabra.


On Fri, Mar 26, 2021 at 11:16 AM Greg Landrum 
wrote:

> Dear all,
>
> I'm pleased to announce that the 2021.03 version of the RDKit is released.
> We actually managed to get the .03 release done during March. Shocking! ;-)
> The release notes are below.[1]
>
> The release files are on the github release page:
> https://github.com/rdkit/rdkit/releases/tag/Release_2021_03_1
> The DOI for this release is:
> https://doi.org/10.5281/zenodo.4639022
>
> I do not plan to do conda builds for the Python wrappers in the rdkit
> channel for this release. The builds done as part of the conda-forge
> project are automated and cover more Python versions and operating systems
> than I could ever hope to do manually.
> Please install the rdkit using conda-forge:
> conda install -c conda-forge rdkit
> I believe that the conda-forge builds of the new version should appear
> over the next couple of days.
>
> I hope to finish the conda builds of the PostgreSQL cartridge for linux
> and the mac and have them available in the rdkit channel by later today
> or tomorrow.
>
> The online version of the documentation at rdkit.org (
> http://rdkit.org/docs/index.html) has been updated.
>
> Thanks to everyone who submitted code, bug reports, and suggestions for
> this release!
>
> Please let me know if you find any problems with the release or have
> suggestions for the next one, which is scheduled for September/October 2021.
>
> Best Regards,
> -greg
> [1] We probably should figure out some way to make the release notes a bit
> less verbose. ;-)
>
>
> # Release_2021.03.1
> (Changes relative to Release_2020.09.1)
>
> ## Backwards incompatible changes
> - The distance-geometry based conformer generation now by default generates
>   trans(oid) conformations for amides, esters, and related structures. This
>   can be toggled off with the `forceTransAmides` flag in EmbedParameters.
>   Note that this change does not impact conformers created using one of the
>   ET versions. (#3794)
> - The conformer generator now uses symmetry by default when doing RMS
>   pruning. This can be disabled using the `useSymmetryForPruning` flag in
>   EmbedParameters. (#3813)
> - Double bonds with unspecified stereochemistry in the products of chemical
>   reactions now have their stereo set to STEREONONE instead of STEREOANY
>   (#3078)
> - The MolToSVG() function has been moved from rdkit.Chem to rdkit.Chem.Draw
>   (#3696)
> - There have been numerous changes to the RGroup Decomposition code which
>   change the results. (#3767)
> - In RGroup Decomposition, when onlyMatchAtRGroups is set to false, each
>   molecule is now decomposed based on the first matching scaffold which
>   adds/uses the least number of non-user-provided R labels, rather than
>   simply the first matching scaffold.
>   Among other things, this allows the code to provide the same results for
>   both onlyMatchAtRGroups=true and onlyMatchAtRGroups=false when suitable
>   scaffolds are provided without requiring the user to get overly concerned
>   about the input ordering of the scaffolds. (#3969)
> - There have been numerous changes to
>   `GenerateDepictionMatching2DStructure()` (#3811)
> - Setting the kekuleSmiles argument (doKekule in C++) to MolToSmiles will now
>   cause the molecule to be kekulized before SMILES generation. Note that this
>   can lead to an exception being thrown. Previously this argument would only
>   write kekulized SMILES if the molecule had already been kekulized (#2788)
> - Using the kekulize argument in the MHFP code will now cause the molecule to
>   be kekulized before the fingerprint is generated. Note that because
>   kekulization is not canonical, using this argument currently causes the
>   results to depend on the input atom numbering. Note that this can lead to
>   an exception being thrown. (#3942)
> - Gradients for angle and torsional restraints in both UFF and MMFF were
>   computed incorrectly, which could give rise to potential instability during
>   minimization. As part of fixing this problem, force constants have been
>   switched to using kcal/degree^2 units instead of kcal/rad^2 units,
>   consistently with the fact that angle and dihedral restraints are specified
>   in degrees. (#3975)
>
> ## Highlights
> - MolDraw2D now does a much better job of handling query features like common
>   query bond types, atom lists, variable attachment points, and link nodes.
>   It also supports adding annotations at the molecule level, displaying
>   brackets for Sgro
[Rdkit-discuss] Get conformers as independent mols?

2021-01-07 Thread Gustavo Seabra
Hi all,

Could anyone please help me with ideas on how to visualize
molecule conformers inside a Jupyter notebook?

I generate the conformers, for example, using:
AllChem.EmbedMultipleConfs(mol, numConfs=5)

And would like to see them in 3D inside the notebook.

I tried using NGLView (https://github.com/nglviewer/nglview), but it
only shows what I believe is the first conformer in the molecule. How can I
change the conformer shown? Or is there a way to convert the
conformers to Mol objects?

Any idea would be greatly appreciated.

Thank you!
--
Gustavo Seabra.


Re: [Rdkit-discuss] activate my-rdkit-env from python script

2020-12-02 Thread Gustavo Seabra
Well, I stand corrected. From Norwid's answer, it seems it may be possible to
change the environment during execution.

Still, remember that this is the opposite of the idea of having environments!
The whole idea of conda environments is to have a contained space with all you
need. If you need to change environments at runtime, it just means that
your environment is missing something...
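
One likely reason the subprocess attempts quoted below had no effect is that each subprocess.run call starts its own child process, and whatever that child changes in its environment disappears when it exits. The sketch below demonstrates this with a throwaway variable, and shows one possible workaround: launching the script through "conda run -n <env>", which runs a command inside a named environment. The helper names and "filter.py" are hypothetical, used only for illustration.

```python
import os
import subprocess
import sys


def child_env_change_persists(var="DEMO_RDKIT_ENV_FLAG"):
    """Set a variable in a child process; report whether the parent sees it."""
    os.environ.pop(var, None)
    child_code = f"import os; os.environ[{var!r}] = '1'"
    subprocess.run([sys.executable, "-c", child_code], check=True)
    return var in os.environ


def conda_run_command(env_name, script, *args):
    """Build a command that runs `script` inside the conda env `env_name`."""
    return ["conda", "run", "-n", env_name, "python", script, *args]


print(child_env_change_persists())
print(conda_run_command("my-rdkit-env", "filter.py"))
```

The first call reports that the parent never sees the change made by the child, which is the same reason an activation performed in a child shell is gone by the time the next line of the script runs.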

--
Gustavo Seabra


From: Jeff Saxon 
Sent: Wednesday, December 2, 2020 9:29:37 AM
To: Gustavo Seabra ; 
rdkit-discuss@lists.sourceforge.net 
Subject: Re: [Rdkit-discuss] activate my-rdkit-env from python script

Right, many thanks!
Yes, each time before I run any script using the python from the conda
installation that I used to install RDKit, I have to activate the proper
environment directly in bash, after which everything works correctly.
By the way, why did subprocess.run('conda activate my-rdkit-env', shell=True)
not work? I thought it would be the same as the aforementioned
step, but it asks me

To initialize your shell, run

$ conda init 

Currently supported shells are:
  - bash
  - fish
  - tcsh
  - xonsh
  - zsh
  - powershell

On Wed, Dec 2, 2020 at 14:25, Gustavo Seabra wrote:
>
> I don't believe that it is possible. You have to run your script from within 
> the environment where you installed rdkit.
>
> What I actually do is to have a work environment,  and then install all the 
> packages I need in this same env.
>
> --
> Gustavo Seabra
>
> 
> From: Jeff Saxon 
> Sent: Wednesday, December 2, 2020 6:48:47 AM
> To: rdkit-discuss@lists.sourceforge.net 
> Subject: [Rdkit-discuss] activate my-rdkit-env from python script
>
> Dear All,
>
> Since I installed RDKIT using conda, I have to use the following
> command from my bash terminal to activate the RDKIT environment:
> conda activate my-rdkit-env
> How can I do the same but inside my python script?
> I have already tried to call subprocess, but it did not work
> # source environment from python script;
> subprocess.run('conda init bash', shell=True)
> subprocess.run('conda activate my-rdkit-env', shell=True)
>
>


Re: [Rdkit-discuss] Applying Lipinsky filter on ligand data set

2020-12-02 Thread Gustavo Seabra
Great, I'm glad it works for you now.

As for the files that don't work, you could try loading them individually to
look into them, or save the molecules again.

If you could share the molecules here, maybe someone could find what the
problem is. (I'd recommend starting a new thread for it.)

All the best,
Gustavo.

--
Gustavo Seabra


From: Jeff Saxon 
Sent: Wednesday, December 2, 2020 9:37:01 AM
To: Gustavo Seabra ; 
rdkit-discuss@lists.sourceforge.net 
Subject: Re: [Rdkit-discuss] Applying Lipinsky filter on ligand data set

Thank you again, Gustavo!

Here is how I adapted your script for multiple SDF files.
Note that I added directly to the script a new dataframe called 'all',
to which I append each of the dataframes produced by your function
in a FOR loop.
I also added a TRY statement within the FOR loop to skip the two SDF files
that caused a problem. However, I have no idea why they don't work (they
are 2 files out of 1000, and they look fine in PyMOL!)


import subprocess, os, glob, shutil, sys
import pandas as pd

from rdkit import Chem, DataStructs
from rdkit.Chem import Draw, PandasTools, Descriptors, rdMolDescriptors, AllChem
from IPython.display import HTML

# the main function
def load_sdf_file(file, key):
    """
    Reads molecules from an SDF file keeping only molecules
    with valid SMILES, and assign a source field
    """
    df = PandasTools.LoadSDF(file)
    df['LIGAND'] = key
    #df['SMILES'] = df['ROMol'].apply(Chem.MolToSmiles)
    df['LogP'] = df['ROMol'].apply(Chem.Descriptors.MolLogP)
    df['MolWt'] = df['ROMol'].apply(Chem.Descriptors.MolWt)
    df['HBA'] = df['ROMol'].apply(Chem.rdMolDescriptors.CalcNumLipinskiHBA)
    df['HBD'] = df['ROMol'].apply(Chem.rdMolDescriptors.CalcNumLipinskiHBD)
    df = df[['LIGAND','LogP','MolWt','HBA','HBD']]
    return df


pwd = os.getcwd()
filles='sdf'
results='results'
#set directory to analyse
data = os.path.join(pwd,filles)
#set directory with outputs
results = os.path.join(pwd,results)

os.chdir(data)
# dirlist was not defined in the original message; assumed here to be
# the SDF files found in the data directory
dirlist = sorted(glob.glob('*.sdf'))

all = pd.DataFrame()
for sdf in dirlist:
    sdf_name = sdf.rsplit(".", 1)[0]
    key = f'{sdf_name}'
    try:
        df = load_sdf_file(sdf, key)
        # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
        all = pd.concat([all, df], ignore_index=True)
        print(f'{sdf_name}.sdf has been processed')
    except Exception:
        print(f'{sdf_name}.sdf has not been processed')
        # make a log of broken sdf files
        with open(results+"/log.txt", "a") as log:
            log.write("%s has not been processed\n" %(key))
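
The script above collects the four descriptors but does not yet apply the filter itself. A minimal sketch of that last step, using the usual rule-of-five limits (molecular weight at most 500, LogP at most 5, at most 5 H-bond donors and at most 10 H-bond acceptors), could look like the following. The function names and the two example ligands are made up for illustration; the keys match the column names used in the script.

```python
# Rule-of-five upper limits, keyed by the column names used in the script
LIPINSKI_LIMITS = {"MolWt": 500.0, "LogP": 5.0, "HBD": 5, "HBA": 10}


def lipinski_violations(row):
    """Return the names of the limits that this ligand exceeds."""
    return [name for name, limit in LIPINSKI_LIMITS.items()
            if row[name] > limit]


def passes_lipinski(row, max_violations=0):
    """True if the ligand exceeds at most `max_violations` limits."""
    return len(lipinski_violations(row)) <= max_violations


# Two made-up ligands, only to exercise the functions
ligands = [
    {"LIGAND": "lig_a", "LogP": 2.1, "MolWt": 312.4, "HBA": 5, "HBD": 2},
    {"LIGAND": "lig_b", "LogP": 6.3, "MolWt": 540.7, "HBA": 7, "HBD": 1},
]
kept = [row["LIGAND"] for row in ligands if passes_lipinski(row)]
print(kept)
```

Since the function only indexes the row by column name, it should also work row-wise on the combined dataframe, e.g. all[all.apply(passes_lipinski, axis=1)]. Setting max_violations=1 gives the more lenient reading in which a single violation is tolerated.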

On Wed, Dec 2, 2020 at 13:55, Gustavo Seabra wrote:
>
> Yes, the way it is written it will only keep the last sdf file read. I can 
> think of 2 options:
>
> 1. You can concatenate all sdfs into one,  multi-molecule file:
> $ cat *.sdf > multi.sdf
>
> And read this one.
>
> 2. Alternatively, instead of overwriting the final pandas dataframe every
> time, you can create one initial df then only concatenate it with the results
> of the function (see
> https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html)
>
> data = pd.DataFrame(columns=['Source','LogP','MolWt','LipinskyHBA','LipinskyHBD'])
>
> Then, for each file:
> data = pd.concat([data, load_sdf_file(sdf,key)], ignore_index=True)
>
> If possible, I believe option (1) should be faster.
>
>
> As for the error you are seeing,  sometimes RDKit cannot read a molecule, so 
> it returns no 'ROMol' object. It usually happens when the molecule is 
> ill-defined. If you really need to read the molecules one-by-one, then you 
> will need to treat this situation maybe with an 'if' statement in the 
> function. If you read a multi-molecule sdf, it just ignores the molecules it 
> can't read and keeps going.
>
> Ah, I don't think there is a function to use PDB files with Pandas. SDF is a
> better format for small molecules, anyway.
>
> All the best,
>
> --
> Gustavo Seabra
>
> 
> From: Jeff Saxon 
> Sent: Wednesday, December 2, 2020 4:53:05 AM
> To: Gustavo Seabra ; 
> rdkit-discuss@lists.sourceforge.net 
> Subject: Re: [Rdkit-discuss] Applying Lipinsky filter on ligand data set
>
> Hey Gustavo,
>
> Thank you very much for your script!
> I need to specify that I am working with many SDF files, each of
> which consists of one 3D structure of the ligand (I don't see any
> difference from PDB here, so if I could apply it to PDB files directly
> it would be even better!)
> Anyway, I've just tried to adapt your script for my case
>
> # I simplify the function to take only 4 properties required for
> lipinsky calculations,
> # I also substitute Source on the name of the particular SDF file (See below)
> def load_sdf_file(file, key):
> """
> Reads molecules from an SDF file keeping only molecules
> with valid SMILES, and assign a source field
> """
> df = PandasTools.LoadSDF(file)
> df['Source'] = key
> df['LogP'] = df['ROM

Re: [Rdkit-discuss] activate my-rdkit-env from python script

2020-12-02 Thread Gustavo Seabra
I don't believe that it is possible. You have to run your script from within 
the environment where you installed rdkit.

What I actually do is to have a work environment,  and then install all the 
packages I need in this same env.
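
Since the environment has to be chosen before Python starts, about the most a script can do is check whether the package is importable and say how to launch it properly. Here is a minimal sketch of such a guard; the helper names `module_available` and `require_module` are made up for illustration, and the `conda run -n <env>` hint assumes a conda version that provides that subcommand.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be imported here."""
    return importlib.util.find_spec(name) is not None


def require_module(name: str, env_name: str = "my-rdkit-env") -> None:
    """Exit with a hint if `name` is missing from the running interpreter."""
    if not module_available(name):
        sys.exit(
            f"'{name}' is not importable from {sys.executable}.\n"
            f"Activate the environment first (conda activate {env_name}), "
            f"or launch with: conda run -n {env_name} python {sys.argv[0]}"
        )
```

Calling `require_module("rdkit")` at the top of the script would then stop early with a readable message instead of an ImportError further down.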

--
Gustavo Seabra


From: Jeff Saxon 
Sent: Wednesday, December 2, 2020 6:48:47 AM
To: rdkit-discuss@lists.sourceforge.net 
Subject: [Rdkit-discuss] activate my-rdkit-env from python script

Dear All,

Since I installed RDKIT using conda, I have to use the following
command from my bash terminal to activate the RDKIT environment:
conda activate my-rdkit-env
How can I do the same but inside my python script?
I have already tried to call subprocess, but it did not work
# source environment from python script;
subprocess.run('conda init bash', shell=True)
subprocess.run('conda activate my-rdkit-env', shell=True)


___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Applying Lipinsky filter on ligand data set

2020-12-02 Thread Gustavo Seabra
Yes, the way it is written it will only keep the last sdf file read. I can 
think of 2 options:

1. You can concatenate all sdfs into one,  multi-molecule file:
$ cat *.sdf > multi.sdf

And read this one.

2. Alternatively,  instead of overwriting the final pandas dataframe every 
time, you can create one initial df then only concatenate it with the results 
of the function (see 
https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html)

data = pd.DataFrame(columns=['Source','LogP','MolWt','LipinskyHBA','LipinskyHBD'])

Then, for each file:
data = pd.concat([data, load_sdf_file(sdf, key)], ignore_index=True)

If possible, I believe option (1) should be faster.
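
For anyone who prefers to stay in Python for option (1), here is a rough equivalent of the `cat` command using only the standard library. The function name `concatenate_sdf` and the default output name are illustrative choices; the sketch assumes each input file already ends its records with the usual `$$$$` line.

```python
from pathlib import Path


def concatenate_sdf(src_dir, out_name="multi.sdf"):
    """Write every *.sdf in src_dir (except the output itself) into one file."""
    src = Path(src_dir)
    out_path = src / out_name
    # collect inputs first, leaving out a multi.sdf left over from an earlier run
    inputs = sorted(p for p in src.glob("*.sdf") if p.name != out_name)
    with out_path.open("w") as out:
        for path in inputs:
            text = path.read_text()
            # keep records on separate lines even if a file lacks a final newline
            if text and not text.endswith("\n"):
                text += "\n"
            out.write(text)
    return out_path, len(inputs)
```

Unlike a plain `cat *.sdf > multi.sdf`, this skips the output file on reruns, so the combined file does not end up containing a copy of itself.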


As for the error you are seeing,  sometimes RDKit cannot read a molecule, so it 
returns no 'ROMol' object. It usually happens when the molecule is ill-defined. 
If you really need to read the molecules one-by-one, then you will need to 
treat this situation maybe with an 'if' statement in the function. If you read 
a multi-molecule sdf, it just ignores the molecules it can't read and keeps 
going.
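
The 'if' statement mentioned above could look roughly like this when files are read one by one. This is only a sketch: `collect_frames` and its `loader` argument are hypothetical names, and the check for a missing 'ROMol' column is inferred from the KeyError reported in this thread rather than from the RDKit documentation.

```python
import pandas as pd


def collect_frames(paths, loader):
    """Load each path with `loader`, skip unreadable ones, concatenate the rest."""
    frames, skipped = [], []
    for path in paths:
        df = loader(path)
        # the KeyError in this thread suggests the frame has no 'ROMol'
        # column when the molecule could not be read
        if df.empty or "ROMol" not in df.columns:
            skipped.append(path)
            continue
        frames.append(df)
    combined = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
    return combined, skipped
```

Here `loader` would be something like `PandasTools.LoadSDF`, and the returned `skipped` list can be written to a log file afterwards.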

Ah, I don't think there is a function to use PDB files with Pandas. SDF is a
better format for small molecules, anyway.

All the best,

--
Gustavo Seabra


From: Jeff Saxon 
Sent: Wednesday, December 2, 2020 4:53:05 AM
To: Gustavo Seabra ; 
rdkit-discuss@lists.sourceforge.net 
Subject: Re: [Rdkit-discuss] Applying Lipinsky filter on ligand data set

Hey Gustavo,

Thank you very much for your script!
I need to specify that I am working with many SDF files, each of
which consists of one 3D structure of the ligand (I don't see any
difference from PDB here, so if I could apply it to PDB files directly
it would be even better!)
Anyway, I've just tried to adapt your script for my case

# I simplify the function to take only the 4 properties required for
# Lipinski calculations,
# I also substitute Source with the name of the particular SDF file (see below)
def load_sdf_file(file, key):
    """
    Reads molecules from an SDF file keeping only molecules
    with valid SMILES, and assign a source field
    """
    df = PandasTools.LoadSDF(file)
    df['Source'] = key
    df['LogP'] = df['ROMol'].apply(Chem.Descriptors.MolLogP)
    df['MolWt'] = df['ROMol'].apply(Chem.Descriptors.MolWt)
    df['LipinskyHBA'] = df['ROMol'].apply(Chem.rdMolDescriptors.CalcNumLipinskiHBA)
    df['LipinskyHBD'] = df['ROMol'].apply(Chem.rdMolDescriptors.CalcNumLipinskiHBD)
    df = df[['Source','LogP','MolWt','LipinskyHBA','LipinskyHBD']]
    return df


pwd = os.getcwd()
filles='sdf'
results='results'
#set directory to analyse
data = os.path.join(pwd,filles)
#set directory with outputs
results = os.path.join(pwd,results)

# go to the folder with all SDF files
os.chdir(data)

# loop over each SDF and use it with the function
for sdf in dirlist:
    sdf_name=sdf.rsplit( ".", 1 )[ 0 ]
    key = f'{sdf_name}'
    df = load_sdf_file(sdf,key)
    print(f'{sdf_name}.sdf has been processed')

The problem is that df only ever holds the result for the last SDF file,
whereas I need to append the results of each processed file. I also got an
error on one of the SDF files, which interrupted the script:

Traceback (most recent call last):
  File "./lipinski2.py", line 67, in <module>
    df = load_sdf_file(sdf,key)
  File "./lipinski2.py", line 26, in load_sdf_file
    df['LogP']   = df['ROMol'].apply(Chem.Descriptors.MolLogP)
  File "/Users/gleb/opt/miniconda3/envs/my-rdkit-env/lib/python3.7/site-packages/pandas/core/frame.py", line 2906, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/Users/gleb/opt/miniconda3/envs/my-rdkit-env/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    raise KeyError(key) from err
KeyError: 'ROMol'

Probably some additional IF statement is required to ignore the file
in the case of "broken" SDF...

On Tue, Dec 1, 2020 at 19:07, Gustavo Seabra wrote:
>
> Hi Jeff,
>
> There are a lot of people here with way more experience than me, so this may
> not be the optimal solution... But here is what I would do in this case:
>
> from rdkit import Chem, DataStructs
> from rdkit.Chem import Draw, PandasTools, Descriptors, rdMolDescriptors
> from IPython.display import HTML
>
> def load_sdf_file(file,source,id_column):
>     """
>     Reads molecules from an SDF file keeping only molecules
>     with valid SMILES, and assign a source field
>     """
>     df = PandasTools.LoadSDF(file)
>     df['Source'] = source
>     df['ID'] = df[id_column]
>     df['SMILES'] = df['ROMol'].apply(Chem.MolToSmiles)
>     df['LogP']   = df['ROMol'].apply(Chem.Descriptors.MolLogP)
>     df['MolWt']  = df['ROMol'].apply(Chem.Descriptors.MolWt)
>     df['LipinskyHBA'] = df['ROMol'].apply(Chem.rdMolDescriptors.CalcNumLipinskiHBA)
>

Re: [Rdkit-discuss] Partial substructure match?

2020-11-23 Thread Gustavo Seabra
Thank you so much!

 

What I ended up doing follows the same basic idea, although not even close
to the level of detail you put in your program. I'm only comparing the
structures in pairs, and doing the following:

(Sorry for the mess - it's part of a larger system; I just copied the relevant
parts.)

from rdkit import Chem
from rdkit.Chem import rdFMCS
from rdkit.Chem.Scaffolds import MurckoScaffold


def scaffold_matching(query_smi, scaff_smi):
    """
    Checks if the scaffold from scaff_smi is
    contained in the query_smi.

    Uses a stringent scaffold test.
    """
    sca = Chem.MolFromSmiles(scaff_smi)
    que = Chem.MolFromSmiles(query_smi)

    match = 0
    if que is not None:
        maxMatch = sca.GetNumAtoms()
        # fraction of scaffold atoms covered by the maximum common substructure
        match = rdFMCS.FindMCS([sca, que],
                               atomCompare=rdFMCS.AtomCompare.CompareAny,
                               bondCompare=rdFMCS.BondCompare.CompareOrder,
                               ringMatchesRingOnly=True,
                               completeRingsOnly=True,
                               ).numAtoms / maxMatch
    return match


if __name__ == "__main__":
    template_smiles = ...  # SMILES omitted in the original message
    query_smiles = ...     # SMILES omitted in the original message
    template_mol = Chem.MolFromSmiles(template_smiles)
    core = MurckoScaffold.GetScaffoldForMol(template_mol)
    scaffold = Chem.MolToSmiles(core)
    match = scaffold_matching(query_smiles, scaffold)


--

Gustavo Seabra

 

From: Andrew Dalke  
Sent: Monday, November 23, 2020 7:59 AM
To: Gustavo Seabra 
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] Partial substructure match?

 

On Nov 19, 2020, at 17:48, Gustavo Seabra <gustavo.sea...@gmail.com> wrote:



Is it possible to search for *partial* substructure matches using RDKit?

  ...



For example, if the pattern is a naphthalene and the molecule to
search has a benzene, that would count as a 60% match.


A number of people pointed out that RDKit's MCS feature might be
appropriate.

I've attached an example program based around that.

For example, the default is your two structures:

% python mcs_search.py
No --query specified, using naphthalene as the default.
No --target or --targets specified, using phenol as the default.
Target_ID: phenol
nAtoms: 7
nBonds: 7
match_nAtoms: 6
match_nBonds: 6
atom_overlap: 0.600
bond_overlap: 0.545
atom_Tanimoto: 0.545
bond_Tanimoto: 0.500

I'll reverse it by specifying the SMILES on the command-line. 

% python mcs_search.py --query 'c1ccccc1O' --target 'c1ccc2ccccc2c1'
Target_ID: query
nAtoms: 10
nBonds: 11
match_nAtoms: 6
match_nBonds: 6
atom_overlap: 0.857
bond_overlap: 0.857
atom_Tanimoto: 0.545
bond_Tanimoto: 0.500
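
The numbers in both outputs are consistent with simple ratios of the match size to the query and target sizes. The sketch below uses formulas inferred from the printed values, not taken from the attached program (which is not reproduced here); the function name `overlap_metrics` is made up for illustration.

```python
def overlap_metrics(n_query, n_target, n_match):
    """Return (overlap, Tanimoto) for a common substructure of n_match items.

    overlap  = fraction of the query covered by the match
    Tanimoto = match size over the size of the union of query and target
    """
    overlap = n_match / n_query
    tanimoto = n_match / (n_query + n_target - n_match)
    return overlap, tanimoto


# naphthalene query (10 atoms) against phenol target (7 atoms), 6 atoms matched
atom_overlap, atom_tanimoto = overlap_metrics(10, 7, 6)
print(f"{atom_overlap:.3f} {atom_tanimoto:.3f}")
```

The same function applies to bonds by passing bond counts instead of atom counts, and swapping query and target changes the overlap but not the Tanimoto value.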

 

 

The program includes options to configure the FindMCS() parameters.

 

In addition, if chemfp 3.x is installed then some additional features are
available, like the following example, which applies the MCS search to all
records in ChEBI: 


% python mcs_search.py --query 'COC(=O)C1C(OC(=O)c2ccccc2)CC2CCC1N2C' \
    --targets ~/databases/ChEBI_lite.sdf.gz --id-tag 'ChEBI ID'
Target_ID   nAtoms  nBonds  match_nAtoms  match_nBonds  atom_overlap  bond_overlap  atom_Tanimoto  bond_Tanimoto
CHEBI:776   21      24      9             8             0.409         0.333         0.265          0.200
CHEBI:1148  7       6       6             5             0.273         0.208         0.261          0.200
CHEBI:1734  19      21      16            15            0.727         0.625         0.640          0.500
CHEBI:1895  9       9       9             8             0.409         0.333         0.409          0.320
  ...






On Nov 20, 2020, at 15:56, Gustavo Seabra <gustavo.sea...@gmail.com> wrote:

Is it possible to get a partial match with substructure search?

 

No.


Andrew
 
da...@dalkescientific.com





Re: [Rdkit-discuss] Partial substructure match?

2020-11-20 Thread Gustavo Seabra
Hi Adelene,

Doesn't the substructure match only work for the whole substructure, as
all-or-nothing?

I suppose I could use the MCSS and count the number of matching atoms,  then 
calculate the percentage match myself.

Is it possible to get a partial match with substructure search?

Gustavo.

--
Gustavo Seabra


From: Adelene LAI 
Sent: Friday, November 20, 2020 9:13:15 AM
To: Dan Nealschneider ; Gustavo Seabra 

Cc: RDKit Discuss 
Subject: Re: [Rdkit-discuss] Partial substructure match?


Hi Dan and Gustavo,


MCSS sounds good, but depends on the goal.


From the way Gustavo wrote, it sounds like a Query-Target substructure search:
he has a list of targets and one specific query, and he wants to compare the
matching rate amongst the members of the list.


If so, I would try query SMARTS.

https://www.rdkit.org/docs/GettingStartedInPython.html#substructure-searching


Regarding the % substructure match, interesting question. How would you 
quantify that? Not sure such a thing exists in RDKit right now.


Adelene


Doctoral Researcher

Environmental Cheminformatics

UNIVERSITÉ DU LUXEMBOURG


Campus Belval | Luxembourg Centre for Systems Biomedicine

6, avenue du Swing, L-4367 Belvaux

T +356 46 66 44 67 18

GitHub: adelenelai











From: Dan Nealschneider 
Sent: Thursday, November 19, 2020 6:01:37 PM
To: Gustavo Seabra
Cc: RDKit Discuss
Subject: Re: [Rdkit-discuss] Partial substructure match?

Gustavo -
That sounds like the "maximum common substructure" problem. Here's the relevant 
section in RDKit's  "Getting started in Python"

https://www.rdkit.org/docs/GettingStartedInPython.html#maximum-common-substructure



dan nealschneider | lead developer

Schrodinger <https://www.schrodinger.com/>


On Thu, Nov 19, 2020 at 8:50 AM Gustavo Seabra <gustavo.sea...@gmail.com> wrote:
Hi all,

Is it possible to search for *partial* substructure matches using RDKit?

I'm aware of "HasSubstructMatch/ GetSubstructMatch", but my impression is
that it only returns full matches (100%) of the required pattern in a
structure.

However, what I'd like to do is a bit different: Imagine I have one specific
substructure (scaffold), and I'd like to search for molecules that have the
full substructure *or part of it*, and maybe get the percentage of the
substructure match? (100% = the full substructure is contained in the
molecule). For example, if the pattern is a naphthalene and the molecule to
search has a benzene, that would count as a 60% match.

Is there a way to do that in RDKit?

Thanks a lot!
--
Gustavo Seabra






[Rdkit-discuss] Partial substructure match?

2020-11-19 Thread Gustavo Seabra
Hi all,

Is it possible to search for *partial* substructure matches using RDKit?

I'm aware of "HasSubstructMatch/ GetSubstructMatch", but my impression is
that it only returns full matches (100%) of the required pattern in a
structure. 

However, what I'd like to do is a bit different: Imagine I have one specific
substructure (scaffold), and I'd like to search for molecules that have the
full substructure *or part of it*, and maybe get the percentage of the
substructure match? (100% = the full substructure is contained in the
molecule). For example, if the pattern is a naphthalene and the molecule to
search has a benzene, that would count as a 60% match.

Is there a way to do that in RDKit?

Thanks a lot!
--
Gustavo Seabra






Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

2020-10-30 Thread Gustavo Seabra
Sure, here is:

1. The question:

> I noticed that compounds that differ only in the cis-trans isomerization
> around an sp2 nitrogen get the same InChI Key from RDKit. For example:
>
>     inchi_cis = Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(/NC#N)NCCSCc1nc[nH]c1C"))
>     inchi_cis
>     'AQIXAKUUQRKLND-UHFFFAOYSA-N'
>     inchi_trans = Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(\\NC#N)NCCSCc1nc[nH]c1C"))
>     inchi_trans
>     'AQIXAKUUQRKLND-UHFFFAOYSA-N'
>     inchi_cis == inchi_trans
>     True
>
> I wonder if this is a limitation of the InChI Key definition, or an
> implementation issue.


The answer to the question, in the end, was that the InChI Keys were
behaving as intended, by design, as pointed out by Igor Pletnev:

> though InChI is not perfect, in this case it behaves as intended.
> Please see below.
> The discussed molecules contain substituted guanidine fragment
> (RHN)C(=NMe)(NHR')
> It is subjected to tautomerism, and in different tautomers different C-N
> bonds have double order:
> (RHN)C(=NMe)(NHR')
> (RHN)C(NHMe)(=NR')
> (RN=)C(NHMe)(NHR')
> You generated Standard InChI, which is evidenced by "InChI=1S/" prefix in
> the examples.
> Standard InChI is specifically designed to produce the same identifier for
> all tautomers (by indicating that two hydrogens are shared by three
> nitrogen atoms, for any tautomer; bond orders are not indicated in InChI).
> As the tautomer-invariant Std InChI does not know which C-N bond is
> actually a double, there is the only option for treating stereo -- to
> completely ignore it as a drawing artifact.
> All in all:
> Standard InChI means that the exact tautomeric form is unknown ==> all
> tautomers are mapped to the same generic representation ==>  the exact C-N
> double bond placement in this generic is unspecified ==> C-N double bond
> stereo is ignored ==> generated StdInChI and Std InChIKey are the same for
> seemingly different, by initial drawing, cis/trans forms.
> Once again, this behavior is by design; it is intended for maximal
> interoperability while comparing different drawings of the "same" compound.
> If, for any reason, you would like to consider your examples as the
> definite and resolvable structures, each having its own identifier, just
> use non-Standard InChI.
> The InChI which preserves the exact positions of tautomeric H's and double
> bond ("as drawn") is produced by just specifying option /FixedH upon
> generation.
> More on this may be found in InChI FAQ:
> https://www.inchi-trust.org/technical-faq-2/
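
A quick way to tell which flavour an identifier is, as a small sketch. The "InChI=1S/" prefix check follows Igor's remark above; the InChIKey check relies on my understanding of the key layout (the ninth character of the second block is 'S' for standard and 'N' for non-standard), so treat that part as an assumption to verify against the InChI technical manual. Both function names are made up for illustration.

```python
def is_standard_inchi(inchi: str) -> bool:
    """Standard InChI strings start with 'InChI=1S/' (non-standard: 'InChI=1/')."""
    return inchi.startswith("InChI=1S/")


def is_standard_inchikey(key: str) -> bool:
    """Assumed layout: second block is 8 hash letters, a flag letter, a version letter."""
    blocks = key.split("-")
    return len(blocks) == 3 and len(blocks[1]) == 10 and blocks[1][8] == "S"


# the key reported for both isomers in the question
print(is_standard_inchikey("AQIXAKUUQRKLND-UHFFFAOYSA-N"))
```

This only inspects the strings; it does not say anything about whether two structures are the same compound.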


The only question remaining was how to use this "/FixedH" option in RDKit,
and that was answered by Paolo Tosco:

> you can pass InChI options to the underlying InChI API through the
> options parameter of Chem.inchi.MolToInchi() and Chem.inchi.MolToInchiKey(); e.g.:
> inchi.MolToInchi(mol, options="/FixedH")
> Source:
> https://www.rdkit.org/docs/source/rdkit.Chem.inchi.html?highlight=inchi#rdkit.Chem.inchi.MolBlockToInchi


And this is what I'm using now to remove duplicate molecules from my
database. I'm using a Pandas DataFrame and, with the more recent versions
of Pandas, the following works fine:

> df['InChI Key'] = df[mol_col].progress_apply(Chem.MolToInchiKey, options="/FixedH")
> df.drop_duplicates(subset=['InChI Key'], keep='first', inplace=True)

All the best,
--
Gustavo Seabra.


On Fri, Oct 30, 2020 at 4:47 AM Adelene LAI  wrote:

> Hi Gustavo,
>
>
> Looks like you found a solution for your deduplication task. Would you
> mind sharing it with us? (Seems some emails in the chain are missing.)
>
>
> I'm curious - returning to your original question, did we figure out why
> the same InChIKey was given for the stereoisomers?
>
>
> Adelene
>
>
> Doctoral Researcher
>
> Environmental Cheminformatics
>
> UNIVERSITÉ DU LUXEMBOURG
>
>
> LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
>
> 6, avenue du Swing, L-4367 Belvaux
>
> T +356 46 66 44 67 18
>
> GitHub: adelenelai
>
>
>
>
>
> --
> *From:* Gustavo Seabra 
> *Sent:* Thursday, October 29, 2020 10:23:20 PM
> *To:* Paolo Tosco
> *Cc:* Igor Pletnev; RDKit Discuss
> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key
>
> Aha! Fantastic!
>
> Thanks a lot!!
> Gustavo.
>
> --
> Gustavo Seabra
>
> --
> *From:* Paolo Tosco 
> *Sent:* Thursday, October 29, 2020 5:13:33 PM
> *To:* Gustavo Seabra 
> *Cc:* Igor Pletnev ; RDKit Discuss <
> rdkit-discuss@lists.sourceforge.net>
> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key
>
> Hi Gusta

Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

2020-10-29 Thread Gustavo Seabra
Aha! Fantastic!

Thanks a lot!!
Gustavo.

--
Gustavo Seabra


From: Paolo Tosco 
Sent: Thursday, October 29, 2020 5:13:33 PM
To: Gustavo Seabra 
Cc: Igor Pletnev ; RDKit Discuss 

Subject: Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

Hi Gustavo,

you can pass InChI options to the underlying InChI API through the options 
parameter of Chem.inchi.MolToInchi() and  Chem.inchi.MolToInchiKey(); e.g.:

inchi.MolToInchi(mol, options="/FixedH")

Source: 
https://www.rdkit.org/docs/source/rdkit.Chem.inchi.html?highlight=inchi#rdkit.Chem.inchi.MolBlockToInchi

Cheers,
p.

On Thu, Oct 29, 2020 at 9:42 PM Gustavo Seabra <gustavo.sea...@gmail.com> wrote:
Ok, thanks!
--
Gustavo Seabra.


On Thu, Oct 29, 2020 at 4:33 PM Igor Pletnev <igor.plet...@gmail.com> wrote:
>  Is this "/FixedH" an option in RDKit? How to use that? (I don't see it in 
> the docs).

Sorry, I am not very proficient in RDKit and cannot answer exactly. Anyway, this
option is available in InChI API calls, and I am pretty sure that it is also
available in RDKit.

I recall that a couple of years ago, at an InChI event, Greg Landrum somewhat
surprised me by saying that he himself often uses non-Standard InChI instead of
the Standard one, exactly to distinguish tautomers.
So I guess Greg can answer how this is arranged in RDKit.

Regards,
Igor





On Thu, 29 Oct 2020 at 23:03, Gustavo Seabra <gustavo.sea...@gmail.com> wrote:
That does make sense, I understand it now, thanks!

Is this "/FixedH" an option in RDKit? How to use that? (I don't see it in the 
docs).

Thanks,
--
Gustavo Seabra.


On Wed, Oct 28, 2020 at 6:10 PM Igor Pletnev <igor.plet...@gmail.com> wrote:
Hi Gustavo,

>  ... I was generating the InChI Keys to get a unique hash for each compound, 
> thinking it would be better than SMILES (guaranteed to be unique), but is 
> clearly not the case. On the bright side, I won't lose time generating 
> InChIs...

though InChI is not perfect, in this case it behaves as intended.
Please see below.

The discussed molecules contain substituted guanidine fragment 
(RHN)C(=NMe)(NHR')

It is subject to tautomerism, and in different tautomers different C-N bonds
are double:
(RHN)C(=NMe)(NHR')
(RHN)C(NHMe)(=NR')
(RN=)C(NHMe)(NHR')

You generated Standard InChI, which is evidenced by "InChI=1S/" prefix in the 
examples.
Standard InChI is specifically designed to produce the same identifier for all 
tautomers (by indicating that two hydrogens are shared by three nitrogen atoms, 
for any tautomer; bond orders are not indicated in InChI).

As the tautomer-invariant Std InChI does not know which C-N bond is actually
double, the only option for treating stereo is to ignore it completely
as a drawing artifact.

All in all:
Standard InChI means that the exact tautomeric form is unknown ==> all 
tautomers are mapped to the same generic representation ==>  the exact C-N 
double bond placement in this generic is unspecified ==> C-N double bond stereo 
is ignored ==> generated StdInChI and Std InChIKey are the same for seemingly 
different, by initial drawing, cis/trans forms.

Once again, this behavior is by design; it is intended for maximal 
interoperability while comparing different drawings of the "same" compound.

If, for any reason, you would like to consider your examples as the definite 
and resolvable structures, each having its own identifier, just use 
non-Standard InChI.
The InChI which preserves the exact positions of tautomeric H's and double bond 
("as drawn") is produced by just specifying option /FixedH upon generation.

More on this may be found in InChI FAQ:
https://www.inchi-trust.org/technical-faq-2/

Hope this helps.

Regards,
Igor



On Mon, Oct 26, 2020 at 6:56 PM Gustavo Seabra <gustavo.sea...@gmail.com> wrote:
Thanks a lot Peter and Adelene,

Yes, it looks like canonical SMILES is the way to go, and I have no problem 
sticking with RDKit. I was generating the InChI Keys to get a unique hash for 
each compound, thinking it would be better than SMILES (guaranteed to be 
unique), but is clearly not the case. On the bright side, I won't lose time 
generating InChIs...

Can I trust that the same molecule will always get the same canonical SMILES 
from RDKit, independent of how it is read? (Different SDF files, geometries, 
atom orders, etc.?)

All the best,
Gustavo.


--
Gustavo Seabra.


On Sun, Oct 25, 2020 at 8:27 PM Peter S. Shenkin <shen...@gmail.com> wrote:
Canonical SMILES is probably the way to go, but you might also be able to use 
the InchiKey and the Inchi auxiliary information together as a compound hash 
key.

-P.

On Sun, Oct 25, 2020 at 10:53 AM Adelene LAI <adelene@uni.lu> wrote:

Hi Gustavo,


(Sorry, forgot to reply all before...)


Your deduplication task is quite fa

Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

2020-10-29 Thread Gustavo Seabra
Ok, thanks!
--
Gustavo Seabra.


On Thu, Oct 29, 2020 at 4:33 PM Igor Pletnev  wrote:

> >  Is this "/FixedH" an option in RDKit? How to use that? (I don't see it
> in the docs).
>
> Sorry, I am not so proficient in RDKit and can not answer exactly. Anyway,
> this option is available in InChI API calls, and I am pretty sure that it
> is also available in RDKit.
>
> I recall that couple of years ago, on some InChI event,  Greg Landrum
> somewhat surprised me by saying that he himself often uses non-Standard
> InChI instead of Standard one — exactly to distinguish tautomers.
> So I guess Greg can answer on how it is arranged in RDKit.
>
> Regards,
> Igor
>
>
>
>
>
> On Thu, 29 Oct 2020 at 23:03, Gustavo Seabra 
> wrote:
>
>> That does make sense, I understand it now, thanks!
>>
>> Is this "/FixedH" an option in RDKit? How to use that? (I don't see it in
>> the docs).
>>
>> Thanks,
>> --
>> Gustavo Seabra.
>>
>>
>> On Wed, Oct 28, 2020 at 6:10 PM Igor Pletnev 
>> wrote:
>>
>>> Hi Gustavo,
>>>
>>> >  ... I was generating the InChI Keys to get a unique hash for each
>>> compound, thinking it would be better than SMILES (guaranteed to be
>>> unique), but is clearly not the case. On the bright side, I won't lose time
>>> generating InChIs...
>>>
>>> though InChI is not perfect, in this case it behaves as intended.
>>> Please see below.
>>>
>>> The discussed molecules contain substituted guanidine fragment
>>> (RHN)C(=NMe)(NHR')
>>>
>>> It is subjected to tautomerism, and in different tautomers different C-N
>>> bonds have double order:
>>> (RHN)C(=NMe)(NHR')
>>> (RHN)C(NHMe)(=NR')
>>> (RN=)C(NHMe)(NHR')
>>>
>>> You generated Standard InChI, which is evidenced by "InChI=1S/" prefix
>>> in the examples.
>>> Standard InChI is specifically designed to produce the same identifier
>>> for all tautomers (by indicating that two hydrogens are shared by three
>>> nitrogen atoms, for any tautomer; bond orders are not indicated in InChI).
>>>
>>> As the tautomer-invariant Std InChI does not know which C-N bond is
>>> actually a double, there is the only option for treating stereo -- to
>>> completely ignore it as a drawing artifact.
>>>
>>> All in all:
>>> Standard InChI means that the exact tautomeric form is unknown ==> all
>>> tautomers are mapped to the same generic representation ==>  the exact C-N
>>> double bond placement in this generic is unspecified ==> C-N double bond
>>> stereo is ignored ==> generated StdInChI and Std InChIKey are the same for
>>> seemingly different, by initial drawing, cis/trans forms.
>>>
>>> Once again, this behavior is by design; it is intended for maximal
>>> interoperability while comparing different drawings of the "same" compound.
>>>
>>> If, for any reason, you would like to consider your examples as the
>>> definite and resolvable structures, each having its own identifier, just
>>> use non-Standard InChI.
>>> The InChI which preserves the exact positions of tautomeric H's and
>>> double bond ("as drawn") is produced by just specifying option /FixedH upon
>>> generation.
>>>
>>> More on this may be found in InChI FAQ:
>>> https://www.inchi-trust.org/technical-faq-2/
>>>
>>> Hope this helps.
>>>
>>> Regards,
>>> Igor
>>>
>>>
>>>
>>> On Mon, Oct 26, 2020 at 6:56 PM Gustavo Seabra 
>>> wrote:
>>>
>>>> Thanks a lot Peter and Adelene,
>>>>
>>>> Yes, it looks like canonical SMILES is the way to go, and I have no
>>>> problem sticking with RDKit. I was generating the InChI Keys to get a
>>>> unique hash for each compound, thinking it would be better than SMILES
>>>> (guaranteed to be unique), but is clearly not the case. On the bright side,
>>>> I won't lose time generating InChIs...
>>>>
>>>> Can I trust that the same molecule will always get the same canonical
>>>> SMILES from RDKit, independent of how it is read? (Different SDF files,
>>>> geometries, atom orders, etc.?)
>>>>
>>>> All the best,
>>>> Gustavo.
>>>>
>>>>
>>>> --
>>>> Gustavo Seabra.
>>>>
>>>>
>>>> On Sun, Oct 25, 2020 at 8:27 

Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

2020-10-29 Thread Gustavo Seabra
That does make sense, I understand it now, thanks!

Is this "/FixedH" an option in RDKit? How to use that? (I don't see it in
the docs).

Thanks,
--
Gustavo Seabra.


On Wed, Oct 28, 2020 at 6:10 PM Igor Pletnev  wrote:

> Hi Gustavo,
>
> >  ... I was generating the InChI Keys to get a unique hash for each
> compound, thinking it would be better than SMILES (guaranteed to be
> unique), but is clearly not the case. On the bright side, I won't lose time
> generating InChIs...
>
> though InChI is not perfect, in this case it behaves as intended.
> Please see below.
>
> The discussed molecules contain substituted guanidine fragment
> (RHN)C(=NMe)(NHR')
>
> It is subjected to tautomerism, and in different tautomers different C-N
> bonds have double order:
> (RHN)C(=NMe)(NHR')
> (RHN)C(NHMe)(=NR')
> (RN=)C(NHMe)(NHR')
>
> You generated Standard InChI, which is evidenced by "InChI=1S/" prefix in
> the examples.
> Standard InChI is specifically designed to produce the same identifier for
> all tautomers (by indicating that two hydrogens are shared by three
> nitrogen atoms, for any tautomer; bond orders are not indicated in InChI).
>
> As the tautomer-invariant Std InChI does not know which C-N bond is
> actually a double, there is the only option for treating stereo -- to
> completely ignore it as a drawing artifact.
>
> All in all:
> Standard InChI means that the exact tautomeric form is unknown ==> all
> tautomers are mapped to the same generic representation ==>  the exact C-N
> double bond placement in this generic is unspecified ==> C-N double bond
> stereo is ignored ==> generated StdInChI and Std InChIKey are the same for
> seemingly different, by initial drawing, cis/trans forms.
>
> Once again, this behavior is by design; it is intended for maximal
> interoperability while comparing different drawings of the "same" compound.
>
> If, for any reason, you would like to consider your examples as the
> definite and resolvable structures, each having its own identifier, just
> use non-Standard InChI.
> The InChI which preserves the exact positions of tautomeric H's and double
> bond ("as drawn") is produced by just specifying option /FixedH upon
> generation.
>
> More on this may be found in InChI FAQ:
> https://www.inchi-trust.org/technical-faq-2/
>
> Hope this helps.
>
> Regards,
> Igor
>
>
>
> On Mon, Oct 26, 2020 at 6:56 PM Gustavo Seabra 
> wrote:
>
>> Thanks a lot Peter and Adelene,
>>
>> Yes, it looks like canonical SMILES is the way to go, and I have no
>> problem sticking with RDKit. I was generating the InChI Keys to get a
>> unique hash for each compound, thinking it would be better than SMILES
>> (guaranteed to be unique), but is clearly not the case. On the bright side,
>> I won't lose time generating InChIs...
>>
>> Can I trust that the same molecule will always get the same canonical
>> SMILES from RDKit, independent of how it is read? (Different SDF files,
>> geometries, atom orders, etc.?)
>>
>> All the best,
>> Gustavo.
>>
>>
>> --
>> Gustavo Seabra.
>>
>>
>> On Sun, Oct 25, 2020 at 8:27 PM Peter S. Shenkin 
>> wrote:
>>
>>> Canonical SMILES is probably the way to go, but you might also be able
>>> to use the InchiKey and the Inchi auxiliary information together as a
>>> compound hash key.
>>>
>>> -P.
>>>
>>> On Sun, Oct 25, 2020 at 10:53 AM Adelene LAI  wrote:
>>>
>>>> Hi Gustavo,
>>>>
>>>>
>>>> (Sorry, forgot to reply all before...)
>>>>
>>>>
>>>> Your deduplication task is quite familiar to me and something I do
>>>> quite a lot of in my own work ;)
>>>>
>>>>
>>>> Can I suggest deduplicating using Canonical SMILES?
>>>>
>>>>
>>>> It doesn't solve your InChIKey issue, but it is a solution for now.
>>>>
>>>>
>>>> I updated my gist to show that it is feasible:
>>>>
>>>>
>>>> https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f
>>>>
>>>>
>>>> <https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f>
>>>>
>>>> Adelene
>>>>
>>>>
>>>>
>>>> Doctoral Researcher
>>>>
>>>> Environmental Cheminformatics
>>>>
>>>> UNIVERSITÉ DU LUXEMBOURG
>>>>
>>>>
>>>> LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
>>>>

Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

2020-10-26 Thread Gustavo Seabra
Thanks a lot Peter and Adelene,

Yes, it looks like canonical SMILES is the way to go, and I have no problem
sticking with RDKit. I was generating the InChI Keys to get a unique hash
for each compound, thinking it would be better than SMILES (guaranteed to
be unique), but that is clearly not the case. On the bright side, I won't
lose time generating InChIs...

Can I trust that the same molecule will always get the same canonical
SMILES from RDKit, independent of how it is read? (Different SDF files,
geometries, atom orders, etc.?)
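
As a small illustration of the deduplication idea discussed here, the sketch below keys each structure on RDKit's canonical isomeric SMILES. The helper names `canonical_smiles` and `deduplicate` are made up for this example, and the inputs are toy molecules rather than the database from the thread. Canonical SMILES should be stable for a given RDKit version and a given chemical representation; inputs that differ in tautomer or protonation state will still get different strings, and the canonical form may change between RDKit releases.

```python
from rdkit import Chem


def canonical_smiles(smiles):
    """Return RDKit's canonical isomeric SMILES, or None if parsing fails."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    # isomericSmiles=True keeps double-bond and chiral stereo in the string.
    return Chem.MolToSmiles(mol, isomericSmiles=True)


def deduplicate(smiles_list):
    """Keep the first input seen for each distinct canonical SMILES."""
    unique = {}
    for smi in smiles_list:
        key = canonical_smiles(smi)
        if key is not None and key not in unique:
            unique[key] = smi
    return unique


# Ethanol written with two atom orders, plus the E and Z isomers of
# 1,2-difluoroethene, which must stay distinct.
inputs = ["CCO", "OCC", "F/C=C/F", "F/C=C\\F"]
unique = deduplicate(inputs)
print(len(unique), "unique structures:", list(unique))
```

The two ethanol inputs collapse to one entry, while the two difluoroethene isomers are kept apart because the stereo markers survive canonicalization.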

All the best,
Gustavo.


--
Gustavo Seabra.


On Sun, Oct 25, 2020 at 8:27 PM Peter S. Shenkin  wrote:

> Canonical SMILES is probably the way to go, but you might also be able to
> use the InchiKey and the Inchi auxiliary information together as a compound
> hash key.
>
> -P.
>
> On Sun, Oct 25, 2020 at 10:53 AM Adelene LAI  wrote:
>
>> Hi Gustavo,
>>
>>
>> (Sorry, forgot to reply all before...)
>>
>>
>> Your deduplication task is quite familiar to me and something I do quite
>> a lot of in my own work ;)
>>
>>
>> Can I suggest deduplicating using Canonical SMILES?
>>
>>
>> It doesn't solve your InChIKey issue, but it is a solution for now.
>>
>>
>> I updated my gist to show that it is feasible:
>>
>>
>> https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f
>>
>>
>> <https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f>
>>
>> Adelene
>>
>>
>>
>> Doctoral Researcher
>>
>> Environmental Cheminformatics
>>
>> UNIVERSITÉ DU LUXEMBOURG
>>
>>
>> LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
>>
>> 6, avenue du Swing, L-4367 Belvaux
>>
>> T +356 46 66 44 67 18
>>
>> [image: github.png] adelenelai
>>
>>
>>
>>
>>
>> --
>> *From:* Gustavo Seabra 
>> *Sent:* Sunday, October 25, 2020 2:27:15 PM
>> *To:* Adelene LAI
>> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI
>> Key
>>
>> Actually,  I was trying to generate all stereoisomers for molecules in a
>> database,  and filter duplicate molecules by using the InChI Key to detect
>> duplicates.  But it gives cis/trans isomers on sp2-N the same Key.
>>
>> Gustavo.
>>
>> --
>> Gustavo Seabra
>>
>> --
>> *From:* Adelene LAI 
>> *Sent:* Sunday, October 25, 2020 1:44:01 AM
>> *To:* Gustavo Seabra 
>> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI
>> Key
>>
>>
>> Hi Gustavo,
>>
>>
>> It occurred to me while swimming yesterday - was there a reason you
>> pointed out the hybridisation state of N in your original subject text?
>>
>>
>> Was it just to specify which N to focus on, or did you expect something
>> special about sp2 hybridisation wrt InChIKey?
>>
>>
>> Adelene
>>
>>
>> Doctoral Researcher
>>
>> Environmental Cheminformatics
>>
>> UNIVERSITÉ DU LUXEMBOURG
>>
>>
>> LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
>>
>> 6, avenue du Swing, L-4367 Belvaux
>>
>> T +356 46 66 44 67 18
>>
>> [image: github.png] adelenelai
>>
>>
>>
>>
>>
>> --
>> *From:* Gustavo Seabra 
>> *Sent:* Saturday, October 24, 2020 5:37:09 AM
>> *To:* RDKit Discuss; Adelene LAI
>> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI
>> Key
>>
>> Thanks for looking into it. I'm happy to see it wasn't just a mistake by
>> me ;-)
>>
>> I hope we can find what's wrong there.
>>
>> Best,
>> Gustavo.
>>
>> --
>> Gustavo Seabra
>>
>> --
>> *From:* Adelene LAI 
>> *Sent:* Friday, October 23, 2020 11:28:55 PM
>> *To:* Gustavo Seabra ; RDKit Discuss <
>> rdkit-discuss@lists.sourceforge.net>
>> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI
>> Key
>>
>>
>> Hi Gustavo,
>>
>>
>> <https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f>
>> https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f
>>
>>
>> In the gist above, I tried doing some further investigating.
>>
>>
>> It seems for the example you gave, the rdkit functions indeed give the
>> same inchikey and inchi, but different aux info.
>>
>>
>> Why this different aux info doesn't translate into di

Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

2020-10-23 Thread Gustavo Seabra
Thanks for looking into it. I'm happy to see it wasn't just a mistake by me ;-)

I hope we can find what's wrong there.

Best,
Gustavo.

--
Gustavo Seabra


From: Adelene LAI 
Sent: Friday, October 23, 2020 11:28:55 PM
To: Gustavo Seabra ; RDKit Discuss 

Subject: Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key


Hi Gustavo,


https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f


In the gist above, I tried doing some further investigating.


It seems for the example you gave, the rdkit functions indeed give the same 
inchikey and inchi, but different aux info.


Why this different aux info doesn't translate into different inchikeys/inchis, 
I'm not sure.


Adelene






Doctoral Researcher

Environmental Cheminformatics

UNIVERSITÉ DU LUXEMBOURG


LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE

6, avenue du Swing, L-4367 Belvaux

T +356 46 66 44 67 18

[github.png] adelenelai






From: Gustavo Seabra 
Sent: Friday, October 23, 2020 6:43:07 PM
To: RDKit Discuss
Subject: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

Hi all,

I ran into an issue here, and I'd appreciate your input. I noticed that 
compounds that differ only in the cis/trans configuration around an sp2 
nitrogen get the same InChI Key from RDKit. For example:

> inchi_cis = Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(/NC#N)NCCSCc1nc[nH]c1C"))
> inchi_cis
'AQIXAKUUQRKLND-UHFFFAOYSA-N'

> inchi_trans = Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(\\NC#N)NCCSCc1nc[nH]c1C"))
> inchi_trans
'AQIXAKUUQRKLND-UHFFFAOYSA-N'

> inchi_cis == inchi_trans
True

I wonder if this is a limitation of the InChI Key definition, or an 
implementation issue.

Thanks a lot,
--
Gustavo Seabra.


[Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

2020-10-23 Thread Gustavo Seabra
Hi all,

I ran into an issue here, and I'd appreciate your input. I noticed that
compounds that differ only in the cis/trans configuration around an sp2
nitrogen get the same InChI Key from RDKit. For example:

> inchi_cis = Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(/NC#N)NCCSCc1nc[nH]c1C"))
> inchi_cis
'AQIXAKUUQRKLND-UHFFFAOYSA-N'

> inchi_trans = Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(\\NC#N)NCCSCc1nc[nH]c1C"))
> inchi_trans
'AQIXAKUUQRKLND-UHFFFAOYSA-N'

> inchi_cis == inchi_trans
True

I wonder if this is a limitation of the InChI Key definition, or an
implementation issue.
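
For anyone who wants to rerun the comparison as a standalone script, here is a minimal sketch. It also generates a non-standard InChI with the `/FixedH` option, which a reply in this thread suggests when the as-drawn positions of tautomeric hydrogens and double bonds must be preserved. Passing the option through the `options` argument of `Chem.MolToInchi` is how I understand the RDKit wrapper to work; whether the two FixedH strings actually differ for this pair is printed rather than assumed.

```python
from rdkit import Chem

cis = Chem.MolFromSmiles("C/N=C(/NC#N)NCCSCc1nc[nH]c1C")
trans = Chem.MolFromSmiles("C/N=C(\\NC#N)NCCSCc1nc[nH]c1C")

# Standard InChIKeys, as in the original report.
key_cis = Chem.MolToInchiKey(cis)
key_trans = Chem.MolToInchiKey(trans)
print("standard keys equal:", key_cis == key_trans)

# Non-standard InChI with fixed hydrogens ("as drawn").
fixed_cis = Chem.MolToInchi(cis, options="/FixedH")
fixed_trans = Chem.MolToInchi(trans, options="/FixedH")
print("FixedH InChIs equal:", fixed_cis == fixed_trans)

# A key can be derived from any InChI string, standard or not.
fixed_key_cis = Chem.InchiToInchiKey(fixed_cis)
fixed_key_trans = Chem.InchiToInchiKey(fixed_trans)
print(fixed_key_cis, fixed_key_trans)
```

A non-standard InChI starts with `InChI=1/` instead of `InChI=1S/`, which is a quick way to check that the option was picked up.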

Thanks a lot,
--
Gustavo Seabra.


Re: [Rdkit-discuss] Converting csv/xls file containing SMILES to .sdf

2020-05-28 Thread Gustavo Seabra
You can open the CSV file directly in Schrodinger's Maestro. The free version 
can open CSV files.

--
Gustavo Seabra


From: ITS RDC 
Sent: Thursday, May 28, 2020 9:11:42 AM
To: RDKit Discuss 
Subject: [Rdkit-discuss] Converting csv/xls file containing SMILES to .sdf

Hi all,

I have a list of compounds for which I want to compute topological and 
molecular properties in order to build a QSAR model. There are over a hundred 
compounds, contained in an MS Excel file in CSV format, because we downloaded 
them from existing chemical databases that do not offer the SDF format. 
Opening each compound manually in ChemDraw to pool them all is not 
convenient. I am looking into PandasTools, but the documentation only 
indicates that SDF can be converted to CSV and not vice versa. Has anyone 
worked on a similar task before? Your response is very much appreciated. 
Thank you.
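
One way to do this conversion with RDKit alone is sketched below, under the assumption that the CSV has a header row with a `SMILES` column and a `Name` column; both column names, the inline sample data, and the helper name `csv_to_sdf` are placeholders for illustration. Every other column is copied into the SDF as a data field.

```python
import csv
import io
import os
import tempfile

from rdkit import Chem

# Stand-in for the real CSV file; "Name" and "SMILES" are assumed headers.
csv_text = """Name,SMILES,Source
ethanol,CCO,db1
benzene,c1ccccc1,db2
broken,not_a_smiles,db2
"""


def csv_to_sdf(csv_file, sdf_path, smiles_col="SMILES", name_col="Name"):
    """Write one SDF record per parsable SMILES; return (written, skipped)."""
    writer = Chem.SDWriter(sdf_path)
    written, skipped = 0, []
    for row in csv.DictReader(csv_file):
        mol = Chem.MolFromSmiles(row[smiles_col])
        if mol is None:
            # Remember rows RDKit could not parse instead of dropping them silently.
            skipped.append(row[name_col])
            continue
        mol.SetProp("_Name", row[name_col])
        # Copy the remaining columns as SD data fields.
        for col, value in row.items():
            if col not in (smiles_col, name_col):
                mol.SetProp(col, value)
        writer.write(mol)
        written += 1
    writer.close()
    return written, skipped


sdf_path = os.path.join(tempfile.mkdtemp(), "compounds.sdf")
written, skipped = csv_to_sdf(io.StringIO(csv_text), sdf_path)
print(written, "records written; skipped:", skipped)
```

For a file on disk, pass `open("compounds.csv", newline="")` instead of the `io.StringIO` object. The records carry no coordinates unless they are generated first, for example with `AllChem.Compute2DCoords`.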

Joanna


[Rdkit-discuss] Multiline legend in MolsToGridImage

2020-04-08 Thread Gustavo Seabra
[image: Screenshot from 2020-04-08 16-28-37.png]

Hi,

Does anyone know how to write multiline legends when using MolsToGridImage?
I've been trying the code [here](https://sourceforge.net/p/rdkit/mailman/message/35561198/),
but nothing there seems to work for me, as I only get a blank rectangle in
place of the \n or \r symbols... (see picture)

Are there any ideas?
Thanks,
--
Gustavo Seabra.


Re: [Rdkit-discuss] RDKit Chem.MolFromPDBFile ignores some files...

2020-04-05 Thread Gustavo Seabra
Thanks. Yes, I also understood that it should infer the connectivity from the 
distances.

I'm using PDB because it is the output format of another program.

I'll see what I can change, then.

Thanks,
Gustavo.

--
Gustavo Seabra


From: Alan Kerstjens Medina 
Sent: Sunday, April 5, 2020 9:15:26 AM
To: Gustavo Seabra ; 
rdkit-discuss@lists.sourceforge.net 
Subject: RE: [Rdkit-discuss] RDKit Chem.MolFromPDBFile ignores some files...


Hi Gustavo,



I haven’t looked into the RDKit source code for this but I assume this has to 
do with the lack of CONECT records in the PDB file you attached (i.e. you are 
only storing atom coordinates, not connectivity).



From what I could gather from the RDKit documentation, the default behaviour 
for the MolFromPDBFile function is to “sense” bonds based on atom proximity 
(proximityBonding=True), but I guess that isn’t happening. Maybe someone else 
could chime in and clarify how to make that feature work as intended.
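
To see what the `proximityBonding` flag does in isolation, the following sketch may help. It does not use the attached a3.pdb; instead it builds ethanol with RDKit, writes it as PDB text, strips the CONECT records so that only coordinates remain, and reads the text back with the flag on and off. The molecule and the `randomSeed` value are arbitrary choices for the example.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Build a small 3D molecule and write it out as PDB text.
mol = Chem.AddHs(Chem.MolFromSmiles("CCO"))
AllChem.EmbedMolecule(mol, randomSeed=42)
pdb_block = Chem.MolToPDBBlock(mol)

# Drop the CONECT records so the block holds coordinates only.
coords_only = "\n".join(
    line for line in pdb_block.splitlines() if not line.startswith("CONECT")
)

# Default behaviour: bonds are perceived from interatomic distances.
with_bonds = Chem.MolFromPDBBlock(
    coords_only, removeHs=False, proximityBonding=True
)

# With proximity bonding off there is nothing to build bonds from.
no_bonds = Chem.MolFromPDBBlock(
    coords_only, sanitize=False, removeHs=False, proximityBonding=False
)

print("bonds with proximityBonding:", with_bonds.GetNumBonds())
print("bonds without:", no_bonds.GetNumBonds())
```

Proximity bonding only yields single bonds, so bond orders and aromaticity still have to come from somewhere else, for instance a template structure.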



Is there any particular reason you want to use PDB files for small molecules? 
They tend to be a bit of a headache and not particularly efficient 
storage-wise. If atom coordinates are important maybe it would be easier to use 
SDF or MOL2 files instead.



Best regards,

Alan



From: Gustavo Seabra<mailto:gustavo.sea...@gmail.com>
Sent: 04 April 2020 22:08
To: 
rdkit-discuss@lists.sourceforge.net<mailto:rdkit-discuss@lists.sourceforge.net>
Subject: [Rdkit-discuss] RDKit Chem.MolFromPDBFile ignores some files...



Hi all,

I'm having another problem when reading a PDB file. Some files just return
"None", with no error message at all. For example, the attached file:

>>> Chem.MolFromPDBFile("./a3.pdb")

Does not return a Mol object. Does anyone know what is wrong with this file?
I can open it regularly in other programs. Is there any way to "force" rdkit
to recognize the file?

Thanks,
--
Gustavo Seabra




[Rdkit-discuss] RDKit Chem.MolFromPDBFile ignores some files...

2020-04-04 Thread Gustavo Seabra
Hi all,

I'm having another problem when reading a PDB file. Some files just return
"None", with no error message at all. For example, the attached file:

>>> Chem.MolFromPDBFile("./a3.pdb")

Does not return a Mol object. Does anyone know what is wrong with this file?
I can open it regularly in other programs. Is there any way to "force" rdkit
to recognize the file?
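
One way to get more information when a reader returns None is to separate parsing from sanitization, as in the sketch below. The helper name `read_pdb_with_diagnostics` is invented for this example, and the demo input is PDB text generated by RDKit itself, not the attached a3.pdb, so it says nothing about what is actually wrong with that file. For a file on disk, `Chem.MolFromPDBFile(path, sanitize=False, removeHs=False)` takes the same flags.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def read_pdb_with_diagnostics(pdb_block):
    """Parse PDB text without sanitizing, then sanitize explicitly.

    Returns (mol, error_message); error_message is None on success.
    """
    mol = Chem.MolFromPDBBlock(pdb_block, sanitize=False, removeHs=False)
    if mol is None:
        return None, "could not parse the PDB records"
    try:
        Chem.SanitizeMol(mol)
    except Exception as exc:  # RDKit raises sanitization errors as exceptions
        return mol, f"sanitization failed: {exc}"
    return mol, None


# Demo input: ethanol with explicit hydrogens, written by RDKit as PDB text.
demo = Chem.AddHs(Chem.MolFromSmiles("CCO"))
AllChem.EmbedMolecule(demo, randomSeed=7)
demo_block = Chem.MolToPDBBlock(demo)

mol, error = read_pdb_with_diagnostics(demo_block)
print("atoms:", mol.GetNumAtoms(), "error:", error)
```

If the error message points at a valence or kekulization problem, the unsanitized molecule is still returned and can be inspected atom by atom.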

Thanks,
--
Gustavo Seabra



a3.pdb
Description: Binary data


[Rdkit-discuss] Help mapping atoms between two files

2020-04-04 Thread Gustavo Seabra
Hi all,

I'm trying to get the substructure matches between two different PDB
files containing the same molecule, but with different atom order and
naming. However, GetSubstructMatches just returns nothing, i.e. no matches
(files attached).

For example:
>>> ref_mol = Chem.MolFromPDBFile(str("a1.pdb"))
>>> tgt_mol = Chem.MolFromPDBFile(str("a2.pdb"))

>>> ref_mol.GetNumAtoms(),tgt_mol.GetNumAtoms()
(27, 27)

>>> ref_mol.GetSubstructMatches(tgt_mol)
()

>>> ref_mol.HasSubstructMatch(tgt_mol)
False

Could anyone here suggest a different way to get the atom mapping between
the two molecules?
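
One possible approach is sketched below, on the assumption (not confirmed in this thread) that the match fails because the bond orders perceived from the two PDB files differ. It computes a maximum common substructure that compares elements but ignores bond orders, then uses the resulting SMARTS to pair up atom indices. The helper name `map_atoms` is hypothetical, and two SMILES strings stand in for a1.pdb and a2.pdb.

```python
from rdkit import Chem
from rdkit.Chem import rdFMCS


def map_atoms(ref_mol, tgt_mol, timeout=60):
    """Return {ref_atom_idx: tgt_atom_idx} from an MCS that ignores bond orders."""
    result = rdFMCS.FindMCS(
        [ref_mol, tgt_mol],
        atomCompare=rdFMCS.AtomCompare.CompareElements,
        bondCompare=rdFMCS.BondCompare.CompareAny,
        timeout=timeout,
    )
    pattern = Chem.MolFromSmarts(result.smartsString)
    # Both matches are listed in pattern-atom order, so zipping pairs them up.
    ref_match = ref_mol.GetSubstructMatch(pattern)
    tgt_match = tgt_mol.GetSubstructMatch(pattern)
    return dict(zip(ref_match, tgt_match))


# Stand-ins for the two PDB files: the same molecule with reversed atom order.
ref_mol = Chem.MolFromSmiles("CCO")
tgt_mol = Chem.MolFromSmiles("OCC")
mapping = map_atoms(ref_mol, tgt_mol)
print(mapping)
```

For symmetric molecules this returns just one of several equivalent mappings; if the mapping covers fewer atoms than the molecules have, the two structures do not share the same heavy-atom graph.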

Thanks a lot,
--
Gustavo Seabra



a1.pdb
Description: Binary data


a2.pdb
Description: Binary data


[Rdkit-discuss] PandasTools LoadSDF: Different treatment of SMILES depending on presence of 'MOL' column?

2019-09-23 Thread Gustavo Seabra
Hi all,

I'm trying to load a DrugBank library into a Pandas DataFrame in two
different ways: with and without creating a 'Mol' column during load. In
principle I'm only interested in the SMILES, so creating the 'Mol' column
should not be necessary.

However, I noticed that the two procedures actually generate a different
number of molecules, and the SMILES are not necessarily the same:

1. Creating the 'Mol' column: 2,410 molecules
2. Not creating the 'Mol' column: 2,413 molecules

I assumed the difference was due to a few molecules for which RDKit could
not generate the 'Mol' entry and which were then silently dropped. So, I
tried to find the difference between the sets with:

>>> drugbank.merge(drugbank_nomol, how='outer', on='SMILES', indicator=True).loc[lambda x: x['_merge'] == 'right_only']

Assuming the SMILES are the same, this *should* return 3 records, but it
returns 1865 (!), meaning the SMILES are mostly different between the sets.

Could someone help me figure out what is going on here?

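A toy version of the comparison is sketched below. It rests on the assumption, not verified against the DrugBank file, that the two frames hold the same molecules written as different SMILES strings (for example one set copied from an SD field and the other generated by RDKit). In that case an outer merge on the raw strings reports almost everything as different, while canonicalizing both columns first leaves only the real differences. The frame contents and the helper names are made up.

```python
import pandas as pd
from rdkit import Chem


def canonicalize(smiles):
    """Return RDKit canonical SMILES, or None if the string cannot be parsed."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol) if mol is not None else None


def right_only(left, right, on="SMILES"):
    """Rows present in `right` but not in `left`, via an outer merge."""
    merged = left.merge(right, how="outer", on=on, indicator=True)
    return merged.loc[merged["_merge"] == "right_only"]


# Same molecules, written differently; acetic acid is only in the second frame.
with_mol = pd.DataFrame({"SMILES": ["CCO", "c1ccccc1"]})
without_mol = pd.DataFrame({"SMILES": ["OCC", "C1=CC=CC=C1", "CC(=O)O"]})

raw_diff = right_only(with_mol, without_mol)

canon_with = with_mol.assign(SMILES=with_mol["SMILES"].map(canonicalize))
canon_without = without_mol.assign(SMILES=without_mol["SMILES"].map(canonicalize))
canon_diff = right_only(canon_with, canon_without)

print("right_only on raw strings:", len(raw_diff))
print("right_only after canonicalization:", len(canon_diff))
```

Rows whose canonical SMILES is None mark entries RDKit could not parse, which is also a direct way to find molecules that are dropped when the 'Mol' column is built.
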
To avoid attaching files here, I put a test database and a Jupyter Notebook
with the example here:

https://www.dropbox.com/s/v8kf7vzpmrjkidl/RDKit_test.zip?dl=0

Thanks a lot!
--
Gustavo Seabra
