[Rdkit-discuss] FindMolChiralCenters finds centers in achiral bicyclo systems

2021-09-27 Thread Murphy, Sean via Rdkit-discuss
I noticed Chem.FindMolChiralCenters is finding one or two chiral centers in the 
bicyclo systems below which I believe do not have any chiral centers.

import rdkit
from rdkit import Chem
print(rdkit.__version__)
m1 = Chem.MolFromSmiles("NC12CC(C2)C1")   # bicyclo[1.1.1]pentan-1-amine
m2 = Chem.MolFromSmiles("N1(CC2)CCC2CC1") # quinuclidine
m3 = Chem.MolFromSmiles("C1(C2)CCC2CC1")  # bicyclo[2.2.1]heptane
print(Chem.FindMolChiralCenters(m1,includeUnassigned=True,useLegacyImplementation=False))
print(Chem.FindMolChiralCenters(m2,includeUnassigned=True,useLegacyImplementation=False))
print(Chem.FindMolChiralCenters(m3,includeUnassigned=True,useLegacyImplementation=False))


2021.03.4
[(1, '?'), (3, '?')]
[(0, '?'), (5, '?')]
[(0, '?')]
[(0, '?')]

Sean Murphy
Takeda San Diego
Medicinal Chemistry

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Parsing a PDB file with atoms that are too close, causing bad bond

2021-09-27 Thread Francois Berenger

On 27/09/2021 19:22, Lewis Martin wrote:

Very interesting - thank you Francois! PDB re-do does the trick:

import requests
from rdkit import Chem

def getPDB(code):
    out = requests.get(f'https://pdb-redo.eu/db/{code}/{code}_final.pdb')
    return out.content

pdb_string = getPDB('3udn')
Chem.MolFromPDBBlock(pdb_string)

I think this solves it for me, but if anyone knows how to infer
correct bonding information without relying on distances, I'd love to
hear it too! So far I've noticed that Parmed and PDBFixer infer
correct bonds, but they don't determine bond orders, so it's difficult
to port the molecule into RDKit.


I just remember one paper; it might give you an entry point into the
scientific literature:

Determination of molecular topology and atomic hybridization states from 
heavy atom coordinates

Elaine C. Meng, Richard A. Lewis
https://doi.org/10.1002/jcc.540120716

Regards,
F.




[Rdkit-discuss] MFP question about similar substructures and feature reduction

2021-09-27 Thread Natasha Gupta
Hello,

Apologies, this is a very basic question:
If I am converting many ligands into morgan fingerprints, could I
theoretically stack the bit representations on top of each other to get the
same features represented across ligands? For example is the below
representation correct?

| sample | feature1 | feature2 | feature3 |
|:-------|:--------:|:--------:|---------:|
| 1      | bit 1    | bit 2    | bit 3    |
| 2      | bit 1    | bit 2    | bit 3    |
| 3      | bit 1    | bit 2    | bit 3    |

So basically, are features 1, 2, 3, etc. always the same type of substructure
no matter what the input molecule is? What happens if the 2048 bits or
substructures predesignated in rdkit do not contain a new substructure in a
molecule we are evaluating?

Any advice on how to reduce features and then use that reduced feature list
for new molecules after training a model would also be appreciated. How
would the model only extract the reduced bits for a new ligand if I remove
low variance bits from the training set for example?
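
One way to picture the feature-reduction part of the question is sketched below. It assumes each ligand has already been turned into a fixed-length list of 0/1 values (for example by an RDKit Morgan fingerprint call), so that column j means the same bit position for every ligand. Since folded Morgan bits come from hashed atom environments, a single bit can stand for more than one substructure, and a previously unseen substructure still lands on one of the existing positions. The helper names fit_variance_mask and apply_mask and the toy 8-bit rows are hypothetical, made up for illustration only.

```python
from statistics import pvariance


def fit_variance_mask(train_rows, threshold=0.0):
    """Return the column indices whose variance over the training rows exceeds threshold."""
    n_bits = len(train_rows[0])
    keep = []
    for j in range(n_bits):
        column = [row[j] for row in train_rows]
        if pvariance(column) > threshold:
            keep.append(j)
    return keep


def apply_mask(row, keep):
    """Reduce one fingerprint row to the columns selected at training time."""
    return [row[j] for j in keep]


# Toy 8-bit "fingerprints"; real ones would typically have e.g. 2048 bits.
train = [
    [1, 0, 1, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0],
]

# Learn the mask once, on the training set only.
keep = fit_variance_mask(train)

# A new ligand is fingerprinted with the same settings, then cut down
# with the stored indices, so its columns line up with the training data.
new_ligand = [0, 1, 1, 0, 0, 1, 1, 0]
reduced = apply_mask(new_ligand, keep)

print(keep)
print(reduced)
```

The point of the design is that the list of kept indices is stored alongside the model, so a new ligand is never used to decide which bits to keep.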


Re: [Rdkit-discuss] validating stereochemistry

2021-09-27 Thread Tim Dudgeon
Thanks Brian. Didn't know that existed! Very useful.
As you mention, it is a lot slower to do the check.
Tim



Re: [Rdkit-discuss] validating stereochemistry

2021-09-27 Thread Giovanni Tricarico
Hello,
very interesting.

I think it is worth raising a related point regarding the representation of 
stereochemistry for such types of molecules, because it’s important not only to 
validate them, but to interpret them correctly, too.
See this document: http://publications.iupac.org/pac/2006/pdf/7810x1897.pdf
Page 1926, at the bottom, shows this:

[cid:image006.png@01D7B3AF.AB916440]

The question is: when you submit to rdkit a molecule with the wedge pointing to 
the exocyclic H, is the stereochemistry correctly interpreted by rdkit 
(regardless of the conformational possibility of it existing or not)?

Using Biovia Draw, with the various possibilities (I changed the molecule in 
the OP a bit to make the CIP assignment more obvious), you see that indeed it’s 
not as easy as it seems:
[cid:image007.jpg@01D7B3B1.6B11C930]
If the 6-membered ring bonds are all on the plane, what does an up wedge to the 
top H atom mean?
Presumably, that the bond ‘in the middle’ of the ring, the one going from C to 
NH, is below the plane.
In that case, the stereochemistry for the top C is R.
And as you can see, the software ‘gets it wrong’ when using the exocyclic wedge.

Perhaps it would be interesting to know what rdkit makes of a molecule like the 
first one on the left.
Does it ‘read’ the stereochemistry correctly?

brg
Giovanni



Re: [Rdkit-discuss] Parsing a PDB file with atoms that are too close, causing bad bond

2021-09-27 Thread Maciek Wójcikowski
Hi Lewis,

You can try PreparePDBMol in oddt:
https://github.com/oddt/oddt/blob/master/oddt/toolkits/extras/rdkit/fixer.py#L623-L669
We used it in PLEC model training, since PDBFixer didn't work for us
either. Note that as soon as you have correct bonding you can disable
automatic bonding in RDKit using proximityBonding=False.

Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl
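
To illustrate what proximityBonding=False changes, here is a small sketch. It assumes a toy two-atom PDB block with no CONECT records; the hetatm helper, the residue name LIG and the coordinates are made up for this example and are not taken from 3UDN. sanitize=False is used so that the comparison is only about which bonds get created.

```python
from rdkit import Chem


def hetatm(serial, name, x, y, z, element):
    """Format one fixed-column PDB HETATM record for a made-up residue 'LIG'."""
    return (
        f"HETATM{serial:5d}  {name:<3s} LIG A   1    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}  1.00  0.00          {element:>2s}"
    )


# Two atoms 1.2 Angstrom apart, and no CONECT records at all.
pdb_block = "\n".join([
    hetatm(1, "C1", 0.0, 0.0, 0.0, "C"),
    hetatm(2, "O1", 1.2, 0.0, 0.0, "O"),
    "END",
])

# Default behaviour: bonds are guessed from interatomic distances.
with_guess = Chem.MolFromPDBBlock(pdb_block, sanitize=False, proximityBonding=True)

# Distance-based guessing switched off: only bonds given in the file are used.
without_guess = Chem.MolFromPDBBlock(pdb_block, sanitize=False, proximityBonding=False)

print(with_guess.GetNumBonds(), without_guess.GetNumBonds())
```

With distance-based bonding disabled, the bonds have to come from somewhere else, for example CONECT records written by another tool.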




Re: [Rdkit-discuss] validating stereochemistry

2021-09-27 Thread Brian Cole
Good Morning Tim, 

The RDKit EnumerateStereoisomers function accomplishes this through the 
‘tryEmbedding’ flag: 
https://github.com/rdkit/rdkit/blob/d20e5cadc81bf6c7b4e590124866f178f2f2fe28/rdkit/Chem/EnumerateStereoisomers.py#L8

It attempts to generate a 3D conformer for the given stereo configuration and 
fails the configuration if the conformer isn’t reasonable. Not fast, but it is 
reliable. 

-Brian
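
A minimal sketch of how the flag might be used on the molecule from the question, assuming the StereoEnumerationOptions class from the module linked above. The variable names are illustrative, and the number of isomers that survive is not stated here because it depends on the embedding.

```python
from rdkit import Chem
from rdkit.Chem.EnumerateStereoisomers import (
    EnumerateStereoisomers,
    StereoEnumerationOptions,
)

mol = Chem.MolFromSmiles("NC1CC2CCC1C2")

# Plain enumeration: every combination of the unassigned centres.
all_isomers = list(EnumerateStereoisomers(mol))

# With tryEmbedding=True, a candidate is only returned if RDKit manages
# to embed a 3D conformer for that stereo configuration.
opts = StereoEnumerationOptions(tryEmbedding=True)
embeddable = list(EnumerateStereoisomers(mol, options=opts))

print(len(all_isomers), len(embeddable))
for iso in embeddable:
    print(Chem.MolToSmiles(iso))
```

Because every candidate goes through a conformer embedding, the filtered run is noticeably slower than the plain one.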



[Rdkit-discuss] validating stereochemistry

2021-09-27 Thread Tim Dudgeon
I have Python code to enumerate undefined chiral centres in a molecule.
Mostly this works fine, but for some constrained structures this can
generate stereochemistry that makes no sense. For
instance consider NC1CC2CCC1C2:
[image: #1 (2).png]

These two make sense:
[image: #2.png]
[image: #3.png]

But these two don't:
[image: #5.png]
[image: #4.png]

Is there a way to filter out the invalid ones?

Thanks
Tim


Re: [Rdkit-discuss] Parsing a PDB file with atoms that are too close, causing bad bond

2021-09-27 Thread Lewis Martin
Very interesting - thank you Francois! PDB re-do does the trick:

import requests
from rdkit import Chem

def getPDB(code):
    out = requests.get(f'https://pdb-redo.eu/db/{code}/{code}_final.pdb')
    return out.content

pdb_string = getPDB('3udn')
Chem.MolFromPDBBlock(pdb_string)

I think this solves it for me, but if anyone knows how to infer correct
bonding information without relying on distances, I'd love to hear it too!
So far I've noticed that Parmed and PDBFixer infer correct bonds, but they
don't determine bond orders, so it's difficult to port the molecule into
RDKit.

Cheers
Lewis





Re: [Rdkit-discuss] Parsing a PDB file with atoms that are too close, causing bad bond

2021-09-27 Thread Francois Berenger

Hi Lewis,

Just an idea: you might try to load your PDB in UCSF Chimera, then
save it as a mol2 or sdf file.
Then, try to read this sdf file from rdkit.

Another idea: try to get your pdb file through the pdbredo service.
https://pdb-redo.eu/
They might have fixed a few things; maybe this PDB will read better in 
rdkit.


Regards,
F.

On 26/09/2021 17:02, Lewis Martin wrote:

Hi RDKit,
While parsing proteins from the PDB with RDKit, I've come across
situations where the distance-based bond determination leads to
'incorrect' bonds between atoms that are erroneously too close
together. PDB files have no bond information, so it's not really
'incorrect' (rather the model coordinates are off), but the bonds are
nonphysical - and it means the Mol objects won't sanitize.

Here's an example:

import requests
from io import BytesIO
import gzip
from rdkit import Chem

def getPDB(code):
    out = requests.get(f'https://files.rcsb.org/download/{code}.pdb1.gz')
    binary_stream = BytesIO(out.content)
    return gzip.open(binary_stream).read()

pdb_string = getPDB('3udn')
Chem.MolFromPDBBlock(pdb_string)

Error is:

RDKit ERROR: [22:38:21] Explicit valence for atom # 573 O, 3, is
greater than permitted
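
For locating this kind of problem without the parser giving up, one option is to read the structure unsanitized and ask RDKit which atoms it objects to. The sketch below assumes an RDKit version that provides Chem.DetectChemistryProblems. It uses a made-up SMILES with a trivalent neutral oxygen as a stand-in for the failing atom, not the 3UDN structure itself.

```python
from rdkit import Chem

# A neutral oxygen with three bonds, parsed without sanitization
# so that a Mol object is returned instead of None.
mol = Chem.MolFromSmiles("CO(C)C", sanitize=False)

# Collect the sanitization problems instead of raising on the first one.
problems = Chem.DetectChemistryProblems(mol)

for problem in problems:
    print(problem.GetType(), problem.Message())
    if problem.GetType() == "AtomValenceException":
        atom = mol.GetAtomWithIdx(problem.GetAtomIdx())
        neighbours = [n.GetIdx() for n in atom.GetNeighbors()]
        print(atom.GetIdx(), atom.GetSymbol(), neighbours)
```

The same pattern should apply to a molecule read with Chem.MolFromPDBBlock(pdb_string, sanitize=False), where atom.GetPDBResidueInfo() can then be used to map the offending atom back to its residue.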

This is caused by the threonine 72 sidechain being too close to the
TYR71 backbone carbonyl oxygen (this can be visualized at
https://www.rcsb.org/3d-view/3UDN?preset=ligandInteraction=09B ,
TYR71 is near the ligand).

Does anyone know how to avoid this to create a Chem.Mol? I've tried
using Parmed and PDBFixer, since they use residue templates to
generate the correct bonding topology, but they don't write CONECT
records or SDFs, so the bonds are still lost to RDKit.

Thanks for your time!
Lewis
PS - why not just use PDBFixer? I'm trying to calculate atom
invariants using RDKit's morgan fingerprinter implementation, so
ultimately I want a sanitized Mol object
