Re: [Rdkit-discuss] Atom indexing with AssignBondOrdersFromTemplate

2022-07-17 Thread Bennion, Brian via Rdkit-discuss
I don’t know if this is the issue, but is the order of atoms the same in the 
PDB file and the SMILES string?



On July 16, 2022 at 8:14:51 PM PDT, He, Amy  wrote:

Dear RDKit experts,



I am new to RDKit and have a simple question about atom indexing. 
I am using RDKit in Python to process a collection of small molecules; for each 
one I have the SMILES and a PDB file without explicit hydrogens. To draw the 2D 
images of the molecules, I used AssignBondOrdersFromTemplate to assign the bond 
orders.



pdbmol = rdmolfiles.MolFromPDBFile()
smilemol = Chem.MolFromSmiles()

newmol = AllChem.AssignBondOrdersFromTemplate(smilemol, pdbmol)
rdDepictor.Compute2DCoords(newmol)
for atom in newmol.GetAtoms():
    atom.SetAtomMapNum(atom.GetIdx())



Using the above lines, I was able to draw the ligand in 2D with the atom 
indices (Idx). However, I noticed that the atom indices have changed in 
newmol… Is there a way to keep the original indexing from the PDB file in 
newmol (perhaps by an alternative way to set bond orders)? I want to label a 
couple of atoms in the 2D picture, but I only know their Idx in the PDB file, 
so I would really like to keep the original indexing.



Any comments/suggestions would be greatly appreciated. Thank you for your time 
and kind advice in advance!!





Many Thanks,

Amy

--

Amy He

Chemistry Graduate Teaching Assistant

Hadad Research Group

Ohio State University

he.1...@osu.edu





___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] deprotection of dimthyl acetal?

2022-03-03 Thread Bennion, Brian via Rdkit-discuss
Perfect!

I was struggling with the tuple in the reaction syntax.  Adding the comma did 
the trick for the reaction; previously I was using rxn.RunReactants((m)) and 
receiving errors.

I very much appreciate the steps for the deprotection.  The reaction is much 
more straightforward for my use case.
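
The comma issue mentioned above is plain Python behaviour rather than anything RDKit-specific: parentheses alone only group an expression, and it is the trailing comma that builds a one-element tuple. A minimal sketch follows; `run_reactants` is a hypothetical stand-in (not the RDKit method) and `m` is just a placeholder string instead of a real molecule.

```python
def run_reactants(reactants):
    """Hypothetical stand-in for a function that expects a sequence of reactants."""
    if not isinstance(reactants, (tuple, list)):
        raise TypeError("expected a tuple or list of reactants")
    return len(reactants)


m = "placeholder molecule"  # stands in for an RDKit Mol object

not_a_tuple = (m)   # parentheses only group; this is still m itself
one_tuple = (m,)    # the trailing comma creates a 1-element tuple

print(type(not_a_tuple).__name__)  # str
print(type(one_tuple).__name__)    # tuple
print(run_reactants(one_tuple))    # 1
```

Passing `not_a_tuple` to the stand-in raises a TypeError, which mirrors the kind of failure described in the message above.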

Brian



From: Kangway Chuang 
Sent: Wednesday, March 2, 2022 7:31 PM
To: Bennion, Brian 
Cc: rdkit-discuss@lists.sourceforge.net 
Subject: Re: [Rdkit-discuss] deprotection of dimthyl acetal?

Hi Brian,

The Deprotect function will apply any number of deprotections that are stored 
as the DeprotectData (see rdkit.Chem.rdDeprotect.GetDeprotections() to get the 
list). In your case, you can set up a custom DeprotectData deprotection:

from rdkit.Chem.rdDeprotect import DeprotectData

# Set up the deprotection
reaction_class = "hydrolysis"
reaction_smarts = "[CX4;H3][O][C:1][O][CX4;H3]>>[C:1]=[O]"
abbreviation = "(OMe)2"
full_name = "dimethylacetal"
data = DeprotectData(reaction_class, reaction_smarts, abbreviation, full_name)
assert data.isValid()

data is now an instance of DeprotectData, which you can pass into the Deprotect 
function along with your example molecule:

from rdkit import Chem
from rdkit.Chem import rdDeprotect

# get an example ketal
mol = Chem.MolFromSmiles('COC1(CCCCC1)OC')

# make the call to get the transformed molecule, in this case, cyclohexanone;
# pass in your newly created deprotection wrapped in a list
result = rdDeprotect.Deprotect(mol, deprotections=[data])

If you're only trying to apply a single chemical reaction, you could just 
create a ChemicalReaction to achieve the same result:

from rdkit import Chem
from rdkit.Chem import rdChemReactions

# Set up your chemical reaction
reaction_smarts = "[CX4;H3][O][C:1][O][CX4;H3]>>[C:1]=[O]"
rxn = rdChemReactions.ReactionFromSmarts(reaction_smarts)

# Run your chemical reaction on your molecule
mol = Chem.MolFromSmiles('COC1(CCCCC1)OC')
products = rxn.RunReactants((mol,)) # make sure you're passing a tuple
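
Since the original question was about deprotecting a list of compounds, the single-reaction approach above could be wrapped in a loop. This is only a sketch under some assumptions: the helper name `deprotect_all` is made up for illustration, the example SMILES (a cyclohexanone dimethyl ketal and ethanol) are assumed inputs, and products that fail sanitization are simply skipped.

```python
from rdkit import Chem
from rdkit.Chem import rdChemReactions

# Reaction SMARTS taken from the thread: dimethyl acetal/ketal -> carbonyl
ACETAL_TO_CARBONYL = "[CX4;H3][O][C:1][O][CX4;H3]>>[C:1]=[O]"


def deprotect_all(smiles_list, reaction_smarts=ACETAL_TO_CARBONYL):
    """Hypothetical helper: map each input SMILES to its unique product SMILES."""
    rxn = rdChemReactions.ReactionFromSmarts(reaction_smarts)
    results = {}
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:  # unparsable input: record no products
            results[smi] = []
            continue
        unique = set()
        # RunReactants returns one product tuple per substructure match
        for product_set in rxn.RunReactants((mol,)):
            for product in product_set:
                try:
                    Chem.SanitizeMol(product)
                except Exception:
                    continue  # skip chemically invalid products
                unique.add(Chem.MolToSmiles(product))
        results[smi] = sorted(unique)
    return results


results = deprotect_all(["COC1(CCCCC1)OC", "CCO"])
for smi, products in results.items():
    print(smi, "->", products)
```

Collecting canonical SMILES in a set removes the duplicate products that arise because the symmetric pattern matches the same ketal in two orientations; molecules without the pattern map to an empty list.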

Hope this points you in the right direction.

Best,

Kangway Chuang, Ph. D. (he/him/his)
Senior AI Scientist and Group Lead
AI | ML Department in Research Biology
Genentech Research and Early Development
+1 (805) 754-0058
chuang.kang...@gene.com


On Wed, Mar 2, 2022 at 3:58 PM Bennion, Brian via Rdkit-discuss 
<rdkit-discuss@lists.sourceforge.net> wrote:
Hello All,

I have been poking about the docs and some emails trying to find a way to 
deprotect a group of compounds.
The docs describe the DeprotectData module but I am not making the connection 
to how that will operate on my molecules.

my tentative smarts is here
[CX4;H3][O][C:1][O][CX4;H3]>>[C:1]=[O]

I was able to access the function and assert the "dataIsValid" for the example 
case.  So I am just missing that next and maybe last step where I can deprotect 
a list of compounds with this module.

Thank you
brian



[Rdkit-discuss] deprotection of dimthyl acetal?

2022-03-02 Thread Bennion, Brian via Rdkit-discuss
Hello All,

I have been poking about the docs and some emails trying to find a way to 
deprotect a group of compounds.
The docs describe the DeprotectData module but I am not making the connection 
to how that will operate on my molecules.

my tentative smarts is here
[CX4;H3][O][C:1][O][CX4;H3]>>[C:1]=[O]

I was able to access the function and assert the "dataIsValid" for the example 
case.  So I am just missing that next and maybe last step where I can deprotect 
a list of compounds with this module.

Thank you
brian



[Rdkit-discuss] proper technical term for generating virtual compounds with rdkit and smarts

2020-09-24 Thread Bennion, Brian via Rdkit-discuss
hello

I have a paper in review that is intended for a broad audience of synthetic 
chemists, biologists, and computational chemists.
One reviewer had issues with the term "in-silico syntheses".
I used RDKit and SMARTS reactions to generate large libraries of compounds for 
our research project.  Is there a better term to use?  I feel "chemical 
enumeration" is just as foreign.

The abstract is below.


The current standard treatment for organophosphate poisoning primarily relies 
on the use of small molecule-based oximes that can efficiently restore 
acetylcholinesterase (AChE) activity.  Despite their efficacy in reactivating 
AChE, the action of drugs like 2-pralidoxime (2-PAM) is primarily limited to 
the peripheral nervous system (PNS) and, thus, provides no protection to the 
central nervous system (CNS).  This lack of action in the CNS stems from the 
ionic nature of the drugs; they cannot cross the blood-brain barrier (BBB) to 
access any nerve agent-inhibited AChE therein.  In this report, we present a 
small molecule oxime, called LLNL-02, that can diffuse across the BBB for 
reactivation of nerve agent-inhibited AChE in the CNS.  Our 
candidate-development approach utilizes a combination of parallel chemical and 
in silico syntheses, computational modeling, and a battery of detailed in 
vitro and in vivo assessments that have identified LLNL-02 as a top CNS-active 
candidate against nerve agent poisoning.  Additional experiments to determine 
acute and chronic toxicity as required for regulatory approval are ongoing.



Re: [Rdkit-discuss] changes in chirality in rdkit?

2020-07-15 Thread Bennion, Brian via Rdkit-discuss
Ok Greg.  thank you for the sneak preview.

brian


From: Greg Landrum 
Sent: Tuesday, July 14, 2020 11:36 PM
To: Bennion, Brian 
Cc: Rafal Roszak ; RDKit Discuss 

Subject: Re: [Rdkit-discuss] changes in chirality in rdkit?

Hi Brian,

I think you're misinterpreting the drawings. Those two images look like they 
correspond to the same molecule.

The easiest way to check things like this without having to interpret drawings 
is to use Chem.FindMolChiralCenters, which will show you absolute stereo labels 
for all stereoatoms:
In [2]: m1 = Chem.MolFromSmiles('OC[C@@H]1O[C@H](Cn2c(N3CCCC3)nnc2[C@@H]2OCCC2)CC1')
In [3]: m2 = Chem.MolFromSmiles('OC[C@H]1CC[C@@H](Cn2c([C@H]3CCCO3)nnc2N2CCCC2)O1')
In [4]: Chem.FindMolChiralCenters(m1)
Out[4]: [(2, 'R'), (4, 'S'), (16, 'R')]
In [5]: Chem.FindMolChiralCenters(m2)
Out[5]: [(2, 'R'), (5, 'S'), (9, 'R')]
Here's the substructure mapping between those molecules:
In [16]: match = m1.GetSubstructMatch(m2)
In [17]: match[2]
Out[17]: 2
In [18]: match[5]
Out[18]: 4
In [19]: match[9]
Out[19]: 16

The "R" and "S" labels that function produces are not necessarily correct 
according to IUPAC rules (though in this case they are), but they are 
consistently calculated.

As a preview: the 2020.09 RDKit release will include a new CIP calculator using 
the algorithm described in this paper 
https://pubs.acs.org/doi/10.1021/acs.jcim.8b00324. The new code, which I think 
will be quite helpful for people who need CIP labels, was implemented by 
Ricardo Rodriguez Schmidt at Schrodinger and derived from John Mayfield's java 
implementation (https://github.com/SiMolecule/centres). Here's what it says 
about your molecules:
In [6]: from rdkit.Chem import rdCIPLabeler
In [7]: rdCIPLabeler.AssignCIPLabels(m1)
In [8]: rdCIPLabeler.AssignCIPLabels(m2)
In [9]: [(i,x.GetProp("_CIPCode")) for i,x in enumerate(m1.GetAtoms()) if x.HasProp('_CIPCode')]
Out[9]: [(2, 'R'), (4, 'S'), (16, 'R')]
In [10]: [(i,x.GetProp("_CIPCode")) for i,x in enumerate(m2.GetAtoms()) if x.HasProp('_CIPCode')]
Out[10]: [(2, 'R'), (5, 'S'), (9, 'R')]

Best,
-greg

On Tue, Jul 14, 2020 at 6:36 PM Bennion, Brian via Rdkit-discuss 
<rdkit-discuss@lists.sourceforge.net> wrote:

Hello Rafal,

Nice to see you on this forum.  I completely expect reordering of smiles 
strings from program to program.  This is why I like to convert them to images.

The tetrahydrofuran with the methoxy group has inverted stereochemistry of its 
substituents.

The original string is shown in the first image.

[cid:173510855394cff311]



The second string after RDKit processing is shown in the second image here.

[cid:173510855395b16b22]

-Original Message-
From: Rafal Roszak <rmrmg.c...@gmail.com>
Sent: Tuesday, July 14, 2020 1:25 AM
To: Bennion, Brian <benni...@llnl.gov>
Cc: Bennion, Brian via Rdkit-discuss <rdkit-discuss@lists.sourceforge.net>
Subject: Re: [Rdkit-discuss] changes in chirality in rdkit?



Hello Brian,

> The original smiles string
> "OC[C@@H]1O[C@H](Cn2c(N3CCCC3)nnc2[C@@H]2OCCC2)CC1"
>
> after conversion with rdkit
> "OC[C@H]1CC[C@@H](Cn2c([C@H]3CCCO3)nnc2N2CCCC2)O1"

After visualisation it seems to me that both SMILES represent the same 
structure (the stereochemistry is the same; only the molecule orientation is 
different). Canonical SMILES from RDKit are not always the same as canonical 
SMILES from other programs. If you want to preserve the atom order you can try 
the option canonical=False. See the example below:

>>> mol1=Chem.MolFromSmiles('OC[C@@H]1O[C@H](Cn2c(N3CCCC3)nnc2[C@@H]2OCCC2)CC1')
>>> mol2=Chem.MolFromSmiles('OC[C@H]1CC[C@@H](Cn2c([C@H]3CCCO3)nnc2N2CCCC2)O1')
>>> Chem.MolToSmiles(mol1)
'OC[C@H]1CC[C@@H](Cn2c([C@H]3CCCO3)nnc2N2CCCC2)O1'
>>> Chem.MolToSmiles(mol2)
'OC[C@H]1CC[C@@H](Cn2c([C@H]3CCCO3)nnc2N2CCCC2)O1'
# canonical SMILES for both inputs are the same (above), but without
# canonicalisation you will get different SMILES:
>>> Chem.MolToSmiles(mol1, canonical=False)
'OC[C@@H]1O[C@H](Cn2c(N3CCCC3)nnc2[C@@H]2OCCC2)CC1'
>>> Chem.MolToSmiles(mol2, canonical=False)
'OC[C@H]1CC[C@@H](Cn2c([C@H]3CCCO3)nnc2N2CCCC2)O1'





Best,



Rafal





Re: [Rdkit-discuss] changes in chirality in rdkit?

2020-07-14 Thread Bennion, Brian via Rdkit-discuss
Hello Rafal,

Nice to see you on this forum.  I completely expect reordering of smiles 
strings from program to program.  This is why I like to convert them to images.

The tetrahydrofuran with the methoxy group has inverted stereochemistry of its 
substituents.

The original string is shown in the first image.

[cid:image001.png@01D659C1.FD411770]



The second string after RDKit processing is shown in the second image here.

[cid:image002.png@01D659C1.FD411770]

-Original Message-
From: Rafal Roszak 
Sent: Tuesday, July 14, 2020 1:25 AM
To: Bennion, Brian 
Cc: Bennion, Brian via Rdkit-discuss 
Subject: Re: [Rdkit-discuss] changes in chirality in rdkit?



Hello Brian,

> The original smiles string
> "OC[C@@H]1O[C@H](Cn2c(N3CCCC3)nnc2[C@@H]2OCCC2)CC1"
>
> after conversion with rdkit
> "OC[C@H]1CC[C@@H](Cn2c([C@H]3CCCO3)nnc2N2CCCC2)O1"

After visualisation it seems to me that both SMILES represent the same 
structure (the stereochemistry is the same; only the molecule orientation is 
different). Canonical SMILES from RDKit are not always the same as canonical 
SMILES from other programs. If you want to preserve the atom order you can try 
the option canonical=False. See the example below:

>>> mol1=Chem.MolFromSmiles('OC[C@@H]1O[C@H](Cn2c(N3CCCC3)nnc2[C@@H]2OCCC2)CC1')
>>> mol2=Chem.MolFromSmiles('OC[C@H]1CC[C@@H](Cn2c([C@H]3CCCO3)nnc2N2CCCC2)O1')
>>> Chem.MolToSmiles(mol1)
'OC[C@H]1CC[C@@H](Cn2c([C@H]3CCCO3)nnc2N2CCCC2)O1'
>>> Chem.MolToSmiles(mol2)
'OC[C@H]1CC[C@@H](Cn2c([C@H]3CCCO3)nnc2N2CCCC2)O1'
# canonical SMILES for both inputs are the same (above), but without
# canonicalisation you will get different SMILES:
>>> Chem.MolToSmiles(mol1, canonical=False)
'OC[C@@H]1O[C@H](Cn2c(N3CCCC3)nnc2[C@@H]2OCCC2)CC1'
>>> Chem.MolToSmiles(mol2, canonical=False)
'OC[C@H]1CC[C@@H](Cn2c([C@H]3CCCO3)nnc2N2CCCC2)O1'





Best,



Rafal




[Rdkit-discuss] changes in chirality in rdkit?

2020-07-13 Thread Bennion, Brian via Rdkit-discuss
hello,
I am "translating" smiles strings output in a csv file  from another program 
into RDKit canonical strings with this code.
If there is something that I am doing incorrectly I would appreciate the input.
thanks
brian bennion


The original smiles string

"OC[C@@H]1O[C@H](Cn2c(N3CCCC3)nnc2[C@@H]2OCCC2)CC1"


after conversion with rdkit

"OC[C@H]1CC[C@@H](Cn2c([C@H]3CCCO3)nnc2N2CCCC2)O1"

my code is below.

protn_pat = re.compile(r'\[([IBnN])\+(@*)(H[1234]*)*\]')

line = inFile.readline()
while len(line) != 0:
    fields = line.replace('","',' ').split()
    mol_name = fields[2]
    molMOE = fields[3].replace('"','')
    mol1check = protn_pat.search(molMOE)
    if mol1check is not None:
        print("Found crazy MOE string",mol1check,molMOE)
        mol1 = protn_pat.sub(r'[\1\3\2+]',molMOE)
    else:
        mol1 = molMOE
    try:
        mol = Chem.MolFromSmiles(mol1)
    except:
        mol = None
    if mol is None:
        print('mol failed:'+molMOE+' '+mol1+' '+str(count)+'\n')
    else:
        rdkitsmichiout.write('\"'+Chem.MolToSmiles(mol, isomericSmiles=True)+'\",')
        rdkitsmichiout.write('\"'+Chem.inchi.MolToInchi(mol,options='/FixedH')+'\",')
        rdkitsmichiout.write('\"'+(Chem.inchi.InchiToInchiKey(Chem.inchi.MolToInchi(mol,options='/FixedH')))+'\"\n')
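
To make the regular-expression step in the code above easier to follow, here is a small standalone sketch of what the substitution does to individual strings. The pattern and replacement are copied from the message; the function name `normalize_charge_order` and the example inputs are assumptions chosen for illustration.

```python
import re

# Pattern from the message above: a bracket atom written as element, '+',
# optional chirality marks, optional hydrogens (e.g. "[N+H3]").
protn_pat = re.compile(r'\[([IBnN])\+(@*)(H[1234]*)*\]')


def normalize_charge_order(smiles):
    """Hypothetical helper: move the '+' to the end of the bracket atom."""
    return protn_pat.sub(r'[\1\3\2+]', smiles)


examples = ["C[N+H3]", "c1cc[n+H]cc1", "C[N+](C)(C)C", "CCO"]
for smi in examples:
    print(smi, "->", normalize_charge_order(smi))
# C[N+H3] -> C[NH3+]
# c1cc[n+H]cc1 -> c1cc[nH+]cc1
# C[N+](C)(C)C -> C[N+](C)(C)C
# CCO -> CCO
```

Strings without the charge-before-hydrogen pattern pass through unchanged, so the helper can be applied to every record rather than only to those flagged by a prior search.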



Re: [Rdkit-discuss] trying to figure out what an rdkit warning means

2020-06-12 Thread Bennion, Brian via Rdkit-discuss
When I was looking for InChI options in the RDKit docs, this is what I found.

rdkit.Chem.inchi.MolToInchi(mol, options='', logLevel=None, 
treatWarningAsError=False)

Returns the standard InChI string for a molecule

Keyword arguments:
  logLevel – the log level used for logging logs and messages from InChI API. 
  Set to None to disable the logging completely.
  treatWarningAsError – set to True to raise an exception in case of a molecule 
  that generates warning in calling InChI API. The resultant InChI string and 
  AuxInfo string as well as the error message are encoded in the exception.

Returns: the standard InChI string returned by InChI API for the input molecule

As for viewing the SMILES strings as 2D structures, I have been using a web 
service, openmolecule.org.  That engine might be translating the SMILES 
string and doing something similar to what the sanitize function in RDKit does, 
if it is not just using RDKit as well.

Brian


From: Greg Landrum 
Sent: Friday, June 12, 2020 7:06 AM
To: Bennion, Brian 
Cc: rdkit-discuss 
Subject: Re: [Rdkit-discuss] trying to figure out what an rdkit warning means



On Thu, Jun 11, 2020 at 4:04 PM Bennion, Brian <benni...@llnl.gov> wrote:
Thank you for the explanation Greg. When the smiles strings are viewed I see 
the E designation for the two trans double bonds. What other double bond is 
missing?


How do you view the SMILES strings? The way you are currently constructing the 
molecule (without sanitization) means that the RDKit doesn't see the 
stereochemistry information that's present in them.

Also, is it possible within RDKit to activate the fixedH layer in the inchi 
creation?

Sure, all of the InChI options can be provided just like you would on the 
command line to the InChI executable:
In [54]: m1 = Chem.MolFromSmiles('CC1=CNC=N1')
In [55]: m2 = Chem.MolFromSmiles('CC1=CN=CN1')
In [58]: Chem.MolToInchi(m1,options='/FixedH')
Out[58]: 'InChI=1/C4H6N2/c1-4-2-5-3-6-4/h2-3H,1H3,(H,5,6)/f/h5H'
In [59]: Chem.MolToInchi(m2,options='/FixedH')
Out[59]: 'InChI=1/C4H6N2/c1-4-2-5-3-6-4/h2-3H,1H3,(H,5,6)/f/h6H'
In [60]: Chem.MolToInchi(m1)==Chem.MolToInchi(m2)
Out[60]: True

Best,
-greg


Re: [Rdkit-discuss] trying to figure out what an rdkit warning means

2020-06-11 Thread Bennion, Brian via Rdkit-discuss
Thank you for the explanation Greg. When the smiles strings are viewed I see 
the E designation for the two trans double bonds. What other double bond is 
missing?

Also, is it possible within RDKit to activate the fixedH layer in the inchi 
creation?

Brian



On June 11, 2020 at 12:13:10 AM PDT, Greg Landrum  
wrote:
Hi Brian,

The warning is actually because you have double bonds with unspecified 
stereochemistry.
You are skipping sanitization of the molecules. When you do this no 
stereochemistry perception is done, so the InChI code is called without any 
stereochemistry information and you get the warning.
If you construct the molecule "normally" (i.e. with sanitization) you get the 
correct InChI and no warning:
In [57]: m = Chem.MolFromSmiles(r'O=C(/C=C/c1ccccc1)c1ccc(OC/C=C(/CC/C=C(\C)/C)\C)cc1')
In [58]: Chem.MolToInchi(m)
Out[58]: 'InChI=1S/C25H28O2/c1-20(2)8-7-9-21(3)18-19-27-24-15-13-23(14-16-24)25(26)17-12-22-10-5-4-6-11-22/h4-6,8,10-18H,7,9,19H2,1-3H3/b17-12+,21-18+'

If you really want to call the InChI code without sanitizing the molecules and 
want the stereochemistry to be correct, you have to do a bit more work:
In [63]: m = Chem.MolFromSmiles(r'O=C(/C=C/c1ccccc1)c1ccc(OC/C=C(/CC/C=C(\C)/C)\C)cc1',sanitize=False)
In [64]: m.UpdatePropertyCache(strict=False)
In [65]: Chem.AssignStereochemistry(m)
In [66]: Chem.MolToInchi(m)
Out[66]: 'InChI=1S/C25H28O2/c1-20(2)8-7-9-21(3)18-19-27-24-15-13-23(14-16-24)25(26)17-12-22-10-5-4-6-11-22/h4-6,8,10-18H,7,9,19H2,1-3H3/b17-12+,21-18+'

Best,
-greg


On Thu, Jun 11, 2020 at 3:46 AM Bennion, Brian via Rdkit-discuss 
<rdkit-discuss@lists.sourceforge.net> wrote:
Hello,
Below I show a SMILES string from MOE, the SMILES string calculated by 
RDKit, and the InChI string calculated by RDKit (2020_1).

The warning on conversion to an InChI string is confusing me: after entering 
both SMILES strings into a viewer, I don't see any undefined stereocenter.

O=C(/C=C/c1ccccc1)c1ccc(OC/C=C(/CC/C=C(\C)/C)\C)cc1
CC(C)=CCC/C(C)=C/COc1ccc(C(=O)/C=C/c2ccccc2)cc1
[18:10:42] WARNING: Omitted undefined stereo
InChI=1S/C25H28O2/c1-20(2)8-7-9-21(3)18-19-27-24-15-13-23(14-16-24)25(26)17-12-22-10-5-4-6-11-22/h4-6,8,10-18H,7,9,19H2,1-3H3


while len(line) != 0:
    fields = line.replace('","',' ').split()
    mol1 = fields[0].replace('"','')
    mol_name = fields[1]

    try:
        mol = Chem.MolFromSmiles(mol1,sanitize=False) #, removeHs=False)
    except:
        mol = None
    if mol is None:
        print("mol1 failed:",mol1)
        # write() takes a single string, so build the message first
        output.write("mol1 failed: "+mol1+"\n")
    else:
        rkditsmiout.write('\"'+Chem.MolToSmiles(mol, isomericSmiles=True)+'\"\n')
        print(Chem.MolToSmiles(mol, isomericSmiles=True))
        rkditsmiout.write('\"'+Chem.inchi.MolToInchi(mol)+'\"\n')
        print(Chem.inchi.MolToInchi(mol))
    count += 1
    print(count)



[Rdkit-discuss] trying to figure out what an rdkit warning means

2020-06-10 Thread Bennion, Brian via Rdkit-discuss
Hello,
Below I show a SMILES string from MOE, the SMILES string calculated by 
RDKit, and the InChI string calculated by RDKit (2020_1).

The warning on conversion to an InChI string is confusing me: after entering 
both SMILES strings into a viewer, I don't see any undefined stereocenter.

O=C(/C=C/c1ccccc1)c1ccc(OC/C=C(/CC/C=C(\C)/C)\C)cc1
CC(C)=CCC/C(C)=C/COc1ccc(C(=O)/C=C/c2ccccc2)cc1
[18:10:42] WARNING: Omitted undefined stereo
InChI=1S/C25H28O2/c1-20(2)8-7-9-21(3)18-19-27-24-15-13-23(14-16-24)25(26)17-12-22-10-5-4-6-11-22/h4-6,8,10-18H,7,9,19H2,1-3H3


while len(line) != 0:
    fields = line.replace('","',' ').split()
    mol1 = fields[0].replace('"','')
    mol_name = fields[1]

    try:
        mol = Chem.MolFromSmiles(mol1,sanitize=False) #, removeHs=False)
    except:
        mol = None
    if mol is None:
        print("mol1 failed:",mol1)
        # write() takes a single string, so build the message first
        output.write("mol1 failed: "+mol1+"\n")
    else:
        rkditsmiout.write('\"'+Chem.MolToSmiles(mol, isomericSmiles=True)+'\"\n')
        print(Chem.MolToSmiles(mol, isomericSmiles=True))
        rkditsmiout.write('\"'+Chem.inchi.MolToInchi(mol)+'\"\n')
        print(Chem.inchi.MolToInchi(mol))
    count += 1
    print(count)



Re: [Rdkit-discuss] Any known papers on reverse engineering fingerprints into structures? --> We have just published a preprint to this!

2020-05-15 Thread Bennion, Brian via Rdkit-discuss
Thank you for the link. I will look at it!



On May 14, 2020 at 11:25:49 PM PDT, Tuan Le  wrote:

Hi Brian,



I was working on a study to deduce molecular structures given ECFP fingerprints 
and came across your open question on the rdkit mailing-list 
(https://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg07851.html).

I really enjoyed reading the discussion in the mailing list which presented 
also references to related work (thanks to Nils !).

We have just published a preprint on a study to reverse-engineer molecular 
structures given ECFP descriptors: 
https://chemrxiv.org/articles/Neuraldecipher_-_Reverse-Engineering_ECFP_Fingerprints_to_Their_Molecular_Structures/12286727.

Our learning approach maps ECFPs to latent molecular descriptors which are then 
decoded back to SMILES representation.



I don’t know how to directly respond to the open thread on sourceforge, so I 
hope sending this email suffices.

(Mail is also sent to rdkit-mailinglist, Nils Weskamp and Andrew Dalke).





Best regards,



Tuan



Tuan Le

Ph.D Student Research Scientist

Machine Learning Research







Bayer AG

Research & Development, Pharmaceuticals

Machine Learning Research

Müllerstr. 178, Building S110/702

13353 Berlin, Germany









Re: [Rdkit-discuss] Open-source business models and the RDKit

2019-03-27 Thread Bennion, Brian via Rdkit-discuss
One of the goals of ATOM is to fund work that will be open sourced.  I think 
any of the partners can choose to hire consultants for the work.


https://atomscience.org/

Transforming drug discovery. The Accelerating Therapeutics for Opportunities in 
Medicine (ATOM) consortium is a public-private partnership with the mission of 
transforming drug discovery by accelerating the development of more effective 
therapies for patients.



Brian



From: Andrew Dalke 
Sent: Wednesday, March 27, 2019 4:07:07 AM
To: RDKit Discuss
Subject: Re: [Rdkit-discuss] Open-source business models and the RDKit

On Mar 27, 2019, at 08:24, Francois Berenger  wrote:
> As an open-source project, I feel rdkit is quite successful.
> So, the user community is not so small.
> Some people who cannot contribute time could contribute money to the project
> (especially if it is tax-deductible, I guess).

I think the questions are "why would they contribute money?" and "why haven't 
they contributed money?".

If those questions cannot be answered well, then there's little reason to go 
further down this path to the next question, which is "how do we effectively 
encourage them to contribute money in the future?".

To be clear, Novartis contributed a lot of money for the RDKit development. 
Roche also funded me to develop and contribute the MCS package now part of the 
RDKit core, and the mmpdb project which was contributed to RDKit. These are 
also financial contributions and must not be ignored, and these are not the 
only two organizations which have done that.

But I honestly thought that there would be more interest in hiring my services 
as a consultant, to work on further development of open source software. I feel 
like there are clear economic benefits for companies to fund open source 
packages.

Instead, it feels like the more open source software packages I write and 
release, the fewer leads I get for new consulting work, compared to when I gave 
"I wrote this in-house application for company X that no one else will ever 
use" talks. Perhaps what's easily available for no cost is seen as having no 
value, while that which is hidden, no matter how hacky, is treasured?

My optimism started 20 years ago, when I was still involved with the Biopython 
project. My company offered commercial support for Biopython, and I had NDAs in 
place with several of the other Biopython developers so we could easily be 
funded to work on specific improvements that an organization might need.

I never found someone interested in providing that sort of funding for 
Biopython, and it still looks like that's the case in cheminformatics.

See also 'Roads and Bridges: The Unseen Labor Behind Our Digital 
Infrastructure' (ref. 53 in my paper) for further examples of the difficulties 
in funding open source work. 
https://www.fordfoundation.org/about/library/reports-and-studies/roads-and-bridges-the-unseen-labor-behind-our-digital-infrastructure/


> On Mar 27, 2019, at 10:06, Greg Landrum  wrote:
> If rdkit was accepted at the software freedom conservancy, I understand
> the management fee would be 10%:

There's also Software in the Public Interest, which "serves the free software 
and open source community by facilitating the administrative and financial 
needs of its associated projects", including the Open Bioinformatics 
(ex-)Foundation.

When the OBF was created, it was common for many groups to start their own 
foundations. Since most of the administrative needs are the same for the 
different projects, it makes sense to consolidate.

> A question since I genuinely don't know: is it important to anyone that this 
> go through a not-for-profit entity?

The OBF became a not-for-profit to make it easier to organize the BOSC 
(Bioinformatics Open Source Conference) meetings. Some of the early BOSC 
meetings were run out of someone's personal bank account, and he was personally 
financially liable in case of problems.

Working through a non-profit makes it easier to set up things like summer 
internships (a la Google Summer of Code) and travel support, because the 
payment is less likely to be viewed as a way to get around employment laws. 
Open Bioinformatics has a Travel Fellowship program. I don't know the details.

Looking at the report for 2018 at 
http://spi-inc.org/corporate/annual-reports/2018.pdf , Open Bioinformatics 
spends about $5,000/year for IT and meet ups, an "ordinary income" of $5,400, 
and an equity of $85K.

There's overhead to running a non-profit, like filing paperwork, and that 
requires specialized knowledge. For revenues that small, it really helps to be 
affiliated with an existing umbrella organization. The OBF gave up their 
incorporation in 2012 to be an SPI-associated project.

For what RDKit does now, I see no need to set up/join a foundation. T5 
Informatics can organize an RDKit UGM the same way that any vendor can organize 
a UGM, and 

[Rdkit-discuss] boron compound not recognized by RDkit

2018-09-25 Thread Bennion, Brian via Rdkit-discuss
Hello,
A while back I noticed that RDKit has issues with boron-containing 
compounds.  One is below, and I admit it is a strange one. I read in an SDF 
file and write it out after calculating a formal charge on the molecule.
It seems to be read into RDKit OK, but writing errored out with "ValueError: 
could not find number of expected rings."
I think it odd that the compound can be read in, but not written out.  Should I 
just ignore this molecule?
Brian




 OpenBabel08161816583D

 12 30  0  0  0  0  0  0  0  0999 V2000
    0.7000   -4.9240   -0.0370 B   0  0  0  0  0  0  0  0  0  0  0  0
    1.5320   -2.2270   -0.0390 B   0  0  0  0  0  0  0  0  0  0  0  0
    0.0470   -3.3430   -0.0100 B   0  0  0  0  0  0  0  0  0  0  0  0
    3.9570   -0.6710   -0.0740 B   0  0  0  0  0  0  0  0  0  0  0  0
   -1.3100    0.9460    0.0290 B   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7380   -1.0010    0.0270 B   0  0  0  0  0  0  0  0  0  0  0  0
    3.8030   -1.9300   -0.2150 B   0  0  0  0  0  0  0  0  0  0  0  0
    2.0560    0.0860   -0.0260 B   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7320   -1.1240   -0.0280 B   0  0  0  0  0  0  0  0  0  0  0  0
   -1.8540   -2.5110   -0.1560 B   0  0  0  0  0  0  0  0  0  0  0  0
    0.8030    1.3740    0.1040 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.2660   -0.0510   -0.0220 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  1  3  1  0  0  0  0
  1  4  1  0  0  0  0
  1  5  1  0  0  0  0
  1  6  1  0  0  0  0
  2  3  1  0  0  0  0
  2  4  1  0  0  0  0
  2  7  1  0  0  0  0
  2  8  1  0  0  0  0
  3  6  1  0  0  0  0
  3  7  1  0  0  0  0
  3 10  1  0  0  0  0
  4  5  1  0  0  0  0
  4  8  1  0  0  0  0
  4  9  1  0  0  0  0
  5  6  1  0  0  0  0
  5  9  1  0  0  0  0
  5 12  1  0  0  0  0
  6 10  1  0  0  0  0
  6 12  1  0  0  0  0
  7  8  1  0  0  0  0
  7 10  1  0  0  0  0
  7 11  1  0  0  0  0
  8  9  1  0  0  0  0
  8 11  1  0  0  0  0
  9 11  1  0  0  0  0
  9 12  1  0  0  0  0
 10 11  1  0  0  0  0
 10 12  1  0  0  0  0
 11 12  1  0  0  0  0
M  END






[Rdkit-discuss] boron atom/element support in RDkit

2018-06-12 Thread Bennion, Brian via Rdkit-discuss
Hello

Does RDkit support boron in SMILES strings?  We have a number of compounds for 
which rdkit parsing is not successful.  The commonality is that there is a B or 
b listed in the string.


Thank you for any help.

Brian Bennion

LLNL
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] problems with EmbedMol

2018-02-28 Thread Bennion, Brian
Hello Greg and Jan,


This is a real newbie question, but what is the use case for this function?  Is 
it used to generate all possible connections (limited by some distance) between 
3 or more atoms given in a smiles string?


Brian



From: Greg Landrum 
Sent: Wednesday, February 28, 2018 8:53:59 AM
To: Jan Halborg Jensen
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] problems with EmbedMol

Hi Jan,

It took me much longer than it should have to figure this one out...

The bounds matrix that is returned by GetMoleculeBoundsMatrix() needs to have 
triangle bounds smoothing applied to it before it can be embedded. The bounds 
smoothing process narrows the possible distance ranges between the atoms. 
Here's a quick demo of that.

We start with your example:

In [19]: mol = Chem.MolFromSmiles("CCC")
...: mol = Chem.AddHs(mol)
...: bounds = AllChem.GetMoleculeBoundsMatrix(mol)
...: EmbedLib.EmbedMol(mol,bounds)
...:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
 in ()
      2 mol = Chem.AddHs(mol)
      3 bounds = AllChem.GetMoleculeBoundsMatrix(mol)
----> 4 EmbedLib.EmbedMol(mol,bounds)

c:\Users\glandrum\RDKit_git\rdkit\Chem\Pharm3D\EmbedLib.py in EmbedMol(mol, bm, 
atomMatch, weight, randomSeed, excludedVolumes)
183   for i in range(nAts):
184 weights.append((i, idx, weight))
--> 185   coords = DG.EmbedBoundsMatrix(bm, weights=weights, numZeroFail=1, 
randomSeed=randomSeed)
186   # for row in coords:
187   #  print(', '.join(['%.2f'%x for x in row]))

ValueError: could not embed matrix


But if we do the triangle bounds smoothing things embed without problems:

In [20]: from rdkit import DistanceGeometry

In [21]: DistanceGeometry.DoTriangleSmoothing(bounds)
Out[21]: True

In [22]: EmbedLib.EmbedMol(mol,bounds)

In [23]:

There is a good argument to be made for GetMoleculeBoundsMatrix() returning the 
smoothed bounds matrix by default. I'll put that on the list for the next 
release.
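
The smoothing step described above can be collected into one script. This is a minimal sketch that assumes the same imports used in the session; the smoothing call modifies the matrix in place and returns a flag. The comment about the matrix layout (upper bounds above the diagonal, lower bounds below) is an assumption to verify against your RDKit version.

```python
from rdkit import Chem, DistanceGeometry
from rdkit.Chem import AllChem

mol = Chem.AddHs(Chem.MolFromSmiles("CCC"))
bounds = AllChem.GetMoleculeBoundsMatrix(mol)

# Smooth in place; a False return would mean the bounds are inconsistent
smoothed_ok = DistanceGeometry.DoTriangleSmoothing(bounds)

# Assumed layout: bounds[i][j] with i < j is the upper bound, bounds[j][i] the lower
print("shape:", bounds.shape, "smoothed:", smoothed_ok)
print("C1-C3 range:", bounds[2][0], "to", bounds[0][2])
```

After this, the smoothed matrix can be handed to EmbedLib.EmbedMol(mol, bounds) as in the session above.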

Best,
-greg



On Wed, Feb 28, 2018 at 10:41 AM, Jan Halborg Jensen wrote:
The following code works fine with ethane (CC) but for propane (CCC) or 
anything else I get the following error
ValueError: could not embed matrix

Any ideas or solutions would be appreciated

Best regards, Jan


from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem.Pharm3D import EmbedLib


mol = Chem.MolFromSmiles("CCC")
mol = Chem.AddHs(mol)
bounds = AllChem.GetMoleculeBoundsMatrix(mol)

EmbedLib.EmbedMol(mol,bounds)
EmbedLib.OptimizeMol(mol, bounds)



[Rdkit-discuss] UFFTYPE error in MMFF minimization

2017-10-20 Thread Bennion, Brian
Hello
In order to bypass errors in UFF typing I am using MMFF94 as a minimization 
forcefield.  However errors about UFF atom type are still occurring for Boron.

staring MMFF94 minimization
CHEMBL2374533
[15:59:34] UFFTYPER: Unrecognized atom type: B_1 (26)
[15:59:34] UFFTYPER: Unrecognized atom type: B_1 (27)
staring MMFF94 minimization
CHEMBL115107

Not sure what this means. Is UFF atom typing just the default behavior, and 
is there no way around it?
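
One way to see which force field can actually handle a given molecule is to ask RDKit before minimizing. This is a minimal sketch using the two parameter checks exposed through AllChem; it does not explain why UFF typing messages show up during an MMFF run, it only reports coverage. Ethanol is an illustrative input, not one of the failing compounds.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def forcefield_coverage(mol):
    """Report which force fields have parameters for every atom in mol."""
    return {
        "UFF": AllChem.UFFHasAllMoleculeParams(mol),
        "MMFF": AllChem.MMFFHasAllMoleculeParams(mol),
    }


mol = Chem.AddHs(Chem.MolFromSmiles("CCO"))
print(forcefield_coverage(mol))
```

Calling forcefield_coverage on the boron compounds before minimization would show whether either force field has a complete parameter set for them.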

Thanks
Brian


Re: [Rdkit-discuss] Question about pKa prediction using RDKit

2017-10-16 Thread Bennion, Brian
Hello Jacob,

Have you received any offline replies to this post?

brian



From: Jacob D Durrant 
Sent: Thursday, October 12, 2017 10:21:46 PM
To: rdkit-discuss@lists.sourceforge.net
Subject: [Rdkit-discuss] Question about pKa prediction using RDKit

I've been struggling to implement the SMARTS-based pKa prediction algorithm 
outlined by Crippen here: http://pubs.acs.org/doi/abs/10.1021/ci8001815

This same method has been mentioned elsewhere on this forum:  
https://sourceforge.net/p/rdkit/mailman/message/27318424/ ; 
http://rdkit-discuss.narkive.com/jOHraNs8/crippen-pka-model-in-rdkit

Am I right in thinking that this method has never been successfully implemented?

Assuming not, I'm wondering if anyone else has had a hard time reproducing the 
values listed in that paper's SI using the provided decision tree. For example, 
consider the compound O(C)c1cc(ccc1OC)C(=O)C(O)=O

Running through the decision tree:

Node 2: Does contain [#G6H]C(=O)
Node 4: Does contain [OH][i](=O)*(~*)~*
Node 8: Does contain a[#X]
Node 16: Does contain *~*~*~*~*~*~*~*~*
Node 32: Does contain [i][#G6v2]
Node 64: Does contain [O][i]~[i]~[i]~[i]~[i]~[i]~[i]~[A]
Node 129: Does not contain [OH][i]~[i]~[i]~[i]-*
Node 258: Does contain [OH][i](=O)[i]~[i]~[i]~[i]-*
  Terminal node. 3.184 (2.79)

And yet the paper lists the decision-tree output for that molecule as 1.8.

Am I missing something obvious? I'd appreciate any help the community could 
offer. Having a basic pKa predictor in rdkit would be so useful...
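
For anyone wanting to experiment, walking a SMARTS decision tree in RDKit can be sketched as below. Everything in the tree is hypothetical: the two patterns are ordinary Daylight-style SMARTS chosen for illustration, not the patterns from the paper (which use extensions such as [i] and [#G6H] that would need translating first), and the leaves are plain labels rather than pKa values.

```python
from rdkit import Chem

# Hypothetical tree: (SMARTS, yes-branch, no-branch); a string is a leaf label
TREE = (
    "C(=O)[OX2H1]",
    ("a", "acid with aromatic atoms", "acid without aromatic atoms"),
    "no carboxylic acid",
)


def walk(node, mol, path=None):
    """Follow yes/no branches until a leaf; return (leaf, [(smarts, matched), ...])."""
    path = [] if path is None else path
    if isinstance(node, str):
        return node, path
    smarts, yes_branch, no_branch = node
    hit = mol.HasSubstructMatch(Chem.MolFromSmarts(smarts))
    path.append((smarts, hit))
    return walk(yes_branch if hit else no_branch, mol, path)


mol = Chem.MolFromSmiles("O(C)c1cc(ccc1OC)C(=O)C(O)=O")
leaf, path = walk(TREE, mol)
print(leaf)
for smarts, hit in path:
    print(smarts, hit)
```

Printing the path makes it easy to compare each node's answer against a hand-traced run like the one above.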

Thanks!



--

Sent from my mobile.


Re: [Rdkit-discuss] UFF atom type errors

2017-10-09 Thread Bennion, Brian
Hello Greg,
The only thought I had was that these atom types just didn’t exist in the UFF 
forcefield and therefore I was just out of luck for compounds with boron 
(borax) and such.
If there are indeed plausible atom types and rdkit is just not picking them, 
then that is another issue altogether.

Brian


From: Greg Landrum [mailto:greg.land...@gmail.com]
Sent: Friday, October 06, 2017 10:27 PM
To: Bennion, Brian <benni...@llnl.gov>
Cc: Guillaume GODIN <guillaume.go...@firmenich.com>; RDKit Discuss 
(rdkit-discuss@lists.sourceforge.net) <rdkit-discuss@lists.sourceforge.net>
Subject: Re: [Rdkit-discuss] UFF atom type errors

Yeah, those atom types that you are seeing are the names of the UFF atom types 
that the RDKit assigns.
If an atom has a hybridization/charge-state that's not recognized, you'll get 
those parameterization errors.
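
To see what the typer has to work with, it can help to list each atom's element, hybridization, and formal charge. This is a minimal sketch, not the typer itself; it only prints the inputs that a label such as B_1 is built from (in the UFF naming scheme the trailing character encodes geometry: 1 linear, 2 trigonal, 3 tetrahedral, R resonant). The nitrile is an illustrative molecule.

```python
from rdkit import Chem


def typing_inputs(mol):
    """List (index, symbol, hybridization, formal charge) for every atom."""
    return [
        (a.GetIdx(), a.GetSymbol(), str(a.GetHybridization()), a.GetFormalCharge())
        for a in mol.GetAtoms()
    ]


mol = Chem.MolFromSmiles("CC#N")
for row in typing_inputs(mol):
    print(row)
```

Running this on a molecule that triggers the warning, and looking at the atoms named in the log, shows which hybridization or charge state led to the unrecognized label.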

If you're aware of a better way to handle this (or a way to sensibly guess 
parameters), please let me know.

-greg


On Fri, Oct 6, 2017 at 8:52 PM, Bennion, Brian <benni...@llnl.gov> wrote:
CHEMBL1796997 is Helium and the same molecule that throws the first UFFTYPER 
warnings.  Originally I had searched my  sdf file only for He1 and found no 
hits.
From the original Goddard paper in 1992, there is only He4+4 atomtype described 
explicitly.  I can reason based on the error that RDKIT is classifying He as 
He1 and then as He for which there is no formal UFF atom type.
Is this presumption correct?

Brian
From: Guillaume GODIN [mailto:guillaume.go...@firmenich.com]
Sent: Thursday, October 05, 2017 22:59
To: Bennion, Brian <benni...@llnl.gov>; RDKit Discuss 
(rdkit-discuss@lists.sourceforge.net) <rdkit-discuss@lists.sourceforge.net>
Subject: RE: UFF atom type errors


Hello,



Can you provide a small SDF containing molecules that have this issue?



thanks in advance,



BR,



Dr. Guillaume GODIN
Principal Scientist
Chemoinformatic & Datamining
Innovation
CORPORATE R&D DIVISION
DIRECT LINE +41 (0)22 780 3645
MOBILE  +41 (0)79 536 1039
Firmenich SA
RUE DES JEUNES 1 | CASE POSTALE 239 | CH-1211 GENEVE 8


From: Bennion, Brian <benni...@llnl.gov>
Sent: Friday, October 6, 2017 06:11
To: RDKit Discuss (rdkit-discuss@lists.sourceforge.net)
Subject: [Rdkit-discuss] UFF atom type errors

Hello,
As part of my workflow, I am attempting to generate a starting 3D structure 
from a 2D representation loaded from an sdf file.
On certain structures I receive the following UFF errors when attempting to 
minimize the structure.

[20:50:43] UFFTYPER: Unrecognized atom type: He1 (0)
[20:50:43] UFFTYPER: Unrecognized atom type: He (0)
[20:50:45] UFFTYPER: Unrecognized atom type: B_1 (0)
[20:50:45] UFFTYPER: Unrecognized atom type: B_1 (7)
[20:50:45] UFFTYPER: Unrecognized atom type: B_1 (0)
[20:50:45] UFFTYPER: Unrecognized atom type: B_1 (7)

I have searched for these atom labels in my 2D sdf file and they don’t exist.  
So I am not sure how the uff code in rdkit is finding these types.
The relevant code is shown below, in case I am doing something incorrectly.
Any thoughts or suggestions to help me find my mistakes?

Brian


for m in ms:
  #add hydrogen atoms to the molecule before generating 3D coordinates
  mHs=Chem.AddHs(m)
  #start generating 3D coordinates and optimize the conformation
  embedError=AllChem.EmbedMolecule(mHs,useRandomCoords=True)
  if embedError == 0 :
    UffoptError=AllChem.UFFOptimizeMolecule(mHs,3000)
  elif UffoptError != 0 :
    print ("UFF optimization failed, trying MMFF optimization")
    MMFFoptError=AllChem.MMFFOptimizeMolecule(mHs,3000)
  elif MMFFoptError != 0 :
    print ("MMFF optimizaiton has also failed on: ", molName)
    print ("Continuing on to next molecule")
    continue
  else:
    print("Embedding Failed for: ", molName)
    continue


**
DISCLAIMER
This email and any files transmitted with it, including replies and forwarded 
copies (which may contain alterations) subsequently transmitted from Firmenich, 
are confidential and solely for the use of the intended recipient. The contents 
do not represent the opinion of Firmenich except to the extent that it relates 
to their official business.
**


Re: [Rdkit-discuss] nitrogen valence issues

2017-10-06 Thread Bennion, Brian
The rdkit sdf files were washed with MOE and then read into rdkit again 
for 3D structure generation.
As there are only a dozen problem cases out of 1.5 million compounds, I just 
removed them from my main file, downloaded the mol files from chembl, and 
double-checked the structures.

Brian

-Original Message-
From: Chris Earnshaw [mailto:cgearns...@gmail.com] 
Sent: Thursday, October 05, 2017 08:46
To: Bennion, Brian <benni...@llnl.gov>
Cc: RDKit Discuss (rdkit-discuss@lists.sourceforge.net) 
<rdkit-discuss@lists.sourceforge.net>
Subject: Re: [Rdkit-discuss] nitrogen valence issues

Hi

Some interesting differences in behaviour compared with my RDkit installation. 
Using the ChEMBL SMILES (freshly downloaded now) -

[NH-][NH+]=NC[C@H]1O[C@@H]2O[C@@H]3[C@@H](CN=[N+]=[N-])O[C@H](O[C@@H]4[C@@H](CN=[N+]=[N-])O[C@H](O[C@@H]5[C@@H](CN=[N+]=[N-])O[C@H](O[C@@H]6[C@@H](CN=[N+]=[N-])O[C@H](O[C@@H]7[C@@H](CN=[N+]=[N-])O[C@H](O[C@@H]8[C@@H](CN=[N+]=[N-])O[C@H](O[C@H]1[C@H](O)[C@H]2O)[C@H](O)[C@H]8O)[C@H](O)[C@H]7O)[C@H](O)[C@H]6O)[C@H](O)[C@H]5O)[C@H](O)[C@H]4O)[C@H](O)[C@H]3O

The problem atoms are the first two. If I convert this to an SD file (using a 
C++ program based on the RDkit libraries) then the resulting SD file contains 
no charge information in the atom block, it's all in M CHG records (which are 
correct) and the problem atoms are still the first two.

The bonding information is incorrect as there's a single bond between the two 
nitrogens and it should be double. Editing the first record in the bond block 
from -
  1  2  1  0
to -
  1  2  2  0
fixes the structure for me, and the resulting SD file can be processed by other 
RDkit programs. I've attached the resulting file in case it helps throw any 
light on what's happening.
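
The difference between the two azide forms can be checked directly on a small stand-in. This is a minimal sketch on methyl azide rather than the full ChEMBL structure; it lists the formal charge and hydrogen count per atom for the usual form and for the protonated form seen in the ChEMBL SMILES.

```python
from rdkit import Chem


def atom_summary(smiles):
    """Return (symbol, formal charge, total H count) per atom."""
    mol = Chem.MolFromSmiles(smiles)
    return [(a.GetSymbol(), a.GetFormalCharge(), a.GetTotalNumHs())
            for a in mol.GetAtoms()]


usual = atom_summary("CN=[N+]=[N-]")    # -N=[N+]=[N-]
odd = atom_summary("CN=[NH+][NH-]")     # -N=[NH+]-[NH-]
print(usual)
print(odd)
```

Both forms parse and are net neutral; they differ by two hydrogens and by the order of the terminal N-N bond, which matches the single bond found in the bond block.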

I'm puzzled as to why the behaviour is significantly different for you...

Chris

On 5 October 2017 at 15:40, Bennion, Brian <benni...@llnl.gov> wrote:
> The sdf is an rdkit reading of the original smiles string, which if 
> wrong would explain the funky charge settings in the mol block for 
> atoms 84 and 85.  I modified these to 5 and 3 respectively to make the 
> correct charge states, however, that did not resolve the issue.  
> Perhaps the bonding info is also incorrect.  The file is on a remote 
> server so I will repost with attachment if I continue to have problems.
>
> Brian
>
>
> 
> From: Chris Earnshaw <cgearns...@gmail.com>
> Sent: Thursday, October 5, 2017 12:04:02 AM
> To: Bennion, Brian; RDKit Discuss 
> (rdkit-discuss@lists.sourceforge.net)
> Subject: Re: [Rdkit-discuss] nitrogen valence issues
>
> Hi
>
> Be aware that there is a problem with one of the azide groups in
> CHEMBL592333 - in SMILES it's '-N=[NH+]-[NH-]' rather than '-N=[N+]=[N-].
> This doesn't render the structure chemically invalid but it's probably 
> wrong.
>
> What's the provenance of your SD file? It isn't the same as a fresh 
> download of this structure from CHEMBL, which can be processed by 
> RDkit quite happily (allowing for the structure being wrong!). Is it 
> possible that your file has got corrupted by some other processing step?
>
> Regards,
> Chris
>
> On 5 October 2017 at 03:28, Greg Landrum <greg.land...@gmail.com> wrote:
>>
>> Hi Brian,
>>
>> When you pasted that into the email the formatting of the mol block 
>> did end up screwed up, which makes this hard to reproduce.
>> Could you please attach the mol block to the message as a file?
>>
>> -greg
>>
>> On Thu, Oct 5, 2017 at 2:21 AM, Bennion, Brian <benni...@llnl.gov> wrote:
>>>
>>> Hello,
>>>
>>> After looking at the email list and seeing that this error has 
>>> cropped up several times for aromatic/aliphatic heterocyclic 
>>> nitrogens I still haven’t been able to resolve the valence =4 error 
>>> for one of the azo groups in a molecule that has 7.  The first 
>>> couple of azo groups seem to be interpreted fine.
>>>
>>> Am I doing something incorrect here or is the mol file not formatted 
>>> properly?
>>>
>>> Thanks
>>>
>>> Brian
>>>
>>>
>>>
>>>
>>>
>>> [16:50:29] Explicit valence for atom # 85 N, 4, is greater than 
>>> permitted
>>>
>>> [16:50:29] ERROR: Could not sanitize molecule ending on line 206
>>>
>>> [16:50:29] ERROR: Explicit valence for atom # 85 N, 4, is greater 
>>> than permitted
>>>
>>>
>>>
>>> CHEMBL592333
>>>
>>>3D
>>>
>>>
>>>
>>> 91 98  0  0  0  0  0  0  0  0999 V2000
>>>
>

Re: [Rdkit-discuss] UFF atom type errors

2017-10-06 Thread Bennion, Brian
CHEMBL1796997 is Helium and the same molecule that throws the first UFFTYPER 
warnings.  Originally I had searched my  sdf file only for He1 and found no 
hits.
From the original Goddard paper in 1992, there is only He4+4 atomtype 
described explicitly.  I can reason based on the error that RDKIT is 
classifying He as He1 and then as He for which there is no formal UFF atom 
type.
Is this presumption correct?

Brian
From: Guillaume GODIN [mailto:guillaume.go...@firmenich.com]
Sent: Thursday, October 05, 2017 22:59
To: Bennion, Brian <benni...@llnl.gov>; RDKit Discuss 
(rdkit-discuss@lists.sourceforge.net) <rdkit-discuss@lists.sourceforge.net>
Subject: RE: UFF atom type errors


Hello,



Can you provide a small SDF containing molecules that have this issue?



thanks in advance,



BR,



Dr. Guillaume GODIN
Principal Scientist
Chemoinformatic & Datamining
Innovation
CORPORATE R&D DIVISION
DIRECT LINE +41 (0)22 780 3645
MOBILE  +41 (0)79 536 1039
Firmenich SA
RUE DES JEUNES 1 | CASE POSTALE 239 | CH-1211 GENEVE 8

________
From: Bennion, Brian <benni...@llnl.gov>
Sent: Friday, October 6, 2017 06:11
To: RDKit Discuss (rdkit-discuss@lists.sourceforge.net)
Subject: [Rdkit-discuss] UFF atom type errors

Hello,
As part of my workflow, I am attempting to generate a starting 3D structure 
from a 2D representation loaded from an sdf file.
On certain structures I receive the following UFF errors when attempting to 
minimize the structure.

[20:50:43] UFFTYPER: Unrecognized atom type: He1 (0)
[20:50:43] UFFTYPER: Unrecognized atom type: He (0)
[20:50:45] UFFTYPER: Unrecognized atom type: B_1 (0)
[20:50:45] UFFTYPER: Unrecognized atom type: B_1 (7)
[20:50:45] UFFTYPER: Unrecognized atom type: B_1 (0)
[20:50:45] UFFTYPER: Unrecognized atom type: B_1 (7)

I have searched for these atom labels in my 2D sdf file and they don't exist.  
So I am not sure how the uff code in rdkit is finding these types.
The relevant code is shown below, in case I am doing something incorrectly.
Any thoughts or suggestions to help me find my mistakes?

Brian


for m in ms:
  #add hydrogen atoms to the molecule before generating 3D coordinates
  mHs=Chem.AddHs(m)
  #start generating 3D coordinates and optimize the conformation
  embedError=AllChem.EmbedMolecule(mHs,useRandomCoords=True)
  if embedError == 0 :
    UffoptError=AllChem.UFFOptimizeMolecule(mHs,3000)
  elif UffoptError != 0 :
    print ("UFF optimization failed, trying MMFF optimization")
    MMFFoptError=AllChem.MMFFOptimizeMolecule(mHs,3000)
  elif MMFFoptError != 0 :
    print ("MMFF optimizaiton has also failed on: ", molName)
    print ("Continuing on to next molecule")
    continue
  else:
    print("Embedding Failed for: ", molName)
    continue




[Rdkit-discuss] UFF atom type errors

2017-10-05 Thread Bennion, Brian
Hello,
As part of my workflow, I am attempting to generate a starting 3D structure 
from a 2D representation loaded from an sdf file.
On certain structures I receive the following UFF errors when attempting to 
minimize the structure.

[20:50:43] UFFTYPER: Unrecognized atom type: He1 (0)
[20:50:43] UFFTYPER: Unrecognized atom type: He (0)
[20:50:45] UFFTYPER: Unrecognized atom type: B_1 (0)
[20:50:45] UFFTYPER: Unrecognized atom type: B_1 (7)
[20:50:45] UFFTYPER: Unrecognized atom type: B_1 (0)
[20:50:45] UFFTYPER: Unrecognized atom type: B_1 (7)

I have searched for these atom labels in my 2D sdf file and they don't exist.  
So I am not sure how the uff code in rdkit is finding these types.
The relevant code is shown below, in case I am doing something incorrectly.
Any thoughts or suggestions to help me find my mistakes?

Brian


for m in ms:
  #add hydrogen atoms to the molecule before generating 3D coordinates
  mHs=Chem.AddHs(m)
  #start generating 3D coordinates and optimize the conformation
  embedError=AllChem.EmbedMolecule(mHs,useRandomCoords=True)
  if embedError == 0 :
    UffoptError=AllChem.UFFOptimizeMolecule(mHs,3000)
  elif UffoptError != 0 :
    print ("UFF optimization failed, trying MMFF optimization")
    MMFFoptError=AllChem.MMFFOptimizeMolecule(mHs,3000)
  elif MMFFoptError != 0 :
    print ("MMFF optimizaiton has also failed on: ", molName)
    print ("Continuing on to next molecule")
    continue
  else:
    print("Embedding Failed for: ", molName)
    continue
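
One thing to note about the loop above: the elif branches are only reached when embedding fails, so the MMFF fallback never runs after a UFF failure, and UffoptError may be undefined when it is tested. A minimal sketch of the intended embed, UFF, then MMFF sequence follows. The function name, status strings, and fixed seed are illustrative assumptions, and treating any non-zero return as failure is a simplification (a return of 1 means more iterations are needed).

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def build_3d(mol, max_iters=3000, seed=42):
    """Embed, try UFF, fall back to MMFF. Returns (mol_with_Hs, status)."""
    mol_h = Chem.AddHs(mol)
    # EmbedMolecule returns the new conformer id, or -1 on failure
    if AllChem.EmbedMolecule(mol_h, useRandomCoords=True, randomSeed=seed) < 0:
        return mol_h, "embed_failed"
    # Both optimizers return 0 on convergence
    if AllChem.UFFOptimizeMolecule(mol_h, maxIters=max_iters) == 0:
        return mol_h, "uff"
    if AllChem.MMFFOptimizeMolecule(mol_h, maxIters=max_iters) == 0:
        return mol_h, "mmff"
    return mol_h, "not_converged"


mol_h, status = build_3d(Chem.MolFromSmiles("CCO"))
print(status, mol_h.GetNumConformers())
```

In the original loop this would replace the if/elif chain, with the status string deciding whether to continue to the next molecule.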



[Rdkit-discuss] nitrogen valence issues

2017-10-04 Thread Bennion, Brian
Hello,
After looking at the email list and seeing that this error has cropped up 
several times for aromatic/aliphatic heterocyclic nitrogens I still haven't 
been able to resolve the valence =4 error for one of the azo groups in a 
molecule that has 7.  The first couple of azo groups seem to be interpreted 
fine.
Am I doing something incorrect here or is the mol file not formatted properly?
Thanks
Brian


[16:50:29] Explicit valence for atom # 85 N, 4, is greater than permitted
[16:50:29] ERROR: Could not sanitize molecule ending on line 206
[16:50:29] ERROR: Explicit valence for atom # 85 N, 4, is greater than permitted
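
To locate the offending atom without losing the molecule, the structure can be read unsanitized and then inspected. This is a minimal sketch that assumes an RDKit version providing Chem.DetectChemistryProblems (added after this thread was written); the SMILES is an illustrative stand-in with a neutral four-bonded nitrogen, not the ChEMBL record. Note that RDKit atom indices are zero-based, so atom # 85 is the 86th atom line of the mol block.

```python
from rdkit import Chem


def valence_problems(mol):
    """Return (atom index, message) for each atom-valence problem found."""
    found = []
    for problem in Chem.DetectChemistryProblems(mol):
        if problem.GetType() == "AtomValenceException":
            found.append((problem.GetAtomIdx(), problem.Message()))
    return found


# Stand-in input: neutral nitrogen with four bonds, read without sanitization
mol = Chem.MolFromSmiles("CN(C)(C)C", sanitize=False)
for idx, message in valence_problems(mol):
    print(idx, message)
```

For an SD file the same check applies to molecules read with Chem.SDMolSupplier(path, sanitize=False).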

CHEMBL592333
   3D

 91 98  0  0  0  0  0  0  0  0999 V2000
    8.3826   -4.1789    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    7.6967   -2.8968    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    8.5551   -1.5926    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    9.9817   -1.6449    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   10.7075   -3.0051    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    9.8956   -4.2577    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    9.5145    2.2882    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    9.8798    0.8284    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   11.3118    0.3983    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   12.3905    1.3820    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   12.0187    2.9205    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   10.6272    3.3273    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    9.8204    4.3866    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    7.9490    5.4779    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    8.5179    4.0883    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.3822    5.0511    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    7.6673    2.8820    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    6.4150    5.6541    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.5680    4.4264    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.1262    3.0849    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   11.0958   -0.9118    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    4.5054   -4.0612    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    6.0427   -3.9376    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.8866   -5.1186    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.3028   -6.4902    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.7637   -6.6458    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.9029   -5.4182    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.7159   -5.9831    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.6356   -5.6026    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9599   -4.8851    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.1331    3.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.0734    4.8222    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.8216    2.5373    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.1064   -0.8835    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.6580    0.6255    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.8630   -3.0392    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   -1.1133   -1.9528    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.5294    3.2743    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.7976    5.5400    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.4790    4.7418    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.1940    0.9129    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.2800    2.2297    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.3560   -1.6364    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7840   -0.1761    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.0400   -3.4362    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.6446   -4.7799    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.5472   -3.2855    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7300   -2.5970    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    8.1699   -5.4891    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   -0.9196    6.9314    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    3.3718   -2.7438    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.4375   -1.2457    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   12.2233   -2.9903    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    7.3609   -7.5406    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   -1.9851   -5.4581    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    4.4420    2.6027    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   13.8111    0.9358    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   -3.5451   -1.3353    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.5839   -7.0998    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   10.5449   -5.6259    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    4.2197   -8.0395    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   13.2649    3.7658    0.0000 

[Rdkit-discuss] troubles going from 2D to 3D

2017-08-16 Thread Bennion, Brian
Hello All,

I am parsing a set of 2D sd files in rdkit in order to generate a 3D structure. 
 The code is below and is based on what I could find on the  list for errors in 
generating 3D coordinates.
Temp.mol is the downloaded molfile from chembl for compound CHEMBL500809.  I 
must be doing something incorrectly in the code below as it still throws a -1 
at the embed step.


>>suppl = Chem.SDMolSupplier('temp.mol')
>>ms = [x for x in suppl if x is not None]
>>print ("This is the number of entries read in",len(ms))
1
>>for m in ms:
>>    tmp=AllChem.Compute2DCoords(m)
>>    m3=Chem.AddHs(m)
>>    print (AllChem.EmbedMolecule(m3,useRandomCoords=True))
-1
Just finished embedding Molecule
Traceback (most recent call last):
  File "sdf2D2Canonical3DSDF.py", line 45, in <module>
    AllChem.UFFOptimizeMolecule(m3,4000)
ValueError: Bad Conformer Id

This is the 2D structure from CHEMBL

  SciTegic12231509382D CHEMBL500809

 68 77  0  0  0  0            999 V2000
   -1.7837    1.4355    0.0000 C   0  0
   -2.4967    1.0206    0.0000 C   0  0  1  0  0  0
   -3.2126    1.4306    0.0000 C   0  0  1  0  0  0
   -3.2155    2.2556    0.0000 O   0  0
   -3.9256    1.0156    0.0000 C   0  0  1  0  0  0
   -4.7139    0.7721    0.0000 O   0  0
   -2.9339    1.0873    0.0000 O   0  0
   -2.8213    0.3944    0.0000 C   0  0
   -3.2069   -0.2194    0.0000 C   0  0  1  0  0  0
   -3.2040   -1.0444    0.0000 C   0  0  1  0  0  0
   -3.9170   -1.4594    0.0000 C   0  0
   -4.6329   -1.0494    0.0000 C   0  0  1  0  0  0
   -5.3460   -1.4644    0.0000 C   0  0
   -6.0619   -1.0543    0.0000 C   0  0
   -6.0647   -0.2293    0.0000 C   0  0
   -6.7806    0.1807    0.0000 O   0  0
   -5.3517    0.1856    0.0000 C   0  0  1  0  0  0
   -5.3546    1.0106    0.0000 O   0  0
   -4.6358   -0.2244    0.0000 C   0  0  1  0  0  0
   -4.8935    0.5594    0.0000 C   0  0
   -3.9228    0.1906    0.0000 C   0  0  1  0  0  0
   -5.3431   -2.2894    0.0000 C   0  0
   -2.4881   -1.4544    0.0000 O   0  0
   -1.7751   -1.0394    0.0000 C   0  0
   -1.0592   -1.4495    0.0000 O   0  0
   -1.7779   -0.2144    0.0000 C   0  0  2  0  0  0
   -1.0649    0.2005    0.0000 O   0  0
   -0.3490   -0.2095    0.0000 C   0  0
   -0.3461   -1.0345    0.0000 O   0  0
    0.3640    0.2055    0.0000 C   0  0
    0.3612    1.0305    0.0000 O   0  0
    1.0799   -0.2045    0.0000 O   0  0
    1.7930    0.2105    0.0000 C   0  0  2  0  0  0
    2.2833   -0.1399    0.0000 C   0  0  1  0  0  0
    2.5117   -1.0245    0.0000 C   0  0  1  0  0  0
    1.7987   -1.4395    0.0000 C   0  0
    3.2276   -1.4346    0.0000 C   0  0  1  0  0  0
    3.2305   -2.2595    0.0000 O   0  0
    3.9407   -1.0196    0.0000 C   0  0  1  0  0  0
    4.7289   -0.7761    0.0000 O   0  0
    2.9489   -1.0912    0.0000 O   0  0
    2.6107   -0.3387    0.0000 C   0  0
    3.2219    0.2154    0.0000 C   0  0  2  0  0  0
    3.2190    1.0404    0.0000 C   0  0  1  0  0  0
    3.9321    1.4554    0.0000 C   0  0
    4.6480    1.0454    0.0000 C   0  0  1  0  0  0
    5.3610    1.4604    0.0000 C   0  0
    6.0769    1.0504    0.0000 C   0  0
    6.0798    0.2254    0.0000 C   0  0
    6.7957   -0.1846    0.0000 O   0  0
    5.3667   -0.1896    0.0000 C   0  0  1  0  0  0
    5.3696   -1.0146    0.0000 O   0  0
    4.6508    0.2204    0.0000 C   0  0  1  0  0  0
    4.9085   -0.5633    0.0000 C   0  0
    3.9378   -0.1946    0.0000 C   0  0  1  0  0  0
    5.3581    2.2854    0.0000 C   0  0
    2.5031    1.4504    0.0000 O   0  0
    1.7901    1.0355    0.0000 C   0  0
    1.0742    1.4455    0.0000 O   0  0
   -2.4938    0.1956    0.0000 C   0  0  1  0  0  0
   -1.6295    0.6986    0.0000 H   0  0
    4.8056   -0.6916    0.0000 H   0  0
    4.6445    2.0454    0.0000 H   0  0
    3.2155    2.0404    0.0000 H   0  0
    1.4588   -0.7058    0.0000 H   0  0
   -4.7905    0.6876    0.0000 H   0  0
   -4.6294   -2.0494    0.0000 H   0  0
   -3.2005   -2.0444    0.0000 H   0  0
  2  1  1  6
  2  3  1  0
  3  4  1  6
  3  5  1  0
  5  6  1  6
  5  7  1  0
  7  8  1  0
  9  8  1  1
  9 10  1  0
 10 11  1  0
 12 11  1  0
 12 13  1  0
 13 14  2  0
 14 15  1  0
 15 16  2  0
 15 17  1  0
 17 18  1  1
 17 19  1  0
 19 12  1  0
 19 20  1  1
 21 19  1  0
 21  5  1  0
 21  9  1  0
 13 22  1  0
 10 23  1  0
 23 24  1  0
 24 25  2  0
 24 26  1  0
 26 27  1  1
 27 28  1  0
 28 29  2  0
 28 30  1  0
 30 31  2  0
 30 32  1  0
 33 32  1  1
 34 33  1  0
 34 35  1  0
 35 36  1  6
 35 37  1  0
 37 38  1  6
 37 39  1  0
 39 40  1  6
 39 41  1  0
 41 42  1  0
 43 42  1  1
 43 34  1  0
 43 44  1  0
 44 45  1  0
 46 45  1  0
 46 47  1  0
 47 48  2  0
 48 49  1  0
 49 50  2  0
 49 51  1  0
 51 52  1  1
 51 53  1  0
 53 46  1  0
 53 54  1  1
 55 53  1  0
 55 39  1  0
 55 43  1  0
 47 56  1  0
 44 57  1  0
 57 58  1  0
 58 33  1  0
 58 59  2  0
 60 26  1  0
 60  2  1  0
 60  9  1  0
 60 61  1  1
 55 62  1  6
 46 63  1  6
 44 64  1  1
 34 65  1  1
 21 66  1  6
 12 67  1  6
 10 68  1  1
M  END

Re: [Rdkit-discuss] list of failed chembl ids

2017-08-08 Thread Bennion, Brian
Thank you Andrew for the explanation.  I was just commenting to my summer 
intern that you might weigh in.
Brian

From: Andrew Dalke [mailto:da...@dalkescientific.com]
Sent: Tuesday, August 08, 2017 15:21
To: RDKit Discuss (rdkit-discuss@lists.sourceforge.net) 

Subject: Re: [Rdkit-discuss] list of failed chembl ids

On Aug 8, 2017, at 22:20, Peter S. Shenkin wrote:
> But I would be curious to see the 51 CHEMBL SMILES that RDKit could not parse.

As of ChEMBL 23, the following files are available:
  - the sdf.gz file
  - pre-computed RDKit Morgan fingerprints in fps.gz format
  - the database available as an SQLite file

I downloaded those three files, de-tar-gz'ed the SQLite database, and did the 
following:

 1) get the ids from the .sdf.gz file
 2) get the ids from the .fps.gz file
 3) Find the ids which are only in the .sdf.gz file
 4) For each id, find its canonical SMILES in the SQLite file
 5) Print the list of ids
(I also checked that there were no ids in the FPS file which weren't in the 
SDF.)
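
The five steps above can be sketched with the standard library alone. This is a minimal sketch, not the script actually used: the SD tag name (chembl_id) and the table and column names (molecule_dictionary, compound_structures, canonical_smiles) are assumptions about the ChEMBL release layout and should be checked against the downloaded files.

```python
import sqlite3


def ids_from_sdf(lines, tag="chembl_id"):
    """Collect the value line that follows each '> <tag>' data header."""
    ids, grab = set(), False
    for line in lines:
        line = line.rstrip("\n")
        if grab:
            ids.add(line.strip())
            grab = False
        elif line.startswith(">") and "<%s>" % tag in line:
            grab = True
    return ids


def ids_from_fps(lines):
    """FPS data lines are 'hex-fingerprint<TAB>id'; '#' lines are headers."""
    return {line.rstrip("\n").split("\t")[1]
            for line in lines if line.strip() and not line.startswith("#")}


def smiles_for(conn, ids):
    """Look up canonical SMILES for each id (schema names are assumptions)."""
    query = ("SELECT md.chembl_id, cs.canonical_smiles "
             "FROM molecule_dictionary md "
             "JOIN compound_structures cs ON cs.molregno = md.molregno "
             "WHERE md.chembl_id = ?")
    found = {}
    for chembl_id in sorted(ids):
        row = conn.execute(query, (chembl_id,)).fetchone()
        if row is not None:
            found[row[0]] = row[1]
    return found
```

With the real files, the line iterables would come from gzip.open(path, "rt") and the connection from sqlite3.connect on the unpacked database; the set difference ids_from_sdf(...) - ids_from_fps(...) gives the ids found only in the SD file.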

Here are the SMILES for the 54 structures that method found (Note: this isn't 
51. I know the SD and FPS files are not guaranteed to be perfectly 
synchronized, so perhaps that's the source of the difference?)

Only in .fps: 0 ids
Only in .sdf: 54 ids
   CHEMBL1198593 
COc1cc(ccc1N2=N(N=C(N2)c3ccc(cc3)[N+](=O)[O-])c4ccc(cc4)[N+](=O)[O-])c5ccc(c(OC)c5)N6=N(NC(=N6)c7ccc(cc7)[N+](=O)[O-])c8ccc(cc8)[N+](=O)[O-]
   CHEMBL1201364 O[C@H]1[C@@H](O)[C@@H](O[C@@H]1COP(=O)(O)O)N2=CNc3c(S)ncnc23
   CHEMBL1684167 [Te](Cl)(Cl)c1c1COC
   CHEMBL1684168 [Te](Cl)(Cl)c1c1[C@H](C)OC
   CHEMBL1684169 [Te](Cl)(Cl)c1c1[C@@H](C)OC
   CHEMBL1684170 [Te](Br)(Br)c1c1COC
   CHEMBL1684171 [Te](Br)(Br)c1c1[C@H](C)OC
   CHEMBL1684172 [Te](Br)(Br)c1c1[C@@H](C)OC
   CHEMBL178180 COc1ccc(cc1)[Te](Cl)(Cl)\C(=C\Cl)\C(C)(C)O
   CHEMBL179159 COc1ccc(cc1)[Te]2(Cl)OC3(CC3)/C/2=C\Cl
   CHEMBL180156 COc1ccc(cc1)[Te](Cl)(Cl)\C=C(/Cl)\c2c2
   CHEMBL180355 COc1c1C(=O)\C=C(\c2c2OC)/[Te](Cl)(Cl)Cl
   CHEMBL180844 COc1ccc(cc1)[Te]2(Cl)OC3(C3)/C/2=C\Cl
   CHEMBL181211 OC(C\C(=C/Cl)\[Te](Cl)(Cl)Cl)c1c1
   CHEMBL181880 F[As-](F)(F)(F)(F)F
   CHEMBL1972162 
CC(C)(C)c1cc2c3c(c1)C(O[Te]3(C)OC2(C(F)(F)F)C(F)(F)F)(C(F)(F)F)C(F)(F)F
   CHEMBL1977677 CC(Br)C(=O)N=N1=C2C(=Nc3c13)c452c45
   CHEMBL1992123 CC1(O)C(C)(O)C2(C)O[Te]3(OC4(C)C(C)(O)C(C)(O)C4(C)O3)OC12C
   CHEMBL1992520 
CCN1\C(=C\C#C\C(=C/c2sc3c3[n+]2CC)\C)\Sc4c14.[F-][PH2+5]([F-])([F-])([F-])([F-])[F-]
   CHEMBL1998318 CC12O[Te]34OC(C)(C1(C)O3)C2(C)O4
   CHEMBL2097021 O[Te](=O)(=O)O
   CHEMBL2146197 [Cl-].CC[N+](CC)(CC)Cc1c1.ClC2=C[Te](Cl)(Cl)OC2
   CHEMBL2146209 [Cl-].Cl[Te]1(Cl)OCCO1
   CHEMBL2146259 N.[Cl-].[Cl-].[Cl-].C1C[O-][Te+4][O-]1
   CHEMBL2146289 N.[Cl-].[Cl-].[Cl-].C1C[O-][Te+4][O-]1
   CHEMBL2146290 N.[Cl-].[Cl-].[Cl-].CCC1C[O-][Te+4][O-]1
   CHEMBL2299271 CN1C=NNC1(=S)c2sc3nnc(c4c4)c(c5c5)c3c2O
   CHEMBL3182693 [NH4+].[NH4+].F[Si-2](F)(F)(F)(F)F
   CHEMBL3184182 [Na+].[Na+].F[Si-2](F)(F)(F)(F)F
   CHEMBL3187332 
CC(=O)OCC(NC(=O)C(CC1=C2=CC=CC=C2N=C1)NC(=O)OC(C)(C)C)C3OC(C(OC(=O)C)C3OC(=O)C)N4C=C(C)C(=O)NC4=O
   CHEMBL3187972 CNc1ccc(cc1)C(=O)Oc2cc(ON=[N](O)N(C)C)c(cc2C#N)[N+](=O)[O-]
   CHEMBL3188868 
CN(C)[N](=NOc1cc(ON=[N+]([O-])N2CCN(CC2)C(=O)c3cc(CC4=NNC(=O)c5c45)ccc3F)c(cc1[N+](=O)[O-])[N+](=O)[O-])O
   CHEMBL3211150 CCC1N1C(=O)N2=NC(=CN2)C(O)(c3c3)c4c4
   CHEMBL3348969 
CSCC[C@H](NC(=O)[C@H](CC1=CN=C2=CC=CC=C12)NC(=O)CCNC(=O)OC(C)(C)C)C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](Cc3c3)C(=O)N
   CHEMBL3349005 
C[C@@H](O)[C@@H](CO)NC(=O)[C@@H]1CSSC[C@H](NC(=O)[C@H](N)Cc2c2)C(=O)N[C@@H](Cc3c3)C(=O)N[C@H](CC4=CN=C5=CC=CC=C45)C(=O)N[C@@H](N)C(=O)N[C@@H](C(C)O)C(=O)N1
   CHEMBL3392104 [NH4+].[Cl-].Cl[Te]1(Cl)OCCO1
   CHEMBL3397072 
FC1=Fc2c(C=C1)[nH]cc2C3CCN(N4C(=O)N5C=CC=CC5=C(C4=O)c6ccc(F)cc6)CC3
   CHEMBL3544677 
CN(Cc1cnc2nc(N)nc(N)c2n1)c3ccc(cc3)C(=O)N[C@@H](CCC(=O)N[C@@H](CCC(=O)O)C(=O)O)C(=O)O
   CHEMBL3546168 Cl[Te]1(Cl)OCCO1
   CHEMBL3558859 C1C[O-][Te+4][O-]1
   CHEMBL3558860 C1C[O-][Te+4][O-]1
   CHEMBL3558861 CCC1C[O-][Te+4][O-]1
   CHEMBL3559384 CC[N+](CC)(CC)Cc1c1.ClC2=C[Te](Cl)(Cl)OC2
   CHEMBL3561635 O.O.O.O.O=C1O[Mg]2(OC(=O)c3c3O2)Oc4c14
   CHEMBL3580437 O=C1O[Mg]2(OC(=O)c3c3O2)Oc4c14
   CHEMBL3593577 
CN1C(=O)NC2=CN3(=C4NC=CC4=C12)C(C3)N5C(=O)Nc6cnc7[nH]ccc7c56
   CHEMBL3594279 
C[C@H]1O[C@H](C[C@H](O)[C@@H]1O)O[C@H]2[C@@H](O)C[C@H](O[C@H]3[C@@H](O)C[C@H](O[C@@H](C)C[C@H]4CC[C@@H]5[C@H](C[C@@H](O)[C@]6(C)[C@H](CC[C@]56O)C7=CC(=O)O/C/7=C\c8ccc(cc8)N(C)C)[C@@H]4C)O[C@@H]3C)O[C@@H]2C
   CHEMBL361437 COc1ccc(cc1)[Te]2(Cl)OC3(C3)/C/2=C\Br
   CHEMBL3832892 

[Rdkit-discuss] list of failed chembl ids

2017-08-08 Thread Bennion, Brian
Hello,

If anyone is interested, the ChEMBL IDs of the compounds that had such crazy 
2D SD files are listed below. Several are just different formulations of 
the same parent compound.

181880
450200
1198593
1201364
1977677
1992520
2146259
2146289
2146290
2299271
3182693
3184182
3187332
3188868
3187972
3211150
3349005
3348969
3833021
3397072
3544677
3561635
3593577
3594279
3580437
3558859
3558860
3558861
3832893
3832892
3832897
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] . Re: using rdkit to read in chembl23 1.7 million compounds

2017-08-07 Thread Bennion, Brian
Hello Peter,
Great, that just made me realize that I was not using my most recent conda 
environment version of RDKit.
I reread the 2D SDF file with the latest RDKit version and now only 31 
molecules are tossed out by the SDMolSupplier in RDKit.  51 compounds had 
errors when reading in the SMILES strings.
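
For anyone repeating this kind of count, a minimal sketch of tallying parse failures follows. It assumes the SMILES are already in a Python list; the three strings and the helper name count_failures are made-up stand-ins, not ChEMBL records or anything from the original script. It relies on Chem.MolFromSmiles returning None when parsing or sanitization fails; SDMolSupplier yields None in the same way for records it cannot read.

```python
from rdkit import Chem, RDLogger

# Silence the per-molecule error messages so only the summary is printed.
RDLogger.DisableLog('rdApp.*')

def count_failures(smiles_list):
    """Return the SMILES strings that RDKit could not turn into molecules."""
    failed = []
    for smi in smiles_list:
        if Chem.MolFromSmiles(smi) is None:
            failed.append(smi)
    return failed

# Hypothetical input: two valid molecules and one neutral pentavalent nitrogen.
examples = ['CCO', 'c1ccccc1', 'CN(C)(C)(C)C']
failed = count_failures(examples)
print(len(failed), failed)
```

The same None check applies when looping over an SDMolSupplier, which is probably how the counts above would be reproduced.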
Brian


From: Peter S. Shenkin [mailto:shen...@gmail.com]
Sent: Monday, August 07, 2017 14:26
To: Bennion, Brian <benni...@llnl.gov>
Cc: Chris Swain <sw...@mac.com>; rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] . Re: using rdkit to read in chembl23 1.7 million 
compounds

That molecule's SMILES is correctly rendered by RDKit, or at least by the 
version of RDKit behind Slack:

[Inline image 1]


-P.

On Mon, Aug 7, 2017 at 3:54 PM, Bennion, Brian <benni...@llnl.gov> wrote:

The carbocations are in small heterocyclic molecules. see CHEMBL3815233

Brian




From: Chris Swain <sw...@mac.com>
Sent: Monday, August 7, 2017 11:46:30 AM
To: rdkit-discuss@lists.sourceforge.net
Subject: [Rdkit-discuss] . Re: using rdkit to read in chembl23 1.7 million 
compounds

I've not tried to read in ChEMBL but I have tried to process other large 
datasets e.g. ZINC. My impression was that problems arose with small 
heterocyclic systems, particularly if fused or containing multiple different 
heteroatoms. I did wonder if the different aromaticity models might be the 
issue.

Chris


[Rdkit-discuss] mass replacement of External R-groups with many substituents

2017-03-15 Thread Bennion, Brian
Hello All,

I have looked around the email list and studied the daylight guide as well as 
the opensmiles website in hopes of solving my problem.  External r-groups is a 
proposed extension but that is all I could find.

It is possible that I have made it too complicated though.

In discussions with my synthetic chemist we came up with 27 substituents to 
place around our molecule scaffold.
I labeled the scaffolds with R1 in the position where the substitution will 
occur and attached a dummy label to the substituents.  I thought that I could 
do a simple replacement rxn.  However, I have not been successful.

The smarts string so far is listed below.

AllChem.ReactionFromSmarts('[c;R:1]-[c;R:2]-[c;R:3]-[c;R:4]-[c;R:5]([R1;!R:7])-[c;R:6].[R:8]-[*:9]
 >> [c;R:1]-[c;R:2]-[c;R:3]-[c;R:4]-[c;R:5]([*:9])-[c;R:6]')

Basically just adding a group to positions along a benzene ring
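
One possible way to do this kind of enumeration without a reaction SMARTS is sketched below. This is an assumption-laden illustration, not the approach used in the thread: it assumes the scaffold marks its attachment point with a dummy atom (*) rather than an R1 label, that each substituent SMILES is written with its attaching atom first and carries no dummy of its own, and the helper name attach and the example SMILES are hypothetical. It uses Chem.ReplaceSubstructs to swap the dummy atom for each substituent.

```python
from rdkit import Chem

def attach(scaffold_smiles, substituent_smiles):
    """Replace the dummy atom (*) on the scaffold with the substituent.

    The substituent is bonded through its first atom (index 0).
    """
    scaffold = Chem.MolFromSmiles(scaffold_smiles)
    substituent = Chem.MolFromSmiles(substituent_smiles)
    dummy = Chem.MolFromSmarts('[#0]')  # matches atoms with atomic number 0
    products = Chem.ReplaceSubstructs(scaffold, dummy, substituent,
                                      replaceAll=True,
                                      replacementConnectionPoint=0)
    product = products[0]
    Chem.SanitizeMol(product)  # recompute valences and aromaticity
    return Chem.MolToSmiles(product)

# Hypothetical scaffold (benzene with one attachment point) and substituents.
scaffold = 'c1ccccc1*'
substituents = ['OC', 'N(C)C', 'C(F)(F)F']
library = [attach(scaffold, s) for s in substituents]
for smi in library:
    print(smi)
```

With 27 substituents the list comprehension simply gets longer; scaffolds with several distinct attachment points would need labeled dummies and more care than this sketch shows.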

Thanks
Brian Bennion





[Rdkit-discuss] [SOLVED?]RE: protonating proper tertiary amines

2016-09-01 Thread Bennion, Brian
Hello Greg,
I continually placed your test molecule throughout my workflow and narrowed it 
down to two lines.  The reaction product return value is a string that I then 
create a molecule from using the MolFromSmiles function.
I then AddHs to this molecule and to an exact copy of it (for use in other 
functions such as finding tertiary nitrogen atoms).
It would appear that adding hydrogens explicitly and then passing this molecule 
on to other functions causes problems with future manipulations, like adding 
additional protons.

I am still curious about the weird SMILES strings, but for now I don't AddHs 
to the copy of the molecule I send out for other functions.  This seems to 
solve the issue.
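
A small sketch of the distinction described above, using a made-up amine rather than the real compounds: Chem.AddHs returns a new molecule and leaves its argument unchanged, so one hydrogen-free molecule can be kept for substructure work while the hydrogen-added copy is used for 3D steps. The SMILES written from the hydrogen-added copy contains explicit [H] atoms, which may be where the odd-looking strings came from.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles('CCN(C)C')   # hypothetical example molecule
mol_h = Chem.AddHs(mol)               # new molecule; mol itself is untouched

print(mol.GetNumAtoms(), mol_h.GetNumAtoms())  # heavy atoms vs. all atoms
print(Chem.MolToSmiles(mol))          # compact SMILES, implicit hydrogens
print(Chem.MolToSmiles(mol_h))        # SMILES with explicit [H] atoms

# Removing the explicit hydrogens gives back the compact form.
roundtrip = Chem.RemoveHs(mol_h)
print(Chem.MolToSmiles(roundtrip))
```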

Thanks for the help.

brian


From: Greg Landrum [greg.land...@gmail.com]
Sent: Wednesday, August 31, 2016 7:49 PM
To: Bennion, Brian
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] protonating proper tertiary amines



On Thu, Sep 1, 2016 at 1:01 AM, Bennion, Brian <benni...@llnl.gov> wrote:

On one compute node with one thread, two reagents are combined in a synthesis 
function and then the tertNitrogenProt function is called and the current 
molecule is passed through to be searched for the tertiary nitrogen and 
protonated if found.

What I am not clear on, is whether the properties are passed properly from the 
synthesis function to the tertNitrogenProt function.

Yeah, given that the function works correctly when it's called on its own, I 
think the problem is likely to be in the way it's being called.

Question, what are the outputs of UpdatePropertyCache()?  When I test for 
output, only the word _none_ is printed.  I don't know if that means there were 
no properties present, or they did not need to be updated.

UpdatePropertyCache() doesn't return a value, that's why you're seeing None. 
The method just causes the implicit properties on each of the molecule's atoms 
to be re-computed, but it doesn't return anything.
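
As an illustration of the point above (a minimal sketch on trimethylamine, not the molecule from the thread): the call returns None, and its effect is only visible through the atoms afterwards.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles('CN(C)C')
nitrogen = mol.GetAtomWithIdx(1)
nitrogen.SetFormalCharge(1)

result = mol.UpdatePropertyCache()  # recomputes implicit valences in place
print(result)                       # the method has no return value

# After the update the charged nitrogen carries one implicit hydrogen.
print(nitrogen.GetTotalNumHs())
```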

-greg



Re: [Rdkit-discuss] protonating proper tertiary amines

2016-08-30 Thread Bennion, Brian
Hello Greg,

The source that I am using is shown below.  Also, I need to clarify that all this 
code is wrapped around the ParallelPython job control code.  It allows me to 
send each reaction to a separate CPU on my large clusters.

I have been able to use the steps in your email to check my RDKit install from 
the Python interpreter.
Next I manually input my compound as a SMILES string and performed your set of 
commands, and things work as expected.
However, when wrapped within the PP code, UpdatePropertyCache has no effect. 
 My only thought is that I have not properly passed the molecule between Python 
modules (not sure if that makes any sense).

This is the log output for one cycle of the code.  The SMILES string has been 
clipped to not reveal proprietary data.  The important thing here is that the 
formal charge is correctly assigned but that the implicit hydrogen atoms are 
not updated.

LOG
Tertiary nitrogen found in oxime:  ((5, 6, 7, 8),)
This is the symbol and charge for the tertiary nitrogen before:  N 0 
C(=O)N([H])C([H])([H])C([H])([H])C1(C([H])([H])N(C([H])([H])[H])C([H])([H])
This is the symbol and charge for the tertiary nitrogen after:  N 1
test3-10:  SANITIZE_NONE C14H27N3O3 C14H27N3O3+ C14H27N3O3+ 3


def tertNitrogenProt(molecule,molName1,w_sdf,w_smi):
    patt=rdkit.Chem.MolFromSmarts('[#6]-[#7]([#6])-[#6]')
    matches=molecule.GetSubstructMatches(patt)
    tertNHnum=0
    if matches:
        print("Tertiary nitrogen found in: %s" % (matches,))
        for i in matches:
            moleculeStrings=rdkit.Chem.MolToSmiles(molecule,isomericSmiles=True)
            atomSymbol9=molecule.GetAtomWithIdx(i[1]).GetSymbol()
            formalCharge9=molecule.GetAtomWithIdx(i[1]).GetFormalCharge()
            print("This is the symbol and charge for the tertiary nitrogen before: %s %s %s"
                  % (atomSymbol9,formalCharge9,moleculeStrings))
            # set the formal charge on the tertiary nitrogen to +1 (protonated)
            test7=rdkit.Chem.AllChem.CalcMolFormula(molecule)
            molecule.GetAtomWithIdx(i[1]).SetFormalCharge(1)
            atomSymbol9=molecule.GetAtomWithIdx(i[1]).GetSymbol()
            formalCharge9=molecule.GetAtomWithIdx(i[1]).GetFormalCharge()
            test8=rdkit.Chem.AllChem.CalcMolFormula(molecule)
            print("This is the symbol and charge for the tertiary nitrogen after: %s %s"
                  % (atomSymbol9,formalCharge9))
            # update property cache and check for nonsense
            molecule.UpdatePropertyCache()
            moleculeH=rdkit.Chem.AddHs(molecule)
            test3=rdkit.Chem.SanitizeMol(moleculeH)
            test9=rdkit.Chem.AllChem.CalcMolFormula(moleculeH)
            test10=moleculeH.GetAtomWithIdx(i[1]).GetDegree()
            print("test3-10: %s %s %s %s %s" % (test3,test7,test8,test9,test10))
            # start generating 3D coordinates and optimize the conformation
            rdkit.Chem.AllChem.EmbedMolecule(moleculeH)
            rdkit.Chem.AllChem.UFFOptimizeMolecule(moleculeH,1500)
            molName6=molName1+'NH+_'+str(tertNHnum)+'_XOH'
            # find molecular formal charge
            moleculeCharge=rdkit.Chem.GetFormalCharge(moleculeH)
            moleculeH.SetProp('i_user_TOTAL_CHARGE',repr(moleculeCharge))
            moleculeH.SetProp('_Name',molName6)
            w_sdf.write(moleculeH)
            w_smi.write(moleculeH)
            molName3=molName1+'NH+_'+str(tertNHnum)+'_XO'
            totalMolecules=oximeSubStructSearch(moleculeH,molName3,w_sdf,w_smi)
            tertNHnum += 1
    else:
        print("No tertiary nitrogen matches")
        return (molecule,tertNHnum)
    return (moleculeH,tertNHnum)
##



From: Greg Landrum [greg.land...@gmail.com]
Sent: Monday, August 29, 2016 10:41 PM
To: Bennion, Brian
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] protonating proper tertiary amines

Hi Brian,

On Tue, Aug 30, 2016 at 6:41 AM, Bennion, Brian 
<benni...@llnl.gov> wrote:

I have seemed to hit a wall with what seems like a simple task.

First, I have ~9800 compounds that have a primary amine for a reaction that I 
am completing in rdkit.
About 250 of those compounds have a tertiary alkylamine that is most likely 
protonated at pH 7.4.

The dataset is a set of smiles strings for which the tertiary amine is not 
protonated.   I thought this would be easy enough to fix, just use a smarts 
substructure search, set the formal charge on any hits to one and then AddHs, 
sanitize, embed, and then minimize.

Well, what I get is [N+] with all the other carbons with explicit atoms in the 
resulting smiles files, and if output to sdf I get a positively charged  
diradical positioned at the tertiary nitrogen.

Yes, what's happening here is that AddHs() is using the implicit valence on the 
N atoms to determine how many Hs to add. Since the implicit valence is not 
recomputed when you set the formal charge, you end up with the wrong number of 
Hs attached to the N. A
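
The explanation above suggests a workflow along the following lines. This is only a sketch under stated assumptions, not code from the thread: the function name protonate_tertiary_amines, the stricter SMARTS pattern (neutral, three-coordinate N with three sp3 carbon neighbours) and the example SMILES are all hypothetical choices made for illustration.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Assumed pattern: neutral tertiary alkylamine nitrogen (N is atom 0 of the match).
TERT_AMINE = Chem.MolFromSmarts('[NX3;H0;+0]([CX4])([CX4])[CX4]')

def protonate_tertiary_amines(smiles, seed=42):
    """Charge each tertiary alkylamine N, then add Hs and build 3D coordinates."""
    mol = Chem.MolFromSmiles(smiles)
    n_indices = sorted({match[0] for match in mol.GetSubstructMatches(TERT_AMINE)})
    for idx in n_indices:
        mol.GetAtomWithIdx(idx).SetFormalCharge(1)
    # Recompute implicit valences so AddHs sees the extra hydrogen on N+.
    mol.UpdatePropertyCache()
    Chem.SanitizeMol(mol)
    mol_h = Chem.AddHs(mol)
    # Embed and minimize; skip the optimization if embedding fails.
    if AllChem.EmbedMolecule(mol_h, randomSeed=seed) == 0:
        AllChem.UFFOptimizeMolecule(mol_h, maxIters=1500)
    return mol_h, n_indices

# Hypothetical input with one tertiary and one primary amine.
mol_h, sites = protonate_tertiary_amines('CN(C)CCN')
print(sites, Chem.GetFormalCharge(mol_h))
print(Chem.MolToSmiles(Chem.RemoveHs(mol_h)))
```

The key step is calling UpdatePropertyCache() after SetFormalCharge() and before AddHs(); the ordering of the other steps is a judgment call.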

[Rdkit-discuss] protonating proper tertiary amines

2016-08-29 Thread Bennion, Brian
Hello,

I have seemed to hit a wall with what seems like a simple task.

First, I have ~9800 compounds that have a primary amine for a reaction that I 
am completing in rdkit.
About 250 of those compounds have a tertiary alkylamine that is most likely 
protonated at pH 7.4.

The dataset is a set of smiles strings for which the tertiary amine is not 
protonated.   I thought this would be easy enough to fix, just use a smarts 
substructure search, set the formal charge on any hits to one and then AddHs, 
sanitize, embed, and then minimize.

Well, what I get is [N+] with all the other carbons with explicit atoms in the 
resulting smiles files, and if output to sdf I get a positively charged  
diradical positioned at the tertiary nitrogen.

Reading through the cookbook and this mailing list gave me no other preferred 
methods to protonate nitrogen.  There were some deprotonation examples.
Do I have to add the atom and create the bond manually?

If I have missed something then please point me to a link that I overlooked.

Thank you for such a great tool

Brian Bennion



Re: [Rdkit-discuss] Molecular properties + pickling

2016-03-19 Thread Bennion, Brian
This is not a bug nor a feature per se.
You need to wrap the compounds as PropertyMol objects,
i.e.

Ketone = [x for x in supplAlkylKetones if x is not None]
for i in range(len(Ketone)):
    Ketone[i] = PropertyMol(Ketone[i])
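
A self-contained sketch of the same idea, applied to the benzene example from the question (the variable names here are illustrative): wrapping the molecule in PropertyMol before pickling carries the properties through the round trip.

```python
import pickle

from rdkit import Chem
from rdkit.Chem.PropertyMol import PropertyMol

mol = Chem.MolFromSmiles('c1ccccc1')
mol.SetProp('aaa', '123')

# PropertyMol stores the properties alongside the binary molecule when pickled.
restored = pickle.loads(pickle.dumps(PropertyMol(mol)))
print(list(restored.GetPropNames()))
print(restored.GetProp('aaa'))
```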

From: Maciek Wójcikowski [mailto:mac...@wojcikowski.pl]
Sent: Friday, March 18, 2016 9:35 AM
To: RDKit Discuss
Subject: [Rdkit-discuss] Molecular properties + pickling

Hi all,

Is it a bug or am I doing something wrong - the properties are not passed 
during pickling in python. Here comes the example:

from rdkit import Chem
import pickle

mol = Chem.MolFromSmiles('c1ccccc1')
mol.SetProp('aaa', '123')
print(list(mol.GetPropNames())) # ['aaa']
mol2 = pickle.loads(pickle.dumps(mol))
print(list(mol2.GetPropNames())) # ['']


In [19]: rdkit.__version__
Out[19]: '2015.09.2'


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl


[Rdkit-discuss] compatibiltiy between parallel python and rdkit

2015-03-02 Thread Bennion, Brian
Hello,

I am relatively new to RDKit but have found it to be very useful in creating 
compounds with the reaction functions.

Currently my project involves taking a handful of scaffolds and creating 
shrubbery for these scaffolds based on available precursor compounds.  I found 
that creating more than a few hundred compounds takes much too long (I am not 
very patient).  It would be best if I could create 100K compounds in a few 
hours.  Again, I have a perfectly working serial script that does what I need.

That is when I found parallel python and spent a few days learning how to get 
the two projects to play with each other.

I am stuck at a point where I make a command:
molename=m.GetOpt(_Name)
and this stops the program with:

An error has occured during the function execution
Traceback (most recent call last):
  File /usr/lib/python2.7/site-packages/ppworker.py, line 90, in run
__result = __f(*__args)
  File string, line 31, in synthesize
KeyError: '_Name'

It seems that the parallel process does not know what this command is supposed 
to do.
The parallel process is started with this code.
for index in xrange(parts):
    starti = start+index*step
    endi = min(start+(index+1)*step, end)
    # Submit a job which will synthesize a number of compounds
    # synthesize - the function
    # give starting info to synthesize (hash, starti, endi) - tuple with arguments for synthesize
    # () - tuple with functions on which function synthesize depends
    # ("rdkit.Chem.AllChem",) - tuple with module names which must be imported before synthesize execution
    jobs.append(job_server.submit(synthesize, (ms1, ms2, rxnR1, rxnR23, oxime, starti, endi), (),
                ("rdkit","rdkit.Chem","rdkit.Chem.AllChem","rdkit.Chem.Draw",)))


Am I naive to think that these two Python modules can play together?
Or can I fix a mistake I have made somewhere?
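
One possible explanation, offered as a guess rather than a diagnosis: if the failing call is GetProp('_Name') (which is what the KeyError suggests) and the molecules reach the worker by being pickled, the properties may simply not survive the trip. The sketch below assumes an RDKit version that provides Chem.SetDefaultPickleProperties; the molecule and name are made up.

```python
import pickle

from rdkit import Chem

# Ask RDKit to include all properties (private ones such as _Name too)
# whenever a molecule is pickled in this process.
Chem.SetDefaultPickleProperties(Chem.PropertyPickleOptions.AllProps)

m = Chem.MolFromSmiles('CCO')
m.SetProp('_Name', 'ethanol')

# Simulate sending the molecule to a worker process and back.
copy = pickle.loads(pickle.dumps(m))
print(copy.GetProp('_Name'))
```

On RDKit versions without this setting, wrapping each molecule in rdkit.Chem.PropertyMol.PropertyMol before submitting the job is the usual alternative.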


Brian

