Re: [Rdkit-discuss] changes in chirality in rdkit?
Ok Greg, thank you for the sneak preview.

Brian

From: Greg Landrum
Sent: Tuesday, July 14, 2020 11:36 PM
To: Bennion, Brian
Cc: Rafal Roszak; RDKit Discuss
Subject: Re: [Rdkit-discuss] changes in chirality in rdkit?

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] changes in chirality in rdkit?
Hi Brian,

I think you're misinterpreting the drawings. Those two images look like they correspond to the same molecule.

The easiest way to check things like this without having to interpret drawings is to use Chem.FindMolChiralCenters, which will show you absolute stereo labels for all stereoatoms:

    In [2]: m1 = Chem.MolFromSmiles('OC[C@@H]1O[C@H](Cn2c(N3CCCC3)nnc2[C@@H]2OCCC2)CC1')
    In [3]: m2 = Chem.MolFromSmiles('OC[C@H]1CC[C@@H](Cn2c([C@H]3CCCO3)nnc2N2CCCC2)O1')
    In [4]: Chem.FindMolChiralCenters(m1)
    Out[4]: [(2, 'R'), (4, 'S'), (16, 'R')]
    In [5]: Chem.FindMolChiralCenters(m2)
    Out[5]: [(2, 'R'), (5, 'S'), (9, 'R')]

Here's the substructure mapping between those molecules:

    In [16]: match = m1.GetSubstructMatch(m2)
    In [17]: match[2]
    Out[17]: 2
    In [18]: match[5]
    Out[18]: 4
    In [19]: match[9]
    Out[19]: 16

The "R" and "S" labels that function produces are not necessarily correct according to IUPAC rules (though in this case they are), but they are consistently calculated.

As a preview: the 2020.09 RDKit release will include a new CIP calculator using the algorithm described in this paper https://pubs.acs.org/doi/10.1021/acs.jcim.8b00324. The new code, which I think will be quite helpful for people who need CIP labels, was implemented by Ricardo Rodriguez Schmidt at Schrodinger and derived from John Mayfield's Java implementation (https://github.com/SiMolecule/centres). Here's what it says about your molecules:

    In [6]: from rdkit.Chem import rdCIPLabeler
    In [7]: rdCIPLabeler.AssignCIPLabels(m1)
    In [8]: rdCIPLabeler.AssignCIPLabels(m2)
    In [9]: [(i,x.GetProp("_CIPCode")) for i,x in enumerate(m1.GetAtoms()) if x.HasProp('_CIPCode')]
    Out[9]: [(2, 'R'), (4, 'S'), (16, 'R')]
    In [10]: [(i,x.GetProp("_CIPCode")) for i,x in enumerate(m2.GetAtoms()) if x.HasProp('_CIPCode')]
    Out[10]: [(2, 'R'), (5, 'S'), (9, 'R')]

Best,
-greg

On Tue, Jul 14, 2020 at 6:36 PM Bennion, Brian via Rdkit-discuss <rdkit-discuss@lists.sourceforge.net> wrote:
> The tetrahydrofuran with the methoxy group has inverted stereochemistry of
> its substituents.
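The label-and-mapping check described in the message above can be wrapped into one helper. This is only a sketch: the function name mapped_stereo_labels is made up for illustration, and the two SMILES are written with the pyrrolidine ring spelled out (N3CCCC3 / N2CCCC2), which is an assumption based on the atom indices reported in the thread. It relies on GetSubstructMatch returning, for each atom of the query, the index of the matching atom in the target.

```python
from rdkit import Chem

# Assumption: pyrrolidine ring written out in full (see the note above).
ORIGINAL = 'OC[C@@H]1O[C@H](Cn2c(N3CCCC3)nnc2[C@@H]2OCCC2)CC1'
CONVERTED = 'OC[C@H]1CC[C@@H](Cn2c([C@H]3CCCO3)nnc2N2CCCC2)O1'


def mapped_stereo_labels(smiles_a, smiles_b):
    """Pair up the stereo labels of two SMILES with the same connectivity.

    Returns (index in a, index in b, label in a, label in b) tuples.
    """
    mol_a = Chem.MolFromSmiles(smiles_a)
    mol_b = Chem.MolFromSmiles(smiles_b)
    if mol_a is None or mol_b is None:
        raise ValueError('could not parse one of the SMILES strings')
    if mol_a.GetNumAtoms() != mol_b.GetNumAtoms():
        raise ValueError('molecules have different numbers of atoms')

    # match[i] is the index in mol_a of the atom that matches atom i of mol_b
    match = mol_a.GetSubstructMatch(mol_b)
    if not match:
        raise ValueError('the two molecules do not have the same connectivity')

    labels_a = dict(Chem.FindMolChiralCenters(mol_a))
    labels_b = dict(Chem.FindMolChiralCenters(mol_b))
    return [(match[idx_b], idx_b, labels_a.get(match[idx_b]), label_b)
            for idx_b, label_b in sorted(labels_b.items())]


rows = mapped_stereo_labels(ORIGINAL, CONVERTED)
for idx_a, idx_b, label_a, label_b in rows:
    status = 'same' if label_a == label_b else 'DIFFERENT'
    print(f'atom {idx_a} <-> atom {idx_b}: {label_a} vs {label_b} ({status})')
```

GetSubstructMatch returns a single mapping, so for a molecule with symmetry-equivalent stereocentres the pairing may not be unique; for the molecule in this thread the three centres have distinct environments.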
Re: [Rdkit-discuss] changes in chirality in rdkit?
Hello Rafal,

Nice to see you on this forum. I completely expect reordering of SMILES strings from program to program. This is why I like to convert them to images.

The tetrahydrofuran with the methoxy group has inverted stereochemistry of its substituents.

The original string is shown in the first image.

[image001.png]

The second string after RDKit processing is shown in the second image here.

[image002.png]

-----Original Message-----
From: Rafal Roszak
Sent: Tuesday, July 14, 2020 1:25 AM
To: Bennion, Brian
Cc: Bennion, Brian via Rdkit-discuss
Subject: Re: [Rdkit-discuss] changes in chirality in rdkit?
Re: [Rdkit-discuss] changes in chirality in rdkit?
Hello Brian,

> The original smiles string
> "OC[C@@H]1O[C@H](Cn2c(N3CCCC3)nnc2[C@@H]2OCCC2)CC1"
>
> after conversion with rdkit
> "OC[C@H]1CC[C@@H](Cn2c([C@H]3CCCO3)nnc2N2CCCC2)O1"

After visualisation it seems to me that both SMILES represent the same structure (the stereochemistry is the same, only the orientation of the molecule is different). Canonical SMILES from RDKit are not always the same as canonical SMILES from other programs. If you want to preserve the atom order, you can try the option canonical=False. See the example below:

    >>> mol1=Chem.MolFromSmiles('OC[C@@H]1O[C@H](Cn2c(N3CCCC3)nnc2[C@@H]2OCCC2)CC1')
    >>> mol2=Chem.MolFromSmiles('OC[C@H]1CC[C@@H](Cn2c([C@H]3CCCO3)nnc2N2CCCC2)O1')
    >>> Chem.MolToSmiles(mol1)
    'OC[C@H]1CC[C@@H](Cn2c([C@H]3CCCO3)nnc2N2CCCC2)O1'
    >>> Chem.MolToSmiles(mol2)
    'OC[C@H]1CC[C@@H](Cn2c([C@H]3CCCO3)nnc2N2CCCC2)O1'

The canonical SMILES for both inputs are the same (above), but without canonicalisation you will get different SMILES:

    >>> Chem.MolToSmiles(mol1, canonical=False)
    'OC[C@@H]1O[C@H](Cn2c(N3CCCC3)nnc2[C@@H]2OCCC2)CC1'
    >>> Chem.MolToSmiles(mol2, canonical=False)
    'OC[C@H]1CC[C@@H](Cn2c([C@H]3CCCO3)nnc2N2CCCC2)O1'

Best,

Rafal
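The comparison in this message can be turned into a small reusable check. The sketch below assumes the SMILES with the pyrrolidine ring written out as N3CCCC3 / N2CCCC2, and the helper names same_structure and input_order_smiles are hypothetical, chosen only for this example. Two inputs count as the same structure when their canonical isomeric SMILES are identical; canonical=False is used only to show the input atom order being kept.

```python
from rdkit import Chem

# Assumption: pyrrolidine ring written out in full (see the note above).
ORIGINAL = 'OC[C@@H]1O[C@H](Cn2c(N3CCCC3)nnc2[C@@H]2OCCC2)CC1'
CONVERTED = 'OC[C@H]1CC[C@@H](Cn2c([C@H]3CCCO3)nnc2N2CCCC2)O1'


def _parse(smiles):
    """Parse a SMILES string, raising instead of returning None on failure."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f'could not parse SMILES: {smiles}')
    return mol


def same_structure(smiles_a, smiles_b):
    """True if both inputs give the same canonical isomeric SMILES."""
    return Chem.MolToSmiles(_parse(smiles_a)) == Chem.MolToSmiles(_parse(smiles_b))


def input_order_smiles(smiles):
    """Write the molecule back out without reordering its atoms."""
    return Chem.MolToSmiles(_parse(smiles), canonical=False)


print(same_structure(ORIGINAL, CONVERTED))
print(input_order_smiles(ORIGINAL))
print(input_order_smiles(CONVERTED))
```

Comparing canonical strings only makes sense when both are produced by the same toolkit, since canonical SMILES from different programs generally differ.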
[Rdkit-discuss] changes in chirality in rdkit?
Hello,

I am "translating" SMILES strings output in a CSV file from another program into RDKit canonical strings with this code. If there is something that I am doing incorrectly I would appreciate the input.

Thanks,
Brian Bennion

The original SMILES string:
"OC[C@@H]1O[C@H](Cn2c(N3CCCC3)nnc2[C@@H]2OCCC2)CC1"

After conversion with RDKit:
"OC[C@H]1CC[C@@H](Cn2c([C@H]3CCCO3)nnc2N2CCCC2)O1"

My code is below.

    import re
    from rdkit import Chem

    # inFile and rdkitsmichiout are file objects opened earlier in the script.
    # Matches bracket atoms written charge-first, e.g. [N+H2].
    protn_pat = re.compile(r'\[([IBnN])\+(@*)(H[1234]*)*\]')

    # count is the 1-based number of the line being processed
    for count, line in enumerate(inFile, start=1):
        fields = line.replace('","', ' ').split()
        mol_name = fields[2]
        molMOE = fields[3].replace('"', '')
        mol1check = protn_pat.search(molMOE)
        if mol1check is not None:
            print("Found crazy MOE string", mol1check, molMOE)
            # SMILES bracket-atom order: symbol, chirality, H count, charge
            mol1 = protn_pat.sub(r'[\1\2\3+]', molMOE)
        else:
            mol1 = molMOE
        # MolFromSmiles returns None (it does not raise) on a parse failure
        mol = Chem.MolFromSmiles(mol1)
        if mol is None:
            print('mol failed:' + molMOE + ' ' + mol1 + ' ' + str(count))
        else:
            inchi = Chem.inchi.MolToInchi(mol, options='/FixedH')
            rdkitsmichiout.write('"' + Chem.MolToSmiles(mol, isomericSmiles=True) + '",')
            rdkitsmichiout.write('"' + inchi + '",')
            rdkitsmichiout.write('"' + Chem.inchi.InchiToInchiKey(inchi) + '"\n')
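To see what the regular expression in the post does on its own, here is a small sketch that needs only the standard library. The input strings are hypothetical, made up for illustration (the thread does not show an actual string from the other program), and the helper name normalize_charge_first is likewise an assumption. The replacement writes the fields in the order SMILES expects inside brackets: symbol, chirality, hydrogen count, charge.

```python
import re

# Pattern from the post: bracket atoms for I, B, n or N written charge-first.
PROTN_PAT = re.compile(r'\[([IBnN])\+(@*)(H[1234]*)*\]')


def normalize_charge_first(smiles):
    """Rewrite e.g. [N+H3] as [NH3+]; other atoms are left untouched."""
    # \1 = element symbol, \2 = chirality marks, \3 = hydrogen count
    return PROTN_PAT.sub(r'[\1\2\3+]', smiles)


# Hypothetical inputs, not taken from the thread.
examples = ['C[N+H3]', 'c1cc[n+H]cc1', 'C[N+](C)(C)C', 'CCO']
for smi in examples:
    print(smi, '->', normalize_charge_first(smi))
```

Strings that already follow the standard order, or that contain no charged bracket atom, pass through unchanged, so the substitution can be applied to every input without first calling search.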