Re: [Rdkit-discuss] Stereochemistry - Differences between RDKit Indigo
Hehe, that is why I keep my computers always really cold when I run RDKit ... - | Markus Sitzmann | markus.sitzm...@gmail.com On 20.08.2015, at 04:33, Peter Shenkin shen...@gmail.com wrote: Maybe when you have a toolkit as blazingly fast as RDKit it captures the chirality of N center before it has time to interconvert -P. On Wed, Aug 19, 2015 at 10:17 PM, John M john.wilkinson...@gmail.com wrote: More odd is the carbon stereocentre with two methyls... Generally trivalent nitrogens are not considered chiral due to inversion of the lone-pair. The two usual exceptions are when they are a bridgehead or in a tight ring (cyclopropane). This is the same in most toolkits, the InChI technical documentation provides useful examples. InChI actually only sees one stereo centre since it strips the proton off: InChI=1S/C13H26N2/c1-4-14-8-5-12(6-9-14)15-10-7-13(15)11(2)3/h11-13H,4-10H2,1-3H3/p+1/t13-/m1/s1 It may well be chiral in this case but since it's not you should also strictly remove the other stereocentre in the para position to the nitrogen For the record just tested and ChemAxon/CDK/OpenBabel do the same. John Regards, John W May john.wilkinson...@gmail.com On 19 August 2015 at 09:00, Rob Smith robtsm...@gmail.com wrote: Dear RDKit community, I'm trying to use RDKit to read in Corina generated stereoisomers (from a Mol file), assign chiral tags and stereochemistry to the structure and output the canonical smiles string for each isomer of a given molecule (in Python), when I do this, half the canonical smiles strings are not unique. When I read in the output from Corina into an Indigo instance, then use the canonical smiles from Indigo to create an RDKit molecule, canonical smiles strings generated from the molecule objects are all unique. I may be missing an option to enable RDKit to 'visualise' the chiral centre adjacent to the protonated nitrogen, so if someone can spot where I've made a mistake, I'd really appreciate it. I've included the output and Python script below. If you require any further information, please let me know. Many thanks, Rob Output: RDKit Read in of Molecule RDKit Output - CCN1CC[C@@H]([N@@H+]2CC[C@@H]2[C@H](C)C)CC1 RDKit Output - CCN1CC[C@@H]([N@@H+]2CC[C@@H]2[C@H](C)C)CC1 RDKit Output - CCN1CC[C@@H]([N@H+]2CC[C@@H]2[C@H](C)C)CC1 RDKit Output - CCN1CC[C@@H]([N@H+]2CC[C@@H]2[C@H](C)C)CC1 RDKit Output - CCN1CC[C@@H]([N@@H+]2CC[C@H]2[C@H](C)C)CC1 RDKit Output - CCN1CC[C@@H]([N@@H+]2CC[C@H]2[C@H](C)C)CC1 RDKit Output - CCN1CC[C@@H]([N@H+]2CC[C@H]2[C@H](C)C)CC1 RDKit Output - CCN1CC[C@@H]([N@H+]2CC[C@H]2[C@H](C)C)CC1 INDIGO Read in of Molecule RDKit Output - CC[N@]1CC[C@@H]([N@@H+]2CC[C@@H]2C(C)C)CC1 RDKit Output - CC[N@]1CC[C@H]([N@@H+]2CC[C@@H]2C(C)C)CC1 RDKit Output - CC[N@]1CC[C@@H]([N@H+]2CC[C@@H]2C(C)C)CC1 RDKit Output - CC[N@]1CC[C@H]([N@H+]2CC[C@@H]2C(C)C)CC1 RDKit Output - CC[N@]1CC[C@@H]([N@@H+]2CC[C@H]2C(C)C)CC1 RDKit Output - CC[N@]1CC[C@H]([N@@H+]2CC[C@H]2C(C)C)CC1 RDKit Output - CC[N@]1CC[C@@H]([N@H+]2CC[C@H]2C(C)C)CC1 RDKit Output - CC[N@]1CC[C@H]([N@H+]2CC[C@H]2C(C)C)CC1 Python script : from rdkit import Chem import subprocess # Used to run Corina from indigo import * def runCorinaTest(inputMol): indigo = Indigo() molFile = Chem.MolToMolBlock(inputMol) corinaCommand = echo \' + molFile + \' | # Then Corina - generate stereoisomers... corinaCommand = corinaCommand + /apps/corina/corina -t n -d canon,stergen,preserve,names,wh,flapn,msc=7,msi=128 -i t=sdf corinaResult = subprocess.check_output([corinaCommand], shell=True) # Gives the stereoisomer species as an SDF string allMoleculeObjects = [] allMolecules = corinaResult.split(\n) # Separate Corina output into individual molecules allMolecules = allMolecules[0:len(allMolecules)-1] print(RDKit Read in of Molecule) for eachMolecule in allMolecules: eachMolecule = eachMolecule + \n mol = Chem.MolFromMolBlock(eachMolecule, sanitize=True, removeHs=True, strictParsing=False) Chem.rdmolops.AssignAtomChiralTagsFromStructure(mol, replaceExistingTags=True) Chem.rdmolops.AssignStereochemistry(mol) print(RDKit Output - + Chem.MolToSmiles(mol, isomericSmiles=True)) print(INDIGO Read in of Molecule) for eachMolecule in allMolecules: eachMolecule = eachMolecule + \n mol = indigo.loadMolecule(eachMolecule) # print(Indigo Output - + mol.canonicalSmiles()) # Use Indigo Canonical Smiles to create RDKit molecule mol = Chem.MolFromSmiles(mol.canonicalSmiles()) if mol is not None: print(RDKit Output - + Chem.MolToSmiles(mol, isomericSmiles=True)) return 0 mol = Chem.MolFromSmiles(CC(C)C1[NH+](C2CCN(CC)CC2)CC1) z = runCorinaTest(mol)
Re: [Rdkit-discuss] Stereochemistry - Differences between RDKit Indigo
This isn't a simple one, so it may take a bit to get to an answer that's comprehensible. There are two things going on here in the RDKit: 1) Ring stereochemistry 2) stereochemistry about nitrogen centers Let's start with the second, because it's easier: RDKit does not generally believe in stereochemistry around three coordinate nitrogens. Here's a very simple example: In [45]: m3 = Chem.MolFromSmiles('Br[N@](F)Cl') In [46]: Chem.MolToSmiles(m3,isomericSmiles=True) Out[46]: 'FN(Cl)Br' The 3D equivalent of that: In [41]: m = Chem.MolFromSmiles('BrN(F)Cl') In [42]: AllChem.EmbedMolecule(m) Out[42]: 0 In [43]: Chem.AssignAtomChiralTagsFromStructure(m) In [44]: Chem.MolToSmiles(m,isomericSmiles=True) Out[44]: 'FN(Cl)Br' Contrast this with what you get for a carbon: In [34]: m2 = Chem.MolFromSmiles('FC(Br)(Cl)I') In [35]: AllChem.EmbedMolecule(m2) Out[35]: 0 In [36]: Chem.AssignAtomChiralTagsFromStructure(m2) In [37]: Chem.MolToSmiles(m2,isomericSmiles=True) Out[37]: 'F[C@](Cl)(Br)I' Back to the first: ring stereochemistry. By this I mean things like C[C@H ]1CC[C@@H](C)CC1 - molecules where the stereochemistry information is really about whether the substituents of the ring are cis or trans relative to the ring plane. The way the RDKit handles this is something of a hack: it doesn't identify those atoms as chiral centers, but it does preserve the chiral tags when generating a canonical SMILES: In [47]: m = Chem.MolFromSmiles('C[C@H]1CC[C@@H](C)CC1') In [48]: Chem.FindMolChiralCenters(m) Out[48]: [] In [49]: Chem.MolToSmiles(m,isomericSmiles=True) Out[49]: 'C[C@H]1CC[C@@H](C)CC1' Curiously, to me at least, it does the same thing with nitrogens; In [52]: m2 = Chem.MolFromSmiles('C[N@@]1CC[C@@H](C)CC1') In [53]: Chem.MolToSmiles(m2,isomericSmiles=True) Out[53]: 'C[C@H]1CC[N@](C)CC1' Lest anyone think that this might make sense because being a ring makes inversion more difficult, that's not what is going on here. If I make the ring truly chiral, then the stereochemistry of the N is removed: In [54]: m3 = Chem.MolFromSmiles('C[N@@]1CO[C@@H](C)CC1') In [55]: Chem.MolToSmiles(m3,isomericSmiles=True) Out[55]: 'C[C@H]1CCN(C)CO1' I believe that this inconsistent behavior is a bug: either N should always have the input stereochemistry preserved (and that should be perceived from the 3D coordinates) or it should never have the input stereochemistry preserved. My initial answer, and I would love input on this, is that three-coordinate N should always have stereochemistry removed. -greg On Thu, Aug 20, 2015 at 2:22 PM, Rob Smith robtsm...@gmail.com wrote: Hi Greg, I've attached the SDF that Corina generates. I'm not convinced it is a problem, more an observation that I'm trying to understand. Looking at the results again today - it seems that from the Corina output Indigo is interpreting the conformer (including whether the ethyl substituent on the piperidine nitrogen is equatorial or axial) - and outputting a canonical smiles string that has the conformer encoded in it (using the chiral flags). Whereas RDKit is reading in the Corina output, discounting whether the nitrogen is axial or equatorial (which due to inversion I can understand) and interpreting it as having only two chiral centers (which is correct). What is confusing me, is that when I supply RDKit with the canonical smiles string from Indigo (which has the conformer encoded in it), and then ask for the isomeric canonical smiles, it supplies the canonical smiles with the conformer still encoded within it. For example, I read in the following canonical smiles string into RDKit: CCN1CC[C@@H]([N@@H+]2CC[C@@H]2[C@H](C)C)CC1 (which was generated by reading in one of the mols in the SD File into RDKit and output the isomeric canonical smiles), running the FindMolChiralCenters on this molecule, correctly reports the number of chiral centres to be 2 (6S, 9R), and then asking it to output the canonical smiles string (with isomericSmiles=True) gives CCN1CCC([N@@H+]2CC[C@@H]2C(C)C)CC1 (1). If I take the same mol file, read it into Indigo, and ask it to output the canonical smiles string, I get: CC(C)[C@H]1CC[N@H+]1[C@@H]1CC[N@@](CC1)CC, if I read this smiles string into RDKit and run FindMolCenters on it, I get (3R, 6S) - which is fine, if I then out the canonical smiles (again with isomericSmiles=True) I get CC[N@]1CC[C@@H]([N@@H+]2CC[C@@H]2C(C)C)CC1. I expected this isomeric canonical smiles to be the same as (1), however RDKit appears to conserve the conformer representation given to it from an isomeric smiles string, but when reading a Mol file doesn't keep all conformer information (axial or equatorial substituents on a nitrogen). Thanks to all for your quick (and quick witted) responses Rob On Thu, Aug 20, 2015 at 3:46 AM, Greg Landrum greg.land...@gmail.com wrote: Hi Rob, The results below are quite strange. As John has already pointed out: there really shouldn't be chirality present on
Re: [Rdkit-discuss] Stereochemistry - Differences between RDKit Indigo
I agree with remove - the chance that you destroy actual information by this is low - or in other words, the chance that steroinformation on three-coordinate N is spurious I would expect as high. Markus On Thu, Aug 20, 2015 at 4:30 PM, Greg Landrum greg.land...@gmail.com wrote: This isn't a simple one, so it may take a bit to get to an answer that's comprehensible. There are two things going on here in the RDKit: 1) Ring stereochemistry 2) stereochemistry about nitrogen centers Let's start with the second, because it's easier: RDKit does not generally believe in stereochemistry around three coordinate nitrogens. Here's a very simple example: In [45]: m3 = Chem.MolFromSmiles('Br[N@](F)Cl') In [46]: Chem.MolToSmiles(m3,isomericSmiles=True) Out[46]: 'FN(Cl)Br' The 3D equivalent of that: In [41]: m = Chem.MolFromSmiles('BrN(F)Cl') In [42]: AllChem.EmbedMolecule(m) Out[42]: 0 In [43]: Chem.AssignAtomChiralTagsFromStructure(m) In [44]: Chem.MolToSmiles(m,isomericSmiles=True) Out[44]: 'FN(Cl)Br' Contrast this with what you get for a carbon: In [34]: m2 = Chem.MolFromSmiles('FC(Br)(Cl)I') In [35]: AllChem.EmbedMolecule(m2) Out[35]: 0 In [36]: Chem.AssignAtomChiralTagsFromStructure(m2) In [37]: Chem.MolToSmiles(m2,isomericSmiles=True) Out[37]: 'F[C@](Cl)(Br)I' Back to the first: ring stereochemistry. By this I mean things like C[C@H]1CC[C@@H](C)CC1 - molecules where the stereochemistry information is really about whether the substituents of the ring are cis or trans relative to the ring plane. The way the RDKit handles this is something of a hack: it doesn't identify those atoms as chiral centers, but it does preserve the chiral tags when generating a canonical SMILES: In [47]: m = Chem.MolFromSmiles('C[C@H]1CC[C@@H](C)CC1') In [48]: Chem.FindMolChiralCenters(m) Out[48]: [] In [49]: Chem.MolToSmiles(m,isomericSmiles=True) Out[49]: 'C[C@H]1CC[C@@H](C)CC1' Curiously, to me at least, it does the same thing with nitrogens; In [52]: m2 = Chem.MolFromSmiles('C[N@@]1CC[C@@H](C)CC1') In [53]: Chem.MolToSmiles(m2,isomericSmiles=True) Out[53]: 'C[C@H]1CC[N@](C)CC1' Lest anyone think that this might make sense because being a ring makes inversion more difficult, that's not what is going on here. If I make the ring truly chiral, then the stereochemistry of the N is removed: In [54]: m3 = Chem.MolFromSmiles('C[N@@]1CO[C@@H](C)CC1') In [55]: Chem.MolToSmiles(m3,isomericSmiles=True) Out[55]: 'C[C@H]1CCN(C)CO1' I believe that this inconsistent behavior is a bug: either N should always have the input stereochemistry preserved (and that should be perceived from the 3D coordinates) or it should never have the input stereochemistry preserved. My initial answer, and I would love input on this, is that three-coordinate N should always have stereochemistry removed. -greg On Thu, Aug 20, 2015 at 2:22 PM, Rob Smith robtsm...@gmail.com wrote: Hi Greg, I've attached the SDF that Corina generates. I'm not convinced it is a problem, more an observation that I'm trying to understand. Looking at the results again today - it seems that from the Corina output Indigo is interpreting the conformer (including whether the ethyl substituent on the piperidine nitrogen is equatorial or axial) - and outputting a canonical smiles string that has the conformer encoded in it (using the chiral flags). Whereas RDKit is reading in the Corina output, discounting whether the nitrogen is axial or equatorial (which due to inversion I can understand) and interpreting it as having only two chiral centers (which is correct). What is confusing me, is that when I supply RDKit with the canonical smiles string from Indigo (which has the conformer encoded in it), and then ask for the isomeric canonical smiles, it supplies the canonical smiles with the conformer still encoded within it. For example, I read in the following canonical smiles string into RDKit: CCN1CC[C@@H]([N@@H+]2CC[C@@H]2[C@H](C)C)CC1 (which was generated by reading in one of the mols in the SD File into RDKit and output the isomeric canonical smiles), running the FindMolChiralCenters on this molecule, correctly reports the number of chiral centres to be 2 (6S, 9R), and then asking it to output the canonical smiles string (with isomericSmiles=True) gives CCN1CCC([N@@H+]2CC[C@@H]2C(C)C)CC1 (1). If I take the same mol file, read it into Indigo, and ask it to output the canonical smiles string, I get: CC(C)[C@H]1CC[N@H+]1[C@@H]1CC[N@@](CC1)CC, if I read this smiles string into RDKit and run FindMolCenters on it, I get (3R, 6S) - which is fine, if I then out the canonical smiles (again with isomericSmiles=True) I get CC[N@]1CC[C@@H]([N@@H+]2CC[C@@H]2C(C)C)CC1. I expected this isomeric canonical smiles to be the same as (1), however RDKit appears to conserve the conformer representation given to it from an isomeric smiles string, but when reading a Mol file doesn't keep all conformer