On Feb 7, 2017, at 01:17, Curt Fischer <curt.r.fisc...@gmail.com> wrote:
> I am confused by this behavior:
> >>> labeled_etoh = Chem.MolFromSmiles('C[13C]O')
> >>> print(Chem.MolToSmiles(labeled_etoh))
> C[C]O
> >>> print(Chem.MolToSmiles(labeled_etoh, isomericSmiles=True))
> C[13C]O
> 1. Why are there any brackets at all in the first output?  Why not just 'CCO'?

The middle atom in "CCO" has two hydrogens. The middle atom in "C[C]O" has no 
hydrogens. The brackets are needed to record that difference.

> 2. Is there any documentation anywhere that the "isomericSmiles" argument is 
> also an "isotopicSmiles" argument?

I don't believe so. A search via DuckDuckGo of rdkit.org finds only two 
irrelevant matches.

> I am also confused about when Chem.MolToSmiles() puts in H atoms in the 
> output.

SMILES has a short-hand notation to represent hydrogens. "[CH4]" and "C" are 
both methane.

When an atom is described using brackets, the number of hydrogens must be 
specified with the H<n> notation.

When an atom is described without brackets, the number of hydrogens is 
based on the permitted valence values. C has a valence of 4 and -C- has two 
single bonds, so the middle carbon of CCO has two hydrogens to complete the 
valence.

The output mechanism prefers to use the short-hand notation if possible. That 
isn't possible if the sum of hydrogens and bond orders doesn't match any of 
the permitted valences, or if the atom has an isotope, charge, chirality, 
etc., which requires the use of []s.
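
As a rough illustration of the two rules above, here is a minimal sketch in 
plain Python. It is not RDKit's implementation. The function names and the 
valence table are my own assumptions (the table follows the usual "organic 
subset" valences), and it only handles non-aromatic atoms with whole-number 
bond orders.

```python
# Simplified model of the short-hand rule; NOT RDKit's actual code.
# Names (NORMAL_VALENCES, implicit_h_count, needs_brackets) are hypothetical.

# Permitted valences for atoms that may be written without brackets.
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_h_count(symbol, bond_order_sum):
    """Hydrogens implied for an unbracketed atom.

    bond_order_sum counts only the bonds written in the SMILES,
    not bonds to hydrogens.
    """
    for valence in NORMAL_VALENCES[symbol]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0  # bond order sum already exceeds every permitted valence


def needs_brackets(symbol, bond_order_sum, h_count,
                   isotope=None, charge=0, chiral=False):
    """True if the atom cannot be written in the short-hand form."""
    if symbol not in NORMAL_VALENCES:
        return True
    if isotope is not None or charge != 0 or chiral:
        return True
    # Short-hand only works if the implied hydrogens match the real count.
    return h_count != implicit_h_count(symbol, bond_order_sum)


print(implicit_h_count("C", 2))                        # middle atom of CCO
print(needs_brackets("C", 2, h_count=2))               # CCO
print(needs_brackets("C", 2, h_count=0))               # C[C]O
print(needs_brackets("C", 2, h_count=2, isotope=13))   # C[13CH2]O
```

The middle carbon of CCO gets two implied hydrogens and needs no brackets. 
The same carbon with zero hydrogens, or with an isotope label, does.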

> >>> three_hb1 = Chem.MolFromSmiles('C[13CH](O)C[13C](=O)O')
> >>> three_hb2 = Chem.MolFromSmiles('C[13C](O)C[13C](=O)O')
> >>> print(Chem.MolToSmiles(three_hb1, isomericSmiles=True))
> C[13CH](O)C[13C](=O)O
> >>> print(Chem.MolToSmiles(three_hb2, isomericSmiles=True))
> C[13C](O)C[13C](=O)O
> >>> print(Chem.MolToSmiles(three_hb1, isomericSmiles=False))
> CC(O)CC(=O)O
> >>> print(Chem.MolToSmiles(three_hb2, isomericSmiles=False))
> C[C](O)CC(=O)O
> 3. Why are there no brackets for three_hb1 output, but there are for 
> three_hb2?

I think you mean "for the isomericSmiles=False" output? The first three_hb1 
output has brackets.

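The isotope notation requires []s, so the option of using the short-hand 
notation doesn't exist when the isotope is written. In that case the number 
of hydrogens must be specified, as otherwise it means the atom has no 
hydrogens. With isomericSmiles=False the isotope is dropped. The second atom 
of three_hb1 (one hydrogen plus three single bonds) then matches the valence 
of 4 and can be written without brackets. The second atom of three_hb2 (no 
hydrogens plus three single bonds) does not, so it keeps its brackets.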

> 4. As far as I can tell, the two three_hb molecules are identical.   Why 
> aren't all Hs removed during canonicalization?

The second atom in three_hb1 has 1 hydrogen and three single bonds.

The second atom in three_hb2 has 0 hydrogens and three single bonds.

They are different structures so have different SMILES.
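
To make the difference concrete, here is a small sketch that reads the 
hydrogen count out of a single bracket atom. It is plain Python with a 
simplified regular expression of my own, not RDKit's parser, and the name 
bracket_h_count is hypothetical. It covers only isotope, symbol, chirality 
marks (@ or @@), H count and charge.

```python
import re

# Simplified bracket-atom pattern; an assumption for illustration only.
BRACKET_ATOM = re.compile(
    r"\[(?P<isotope>\d+)?"
    r"(?P<symbol>[A-Z][a-z]?|[a-z]{1,2})"
    r"(?P<chiral>@{1,2})?"
    r"(?P<hcount>H\d?)?"
    r"(?P<charge>[+-]\d?)?\]"
)


def bracket_h_count(atom_text):
    """Return the hydrogen count stated inside one bracket atom."""
    match = BRACKET_ATOM.fullmatch(atom_text)
    if match is None:
        raise ValueError("not a supported bracket atom: %r" % (atom_text,))
    hcount = match.group("hcount")
    if hcount is None:
        return 0          # no H term means no hydrogens
    if hcount == "H":
        return 1          # a bare H means exactly one
    return int(hcount[1:])


print(bracket_h_count("[13CH]"))  # second atom of three_hb1
print(bracket_h_count("[13C]"))   # second atom of three_hb2
print(bracket_h_count("[CH4]"))   # methane
```

The two labeled atoms give 1 and 0, which is why the two input SMILES 
describe different structures.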



Rdkit-discuss mailing list
