On Feb 7, 2017, at 01:17, Curt Fischer <curt.r.fisc...@gmail.com> wrote:
> I am confused by this behavior:
>
> >>> labeled_etoh = Chem.MolFromSmiles('C[13C]O')
> >>> print(Chem.MolToSmiles(labeled_etoh))
>
> C[C]O
>
> >>> print(Chem.MolToSmiles(labeled_etoh, isomericSmiles=True))
>
> C[13C]O
>
> 1. Why are there any brackets at all in the first output? Why not just 'CCO'?

The middle atom in "CCO" has two hydrogens. The middle atom in "C[C]O" has no hydrogens.

> 2. Is there any documentation anywhere that the "isomericSmiles" argument is
> also an "isotopicSmiles" argument?

I don't believe so. A search via DuckDuckGo of rdkit.org finds only two irrelevant matches.

> I am also confused about when Chem.MolToSmiles() puts in H atoms in the
> output.

SMILES has a short-hand notation to represent hydrogens. "[CH4]" and "C" are both methane. When an atom is described using brackets, the number of hydrogens must be specified with the H<n> notation. When an atom is described without brackets, the number of hydrogens is based on the permitted valence values. C has a valence of 4 and -C- has two single bonds, so the middle carbon of CCO has two hydrogens to complete the valence.

The output mechanism prefers to use the short-hand notation if possible. That isn't possible if the sum of hydrogens and bond orders does not match one of the permitted valence values, or if there is an isotope, charge, chirality, etc., which requires the use of []s.

> >>> three_hb1 = Chem.MolFromSmiles('C[13CH](O)C[13C](=O)O')
> >>> three_hb2 = Chem.MolFromSmiles('C[13C](O)C[13C](=O)O')
> >>> print(Chem.MolToSmiles(three_hb1, isomericSmiles=True))
>
> C[13CH](O)C[13C](=O)O
>
> >>> print(Chem.MolToSmiles(three_hb2, isomericSmiles=True))
>
> C[13C](O)C[13C](=O)O
>
> >>> print(Chem.MolToSmiles(three_hb1, isomericSmiles=False))
>
> CC(O)CC(=O)O
>
> >>> print(Chem.MolToSmiles(three_hb2, isomericSmiles=False))
>
> C[C](O)CC(=O)O
>
> 3. Why are there no brackets for three_hb1 output, but there are for
> three_hb2?

I think you mean "for the isomericSmiles=False output"? The first three_hb1 output has brackets.

The isotope notation requires []s, so the option of using the short-hand notation doesn't exist. In that case the number of hydrogens must be specified, as otherwise it means the atom has no hydrogens.

> 4. As far as I can tell, the two three_hb molecules are identical.
> Why aren't all Hs removed during canonicalization?

The second atom in three_hb1 has 1 hydrogen and three single bonds. The second atom in three_hb2 has 0 hydrogens and three single bonds. They are different structures, so they have different SMILES.

Cheers,

				Andrew
				da...@dalkescientific.com
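
The hydrogen-counting rule described above can be sketched in a few lines of plain Python. This is a toy model of the SMILES short-hand for non-aromatic atoms, not RDKit's implementation; the function name implicit_h_count, its parameters, and the NORMAL_VALENCES table (the organic-subset valences as listed in the OpenSMILES specification) are assumptions made for illustration.

```python
# Normal valences for the SMILES "organic subset" (assumed from OpenSMILES).
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_h_count(symbol, bond_order_sum, bracketed=False, explicit_h=0):
    """Hydrogens on a non-aromatic atom under the SMILES short-hand rule.

    Hypothetical helper: it models only the rule discussed in this thread.
    """
    if bracketed:
        # Inside [] the count is exactly what is written; no H<n> means zero.
        return explicit_h
    # Outside [] the hydrogens fill up to the lowest valence that fits.
    for valence in NORMAL_VALENCES[symbol]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0


# Middle carbon of "CCO": two single bonds, no brackets.
print(implicit_h_count("C", 2))
# Middle carbon of "C[C]O": same bonds, but brackets with no H<n>.
print(implicit_h_count("C", 2, bracketed=True))
# Second carbon of "C[13CH](O)C...": three single bonds, one H written.
print(implicit_h_count("C", 3, bracketed=True, explicit_h=1))
# Second carbon of "C[13C](O)C...": three single bonds, no H written.
print(implicit_h_count("C", 3, bracketed=True))
```

With RDKit itself, the corresponding counts can be read from a parsed molecule through each atom's GetTotalNumHs() method.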