Greg can correct me if I'm wrong(1), but in RDKit there are actually three "levels" of hydrogens:

* "Physical" hydrogens, which are represented as actual, independent atoms in the atom graph. ("Physical hydrogens" is what I'm calling them - I don't know if RDKit has an official term for them.)
* "Explicit" hydrogens, which are represented as a numeric annotation on their attached heavy atom. (And *not* as a separate atom object.)
* "Implicit" hydrogens, which aren't actually represented anywhere, but are calculated from the standard valence of the heavy atom and how many of those valences are occupied by actual atoms and explicit hydrogens.

Generally, except for some coordinate calculations, RDKit seems to be built around working with molecules with explicit or implicit hydrogens. This is why RDKit normally removes any physical hydrogens when you read in a molecule. (Note that most of the file reading code has a removeHs parameter you can set to False to change this behavior and read explicitly listed hydrogens as physical hydrogens.)

By default "removing hydrogens" means turning them into implicit hydrogens(2), but the RemoveHs() function has an "updateExplicitCount" parameter which causes the removed hydrogens to be turned into explicit hydrogens instead. The standard MOL file loading code doesn't use this option, though, so the hydrogens in the molecule are usually converted into implicit ones when you read things in.

AddHs(), of course, turns explicit and implicit hydrogens into physical hydrogens. (Though the "explicitOnly" parameter can be used to restrict this to the explicit ones.) It does annotate whether each physical hydrogen came from the implicit or the explicit pool, so you can round trip things through AddHs() and RemoveHs() appropriately. (There's also an "implicitOnly" parameter on RemoveHs() which will only remove the hydrogens that came from the implicit pool.)

Regards,
-Rocco

(1) I don't think the RDKit hydrogen model has ever been formalized in one place for user-facing documentation, so this is the understanding I've gotten from banging my head against various hydrogen-related issues.
(2) There's a special complication here: certain structures, such as imidazole, need physical or explicit hydrogens on one of the nitrogens in order to Kekulize properly. If you're implicit only, the RDKit sanitizer will choke. Thus, there's special casing in the various Add/RemoveHs functions to avoid implicit-izing these critical hydrogens.

On Thu, Sep 8, 2016 at 1:46 PM, Dimitri Maziuk <dmaz...@bmrb.wisc.edu> wrote:
> On 09/08/2016 10:25 AM, Greg Landrum wrote:
> ...
> > Why do you want 2D drawings that include H atoms?
>
> On the subject of H atoms: when I read in the MOL file that has them, I
> need to explicitly call AddHs() in order to have them drawn.
>
> Question: do they actually get stripped off by the reader and re-added
> by AddHs()? Or are they there "hidden" somehow and AddHs() just
> "unhides" them?
>
> TIA
> --
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
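
For anyone who wants to poke at this directly, here is a minimal Python sketch of the three hydrogen levels and the AddHs()/RemoveHs() round trip described in the reply above. Ethanol, the SMILES input, and all variable names are my own choices for illustration and are not from the original thread; exact per-atom bookkeeping may differ between RDKit versions, so treat this as a sketch rather than a specification.

```python
from rdkit import Chem, RDLogger

# Silence the kekulization error that the last example triggers on purpose.
RDLogger.DisableLog("rdApp.*")

# Implicit: nothing is stored, the count is derived from the atom's valence.
implicit_mol = Chem.MolFromSmiles("CCO")
n_heavy = implicit_mol.GetNumAtoms()
n_implicit = implicit_mol.GetAtomWithIdx(0).GetNumImplicitHs()

# Explicit: the count is an annotation on the heavy atom ([CH3] in SMILES).
explicit_mol = Chem.MolFromSmiles("[CH3]CO")
n_explicit = explicit_mol.GetAtomWithIdx(0).GetNumExplicitHs()

# "Physical": AddHs() turns the hydrogens into real atoms in the graph.
physical_mol = Chem.AddHs(implicit_mol)
n_with_hs = physical_mol.GetNumAtoms()

# RemoveHs() folds the hydrogen atoms back onto the heavy atoms;
# updateExplicitCount=True asks for them to be kept as explicit counts.
stripped = Chem.RemoveHs(physical_mol)
annotated = Chem.RemoveHs(physical_mol, updateExplicitCount=True)

# MOL block readers strip hydrogen atoms unless removeHs=False is passed.
block = Chem.MolToMolBlock(physical_mol)
default_read = Chem.MolFromMolBlock(block)
kept_read = Chem.MolFromMolBlock(block, removeHs=False)

# Footnote (2): aromatic imidazole needs the H on one nitrogen spelled out,
# otherwise sanitization fails and MolFromSmiles() returns None.
imidazole_ok = Chem.MolFromSmiles("c1cnc[nH]1")
imidazole_bad = Chem.MolFromSmiles("c1cncn1")

print("heavy atoms:", n_heavy, "after AddHs:", n_with_hs)
print("implicit Hs on C0:", n_implicit, "explicit Hs on [CH3]:", n_explicit)
print("atoms after RemoveHs:", stripped.GetNumAtoms())
print("explicit Hs on C0 with updateExplicitCount:",
      annotated.GetAtomWithIdx(0).GetNumExplicitHs())
print("MOL block read, default vs removeHs=False:",
      default_read.GetNumAtoms(), kept_read.GetNumAtoms())
print("imidazole without [nH] parsed:", imidazole_bad is not None)
```

This also answers the original question in practice: with the default reader settings the hydrogen atoms are gone from the atom graph after reading, and AddHs() creates new atom objects rather than "unhiding" anything.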