On 2 March 2010 11:23, Greg Landrum <greg.land...@gmail.com> wrote:
> Dear Noel,
>
> Thanks for the repost; this helps.
>
> My 2 cents are below.
>
> On Tue, Mar 2, 2010 at 11:34 AM, Noel O'Boyle <baoille...@gmail.com> wrote:
>> On 2 March 2010 09:40, Peter Murray-Rust <pm...@cam.ac.uk> wrote:
>>> Thanks,
>>> This is a useful initiative
>>>
>>> On Tue, Mar 2, 2010 at 9:14 AM, Noel O'Boyle <baoille...@gmail.com> wrote:
>>>>
>>>> (Reposted from my blog following Greg's suggestion)
>>>>
>>>> Hello all,
>>>>
>>>> Right now, I'm adding stereo (i.e. double bond stereochemistry, and
>>>> chirality) to the MDL Mol format in OpenBabel. There are three places
>>>> where stereochemical information can be stored in these files: the
>>>> coordinates, the atom parity (in the atom block), the bond stereo (in
>>>> the bond block).
>>>>
>>>> My current understanding is that where 3D coordinates are present,
>>>> there's no need to store stereochemical information in either the atom
>>>> parity or the bond block. I think I'll probably set the atom parity
>>>> anyway (since I've already written the code, and it helps when you
>>>> look at the file to be able to easily identify the chiral centers).
>
> Agreed that setting parity is a useful service to human readers but,
> as is already mentioned below, the spec is quite clear that these
> flags should be ignored on read.
>
>>>>
>>>
>>> The main problem is lack of information as to whether the geometry (2D or
>>> 3D) is definitive or arbitrary. It is impossible to construct a 3D model of
>>> (say) alanine without a perceived stereochemistry at the Carbon. Similarly
>>> most modern 2D graphic programs will draw a double bond as cis or trans (not
>>> normally linear although this was common in typesetting). If the (arbitrary)
>>> geometry is then transmitted without details of authoring, then the reader
>>> may assume a definitive stereochemistry.
>>> Put another way, there is no way of
>>> indicating by coordinates alone that stereochemistry is unknown. I think
>>> it's very important not to use the geometry as definitive unless it is clear
>>> that the author specified it (which normally only comes from crystal
>>> structures or computational chemistry).
>>
>> Sure, but I think this is outside the scope here.
>
> I'm not sure I agree. I think this is one of the critical points when
> doing CTABs: when writing 3D or 2D coordinates how do you indicate
> what you *don't* know as well as indicating what you *do* know.
>
> In 2D (and 3D) the problem is stereochemistry around double bonds: the
> coordinates provided in the output determine the stereochemistry.
> Luckily here the CTAB spec provides a way to indicate what isn't
> known: you use the 4th field in the bond line to indicate that the
> bond is an "either" bond (value 3). Technically this is what should be
> done by any toolkit that builds a molecule from the SMILES CC=CC.
>
> With atomic stereochemistry in 3D structures, the coordinates again
> determine the stereochemistry. As far as I know, the CTAB spec doesn't
> provide specific guidance about what to do when you have a
> stereocenter that's undetermined in your molecule. One possibility is
> to make sure that the bonds from that atom have 0 in field 4. Maybe
> it's "polite" to assign an either bond here as well (value 4 in this
> case) to make explicit to the viewer that the stereochemistry isn't
> known. But either of these raises the question of what to do if you
> *do* know the stereochemistry. My opinion here, and I'm aware it's one
> that many people do not share, is that it's best to treat the 3D case
> the same as the 2D one and use a wedged bond to mark atoms where the
> stereochemistry is known. It's somewhat ugly, but it has the advantage
> of being consistent (yes, yes, I know, when foolish it's the hobgoblin
> of little minds... but I don't think it's foolish here).
>
>>> P.
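
As a rough illustration of the bond-line field discussed above, here is a minimal Python sketch that reads the stereo value from a V2000-style bond line. It assumes the fixed-width layout of three characters per field (begin atom, end atom, bond type, bond stereo) and the value meanings mentioned in this thread (3 for an "either" double bond, 4 for an "either" single bond), plus the usual 1 for up and 6 for down. The names `parse_bond_line` and `describe_stereo` are made up for this example and are not taken from OpenBabel or any other toolkit.

```python
# Sketch only: assumes V2000 fixed-width bond lines, three characters per field.
SINGLE_BOND_STEREO = {0: "none", 1: "up", 4: "either", 6: "down"}
DOUBLE_BOND_STEREO = {0: "from coordinates", 3: "either"}


def describe_stereo(order, stereo):
    """Translate the numeric stereo field into a label for this bond order."""
    if order == 1:
        return SINGLE_BOND_STEREO.get(stereo, "unrecognised")
    if order == 2:
        return DOUBLE_BOND_STEREO.get(stereo, "unrecognised")
    # Other bond types: only 0 is expected in this sketch.
    return "none" if stereo == 0 else "unrecognised"


def parse_bond_line(line):
    """Return (begin, end, order, stereo_label) from one bond-block line."""
    begin = int(line[0:3])
    end = int(line[3:6])
    order = int(line[6:9])
    raw = line[9:12].strip()  # field 4; treated as 0 when blank or missing
    stereo = int(raw) if raw else 0
    return begin, end, order, describe_stereo(order, stereo)


if __name__ == "__main__":
    # CC=CC with unspecified geometry: the C2=C3 bond flagged as "either".
    print(parse_bond_line("  2  3  2  3"))
```

A value of 0 on a double bond is reported as "from coordinates" because, as noted above, the coordinates then determine cis/trans; that is exactly the case where arbitrary coordinates can be mistaken for definitive ones.
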
>>>
>>>>
>>>> For 2D coordinates, there's no need to store the bond stereochemistry
>>>> (as this can be worked out from the coordinates), but chirality needs
>>>> to be stored explicitly. The normal way to store this is not using
>>>> atom parity (but I'll set this anyway for the same reasons as above),
>>>> but by setting one of the bonds on the tetrahedral center to up or
>>>> down.
>>>>
>>>> For 0D coordinates, there are no guidelines. I propose to store
>>>> cis/trans stereo using the bond stereo (you know, UP [or DOWN] at both
>>>> ends of a double bond means cis), and chirality using the atom parity.
>>>> The MDL spec states that atom parity should be ignored when read,
>>>
>>> I know this is the spec and I don't want to get into more arguments about
>>> whether it should be changed. At this stage I think it is useful if programs
>>> have the capability to read and interpret this field.
>>
>> I think that I may move this to an option. So, if you don't explicitly
>> ask for it, you will just get what the spec says - i.e. no
>> stereochemistry will be stored if there are no coordinates.
>
> This is what I would suggest. Anything else involves introducing
> conventions that will work with OB, but that may or may not work with
> other toolkits. Since there's no clear answer, or anything that even
> really makes much sense, it's probably best to not include stereo info
> in 0D CTABs (except for atomic parity).
>
>>
>>>>
>>>> but
>>>> the alternative is to just forget the stereochemistry, or else to
>>>> store both cis/trans stereo *and* chirality in the bond block, which
>>>> may just about be possible but is likely to be a real mess.
>>>>
>>> Is it ambiguous or merely complicated? If the latter then we should use it
>>> to remove ambiguity.
>>
>> As it is (for 2D), it's already ambiguous.
>> The interpretation of a
>> hash or wedge bond between two stereocentres is ambiguous (as in one
>> toolkit may interpret as describing the stereo only at the start,
>> while another might interpret it as describing the stereo at the
>> beginning and end).
>
> As you say, the spec here is ambiguous. I believe that the convention
> "wedged bonds only affect the begin atom" is fairly broadly used
> though, so that one should be safe. Note: I just tested this in marvin
> sketch and chemdraw and chemdraw actually complains about having a
> wedged bond connect two stereocenters, marvin assigns stereo only to
> the start atom.
>
>> In the case of 0D, if you cram all of the
>> stereochemical information into the bond block it will only get worse;
>> you will have situations like a stereochemical center attached to a
>> double bond. Can the same single bond be used to indicate both
>> cis/trans across the double bond, and the chirality of the center? All
>> of these problems can be avoided using conventions, but the spec
>> doesn't go that far.
>
> nasty stuff... better to avoid stereochem in 0D files.
>
> -greg
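
To make the "wedged bonds only affect the begin atom" convention concrete, here is a small Python sketch. It assumes bonds arrive as (begin, end, order, stereo) tuples using the CTAB stereo values 1 (up) and 6 (down); the function name `wedges_by_begin_atom` is invented for the example. It is only one possible reading of the convention, not a description of what any particular toolkit does.

```python
# Sketch only: applies the "wedge affects the begin atom only" convention.
UP, DOWN = 1, 6  # CTAB stereo values for wedged and hashed single bonds


def wedges_by_begin_atom(bonds):
    """Map each begin atom to the (neighbour, direction) wedges starting at it.

    bonds: iterable of (begin, end, order, stereo) tuples.
    The end atom of a wedge receives no stereo information from that bond.
    """
    marks = {}
    for begin, end, order, stereo in bonds:
        if order == 1 and stereo in (UP, DOWN):
            direction = "up" if stereo == UP else "down"
            marks.setdefault(begin, []).append((end, direction))
    return marks


if __name__ == "__main__":
    # Atoms 2 and 3 are both stereocentres; the wedge 2->3 only describes atom 2.
    bonds = [(1, 2, 1, 0), (2, 3, 1, 1), (3, 4, 1, 0), (3, 5, 1, 6)]
    print(wedges_by_begin_atom(bonds))
```

Under this reading, atom 3 gets its stereo only from the hashed bond that starts at it (3 to 5), even though the wedge from atom 2 also touches it.
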
Are some of the wedge/hash bonds in typical MOL files unrelated to
stereochemistry? That is, are some purely for depiction? If I knew this
for sure, I would not retain the wedge/hash bond designations in the
input but just work them out from the perceived stereo.

- Noel

_______________________________________________
Blueobelisk-discuss mailing list
Blueobelisk-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss