Dear Noel,

Thanks for the repost; this helps.

My 2 cents are below.

On Tue, Mar 2, 2010 at 11:34 AM, Noel O'Boyle <baoille...@gmail.com> wrote:
> On 2 March 2010 09:40, Peter Murray-Rust <pm...@cam.ac.uk> wrote:
>> Thanks,
>> This is a useful initiative
>>
>> On Tue, Mar 2, 2010 at 9:14 AM, Noel O'Boyle <baoille...@gmail.com> wrote:
>>>
>>> (Reposted from my blog following Greg's suggestion )
>>>
>>> Hello all,
>>>
>>> Right now, I'm adding stereo (i.e. double bond stereochemistry, and
>>> chirality) to the MDL Mol format in OpenBabel. There are three places
>>> where stereochemical information can be stored in these files: the
>>> coordinates, the atom parity (in the atom block), the bond stereo (in
>>> the bond block).
>>>
>>> My current understanding is that where 3D coordinates are present,
>>> there's no need to store stereochemical information in either the atom
>>> parity or the bond block. I think I'll probably set the atom parity
>>> anyway (since I've already written the code, and it helps when you
>>> look at the file to be able to easily identify the chiral centers).

Agreed that setting parity is a useful service to human readers but,
as is already mentioned below, the spec is quite clear that these
flags should be ignored on read.

>>>
>>
>> The main problem is lack of information as to whether the geometry (2D or
>> 3D) is definitive or arbitrary. It is impossible to construct a 3D model of
>> (say) alanine without a perceived stereochemistry at the Carbon. Similarly
>> most modern 2D graphic programs will draw a double bond as cis or trans (not
>> normally linear although this was common in typesetting). If the (arbitrary)
>> geometry is then transmitted without details of authoring, then the reader
>> may assume a definitive stereochemistry. Put another way, there is no way of
>> indicating by coordinates alone that stereochemistry is unknown. I think
>> it's very important not to use the geometry as definitive unless it is clear
>> that the author specified it (which normally only comes from crystal
>> structures or computational chemistry).
>
> Sure, but I think this is outside the scope here.

I'm not sure I agree. I think this is one of the critical points when
writing CTABs: with 3D or 2D coordinates, how do you indicate what you
*don't* know as well as what you *do* know?

In 2D (and 3D) the problem is stereochemistry around double bonds: the
coordinates provided in the output determine the stereochemistry.
Luckily here the CTAB spec provides a way to indicate what isn't
known: you use the 4th field in the bond line to indicate that the
bond is an "either" bond (value 3). Technically this is what should be
done by any toolkit that builds a molecule from the SMILES CC=CC.
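
A minimal Python sketch of reading that field, assuming the V2000
fixed-width bond line layout (3 characters each for begin atom, end
atom, bond type, bond stereo). The function and table names are made
up for illustration and are not taken from OpenBabel or any other
toolkit:

```python
# Sketch: read the first four fixed-width fields of a V2000 bond line.
# Assumed layout, 3 characters each: begin atom, end atom, bond type, stereo.

SINGLE_BOND_STEREO = {0: "not stereo", 1: "up", 4: "either", 6: "down"}
DOUBLE_BOND_STEREO = {
    0: "cis/trans from coordinates",
    3: "either (cis or trans unknown)",
}


def parse_bond_line(line):
    """Return (begin, end, bond_type, stereo) from a V2000 bond line."""
    fields = [line[i:i + 3].strip() for i in range(0, 12, 3)]
    # A blank or missing field is read as 0 here (an assumption of this sketch).
    begin, end, bond_type, stereo = (int(f) if f else 0 for f in fields)
    return begin, end, bond_type, stereo


def describe_stereo(bond_type, stereo):
    """Put the stereo field into words; non-double bonds use the single-bond table."""
    table = DOUBLE_BOND_STEREO if bond_type == 2 else SINGLE_BOND_STEREO
    return table.get(stereo, "unrecognised value")


# Central double bond of CC=CC written out as an "either" bond.
begin, end, bond_type, stereo = parse_bond_line("  2  3  2  3")
print(describe_stereo(bond_type, stereo))
```

A writer that builds a molecule from CC=CC would emit the 3; a reader
that sees 0 on a double bond falls back to the coordinates.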

With atomic stereochemistry in 3D structures, the coordinates again
determine the stereochemistry. As far as I know, the CTAB spec doesn't
provide specific guidance about what to do when you have a
stereocenter that's undetermined in your molecule. One possibility is
to make sure that the bonds from that atom have 0 in field 4. Maybe
it's "polite" to assign an either bond here as well (value 4 in this
case) to make explicit to the viewer that the stereochemistry isn't
known. But either of these raises the question of what to do if you
*do* know the stereochemistry. My opinion here, and I'm aware it's one
that many people do not share, is that it's best to treat the 3D case
the same as the 2D one and use a wedged bond to mark atoms where the
stereochemistry is known. It's somewhat ugly, but it has the advantage
of being consistent (yes, yes, I know, when foolish it's the hobgoblin
of little minds... but I don't think it's foolish here).
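
To make that concrete, here is a rough Python sketch of the writing
side under the convention I'm describing: a wedge (1 or 6) when the
stereochemistry at the begin atom is known, "either" (4) when it is a
stereocenter but undetermined, and 0 otherwise. The helper names are
hypothetical, and a real V2000 bond line carries further fields after
the stereo field that are left out here:

```python
# Sketch: pick the stereo field for a single bond that starts at a given atom.
UP, DOWN, EITHER, NO_STEREO = 1, 6, 4, 0


def single_bond_stereo(begin_is_stereocenter, wedge=None):
    """wedge is "up", "down", or None when the configuration is unknown."""
    if not begin_is_stereocenter:
        return NO_STEREO
    if wedge is None:
        # Undetermined center: flag it explicitly instead of leaving 0.
        return EITHER
    return UP if wedge == "up" else DOWN


def format_bond_line(begin, end, bond_type, stereo):
    """Write the first four 3-character fields of a V2000 bond line."""
    return f"{begin:3d}{end:3d}{bond_type:3d}{stereo:3d}"


print(format_bond_line(1, 2, 1, single_bond_stereo(True, "up")))
print(format_bond_line(1, 2, 1, single_bond_stereo(True)))
```

The same function serves the 2D and the 3D case, which is the
consistency argument above.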

>> P.
>>
>>>
>>> For 2D coordinates, there's no need to store the bond stereochemistry
>>> (as this can be worked out from the coordinates), but chirality needs
>>> to be stored explicitly. The normal way to store this is not using
>>> atom parity (but I'll set this anyway for the same reasons as above),
>>> but by setting one of the bonds on the tetrahedral center to up or
>>> down.
>>>
>>> For 0D coordinates, there are no guidelines. I propose to store
>>> cis/trans stereo using the bond stereo (you know, UP [or DOWN] at both
>>> ends of a double bond means cis), and chirality using the atom parity.
>>> The MDL spec states that atom parity should be ignored when read,
>>
>> I know this is the spec and I don't want to get into more arguments about
>> whether it should be changed. At this stage I think it is useful if programs
>> have the capability to read and interpret this field.
>
> I think that I may move this to an option. So, if you don't explicitly
> ask for it, you will just get what the spec says - i.e. no
> stereochemistry will be stored if there are no coordinates.

This is what I would suggest. Anything else involves introducing
conventions that will work with OB, but that may or may not work with
other toolkits. Since there's no clear answer, or anything that even
really makes much sense, it's probably best to not include stereo info
in 0D CTABs (except for atomic parity).

>
>>>
>>> but
>>> the alternative is to just forget the stereochemistry, or else to
>>> store both cis/trans stereo *and* chirality in the bond block, which
>>> may just about be possible but is likely to be a real mess.
>>>
>> Is it ambiguous or merely complicated? If the latter then we should use it
>> to remove ambiguity.
>
> As it is (for 2D), it's already ambiguous. The interpretation of a
> hash or wedge bond between two stereocentres is ambiguous (as in one
> toolkit may interpret as describing the stereo only at the start,
> while another might interpret it as describing the stereo at the
> beginning and end).

As you say, the spec here is ambiguous. I believe that the convention
"wedged bonds only affect the begin atom" is fairly broadly used
though, so that one should be safe. Note: I just tested this in
MarvinSketch and ChemDraw. ChemDraw actually complains about a wedged
bond connecting two stereocenters; Marvin assigns stereo only to the
start atom.
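
As a sketch of the "begin atom only" convention (a convention, not
something the spec mandates), the Python below collects the atoms
whose stereo is described by wedged or hashed single bonds. The tuple
layout and function name are assumptions made for the example:

```python
# Sketch: under the "begin atom only" convention, a wedge (1) or hash (6)
# on a single bond describes the stereo of the bond's begin atom and
# says nothing about the end atom.

def stereo_atoms_from_wedges(bonds):
    """bonds: iterable of (begin, end, bond_type, stereo) tuples."""
    return sorted(
        {
            begin
            for begin, end, bond_type, stereo in bonds
            if bond_type == 1 and stereo in (1, 6)
        }
    )


bonds = [
    (1, 2, 1, 1),  # wedge starting at atom 1
    (2, 3, 1, 0),  # plain single bond
    (3, 4, 1, 6),  # hash starting at atom 3
]
print(stereo_atoms_from_wedges(bonds))
```

Atom 2 sits at the end of a wedge here, so it is not marked. A writer
following this convention would order each wedged bond so that the
stereocenter comes first.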

> In the case of 0D, if you cram all of the
> stereochemical information into the bond block it will only get worse;
> you will have situations like a stereochemical center attached to a
> double bond. Can the same single bond be used to indicate both
> cis/trans across the double bond, and the chirality of the center? All
> of these problems can be avoided using conventions, but the spec
> doesn't go that far.

nasty stuff... better to avoid stereochem in 0D files.

-greg

_______________________________________________
Blueobelisk-discuss mailing list
Blueobelisk-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss
