Re: [BlueObelisk-discuss] Stereochemistry in MDL files

2010-03-03 Thread Noel O'Boyle
On 2 March 2010 11:23, Greg Landrum greg.land...@gmail.com wrote:
 Dear Noel,

 Thanks for the repost; this helps.

 My 2 cents are below.

 On Tue, Mar 2, 2010 at 11:34 AM, Noel O'Boyle baoille...@gmail.com wrote:
 On 2 March 2010 09:40, Peter Murray-Rust pm...@cam.ac.uk wrote:
 Thanks,
 This is a useful initiative

 On Tue, Mar 2, 2010 at 9:14 AM, Noel O'Boyle baoille...@gmail.com wrote:

 (Reposted from my blog following Greg's suggestion)

 Hello all,

 Right now, I'm adding stereo (i.e. double bond stereochemistry, and
 chirality) to the MDL Mol format in OpenBabel. There are three places
 where stereochemical information can be stored in these files: the
 coordinates, the atom parity (in the atom block), the bond stereo (in
 the bond block).
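 For readers who want to see where the last two of these live, here is a
 minimal sketch that pulls the atom parity and the bond stereo out of
 V2000 atom and bond lines by fixed-width slicing. The function names are
 made up for illustration, and the column positions are taken from the
 usual V2000 layout (parity in columns 40-42 of the atom line, stereo in
 columns 10-12 of the bond line); it is not OpenBabel code.

```python
def parse_atom_line(line):
    """Return (x, y, z, symbol, parity) from one V2000 atom-block line."""
    x = float(line[0:10])
    y = float(line[10:20])
    z = float(line[20:30])
    symbol = line[31:34].strip()
    # Atom stereo parity: 0 = not stereo, 1 = odd, 2 = even, 3 = either/unmarked.
    # A short or blank field is treated as 0 in this sketch.
    parity = int(line[39:42].strip() or 0)
    return x, y, z, symbol, parity


def parse_bond_line(line):
    """Return (atom1, atom2, bond_type, stereo) from one V2000 bond-block line."""
    atom1 = int(line[0:3])
    atom2 = int(line[3:6])
    bond_type = int(line[6:9])
    # Bond stereo is the 4th field; blank is treated as 0 in this sketch.
    stereo = int(line[9:12].strip() or 0)
    return atom1, atom2, bond_type, stereo


atom = parse_atom_line("    0.0000    0.0000    0.0000 C   0  0  1")
bond = parse_bond_line("  1  2  1  1")
print(atom, bond)
```

 The coordinates, the third place, are simply the first three fields of
 each atom line.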

 My current understanding is that where 3D coordinates are present,
 there's no need to store stereochemical information in either the atom
 parity or the bond block. I think I'll probably set the atom parity
 anyway (since I've already written the code, and it helps when you
 look at the file to be able to easily identify the chiral centers).

 Agreed that setting parity is a useful service to human readers but,
 as is already mentioned below, the spec is quite clear that these
 flags should be ignored on read.



 The main problem is lack of information as to whether the geometry (2D or
 3D) is definitive or arbitrary. It is impossible to construct a 3D model of
 (say) alanine without a perceived stereochemistry at the Carbon. Similarly
 most modern 2D graphic programs will draw a double bond as cis or trans (not
 normally linear although this was common in typesetting). If the (arbitrary)
 geometry is then transmitted without details of authoring, then the reader
 may assume a definitive stereochemistry. Put another way, there is no way of
 indicating by coordinates alone that stereochemistry is unknown. I think
 it's very important not to use the geometry as definitive unless it is clear
 that the author specified it (which normally only comes from crystal
 structures or computational chemistry).

 Sure, but I think this is outside the scope here.

 I'm not sure I agree. I think this is one of the critical points when
 doing CTABs: when writing 3D or 2D coordinates how do you indicate
 what you *don't* know as well as indicating what you *do* know.

 In 2D (and 3D) the problem is stereochemistry around double bonds: the
 coordinates provided in the output determine the stereochemistry.
 Luckily here the CTAB spec provides a way to indicate what isn't
 known: you use the 4th field in the bond line to indicate that the
 bond is an either bond (value 3). Technically this is what should be
 done by any toolkit that builds a molecule from the SMILES CC=CC.
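 As a rough illustration of the "4th field" convention, the sketch below
 maps a bond type and stereo value to a description and writes the first
 four fields of a bond line. The helper names are hypothetical and only
 the values I am sure of in the V2000 spec are listed; everything else is
 reported as unrecognized.

```python
# Meaning of the 4th bond-line field, which depends on the bond type.
SINGLE_BOND_STEREO = {0: "not stereo", 1: "up (wedge)", 4: "either", 6: "down (hash)"}
DOUBLE_BOND_STEREO = {0: "cis/trans taken from coordinates", 3: "either cis or trans"}


def describe_bond_stereo(bond_type, stereo):
    """Describe the stereo field for a single (1) or double (2) bond."""
    table = {1: SINGLE_BOND_STEREO, 2: DOUBLE_BOND_STEREO}.get(bond_type, {})
    return table.get(stereo, "unrecognized in this sketch")


def bond_line(atom1, atom2, bond_type, stereo):
    """Format only the first four fixed-width fields of a V2000 bond line."""
    return f"{atom1:3d}{atom2:3d}{bond_type:3d}{stereo:3d}"


# The central bond of CC=CC (atoms 2 and 3) written as an "either" double bond.
print(bond_line(2, 3, 2, 3))
print(describe_bond_stereo(2, 3))
```

 Real writers usually append further fields after the stereo value; they
 are left out here to keep the example short.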

 With atomic stereochemistry in 3D structures, the coordinates again
 determine the stereochemistry. As far as I know, the CTAB spec doesn't
 provide specific guidance about what to do when you have a
 stereocenter that's undetermined in your molecule. One possibility is
 to make sure that the bonds from that atom have 0 in field 4. Maybe
 it's polite to assign an either bond here as well (value 4 in this
 case) to make explicit to the viewer that the stereochemistry isn't
 known. But either of these raises the question of what to do if you
 *do* know the stereochemistry. My opinion here, and I'm aware it's one
 that many people do not share, is that it's best to treat the 3D case
 the same as the 2D one and use a wedged bond to mark atoms where the
 stereochemistry is known. It's somewhat ugly, but it has the advantage
 of being consistent (yes, yes, I know, when foolish it's the hobgoblin
 of little minds... but I don't think it's foolish here).
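 To make "the coordinates determine the stereochemistry" concrete for the
 3D case, here is a small sketch that computes the handedness of four
 neighbor positions as the sign of a signed volume (a scalar triple
 product). The function name is made up for illustration. It is not the
 MDL parity and not an R/S assignment: the sign depends on the order in
 which the neighbors are passed, and it only shows that a mirror image or
 a swap of two neighbors flips the result.

```python
def handedness(n1, n2, n3, n4, tol=1e-9):
    """Sign of the signed volume spanned by four 3D neighbor positions.

    Returns +1 or -1 for the two mirror-image arrangements and 0 when the
    four points are (nearly) coplanar, i.e. the geometry says nothing.
    """
    ax, ay, az = (n1[i] - n4[i] for i in range(3))
    bx, by, bz = (n2[i] - n4[i] for i in range(3))
    cx, cy, cz = (n3[i] - n4[i] for i in range(3))
    # Scalar triple product a . (b x c)
    volume = (ax * (by * cz - bz * cy)
              + ay * (bz * cx - bx * cz)
              + az * (bx * cy - by * cx))
    if abs(volume) < tol:
        return 0
    return 1 if volume > 0 else -1


original = handedness((1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 0, 0))
mirrored = handedness((1, 0, 0), (0, 1, 0), (0, 0, -1), (0, 0, 0))
print(original, mirrored)
```

 Note that a result of 0 only arises for flat geometry; a genuinely
 unknown center embedded in 3D still yields +1 or -1, which is exactly
 the problem raised above.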

 P.


 For 2D coordinates, there's no need to store the bond stereochemistry
 (as this can be worked out from the coordinates), but chirality needs
 to be stored explicitly. The normal way to store this is not using
 atom parity (but I'll set this anyway for the same reasons as above),
 but by setting one of the bonds on the tetrahedral center to up or
 down.
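 The "worked out from the coordinates" step for a double bond can be
 sketched as a same-side test in 2D. The function below is a hypothetical
 helper, not taken from any toolkit: for a fragment a-b=c-d it checks
 whether a and d fall on the same side of the line through b and c, and
 reports "undetermined" when either substituent is collinear with the
 bond (the linear drawing mentioned earlier in the thread).

```python
def double_bond_geometry(a, b, c, d, tol=1e-6):
    """Classify the fragment a-b=c-d from 2D (x, y) coordinates.

    Returns "cis", "trans", or "undetermined".
    """
    vx, vy = c[0] - b[0], c[1] - b[1]  # direction of the double bond
    # 2D cross products: the sign tells which side of the bond each
    # substituent lies on.
    side_a = vx * (a[1] - b[1]) - vy * (a[0] - b[0])
    side_d = vx * (d[1] - c[1]) - vy * (d[0] - c[0])
    if abs(side_a) < tol or abs(side_d) < tol:
        return "undetermined"
    return "cis" if side_a * side_d > 0 else "trans"


print(double_bond_geometry((-0.5, 0.87), (0, 0), (1, 0), (1.5, 0.87)))
print(double_bond_geometry((-0.5, 0.87), (0, 0), (1, 0), (1.5, -0.87)))
```

 A reader would only apply a test like this when the bond's stereo field
 is 0; an either bond (value 3) says the drawn geometry should not be
 trusted.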

 For 0D coordinates, there are no guidelines. I propose to store
 cis/trans stereo using the bond stereo (you know, UP [or DOWN] at both
 ends of a double bond means cis), and chirality using the atom parity.
 The MDL spec states that atom parity should be ignored when read,

 I know this is the spec and I don't want to get into more arguments about
 whether it should be changed. At this stage I think it is useful if programs
 have the capability to read and interpret this field.

 I think that I may move this to an option. So, if you don't explicitly
 ask for it, you will just get what the spec says - i.e. no
 stereochemistry will be stored if there are no coordinates.

 This is what I would suggest. Anything else involves introducing
 conventions that will work with OB, but that may or may not work with
 other toolkits. Since there's no clear answer, or anything that even
 really makes much sense, it's 

Re: [BlueObelisk-discuss] Stereochemistry in MDL files

2010-03-03 Thread Craig James
Noel O'Boyle wrote:
 Are some of the wedge/hash bonds in typical MOL files unrelated to
 stereochemistry? That is, are some purely for depiction? If I knew
 this for sure, I would not retain the wedge/hash bond designations in
 the input but just work them out from the perceived stereo.

YES.  Lots of them.  We see this all the time - people use wedge/hash to do 
pseudo-perspective drawings.  This is particularly common with metals.

http://www.emolecules.com/image?db=549&id=17252456&width=400&height=400
http://www.emolecules.com/image?db=549&id=718320&width=400&height=400

But I also see it all the time with organic molecules, particularly structures 
that are hard to draw in 2D (semi-cage ring systems that won't lay flat).  
People use hashes and wedges to try to make them look nice, that have nothing 
to do with stereochemistry.

Craig

___
Blueobelisk-discuss mailing list
Blueobelisk-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss


Re: [BlueObelisk-discuss] Stereochemistry in MDL files

2010-03-03 Thread Greg Landrum
On Wed, Mar 3, 2010 at 4:00 PM, Noel O'Boyle baoille...@gmail.com wrote:
 On 3 March 2010 14:48, Craig James craig_ja...@emolecules.com wrote:
 Noel O'Boyle wrote:

 Are some of the wedge/hash bonds in typical MOL files unrelated to
 stereochemistry? That is, are some purely for depiction? If I knew
 this for sure, I would not retain the wedge/hash bond designations in
 the input but just work them out from the perceived stereo.

 YES.  Lots of them.  We see this all the time - people use wedge/hash to do
 pseudo-perspective drawings.  This is particularly common with metals.

 http://www.emolecules.com/image?db=549&id=17252456&width=400&height=400
 http://www.emolecules.com/image?db=549&id=718320&width=400&height=400

 But I also see it all the time with organic molecules, particularly
 structures that are hard to draw in 2D (semi-cage ring systems that won't
 lay flat).  People use hashes and wedges to try to make them look nice, that
 have nothing to do with stereochemistry.

 So...should we retain them or not? I think what I'll do is add an
 option to allow users to retain them exactly. However, the default
 will be that the wedges/hashes in the output will be solely dependent
 on the perceived stereochemistry. *Sigh* This applies to all 2D output
 formats.

Having the option to retain them exactly sounds sensible, but that
means retaining all of the user-provided markings, right? This almost
sounds to me like it's a read setting, not a write one. But then I'm
not familiar with the internal flow for processing mols in OB.

-greg
