Re: [Rdkit-discuss] MACCS SMARTS pattern definitions

2011-05-29 Thread Greg Landrum
I'm traveling for the next week, without a laptop, so I'm not really
going to be able to look at stuff until the 7th of June.

-greg

On Sunday, May 29, 2011, Andrew Dalke da...@dalkescientific.com wrote:
 Like I said, working on the validation code is very hard. Or at least tedious.
 There are only 25 more bits to write check cases for.

 One of them is bit 141, defined as

   CH3 > 2

 That is, at least three matches to the SMARTS [CH3].

 Then down in bit 160 it's

   CH3

 with at least 1 match to the SMARTS [C;H3,H4].

 I think bit 141 should use the same SMARTS, to include CH4.

 It's hard to construct a real case where this will make
 a difference so I'm not sure this is even appropriate.

 Greg? Any thoughts?


                                 Andrew
                                 da...@dalkescientific.com



 --
 vRanger cuts backup time in half-while increasing security.
 With the market-leading solution for virtual backup and recovery,
 you get blazing-fast, flexible, and affordable data protection.
 Download your free trial now.
 http://p.sf.net/sfu/quest-d2dcopy1
 ___
 Rdkit-discuss mailing list
 Rdkit-discuss@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] MACCS SMARTS pattern definitions

2011-05-28 Thread Andrew Dalke
Like I said, working on the validation code is very hard. Or at least tedious.
There are only 25 more bits to write check cases for.

One of them is bit 141, defined as

  CH3 > 2

That is, at least three matches to the SMARTS [CH3].

Then down in bit 160 it's

  CH3

with at least 1 match to the SMARTS [C;H3,H4].

I think bit 141 should use the same SMARTS, to include CH4.

It's hard to construct a real case where this will make
a difference so I'm not sure this is even appropriate.
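Here is a toy sketch of where the two SMARTS diverge for bit 141. It does not run a real SMARTS matcher. Instead it models atoms as hypothetical `(element, hydrogen_count)` tuples and assumes the bit is set on more than two matches. A multi-component record such as methane plus ethane (`C.CC`) is one case where the choice of SMARTS flips the bit.

```python
from __future__ import annotations

# Toy model: each atom is an (element, total hydrogen count) tuple.
# This stands in for real SMARTS matching and is only for illustration.


def count_ch3(atoms: list[tuple[str, int]]) -> int:
    """Matches for [CH3]: carbon with exactly three hydrogens."""
    return sum(1 for elem, h in atoms if elem == "C" and h == 3)


def count_ch3_or_ch4(atoms: list[tuple[str, int]]) -> int:
    """Matches for [C;H3,H4]: carbon with three or four hydrogens."""
    return sum(1 for elem, h in atoms if elem == "C" and h in (3, 4))


def bit_141(match_count: int) -> bool:
    """Bit 141 ("CH3 > 2"): set when the pattern matches more than twice."""
    return match_count > 2


# Methane plus ethane as a two-component record (SMILES "C.CC").
methane_plus_ethane = [("C", 4), ("C", 3), ("C", 3)]

print(bit_141(count_ch3(methane_plus_ethane)))         # False (2 matches)
print(bit_141(count_ch3_or_ch4(methane_plus_ethane)))  # True  (3 matches)
```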

Greg? Any thoughts?


Andrew
da...@dalkescientific.com



Re: [Rdkit-discuss] MACCS SMARTS pattern definitions

2011-05-27 Thread Andrew Dalke
Hi Greg,

 My reading of the SMARTS theory manual
 (http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html) says
 that [0*] means any atom with a mass of 0, so [!0*] would be any
 atom that doesn't have a mass of 0. What am I missing?

In the Daylight, OpenEye, and OpenBabel data models, an
incoming atom which doesn't have an assigned isotope number
is given the isotope number of 0.

That is, they treat [0S] the same as [S].

I just posted an email to the BlueObelisk-SMILES list on this topic.
The OpenSMILES spec says that these two atoms should be different,
but I don't think that's right.
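The data model described above can be sketched in a few lines of Python. This is a hypothetical illustration and not any toolkit's API. `isotope_labels` and `count_not_isotope_zero` are made-up names, and the regular-expression tokenizer ignores most of SMILES. The sketch shows that treating an unlabeled atom as isotope 0 makes `[0S]` and `[S]` indistinguishable. It also shows that `[!0*]` then matches 2 of the 3 atoms in `C[14CH2][13CH3]`, which is the count Daylight's depictmatch gives for that example in this thread.

```python
from __future__ import annotations

import re

# Simplified tokenizer (illustration only, not a full SMILES parser):
# matches bracket atoms such as "[14CH2]" and organic-subset atoms such
# as "C", "Cl" or "c". Bonds, branches and ring-closure digits are skipped.
ATOM_TOKEN = re.compile(r"\[(\d*)[^\]]*\]|Cl|Br|[BCNOPSFI]|[bcnops]")


def isotope_labels(smiles: str, unspecified: int | None = 0) -> list:
    """Return one isotope value per atom.

    Atoms written without an isotope get `unspecified`: 0 models the
    Daylight/OpenEye/OpenBabel convention, None models keeping
    "unspecified" distinct from an explicit 0.
    """
    labels = []
    for match in ATOM_TOKEN.finditer(smiles):
        digits = match.group(1)  # None for organic-subset atoms, "" for [S]
        labels.append(int(digits) if digits else unspecified)
    return labels


def count_not_isotope_zero(smiles: str) -> int:
    """Count atoms that [!0*] would match when unspecified means 0."""
    return sum(1 for iso in isotope_labels(smiles) if iso != 0)


print(isotope_labels("[0S]") == isotope_labels("[S]"))              # True
print(isotope_labels("[0S]", None) == isotope_labels("[S]", None))  # False
print(count_not_isotope_zero("C[14CH2][13CH3]"))                    # 2
```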

A problem with the Daylight docs is that they don't distinguish
between atomic weight/atomic mass and isotope number. For example,
at the API level, to get the isotope number you call dt_weight

http://www.daylight.com/dayhtml/doc/man/man3/dt_weight.html
dt_weight(dt_Handle) = dt_Integer

meaning that mass == weight == isotope is always treated as an int.

I see that RDKit doesn't store the isotope but instead tracks
the atomic mass. I don't believe that's the right solution.

 Agreed that using the generic atomic-number form makes a lot more sense.

When my updated definitions, with atomic number, are available,
I'll let you know.

Grrr (in a chuckling sort of way)! Now I have to resynchronize
my definitions to the changes you just made!


Andrew
da...@dalkescientific.com



Re: [Rdkit-discuss] MACCS SMARTS pattern definitions

2011-05-27 Thread Andrew Dalke
On May 27, 2011, at 6:01 AM, Greg Landrum wrote:
 And now a more philosophical point about this.
   ...
 The idea of the MACCS keys is simple: a limited set of structural keys
 that can be used to speed up substructure searches and which have
 since been (ab)used for chemical similarity. It seems like it would be
 a lot more helpful to the community if we had a set of keys like this
 that is based on a truly open definition.
  ...
 What do you think Andrew? Want to work together on this?

Sure!

I've been working on my PubChem-like substructure keys all this week.

The pattern definitions are available at
http://code.google.com/p/chem-fingerprints/source/browse/chemfp/substruct.patterns

Validation is the hardest part, since I mostly only have
the PubChem substructure bits as an oracle of what I'm
supposed to get. I think I'm down to differences in how
CACTVS does aromaticity (lots of mismatches because of that!)
and lack of support for PubChem's PUBCHEM_NONSTANDARDBOND
bond definitions:

  ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_sdtags.txt

I've got implementations for OpenBabel, RDKit, and OEChem.

If you want to try it out, after you've installed the package,

   rdkit2fps --substruct $STRUCTURE_FILENAME


I've also converted RDKit's MACCS patterns into my format definition at

http://code.google.com/p/chem-fingerprints/source/browse/chemfp/rdmaccs.patterns

which I've used in part as a cross-test to make sure my
implementation using RDKit matches RDKit's own implementation.
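A cross-test like this comes down to comparing two fingerprints bit by bit over a set of structures and seeing which bits disagree most often. The minimal sketch below assumes both fingerprints are already available as Python integers. `mismatch_counts` is a hypothetical helper and is not part of chemfp or RDKit.

```python
from collections import Counter


def mismatch_counts(pairs):
    """Tally, per bit position, how often two fingerprints disagree.

    `pairs` yields (expected, actual) fingerprints as Python ints, where
    bit position i of the int holds key i. Returns a Counter of positions.
    """
    tally = Counter()
    for expected, actual in pairs:
        diff = expected ^ actual  # 1 wherever the two fingerprints differ
        for bit in range(diff.bit_length()):
            if (diff >> bit) & 1:
                tally[bit] += 1
    return tally


# Three structures: two disagree on bit 0, one agrees everywhere.
print(mismatch_counts([(0b101, 0b100), (0b011, 0b011), (0b001, 0b000)]))
# Counter({0: 2})
```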

I've been calling it rdmaccs. Any problems with that? Want another name?


My hope is to get this out in a 0.95 (or perhaps 1.0 alpha?) build
today and announce it. What's mostly lacking are:
  - full validation (very hard, given aromaticity differences)
  - a solid test suite (that's amazingly hard to do)
  - documentation

Oh yeah, and write up some sort of paper on what I did.



Andrew
da...@dalkescientific.com



Re: [Rdkit-discuss] MACCS SMARTS pattern definitions

2011-05-27 Thread Greg Landrum
On Fri, May 27, 2011 at 12:23 PM, Andrew Dalke
da...@dalkescientific.com wrote:
 Hi Greg,

 My reading of the SMARTS theory manual
 (http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html) says
 that [0*] means any atom with a mass of 0, so [!0*] would be any
 atom that doesn't have a mass of 0. What am I missing?

 In the Daylight, OpenEye, and OpenBabel data models, an
 incoming atom which doesn't have an assigned isotope number
 is given the isotope number of 0.

 That is, they treat [0S] the same as [S].

That is definitely wrong according to the Daylight theory manual:
Isotopic specifications are indicated by preceding the atomic symbol
with a number equal to the desired integral atomic mass. An atomic
mass can only be specified inside brackets.
So [0S] would be S with an atomic mass of 0.

 I just posted an email to the BlueObelisk-SMILES list on this topic.
 The OpenSMILES spec says that these two atoms should be different,
 but I don't think that's right.

We can agree to change it, but it's certainly consistent with what
Daylight says in the theory manual.

-greg

Re: [Rdkit-discuss] MACCS SMARTS pattern definitions

2011-05-27 Thread Greg Landrum
On Fri, May 27, 2011 at 3:47 PM, Andrew Dalke da...@dalkescientific.com wrote:
 On May 27, 2011, at 1:25 PM, Greg Landrum wrote:
 That is definitely wrong according to the Daylight theory manual:
 Isotopic specifications are indicated by preceding the atomic symbol
 with a number equal to the desired integral atomic mass.

 Yes, and I think they are being imprecise, but since SMILES is
 meant for normal chemistry, it's in an area where imprecision
 doesn't make much difference.

 Where does it make a difference? High resolution mass spec,
 for one. The mass of 28Si is not 28.0 but 27.9769265325.

No arguments here. But that doesn't address the [0Si] question.

 I've been looking at how RDKit handles isotopes/mass, and
 I think there are some good examples of how its current
 approach can cause confusion.

There is a lot of room for improvement in the way the RDKit handles
isotopes. (I'm being polite to myself).
When I have the free day for RDKit backend work, I need to go back and
re-examine the way this is done.

 For those who haven't reviewed the code, RDKit turns [Si]
 into an Atom instance with mass of 28.086, that being the
 abundance-weighted average mass of silicon.

correct.

 To generate the isomeric SMILES, RDKit looks at the mass.
 If it's more than 0.1 amu difference from the integral
 atomic mass (28 in this case) then it puts in the atomic
 mass. Otherwise it omits the mass.

 Thus, since || 28.086 - 28 || < 0.1

  Input: [Si]    gives Output: [Si]


 Suppose I have isotopically pure silicon [28Si].
 RDKit turns this into an Atom with mass 28.0.
 If I generate the isomeric SMILES I get that

    || 28.0 - 28 || < 0.1

 which means no isotope label will be displayed
 in the output, so

  Input: [28Si]    gives Output: [Si]

 I tested this with PubChem compound CID 21732668.
 It has an isomeric SMILES of

  F[28Si](F)(F)P([28Si](F)(F)F)[28Si](F)(F)F

 RDKit converts that into an isomeric SMILES of

  F[Si](F)(F)P([Si](F)(F)F)[Si](F)(F)F

 In other words, the generated SMILES is no longer isotopically
 pure.
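The isotope-writing rule described in this message can be modelled in a few lines. This is a hypothetical reconstruction from the description above and not RDKit's actual code. `isomeric_mass_label` is an illustrative name. The sketch shows why `[28Si]` comes back as `[Si]`. It also shows that, under the same rule, a label differing by a whole mass unit from the default (29Si, an extrapolation not tested in the thread) would still be written.

```python
def isomeric_mass_label(atom_mass: float, average_mass: float,
                        tolerance: float = 0.1) -> str:
    """Return the isotope prefix the described rule would write ('' = none).

    The atom's mass is compared with the integer nearest the element's
    average mass; a prefix is written only when they differ by more than
    `tolerance` amu.
    """
    integral_mass = round(average_mass)  # 28 for silicon (28.086)
    if abs(atom_mass - integral_mass) > tolerance:
        return str(round(atom_mass))
    return ""


SI_AVERAGE = 28.086  # average mass of silicon as quoted in the message

print(repr(isomeric_mass_label(28.086, SI_AVERAGE)))  # '' ([Si] stays [Si])
print(repr(isomeric_mass_label(28.0, SI_AVERAGE)))    # '' ([28Si] loses its label)
print(repr(isomeric_mass_label(29.0, SI_AVERAGE)))    # '29'
```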


 I believe this is wrong.

You will get no argument from me. It's wrong.

 As it stands, the only way to tell if a given atom is supposed
 to be isotopically pure is to see if

  atom.GetMass() == int(atom.GetMass())

 This will only fail for Tc, Pm, Po, At, and the other elements
 which have only very unstable isotopes, and hence where the
 idea of average abundance makes no sense.


 So for purposes of the first bit in the MACCS definition,
 I propose using something like:

 def has_specified_isotope(mol):
     # An integral mass is taken to mean an explicitly specified isotope.
     for atom in mol.GetAtoms():
         mass = atom.GetMass()
         if mass == int(mass):
             return True
     return False




 BTW, checking out of curiosity, I see that elements 106 (Sg)
 and higher have an isotopic mass defect which is greater than
 0.1 amu. If RDKit supported Sg then it would always turn

  Input: [Sg]    into Output: [106Sg]

 when making the isomeric SMILES.

 http://en.wikipedia.org/wiki/Isotopes_of_seaborgium
 http://en.wikipedia.org/wiki/Seaborgium

 PubChem does not have any of the reported Sg-containing
 molecules. In fact:

  Failed to decode the following as a Molecular Formula or a CID:
  SgO3

 It seems that no molecule containing Sg is in PubChem.



 We can agree to change it, but it's certainly consistent with what
 Daylight says in the theory manual.

 The problem above arises because RDKit uses an average mass when
 no mass is specified. The object model in the manual only allows
 integer masses, and the Daylight API agrees with that. I therefore
 don't see how RDKit's behavior is consistent.

It's consistent to within roundoff error if you specify an isotope.
The theory manual says if you don't specify anything, it's
unspecified mass. I interpreted that to mean average atomic mass.

-greg

[Rdkit-discuss] MACCS SMARTS pattern definitions

2011-05-26 Thread Andrew Dalke
RDKit implements the MACCS keys as a set of SMARTS patterns,
plus a few bits coded by hand.

I don't know how much people know the impact of this on the other
free software projects. OpenBabel and CDK both use copies of the
RDKit definitions for their own MACCS keys. While I've seen
earlier internal definitions, they were held rather closely, so
it's very nice to have a public definition.

I'm reviewing the definitions as part of my chemfp project,
which is one of the advantages to having an open definition.

I've got some questions or suggestions about them. For reference,
see http://rdkit.org/Python_Docs/rdkit.Chem.MACCSkeys-pysrc.html


* Bit 1 is

   1:('?',0), # ISOTOPE 

This explicitly isn't defined, but shouldn't it be [!0*]?

I tried out that SMARTS and I see that a SMILES of C[14CH3]
has two matches in RDKit to [!0*] but in OEChem there's only
one. I think the OEChem version is correct. I verified it at

http://www.daylight.com/daycgi_tutorials/depictmatch.cgi
with the SMILES of C[14CH2][13CH3] and SMARTS of [!0*] .
Daylight matches 2 of the 3 atoms.

I think this is a bug in RDKit, and once fixed it would
mean this bit could be supported.


* Bit 2 is

  #2:('[#103,#104,#105,#106,#107,#106,#109,#110,#111,#112]',0),  # ISOTOPE Not complete
   2:('[#103,#104]',0),  # ISOTOPE Not complete 

I assume the comment is wrong, since this has nothing to do with isotopes.

What's not complete about this definition, and/or why is the first one 
commented out?

* *NOTE* spec wrong occurs on many lines

What does it mean?

* Bit 3 is

 3:('[Ge,As,Se,Sn,Sb,Te,Tl,Pb,Bi]',0), # Group IVa,Va,VIa Periods 4-6 (Ge...) *NOTE* spec wrong

The Tl doesn't look right. Shouldn't the last three be Pb,Bi,Po ?

*  Bit 18 is

   18:('[B,Al,Ga,In,Tl]',0), # Group IIIA (B...) *NOTE* spec wrong 

Boron may be aromatic according to the SMILES spec, so this
should be [B,b, ...] or [#5, ... ].

Also, here are the aromatic elements in OpenBabel:

[se]
[as]
[si]
[ge]
[sb]
[bi]
[te]
[sn]

Not all of these are valid SMARTS according to Daylight, and
RDKit doesn't support the same set of aromatics, so for a
portable version (which I'm working on) they can be written as

[#34], [#33], [#14], ...

Oh, and aromatic lead has been synthesized

 http://www.rsc.org/chemistryworld/News/2010/April/15041002.asp


*  Bit 44 is

  44:('?',0), # OTHER 

Is this one of the undocumented bits or does OTHER mean
something else?

*  Bit 68 says

FIX: incomplete definition

Are there thoughts to complete this? My thought is that it isn't
important one way or the other. Without a good validation set
it would be hard to really pin this down.

There are a number of other bits which are also marked FIX:
incomplete definition. Are they going to be fixed? Again, I
don't think there's a pressing need without validation data.

* Bit 101 says:

  8M Ring or larger. This only handles up to ring sizes of 14 

Is it worthwhile to support larger rings? I don't think so.
If yes, then it could be dealt with outside of the SMARTS,
just like 125 and 166.


BTW, I also verified that all of the CH2 atoms were written
as either [CH2] (if there are two bonds to other atoms) or
[C;H2,H3] if there is only one bond (and similar with [NH2]).
While strange chemistries can cause this to fail as a
substructure filter, I recognize that that is outside
the scope of those definitions. 
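The convention checked here can be read as a general rule. A query atom written with `h` hydrogens and `b` single bonds to other pattern atoms may, in a real molecule, carry anywhere from `h` up to `valence - b` hydrogens. The generator below is a hypothetical sketch of that reading. It assumes neutral atoms with a fixed default valence (4 for carbon, 3 for nitrogen) and single bonds only, which are the assumptions that "strange chemistries" can break. Applied to an isolated CH3, it produces the `[C;H3,H4]` form used by bit 160.

```python
def hydrogen_smarts(element: str, h: int, pattern_bonds: int, valence: int) -> str:
    """Atom SMARTS allowing h up to (valence - pattern_bonds) hydrogens.

    Assumes single bonds, a neutral atom and a fixed default valence, so it
    ignores charged or hypervalent atoms.
    """
    max_h = valence - pattern_bonds
    if max_h < h:
        raise ValueError("pattern bonds leave no room for that many hydrogens")
    if max_h == h:
        return f"[{element}H{h}]"
    allowed = ",".join(f"H{n}" for n in range(h, max_h + 1))
    return f"[{element};{allowed}]"


print(hydrogen_smarts("C", 2, 2, 4))  # [CH2]      (two pattern bonds)
print(hydrogen_smarts("C", 2, 1, 4))  # [C;H2,H3]  (one pattern bond)
print(hydrogen_smarts("C", 3, 0, 4))  # [C;H3,H4]  (isolated CH3)
```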

Cheers,


Andrew
da...@dalkescientific.com



Re: [Rdkit-discuss] MACCS SMARTS pattern definitions

2011-05-26 Thread Greg Landrum
Hi Andrew,

I'm going to divide this into pieces in order to be able to answer in
a reasonable amount of time.
I'll do clarifying questions and quick answers in this one.

On Thu, May 26, 2011 at 4:02 PM, Andrew Dalke da...@dalkescientific.com wrote:
 RDKit implements the MACCS keys as a set of SMARTS patterns,
 plus a few bits coded by hand.

 I don't know how much people know the impact of this on the other
 free software projects. OpenBabel and CDK both use copies of the
 RDKit definitions for their own MACCS keys. While I've seen
 earlier internal definitions, they were held rather closely, so
 it's very nice to have a public definition.

 I'm reviewing the definitions as part of my chemfp project,
 which is one of the advantages to having an open definition.

 I've got some question or suggestions about them. For reference,
 see http://rdkit.org/Python_Docs/rdkit.Chem.MACCSkeys-pysrc.html


 * Bit 1 is

   1:('?',0), # ISOTOPE

 This explicitly isn't defined, but shouldn't it be [!0*]?

 I tried out that SMARTS and I see that a SMILES of C[14CH3]
 has two matches in RDKit to [!0*] but in OEChem there's only
 one. I think the OEChem version is correct. I verified it at

 http://www.daylight.com/daycgi_tutorials/depictmatch.cgi
 with the SMILES of C[14CH2][13CH3] and SMARTS of [!0*] .
 Daylight matches 2 of the 3 atoms.

 I think this is a bug in RDKit, and once fixed it would
 mean this bit could be supported.


My reading of the SMARTS theory manual
(http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html) says
that [0*] means any atom with a mass of 0, so [!0*] would be any
atom that doesn't have a mass of 0. What am I missing?


 * Bit 2 is

  #2:('[#103,#104,#105,#106,#107,#106,#109,#110,#111,#112]',0),  # ISOTOPE Not complete
   2:('[#103,#104]',0),  # ISOTOPE Not complete

 I assume the comment is wrong, since this has nothing to do with isotopes.

 What's not complete about this definition, and/or why is the first one 
 commented out?

I've got to see if I can find a description of the bits and I'll come
back to these definition questions.


   18:('[B,Al,Ga,In,Tl]',0), # Group IIIA (B...) *NOTE* spec wrong

 Boron may be aromatic according to the SMILES spec, so this
 should be [B,b, ...] or [#5, ... ].

 Also, here's the aromatic elements in OpenBabel:

 [se]
 [as]
 [si]
 [ge]
 [sb]
 [bi]
 [te]
 [sn]

 Not all of these are valid SMARTS according to Daylight, and
 RDKit doesn't support the same set of aromatics, so for a
 portable version (which I'm working on) they can be written as

 [#34], [#33], [#14], ...

 Oh, and aromatic lead has been synthesized

  http://www.rsc.org/chemistryworld/News/2010/April/15041002.asp


Agreed that using the generic atomic-number form makes a lot more sense.
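Writing the patterns by atomic number avoids differences in which aromatic symbols each toolkit accepts, because `#5` matches both aliphatic and aromatic boron. Below is a small hypothetical helper. Its symbol table covers only the elements discussed in this thread, and it applies the Pb,Bi,Po correction proposed for bit 3.

```python
# Atomic numbers for the elements discussed for bits 3 and 18.
ATOMIC_NUMBERS = {
    "B": 5, "Al": 13, "Si": 14, "Ga": 31, "Ge": 32, "As": 33, "Se": 34,
    "In": 49, "Sn": 50, "Sb": 51, "Te": 52, "Tl": 81, "Pb": 82, "Bi": 83,
    "Po": 84,
}


def atomic_number_smarts(symbols):
    """Return an OR-list atom SMARTS such as [#5,#13] for the given symbols."""
    return "[" + ",".join(f"#{ATOMIC_NUMBERS[s]}" for s in symbols) + "]"


# Bit 18 (Group IIIA) and bit 3 with Pb,Bi,Po as the last three elements.
print(atomic_number_smarts(["B", "Al", "Ga", "In", "Tl"]))
# [#5,#13,#31,#49,#81]
print(atomic_number_smarts(["Ge", "As", "Se", "Sn", "Sb", "Te", "Pb", "Bi", "Po"]))
# [#32,#33,#34,#50,#51,#52,#82,#83,#84]
```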

 * Bit 101 says:

  8M Ring or larger. This only handles up to ring sizes of 14

 Is it worthwhile to support larger rings? I don't think so.
 If yes, then it could be dealt with outside of the SMARTS,
 just like 125 and 166.

Agreed that it's not really necessary to support larger rings. Systems
with rings larger than 14 would end up missing a single bit.
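The sketch below shows the difference between the full key and the capped one. It assumes ring sizes come from some ring perception step, such as a smallest-set-of-smallest-rings list. The function names are hypothetical. A molecule whose only large ring has 16 members loses bit 101 under the cap. Once a ring in the 8 to 14 range is also present, the two versions agree.

```python
def bit_101_full(ring_sizes):
    """'8M ring or larger': any ring with at least 8 atoms."""
    return any(size >= 8 for size in ring_sizes)


def bit_101_capped(ring_sizes, max_size=14):
    """The same key restricted to ring sizes 8..max_size."""
    return any(8 <= size <= max_size for size in ring_sizes)


macrocycle = [16]  # a single 16-membered ring
print(bit_101_full(macrocycle), bit_101_capped(macrocycle))        # True False
print(bit_101_full([6, 16, 10]), bit_101_capped([6, 16, 10]))      # True True
```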

 BTW, I also verified that all of the CH2 atoms were written
 as either [CH2] (if there are two bonds to other atoms) or
 [C;H2,H3] if there is only one bond (and similar with [NH2]).
 While strange chemistries can cause this to fail as a
 substructure filter, I recognize that that is outside
 the scope of those definitions.

It certainly is for me. :-)

-greg

Re: [Rdkit-discuss] MACCS SMARTS pattern definitions

2011-05-26 Thread Greg Landrum
Hi Andrew,

Second part of my response.

On Thu, May 26, 2011 at 4:02 PM, Andrew Dalke da...@dalkescientific.com wrote:

 * Bit 2 is

  #2:('[#103,#104,#105,#106,#107,#106,#109,#110,#111,#112]',0),  # ISOTOPE Not complete
   2:('[#103,#104]',0),  # ISOTOPE Not complete

 I assume the comment is wrong, since this has nothing to do with isotopes.

 What's not complete about this definition, and/or why is the first one 
 commented out?

You're right, the comment is wrong. The definition is also not
correct; the key should be atomic num > 103.
The reason the more complete defn is commented out is that the RDKit
periodic table data only go up to #104. I added a comment to that
effect.
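If the key is taken as "atomic number greater than 103", the pattern can be generated from the top of the toolkit's periodic table instead of being typed by hand. This avoids slips like the commented-out version above, which lists `#106` twice and skips `#108`. The sketch uses a hypothetical function name, and passing 104 mirrors the RDKit limit mentioned here.

```python
def superheavy_smarts(max_atomic_number: int, above: int = 103) -> str:
    """OR-list SMARTS for every atomic number above `above`, up to the table limit."""
    if max_atomic_number <= above:
        raise ValueError("periodic table stops before the first qualifying element")
    numbers = range(above + 1, max_atomic_number + 1)
    return "[" + ",".join(f"#{z}" for z in numbers) + "]"


print(superheavy_smarts(104))  # [#104]
print(superheavy_smarts(112))  # [#104,#105,#106,#107,#108,#109,#110,#111,#112]
```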

 * *NOTE* spec wrong occurs on many lines

 What does it mean?

I'm afraid that's lost in the sands of time. I will remove them.


 * Bit 3 is

  3:('[Ge,As,Se,Sn,Sb,Te,Tl,Pb,Bi]',0), # Group IVa,Va,VIa Periods 4-6 (Ge...) *NOTE* spec wrong

 The Tl doesn't look right. Shouldn't the last three be Pb,Bi,Po ?

Yep.

 *  Bit 18 is

   18:('[B,Al,Ga,In,Tl]',0), # Group IIIA (B...) *NOTE* spec wrong

 Boron may be aromatic according to the SMILES spec, so this
 should be [B,b, ...] or [#5, ... ].

Fixed this.

 *  Bit 44 is

  44:('?',0), # OTHER

 Is this one of the undocumented bits or does OTHER mean
 something else?

It's undocumented.


 *  Bit 68 says

    FIX: incomplete definition

 Are there thoughts to complete this?

This is one where the spec is incomplete: it includes the amazingly
helpful (...) at the end.

 My thought is that it isn't
 important one way or the other. Without a good validation set
 it would be hard to really pin this down.

Agreed.


 There are a number of other bits which are also marked FIX:
 incomplete definition. Are they going to be fixed? Again, I
 don't think there's a pressing need without validation data.

Those also have (...). I've updated the comment to make clear that
it's due to an incomplete spec.

I just checked in a set of changes reflecting the above.

-greg

Re: [Rdkit-discuss] MACCS SMARTS pattern definitions

2011-05-26 Thread Greg Landrum
And now a more philosophical point about this.

On Thu, May 26, 2011 at 4:02 PM, Andrew Dalke da...@dalkescientific.com wrote:
 RDKit implements the MACCS keys as a set of SMARTS patterns,
 plus a few bits coded by hand.

 I don't know how much people know the impact of this on the other
 free software projects. OpenBabel and CDK both use copies of the
 RDKit definitions for their own MACCS keys. While I've seen
 earlier internal definitions, they were held rather closely, so
 it's very nice to have a public definition.

 I'm reviewing the definitions as part of my chemfp project,
 which is one of the advantages to having an open definition.

It seems like it would make a lot more sense for all of us if we had a
truly open definition. We'll never get that with MACCS keys because
there's no true public definition of the so-called public keys (at
least not that I know of).

The idea of the MACCS keys is simple: a limited set of structural keys
that can be used to speed up substructure searches and which have
since been (ab)used for chemical similarity. It seems like it would be
a lot more helpful to the community if we had a set of keys like this
that is based on a truly open definition.

Given that we have the MACCS and PubChem
(ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_fingerprints.txt)
 keys as templates, that there are ample publications in this space
(including from MDL: http://pubs.acs.org/doi/abs/10.1021/ci010132r),
and that Andrew has kind of already started working on this (info in
this thread: 
http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg00402.html),
it seems like it shouldn't be all that much work.

What do you think Andrew? Want to work together on this?

-greg
