At 16:50 13/06/2003 -0500, Geoff Hutchison wrote:

On Friday, June 13, 2003, at 04:34 PM, Miguel wrote:

Basically, what I have been told is that the algorithms are all published.
They mostly involve geometry ... angles and distances. And the problem is
that there are no open-source implementations.

So, I think it is well worth it to do some research ... literature search
for these algorithms.


I'd expect that any implementation will require some parameterization and/or database, in which case contacting several mailing lists of groups like JMol, CDK, Open Babel, JOELib, etc. should be helpful in providing some data for that effort.

This would be a great addition to the open-source chemical toolkit!
-Geoff

I think this is tremendously important, and it is a great drawback that there is no open-source implementation. IMO there are several (complementary) approaches:


(a) rule-based. This involves surveying the known literature for common fragments and abstracting rules from them. They can look somewhat like:
- a bond length is the sum of the covalent radii of the atoms
- if the bond is multiple, estimate its order (resonance structures, simple Hueckel, etc.)
- if the bond is multiple, use Pauling's bond number relation to calculate the shortening (deltaR = 0.7 * log10(n), where n is the bond order)


- if a bond is rotatable, stagger the greatest number of atoms, but
- if an electron-donating group is on an aromatic ring, make it coplanar, unless
- there are large o-substituents, etc.
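The bond-length rules above can be sketched in a few lines. This is only an illustration: the covalent radii are approximate values I have filled in, the function name is made up, and I am assuming the 0.7 coefficient is in Angstrom. A real rule set would need a curated radius table.

```python
import math

# Illustrative single-bond covalent radii in Angstrom (approximate values,
# not taken from the post; a real rule set would use a curated table).
COVALENT_RADIUS = {"H": 0.37, "C": 0.77, "N": 0.75, "O": 0.73}

# Coefficient as quoted in the post; Angstrom units are an assumption.
PAULING_COEFF = 0.7


def estimate_bond_length(elem_a, elem_b, bond_order=1.0):
    """Sum of covalent radii, shortened by 0.7 * log10(bond_order)."""
    if bond_order <= 0:
        raise ValueError("bond order must be positive")
    single = COVALENT_RADIUS[elem_a] + COVALENT_RADIUS[elem_b]
    return single - PAULING_COEFF * math.log10(bond_order)


if __name__ == "__main__":
    # Fractional orders (e.g. 1.5 for an aromatic bond) are allowed.
    for order in (1.0, 1.5, 2.0, 3.0):
        print(order, round(estimate_bond_length("C", "C", order), 3))
```

A bond order of 1 gives no shortening, since log10(1) is zero, so the single-bond rule falls out as a special case.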


This is (I think) how COBRA and CORINA work, in part. The rule set can get very large very quickly.

[I am writing rules for CDK for the addition of H-atom coordinates and should post these shortly. BUT they will not even be as sophisticated as the ones above.]

(b) fragment based. Have a 3D database of experimental (and high quality calculated) structures. A required structure can then be searched against the library. Sometimes it will hit exactly (e.g. we might expect aspirin, testosterone, etc. to be in the set). Sometimes parts of it will be found (e.g. benzyl penicillin could hit benzyl and 6-amino-penicillanic acid). The fragments have to be assembled and this may involve rules about collisions, aromaticity, etc.
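As a rough idea of what a collision rule for fragment assembly might look like, here is a minimal sketch. The van der Waals radii are the usual Bondi values; the function name, the data layout and the 0.7 tolerance factor are all hypothetical choices of mine.

```python
import math

# Bondi van der Waals radii in Angstrom for a few common elements.
VDW_RADIUS = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}


def find_clashes(frag_a, frag_b, scale=0.7, bonded=()):
    """Return (i, j, distance) for atom pairs closer than scale * (r_i + r_j).

    Each fragment is a list of (element, (x, y, z)) tuples. Pairs listed in
    `bonded` (index into frag_a, index into frag_b) are skipped, because the
    atoms joined by the new bond are expected to be close. The scale factor
    is a hypothetical tolerance, not a value from the post.
    """
    skip = set(bonded)
    clashes = []
    for i, (elem_a, pos_a) in enumerate(frag_a):
        for j, (elem_b, pos_b) in enumerate(frag_b):
            if (i, j) in skip:
                continue
            dist = math.dist(pos_a, pos_b)
            if dist < scale * (VDW_RADIUS[elem_a] + VDW_RADIUS[elem_b]):
                clashes.append((i, j, dist))
    return clashes
```

An assembler could try alternative torsions around the joining bond until this returns an empty list. The all-pairs loop is fine for small fragments; large systems would want a spatial grid.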

(c) computational. Start with a rough approximation from a/b and run an inexpensive geometry optimisation (e.g. GROMACS (MM), ABINIT (QM)) - I chose these as being open source. MM will fail for unparameterised elements (e.g. metals); the QM may fail for "heavy" or unparameterised elements. But many will get through.
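To show the shape of the "rough start, then cheap optimisation" step without depending on any particular package, here is a toy stand-in. It is not how GROMACS or ABINIT work: it uses harmonic bond-stretch terms only, plain steepest descent, and arbitrary units. The function name and parameters are my own.

```python
import math


def minimise_bonds(coords, bonds, step=0.01, iterations=2000):
    """Steepest descent on E = sum k * (r - r0)^2 over the listed bonds.

    coords: list of [x, y, z]; bonds: list of (i, j, r0, k).
    Returns new coordinates; the input list is not modified.
    A real force field also has angle, torsion and non-bonded terms.
    """
    pos = [list(p) for p in coords]
    for _ in range(iterations):
        grad = [[0.0, 0.0, 0.0] for _ in pos]
        for i, j, r0, k in bonds:
            delta = [pos[i][a] - pos[j][a] for a in range(3)]
            r = math.sqrt(sum(d * d for d in delta))
            if r == 0.0:
                continue  # coincident atoms: direction undefined, skip
            coeff = 2.0 * k * (r - r0) / r  # dE/dr divided by r
            for a in range(3):
                grad[i][a] += coeff * delta[a]
                grad[j][a] -= coeff * delta[a]
        # Move every atom a small step downhill.
        for p, g in zip(pos, grad):
            for a in range(3):
                p[a] -= step * g[a]
    return pos


if __name__ == "__main__":
    # A stretched C-C pair relaxing towards r0 = 1.54 (k in arbitrary units).
    start = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
    end = minimise_bonds(start, [(0, 1, 1.54, 1.0)])
    print(math.dist(end[0], end[1]))
```

The fixed step size is only stable for small force constants; a production minimiser would use a line search or conjugate gradients.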

We have recently been running a very large number of such calculations - which we shall announce elsewhere shortly - and shall make the results OpenData. This could be valuable for the (b) approach.

In practice I suspect that:
(a) the final approach will be hybrid and involve all the above.
(b) communal tools *and* databases will be extremely important
(c) it's an awful lot of work
(d) it's worth it.

I also think it's critical that overlap is minimised and that CDK, OB, JOELib, ABINIT, CML, etc. work to make sure their tools interoperate.

Here's a shopping list:
- a rule-based architecture. We don't have one. I don't think we want to bundle Prolog into the system. Are there alternatives?
- a fragment library. I suspect most of these are not Open. For example, I think it is against the license of the Cambridge database to extract fragment libraries. However, there are many high-quality unpublished X-ray structures (probably over 1 million) and even 1 percent of these would be an extremely valuable resource. I have a strategy and a technology; if anyone has something to contribute, please mail me. We shall also make our geometries from QM calculations available for a database.
- chemical perception. This is critical. It must be easy to submit a molecule and have the component atoms given types and fragments labelled as aromatic, etc.
- chemical and geometric search. We need search technology for a data repository (I would aim at a diverse set of ca. 10,000 molecules) which can be distributed and searched by chemical concepts and geometrical concepts. We have been working with an XML repository which gives good performance on *indexed* fields. It's all open and, we hope, distributable. It can be searched for exact chemical match but NOT for substructure, and we urgently need this.
- geometry toolkit. Superpose fragments, merge fragments, etc.
- conformational search...
- inexpensive, easily distributable MM and QM programs. GROMACS is an obvious starting point but I don't know its parameterisation for non-biomolecules. I haven't used ABINIT, though I hope to make contact over the next 2-3 months. Can we devise a protocol that runs in sub-minute time for (say) a 30-atom molecule (incl. H atoms) on a PC?
- glueware. We have been using ant and CMLComp (an extension of CML to computation) to create a black-box approach. This has now processed a large number of molecules automatically.
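On the first item in the list (a rule-based architecture without Prolog), one lightweight alternative is an ordered table of rules where the most specific rule is tried first. The sketch below encodes the torsion rules from approach (a) that way. The flag names, the rule names and the 90-degree value for the hindered case are my assumptions; the text above only says "unless there are large o-substituents".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    name: str
    applies: Callable[[Dict[str, bool]], bool]
    torsion_deg: float


# Most specific rules first; the first rule that applies wins.
# The flag names and the 90-degree value are hypothetical.
TORSION_RULES: List[Rule] = [
    Rule("donor on ring with large ortho groups",
         lambda b: b.get("donor_on_aromatic", False) and b.get("large_ortho", False),
         90.0),
    Rule("donor on aromatic ring: coplanar",
         lambda b: b.get("donor_on_aromatic", False),
         0.0),
    Rule("rotatable bond: staggered",
         lambda b: b.get("rotatable", False),
         60.0),
]


def choose_torsion(bond_flags):
    """Return (rule name, torsion in degrees) or None if no rule applies."""
    for rule in TORSION_RULES:
        if rule.applies(bond_flags):
            return rule.name, rule.torsion_deg
    return None
```

Ordering by specificity is what expresses the "but ... unless ..." chain: each exception simply sits above the rule it overrides. The flags themselves would have to come from the chemical perception step.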


Don't underestimate the effort, but it is an excellent target to aim for and will undoubtedly enhance the various toolkits involved.

P.



_______________________________________________
Jmol-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/jmol-developers
