Is the NMRShiftDB list active?
Here's what we want to do anyway...
It is now possible to compute the 13C chemical shifts of organic
compounds to an exciting degree of accuracy. Henry and I were talking
about this yesterday and if the compound is fairly rigid the results
are believable enough to assign most peaks and to correct errors.
We want to apply this method to as many spectra in NMRShiftDB as we
can. It will be limited by time, flexibility and lots of scientific
effects that we haven't thought of and which will be discovered by the process.
The MW limit is ca. 500 (several days per job), though we'd prefer to
start with smaller molecules, which might take half a day each. Nick Day
has already set up an effective workflow for crystals from crystalEye
and runs these on much the same timescale - we have processed many
thousands of jobs on the Condor system. We now have Gaussian which -
according to Henry - should be easy to configure.
Nick is finishing the experimental part of his thesis and this would
form an interesting final piece.
So what we have to do is (and it must be automatic):
* extract spectra and connection table from NMRShiftDB
* generate 3D coordinates
* generate Gaussian input according to Henry's protocols.
* run the jobs on Condor (or elsewhere)
* collect the output
* parse into CML
* expose the results on our web site (this is where the Open Science comes in).
* annotate the results (by humans and machines)
* display the results (primarily agreement between observed and
calculated values, but also much else).
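To make the "generate Gaussian input" step concrete, here is a minimal sketch in Python. It only shows the general layout of a Gaussian input file (route line, blank line, title, blank line, charge and multiplicity, Cartesian coordinates, trailing blank line). The function name and the route line are my own placeholders: the method and basis set shown are NOT Henry's protocol and would have to be replaced with whatever he recommends.

```python
def gaussian_nmr_input(title, atoms, charge=0, multiplicity=1,
                       route="#P B3LYP/6-31G(d) NMR=GIAO"):
    """Return the text of a Gaussian input file for an NMR shielding job.

    atoms: list of (element_symbol, x, y, z) tuples, coordinates in Angstrom.
    route: placeholder route line (an assumption, not the real protocol).
    """
    # Route section, blank line, title, blank line, then charge/multiplicity.
    lines = [route, "", title, "", f"{charge} {multiplicity}"]
    # One line per atom: element symbol followed by Cartesian coordinates.
    for symbol, x, y, z in atoms:
        lines.append(f"{symbol:<2} {x:12.6f} {y:12.6f} {z:12.6f}")
    # Gaussian expects a blank line terminating the geometry block.
    lines.append("")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative (made-up) partial geometry, just to show the layout.
    demo = [("C", 0.0, 0.0, 0.0), ("H", 0.629, 0.629, 0.629)]
    print(gaussian_nmr_input("demo molecule", demo))
```

Writing one such file per NMRShiftDB entry, named by its database identifier, would give Condor something to pick up directly.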
The immediate difficulties we can see are:
* not knowing the stereochemistry. *** WHAT IS THE POSITION IN
NMRSHIFTDB? We can filter out anything that has more than one
potential stereocentre.
* assumptions about completeness of data in NMRShiftDB
* syntactic problems (almost certain to occur in any large data set)
* generating initial 3D coordinates. There are several simple approaches:
- look up the moiety in crystalEye
- join moieties in crystalEye
- use CDK - I think Christoph had something here?
- use the 2D coordinates for "flat" molecules
*** NEW APPROACHES - MUST BE BLUEOBELISK
* optimising the coordinates. Probably a cheap level of theory (PM3)
would work.
* parsing the Gaussian output (though JUMBOMarker has already been
tuned for some Gaussian jobs). It may also be that the archive is enough.
* scale. After a few hundred jobs SOME effect of scale will hit us.
We don't know what, but every project of this sort has these scale problems.
* wikifying the results. Ideally we would like to expose the results in a
very similar way to crystalEye with 2D and 3D coordinates and with an
observed/calc graph. Then we have to protect against spam.
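As a rough illustration of the parsing and observed/calc steps, here is a small Python sketch. It is not JUMBOMarker and not what we will necessarily use. It assumes the Gaussian log contains shielding lines of the form "  1  C    Isotropic =   152.5432   Anisotropy =    23.1234", that shifts are obtained as reference shielding minus computed shielding, and that the reference value (e.g. for TMS at the same level of theory) is supplied by the caller. All function names are hypothetical.

```python
import math
import re

# Assumed line format in the Gaussian log, e.g.
#   "      1  C    Isotropic =   152.5432   Anisotropy =    23.1234"
SHIELDING_RE = re.compile(
    r"^\s*(\d+)\s+([A-Z][a-z]?)\s+Isotropic\s*=\s*(-?\d+\.\d+)"
)


def parse_shieldings(log_text, element="C"):
    """Map atom serial number -> isotropic shielding (ppm) for one element."""
    shieldings = {}
    for line in log_text.splitlines():
        match = SHIELDING_RE.match(line)
        if match and match.group(2) == element:
            shieldings[int(match.group(1))] = float(match.group(3))
    return shieldings


def shieldings_to_shifts(shieldings, reference_shielding):
    """Convert shieldings to shifts: delta = sigma_ref - sigma.

    reference_shielding is an input the caller must supply; it should be
    computed at the same level of theory as the job being analysed.
    """
    return {atom: reference_shielding - s for atom, s in shieldings.items()}


def agreement(observed, calculated):
    """Return (rmsd, worst_atom, worst_error) over atoms present in both."""
    common = sorted(set(observed) & set(calculated))
    if not common:
        raise ValueError("no atoms in common between observed and calculated")
    errors = {atom: calculated[atom] - observed[atom] for atom in common}
    rmsd = math.sqrt(sum(e * e for e in errors.values()) / len(errors))
    worst = max(errors, key=lambda atom: abs(errors[atom]))
    return rmsd, worst, errors[worst]
```

The worst-atom value is probably the more useful number for spotting misassigned peaks; the RMSD is what would go on a per-molecule summary page. This presupposes that NMRShiftDB gives us peak-to-atom assignments, which is one of the completeness assumptions listed above.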
Suggestions would be welcome.
At present I'd like to know of any immediate problems that we haven't
thought of. If there are none, I suspect Nick will simply download all the data.
Compute resources are probably not the problem at present. But later
we may ask for volunteers.
If this is a success there is a much wider vision. You don't need me
to spell it out, and it's probably a good idea to keep things low key.
P.
Peter Murray-Rust
Unilever Centre for Molecular Sciences Informatics
University of Cambridge,
Lensfield Road, Cambridge CB2 1EW, UK
+44-1223-763069