At 14:42 15/09/2007, Christoph Steinbeck wrote:
>Peter,
>
>thanks very much for the interesting posting.

I've had a long talk with Henry. Essentially his position is that he 
has a calculation method which gives significantly increased 
accuracy in an automatic fashion. What I would suggest is that Nick 
uses Henry's method on a carefully selected subset of NMRShiftDB.

>Warren Hehre from Wavefunction has been calculating spectra for all
>NMRShiftDB structures using Spartan at the Hartree-Fock level. We have a CD
>with all of them (Stefan, is that right?) but we never got all the
>information on the format, I think.

Stefan says it isn't available. In any case, is it Open? I think Henry 
would contend that his method would be an advance.


>I myself have been calculating about 2000 Gaussian spectra for
>NMRShiftDB on a 16-node cluster which came for exactly this purpose with
>my NMRShiftDB grant.

Is this published? We don't want to pre-empt you.


>BTW, there has been an interesting article on the topic:
>
>Structure Validation of Natural Products by Quantum-Mechanical GIAO
>Calculations of 13C NMR Chemical Shifts
>Giampaolo Barone, et al.
>Chemistry - A European Journal
>Volume 8, Issue 14, Pages 3233-3239
>
>I would suggest an extended protocol, which slows things down a bit but
>makes for a better research project: we should start by generating a
>number of lowest-energy conformers (probably not for molecules like
>alpha-Pinene, http://en.wikipedia.org/wiki/Alpha-Pinene, but for more
>floppy compounds), do a Gaussian calculation for each of them and try
>averaging the shifts.
>The protocol will be a bit more fragile, though, I guess.
>Another issue is whether there is an option for simulating polar
>solvents and their effects on the shifts.

I think this is a good idea and Henry has been advocating it, but Nick 
doesn't have time to generate conformers. So we would stick with 
rigid molecules.
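
If conformers are added later, the averaging Christoph describes is usually done as a Boltzmann-weighted mean of the per-conformer shifts. Below is a minimal sketch. It assumes each conformer already has a relative energy in kJ/mol and calculated shifts keyed by atom id. The function name and data layout are hypothetical and not part of any existing tool.

```python
import math

# Gas constant in kJ/(mol K); conformer energies are assumed to be in kJ/mol.
R_KJ_PER_MOL_K = 8.314462618e-3


def boltzmann_average_shifts(conformers, temperature=298.15):
    """Boltzmann-weighted average of per-atom shifts over several conformers.

    conformers: list of (energy_kj_mol, shifts) pairs, where shifts maps an
    atom id to a calculated shift in ppm. Every conformer must contain the
    same atom ids.
    """
    e_min = min(energy for energy, _ in conformers)
    # Work with energies relative to the lowest conformer so exp() cannot overflow.
    weights = [
        math.exp(-(energy - e_min) / (R_KJ_PER_MOL_K * temperature))
        for energy, _ in conformers
    ]
    total = sum(weights)
    return {
        atom: sum(w * shifts[atom] for w, (_, shifts) in zip(weights, conformers)) / total
        for atom in conformers[0][1]
    }


# Two conformers of equal energy contribute equally:
# boltzmann_average_shifts([(0.0, {"a1": 20.0}), (0.0, {"a1": 30.0})]) gives {"a1": 25.0}
```

A conformer lying several tens of kJ/mol above the minimum contributes essentially nothing. This is why a handful of low-energy conformers is usually enough.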


>There seems to be general agreement that you get the best results by
>performing both the geometry optimization and the GIAO shift calculation
>at the B3LYP/6-31G(d) level, but I trust whatever Henry suggests :-)
>With my own Gaussian calculations, I got excellent results for rigid and
>nonpolar compounds, just as Barone suggested in his paper.
>
>Please let me know if I can help.

So I'll revise the strategy. The fundamental requirement is that it must be automatic.

> >
> > Here's what we want to do anyway...
> >
> > We want to apply this method to as many spectra in NMRShiftDB as we
> > can. It will be limited by time, flexibility and lots of scientific
> > effects that we haven't thought of and which will be discovered by
> > the process.
> >
> > The MW limit is ca 500 (several days) though we'd prefer smaller ones
> > to start with in which case they might take half a day each. Nick Day
> > has already set up an effective workflow for crystals from crystalEye
> > and runs these on much the same timescale - we have processed many
> > thousand jobs on the Condor system. We now have Gaussian which -
> > according to Henry - should be easy to configure.
> >
> > Nick is finishing the experimental part of his thesis and this would
> > form an interesting final piece.
> >
> > So what we have to do is (and it must be automatic)
> > * extract spectra and connection table from NMRShiftDB

We do this once only. Nick, can you liaise with Christoph and Stefan 
about how to dump the DB? It's all in CML.
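
Once the dump exists, each entry has to be read back. Below is a minimal standard-library sketch. It assumes the molecule uses the usual CML atom and bond elements (elementType, hydrogenCount, atomRefs2). It also assumes peaks appear as CMLSpect-style peak elements carrying xValue and atomRefs. The real layout of the NMRShiftDB dump still needs checking with Christoph and Stefan. The function names and the toy document are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the namespace: '{http://www.xml-cml.org/schema}atom' -> 'atom'."""
    return tag.rsplit("}", 1)[-1]


def read_entry(cml_text):
    """Return (atoms, bonds, peaks) from one CML document.

    atoms: list of dicts with 'id', 'element' and 'hcount' (0 if not given)
    bonds: list of (atom_id, atom_id, order) tuples
    peaks: list of (shift_ppm, [atom_ids]) tuples
    """
    atoms, bonds, peaks = [], [], []
    for el in ET.fromstring(cml_text).iter():
        name = local_name(el.tag)
        if name == "atom":
            atoms.append({
                "id": el.get("id"),
                "element": el.get("elementType"),
                "hcount": int(el.get("hydrogenCount", "0")),
            })
        elif name == "bond":
            first, second = el.get("atomRefs2").split()
            bonds.append((first, second, el.get("order", "1")))
        elif name == "peak":
            peaks.append((float(el.get("xValue")), el.get("atomRefs", "").split()))
    return atoms, bonds, peaks


# Toy document for illustration only (not a real NMRShiftDB entry).
EXAMPLE = """<cml xmlns="http://www.xml-cml.org/schema">
  <molecule id="m1">
    <atomArray>
      <atom id="a1" elementType="C" hydrogenCount="3"/>
      <atom id="a2" elementType="O" hydrogenCount="1"/>
    </atomArray>
    <bondArray><bond atomRefs2="a1 a2" order="1"/></bondArray>
  </molecule>
  <spectrum><peakList><peak xValue="50.4" atomRefs="a1"/></peakList></spectrum>
</cml>"""
```

Matching on local names keeps the reader independent of the namespace prefix used in the dump.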

> > * generate 3D coordinates

Any suggestions here?
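
The simplest of the options listed further down is to use the 2D coordinates for "flat" molecules. That version just rescales the depiction and sets z to zero, leaving the optimisation step to produce a sensible geometry. This is a sketch only. The 1.40 Å target bond length is an assumed, aromatic-like value. Hydrogens are not placed. The approach is only reasonable for genuinely planar skeletons. The function name is hypothetical.

```python
import math


def lift_2d_to_3d(coords_2d, bonds, target_bond_length=1.40):
    """Crude planar 3D start geometry from 2D depiction coordinates.

    coords_2d: list of (x, y) in arbitrary depiction units
    bonds: list of (i, j) atom index pairs
    The drawing is rescaled so the mean bond length equals
    target_bond_length (angstrom) and every z is set to 0.
    """
    lengths = [math.dist(coords_2d[i], coords_2d[j]) for i, j in bonds]
    # Leave the scale alone if there are no bonds to measure.
    scale = target_bond_length / (sum(lengths) / len(lengths)) if lengths else 1.0
    return [(x * scale, y * scale, 0.0) for x, y in coords_2d]
```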

> > * generate Gaussian input according to Henry's protocols.

Henry will feed the protocol to Nick, and Nick will write the converter.
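
Henry's protocol is not spelled out here, so the converter below is only a placeholder. It assumes B3LYP/6-31G(d) for both the optimisation and the GIAO step, as in Christoph's message. The job runs as two Link1 steps that share a checkpoint file. The function name and file naming are hypothetical.

```python
def gaussian_input(name, atoms, charge=0, multiplicity=1, method="B3LYP/6-31G(d)"):
    """Two-step Gaussian input: geometry optimisation, then GIAO shieldings.

    atoms: list of (element, x, y, z) tuples, coordinates in angstrom.
    The second step (after --Link1--) takes geometry, charge and multiplicity
    from the checkpoint file written by the first step.
    """
    coords = "\n".join(f"{el:<2} {x:12.6f} {y:12.6f} {z:12.6f}" for el, x, y, z in atoms)
    return (
        f"%chk={name}.chk\n"
        f"#P {method} Opt\n"
        "\n"
        f"{name} geometry optimisation\n"
        "\n"
        f"{charge} {multiplicity}\n"
        f"{coords}\n"
        "\n"
        "--Link1--\n"
        f"%chk={name}.chk\n"
        f"#P {method} NMR=GIAO Geom=AllCheck Guess=Read\n"
        "\n"
    )
```

Keeping the method as a parameter makes it cheap to swap in whatever level of theory Henry recommends, or a cheaper first pass.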

> > * run the jobs on Condor (or elsewhere)

Should be automatic
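
Automating the Condor step mostly means generating one submit description per Gaussian input file. Below is a sketch. It assumes HTCondor's vanilla universe with file transfer. It also assumes a hypothetical wrapper script run_gaussian.sh that runs Gaussian on the .gjf file given as its argument. The file-naming conventions are assumptions too.

```python
from pathlib import Path


def condor_submit_description(job, wrapper="run_gaussian.sh"):
    """HTCondor submit description for one Gaussian job called `job`.

    `wrapper` is a hypothetical script that runs Gaussian on the .gjf file
    given as its argument; files it leaves in the job's scratch directory
    are transferred back when it exits.
    """
    return "\n".join([
        "universe                = vanilla",
        f"executable              = {wrapper}",
        f"arguments               = {job}.gjf",
        f"transfer_input_files    = {job}.gjf",
        "should_transfer_files   = YES",
        "when_to_transfer_output = ON_EXIT",
        f"output                  = {job}.stdout",
        f"error                   = {job}.stderr",
        f"log                     = {job}.condor",
        "queue",
        "",
    ])


def write_submit_files(jobs, directory):
    """Write one <job>.sub file per job; submit each with `condor_submit <job>.sub`."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for job in jobs:
        path = out / f"{job}.sub"
        path.write_text(condor_submit_description(job))
        paths.append(path)
    return paths
```

The Condor event log is deliberately not given a .log extension, so it cannot be confused with the Gaussian log files when the output is collected.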

> > * collect the output

ditto
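
Collecting the output automatically means knowing which jobs actually finished. Below is a sketch that sorts Gaussian log files by their termination messages. It assumes, as is usual, that each Link1 step prints its own "Normal termination" line and that failures print "Error termination". The function names are hypothetical.

```python
from pathlib import Path


def job_status(log_text, expected_steps=2):
    """Classify a Gaussian log as 'ok', 'failed' or 'incomplete'.

    Each finished step of a Link1 job prints its own 'Normal termination' line,
    so a two-step optimisation + NMR job should contain two of them.
    """
    if "Error termination" in log_text:
        return "failed"
    if log_text.count("Normal termination of Gaussian") >= expected_steps:
        return "ok"
    return "incomplete"


def collect(directory, expected_steps=2):
    """Sort the Gaussian *.log files in `directory` into ok / failed / incomplete."""
    groups = {"ok": [], "failed": [], "incomplete": []}
    for path in sorted(Path(directory).glob("*.log")):
        groups[job_status(path.read_text(errors="replace"), expected_steps)].append(path.name)
    return groups
```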

> > * parse into CML

This requires JUMBOMarker to read the Gaussian log file. It will be a 
rapid kludge and should take an hour or two once we have got the 
marker running.
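
While JUMBOMarker is being set up, a throwaway stand-in can pull the isotropic shieldings straight out of the log. It converts them to 13C shifts with delta = sigma_ref - sigma. This is a sketch under two assumptions. First, the shielding lines have the usual "n  C    Isotropic = ..." form. Second, sigma_ref is the TMS carbon shielding computed at the same level of theory. No reference value is built in, and the function names are hypothetical.

```python
import re

# Matches lines of the form "   1  C    Isotropic =   136.9426   Anisotropy =    31.3838"
ISOTROPIC = re.compile(
    r"^\s*(\d+)\s+([A-Z][a-z]?)\s+Isotropic\s*=\s*(-?\d+\.\d+)", re.MULTILINE
)


def isotropic_shieldings(log_text):
    """Map Gaussian atom number (1-based) -> (element, isotropic shielding in ppm).

    If the log contains several shielding blocks, the last one wins.
    """
    result = {}
    for number, element, sigma in ISOTROPIC.findall(log_text):
        result[int(number)] = (element, float(sigma))
    return result


def carbon_shifts(log_text, sigma_ref):
    """13C shifts in ppm via delta = sigma_ref - sigma.

    sigma_ref is the TMS carbon shielding computed at the same level of theory.
    """
    return {
        number: sigma_ref - sigma
        for number, (element, sigma) in isotropic_shieldings(log_text).items()
        if element == "C"
    }
```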

> > * expose the results on our web site (this is where the Open
> > Science comes in).

The website/wiki will contain (wiki is editable):

General principles (including the open notebook idea) (wiki)
Henry's thoughts on the protocol (wiki)
Results output as page-per-structure (cf. crystaleye). Probably not 
editable. Will contain:
   -- all main metadata and data
   -- plot of observed vs calc (or deviations)
   -- link to annotation page
Annotation page. We have a standoff annotation page with 
bidirectional links to each calculation. This means we can have 
general comments, or comments on several spectra, in one synoptic place.
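
The observed vs calculated plot on each results page would typically be summarised by a least-squares line and a couple of error measures. Below is a minimal plain-Python sketch. The function name is hypothetical, and the shifts are in ppm, paired atom by atom.

```python
import math


def fit_statistics(observed, calculated):
    """Least-squares fit calc = slope * obs + intercept, plus error statistics.

    Returns slope, intercept, r2, mae and rmsd (ppm) for paired shift lists.
    """
    n = len(observed)
    mean_o = sum(observed) / n
    mean_c = sum(calculated) / n
    sxx = sum((o - mean_o) ** 2 for o in observed)
    sxy = sum((o - mean_o) * (c - mean_c) for o, c in zip(observed, calculated))
    syy = sum((c - mean_c) ** 2 for c in calculated)
    slope = sxy / sxx
    intercept = mean_c - slope * mean_o
    r2 = sxy * sxy / (sxx * syy)
    # Deviations plotted on the page are calc - obs for each atom.
    residuals = [c - o for o, c in zip(observed, calculated)]
    mae = sum(abs(r) for r in residuals) / n
    rmsd = math.sqrt(sum(r * r for r in residuals) / n)
    return {"slope": slope, "intercept": intercept, "r2": r2, "mae": mae, "rmsd": rmsd}
```

A systematic offset shows up in the intercept rather than in r2, which is why it helps to report both.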

> > * annotate the results (humans and machine)

See above

> > * display the results (primarily agreement between observed and
> > calculated values, but also much else).

See above

> >
> > The immediate difficulties we can see are:
> > * not knowing the stereochemistry. *** WHAT IS THE POSITION IN
> > NMRSHIFTDB? We can filter out anything that has more than one
> > potential stereocentre.
> > * assumptions about completeness of data in NMRShiftDB
> > * syntactic problems (almost certain to occur in any large data set)
> > * generating initial 3D coordinates. There are several simple approaches:
> >     - look up the moiety in crystaleye
> >     - join moieties in crystaleye
> >     - use CDK - I think Christoph had something here?
> >     - use the 2D coordinates for "flat" molecules
> >     *** NEW APPROACHES - MUST BE BLUEOBELISK
> > * optimising the coordinates. Probably a cheap level of theory (PM3)
> > would work.

It may be that we run this in two passes - an initial cheap 
optimisation could include a GIAO calculation as well, and we get an 
idea of how well this works compared with the more expensive method.

> > * parsing the Gaussian output (though JUMBOMarker has already been
> > tuned for some Gaussian jobs). It may also be that the archive is enough
> > * scale. After a few hundred jobs SOME effect of scale will hit us.
> > We don't know what but every project of this sort has these scale problems
> > * wikifying the results. Ideally we'd like to expose the results in a
> > very similar way to crystaleye with 2D and 3D coordinates and with an
> > observed/calc graph. Then we have to protect against spam.
> > Suggestions would be welcome
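
The stereochemistry filter proposed in the list above needs some way of spotting potential stereocentres. Christoph's CDK is the natural tool for this. The code below is only a toolkit-free heuristic sketch for illustration. It refines atom classes Morgan-style and flags four-connected carbons whose substituents fall into four different classes. It is an approximation. Refinement can fail to separate some non-equivalent atoms, and ring and pseudo-asymmetric cases are not handled properly. All function names are hypothetical.

```python
def _compress(keys):
    """Replace arbitrary sortable keys by small integer class labels."""
    rank = {key: r for r, key in enumerate(sorted(set(keys)))}
    return [rank[key] for key in keys]


def atom_classes(elements, hcounts, bonds):
    """Iteratively refined (Morgan-like) atom classes plus the neighbour lists.

    elements: heavy-atom symbols; hcounts: implicit H per atom;
    bonds: (i, j) index pairs between heavy atoms.
    """
    n = len(elements)
    nbrs = [[] for _ in range(n)]
    for a, b in bonds:
        nbrs[a].append(b)
        nbrs[b].append(a)
    labels = _compress([(elements[i], len(nbrs[i]), hcounts[i]) for i in range(n)])
    for _ in range(n):
        new = _compress([(labels[i], tuple(sorted(labels[j] for j in nbrs[i])))
                         for i in range(n)])
        if len(set(new)) == len(set(labels)):
            break  # partition stopped getting finer
        labels = new
    return labels, nbrs


def potential_stereocentres(elements, hcounts, bonds):
    """Indices of four-connected carbons whose substituents fall into four different classes."""
    labels, nbrs = atom_classes(elements, hcounts, bonds)
    centres = []
    for i, element in enumerate(elements):
        if element != "C" or hcounts[i] > 1 or len(nbrs[i]) + hcounts[i] != 4:
            continue
        substituents = [labels[j] for j in nbrs[i]] + ["H"] * hcounts[i]
        if len(set(substituents)) == 4:
            centres.append(i)
    return centres


def keep_entry(elements, hcounts, bonds, max_centres=1):
    """Drop entries with more than one potential stereocentre."""
    return len(potential_stereocentres(elements, hcounts, bonds)) <= max_centres
```

For 2-butanol (CH3-CH(OH)-CH2-CH3) the sketch flags the CH(OH) carbon. For 2-propanol it flags nothing, because the two methyl groups land in the same class.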
> >
> > At present I'd like to know of any immediate problems that we haven't
> > thought of. If not I suspect Nick will simply download all the data.
> >
> > Compute resources are probably not the problem at present. But later
> > we may ask for volunteers.
> >
> > If this is a success there is a much wider vision. You don't need me
> > to spell it out, and it's probably a good idea to keep things low key.
> >
Henry is confident that most "organic" molecules with rigid 
conformations will work well. However, there may be groups which have 
strong conformational or solvent effects. Nick will use machine 
learning to identify these.
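
Before any machine learning, a plain statistical screen can show which atom environments deviate most. This is a simple grouping, not the machine-learning method itself. The sketch assumes each record carries a hypothetical environment key, such as an HOSE code if one is available, plus the observed and calculated shifts. The thresholds are arbitrary placeholders.

```python
from collections import defaultdict
from statistics import mean


def worst_environments(records, min_count=5, top=10):
    """Rank atom environments by mean absolute deviation |calc - obs| in ppm.

    records: iterable of (environment_key, observed_ppm, calculated_ppm).
    Environments seen fewer than min_count times are skipped.
    """
    groups = defaultdict(list)
    for key, obs, calc in records:
        groups[key].append(abs(calc - obs))
    ranked = [(mean(errs), len(errs), key) for key, errs in groups.items()
              if len(errs) >= min_count]
    # Sort on error and count only, so keys never need to be comparable.
    ranked.sort(key=lambda item: (item[0], item[1]), reverse=True)
    return ranked[:top]
```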

The method should then be predictive for some or all of:
* are the data typographically correct?
* are the data correctly assigned?
* are the stereocentres correctly assigned?
* are there likely to be conformational effects?
* are there likely to be solvent effects?
* are there other effects (e.g. Henry has found relativistic problems 
with heavier elements)
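
The first two checks can be screened with simple arithmetic. For one-dimensional shift lists, pairing observed and calculated values in sorted order minimises the sum of squared differences (rearrangement inequality). So if that optimal pairing is much better than the reported assignment, some peaks may have been swapped between atoms. Isolated large residuals point more towards typographical errors. Below is a sketch with hypothetical names. The 5 ppm threshold is an arbitrary placeholder, not a validated cut-off.

```python
def assignment_check(observed, calculated):
    """RMSD of the reported assignment and of the best possible reassignment.

    observed[i] and calculated[i] refer to the same atom as reported.
    Pairing both lists in sorted order minimises the squared error, so a
    large gap between the two values hints at swapped assignments.
    """
    def rmsd(pairs):
        return (sum((c - o) ** 2 for o, c in pairs) / len(pairs)) ** 0.5

    as_reported = rmsd(list(zip(observed, calculated)))
    best = rmsd(list(zip(sorted(observed), sorted(calculated))))
    return as_reported, best


def outliers(observed, calculated, threshold=5.0):
    """Indices whose |calc - obs| exceeds `threshold` ppm (possible typos)."""
    return [i for i, (o, c) in enumerate(zip(observed, calculated)) if abs(c - o) > threshold]
```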

This should be of great, and routine, interest to all authors, 
editors and readers who care whether the reporting is correct.

P.


Peter Murray-Rust
Unilever Centre for Molecular Sciences Informatics
University of Cambridge,
Lensfield Road,  Cambridge CB2 1EW, UK
+44-1223-763069 


_______________________________________________
Blueobelisk-discuss mailing list
https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss
