Hi AR

Please define what you mean by 'over-refinement', as it's not a term I use:
does it mean 'convergence', 'over-fitting', or 'over-optimisation'
(whatever that means), or something else?

If by "LLG is stabilized" you mean it has converged, then I agree that's a
possible stopping criterion; but then all the refinement indicators must
have converged too, including R and Rfree (& RMSDs etc.), since by definition
at convergence there can be no further significant changes in the parameters
to cause R and Rfree to change.  You say R & Rfree "are going in opposite
directions" when LLG has stabilized.  It's not possible for R and Rfree to
continue to change if the refinement has converged: if they do continue to
change, that clearly implies it hasn't converged.
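
The stopping rule implied here can be sketched as follows; this is a
minimal illustrative sketch only, with hypothetical function names and
tolerances, not the behaviour of any actual refinement program:

```python
def refine_to_convergence(model, data, refine_cycle, r_factors,
                          shift_tol=1e-4, max_cycles=1000):
    """Run refinement cycles until the parameter shifts are negligible.

    refine_cycle(model, data) -> (new_model, max_parameter_shift)
    r_factors(model, data)    -> (R_work, R_free)
    Both callables are placeholders for whatever the refinement
    program actually does.
    """
    for cycle in range(max_cycles):
        model, max_shift = refine_cycle(model, data)
        if max_shift < shift_tol:
            # At convergence the parameters no longer change significantly,
            # so R and Rfree (which are functions of the parameters and the
            # fixed data) cannot change significantly either.
            break
    return model, r_factors(model, data)
```

The point of the sketch is that the convergence test is on the parameter
shifts; once those are below tolerance, every derived indicator (R, Rfree,
RMSDs) is necessarily stable as well.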

Cheers

-- Ian

PS 3 copies of your email is 2 too many (or if this is the list server
acting up again, my apologies).


On Fri, Aug 26, 2011 at 3:55 PM, protein chemistry <
proteinchemistr...@gmail.com> wrote:

> Dear Dr Ian
>
> From your argument I could not understand how many cycles to refine before
> submitting the coordinates to the PDB. What is the upper limit: 100, a
> thousand, or a million? According to my understanding, it's more logical to
> stop the refinement when over-refinement is taking place (when R and Rfree
> are going in opposite directions and LLG is stabilized).
>
>
> On Fri, Aug 26, 2011 at 4:01 PM, Ian Tickle <ianj...@gmail.com> wrote:
>
>> Frank,
>>
>> Point #1 - fair point;  the reason Rfree is popular, though, is because it
>>> is a *relative* metric, i.e. by now we have a sense of what "good" is.
>>> So I predict an uphill fight for LLfree.
>>>
>>
>> Why? I don't see any difference.  As you say, Rfree is a relative metric,
>> so your sense of what 'good' is relies on comparisons with other Rfrees
>> (i.e. it can only be 'better' or 'worse', not 'good' or 'bad'); but then
>> the same is true of LLfree (note that both assume that exactly the same
>> data were used and that only the model has changed).  So when choosing
>> between alternative model parameterisations in order to minimise
>> over-fitting, we compare their Rfrees and choose the lower one - same with
>> LLfree.  Or we compare the observed Rfree with the expected Rfree based on
>> Rwork and the obs/param ratio to check for problems with the model - same
>> with LLfree.  In fact LLfree can do it better, because the observations in
>> LLfree are weighted in exactly the same way as those in the target function.
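
The comparison described above can be sketched as a simple selection rule;
the candidate names and scores below are hypothetical, and it is assumed
(as the text stresses) that both models were refined to convergence against
exactly the same data:

```python
def pick_model(candidates, lower_is_better=True):
    """Choose between alternative model parameterisations.

    candidates: list of (name, cross_validated_score) pairs, where the
    score is e.g. Rfree (lower is better) or LLfree (higher is better).
    """
    key = (lambda c: c[1]) if lower_is_better else (lambda c: -c[1])
    return min(candidates, key=key)
```

Whether the metric is Rfree or LLfree, the decision procedure is identical;
only the direction of "better" flips, which is the sense in which the two
are equally usable as relative metrics.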
>>
>>
>>> Point #2 would hold if we routinely let our refinements run to
>>> convergence;  seems common though to run "10 cycles" or "50 cycles" instead
>>> and draw conclusions from the behaviour of the metrics.  Are the conclusions
>>> really much different from the comparison-at-convergence you advocate?
>>> Which is in practice often less convenient.
>>>
>>
>> You might do 10 cycles for a quick optimisation of the coordinates, but
>> then I wouldn't place much faith in the R factors!  How can you draw any
>> conclusions from their behaviour?  There's no way of predicting how they
>> will change in further cycles; the only way to find out is to do it.  I'm
>> not saying that you need to refine exhaustively on every run - that would
>> be silly, since you don't need to know the correct value of the R factors
>> for every run - but certainly on the final run before PDB submission I
>> would regard stopping the refinement early based on Rfree, as implied in
>> Tim's original posting, as something akin to 'cheating'.
>>
>> Cheers
>>
>> -- Ian
>>
>
>
