Perhaps I don't understand k-fold cross-validation, but why would test set 
0 be corrupt? I should have thought every _other_ test set would be corrupt, 
since the structure has been refined against them, with test set 0 excluded.
eab

Martin Malý wrote on 6/24/2025 5:54 PM:
Dear all,

I think we have strayed quite far from the original question; however, it's an 
interesting discussion. We have implemented k-fold (typically 20-fold) 
cross-validation in the paired refinement pipeline PAIREF. In my experience, it's 
quite tricky to obtain an appropriate starting structure model. If a model has 
already been refined while 5 % of the reflections were excluded (test set No. 0) 
and 20-fold cross-validation is then run for each of the test sets 0-19, there is 
a high chance that the result for test set 0 will be biased/corrupted. It may 
help to shake the coordinates and reset the ADPs, but these steps are not always 
sufficient. I also expect (though I have no proof of it) that such a reset of 
the model would affect reflections at different resolutions to different 
extents. Alternatively, one can generate 20 new test reflection sets from 
scratch; that may be the safe way...

Actually, I would be quite interested to hear what your best practice is when 
you want to reset a structure model so that it is not biased towards any test 
reflection set.

Cheers,
Martin
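
The option mentioned above of generating 20 new test sets from scratch, then 
running one refinement per test set, can be sketched roughly as follows. This 
is only an illustration in plain Python, not the PAIREF implementation: the 
function names (assign_free_flags, k_fold_splits), the reflection count and 
the seed are all hypothetical, and a real tool would also have to handle 
symmetry-related reflections, which this sketch ignores.

```python
import random


def assign_free_flags(n_reflections, k=20, seed=0):
    """Give every reflection a flag 0..k-1 so that the k sets are
    random and as equal in size as possible (hypothetical helper)."""
    rng = random.Random(seed)
    flags = [i % k for i in range(n_reflections)]
    rng.shuffle(flags)
    return flags


def k_fold_splits(flags, k=20):
    """Yield (test_flag, work_indices, test_indices) for each fold.
    Each fold keeps one flag as the test set and refines against the rest."""
    for test_flag in range(k):
        test = [i for i, f in enumerate(flags) if f == test_flag]
        work = [i for i, f in enumerate(flags) if f != test_flag]
        yield test_flag, work, test


# Hypothetical data set of 10,000 reflections split into 20 test sets of 5 %.
flags = assign_free_flags(10_000, k=20, seed=2025)
for test_flag, work, test in k_fold_splits(flags):
    # A refinement run using `work` only would be started here.
    print(test_flag, len(work), len(test))
```

Because the new flags are drawn independently of the flag used in the earlier 
refinement, no single new test set corresponds to the old test set No. 0. Each 
new set still contains reflections the model has already been refined against, 
so some reset of the model remains necessary.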


On 24/06/2025 08:23, Tim Gruene wrote:
Hi Vaheh,


in addition to the expected variance v=(Rfree)^2/(2T), where T is the
size of your test set (Ian Tickle et al.,
https://doi.org/10.1107/S0907444999016868), it also matters
whether you ran enough cycles to stabilise the refinement, i.e. the
value of the target function should not drop any further. SHELXL prints
the maximum shift, which is convenient, but with other programs you may
need to check the log files to confirm. Did you consider this during
your study?

Cheers,
Tim
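
To give a feel for the size of the variance quoted above, here is a small 
numerical sketch. It simply evaluates v=(Rfree)^2/(2T) as written in the 
message; the Rfree value and the test set sizes are hypothetical, and the 
function name rfree_sigma is made up for this illustration.

```python
import math


def rfree_sigma(rfree, n_test):
    """Standard deviation of Rfree, taken as the square root of the
    variance v = Rfree**2 / (2*T) quoted in the thread."""
    return math.sqrt(rfree ** 2 / (2 * n_test))


rfree = 0.25  # hypothetical Rfree, as a fraction
for n_test in (500, 1000, 2000):  # hypothetical test set sizes T
    sigma = rfree_sigma(rfree, n_test)
    print(f"T={n_test:5d}  sigma(Rfree)={sigma:.4f}")
```

For Rfree = 0.25 and T = 1000 this gives a standard deviation of about 0.0056, 
i.e. roughly 0.6 percentage points, and the value shrinks as the test set 
grows.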


On Mon, 23 Jun 2025 14:55:36 +0000 "Oganesyan, Vaheh"
<[email protected]> wrote:

Hi all,

I’ve spent some time going through different flags for free
reflections for the (somewhat stupid) reason of getting a lower Rfree. I
made sure that each flag was used only within its own set of
refinements. I wanted to satisfy my inner belief that it doesn’t
matter which flag is used (between 0 and 19). In short: I failed.

Free reflections had been chosen “randomly”, no suspicion there.
However, there was a clear difference at the end: about 2-3% in the
Rfree value. This tells me that the randomness has some sort of rule,
which makes the “random” choice not so random. Having found this, I
also tried to find “a rule” that breaks this randomness, like even
numbers, odd numbers, etc. I did not. Because this was only on a few
cases I won’t even try to connect it to the number of molecules per AU,
or the SG. For each structure (I tried this for 2-3 structures about
10 years ago) it was a different flag. I’m not sure whether DCC also
looks through the different flags, but in the end it finds the best
one, making these exercises unnecessary. Sorry, I cannot present a
case. It was too long ago.

Vaheh Oganesyan, Ph.D.
R&D | Biologics Engineering
One Medimmune Way, Gaithersburg, MD 20878
T:  301-398-5851
[email protected]



From: CCP4 bulletin board <[email protected]> On Behalf Of Randy John Read
Sent: Monday, June 23, 2025 10:29 AM
To: [email protected]
Subject: Re: [ccp4bb] free R in shells

Hi Ben,

I would be very interested if you have a case where it makes a
difference to do this. At one point I was convinced that it had been
important when we were working on the structure of a Shiga-like toxin
bound to trisaccharide (1bos), with four pentamers in the asymmetric
unit. However, Pavel Afonine challenged me to show that the free set
was less biased when chosen in shells than when chosen randomly, and
even in that relatively extreme case I couldn’t see evidence of it.
So it’s probably not worth the bother. Also, if you select the free
set randomly, it’s distributed over the same resolutions as the
working data, which arguably is important when you’re using it to
calibrate the sigma(A) estimates for likelihood targets.

Best wishes,

Randy
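
For readers unfamiliar with the two selection schemes compared above, the 
following is a rough sketch of the difference. It is one possible reading of 
"chosen in shells" (whole thin resolution shells with equal reflection counts 
are set aside) and is not taken from any particular program. The function 
names, the shell count and the synthetic d-spacings are all hypothetical.

```python
import random


def random_free_set(d_spacings, fraction=0.05, seed=0):
    """Pick the free reflections uniformly at random over all resolutions."""
    rng = random.Random(seed)
    n_free = round(fraction * len(d_spacings))
    return set(rng.sample(range(len(d_spacings)), n_free))


def thin_shell_free_set(d_spacings, fraction=0.05, n_shells=100, seed=0):
    """Pick whole thin resolution shells until about `fraction` of the
    reflections are free (shells hold equal numbers of reflections)."""
    rng = random.Random(seed)
    # Sort reflection indices from low resolution (large d) to high resolution.
    order = sorted(range(len(d_spacings)),
                   key=lambda i: d_spacings[i], reverse=True)
    size = len(order) / n_shells
    shells = [order[round(s * size):round((s + 1) * size)]
              for s in range(n_shells)]
    n_pick = max(1, round(fraction * n_shells))
    free = set()
    for shell in rng.sample(shells, n_pick):
        free.update(shell)
    return free


# Synthetic d-spacings in Angstrom, purely for illustration.
rng = random.Random(1)
d = [1.0 / rng.uniform(0.02, 0.5) for _ in range(10_000)]

free_random = random_free_set(d)
free_shells = thin_shell_free_set(d)
print(len(free_random), len(free_shells))
```

With random selection the free reflections are spread over the same 
resolution range as the working set. With the shell scheme they are 
concentrated in a few narrow resolution bands.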

  [...]

-----
Randy J. Read
Department of Haematology, University of Cambridge
Cambridge Institute for Medical Research
The Keith Peters Building
Hills Road
Cambridge CB2 0XY, U.K.
Tel: +44 1223 336500
E-mail: [email protected]
www-structmed.cimr.cam.ac.uk


########################################################################

To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1<https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1>

This message was issued to members of
www.jiscmail.ac.uk/CCP4BB<http://www.jiscmail.ac.uk/CCP4BB>, a
mailing list hosted by www.jiscmail.ac.uk<http://www.jiscmail.ac.uk>,
terms & conditions are available at
https://www.jiscmail.ac.uk/policyandsecurity/<https://www.jiscmail.ac.uk/policyandsecurity/>
________________________________

Confidentiality Notice: This message is private and may contain
confidential and proprietary information. If you have received this
message in error, please notify us and remove it from your system and
note that you must not copy, distribute or take any action in
reliance on it. Any unauthorized use or disclosure of the contents of
this message is not permitted and may be unlawful.
