I am absolutely delighted at the response I have gotten to my little "John Henry Challenge"! Three people have already managed to do the "impossible". Congratulations to George Sheldrick, Pavol Skubak and Raj Pannu for finding ways to improve the phases over the ones I originally obtained (using the default settings of mlphare and dm) and building their way out of it. This is quite useful information! At least it is to me.

Nevertheless, I do think Frances Reyes has a point. This was meant to be a map interpretation challenge, not a SAD-phasing challenge. I appreciate that the two are linked, but the reason I did not initially provide the anomalous data is that I thought it would be too much to ask people to re-do all the phasing, etc. Yes, there do appear to be ways to improve the maps beyond the particular way I phased them, but no matter how good your phasing program is, there will always be a level of anomalous signal that leads to phases "off" enough to make building the model "impossible". Basically, once the map gets bad enough that just as many "wrong" atoms get built in as "right" atoms, there is no escape. However, I think human beings should still have an advantage when it comes to pattern recognition, and I remain curious to see if an insightful crystallographer can tip that balance in the right direction. I am also still curious to see if tweaking some setting on some automated building program will do that too. So, my original question remains: are automated building programs better than humans? Any human?

I therefore declare the John Henry Challenge still open.


But yes, improving the phases can tip the balance too, and the accuracy of the anomalous differences will ultimately affect the accuracy of the phases, and so on. This is a much broader challenge. And I think the best way to frame it is with the question:
"How low can the anomalous signal be before any conceivable approach fails?"
and perhaps:
"What is the best procedure to use for weak anomalous signal?"

For those who are interested in joining George, Pavol, Raj and others in this new challenge, the full spectrum of "difficulty" from trivial (100% Se incorporation) to a complete waste of time (0% Se, 100% S) is here:
http://bl831.als.lbl.gov/~jamesh/challenge/occ_scan/

The "impossible.mtz" for the John Henry Map Interpretation Challenge was derived from "frac0.79.mtz" and "possible.mtz" from "frac0.78.mtz". These simulated 31% and 32% Se incorporation into Met side chains (respectively). It has now been shown that both of these can be solved automatically if you do the phasing right. But what about frac0.80.mtz? Or frac0.90.mtz ? At least on this one "coordinate" of Se incorporation, the prowess of a particular approach can be given a "score". For example, a "score" of 0.78 means that the indicated procedure could solve the frac0.78.mtz dataset, but not the frac0.79.mtz dataset.

Based on the reports I have gotten back so far, the "difficulty score" lineup is:

score  method
0.86   xds, xscale, right sites, crank2 (Pavol Skubak)
0.78   xds, xscale, right sites, mlphare, dm, phenix.autobuild using 20 models (James Holton)
0.75   xds, xscale, right sites, mlphare, dm, buccaneer/refmac/dm (James Holton)
0.71   xds, xscale, right sites, mlphare, dm, ARP/wARP 7.3 (James Holton)
0.51   xds, xscale, right sites, mlphare, dm, ARP/wARP 6.1.1 (James Holton)

Note that all of these attempts "cheated" on the sites. Finding the sites seems to be harder than solving the structure once you've got them. That lineup is:

score  method
0.82   cheating: xds, xscale, right phases, anomalous difference Fourier (James Holton)
0.79   xds, xscale, shelxc/d/e 3.5A NTRY=10000 (George Sheldrick)
0.74   xds, autorickshaw (Santosh Panjikar)
0.65   xds, xscale, phenix.hyss --search=full (James Holton)
0.60   xds, xscale, shelxc/d with NTRY=100 (James Holton)

Here again the "score" is the hardest dataset for which the heavy-atom site constellation found is close enough to the "right" one to move forward. This transition, like the model-building one, is remarkably sharp, particularly if you let each step run for a lot of cycles. The graph for model-building is here:
http://bl831.als.lbl.gov/~jamesh/challenge/build_CC_vs_frac.png
Note how the final map quality is pretty much independent of the initial map quality, up to the point where it all goes wrong. I think this again is an example of the solution needing to be at least "half right" before it can be improved. But perhaps someone can prove me wrong on that one?
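Incidentally, for those who want to reproduce the NTRY effect from the site-finding lineup above: the NTRY instruction lives in the _fa.ins file that shelxc writes for shelxd, so a rough wrapper along these lines is all it takes to re-run the search with more tries. The basename "junk_fa" and the idea of judging success from CC All/Weak are illustrative assumptions, not a prescription:

# Edit the NTRY instruction in a SHELXD .ins file and re-run the
# substructure search.  Assumes shelxc has already written
# junk_fa.ins and junk_fa.hkl; "junk_fa" is an illustrative name.
import re
import subprocess

def run_shelxd(basename, ntry):
    with open(basename + ".ins") as f:
        ins = f.read()
    if re.search(r"^NTRY\b", ins, flags=re.M):
        ins = re.sub(r"^NTRY.*$", "NTRY %d" % ntry, ins, flags=re.M)
    else:
        ins = ins.replace("\nEND", "\nNTRY %d\nEND" % ntry)
    with open(basename + ".ins", "w") as f:
        f.write(ins)
    subprocess.run(["shelxd", basename], check=True)

run_shelxd("junk_fa", 10000)
# then inspect CC All/Weak in the shelxd output to see if the site
# constellation is good enough to move forward with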

For those who want the unmerged data, I have all the XDS_ASCII.HKL files here:
http://bl831.als.lbl.gov/~jamesh/challenge/occ_scan/XDS_ASCII.tgz

If you'd like to go all the way back to the images, you can get them from here:
http://bl831.als.lbl.gov/~jamesh/workshop2/
the "badsignal" dataset is what produced frac1.00.mtz, and "goodsignal" produced frac0.00.mtz. You can generate anything in between using the provided img_mix.com script.


Oh, and when it comes to how useful it is to spend "weeks" building manually into a bad map, well, I suppose that does indeed depend on what alternatives you have and on the science you are trying to do. I agree that it is always better to have better data, but if you spend too much time trying to improve your crystals and waiting for your next beam time, then somebody else who IS willing to build into dodgy maps will probably do the science for you and publish it first. So, in a world of competitive grant renewals, I think it really is critically important to know at what point it actually is "impossible" to solve the structure, as opposed to a situation where trying some new procedure (or collaborator) might be the way to go. I'd like to be able to answer that question for my users, and that's why I'm doing this.

-James Holton
MAD Scientist


On 1/13/2013 12:11 PM, jens Preben Morth wrote:
I agree with Tassos, and btw think that this crystallographer should be able to go back into the lab and optimize the present crystal conditions to get better crystals, in particular when he or she realizes that the scientific question they set out to investigate cannot be answered by analyzing the final structure with the available data quality.
Preben


On 1/13/13 8:52 PM, Anastassis Perrakis wrote:
I think the real challenge (and one that makes for an excellent macromolecular crystallographer) is how well one can interpret a map with poor phases.
Let me disagree ... An excellent macromolecular crystallographer is one that, given some crystals, can derive the best strategy to collect data, process the data optimally, derive phases using all available information, build a model and refine it in such a way that it best explains both the data and geometrical expectations, and do all of this as efficiently as possible.

Efficiency may suggest using one automated suite or another - or indeed may best be achieved by manual labor - be it in the map, in data collection strategy, in refinement, or in another step; and here I am ignoring the art of transforming hair-needle-crystalline-like-dingbits into a diffracting crystal.

One that can interpret a map with poor phases can be either a genius in 3D orientation - or a not necessarily very intelligent or experienced but determined student who can drink and breathe that map for a few weeks in a row until a solution is in place. Neither would necessarily make an excellent macromolecular crystallographer.

Tassos
