I can build from the impossible.mtz data in the following two steps:

1. getting the Se substructure from an anomalous difference map constructed from impossible.mtz
2. running "combined" model building using the substructure from step 1 and starting from the impossible.mtz map

Only impossible.mtz and the sequence (which is probably not really necessary) are used in this solution. It is not a fully automatic solution: step 2 (model building combined with density modification and phasing via a recently developed multivariate SAD function) was performed automatically using CRANK (which calls Buccaneer, REFMAC and Parrot), while step 1 was done "manually" using CCP4 tools (cfft and peakmax).

Compared to the deposited model, 96% of the main chain is (correctly) built and 92% is (correctly) docked, and the R factor is 21%. Clearly, the (relatively) weak anomalous signal is the only limitation in this case. However, the model building procedure did not struggle too much. I expect it would still work if the Se incorporation were decreased somewhat further (as long as the substructure can be obtained in some way).

Of course, this is not a "pure" solution in the sense that I started from impossible.mtz rather than from scratch, i.e. from the data only. Obtaining the substructure from scratch might be more difficult.

Pavol

On Sat, Jan 12, 2013 at 10:50 PM, James Holton <jmhol...@lbl.gov> wrote:

> Woops! Sorry folks. I made a mistake with the I(+)/I(-) entry. They had
> the wrong axis convention relative to 3dko and the F in the same file.
> Sorry about that.
>
> The files on the website now should be right.
> http://bl831.als.lbl.gov/~jamesh/challenge/possible.mtz
> http://bl831.als.lbl.gov/~jamesh/challenge/impossible.mtz
>
> md5 sums:
> c4bdb32a08c884884229e8080228d166 impossible.mtz
> caf05437132841b595be1c0dc1151123 possible.mtz
>
> -James Holton
> MAD Scientist
>
> On 1/12/2013 8:25 AM, James Holton wrote:
>
> Fair enough!
>
> I have just now added DANO and I(+)/I(-) to the files. I'll be very
> interested to see what you can come up with!
> For the record, the phases
> therein came from running mlphare with default parameters but exactly the
> correct heavy-atom constellation (all the sulfur atoms in 3dko), and then
> running dm with default parameters.
>
> Yes, there are other ways to run mlphare and dm that give better phases,
> but I was only able to determine those parameters by "cheating" (comparing
> the resulting map to the right answer), so I don't think it is "fair" to
> use those maps.
>
> I have had a few questions about what is "cheating" and what is not
> cheating. I don't have a problem with the use of sequence information
> because that actually is something that you realistically would know about
> your protein when you sat down to collect data. The sequence of this
> molecule is that of 3dko:
> http://bl831.als.lbl.gov/~jamesh/challenge/seq.pir
>
> I also don't have a problem with anyone actually using an automation
> program to _help_ them solve the "impossible" dataset as long as they can
> explain what they did. Simply putting the above sequence into BALBES
> would, of course, be cheating! I suppose one could try eliminating 3dko
> and its "homologs" from the BALBES search, but that, in and of itself, is
> perhaps relevant to the challenge: "what is the most distant homolog that
> still allows you to solve the structure?". That, I think, is also a
> stringent test of model-building skill.
>
> I have already tried ARP/wARP, phenix.autobuild and buccaneer/refmac.
> With default parameters, all of these programs fail on both the "possible"
> and "impossible" datasets. It was only with some substantial tweaking that
> I found a way to get phenix.autobuild to crack the "possible" dataset
> (using 20 models in parallel). I have not yet found a way to get any
> automation program to build its way out of the "impossible" dataset.
> Personally, I think that the breakthrough might be something like what Tom
> Terwilliger mentioned.
> If you build a good enough starting set of atoms,
> then I think an automation program should be able to take you the rest of
> the way. If that is the case, then it means people like Tom who develop
> such programs for us might be able to use that insight to improve the
> software, and that is something that will benefit all of us.
>
> Or, it is entirely possible that I'm just not running the current software
> properly! If so, I'd love it if someone who knows better (such as their
> developers) could enlighten me.
>
> -James Holton
> MAD Scientist
>
> On 1/12/2013 3:07 AM, Pavol Skubak wrote:
>
> Dear James,
>
> your challenge in its current form ignores an important source
> of information for model building that is available for your
> simulated data - namely, it does not allow the use of anomalous
> phase information in the model building. In difficult cases on
> the edge of success such as this one, this typically makes
> the difference between building and not building.
>
> If you can make the F+/F- and Se substructure available, we
> can test whether this is indeed the case. However, while I
> expect this would push the challenge further significantly,
> most likely you would be able to decrease the Se incorporation
> of your simulated data further to such levels that the anomalous
> signal is again no longer sufficient to build the structure. And
> most likely, there would again exist an edge where a small
> decrease in the Se incorporation would lead from a model built
> to no model built.
>
> Best regards,
>
> --
> Pavol Skubak
> Biophysical Structural Chemistry
> Gorleaus Laboratories
> Einsteinweg 55
> Leiden University
> LEIDEN 2333CC
> the Netherlands
> tel: 0031715274414
> web: http://bsc.lic.leidenuniv.nl/people/skubak-0
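
For anyone downloading the corrected files, here is a minimal sketch of how they could be checked against the md5 sums quoted in the thread. It uses only the Python standard library; the helper names (`md5_of_file`, `check_downloads`) and the assumption that both files sit in one local directory are illustrative and not part of the original posts.

```python
import hashlib
from pathlib import Path

# md5 sums as posted in the thread for the corrected files.
EXPECTED_MD5 = {
    "impossible.mtz": "c4bdb32a08c884884229e8080228d166",
    "possible.mtz": "caf05437132841b595be1c0dc1151123",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_downloads(directory="."):
    """Map each expected file name to True (match), False (mismatch)
    or None (file not present in the directory)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if path.is_file():
            results[name] = md5_of_file(path) == expected
        else:
            results[name] = None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "not found"}
    for name, status in check_downloads().items():
        print(f"{name}: {labels[status]}")
```

A mismatch would most likely mean the earlier version of the file (with the wrong I(+)/I(-) axis convention) is still in use, so re-downloading would be the first thing to try.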