Re: [ccp4bb] a challenge

Pavol Skubak Sat, 12 Jan 2013 16:37:23 -0800

I can build from the impossible.mtz data in the following two steps:

1. getting the Se substructure from an anomalous difference map
constructed from impossible.mtz


2. running "combined" model building using the substructure
from step 1 and starting from the impossible.mtz map

Only impossible.mtz and the sequence (which is probably not
really necessary) are used in this solution.

It is not a fully automatic solution: step 2 (model building
combined with density modification and phasing via a recently
developed multivariate SAD function) was performed
automatically using CRANK (which calls Buccaneer, REFMAC
and Parrot), whereas step 1 was done "manually" using CCP4
tools (cfft and peakmax).
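
For readers unfamiliar with step 1, the coefficients of an anomalous
difference map can be sketched as below. This is only an illustration of
the standard recipe (amplitude DANO, phase shifted by -90 degrees,
optionally weighted by the figure of merit); it is not how cfft is
implemented, and the function name and arguments are hypothetical. It
assumes DANO is defined as F(+) - F(-).

```python
import cmath
import math


def anomalous_difference_coefficient(dano, phi_deg, fom=1.0):
    """Complex map coefficient for one reflection (illustrative only).

    dano    : anomalous difference, assumed to be F(+) - F(-)
    phi_deg : current phase estimate in degrees
    fom     : figure-of-merit weight between 0 and 1
    """
    # The anomalous contribution is 90 degrees out of phase with the
    # normal scattering, so the map is computed with phase phi - 90.
    phase = math.radians(phi_deg - 90.0)
    # A negative dano flips the coefficient by 180 degrees automatically.
    return fom * dano * cmath.exp(1j * phase)
```

Peaks in the Fourier transform of these coefficients are candidate
anomalous scatterer positions, which is what a peak search would then
pick out.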

Compared with the deposited model, 96% of the main chain is
(correctly) built, 92% is (correctly) docked, and the R factor
is 21%. Clearly, the (relatively) weak anomalous signal is the
only limitation in this case. However, the model building
procedure did not struggle too much, so I expect it would still
work if the Se incorporation were decreased somewhat further
(as long as the substructure can be obtained in some way).
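
As a reminder of what the R factor quoted above measures, here is a
minimal sketch of the conventional definition, R = sum(||Fo| - |Fc||) /
sum(|Fo|). The function name is hypothetical, and the sketch assumes the
observed and calculated amplitudes are already on the same scale; real
refinement programs such as REFMAC also handle scaling, which is not
shown here.

```python
def r_factor(f_obs, f_calc):
    """Conventional crystallographic R factor (illustrative only).

    f_obs, f_calc : equal-length sequences of observed and calculated
                    structure-factor amplitudes on a common scale
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("amplitude lists must have the same length")
    # Sum of absolute amplitude differences over all reflections.
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    # Normalise by the sum of observed amplitudes.
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    return numerator / denominator
```

A return value of 0.21 corresponds to the 21% quoted in the message.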

Of course, this is not a "pure" solution in the sense that
I started from impossible.mtz rather than from scratch, i.e.
from the data only. Obtaining the substructure from scratch
might be more difficult.

Pavol


On Sat, Jan 12, 2013 at 10:50 PM, James Holton <[email protected]> wrote:

>
> Whoops!  Sorry, folks.  I made a mistake with the I(+)/I(-) entries.  They
> had the wrong axis convention relative to 3dko and the F in the same file.
> Sorry about that.
>
> The files on the website now should be right.
> http://bl831.als.lbl.gov/~jamesh/challenge/possible.mtz
> http://bl831.als.lbl.gov/~jamesh/challenge/impossible.mtz
>
> md5 sums:
> c4bdb32a08c884884229e8080228d166  impossible.mtz
> caf05437132841b595be1c0dc1151123  possible.mtz
>
> -James Holton
> MAD Scientist
>
>
> On 1/12/2013 8:25 AM, James Holton wrote:
>
>
> Fair enough!
>
> I have just now added DANO  and I(+)/I(-) to the files.  I'll be very
> interested to see what you can come up with!  For the record, the phases
> therein came from running mlphare with default parameters but exactly the
> correct heavy-atom constellation (all the sulfur atoms in 3dko), and then
> running dm with default parameters.
>
> Yes, there are other ways to run mlphare and dm that give better phases,
> but I was only able to determine those parameters by "cheating" (comparing
> the resulting map to the right answer), so I don't think it is "fair" to
> use those maps.
>
> I have had a few questions about what is "cheating" and what is not
> cheating.  I don't have a problem with the use of sequence information
> because that actually is something that you realistically would know about
> your protein when you sat down to collect data.  The sequence of this
> molecule is that of 3dko:
> http://bl831.als.lbl.gov/~jamesh/challenge/seq.pir
>
>   I also don't have a problem with anyone actually using an automation
> program to _help_ them solve the "impossible" dataset as long as they can
> explain what they did.  Simply putting the above sequence into BALBES
> would, of course, be cheating!  I suppose one could try eliminating 3dko
> and its "homologs" from the BALBES search, but that, in and of itself, is
> perhaps relevant to the challenge: "what is the most distant homolog that
> still allows you to solve the structure?".  That, I think, is also a
> stringent test of model-building skill.
>
>   I have already tried ARP/wARP, phenix.autobuild and buccaneer/refmac.
> With default parameters, all of these programs fail on both the "possible"
> and "impossible" datasets.  It was only with some substantial tweaking that
> I found a way to get phenix.autobuild to crack the "possible" dataset
> (using 20 models in parallel).  I have not yet found a way to get any
> automation program to build its way out of the "impossible" dataset.
> Personally, I think that the breakthrough might be something like what Tom
> Terwilliger mentioned.  If you build a good enough starting set of atoms,
> then I think an automation program should be able to take you the rest of
> the way.  If that is the case, then it means people like Tom who develop
> such programs for us might be able to use that insight to improve the
> software, and that is something that will benefit all of us.
>
> Or, it is entirely possible that I'm just not running the current software
> properly!  If so, I'd love it if someone who knows better (such as their
> developers) could enlighten me.
>
> -James Holton
> MAD Scientist
>
> On 1/12/2013 3:07 AM, Pavol Skubak wrote:
>
>
>  Dear James,
>
>  your challenge in its current form ignores an important source
> of information for model building that is available for your
> simulated data - namely, it does not allow the use of anomalous
> phase information in the model building. In difficult cases on
> the edge of success such as this one, this typically makes
> the difference between building and not building.
>
>  If you can make the F+/F- and Se substructure available, we
> can test whether this is the case indeed. However, while I
> expect this would push the challenge further significantly,
> most likely you would be able to decrease the Se incorporation
> of your simulated data further to such levels that the anomalous
> signal is again no longer sufficient to build the structure. And
> most likely, there would again exist an edge where a small
> decrease in the Se incorporation would lead from a model built
> to no model built.
>
>  Best regards,
>
>  --
> Pavol Skubak
> Biophysical Structural Chemistry
> Gorlaeus Laboratories
> Einsteinweg 55
> Leiden University
> LEIDEN  2333CC
> the Netherlands
> tel: 0031715274414
> web: http://bsc.lic.leidenuniv.nl/people/skubak-0
>
>
>
>


-- 
Pavol Skubak
Biophysical Structural Chemistry
Gorlaeus Laboratories
Einsteinweg 55
Leiden University
LEIDEN  2333CC
the Netherlands
tel: 0031715274414
web: http://bsc.lic.leidenuniv.nl/people/skubak-0
