Dear all,
Apologies for a lengthy email in a lengthy chain of emails.
I think Jacob did a good job here refocusing the question. I will try
to answer it in a rather simplistic manner, but from the viewpoint of
somebody who may have spent only relatively little time in the field,
yet has enjoyed the privilege of seeing it from both the developer and
the user perspective, and from environments ranging from
service-oriented synchrotron sites to a cancer hospital. I will of
course only claim my weight=1, but I want to emphasize that where you
stand influences your perspective.
Let me first present the background that shapes my views.
<you can skip this>
When we started with ARP/wARP (two decades for Victor, and getting
pretty close for myself!), we (like others) hardly had the benefit of
large datasets. We had some friends who gladly donated their data for
us to play with, and we assembled enough data to aid our primitive
efforts back then. The same holds true for many.
At some point, around 2002, we started XtalDepot with Serge Cohen: the
idea was to systematically collect phased data, moving one step beyond
HKL F/SigF to include either the HLA/B/C/D coefficients or the search
model for the molecular replacement solution. Despite several calls,
that archive acquired only around a hundred structures, and yesterday
morning it was taken off-line, as it was no longer useful and no
longer visited by anyone.
Very likely, our effort was redundant because of the JCSG dataset,
which has been used by many, many people who are grateful for it (I
guess Frank's 'almost every talk' refers to me; I have never used the
JCSG set).
Lately, I have been involved in the PDB_REDO project, which was
pioneered by Gert Vriend and Robbie Joosten (who is now in my lab).
Thanks to Gerard K.'s EDS clean-up, and the subsequent efforts of both
Robbie and Garib, who made gazillions of fixes to Refmac, we can now
not only make maps of PDB entries but also refine them - all but fewer
than 100 structures. That has cost a significant part of the last four
or five years of Robbie's life (and has received limited appreciation
from the editors of 'important' journals and from the referees of our
grants).
</you can skip this>
These experiences are what shapes my view, and my train of thought
goes like this:
The PDB collected F/sigF, and the effort needed to really use those
data - to get maps first, to re-refine later, and to re-build now -
has received rather limited attention. It is starting to have an
impact in some fields, mostly on modeling efforts, and unlike referee
nr. 3 I strongly believe it has great potential for impact.
My team also collected phases, as did the JCSG on a more successful
and consistent scale, and that effort has indeed been used by
developers to deliver better benchmarking of many software packages
(to my knowledge nobody has used the JCSG data directly, e.g. for
learning techniques, but I apologize if I have missed that). This
benchmarking of software, based on 'real' maps for a rather limited
set of data - hundreds, not tens of thousands - was important enough
anyway.
That leads me to conclude that archiving images is a good idea on a
voluntary basis. Somebody who needs it should convince the funding
bodies to make the money available, and then make the effort to set up
the infrastructure. I would predict that 100-200 datasets would then
be collected, and that would really, really help developers to create
the important new algorithms and software we all need. That's a modest
investment that can teach us a lot. One of the SG groups could make
this effort, and most of us would support it, myself included.
Would such data help more than the developers? I doubt it. Is it
important to make such a resource available to developers? Absolutely!
What is the size of the resource needed? Limited to a few hundred
datasets, which can be curated and stored on a modest budget.
Talking about archiving on a PDB scale might be fantastic in
principle, but it would require time and resources on a scale that
would not clearly withstand a cost-benefit trial, especially in times
of austerity.
In contrast, a systematic effort by our community to deposit DNA in
existing databanks like AddGene.com, and to annotate PDB entries with
such deposition numbers, would be cheap and efficient, and could have
far-reaching implications for the many people who could then easily
obtain the DNA and start studying structures in the database. That
would surely lead to new science, because people interested enough in
these structures to claim the DNA and 'redo' the project would add new
science. One can even imagine SG centers offering such a service -
'please redo structure X for this and that reason' - for a fee
representing the real costs, which must be low given the experience
and technology already invested there; a subset of targets could be on
a 'request' basis...
Sorry for getting wild ... we can of course now have a referendum to
decide on the best curse of action! :-(
A.
PS Rob, you are of course right about sequencing costs, but I was only
trying to paint the bigger picture...
On Oct 31, 2011, at 18:00, Frank von Delft wrote:
"Loathe being forced to do things"? You mean, like being forced to use
programs developed by others at no cost to yourself?
I'm in a bit of a time-warp here - how exactly do users think our
current suite of software got to be as astonishingly good as it is?
10 years ago people (non-developers) were saying exactly the same
things - yet almost every talk on phasing and auto-building that I've
heard ends up acknowledging the JCSG datasets.
Must have been a waste of time then, I suppose.
phx.
On 31/10/2011 16:29, Adrian Goldman wrote:
I have no problem with this idea as an opt-in. However, I loathe
being forced to do things - for my own good or anyone else's. But
unless I have read the tenor of this discussion completely wrongly,
opt-in is precisely what is not being proposed.
Adrian Goldman
Sent from my iPhone
On 31 Oct 2011, at 18:02, Jacob Keller <j-kell...@fsm.northwestern.edu> wrote:
Dear Crystallographers,
I am sending this to try to start a thread which addresses only the
specific issue of whether to archive, at least as a start, images
corresponding to PDB-deposited structures. I believe there could be a
real consensus about the low cost and usefulness of this degree of
archiving, but the discussion keeps swinging around to all levels of
archiving, obfuscating who's for what and for what reason. What about
this level, alone? All of the accompanying info is already entered
into the PDB, so there would be no additional costs on that score.
There could just be a simple link, added to the "download files"
pulldown, which could say "go to image archive," or something along
those lines. Images would be pre-zipped, maybe even tarred, and people
could just download from there. What's so bad?
The benefits are that sometimes there are structures in which
resolution cutoffs might be unreasonable, or perhaps there is some
potential radiation damage in the later frames that might be
deleterious to interpretations, or perhaps there are ugly features in
the images which are invisible or obscure in the statistics.
In any case, it seems to me that this step would be pretty painless,
as it is merely an extension of the current system--just add a link to
the pulldown menu!
Best Regards,
Jacob Keller
--
*******************************************
Jacob Pearson Keller
Northwestern University
Medical Scientist Training Program
email: j-kell...@northwestern.edu
*******************************************
Please don't print this e-mail unless you really need to
Anastassis (Tassos) Perrakis, Principal Investigator / Staff Member
Department of Biochemistry (B8)
Netherlands Cancer Institute,
Dept. B8, 1066 CX Amsterdam, The Netherlands
Tel: +31 20 512 1951 Fax: +31 20 512 1954 Mobile / SMS: +31 6 28 597791