Dear all,
Apologies for a lengthy email in a lengthy chain of emails.
I think Jacob did a good job here refocusing the question. I will try
to answer it in a rather simplistic manner, but from the viewpoint of
somebody who may have spent only relatively little time in the field,
yet has enjoyed the privilege of seeing it from both the developer and
the user perspective, and from environments ranging from
service-oriented synchrotron sites to a cancer hospital. I will of
course only claim my weight=1, but I want to emphasize that where you
stand influences your perspective.
Let me first present the background that shapes my views.
<you can skip this>
When we started with ARP/wARP (two decades for Victor, and getting
pretty close for myself!), we (like others) hardly had the benefit of
large datasets. We had some friends who gladly donated their data for
us to play with, and we assembled enough data to aid our primitive
efforts back then. The same holds true for many.
At some point, around 2002, we started XtalDepot with Serge Cohen: the
idea was to systematically collect phased data, moving one step beyond
HKL F/SigF to include either the HLA/B/C/D coefficients or the search
model for the molecular replacement solution. Despite several calls,
that archive acquired only around a hundred structures, and yesterday
morning it was taken off-line, as it was no longer useful and no
longer visited by anyone.
Very likely, our effort was redundant because of the JCSG dataset,
which has been used by many, many people who are grateful for it (I
guess Frank's 'almost every talk' refers to me; I have never used the
JCSG set).
Lately, I have been involved in the PDB_REDO project, which was
pioneered by Gert Vriend and Robbie Joosten (who is now in my lab).
Thanks to Gerard K.'s EDS clean-up, and the subsequent efforts of both
Robbie and Garib, who made gazillions of fixes to Refmac, we can now
not only make maps of PDB entries but also refine them - all but fewer
than 100 structures. That has cost a significant part of the last four
or five years of Robbie's life (and has received limited appreciation
from the editors of 'important' journals and from the referees of our
grants).
</you can skip this>
These experiences are what shapes my view, and my train of thought
goes like this:
The PDB collected F/sigF, and the effort needed to really use those
data - to get maps first, to re-refine later, and to re-build now -
has received rather limited attention. It is starting to have an
impact in some fields, mostly on modeling efforts, and unlike referee
nr. 3 I strongly believe it has great potential for impact.
My team also collected phases, as did the JCSG on a more successful
and consistent scale, and that effort has indeed been used by
developers to deliver better benchmarking of many software packages
(to my knowledge nobody has used the JCSG data directly, e.g. for
learning techniques, but I apologize if I have missed that). This
benchmarking of software, based on 'real' maps for a rather limited
set of data - hundreds, not tens of thousands - was important enough
anyway.
That leads me to conclude that archiving images is a good idea on a
voluntary basis. Somebody who needs it should convince the funding
bodies to make the money available, and then make the effort to set up
the infrastructure. I would predict that 100-200 datasets would then
be collected, and that would really, really help developers to create
the important new algorithms and software we all need. That's a modest
investment that can teach us a lot. One of the SG groups could make
this effort, and most of us would support it, myself included.
Would such data help more than the developers? I doubt it. Is it
important to make such a resource available to developers? Absolutely!
What is the size of the resource needed? Limited to a few hundred
datasets, which can be curated and stored on a modest budget.
Talking about archiving on a PDB scale might be fantastic in
principle, but it would require time and resources on a scale that
would not clearly withstand a cost-benefit trial, especially in times
of austerity.
In contrast, a systematic effort by our community to deposit DNA in
existing databanks like AddGene.com, and to annotate PDB entries with
such deposition numbers, would be cheap and efficient, and could have
far-reaching implications for the many people who could then easily
obtain the DNA and start studying structures in the database. That
would surely lead to new science, because people interested enough in
these structures to claim the DNA and 'redo' the project would add new
science. One can even imagine SG centers offering such a service -
'please redo structure X for this and that reason' - for a fee
representing the real costs, which must be low given the experience
and technology already invested there; a subset of targets could be on
a 'request' basis...
Sorry for getting wild ... we can of course now have a referendum to
decide on the best curse of action! :-(
A.
PS Rob, you are of course right about sequencing costs, but I was only
trying to paint the bigger picture...
On Oct 31, 2011, at 18:00, Frank von Delft wrote:
"Loathe being forced to do things"? You mean, like being forced to use
programs developed by others at no cost to yourself?
I'm in a bit of a time-warp here - how exactly do users think our
current suite of software got to be as astonishingly good as it is?
10 years ago people (non-developers) were saying exactly the same
things - yet almost every talk on phasing and auto-building that I've
heard ends up acknowledging the JCSG datasets.
Must have been a waste of time then, I suppose.
phx.
On 31/10/2011 16:29, Adrian Goldman wrote:
I have no problem with this idea as an opt-in. However, I loathe
being forced to do things - for my own good or anyone else's. But
unless I have read the tenor of this discussion completely wrongly,
opt-in is precisely what is not being proposed.
Adrian Goldman
Sent from my iPhone
On 31 Oct 2011, at 18:02, Jacob Keller <j-kell...@fsm.northwestern.edu> wrote:
Dear Crystallographers,
I am sending this to try to start a thread which addresses only the
specific issue of whether to archive, at least as a start, images
corresponding to PDB-deposited structures. I believe there could be a
real consensus about the low cost and usefulness of this degree of
archiving, but the discussion keeps swinging around to all levels of
archiving, obfuscating who's for what and for what reason. What about
this level, alone? All of the accompanying info is already entered
into the PDB, so there would be no additional costs on that score.
There could just be a simple link, added to the "download files"
pulldown, which could say "go to image archive," or something along
those lines. Images would be pre-zipped, maybe even tarred, and people
could just download from there. What's so bad?
The benefits are that sometimes there are structures in which
resolution cutoffs might be unreasonable, or perhaps there is some
potential radiation damage in the later frames that might be
deleterious to interpretations, or perhaps there are ugly features in
the images which are invisible or obscure in the statistics.
In any case, it seems to me that this step would be pretty painless,
as it is merely an extension of the current system--just add a link to
the pulldown menu!
Best Regards,
Jacob Keller
--
*******************************************
Jacob Pearson Keller
Northwestern University
Medical Scientist Training Program
email: j-kell...@northwestern.edu
*******************************************
Please don't print this e-mail unless you really need to
Anastassis (Tassos) Perrakis, Principal Investigator / Staff Member
Department of Biochemistry (B8)
Netherlands Cancer Institute,
Dept. B8, 1066 CX Amsterdam, The Netherlands
Tel: +31 20 512 1951 Fax: +31 20 512 1954 Mobile / SMS: +31 6 28 597791