After Richard Reid, more than 100 million people each year
have to have their shoes examined, and one effect is that older
buildings like Heathrow Terminal 3 have become the most painful places
on earth. The cost of someone trying to light their shoelaces has
affected us all.


On the discussion about archiving image data sets:
 I guess that less than 1% of the image sets for PDB entries
   are useful to software development (and can be got privately).
 I guess that maybe 1 in 10,000 entries has a serious problem that
   may require referees to look at the images (and these can be
   accessed on demand).


The cost of disks for your PC (kitchen table disks from a supermarket)
may be $1 per Gbyte on USB i/o, but an archive centre required to maintain
the data will probably need RAID 0/1, i.e. RAID 10. This has high
performance and strong data protection (it can tolerate multiple drive
failures, provided they fall in different mirror pairs), but it has a high
redundancy cost overhead. In case you haven't noticed, a large collection
of disks has failures. Look up the problems that the series of Landsat
satellites has had from 1980 onwards, arising out of the volume of data
and the short life of computer compatible tapes and optical discs.
Archiving data lacks glamour: it's the boring day-to-day rectification and
storage of information. Very little money gets spent on this task, yet for
remote sensing the most significant cost is transmission, correction and
archiving of the data. Three semi-trailer loads of Landsat tapes were
found (literally) moldering in a damp basement in Baltimore after people
and funding agencies lost interest. Oh yes, and detectors change every
5 years and processing software gets lost.

At the EBI, before we even get a single disk we pay £100,000 for a cabinet.
Disks cost around £500 for 300 Gbytes (and these are not the best disks,
which cost around the same for 146 Gbytes). Disk technology changes every
5 years, so one cost of an archive is recovering the data every 5 years
onto the next generation of hardware. Molecular biology and structure
research is carried out by thousands of groups, not centrally under a
single international treaty like a telescope that is run centrally and
financed to do the data archiving. Molecular biology uses some in-house
data collection,
but most is carried out at synchrotrons. Despite the fact that there are
many beamlines, most data again comes from fewer than 10 sites. These
major synchrotron sites are committed to data storage by various methods
of storage hierarchy, and a better solution than a central archive is to
issue a DOI or set of DOIs for the data associated with a PDB entry and
to associate the DOI with the PDB entry. Over the last 5-7 years many
countries have spent billions of dollars on GRID and distributed data
storage: use this technology to leave the data where it is and pick it
up on demand. Google's solution to large datasets, such as single-file
tomograms, is to ship disks; there is no simple, cheap FTP/WWW solution
for large datasets.
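
As a rough illustration of the disk figures quoted above, here is a minimal
sketch. The prices (£500 for 300 Gbytes or for 146 Gbytes) and the 5-year
refresh cycle are from this post. The mirroring factor of 2 for RAID 10, the
20-year horizon and the flat prices over time are assumptions made only for
the example, and the £100,000 cabinet, staff and power are left out.

```python
# Figures quoted in the post: GBP 500 buys a 300 GB disk, or a better 146 GB disk.
# Assumption (not from the post): RAID 10 stores every block twice, so usable
# capacity is half the raw capacity.

def cost_per_usable_gb(disk_price, disk_capacity_gb, mirror_copies=2):
    """Price per GB of usable space when each block is stored mirror_copies times."""
    return disk_price * mirror_copies / disk_capacity_gb


def refresh_count(years, refresh_interval_years=5):
    """Number of hardware generations bought over `years`, including the first."""
    return 1 + (years - 1) // refresh_interval_years


standard = cost_per_usable_gb(500, 300)
premium = cost_per_usable_gb(500, 146)

# Disk-only cost of keeping 1 GB for 20 years, assuming (unrealistically)
# that prices stay flat between hardware generations.
twenty_year = standard * refresh_count(20)

print(f"standard disks: {standard:.2f} GBP per usable GB")
print(f"better disks:   {premium:.2f} GBP per usable GB")
print(f"20 years, refreshed every 5: {twenty_year:.2f} GBP per GB (disks only)")
```

Even on these generous assumptions the disks alone come out several times
the supermarket price per Gbyte, before any cabinet or administration.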

The cost of a central archive is several million dollars per year
to set up and run long term, and who will pay? 40% of the PDB comes
from the USA (the biggest single contributor), but with the difficulties
in funding from the EU and national funding priorities, is the USA to
carry this cost? Is the cost to be shared as in the table below? So far
only the USA, Japan and Europe (through the UK, EU and EMBL) pay for the
PDB. The USA also pays for UniProt, and other large-scale data gathering
is carried out by nationally funded centres, not by the large number of
individuals and countries that the PDB comes from.

The administrative cost of collecting all the datasets is far higher than
the $1/gigabyte of a USB disk, which is next to useless for an archive.
The costs of storage are rapidly decreasing, but there has not been
a great change in latencies and bandwidth. If everything gets
faster and cheaper at the same rate then nothing really changes, i.e.
more structures are done.

Why inspect the shoes of every PDB entry and every structural biologist
when we can instead detect the very rare suspect problem and agree on a
course of action?

kim

PDB Depositions (1 January 1999 to 26 June 2007)
Country       1999 2000 2001 2002 2003 2004 2005 2006 2007 Total
ARGENTINA        0    0    0    0    0    2    1    6    7    16
AUSTRALIA       52   46   45   59   59   75   94   91   51   572
AUSTRIA         13    2    7    1    2   22   26   20    5    98
BELGIUM         29   28   41   24   38   27   36   50   29   302
BRAZIL           7    2   12   16   34   24   34   78   30   237
CANADA         109  117  131  115  157  185  280  334  183  1611
CHILE            0    1    0    0    0    1    2    0    0     4
CHINA           22   28   32   29   50   66  132  121   61   541
CROATIA          0    1    0    0    1    0    0    5    0     7
CZECH_REPUBLIC   2    1    4    6    5    4   12    3    4    41
CUBA             0    0    0    0    0    1    0    0    0     1
DENMARK         19   34   26   31   44   45   37   58    9   303
FINLAND         14   10   11   23   20   28   37   41   20   204
FRANCE         144  183  183  177  208  254  281  243  138  1811
GERMANY        198  234  222  207  263  315  343  436  220  2438
GREECE           6   20    8    7   17   12   16   12    8   106
HONG_KONG        2    3    7    3    7   11    5    8    9    55
HUNGARY          2    1    5    3    4    5    5    9    1    35
INDIA           35   39   45   71   67   86  112  174   65   694
IRELAND          0    2    1    0    1    2    3    7    0    16
ISRAEL          25   13   32   27   30   38   28   33   24   250
ITALY           35   56   80   80  115  100  127  118   54   765
JAPAN          150  220  240  279  528  702 1102  889 1119  5229
LITHUANIA        0    0    1    0    0    0    0    0    0     1
MEXICO           3    5    2    4    5    3    3    1    2    28
NETHERLANDS     42   20   28   21   32   34   29   30   18   254
NEW_ZEALAND     15   20   14   12   13   16   15   18   12   135
NORWAY          10    5    5   10   14    9   25   19   20   117
PAKISTAN         0    0    0    7    3    0    0    3    0    13
PERU             0    0    0    0    0    1    0    0    0     1
POLAND           3    4   16   10    5   17   11   23   10    99
PORTUGAL         8   15    7   10   15   19   14   10   11   109
RUSSIA           6    7    5    8   13   18   10   26   15   108
SINGAPORE        0    2    3    2   15   13   34   37   22   128
SLOVAKIA         0    0    4    3    2    5    1    0    1    16
SLOVENIA         0    1    2    3    1    5    0    6    0    18
SOUTH_AFRICA     0    0    0    1    0    1    1    0    1     4
SOUTH_KOREA     43   27   30   34   66   56   61   90   43   450
SPAIN           27   36   38   34   33   54   70   81   34   407
SWEDEN          56   48   92   67   93   90  119  109   92   766
SWITZERLAND     49   29   29   35   53   46   58   98   29   426
TAIWAN           7   16   14   22   41   56   60   88   35   339
THAILAND         0    0    0    0    3    0    4    0    0     7
UNITED_KINGDOM 241  314  286  342  390  427  538  598  295  3431
UNITED_STATES 1148 1210 1322 1387 1765 2119 2295 2573 1425 15244
COMMERCIAL     173  156  169  284  465  363  467  576  276  2929
UNKNOWN         45    4    0    0    0    0    0    0    0    49
VENEZUELA        1    0    0    0    1    0    0    0    0     2
ORGANISATION    65   51   74   97  100  100  151  163   71   872
TOTAL         2806 3011 3273 3551 4778 5457 6679 7285 4449 41289
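
If the cost were shared in proportion to depositions, the split implied by
the Total column above could be worked out along these lines. This is only a
sketch: the counts are copied from the table for the five largest national
contributors, while the annual cost of $3 million is a made-up stand-in for
"several million dollars per year" and `cost_shares` is a hypothetical helper.

```python
# Totals copied from the Total column of the table
# (1 January 1999 to 26 June 2007).
TOTAL_DEPOSITIONS = 41289
totals = {
    "UNITED_STATES": 15244,
    "JAPAN": 5229,
    "UNITED_KINGDOM": 3431,
    "GERMANY": 2438,
    "FRANCE": 1811,
}

# Hypothetical annual cost; the post only says "several million dollars per year".
ANNUAL_COST_USD = 3_000_000


def cost_shares(annual_cost, totals, grand_total):
    """Split annual_cost in proportion to each contributor's deposition count."""
    return {name: annual_cost * n / grand_total for name, n in totals.items()}


shares = cost_shares(ANNUAL_COST_USD, totals, TOTAL_DEPOSITIONS)
for name, usd in shares.items():
    pct = 100 * totals[name] / TOTAL_DEPOSITIONS
    print(f"{name:15s} {pct:5.1f}%  ${usd:,.0f}")
```

On these counts the USA accounts for roughly 37% of depositions over the
period, and the remaining share is spread over a long tail of countries that
each contribute very little.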
