All -

No doubt this topic has come up before on the BB: I'd like to ask
about the current capabilities of the various integration programs (in
practice we use only MOSFLM & XDS) for reading compressed diffraction
images from synchrotrons.  AFAICS XDS has limited support for reading
compressed images (TIFF format from the MARCCD detector and CCP4
compressed format from the Oxford Diffraction CCD); MOSFLM doesn't
seem to support reading compressed images at all (I'm sure Harry will
correct me if I'm wrong about this!).  I'm really thinking about
gzipped files here: bzip2 no doubt gives marginally smaller files but
is very slow.  Currently we bring back uncompressed images, but it
seems to me that this is not the most efficient way of doing things -
or is my expectation that it's more efficient to read compressed
images and uncompress them in memory simply not realised in practice?
For example, the AstexViewer molecular viewer software currently reads
gzipped CCP4 maps directly and gunzips them in memory; this improves
the response time only by a modest factor of ~1.5, but that's because
electron density maps are 'dense' from a compression point of view.
X-ray diffraction images tend to have much more 'empty space', so the
compression factor is usually considerably higher (as much as 10-fold)
and the potential gain correspondingly greater.
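
To make the 'uncompress in memory' idea concrete, here's a minimal
sketch (not actual MOSFLM or XDS code, just an illustration - the
filename and buffer size are made up) of reading a gzipped image
straight into a memory buffer with zlib's gz* routines:

    /* compile with: cc read_gz_image.c -lz */
    #include <stdio.h>
    #include <stdlib.h>
    #include <zlib.h>

    int main(void)
    {
        const char *path = "image_001.img.gz";      /* hypothetical filename */
        const size_t max_bytes = 64 * 1024 * 1024;  /* assumed upper bound on image size */
        unsigned char *buf = malloc(max_bytes);
        if (buf == NULL) return 1;

        gzFile fp = gzopen(path, "rb");
        if (fp == NULL) { free(buf); return 1; }

        /* Only the compressed bytes come off the disk; decompression
           happens in memory as gzread fills the buffer. */
        int n = gzread(fp, buf, (unsigned)max_bytes);
        gzclose(fp);

        if (n > 0)
            printf("read %d uncompressed bytes from %s\n", n, path);
        free(buf);
        return 0;
    }

A nice property of the gz* interface is that gzopen/gzread also read
plain uncompressed files transparently, so supporting gzipped images
needn't mean a separate code path.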

On a recent trip we collected more data than we anticipated & the
uncompressed data no longer fitted on our USB disk (the data is backed
up to the USB disk as it's collected), so we would definitely have
benefited from compression!  However, file size is *not* the issue:
disk space is cheap after all.  My point is that compressed images
surely require much less disk I/O to read.  In this respect bringing
back compressed images and then uncompressing them to a local disk
completely defeats the object of compression: you read the compressed
data, write the uncompressed data and then read it all again, so you
actually more than double the I/O instead of reducing it!  We see this
when we try to process the ~150 datasets that we bring back on our PC
cluster: the disk I/O completely cripples the disk server machine (and
everyone who's trying to use it at the same time!) unless we're
careful to limit the number of simultaneous jobs.  When we routinely
start to use
the Pilatus detector on the beamlines this is going to be even more of
an issue.  Basically we have plenty of processing power from the
cluster: the disk I/O is the bottleneck.  Now you could argue that we
should spread the load over more disks or maybe spend more on faster
disk controllers, but the whole point about disks is that they're
cheap, we don't need the extra I/O bandwidth for anything else, and we
shouldn't need to spend a fortune, particularly if there are ways of
making the software more efficient, which after all would benefit
everyone.
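
Just to put rough numbers on the I/O argument, here's a trivial
back-of-the-envelope sketch; the 100 MB image size is purely
hypothetical and the 10-fold compression factor is just the figure
quoted above:

    #include <stdio.h>

    /* Back-of-the-envelope disk I/O per image for three workflows.
       Sizes are assumptions, not measurements. */
    int main(void)
    {
        const double image_mb      = 100.0;            /* hypothetical uncompressed image */
        const double compressed_mb = image_mb / 10.0;  /* ~10-fold compression as above */

        double read_uncompressed   = image_mb;                          /* status quo: one plain read */
        double uncompress_to_disk  = compressed_mb + image_mb + image_mb; /* read gz + write + re-read */
        double uncompress_in_memory = compressed_mb;                     /* read gz only */

        printf("uncompressed images, read directly : %6.0f MB/image\n", read_uncompressed);
        printf("gzipped, uncompressed to local disk: %6.0f MB/image\n", uncompress_to_disk);
        printf("gzipped, uncompressed in memory    : %6.0f MB/image\n", uncompress_in_memory);
        return 0;
    }

On those assumptions the uncompress-to-disk route costs 210 MB of disk
I/O per image, against 100 MB for the status quo and 10 MB for
in-memory decompression.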

Cheers

-- Ian
