Hi Ian

I've looked briefly at implementing gunzip in Mosflm in the past, but never 
really pursued it. It could probably be done when I have some free time, but 
who knows when that will be? gzip'ing one of my standard test sets gives around 
a 40-50% reduction in size, bzip2 ~60-70%. Compression speed matters too: 
compressing is considerably slower than uncompressing (since with 
uncompressing you know where you are going and have the instructions, whereas 
with compressing you have to find it all out as you proceed).
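
A rough way to check figures like these on your own data is sketched below in 
Python. Everything in it is an assumption for illustration: make_test_image 
builds a synthetic 16-bit "image" (Gaussian noise plus a few bright pixels) as 
a crude stand-in for a real diffraction image, and gzip/bz2 are Python's 
standard-library modules, not anything Mosflm uses. The numbers it prints will 
not match the ones I quoted for my test sets.

```python
import array
import bz2
import gzip
import random
import time


def make_test_image(n_pixels=200_000, seed=0):
    """Return bytes for a synthetic 16-bit image (hypothetical stand-in for real data)."""
    rng = random.Random(seed)
    # Noisy low-level background, clamped at zero.
    pixels = [max(0, int(rng.gauss(20, 5))) for _ in range(n_pixels)]
    # A sparse set of bright "spots".
    for i in range(0, n_pixels, 5000):
        pixels[i] = 30000
    return array.array("H", pixels).tobytes()


def benchmark(name, compress, decompress, raw):
    """Time one compress/decompress round trip and report the size reduction."""
    t0 = time.perf_counter()
    packed = compress(raw)
    t1 = time.perf_counter()
    unpacked = decompress(packed)
    t2 = time.perf_counter()
    if unpacked != raw:
        raise ValueError(f"{name}: round trip did not reproduce the input")
    return {
        "name": name,
        "reduction_percent": 100.0 * (1.0 - len(packed) / len(raw)),
        "compress_s": t1 - t0,
        "decompress_s": t2 - t1,
    }


if __name__ == "__main__":
    raw = make_test_image()
    codecs = [
        ("gzip", gzip.compress, gzip.decompress),
        ("bzip2", bz2.compress, bz2.decompress),
    ]
    for name, comp, decomp in codecs:
        r = benchmark(name, comp, decomp, raw)
        print(
            f"{r['name']:6s} reduction {r['reduction_percent']:5.1f}%  "
            f"compress {r['compress_s']:.4f}s  decompress {r['decompress_s']:.4f}s"
        )
```

To try it on a real image, replace make_test_image() with the bytes read from 
an uncompressed image file.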

There are several ways of writing compressed images that (I believe) all the 
major processing packages have implemented - for example, Jan Pieter Abrahams 
has one which has been used for Mar images for a long time, and CBF has more 
than one. There are very good reasons for all detectors to write their images 
using CBFs with some kind of compression (I think that all new MX detectors at 
Diamond, for example, are required to be able to). 

Pilatus images are written using a fast compressor and read (in Mosflm and XDS, 
anyway - I have no idea about d*Trek or HKL, but imagine they would do the job 
every bit as well) using a fast decompressor, which goes some way towards 
dealing with that particular problem. The image files aren't as big as you'd 
expect from their physical size and 20-bit dynamic range: from the 6M they're 
roughly 6MB, rather than the 6MB * 2.5 = 15MB that 2.5 bytes per pixel would 
give. So that seems about as good as you'd get from bzip2 anyway.

I'd be somewhat surprised to see a non-lossy fast algorithm that could give you 
10-fold compression with normal MX type images - the "empty" space between 
Bragg maxima is full of detail ("noise", "diffuse scatter"). If you had a truly 
flat background you could get much better compression, of course. 
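
The difference between a flat and a noisy background is easy to demonstrate 
with a small Python sketch. The assumptions here are mine and purely 
illustrative: 16-bit pixels, Gaussian noise as a crude stand-in for real 
counting statistics and diffuse scatter, and zlib (the DEFLATE method behind 
gzip) as the lossless compressor. The helper names are hypothetical.

```python
import array
import random
import zlib


def compression_ratio(pixels):
    """Uncompressed size divided by zlib-compressed size for 16-bit pixels."""
    raw = array.array("H", pixels).tobytes()
    return len(raw) / len(zlib.compress(raw))


def flat_background(n_pixels, level=20):
    """Every pixel has exactly the same value."""
    return [level] * n_pixels


def noisy_background(n_pixels, level=20, sigma=5, seed=0):
    """Same mean level, but with Gaussian noise on every pixel (clamped at zero)."""
    rng = random.Random(seed)
    return [max(0, int(rng.gauss(level, sigma))) for _ in range(n_pixels)]


if __name__ == "__main__":
    n = 100_000
    print(f"flat  background: {compression_ratio(flat_background(n)):8.1f}-fold")
    print(f"noisy background: {compression_ratio(noisy_background(n)):8.1f}-fold")
```

With these settings the flat image compresses by orders of magnitude, while 
the noisy one stays well short of 10-fold, because a lossless compressor has 
to store every bit of the noise.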

On 6 May 2010, at 11:24, Ian Tickle wrote:

> All -
> 
> No doubt this topic has come up before on the BB: I'd like to ask
> about the current capabilities of the various integration programs (in
> practice we use only MOSFLM & XDS) for reading compressed diffraction
> images from synchrotrons.  AFAICS XDS has limited support for reading
> compressed images (TIFF format from the MARCCD detector and CCP4
> compressed format from the Oxford Diffraction CCD); MOSFLM doesn't
> seem to support reading compressed images at all (I'm sure Harry will
> correct me if I'm wrong about this!).  I'm really thinking about
> gzipped files here: bzip2 no doubt gives marginally smaller files but
> is very slow.  Currently we bring back uncompressed images but it
> seems to me that this is not the most efficient way of doing things -
> or is it just that my expectation that it's more efficient to read
> compressed images and uncompress in memory is not realised in practice?
> For example the AstexViewer molecular viewer software currently reads
> gzipped CCP4 maps directly and gunzips them in memory; this improves
> the response time by a modest factor of ~ 1.5, but this is because
> electron density maps are 'dense' from a compression point of view;
> X-ray diffraction images tend to have much more 'empty space' and the
> compression factor is usually considerably higher (as much as
> 10-fold).
> 
> On a recent trip we collected more data than we anticipated & the
> uncompressed data no longer fitted on our USB disk (the data is backed
> up to the USB disk as it's collected), so we would have definitely
> benefited from compression!  However file size is *not* the issue:
> disk space is cheap after all.  My point is that compressed images
> surely require much less disk I/O to read.  In this respect bringing
> back compressed images and then uncompressing back to a local disk
> completely defeats the object of compression - you actually more than
> double the I/O instead of reducing it!  We see this when we try to
> process the ~150 datasets that we bring back on our PC cluster and the
> disk I/O completely cripples the disk server machine (and everyone
> who's trying to use it at the same time!) unless we're careful to
> limit the number of simultaneous jobs.  When we routinely start to use
> the Pilatus detector on the beamlines this is going to be even more of
> an issue.  Basically we have plenty of processing power from the
> cluster: the disk I/O is the bottleneck.  Now you could argue that we
> should spread the load over more disks or maybe spend more on faster
> disk controllers, but the whole point about disks is they're cheap, we
> don't need the extra I/O bandwidth for anything else, and you
> shouldn't need to spend a fortune, particularly if there are ways of
> making the software more efficient, which after all will benefit
> everyone.
> 
> Cheers
> 
> -- Ian

Harry
--
Dr Harry Powell, MRC Laboratory of Molecular Biology, MRC Centre, Hills Road, 
Cambridge, CB2 0QH
