Compression methods such as gzip are unlikely to be optimal for diffraction images, and AFAIK the methods in CBF are better (I think Jim Pflugrath ran some timing races a long time ago, and I guess others have too). There is no reason for data acquisition software ever to write uncompressed images (let alone having 57 different ways of doing it).

Phil
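The byte-offset scheme used for CBF/Pilatus images exploits the fact that neighbouring pixels in a diffraction image usually differ by only a few counts, so the pixel-to-pixel delta fits in a single byte almost everywhere and only the strong spots need wider escapes. A minimal sketch of the idea in Python follows - illustrative only, not the CBFlib implementation, and the exact escape markers and widths here are assumptions:

    import struct
    import numpy as np

    def byte_offset_encode(pixels):
        # Sketch of byte-offset encoding: store the difference between
        # successive pixel values, using 1 byte where the delta is small
        # and escaping to 2 or 4 bytes only where it is not.
        out = bytearray()
        previous = 0
        for value in np.asarray(pixels, dtype=np.int64).ravel():
            delta = int(value - previous)
            if -127 <= delta <= 127:
                out += struct.pack('<b', delta)        # common case: 1 byte
            elif -32767 <= delta <= 32767:
                out += struct.pack('<b', -128)         # escape to 16 bits
                out += struct.pack('<h', delta)
            else:
                out += struct.pack('<b', -128)         # escape to 32 bits
                out += struct.pack('<h', -32768)
                out += struct.pack('<i', delta)
            previous = value
        return bytes(out)

    # A mostly quiet background with a few strong Bragg peaks compresses well:
    image = np.random.poisson(10, size=(100, 100)).astype(np.int32)
    image[40:43, 60:63] += 50000                       # a "spot"
    print(len(byte_offset_encode(image)), "bytes from", image.nbytes)

Because the encoder only looks at one delta at a time it is fast in both directions, which is where this kind of scheme wins over gzip for detector data.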
On 6 May 2010, at 13:38, Ian Tickle wrote:

> Hi Harry
>
> Thanks for the info. Speed of compression is not an issue I think, since compression & backing up of the images are done asynchronously with data collection, and currently backing up easily keeps up, so I think compression straight to the backup disk would too. As you saw from my reply to Tim, my compression factor of 10 was a bit optimistic; for images with spots on them (!) it's more like 2 or 3 with gzip, as you say.
>
> I found an old e-mail from James Holton where he suggested lossy compression for diffraction images (as long as it didn't change the F's significantly!) - I'm not sure whether anything came of that!
>
> Cheers
>
> -- Ian
>
> On Thu, May 6, 2010 at 2:04 PM, Harry Powell <ha...@mrc-lmb.cam.ac.uk> wrote:
>> Hi Ian
>>
>> I've looked briefly at implementing gunzip in Mosflm in the past, but never really pursued it. It could probably be done when I have some free time, but who knows when that will be? gzip'ing one of my standard test sets gives around a 40-50% reduction in size, bzip2 ~60-70%. The speed of doing the compression is important too, and compressing is considerably slower than uncompressing (since with uncompressing you know where you are going and have the instructions, whereas with compressing you have to find it all out as you proceed).
>>
>> There are several ways of writing compressed images that (I believe) all the major processing packages have implemented - for example, Jan Pieter Abrahams has one which has been used for Mar images for a long time, and CBF has more than one. There are very good reasons for all detectors to write their images using CBFs with some kind of compression (I think that all new MX detectors at Diamond, for example, are required to be able to).
>>
>> Pilatus images are written using a fast compressor and read using a fast decompressor (in Mosflm and XDS, anyway - I have no idea about d*Trek or HKL, but imagine they would do the job every bit as well) - so this goes some way towards dealing with that particular problem - the image files aren't as big as you'd expect from their physical size and 20-bit dynamic range (from the 6M they're roughly 6MB, rather than 6MB * 2.5). So that seems about as good as you'd get from bzip2 anyway.
>>
>> I'd be somewhat surprised to see a non-lossy fast algorithm that could give you 10-fold compression with normal MX-type images - the "empty" space between Bragg maxima is full of detail ("noise", "diffuse scatter"). If you had a truly flat background you could get much better compression, of course.
>>
>> On 6 May 2010, at 11:24, Ian Tickle wrote:
>>
>>> All -
>>>
>>> No doubt this topic has come up before on the BB: I'd like to ask about the current capabilities of the various integration programs (in practice we use only MOSFLM & XDS) for reading compressed diffraction images from synchrotrons. AFAICS XDS has limited support for reading compressed images (TIFF format from the MARCCD detector and CCP4 compressed format from the Oxford Diffraction CCD); MOSFLM doesn't seem to support reading compressed images at all (I'm sure Harry will correct me if I'm wrong about this!). I'm really thinking about gzipped files here: bzip2 no doubt gives marginally smaller files but is very slow.
>>> Currently we bring back uncompressed images, but it seems to me that this is not the most efficient way of doing things - or is it just that my expectation that it's more efficient to read compressed images and uncompress them in memory is not realised in practice? For example the AstexViewer molecular viewer software currently reads gzipped CCP4 maps directly and gunzips them in memory; this improves the response time by a modest factor of ~1.5, but this is because electron density maps are 'dense' from a compression point of view; X-ray diffraction images tend to have much more 'empty space' and the compression factor is usually considerably higher (as much as 10-fold).
>>>
>>> On a recent trip we collected more data than we anticipated & the uncompressed data no longer fitted on our USB disk (the data is backed up to the USB disk as it's collected), so we would definitely have benefited from compression! However file size is *not* the issue: disk space is cheap after all. My point is that compressed images surely require much less disk I/O to read. In this respect bringing back compressed images and then uncompressing back to a local disk completely defeats the object of compression - you actually more than double the I/O instead of reducing it! We see this when we try to process the ~150 datasets that we bring back on our PC cluster: the disk I/O completely cripples the disk server machine (and everyone who's trying to use it at the same time!) unless we're careful to limit the number of simultaneous jobs. When we routinely start to use the Pilatus detector on the beamlines this is going to be even more of an issue. Basically we have plenty of processing power from the cluster: the disk I/O is the bottleneck. Now you could argue that we should spread the load over more disks or maybe spend more on faster disk controllers, but the whole point about disks is that they're cheap, we don't need the extra I/O bandwidth for anything else, and you shouldn't need to spend a fortune, particularly if there are ways of making the software more efficient, which after all will benefit everyone.
>>>
>>> Cheers
>>>
>>> -- Ian
>>
>> Harry
>> --
>> Dr Harry Powell, MRC Laboratory of Molecular Biology, MRC Centre, Hills Road, Cambridge, CB2 0QH
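On Ian's point about reading compressed images and uncompressing them in memory rather than back to a local disk: only the compressed bytes are ever read from disk and the inflation happens in RAM, so the read I/O shrinks by roughly the compression factor. A minimal sketch of that pattern in Python, with a placeholder file name and a hard-coded frame geometry standing in for a proper header parser:

    import gzip
    import numpy as np

    def read_gzipped_image(path, shape=(2527, 2463), dtype=np.int32):
        # Only the compressed bytes are read from disk; gunzip happens in RAM.
        # shape/dtype are placeholders (roughly a Pilatus 6M frame) - a real
        # reader would take them from the image header instead.
        with gzip.open(path, 'rb') as fh:
            raw = fh.read()
        return np.frombuffer(raw, dtype=dtype).reshape(shape)

    # pixels = read_gzipped_image('xtal1_0001.img.gz')  # hypothetical file name

Whether this pays off depends on how much of the 2-3x gzip saving survives the cost of decompressing, but on a cluster that is CPU-rich and I/O-bound, as described above, it is the I/O side that matters.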