Hi Harry

Thanks for the info.  Speed of compression is not an issue, I think,
since compression & backing up of the images are done asynchronously
with data collection, and currently backing up easily keeps up, so I
think compression straight to the backup disk would too.  As you saw
from my reply to Tim, my compression factor of 10 was a bit optimistic;
for images with spots on them (!) it's more like 2 or 3 with gzip, as
you say.
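
For a feel of where that factor of 2-3 comes from, here is a minimal sketch: a synthetic 16-bit "image" with a noisy low-count background plus a few intense spots, compressed with zlib (which uses the same DEFLATE algorithm as gzip). The image dimensions, count levels, and number of spots are illustrative assumptions, not real detector data.

```python
import random
import struct
import zlib

random.seed(0)

# Illustrative only: a synthetic 256x256 16-bit "detector image" with a
# noisy low-count background plus a handful of intense "spots".
pixels = [random.randint(5, 15) for _ in range(256 * 256)]
for _ in range(50):
    pixels[random.randrange(len(pixels))] = random.randint(1000, 60000)

raw = struct.pack("<%dH" % len(pixels), *pixels)
comp = zlib.compress(raw, 6)          # zlib is DEFLATE, same as gzip

print("ratio %.1f" % (len(raw) / len(comp)))
```

The noise in the low bytes is what keeps the ratio down to a few-fold rather than the ten-fold a spot-free image might suggest.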

I found an old e-mail from James Holton where he suggested lossy
compression for diffraction images (as long as it didn't change the
F's significantly!) - I'm not sure whether anything came of that!

Cheers

-- Ian

On Thu, May 6, 2010 at 2:04 PM, Harry Powell <[email protected]> wrote:
> Hi Ian
>
> I've looked briefly at implementing gunzip in Mosflm in the past, but never 
> really pursued it. It could probably be done when I have some free time, but 
> who knows when that will be? gzip'ing one of my standard test sets gives 
> around a 40-50% reduction in size, bzip2 ~60-70%. The speed of doing the 
> compression is important too, and compressing is considerably slower than 
> uncompressing (since with uncompressing you know where you are going and 
> have the instructions, whereas with compressing you have to find it all out 
> as you proceed).
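
The asymmetry described here is easy to see directly with zlib (an illustrative sketch; the data mix and compression level are arbitrary choices, with level 9 maximising the match-searching effort on the compressing side):

```python
import os
import time
import zlib

# Illustrative: 2 MB of half-random, half-zero data.
data = os.urandom(1 << 20) + bytes(1 << 20)

t0 = time.perf_counter()
comp = zlib.compress(data, 9)      # compressor must search for matches
t1 = time.perf_counter()
back = zlib.decompress(comp)       # decompressor just follows instructions
t2 = time.perf_counter()

assert back == data
print("compress %.3fs  decompress %.3fs" % (t1 - t0, t2 - t1))
```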
>
> There are several ways of writing compressed images that (I believe) all the 
> major processing packages have implemented - for example, Jan Pieter Abrahams 
> has one which has been used for Mar images for a long time, and CBF has more 
> than one. There are very good reasons for all detectors to write their images 
> using CBFs with some kind of compression (I think that all new MX detectors 
> at Diamond, for example, are required to be able to).
>
> Pilatus images are written using a fast compressor and read (in Mosflm and 
> XDS, anyway - I have no idea about d*Trek or HKL, but imagine they would do 
> the job every bit as well) using a fast decompressor - so this goes some way 
> towards dealing with that particular problem - the image files aren't as big 
> as you'd expect from their physical size and 20-bit dynamic range (from the 
> 6M they're roughly 6MB, rather than 6 million pixels * 2.5 bytes = ~15MB). 
> So that seems about as good as you'd get from bzip2 anyway.
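
The fast compression used for such images relies on neighbouring pixels differing by small amounts. Below is a much-simplified sketch of that byte-offset idea: it is illustrative only, not the actual CBF BYTE_OFFSET specification (whose escape encoding differs).

```python
import struct

def pack_offsets(pixels):
    """Delta-encode pixels: most deltas fit in one signed byte;
    larger jumps (e.g. onto a Bragg spot) escape to 4 bytes.
    Simplified sketch, not the real CBF BYTE_OFFSET format."""
    out, prev = bytearray(), 0
    for p in pixels:
        d = p - prev
        if -127 <= d <= 127:
            out.append(d & 0xFF)
        else:
            out.append(0x80)                 # escape marker
            out += struct.pack("<i", d)
        prev = p
    return bytes(out)

def unpack_offsets(data):
    pixels, prev, i = [], 0, 0
    while i < len(data):
        b = data[i]; i += 1
        if b == 0x80:
            d = struct.unpack_from("<i", data, i)[0]; i += 4
        else:
            d = b - 256 if b > 127 else b    # sign-extend the byte
        prev += d
        pixels.append(prev)
    return pixels

img = [10, 12, 11, 13, 5000, 5002, 14, 12]   # smooth background + a spot
enc = pack_offsets(img)
assert unpack_offsets(enc) == img
print(len(enc), "bytes for", len(img), "pixels")
```

Since both sides only do additions and table-free byte tests, compression and decompression run at comparable, high speed, unlike gzip.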
>
> I'd be somewhat surprised to see a non-lossy fast algorithm that could give 
> you 10-fold compression with normal MX type images - the "empty" space 
> between Bragg maxima is full of detail ("noise", "diffuse scatter"). If you 
> had a truly flat background you could get much better compression, of course.
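
That background point is simple to demonstrate: a truly flat buffer collapses almost to nothing, while a noisy one is essentially incompressible (a zlib sketch with arbitrary illustrative sizes):

```python
import os
import zlib

flat = bytes(1 << 16)           # a truly flat background: all zeros
noisy = os.urandom(1 << 16)     # stand-in for noise / diffuse scatter

for name, buf in (("flat", flat), ("noisy", noisy)):
    c = zlib.compress(buf, 6)
    print("%-5s %6d -> %6d bytes" % (name, len(buf), len(c)))
```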
>
> On 6 May 2010, at 11:24, Ian Tickle wrote:
>
>> All -
>>
>> No doubt this topic has come up before on the BB: I'd like to ask
>> about the current capabilities of the various integration programs (in
>> practice we use only MOSFLM & XDS) for reading compressed diffraction
>> images from synchrotrons.  AFAICS XDS has limited support for reading
>> compressed images (TIFF format from the MARCCD detector and CCP4
>> compressed format from the Oxford Diffraction CCD); MOSFLM doesn't
>> seem to support reading compressed images at all (I'm sure Harry will
>> correct me if I'm wrong about this!).  I'm really thinking about
>> gzipped files here: bzip2 no doubt gives marginally smaller files but
>> is very slow.  Currently we bring back uncompressed images but it
>> seems to me that this is not the most efficient way of doing things -
>> or is it just that my expectation that it's more efficient to read
>> compressed images and uncompress them in memory is not realised in
>> practice?
>> For example the AstexViewer molecular viewer software currently reads
>> gzipped CCP4 maps directly and gunzips them in memory; this improves
>> the response time by a modest factor of ~ 1.5, but this is because
>> electron density maps are 'dense' from a compression point of view;
>> X-ray diffraction images tend to have much more 'empty space' and the
>> compression factor is usually considerably higher (as much as
>> 10-fold).
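
The read-gzipped-then-inflate-in-RAM pattern can be sketched with Python's gzip module; here an io.BytesIO stands in for the disk file, and the "image" contents and sizes are hypothetical:

```python
import gzip
import io
import struct

# A stand-in "image"; io.BytesIO plays the role of the on-disk file.
raw = struct.pack("<1024H", *range(1024))

disk = io.BytesIO()
with gzip.GzipFile(fileobj=disk, mode="wb") as gz:
    gz.write(raw)

# Reading side: one small read from "disk", inflate entirely in RAM.
disk.seek(0)
with gzip.GzipFile(fileobj=disk, mode="rb") as gz:
    image = gz.read()

assert image == raw
print("on-disk bytes: %d  in-memory bytes: %d"
      % (disk.getbuffer().nbytes, len(raw)))
```

The disk only ever sees the small compressed file; the full-size pixel array exists solely in memory, which is where the I/O saving comes from.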
>>
>> On a recent trip we collected more data than we anticipated & the
>> uncompressed data no longer fitted on our USB disk (the data is backed
>> up to the USB disk as it's collected), so we would have definitely
>> benefited from compression!  However file size is *not* the issue:
>> disk space is cheap after all.  My point is that compressed images
>> surely require much less disk I/O to read.  In this respect bringing
>> back compressed images and then uncompressing back to a local disk
>> completely defeats the object of compression - you actually more than
>> double the I/O instead of reducing it!  We see this when we try to
>> process the ~150 datasets that we bring back on our PC cluster and the
>> disk I/O completely cripples the disk server machine (and everyone
>> who's trying to use it at the same time!) unless we're careful to
>> limit the number of simultaneous jobs.  When we routinely start to use
>> the Pilatus detector on the beamlines this is going to be even more of
>> an issue.  Basically we have plenty of processing power from the
>> cluster: the disk I/O is the bottleneck.  Now you could argue that we
>> should spread the load over more disks or maybe spend more on faster
>> disk controllers, but the whole point about disks is they're cheap, we
>> don't need the extra I/O bandwidth for anything else, and you
>> shouldn't need to spend a fortune, particularly if there are ways of
>> making the software more efficient, which after all will benefit
>> everyone.
>>
>> Cheers
>>
>> -- Ian
>
> Harry
> --
> Dr Harry Powell, MRC Laboratory of Molecular Biology, MRC Centre, Hills Road, 
> Cambridge, CB2 0QH
>
