Hi Harry

Thanks for the info. Speed of compression is not an issue, I think, since
compression & backing up of the images are done asynchronously with data
collection, and currently the backing up easily keeps up, so I think
compressing straight to the backup disk would too. As you saw from my reply
to Tim, my compression factor of 10 was a bit optimistic: for images with
spots on them (!) it's more like 2 or 3 with gzip, as you say.
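For what it's worth, a quick way to check that factor yourself - a minimal
Python sketch, where 'image_001.img' is just a placeholder filename, not any
particular detector format:

    import gzip

    def gzip_factor(path):
        # Compress the raw image in memory and compare the sizes.
        with open(path, 'rb') as f:
            raw = f.read()
        return len(raw) / len(gzip.compress(raw))

    # Placeholder filename - point this at a real image.
    print(round(gzip_factor('image_001.img'), 2))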
I found an old e-mail from James Holton where he suggested lossy compression
for diffraction images (as long as it didn't change the F's significantly!) -
I'm not sure whether anything came of that!

Cheers

-- Ian

On Thu, May 6, 2010 at 2:04 PM, Harry Powell <[email protected]> wrote:
> Hi Ian
>
> I've looked briefly at implementing gunzip in Mosflm in the past, but never
> really pursued it. It could probably be done when I have some free time,
> but who knows when that will be? gzip'ing one of my standard test sets
> gives around a 40-50% reduction in size, bzip2 ~60-70%. The speed of doing
> the compression is important too, and compressing is considerably slower
> than uncompressing (since with uncompressing you know where you are going
> and have the instructions, whereas with compressing you have to find it all
> out as you proceed).
>
> There are several ways of writing compressed images that (I believe) all
> the major processing packages have implemented - for example, Jan Pieter
> Abrahams has one which has been used for Mar images for a long time, and
> CBF has more than one. There are very good reasons for all detectors to
> write their images using CBFs with some kind of compression (I think that
> all new MX detectors at Diamond, for example, are required to be able to).
>
> Pilatus images are written using a fast compressor and read (in Mosflm and
> XDS, anyway - I have no idea about d*Trek or HKL, but imagine they would do
> the job every bit as well) using a fast decompressor - so this goes some
> way towards dealing with that particular problem - the image files aren't
> as big as you'd expect from their physical size and 20-bit dynamic range
> (from the 6M they're roughly 6MB, rather than 6MB * 2.5). So that seems
> about as good as you'd get from bzip2 anyway.
>
> I'd be somewhat surprised to see a non-lossy fast algorithm that could give
> you 10-fold compression with normal MX type images - the "empty" space
> between Bragg maxima is full of detail ("noise", "diffuse scatter"). If you
> had a truly flat background you could get much better compression, of
> course.
>
> On 6 May 2010, at 11:24, Ian Tickle wrote:
>
>> All -
>>
>> No doubt this topic has come up before on the BB: I'd like to ask about
>> the current capabilities of the various integration programs (in practice
>> we use only MOSFLM & XDS) for reading compressed diffraction images from
>> synchrotrons. AFAICS XDS has limited support for reading compressed
>> images (TIFF format from the MARCCD detector and CCP4 compressed format
>> from the Oxford Diffraction CCD); MOSFLM doesn't seem to support reading
>> compressed images at all (I'm sure Harry will correct me if I'm wrong
>> about this!). I'm really thinking about gzipped files here: bzip2 no
>> doubt gives marginally smaller files but is very slow. Currently we bring
>> back uncompressed images, but it seems to me that this is not the most
>> efficient way of doing things - or is it just that my expectation that
>> it's more efficient to read compressed images and uncompress in memory is
>> not realised in practice? For example, the AstexViewer molecular viewer
>> software currently reads gzipped CCP4 maps directly and gunzips them in
>> memory; this improves the response time by a modest factor of ~1.5, but
>> this is because electron density maps are 'dense' from a compression
>> point of view; X-ray diffraction images tend to have much more 'empty
>> space' and the compression factor is usually considerably higher (as much
>> as 10-fold).
>>
>> On a recent trip we collected more data than we anticipated & the
>> uncompressed data no longer fitted on our USB disk (the data is backed up
>> to the USB disk as it's collected), so we would definitely have benefited
>> from compression! However, file size is *not* the issue: disk space is
>> cheap after all. My point is that compressed images surely require much
>> less disk I/O to read. In this respect, bringing back compressed images
>> and then uncompressing back to a local disk completely defeats the object
>> of compression - you actually more than double the I/O instead of
>> reducing it! We see this when we try to process the ~150 datasets that we
>> bring back on our PC cluster: the disk I/O completely cripples the disk
>> server machine (and everyone who's trying to use it at the same time!)
>> unless we're careful to limit the number of simultaneous jobs. When we
>> start to use the Pilatus detector routinely on the beamlines this is
>> going to be even more of an issue. Basically we have plenty of processing
>> power from the cluster: the disk I/O is the bottleneck. Now you could
>> argue that we should spread the load over more disks or maybe spend more
>> on faster disk controllers, but the whole point about disks is that
>> they're cheap; we don't need the extra I/O bandwidth for anything else,
>> and you shouldn't need to spend a fortune, particularly if there are ways
>> of making the software more efficient, which after all will benefit
>> everyone.
>>
>> Cheers
>>
>> -- Ian
>
> Harry
> --
> Dr Harry Powell, MRC Laboratory of Molecular Biology, MRC Centre, Hills
> Road, Cambridge, CB2 0QH
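P.S. To make the "uncompress in memory" point above concrete: reading a
gzipped image and inflating it in RAM never writes an uncompressed copy to
disk, so only the compressed bytes are ever read. A minimal Python sketch -
the filename and the bare 512x512 16-bit layout are placeholder assumptions,
since real image formats carry headers:

    import gzip
    import numpy as np

    # Decompress straight into memory; no temporary file is created,
    # so the disk only ever sees the compressed bytes.
    with gzip.open('image_001.img.gz', 'rb') as f:
        raw = f.read()

    # Placeholder layout: a headerless 512x512 array of 16-bit counts.
    pixels = np.frombuffer(raw, dtype=np.uint16).reshape(512, 512)
    print(pixels.max())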

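P.P.S. On Harry's point about the fast Pilatus/CBF compressors: they exploit
the fact that neighbouring pixels hold similar counts, so most pixel-to-pixel
differences are small numbers that fit in a single byte. A toy size estimate
in Python - this is only a sketch of the delta idea, not the actual CBF
byte-offset encoding:

    import numpy as np

    def delta_size_estimate(pixels):
        flat = pixels.astype(np.int64).ravel()
        deltas = np.diff(flat)
        small = np.abs(deltas) < 128  # delta fits in one signed byte
        # 4 bytes for the first pixel, 1 byte per small delta, and a
        # 1-byte escape plus the full 4-byte value per large delta.
        return 4 + int(small.sum()) + int((~small).sum()) * 5

    # A mostly flat Poisson background compresses roughly 4-fold here;
    # real images with spots and diffuse scatter do somewhat worse.
    img = np.random.poisson(10, (512, 512)).astype(np.int32)
    print(delta_size_estimate(img), 'bytes vs', img.nbytes, 'raw')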