Compression methods such as gzip are unlikely to be optimal for diffraction images, and AFAIK the methods in CBF are better (I think Jim Pflugrath ran some timing races a long time ago, and I guess others have too). There is no reason for data acquisition software ever to write uncompressed images (let alone having 57 different ways of doing it).

Phil
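The byte-offset scheme used for CBF/Pilatus images exploits the fact that neighbouring pixels in a diffraction image usually differ by only a few counts, so the pixel-to-pixel delta fits in a single byte almost everywhere and only the strong spots need wider escapes. A minimal sketch of the idea in Python follows - illustrative only, not the CBFlib implementation, and the exact escape markers and widths here are assumptions:

    import struct
    import numpy as np

    def byte_offset_encode(pixels):
        # Sketch of byte-offset encoding: store the difference between
        # successive pixel values, using 1 byte where the delta is small
        # and escaping to 2 or 4 bytes only where it is not.
        out = bytearray()
        previous = 0
        for value in np.asarray(pixels, dtype=np.int64).ravel():
            delta = int(value - previous)
            if -127 <= delta <= 127:
                out += struct.pack('<b', delta)        # common case: 1 byte
            elif -32767 <= delta <= 32767:
                out += struct.pack('<b', -128)         # escape to 16 bits
                out += struct.pack('<h', delta)
            else:
                out += struct.pack('<b', -128)         # escape to 32 bits
                out += struct.pack('<h', -32768)
                out += struct.pack('<i', delta)
            previous = value
        return bytes(out)

    # A mostly quiet background with a few strong Bragg peaks compresses well:
    image = np.random.poisson(10, size=(100, 100)).astype(np.int32)
    image[40:43, 60:63] += 50000                       # a "spot"
    print(len(byte_offset_encode(image)), "bytes from", image.nbytes)

Because the encoder only looks at one delta at a time it is fast in both directions, which is where this kind of scheme wins over gzip for detector data.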
On 6 May 2010, at 13:38, Ian Tickle wrote:

> Hi Harry
>
> Thanks for the info. Speed of compression is not an issue I think, since compression & backing up of the images are done asynchronously with data collection, and currently backing up easily keeps up, so I think compression straight to the backup disk would too. As you saw from my reply to Tim, my compression factor of 10 was a bit optimistic; for images with spots on them (!) it's more like 2 or 3 with gzip, as you say.
>
> I found an old e-mail from James Holton where he suggested lossy compression for diffraction images (as long as it didn't change the F's significantly!) - I'm not sure whether anything came of that!
>
> Cheers
>
> -- Ian
>
> On Thu, May 6, 2010 at 2:04 PM, Harry Powell <ha...@mrc-lmb.cam.ac.uk> wrote:
>> Hi Ian
>>
>> I've looked briefly at implementing gunzip in Mosflm in the past, but never really pursued it. It could probably be done when I have some free time, but who knows when that will be? gzip'ing one of my standard test sets gives around a 40-50% reduction in size, bzip2 ~60-70%. The speed of doing the compression is important too, and compressing is considerably slower than uncompressing (since with uncompressing you know where you are going and have the instructions, whereas with compressing you have to find it all out as you proceed).
>>
>> There are several ways of writing compressed images that (I believe) all the major processing packages have implemented - for example, Jan Pieter Abrahams has one which has been used for Mar images for a long time, and CBF has more than one. There are very good reasons for all detectors to write their images using CBFs with some kind of compression (I think that all new MX detectors at Diamond, for example, are required to be able to).
>>
>> Pilatus images are written using a fast compressor and read using a fast decompressor (in Mosflm and XDS, anyway - I have no idea about d*Trek or HKL, but imagine they would do the job every bit as well) - so this goes some way towards dealing with that particular problem - the image files aren't as big as you'd expect from their physical size and 20-bit dynamic range (from the 6M they're roughly 6MB, rather than 6MB * 2.5). So that seems about as good as you'd get from bzip2 anyway.
>>
>> I'd be somewhat surprised to see a non-lossy fast algorithm that could give you 10-fold compression with normal MX-type images - the "empty" space between Bragg maxima is full of detail ("noise", "diffuse scatter"). If you had a truly flat background you could get much better compression, of course.
>>
>> On 6 May 2010, at 11:24, Ian Tickle wrote:
>>
>>> All -
>>>
>>> No doubt this topic has come up before on the BB: I'd like to ask about the current capabilities of the various integration programs (in practice we use only MOSFLM & XDS) for reading compressed diffraction images from synchrotrons. AFAICS XDS has limited support for reading compressed images (TIFF format from the MARCCD detector and CCP4 compressed format from the Oxford Diffraction CCD); MOSFLM doesn't seem to support reading compressed images at all (I'm sure Harry will correct me if I'm wrong about this!). I'm really thinking about gzipped files here: bzip2 no doubt gives marginally smaller files but is very slow.
>>> Currently we bring back uncompressed images, but it seems to me that this is not the most efficient way of doing things - or is it just that my expectation that it's more efficient to read compressed images and uncompress them in memory is not realised in practice? For example the AstexViewer molecular viewer software currently reads gzipped CCP4 maps directly and gunzips them in memory; this improves the response time by a modest factor of ~1.5, but this is because electron density maps are 'dense' from a compression point of view; X-ray diffraction images tend to have much more 'empty space' and the compression factor is usually considerably higher (as much as 10-fold).
>>>
>>> On a recent trip we collected more data than we anticipated & the uncompressed data no longer fitted on our USB disk (the data is backed up to the USB disk as it's collected), so we would definitely have benefited from compression! However file size is *not* the issue: disk space is cheap after all. My point is that compressed images surely require much less disk I/O to read. In this respect bringing back compressed images and then uncompressing back to a local disk completely defeats the object of compression - you actually more than double the I/O instead of reducing it! We see this when we try to process the ~150 datasets that we bring back on our PC cluster: the disk I/O completely cripples the disk server machine (and everyone who's trying to use it at the same time!) unless we're careful to limit the number of simultaneous jobs. When we routinely start to use the Pilatus detector on the beamlines this is going to be even more of an issue. Basically we have plenty of processing power from the cluster: the disk I/O is the bottleneck. Now you could argue that we should spread the load over more disks or maybe spend more on faster disk controllers, but the whole point about disks is that they're cheap, we don't need the extra I/O bandwidth for anything else, and you shouldn't need to spend a fortune, particularly if there are ways of making the software more efficient, which after all will benefit everyone.
>>>
>>> Cheers
>>>
>>> -- Ian
>>
>> Harry
>> --
>> Dr Harry Powell, MRC Laboratory of Molecular Biology, MRC Centre, Hills Road, Cambridge, CB2 0QH
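On Ian's point about reading compressed images and uncompressing them in memory rather than back to a local disk: only the compressed bytes are ever read from disk and the inflation happens in RAM, so the read I/O shrinks by roughly the compression factor. A minimal sketch of that pattern in Python, with a placeholder file name and a hard-coded frame geometry standing in for a proper header parser:

    import gzip
    import numpy as np

    def read_gzipped_image(path, shape=(2527, 2463), dtype=np.int32):
        # Only the compressed bytes are read from disk; gunzip happens in RAM.
        # shape/dtype are placeholders (roughly a Pilatus 6M frame) - a real
        # reader would take them from the image header instead.
        with gzip.open(path, 'rb') as fh:
            raw = fh.read()
        return np.frombuffer(raw, dtype=dtype).reshape(shape)

    # pixels = read_gzipped_image('xtal1_0001.img.gz')  # hypothetical file name

Whether this pays off depends on how much of the 2-3x gzip saving survives the cost of decompressing, but on a cluster that is CPU-rich and I/O-bound, as described above, it is the I/O side that matters.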