Frank von Delft wrote:
Just looked at the algorithm, how it stores the average "non-spot"
through all the images.
What happens with a dataset where the "non-spot" (e.g. background)
changes systematically through the dataset, i.e. anisotropic datasets
or thin crystals lying flat in a thin loop? How much worse is
compression for that?
Cheers
phx
Well, what will happen in that case (with the current "algorithm") is
that once a background pixel deviates from the median level by more than
4 "sigmas", it will start to get stored losslessly. Essentially, such pixels
will be treated as "spots" and the overall compression ratio will start
to approach that of bzip2.
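
A rough sketch of that rule in Python, for illustration only. This is not the actual code: the function name split_spots, the per-pixel median taken across images, and the Poisson-style sigma (square root of the median level) are all assumptions made here.

```python
from math import sqrt
from statistics import median

def split_spots(images, n_sigma=4.0):
    """Separate "non-spot" background from pixels kept losslessly.

    images: list of images, each a flat list of pixel counts of equal length.
    Returns (median_image, spots), where spots lists (image_index,
    pixel_index, value) for every pixel stored exactly.
    """
    n_pix = len(images[0])
    # Assumption: the background level is the per-pixel median over all images.
    median_image = [median(img[p] for img in images) for p in range(n_pix)]
    # Assumption: counting noise, so sigma ~ sqrt(level), with a floor of 1.
    sigma = [sqrt(max(m, 1)) for m in median_image]
    spots = []
    for i, img in enumerate(images):
        for p, value in enumerate(img):
            # A pixel more than n_sigma from the median level is a "spot".
            if abs(value - median_image[p]) > n_sigma * sigma[p]:
                spots.append((i, p, value))
    return median_image, spots
```

Under these assumptions a background of 100 counts has a threshold of 4 x 10 = 40 counts, so images whose background has drifted to 150 counts would have every pixel flagged as a "spot", which is the loss of compression described above.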
A "workaround" for this is simply to store the data set in "chunks"
where the background level is similar, but I suppose a more intelligent
thing to do would be to simply "scale" each image to the median
background image, and store the scale factors (a list of 100 numbers for
a 100-image data set) along with the other ancillary data. I haven't
done that yet. Didn't want to spend too much time on this in case I
incited some kind of revolt.
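
One way the scaling idea could look, again only as a sketch of something that has not been implemented. The name scale_to_reference is hypothetical, and using each image's median pixel value as its background level is an assumption that only holds when most pixels are background.

```python
from statistics import median

def scale_to_reference(images):
    """Bring every image to a common background level.

    images: list of images, each a flat list of pixel counts.
    Returns (scales, scaled_images): one scale factor per image, and the
    images divided by their own factor.
    """
    # Assumption: the median pixel of an image estimates its background level.
    levels = [median(img) for img in images]
    # Reference level: the median of the per-image background levels.
    ref_level = median(levels)
    scales = []
    scaled_images = []
    for img, level in zip(images, levels):
        # Guard against an empty (all-zero) background or reference.
        s = level / ref_level if level and ref_level else 1.0
        scales.append(s)
        scaled_images.append([v / s for v in img])
    return scales, scaled_images
```

The scales list is the "list of 100 numbers for a 100-image data set": multiplying a decompressed image by its factor would restore the original background level.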
-James Holton
MAD Scientist
On 07/05/2010 06:07, James Holton wrote:
Ian Tickle wrote:
I found an old e-mail from James Holton where he suggested lossy
compression for diffraction images (as long as it didn't change the
F's significantly!) - I'm not sure whether anything came of that!
Well, yes, something did come of this.... But I don't think Gerard
Bricogne is going to like it.
Details are here:
http://bl831.als.lbl.gov/~jamesh/lossy_compression/
Short version is that I found a way to compress a test lysozyme
dataset by a factor of ~33 with no apparent ill effects on the data.
In fact, anomalous differences were completely unaffected, and Rfree
dropped from 0.287 for the original data to 0.275 when refined
against Fs from the compressed images. This is no doubt a fluke of
the excess noise added by compression, but I think it highlights how
the errors in crystallography are dominated by the inadequacies of
the electron density models we use, and not the quality of our data.
The page above lists two data sets: "A" and "B", and I am interested
to know if and how anyone can "tell" which one of these data sets was
compressed. The first image of each data set can be found here:
http://bl831.als.lbl.gov/~jamesh/lossy_compression/firstimage.tar.bz2
-James Holton
MAD Scientist