Hello Alexandre,

Alexandre Oliva wrote:
>> Why? Lzip can compress more than xz with a bit of tuning via --options.

> Maybe it can, but when I compared the sizes of the files to decide which
> one to keep, .xz files were consistently (if slightly) smaller than .lz
> ones.

I guess you mainly mean tarballs, because lzip compresses patches and xdeltas better than xz, sometimes even when passing the --extreme option to xz. (Updating lzip to 1.16 gives even better results):

   98923 2014-09-29 03:49 linux-libre-3.17-rc7-gnu.xdelta.lz (1.16)
   99065 2014-09-29 03:49 linux-libre-3.17-rc7-gnu.xdelta.lz
   99096 2014-09-29 03:49 linux-libre-3.17-rc7-gnu.xdelta.xz

 7268517 2014-09-29 05:19 patch-3.16-gnu-3.17-rc7-gnu.lz (1.16)
 7284746 2014-09-29 05:19 patch-3.16-gnu-3.17-rc7-gnu.lz
 7272508 2014-09-29 05:19 patch-3.16-gnu-3.17-rc7-gnu.xz (-9e)
 7344044 2014-09-29 05:19 patch-3.16-gnu-3.17-rc7-gnu.xz (-9)

   81530 2014-09-29 05:40 patch-3.17-rc6-gnu-3.17-rc7-gnu.lz (1.16)
   81638 2014-09-29 05:40 patch-3.17-rc6-gnu-3.17-rc7-gnu.lz
   81724 2014-09-29 05:40 patch-3.17-rc6-gnu-3.17-rc7-gnu.xz (-9e)
   82104 2014-09-29 05:40 patch-3.17-rc6-gnu-3.17-rc7-gnu.xz (-9)
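
In case you want to reproduce the figures, plain maximum compression on both tools is enough; roughly (the exact byte counts will vary slightly with the compressor versions):

   lzip -9 -k FILE                # produces FILE.lz
   xz -9 -k FILE                  # produces FILE.xz with preset -9
   xz -9e -c FILE > FILE-9e.xz    # the -9e (--extreme) variant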


> Maybe I'm not using the best options to compress tarballs, vcdiffs and
> xdeltas with lzip.  Suggestions are certainly welcome.

Vcdiff is already a compressed format. I guess the best option is not to compress it again and just to distribute one plain .vcdiff file per release. You save about 66% in size and the (re)compression time.
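
For what it's worth, a rough sketch, assuming the deltas are produced with something like xdelta3 (which already emits VCDIFF, RFC 3284):

   xdelta3 -e -s OLD.tar NEW.tar NEW.vcdiff   # create the delta
   # then distribute NEW.vcdiff as is, without compressing it again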

About tarballs: when LZMA-utils was renamed to XZ-utils, its developers changed the name of the "algorithm" to LZMA2 and at the same time increased the dictionary size of option -9 from 32 MiB to 64 MiB, misleading users into thinking that the increase in compression ratio was due to the new "algorithm". (BTW, LZMA2 is not an algorithm, but a container format.)

As you can see near the end of the lzip benchmark[1], passing lzip arguments equivalent to those of "xz -9" (or passing xz arguments equivalent to those of "lzip -9") will usually make lzip compress more than xz. But I do not recommend doing that, because using plain "-9" on both compressors, lzip usually compresses large files about as much as xz while using half the RAM to compress and requiring half the RAM to decompress.
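
To make the "equivalent arguments" concrete: the main difference between "xz -9" and "lzip -9" is the dictionary size (64 MiB versus 32 MiB), so the comparison amounts to something like this (a sketch; check the exact size syntax accepted by your versions):

   lzip -9 -s 64MiB FILE                   # lzip with a 64 MiB dictionary, like "xz -9"
   xz --lzma2=preset=9,dict=32MiB FILE     # xz with a 32 MiB dictionary, like "lzip -9"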

In the case of small files, the difference in memory required to decompress is even larger. Valgrind's massif tool finds that lzip uses 443,384 bytes to decompress 'patch-3.17-rc6-gnu-3.17-rc7-gnu', while xz uses 67,154,552 bytes.
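
Roughly, the measurement can be reproduced with something like this (massif writes one 'massif.out.<pid>' file per run; the peak heap size is shown at the top of the ms_print report):

   valgrind --tool=massif lzip -dc patch-3.17-rc6-gnu-3.17-rc7-gnu.lz > /dev/null
   valgrind --tool=massif xz -dc patch-3.17-rc6-gnu-3.17-rc7-gnu.xz > /dev/null
   ms_print massif.out.<pid>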

In the lzip benchmark you can also see that each and every one of the 43 xz tarballs being distributed on ftp.gnu.org was better compressed by lzip.

[1] http://www.nongnu.org/lzip/lzip_benchmark.txt


>> Lzip was designed for long-term archiving, having a
>> tool to recover corrupt files.

> I very much doubt it could recover corrupt files to the point that the
> original signature would match, because that would require a lot of
> redundancy to be added, which is the opposite of what a compressor is
> supposed to do.  And if the original signature doesn't match, I wouldn't
> trust the result, especially given that we have alternate paths to
> obtain the tarballs.

Lziprecover is so awesome that people can't believe it. :-) Most think it is just like bzip2recover.

Lziprecover can perfectly repair most files with a single-byte error in them, without the need of any extra redundancy at all. The repaired file will be identical bit for bit to the original.

Just get one linux-libre tarball and modify the value of a byte (near the beginning for a quick test). For example, I modified the byte at offset 1000 in 'linux-libre-3.12.5-gnu.tar.lz' and lziprecover repaired it in 12 seconds:

e871ba7561ed4833e9349f40d2975f53  linux-libre-3.12.5-gnu.tar.lz
e871ba7561ed4833e9349f40d2975f53  linux-libre-3.12.5-gnu1k.tar_fixed.lz
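
If you want to run the test yourself, the steps are roughly these (done on a copy so that the original tarball is not touched; see 'lziprecover --help' for the exact option names in your version):

   cp linux-libre-3.12.5-gnu.tar.lz linux-libre-3.12.5-gnu1k.tar.lz
   # overwrite the byte at offset 1000 with a value different from the original one
   printf '\x00' | dd of=linux-libre-3.12.5-gnu1k.tar.lz bs=1 seek=1000 count=1 conv=notrunc
   lziprecover -v --repair linux-libre-3.12.5-gnu1k.tar.lz    # writes ..._fixed.lz
   md5sum linux-libre-3.12.5-gnu.tar.lz linux-libre-3.12.5-gnu1k.tar_fixed.lz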


One byte may seem small, but most file corruptions not produced by I/O errors affect just one byte, or even one bit, of the file. Also, unlike magnetic media, where errors usually affect a whole sector, solid-state devices tend to produce single-byte errors, making lzip the perfect format for data stored on such devices.

Even if the repair capability of lziprecover is not needed for linux-libre files, it may save the irreplaceable data of many users, which they would lose if they used bzip2 or xz.

As the author of GNU ddrescue, I know about the tragedy of losing data and how to increase the probability of recovering it. If I have spent 6 years developing a whole family of tools around a compression format, you can be sure that it is the best for users. If it weren't, I would just have continued developing my projects and using the best format for my tarballs. Data compression should not be seen as a popularity contest, but as a service to humankind.

Be the change you wish to see in the world. Drop xz tarballs altogether. ;-)


Best regards,
Antonio.

