Hello Alexandre,

Alexandre Oliva wrote:
>> Why? Lzip can compress more than xz with a bit of tuning via --options.

> Maybe it can, but when I compared the sizes of the files to decide which
> one to keep, .xz files were consistently (if slightly) smaller than .lz
> ones.

I guess you mainly mean tarballs, because lzip compresses patches and xdeltas better than xz, sometimes even when passing the --extreme option to xz. (Updating lzip to 1.16 gives even better results):

   98923 2014-09-29 03:49 linux-libre-3.17-rc7-gnu.xdelta.lz (1.16)
   99065 2014-09-29 03:49 linux-libre-3.17-rc7-gnu.xdelta.lz
   99096 2014-09-29 03:49 linux-libre-3.17-rc7-gnu.xdelta.xz

 7268517 2014-09-29 05:19 patch-3.16-gnu-3.17-rc7-gnu.lz (1.16)
 7284746 2014-09-29 05:19 patch-3.16-gnu-3.17-rc7-gnu.lz
 7272508 2014-09-29 05:19 patch-3.16-gnu-3.17-rc7-gnu.xz (-9e)
 7344044 2014-09-29 05:19 patch-3.16-gnu-3.17-rc7-gnu.xz (-9)

   81530 2014-09-29 05:40 patch-3.17-rc6-gnu-3.17-rc7-gnu.lz (1.16)
   81638 2014-09-29 05:40 patch-3.17-rc6-gnu-3.17-rc7-gnu.lz
   81724 2014-09-29 05:40 patch-3.17-rc6-gnu-3.17-rc7-gnu.xz (-9e)
   82104 2014-09-29 05:40 patch-3.17-rc6-gnu-3.17-rc7-gnu.xz (-9)
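
In case you want to reproduce the figures, plain maximum compression on both tools is enough; roughly (the exact byte counts will vary slightly with the compressor versions):

   lzip -9 -k FILE                # produces FILE.lz
   xz -9 -k FILE                  # produces FILE.xz with preset -9
   xz -9e -c FILE > FILE-9e.xz    # the -9e (--extreme) variant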


> Maybe I'm not using the best options to compress tarballs, vcdiffs and
> xdeltas with lzip.  Suggestions are certainly welcome.

Vcdiff is already a compressed format. I guess the best option is not to compress it again and just to distribute one plain .vcdiff file per release. You save about 66% in size and the (re)compression time.
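
For what it's worth, a rough sketch, assuming the deltas are produced with something like xdelta3 (which already emits VCDIFF, RFC 3284):

   xdelta3 -e -s OLD.tar NEW.tar NEW.vcdiff   # create the delta
   # then distribute NEW.vcdiff as is, without compressing it again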

About tarballs: when LZMA-utils was renamed to XZ-utils, its developers changed the name of the "algorithm" to LZMA2 and at the same time increased the dictionary size of option -9 from 32 MiB to 64 MiB, misleading users into thinking that the increase in compression ratio was due to the new "algorithm". (BTW, LZMA2 is not an algorithm, but a container format.)

As you can see near the end of the lzip benchmark[1], passing lzip arguments equivalent to those of "xz -9" (or passing xz arguments equivalent to those of "lzip -9") will usually make lzip compress more than xz. But I do not recommend doing that, because using plain "-9" on both compressors, lzip usually compresses large files about as much as xz while using half the RAM to compress and requiring half the RAM to decompress.
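
To make the "equivalent arguments" concrete: the main difference between "xz -9" and "lzip -9" is the dictionary size (64 MiB versus 32 MiB), so the comparison amounts to something like this (a sketch; check the exact size syntax accepted by your versions):

   lzip -9 -s 64MiB FILE                   # lzip with a 64 MiB dictionary, like "xz -9"
   xz --lzma2=preset=9,dict=32MiB FILE     # xz with a 32 MiB dictionary, like "lzip -9"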

In the case of small files, the difference in memory required to decompress is even larger. Valgrind's massif tool finds that lzip uses 443,384 bytes to decompress 'patch-3.17-rc6-gnu-3.17-rc7-gnu', while xz uses 67,154,552 bytes.
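
Roughly, the measurement can be reproduced with something like this (massif writes one 'massif.out.<pid>' file per run; the peak heap size is shown at the top of the ms_print report):

   valgrind --tool=massif lzip -dc patch-3.17-rc6-gnu-3.17-rc7-gnu.lz > /dev/null
   valgrind --tool=massif xz -dc patch-3.17-rc6-gnu-3.17-rc7-gnu.xz > /dev/null
   ms_print massif.out.<pid>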

In the lzip benchmark you can also see that each and every one of the 43 xz tarballs being distributed on ftp.gnu.org was better compressed by lzip.

[1] http://www.nongnu.org/lzip/lzip_benchmark.txt


>> Lzip was designed for long-term archiving, having a
>> tool to recover corrupt files.

> I very much doubt it could recover corrupt files to the point that the
> original signature would match, because that would require a lot of
> redundancy to be added, which is the opposite of what a compressor is
> supposed to do.  And if the original signature doesn't match, I wouldn't
> trust the result, especially given that we have alternate paths to
> obtain the tarballs.

Lziprecover is so awesome that people can't believe it. :-) Most think it is just like bzip2recover.

Lziprecover can perfectly repair most files with a single-byte error in them, without the need of any extra redundancy at all. The repaired file will be identical bit for bit to the original.

Just get one linux-libre tarball and modify the value of a byte (near the beginning for a quick test). For example, I modified the byte at offset 1000 in 'linux-libre-3.12.5-gnu.tar.lz' and lziprecover repaired it in 12 seconds:

e871ba7561ed4833e9349f40d2975f53  linux-libre-3.12.5-gnu.tar.lz
e871ba7561ed4833e9349f40d2975f53  linux-libre-3.12.5-gnu1k.tar_fixed.lz
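
If you want to run the test yourself, the steps are roughly these (done on a copy so that the original tarball is not touched; see 'lziprecover --help' for the exact option names in your version):

   cp linux-libre-3.12.5-gnu.tar.lz linux-libre-3.12.5-gnu1k.tar.lz
   # overwrite the byte at offset 1000 with a value different from the original one
   printf '\x00' | dd of=linux-libre-3.12.5-gnu1k.tar.lz bs=1 seek=1000 count=1 conv=notrunc
   lziprecover -v --repair linux-libre-3.12.5-gnu1k.tar.lz    # writes ..._fixed.lz
   md5sum linux-libre-3.12.5-gnu.tar.lz linux-libre-3.12.5-gnu1k.tar_fixed.lz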


One byte may seem small, but most file corruptions not produced by I/O errors affect just one byte, or even one bit, of the file. Also, unlike magnetic media, where errors usually affect a whole sector, solid-state devices tend to produce single-byte errors, making lzip the perfect format for data stored on such devices.

Even if the repair capability of lziprecover is not needed for linux-libre files, it may save the irreplaceable data of many users, which they would lose if they used bzip2 or xz.

As the author of GNU ddrescue, I know about the tragedy of losing data and how to increase the probability of recovering it. If I have spent 6 years developing a whole family of tools around a compression format, you can be sure that it is the best for users. If it weren't, I would just have continued developing my projects and using the best format for my tarballs. Data compression should not be seen as a popularity contest, but as a service to humankind.

Be the change you wish to see in the world. Drop xz tarballs altogether. ;-)


Best regards,
Antonio.

