[ quoting the full mail of lasse since it didnt make its way into the bb maillist yet ]

On 02/25/2013 12:05 PM, Lasse Collin wrote:
On 2013-02-25 Denys Vlasenko wrote:
On Sunday 24 February 2013 22:37, John Spencer wrote:
http://www.kernel.org/pub/linux/kernel/v3.0/linux-3.8.tar.xz

using busybox 1.20.2 and xz 5.0.3 or xz 5.0.4:

$ tar xf linux-3.8.tar.xz

i get: "short read" and exit status 1.
however the data seems to be there (at least partial).

the culprit is the file linux-3.8/drivers/media/tuners/mt2063.c

after doing xzcat linux-3.8.tar.xz>  linux-3.8.tar , that file is
truncated after 4096*2+512 bytes.

xzcat is from busybox (not from xz, as i assumed earlier)

the .tar file is truncated at this point as well, it is only 200
MB, but with xzcat from xz package, it is>  500 MB.

Apparently XZ embedded has a bug :(
Not only our in-tree one, but the latest git of it is buggy too:

$ git clone http://git.tukaani.org/xz-embedded.git
$ cd xz-embedded/userspace
$ make
$ ./xzminidec</tmp/linux-3.8.tar.xz | wc -c
./xzminidec: Unsupported check; not verifying file integrity
<......working for some time.......>
201330688

(xzminidec doesn't crash: exit code is zero).

The peculiar thing is that 201330688 is exactly 0x0c001000.

linux-3.8.tar.xz from kernel.org has three concatenated .xz streams.
You can see this with e.g. "xz -l" or "xz -lvv". At least pxz creates
such .xz files. Such files are valid and fine.

xzminidec is a limited example program. It doesn't support concatenated
streams. This is mentioned in the comment in the beginning of
xzminidec.c. One may argue if it is a bug or a feature, but at least
the limitation has been documented.

Busybox' xzcat lacks support for concatenated .xz streams. For
comparison, Busybox' zcat and bzcat do support concatenated streams.

     $ echo foo | gzip>  test.gz
     $ echo bar | gzip>>  test.gz
     $ busybox zcat test.gz
     foo
     bar

     $ echo foo | xz>  test.xz
     $ echo bar | xz>>  test.xz
     $ busybox xzcat test.xz
     foo

liblzma in XZ Utils has a flag to decode concatenated streams to make
it a bit easier to handle such files. I would prefer to not include
such a flag in XZ Embedded, since I think in most embedded situations
(boot loaders, kernels etc.) such a flag is useless. Busybox is an
exception to this.

Below is a patch to add support for concatenated .xz streams. It also
handles possible padding (sequence of zero-bytes) between the streams.
It probably has room for improvement, but it should be a useful starting
point.

diff --git a/archival/libarchive/decompress_unxz.c 
b/archival/libarchive/decompress_unxz.c
index 79b48a1..5ebbd28 100644
--- a/archival/libarchive/decompress_unxz.c
+++ b/archival/libarchive/decompress_unxz.c
@@ -86,8 +86,40 @@ unpack_xz_stream(transformer_aux_data_t *aux, int src_fd, 
int dst_fd)
                        IF_DESKTOP(total += iobuf.out_pos;)
                        iobuf.out_pos = 0;
                }
-               if (r == XZ_STREAM_END) {
-                       break;
+               while (r == XZ_STREAM_END) {
+                       /* Handle concatenated .xz Streams including possible
+                        * Stream Padding.
+                        */
+                       if (iobuf.in_pos == iobuf.in_size) {
+                               int rd = safe_read(src_fd, membuf, BUFSIZ);
+                               if (rd<  0) {
+                                       bb_error_msg(bb_msg_read_error);
+                                       total = -1;
+                                       goto out;
+                               }
+                               if (rd == 0)
+                                       goto out;
+
+                               iobuf.in_size = rd;
+                               iobuf.in_pos = 0;
+                       }
+
+                       /* Stream Padding must always be a multiple of four
+                        * bytes to preserve four-byte alignment. To keep the
+                        * code slightly smaller, we aren't as strict here as
+                        * the .xz spec requires. We just skip all zero-bytes
+                        * without checking the alignment and thus can accept
+                        * files that aren't valid, e.g. the XZ Utils test
+                        * files bad-0pad-empty.xz and bad-0catpad-empty.xz.
+                        */
+                       while (iobuf.in_pos<  iobuf.in_size) {
+                               if (membuf[iobuf.in_pos] != 0) {
+                                       xz_dec_reset(state);
+                                       r = XZ_OK;
+                                       break;
+                               }
+                               ++iobuf.in_pos;
+                       }
                }
                if (r != XZ_OK&&  r != XZ_UNSUPPORTED_CHECK) {
                        bb_error_msg("corrupted data");
@@ -95,6 +127,8 @@ unpack_xz_stream(transformer_aux_data_t *aux, int src_fd, 
int dst_fd)
                        break;
                }
        }
+
+out:
        xz_dec_end(state);
        free(membuf);


By the way, since Busybox' copy of XZ Embedded hasn't been updated
since unxz was added, this bug fix is missing from Busybox:

     
http://git.tukaani.org/?p=xz-embedded.git;a=commitdiff;h=4cec51e1be4797a4bd8b266a1d34cabd7fdb79fd

There is also the following bug fix but I think it doesn't affect
Busybox' unxz:

     
http://git.tukaani.org/?p=xz-embedded.git;a=commitdiff;h=9690fe69dc97eb2e7fe2804e4448a5278cde5411

Another known issue with Busybox' unxz is that most .xz files use CRC64
for integrity checking, but XZ Embedded supports only CRC32. So with
most .xz files, Busybox cannot verify the integrity check. This should
be fixed at XZ Embedded side first, of course.


thanks lasse, i integrated all 3 patches into my busybox setup and can confirm that xzcat/tar behave as expected now.

--JS
_______________________________________________
busybox mailing list
[email protected]
http://lists.busybox.net/mailman/listinfo/busybox

Reply via email to