On Friday 18 January 2008 04:03, JoSH Lehan wrote:
> Here's a one-liner patch, attached, to work around a nasty problem I
> had when using unzip.
>
> I would get these errors when using unzip on a zipfile that contained
> multiple files inside it:
>
> unzip: invalid zip magic 0x[random number]
>
> If the zipfile contained only one file, this would be the error, instead:
>
> unzip: short read
>
> Upon further investigation and using strace, I saw that it was doing a
> read() into a large buffer.
>
> The buffer is the size of an entire compressed file, which to me,
> seems rather wasteful of memory for large files, but that's something
> I can put on the back burner for now.
You are right, here it is:
static void unzip_extract(zip_header_t *zip_header, int src_fd, int dst_fd)
{
    if (zip_header->formatted.method == 0) {
        /* Method 0 - stored (not compressed) */
        off_t size = zip_header->formatted.ucmpsize;
        if (size)
            bb_copyfd_exact_size(src_fd, dst_fd, size);
    } else {
        /* Method 8 - inflate */
        inflate_unzip_result res;
        if (inflate_unzip(&res, zip_header->formatted.cmpsize, src_fd, dst_fd) < 0)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
            bb_error_msg_and_die("inflate error");
Then
inflate_unzip(inflate_unzip_result *res, unsigned bufsize, int in, int out)
{                                        ^^^^^^^^^^^^^^^^
    USE_DESKTOP(long long) int n;
    DECLARE_STATE;
    ALLOC_STATE;
    bytebuffer_max = bufsize + 4; <=============
and
static unsigned fill_bitbuffer(STATE_PARAM unsigned bitbuffer,
        unsigned *current, const unsigned required)
{
    while (*current < required) {
        if (bytebuffer_offset >= bytebuffer_size) {
            /* Leave the first 4 bytes empty so we can always
             * unwind the bitbuffer to the front of the bytebuffer */
            bytebuffer_size = safe_read(gunzip_src_fd,
                    &bytebuffer[4], bytebuffer_max - 4);
                                    ^^^^^^^^^^^^^^^^^^
            if ((int)bytebuffer_size < 1) {
                error_msg = "unexpected end of file";
                abort_unzip(PASS_STATE_ONLY);
            }
            bytebuffer_size += 4;
            bytebuffer_offset = 4;
        }
> If this read() returns less than a full buffer, then after the data is
> consumed, the read() is called again, *with the same size buffer as
> before, attempting to read the entire compressed file's size again*.
>
> So, any partial read() returns will cause the second read() to grab as
> many bytes as it can, up to and beyond the end of the file!
>
> So, when unzip tries to advance to the next file, it ends up seeing garbage.
>
> The PK12 header at the start of the next file will already have been
> consumed by the read() call, so what's left for unzip is garbage in
> the middle of the next file, or EOF, which explains the short read.
>
> This code is common to both unzip and gunzip. However, gunzip only
> works on a single file at a time, and only stops when it reaches EOF.
Not true. gunzip is far more clever than that.
It actually correctly gunzips CONCATENATED gzip files!
# gzip TODO_config_nommu
# cat TODO_config_nommu.gz TODO_config_nommu.gz >z.gz
# /usr/srcdevel/bbox/fix/busybox.t4/busybox gunzip z.gz
# /usr/srcdevel/bbox/fix/busybox.t4/busybox gunzip TODO_config_nommu.gz
# ls -l TODO_config_nommu z
-rw-r--r-- 1 root root 17719 Jan 27 22:55 TODO_config_nommu
-rw-r--r-- 1 root root 35438 Jan 27 22:55 z
# echo $((17719*2))
35438
> There's no more file content to read after that point. So, if gunzip
> does an over-read, there's no problem.
Hmm, interesting... is it also vulnerable to short reads (in the above
case of concatenated gz files)?
> This isn't the right fix, but it works. The gunzip decompression
> seems to be smarter about repeatedly reusing a small buffer.
Indeed it is - see how it uses read data for testing for next
gzip header in unpack_gz_stream().
> I looked
> in the state structure, and there's no field for "number of
> compressed bytes remaining in this file".
So add one. Set it to ((off_t)-1) for gzip and disregard it
if it is < 0; but if it is >= 0 (unzip - initially set to
zip_header->formatted.cmpsize), never read more bytes than needed.
Then you can also fix the problem of allocating a
huge buffer too! - just use a fixed-size one repeatedly.
Failing that, at least give a reproducible testcase -
how do you force short reads?
(no promises that I can work on it soon, though... sorry).
--
vda
_______________________________________________
busybox mailing list
[email protected]
http://busybox.net/cgi-bin/mailman/listinfo/busybox