On Friday 18 January 2008 04:03, JoSH Lehan wrote:
> Here's a one-liner patch, attached, to work around a nasty problem I
> had when using unzip.
>
> I would get these errors when using unzip on a zipfile that contained
> multiple files inside it:
>
> unzip: invalid zip magic 0x[random number]
>
> If the zipfile contained only one file, this would be the error, instead:
>
> unzip: short read
>
> Upon further investigation and using strace, I saw that it was doing a
> read() into a large buffer.
>
> The buffer is the size of an entire compressed file, which to me,
> seems rather wasteful of memory for large files, but that's something
> I can put on the back burner for now.
You are right, here it is:
static void unzip_extract(zip_header_t *zip_header, int src_fd, int dst_fd)
{
    if (zip_header->formatted.method == 0) {
        /* Method 0 - stored (not compressed) */
        off_t size = zip_header->formatted.ucmpsize;
        if (size)
            bb_copyfd_exact_size(src_fd, dst_fd, size);
    } else {
        /* Method 8 - inflate */
        inflate_unzip_result res;
        if (inflate_unzip(&res, zip_header->formatted.cmpsize, src_fd, dst_fd) < 0)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
            bb_error_msg_and_die("inflate error");
Then
inflate_unzip(inflate_unzip_result *res, unsigned bufsize, int in, int out)
{                                        ^^^^^^^^^^^^^^^^
    USE_DESKTOP(long long) int n;
    DECLARE_STATE;
    ALLOC_STATE;
    bytebuffer_max = bufsize + 4; <=============
and
static unsigned fill_bitbuffer(STATE_PARAM unsigned bitbuffer,
        unsigned *current, const unsigned required)
{
    while (*current < required) {
        if (bytebuffer_offset >= bytebuffer_size) {
            /* Leave the first 4 bytes empty so we can always
             * unwind the bitbuffer to the front of the bytebuffer */
            bytebuffer_size = safe_read(gunzip_src_fd,
                    &bytebuffer[4], bytebuffer_max - 4);
                                    ^^^^^^^^^^^^^^^^^^
            if ((int)bytebuffer_size < 1) {
                error_msg = "unexpected end of file";
                abort_unzip(PASS_STATE_ONLY);
            }
            bytebuffer_size += 4;
            bytebuffer_offset = 4;
        }
> If this read() returns less than a full buffer, then after the data is
> consumed, the read() is called again, *with the same size buffer as
> before, attempting to read the entire compressed file's size again*.
>
> So, any partial read() returns will cause the second read() to grab as
> many bytes as it can, up to and beyond the end of the file!
>
> So, when unzip tries to advance to the next file, it ends up seeing garbage.
>
> The PK12 header at the start of the next file will already have been
> consumed by the read() call, so what's left for unzip is garbage in
> the middle of the next file, or EOF, which explains the short read.
>
> This code is common to both unzip and gunzip. However, gunzip only
> works on a single file at a time, and only stops when it reaches EOF.
Not true. gunzip is far more clever than that.
It actually correctly gunzips CONCATENATED gzip files!
# gzip TODO_config_nommu
# cat TODO_config_nommu.gz TODO_config_nommu.gz >z.gz
# /usr/srcdevel/bbox/fix/busybox.t4/busybox gunzip z.gz
# /usr/srcdevel/bbox/fix/busybox.t4/busybox gunzip TODO_config_nommu.gz
# ls -l TODO_config_nommu z
-rw-r--r-- 1 root root 17719 Jan 27 22:55 TODO_config_nommu
-rw-r--r-- 1 root root 35438 Jan 27 22:55 z
# echo $((17719*2))
35438
> There's no more file content to read after that point. So, if gunzip
> does an over-read, there's no problem.
Hmm, interesting... is it also vulnerable to short reads (in the above
case of concatenated gz files)?
> This isn't the right fix, but it works. The gunzip decompression
> seems to be smarter about repeatedly reusing a small buffer.
Indeed it is - see how it uses read data for testing for next
gzip header in unpack_gz_stream().
> I looked
> in the state structure, and there's no field for "number of
> compressed bytes remaining in this file".
So add one. Set it to ((off_t)-1) for gzip and disregard it
if it is < 0; but if it is >= 0 (unzip - initially set to
zip_header->formatted.cmpsize), never read more bytes than needed.
Then you can also fix the problem of allocating a
huge buffer too! - just use a fixed-size one repeatedly.
Failing that, at least give a reproducible testcase -
how do you force short reads?
(no promises that I can work on it soon, though... sorry).
--
vda
_______________________________________________
busybox mailing list
[email protected]
http://busybox.net/cgi-bin/mailman/listinfo/busybox