libxml authors and/or maintainers,

I am preparing a new release of zlib which includes a rewrite of the gz* 
functions (to wit, that read and write gzip files).  In the process of testing, 
Mark Brown discovered that libxml encounters a segmentation fault with the 
current beta versions of zlib (1.2.3.5 and later).  Within libxml version 2.7.6 
in xmlIO.c we find this offending code:

#ifdef HAVE_ZLIB_H
        if ((xmlInputCallbackTable[i].opencallback == xmlGzfileOpen) &&
                (strcmp(URI, "-") != 0)) {
            if (((z_stream *)context)->avail_in > 4) {
                char *cptr, buff4[4];
                cptr = (char *) ((z_stream *)context)->next_in;
                if (gzread(context, buff4, 4) == 4) {
                    if (strncmp(buff4, cptr, 4) == 0)
                        ret->compressed = 0;
                    else
                        ret->compressed = 1;
                    gzrewind(context);
                }
            }
        }
#endif

This appears to be an attempt to determine if the input is a gzip file or not.  
It does so by finding and comparing bytes from an internal zlib buffer to the 
data returned by zlib, using "(z_stream *)context" to access the internal 
buffer, where "context" is zlib's otherwise opaque gzFile structure.

This code violates the zlib interface, penetrating the protected gzFile 
structure.  (Gee, this is starting to sound like a sexual assault.)  libxml 
assumes things about zlib that are not documented in the interface (zlib.h) and 
so cannot be relied upon and must not be used in code that professes to be 
dependable.  This includes the assumption that gzFile has the z_stream it uses 
in the structure as the first element, that it uses the z_stream next_in 
pointer for the input buffer, and that gzopen() or gzdopen() reads data from 
the input to determine whether it is a gzip file or not.  None of these 
assumptions are true in the new gz* code in the current beta of zlib, and so 
libxml crashes and burns.

It turns out that this offense is not necessary since the current production 
version of zlib, 1.2.3, available since July of 2005 and widely deployed, has 
the gzdirect() function for exactly this purpose.

Since both libxml and zlib are widely deployed, and since I expect that 
applications usually link to it and zlib dynamically, we have a problem.

The fix to libxml is to check the version of zlib, and to use the current code 
for ZLIB_VERNUM less than 0x1230, and to instead use gzdirect() for ZLIB_VERNUM 
greater than or equal to 0x1230.  However, older versions of libxml without 
this fix will fail with newer versions of zlib.  E.g. if zlib is updated and 
libxml is not.

An untested patch to illustrate the approach is provided below.

Mark Adler


--- xmlIO-2.7.6.c       2009-09-24 08:32:00.000000000 -0700
+++ xmlIO.c     2010-01-18 01:30:27.000000000 -0800
@@ -2518,6 +2518,9 @@
 #ifdef HAVE_ZLIB_H
        if ((xmlInputCallbackTable[i].opencallback == xmlGzfileOpen) &&
                (strcmp(URI, "-") != 0)) {
+#if defined(ZLIB_VERNUM) && ZLIB_VERNUM >= 0x1230
+        ret->compressed = !gzdirect(context);
+#else
            if (((z_stream *)context)->avail_in > 4) {
                char *cptr, buff4[4];
                cptr = (char *) ((z_stream *)context)->next_in;
@@ -2529,6 +2532,7 @@
                    gzrewind(context);
                }
            }
+#endif
        }
 #endif
     }

_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml

Reply via email to