From: Mike Simons [mailto:[EMAIL PROTECTED]

> On Tue, Mar 25, 2003 at 06:05:31PM -0500, Mike Simons wrote:
> > On Tue, Mar 25, 2003 at 02:21:21PM -0000, Paul Marquess wrote:
> > > You need to be very careful with HTTP deflate encoding. Due to a
> > > common misreading of the HTTP spec, there are two distinct
> > > implementations of HTTP deflate out in the wild. The correct
> > > implementation uses RFC 1950. The incorrect implementation uses
> > > RFC 1951.
>
> So...
>
> rfc1950 documents the 'ZLIB' format,
>   which is a 2 or 6 byte header, and 4 byte trailer,
>   around a DEFLATE data stream.
>
> rfc1951 documents the 'DEFLATE' format (a compression algorithm)
>   which has no header or trailer
>
> rfc1952 documents the 'GZIP' format,
>   which is a 10 or more byte header (lots of flexiblilty), an 8 byte
>   trailer, around a DEFLATE data stream.
>
> rfc2616 documents the HTTP/1.1 protocol,
>   it identifies three content-encodings key words:
>   "gzip" == rfc1952, "compress" (no reference),
>   "deflate" which is ZLIB format (rfc1950 format).

You've got it. The rfc195[012] definition of deflate is completely
different from the rfc2616 definition. If the HTTP folk had used the
term "zlib" instead of "deflate", there may not have been the confusion.

> For the last day I've been under the impression that gzip format had
> length of uncompressed stream in the header. This is 100% not correct.
> The 8 byte trailer has the length. I had done a bad reading of the
> perl module, thought that the length was at the beginning of data:
> ===
>     my ($output, $status) = $x->inflate($string);
>     my ($crc, $len) = unpack ("VV", substr($$string, 0, 8));
> ===
>
> $string is really a reference which is modified by inflate...
>
> > > Sorry I lost you...
> [...]
> > So...
> > zlib != http, but zlib == rfc1950 == http
>
> Okay rfc1950 (rfc1951 with a header) is correct for http, it is label
> "deflate" by http protocol, even though it called "zlib" by everything
> else.
>
> rfc2616
> ===
>    deflate
>         The "zlib" format defined in RFC 1950 [31] in combination with
>         the "deflate" compression mechanism described in RFC 1951 [29].
> ===
>
> > > One of my
> > > goals is to make it easier to use in the HTTP modules (automatically
> > > figuring out which of the two deflate implementations is used when
> > > doing a uncompress is already on my list), so if there are any
> > > specific requests, now would be a good time to feed them back to me.
> >
> > The auto-detection is kinda nice, but what I think is much more
> > valuable is to allow a stream based (block by block) compress and
> > decompress mode for the data stream if possible. It should also be
> > possible for a user to *not* allow auto detection if they want to...
>
> Detecting "gzip" is trivial from the header. "zlib" is also possible
> to guess with with alot of certainty...

Yep, both have been designed to allow easy detection.
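
On the gzip side, the check comes down to the fixed start of the RFC 1952 header, and the length Mike was looking for sits in the last 8 bytes. A minimal sketch, assuming the whole gzip member is in memory; the names isGzipFormat and gzipTrailer are made up for illustration and are not part of Compress::Zlib:

```perl
use strict;
use warnings;

# Hypothetical helper: true if $data starts like an RFC 1952 gzip member.
sub isGzipFormat {
    my $data = shift;
    return 0 if length($data) < 10;      # fixed part of the header is 10 bytes
    my ($id1, $id2, $cm) = unpack("C3", $data);
    # Magic bytes 0x1f 0x8b, then compression method 8 (deflate).
    return ($id1 == 0x1f && $id2 == 0x8b && $cm == 8) ? 1 : 0;
}

# Hypothetical helper: return (CRC-32, uncompressed length) taken from the
# 8-byte trailer, or an empty list if the data cannot hold header + trailer.
sub gzipTrailer {
    my $data = shift;
    return if length($data) < 18;
    return unpack("VV", substr($data, -8));   # both fields are little-endian
}
```

Note that the trailer is read from the end of the data (offset -8), not from the beginning, and that the length field is the uncompressed size modulo 2**32.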

For example, here is a zlib detector:

    sub isZlibFormat($)
    {
        my $data = shift ;

        # Need both header bytes (CMF and FLG) before testing anything.
        return 0 if length($data) < 2 ;

        # RFC 1950: CMF*256 + FLG must be a multiple of 31.
        my $hdr = unpack("n", $data) ;
        return ($hdr % 31 == 0);
    }

> The raw deflate on the wire is just broken, but it would be possible to
> feed to the inflation library call, and as long as no errors happen
> over the length of the stream, it must have been okay.

I did some tests a while back with the zlib folk, and the worst case was
that the inflate failed after about 500 bytes. In the vast majority of
cases, it will detect it after about 15 bytes.

> As long as the caller can as for (or not for) this feature, it sounds
> like a great addition to the Zlib module.
>
> Later,
>   Mike Simons
>
> > The deflate code could be much like what's above, but people would
> > HAVE to supply a crc32 and length value to Init do use gzip type
> > compression... and when Done is invoked if the CRC doesn't match or
> > if the length isn't right you return an error code... ;)
>
> Ignore this part, this is from my wrong understanding.

Too late. I already responded :-)

Paul
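
The fallback discussed in this message (try the RFC 1950 framing, and only if that fails feed the same bytes to a raw RFC 1951 inflate) might be sketched as follows. This is an illustration built on the existing Compress::Zlib inflateInit interface, not the planned module feature; the function name inflate_http_deflate and its $allow_raw switch are assumptions, the latter reflecting the request that callers be able to turn auto-detection off. It assumes the whole body is already in memory.

```perl
use strict;
use warnings;
use Compress::Zlib;   # inflateInit, MAX_WBITS, Z_OK, Z_STREAM_END

# Hypothetical helper: inflate an HTTP "deflate" body. Returns the
# uncompressed data, or undef if no permitted framing decodes cleanly.
sub inflate_http_deflate {
    my ($body, $allow_raw) = @_;

    # Positive WindowBits expects the zlib header and Adler-32 trailer.
    my @attempts = (MAX_WBITS);
    # Negative WindowBits tells zlib to expect a raw deflate stream.
    push @attempts, -MAX_WBITS if $allow_raw;

    for my $wbits (@attempts) {
        my ($stream, $status) = inflateInit(-WindowBits => $wbits);
        next unless defined $stream && $status == Z_OK;

        my $input = $body;    # inflate() consumes its argument, so use a copy
        my ($out, $st) = $stream->inflate($input);

        # Accept only a stream that ran cleanly to its end marker.
        return $out if $st == Z_STREAM_END;
    }
    return;
}
```

A caller that wants strict RFC 1950 behaviour passes a false $allow_raw. A raw stream whose first two bytes happen to pass the zlib header check would fail only further into the data, which is the late-failure case described in this message.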