Hi folks,

You need to be very careful with HTTP deflate encoding. Due to a common misreading of the HTTP spec, there are two distinct implementations of HTTP deflate out in the wild. The correct implementation uses RFC 1950 (the zlib format, i.e. a deflate stream wrapped in a zlib header and a checksum trailer). The incorrect implementation uses RFC 1951 (the bare deflate stream, with no wrapper). The reason for this seems to be the overloaded use of the term "deflate". Both zlib and HTTP use the term, but unfortunately they don't mean the same thing. The incorrect implementations use the zlib definition of deflate.
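
To make the difference concrete, here is a small sketch in Python (not Perl, and not Compress::Zlib code). It builds the same payload in both flavours and shows one way a receiver could accept either. The names `looks_like_zlib` and `decode_http_deflate` are made up for illustration; only the calls into Python's standard `zlib` module are real.

```python
import zlib

payload = b"Hello, HTTP deflate! " * 20

# RFC 1950: zlib format = 2-byte header + deflate data + Adler-32 trailer.
zlib_stream = zlib.compress(payload)

# RFC 1951: raw deflate data only. A negative wbits value tells zlib to
# leave out the header and trailer.
_raw = zlib.compressobj(wbits=-zlib.MAX_WBITS)
raw_stream = _raw.compress(payload) + _raw.flush()


def looks_like_zlib(data: bytes) -> bool:
    """Check for an RFC 1950 header: compression method 8 (deflate), a
    legal window size, and the first two bytes, read as a big-endian
    integer, being a multiple of 31."""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    return (cmf & 0x0F) == 8 and (cmf >> 4) <= 7 and ((cmf << 8) | flg) % 31 == 0


def decode_http_deflate(data: bytes) -> bytes:
    """Accept either flavour of "Content-Encoding: deflate"."""
    if looks_like_zlib(data):
        try:
            return zlib.decompress(data)  # RFC 1950
        except zlib.error:
            pass  # header check was a false positive; try raw deflate
    return zlib.decompress(data, -zlib.MAX_WBITS)  # RFC 1951


if __name__ == "__main__":
    for name, stream in (("RFC 1950", zlib_stream), ("RFC 1951", raw_stream)):
        ok = decode_http_deflate(stream) == payload
        print(name, "zlib header:", looks_like_zlib(stream), "decoded ok:", ok)
```

The header check alone is not proof, since a raw deflate stream could happen to start with two bytes that look like a zlib header, which is why the sketch still falls back to a raw inflate if the zlib attempt fails.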

Last time I checked, Internet Explorer does it the wrong way. Opera can cope with either. Older Netscape/Mozilla got it wrong, but I think recent Mozilla has been fixed to handle both. I can't remember what Apache does.

This makes things slightly tricky if you are going to process content that advertises itself as "Content-Encoding: deflate", because you can't tell which of the two deflate implementations it is without doing a quick test on the content. It is possible to do it though. You can test for RFC 1950 by checking for the presence of a ZLIB header. To detect RFC 1951, you have to attempt to decompress it. If it succeeds, you have RFC 1951; if it fails, you don't.

Things are much worse if you are running in an HTTP proxy or an origin server and an agent advertises "Accept-Encoding: deflate". Which of the two deflate implementations do you use? If you knew all the possible User-Agent headers you would ever get, you could maintain a database that mapped User-Agent to deflate implementation type. Blech!

Finally, I'm the author of Compress::Zlib, and I've been giving it a major overhaul, on and off, over the last few months (I don't have a lot of free time at the moment). One of my goals is to make it easier to use in the HTTP modules (automatically figuring out which of the two deflate implementations is in use when doing an uncompress is already on my list), so if there are any specific requests, now would be a good time to feed them back to me.

Paul

> -----Original Message-----
> From: David Carter [mailto:[EMAIL PROTECTED]
> Sent: 25 March 2003 10:44
> To: 'Mike Simons'; [EMAIL PROTECTED]
> Subject: RE: Net::HTTP does not use compressed transfers when it should
>
>
> Mike,
>
> If you're interested, I have some working perl code that does deflate
> decompression. It does it at the application level, and needs to be moved
> down into Net::HTTP and/or LWP.
> There are some wrinkles related to handling
> of window_bits (or similar, don't have the code in front of me at the
> moment) that are not at all obvious.
>
> No need to teach mod_gzip deflate for testing - just find a site on the
> internet that already emits "Content-encoding: deflate" & test with it.
> Such as http://www.homedepot.com
>
> All commercial "http accelerators" I have looked at use content-encoding
> rather than transfer-encoding. I think it has something to do with what
> Internet Explorer supports, or perhaps even how well it supports one vs.
> the other. It's been a couple of years since I worked with this
> extensively, so the details are a little foggy.
>
> I have written a Netscape (iPlanet) server plugin & CGI that apply deflate
> compression to data returned by any other CGI program, but unfortunately
> this code is proprietary.
>
> ---
> David Carter
> [EMAIL PROTECTED]
>
>
> > -----Original Message-----
> > From: Mike Simons [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, March 25, 2003 2:10 AM
> > To: [EMAIL PROTECTED]
> > Subject: Re: Net::HTTP does not use compressed transfers when it should
> >
> > On Mon, Mar 24, 2003 at 01:59:56PM -0500, Mike Simons wrote:
> > > Net::HTTP does not play nicely with mod_gzip from apache.
> > >
> > > Net::HTTP sends 'TE:' headers, mod_gzip looks for 'Accept-encoding:'.
> > >
> > > - Any chance 'Accept-encoding:' can be advertised and 'Content-Encoding:'
> > > results can be decoded by Net::HTTP sometime soon?
> >
> > So, I have something that works between mod_gzip and Net::HTTP,
> > using the gzip transfer type. The data is transparently decompressed
> > by the HTTP module and block by block decompression is supported.
> >
> > Patch attached ... in order for LWP to use this a minor patch is
> > needed to the http.pm module.
> >
> > - Who does code review or where do patches go?
> >
> > Later,
> > Mike Simons
> >
> >
> > I'll try to clean it up somewhat tomorrow...
> >
> > First Draft BUGS:
> > ===
> > - The HTTP modules advertises support for deflate, but doesn't handle
> >   that yet... from what I can tell mod_gzip can not send deflate data.
> >   In order to get deflate working I need to teach mod_gzip to send
> >   deflate data...
> >
> > - The documentation isn't updated.
> >
> > - No attempt was made to support TE and Content-Encoded data at the same
> >   time.
> >
> > - No test of this code with Compress::Zlib uninstalled to verify that
> >   it still works there was done.
> >
> > - The decompression routine does block by block decompression, but
> >   in order to do this calls a private Compress::ZLib method
> >   (_removeGzipHeader) to strip off the gzip header, this is the exact
> >   same function that the MemGunzip call makes to prepare the strip the
> >   header...
> >   While it's unclean calling something else's private method
> >   it would be worse re-implementing the prune here, because it's size is
> >   dynamic.
> >
> > - If compression is requested it is important that client code not pay
> >   attention to the content-length value... that is not the number of
> >   bytes to read, call the read method until it returns 0 bytes.
> >
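
For illustration only, the block-by-block idea in the last two notes can be sketched in Python rather than Perl. This is not the attached patch and does not use Compress::Zlib. `read_gzip_body` and its `stream` argument are hypothetical names, and the sketch assumes a zlib build that can parse the gzip header itself, so it sidesteps the private header-stripping call mentioned above.

```python
import gzip
import io
import zlib


def read_gzip_body(stream, block_size=4096):
    """Decompress a gzip-encoded body block by block.

    `stream` is any object with a read(n) method that returns b"" at the
    end of input (a stand-in for the HTTP connection).
    """
    # wbits = 16 + MAX_WBITS asks zlib to consume the gzip header and
    # trailer itself, whatever length the header turns out to be.
    inflater = zlib.decompressobj(16 + zlib.MAX_WBITS)
    chunks = []
    while True:
        block = stream.read(block_size)
        if not block:  # 0 bytes returned: end of the body
            break
        chunks.append(inflater.decompress(block))
    chunks.append(inflater.flush())
    return b"".join(chunks)


if __name__ == "__main__":
    body = b"some response text\n" * 500
    wire = gzip.compress(body)  # what would travel over the connection
    decoded = read_gzip_body(io.BytesIO(wire), block_size=64)
    # The compressed length (what Content-Length would describe) differs
    # from the number of bytes the caller ends up with.
    print(len(wire), len(decoded), decoded == body)
```

The loop stops when a read returns nothing and never compares byte counts against a declared length, which is the behaviour the last note asks of client code.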