"David Carter" <[EMAIL PROTECTED]> writes: > My understanding had always been that content-encoding (when talking about > compression) is in practical terms no different than transfer-encoding. LWP > already handles transfer-encoding (gzip or deflate), so what's the big deal > about it also handling content-encoding compression in a transparent manner?
Transfer-Encoding and Content-Encoding work at different levels of the HTTP protocol. It makes perfect sense to handle Transfer-Encoding transparently in a client library. It does not make sense to try to hide Content-Encoding in the same way.

> My suggestion would be to make it the default to handle it transparently,
> but provide an option to turn it off if someone needs access to the raw
> datastream. All GUI browsers "just do it" - the user doesn't have to be
> concerned with either content-encoding or transfer-encoding.

I disagree. LWP is not a GUI browser and should not hide content-encoding by default.

> If you have a file in .tar.gz format, the web server should NOT return a
> content-encoding: gzip header.

Sure it should. Especially if the Content-Type header describes the type of document you end up with after you 'gunzip' it.

> This would incur redundant processing costs
> on the server & the client, attempting to re-compress an already compressed
> file for little or no gain. Instead, the server would send an appropriate
> mime type indicating to the client that this is a compressed archive file
> (usually handled in a GUI client by presenting a file download dialog box).

I disagree here, but I'm sure practice differs among servers. Apache seems to serve .tar.gz files as:

    Content-Type: application/x-tar
    Content-Encoding: x-gzip

and I think that is exactly as it should be.

> It may not be what the RFCs originally intended, but modern web server
> implementations of on-the-fly compression in my experience always use
> content-encoding rather than transfer-encoding.

Could it have something to do with what MSIE implements?

> I've written a server-side
> plug-in to do this on the Netscape/iPlanet web server, and have done fairly
> extensive research on what's out there in Apache, etc.

I'm not opposed to adding stuff to LWP that lets you undo Content-Encoding, but it needs to be enabled explicitly to make it backwards compatible.
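For illustration only (in Python rather than LWP's Perl), here is a minimal sketch of what "explicitly undoing Content-Encoding" could look like in a client; `decode_content` is a hypothetical helper, not anything LWP provides:

```python
import gzip
import zlib

def decode_content(body: bytes, content_encoding: str) -> bytes:
    """Undo a Content-Encoding header value on a raw response body."""
    enc = content_encoding.lower()
    if enc in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if enc == "deflate":
        # Some servers send a zlib stream, others raw deflate data;
        # try the zlib framing first, then fall back to raw deflate.
        try:
            return zlib.decompress(body)
        except zlib.error:
            return zlib.decompress(body, -zlib.MAX_WBITS)
    if enc == "identity":
        return body
    raise ValueError("unsupported Content-Encoding: %r" % content_encoding)

# A .tar.gz served as application/x-tar + x-gzip would decode back to the tar bytes:
compressed = gzip.compress(b"fake tar payload")
assert decode_content(compressed, "x-gzip") == b"fake tar payload"
```

The point of keeping this an explicit, opt-in step is exactly the backwards-compatibility concern above: callers who want the raw datastream still get it by default.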
LWP currently has code that tries to parse the head section of text/html documents to extract headers, meta and the base. This code fails when the document is compressed, so there is actually a need for undo-content-encoding support in the LWP core.

I think most users would be served well with an option that simply tells LWP to try to undo content-encoding for any text/* content, but I'm also thinking that LWP should have some kind of generic filtering mechanism similar to Perl's IO layers. That should be able to deal with content-encoding and might even turn the content into Unicode strings and similar based on the charset parameter.

Regards,
Gisle