"David Carter" <[EMAIL PROTECTED]> writes: > My understanding had always been that content-encoding (when talking about > compression) is in practical terms no different than transfer-encoding. LWP > already handles transfer-encoding (gzip or deflate), so what's the big deal > about it also handling content-encoding compression in a transparent manner?
Transfer-Encoding and Content-Encoding work at different levels of the HTTP protocol. It makes perfect sense to handle Transfer-Encoding transparently in a client library. It does not make sense to try to hide Content-Encoding in the same way.

> My suggestion would be to make it the default to handle it transparently,
> but provide an option to turn it off if someone needs access to the raw
> datastream. All GUI browsers "just do it" - the user doesn't have to be
> concerned with either content-encoding or transfer-encoding.

I disagree. LWP is not a GUI browser and should not hide content-encoding by default.

> If you have a file in .tar.gz format, the web server should NOT return a
> content-encoding: gzip header.

Sure it should. Especially if the Content-Type header describes the type of document you end up with after you 'gunzip' it.

> This would incur redundant processing costs
> on the server & the client, attempting to re-compress an already compressed
> file for little or no gain. Instead, the server would send an appropriate
> mime type indicating to the client that this is a compressed archive file
> (usually handled in a GUI client by presenting a file download dialog box).

I disagree here, but I'm sure practice differs among servers. Apache seems to serve .tar.gz files as:

    Content-Type: application/x-tar
    Content-Encoding: x-gzip

and I think that is exactly as it should be.

> It may not be what the RFCs originally intended, but modern web server
> implementations of on-the-fly compression in my experience always use
> content-encoding rather than transfer-encoding.

Could it have something to do with what MSIE implements?

> I've written a server-side
> plug-in to do this on the Netscape/iPlanet web server, and have done fairly
> extensive research on what's out there in Apache, etc.

I'm not opposed to adding stuff to LWP that lets you undo Content-Encoding, but it needs to be enabled explicitly to make it backwards compatible.
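For illustration only (in Python rather than LWP's Perl), here is a minimal sketch of what "explicitly undoing Content-Encoding" could look like in a client; `decode_content` is a hypothetical helper, not anything LWP provides:

```python
import gzip
import zlib

def decode_content(body: bytes, content_encoding: str) -> bytes:
    """Undo a Content-Encoding header value on a raw response body."""
    enc = content_encoding.lower()
    if enc in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if enc == "deflate":
        # Some servers send a zlib stream, others raw deflate data;
        # try the zlib framing first, then fall back to raw deflate.
        try:
            return zlib.decompress(body)
        except zlib.error:
            return zlib.decompress(body, -zlib.MAX_WBITS)
    if enc == "identity":
        return body
    raise ValueError("unsupported Content-Encoding: %r" % content_encoding)

# A .tar.gz served as application/x-tar + x-gzip would decode back to the tar bytes:
compressed = gzip.compress(b"fake tar payload")
assert decode_content(compressed, "x-gzip") == b"fake tar payload"
```

The point of keeping this an explicit, opt-in step is exactly the backwards-compatibility concern above: callers who want the raw datastream still get it by default.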
LWP currently has code that tries to parse the head section of text/html documents to extract headers, meta and the base. This code fails when the document is compressed, so there is actually a need for undo-content-encoding support in the LWP core.

I think most users would be served well with an option that simply tells LWP to try to undo content-encoding for any text/* content, but I'm also thinking that LWP should have some kind of generic filtering mechanism similar to Perl's IO layers. That should be able to deal with content-encoding and might even turn the content into Unicode strings and similar based on the charset parameter.

Regards,
Gisle