Re: Accept-Encoding defaults

Gisle Aas Tue, 15 Jul 2008 04:25:31 -0700

On Sun, Jul 6, 2008 at 7:06 PM, Bill Moseley <[EMAIL PROTECTED]> wrote:
> On Sun, Jul 06, 2008 at 02:36:10PM +0200, Gisle Aas wrote:
>> >    $req->set_default_accept_encoding;
>>
>> I don't like defaults to be set at that level given that we already
>> have a $ua->default_header() method, so I think it should be something
>> like:
>>
>>    $ua->default_header("Accept-Encoding", join(",",
>> HTTP::Message::decodable()));
>
> I saw this yesterday, too, about in the absence of an Accept-Encoding
> server "MAY" send any encoding.
>
>    http://use.perl.org/~rhesa/journal/25952
>
> That may be one argument for having a default, but in practice I'd
> expect it very rare for a server to compress w/o an Accept-Encoding
> header sent by the client.


I think it's correct to not have a default by default.  The library
can't really know what's acceptable without the app telling it.  If
the app for instance wants to mirror the file then the encoding does
not matter.  All it cares about is storing the bytes as sent from the
server.

I've now implemented a HTTP::Message::decodable() function with this patch:

  
http://gitorious.org/projects/libwww-perl/repos/mainline/commits/7721a0b0dbf4b4f8c2201c5e70b7c3762137a8dc

Please review.

>> > I'm not clear if there's a need to also specify a quality for the
>> > encodings in the Accept-Encoding header.
>>
>> I don't think we need to worry about this initially.
>
> And the RFC says qvalues are not permitted with x-gzip and z-compress.

Good.

>
>> > I kind of wonder why $res->content is not decoded by default (and
>> > provide $res->raw_content for those that need it).
>>
>> It's mostly because of history and compatibility with the original
>> content() method.  Both are useful in different contexts.  I don't
>> find the current situation bad. Since decoded_content() can be
>> expensive and can fail I think the longer name makes it obvious what's
>> going on how you should use it.
>
> Agreed, it's not something that could change.  I was just lamenting
> how often I see $res->content used in existing programs and modules.
>
> I don't see using $res->decoded_content as more expensive.   If you
> need decoded content (which is likely the typical use) then you have
> to decode it -- no way around that.

Right.  But there are apps that don't need this too.  Also knowing
that it might be expensive might give you the hint needed to
understand that you might want to cache the result of decoded_content
instead of calling it over and over for the same message.  That is
code like this might not be a good idea:

   if ($res->decoded_content =~ /<html>/) {
       print $res->decoded_content
   }
   elsif ($res->decoded_content =~ /foo/) {
       #...
   }
   elsif ($res->decoded_content =~ /bar/) {
       # ...
   }


>
>
> I can only guess that the beginners are more likely to use
> $res->content directly (as that's the example in the SYNOPSIS)  and
> they perhaps are on slower connections where compression would help
> both the server and client.  But, it's not breaking anything to not
> use compression.
>
> Ignoring decoding (charset), on the other hand, is probably wrong in
> most cases -- even though it's easy to ignore.
>
>
> You have this in the SYNOPSIS of LWP::UserAgent:
>
>        if ($response->is_success) {
>            print $response->content;  # or whatever
>        }
>
> which is, perhaps accidentally, correct since you are printing
> un-decoded (charset) content.  But, I doubt most users are just using
> LWP to print content out directly.
>
> How would you feel about providing new users with more guidance in the
> SYNOPSIS?  That is, use decoded_content in the synopsis for those of
> us that often don't get past that section of the man page.
>
>        if ($response->is_success) {
>            $content = $response->decoded_content;
>        }

I've changed this occurence now.  I also noticed that other places
already used decoded_content in similar examples so this is just an
oversight because these examples predates the decoded_content method.

>
> Now, I suspect that LWP::Simple really should be returning
> decoded_content -- but again, I don't know how to to change that one
> without breaking a large number of existing scripts.

I'm not sure either.  Perhaps just document it better.

>
>
> I think I asked about this some time ago, but might be good for
> HTTP::Message to have decoded_content wrap two methods for
> un-compressing and the charset decoding.  There might be a case where
> we would want uncompressing but not decoding.

You get that by calling $mess->decoded_content(charset => "none")

I don't think there is a use-case for trying to decode the charset
without undoing the Content-Encoding first.

> Hum, I'm not clear about this, but I wonder if the response content is XML
> that will be passed to, say, XML::LibXML should it be passed decoded
> or not.

You just have to read the documentation for XML::LibXML to find out
what kind of input it expects.  Does it want Unicode strings or bytes?
 LWP can provide both.

--Gisle

Re: Accept-Encoding defaults

Reply via email to