Joshua Vickery <[EMAIL PROTECTED]> writes:

> While working with a perl proxy server built in house I found that I was
> getting strange behavior from recent builds of Mozilla.   A discussion of
> the bug is available here:
> 
> http://bugzilla.mozilla.org/show_bug.cgi?id=92140
> 
> After digging a little deeper I found that it seems that
> LWP::UserAgent->request and LWP::UserAgent->simple_request return invalid
> HTTP headers for some web pages.  Here is a simple test script:
> 
> ==========================================================================
> #!/usr/bin/perl
> 
> use LWP::UserAgent;
> use HTTP::Request;
> use HTTP::Response;
> 
> $ua = new LWP::UserAgent;
> $req = HTTP::Request->new("GET", 'http://www.math.grin.edu/');
> 
> $response = $ua->request($req);
> 
> print "Response is:".$response->as_string()."\n";
> ==========================================================================
> 
> In this case, the response returns two 'Content-Type' fields with two
> different values, and according to the folks at Mozilla "the BNF
> definition of Content-Type in RFC 2616, Section 14.17 does not allow multiple 
> values for Content-Type." I suspect that what is happening here is that perl 
> is parsing the HTML and extracting a second Content-Type declaration from one 
> of the Meta tags in the html document, and then storing that as a header.

This is very likely to be what is happening.  You have two options for
dealing with that:

  1) tell LWP not to add headers from the <head> of the HTML by turning
     off the 'parse_head' attribute:

        $ua->parse_head(0);

  2) post-process the response to remove the extra header with something like:

        $response->content_type(($response->header("Content-Type"))[0]);
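A minimal offline sketch of option 2 follows.  It assumes the server's
own value comes first in the list and the <head> value was pushed after
it.  The header values are made up for illustration, and the response
is built by hand so that no network access is needed:

```perl
use strict;
use warnings;
use HTTP::Response;

# Build a response by hand; these header values are illustrative only.
my $response = HTTP::Response->new(200, "OK");
$response->push_header("Content-Type" => "text/html");
$response->push_header("Content-Type" => "text/html; charset=iso-8859-1");

# In list context header() returns every value stored for the field.
my @before = $response->header("Content-Type");

# Replace the whole field with its first value only.
$response->content_type(($response->header("Content-Type"))[0]);

my @after = $response->header("Content-Type");
print "Content-Type values: ", scalar(@before), " before, ",
      scalar(@after), " after\n";
```

Note that in this sketch the charset parameter from the second value
is dropped along with it.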

> I believe that this behavior is due to the UserAgent because using telnet I
> do not get multiple 'Content-Type' definitions in the response from the
> server.  The link at the top of this message has more information on the
> matter.  I am working out a workaround in the proxy server, but I wonder if 
> this is not something that should be addressed in the libwww-perl codebase.

What do you think it should do?  We could have the Content-Type in the
<head> always override the Content-Type in the response headers, but
that might throw out information and I don't like that.  We could have
LWP not override the header, but then you lose the extra charset
parameter, which is often the very thing that this <head> version of
the header adds.  The current way might give you surprises, but it
does not throw away information.

Regards,
Gisle
