Joshua Vickery <[EMAIL PROTECTED]> writes:
> While working with a perl proxy server built in house I found that I was
> getting strange behavior from recent builds of Mozilla. A discussion of
> the bug is available here:
>
> http://bugzilla.mozilla.org/show_bug.cgi?id=92140
>
> After digging a little deeper I found that it seems that
> LWP::UserAgent->request and LWP::UserAgent->simple_request return invalid
> HTTP headers for some web pages. Here is a simple test script:
>
> ==========================================================================
> #!/usr/bin/perl
>
> use LWP::UserAgent;
> use HTTP::Request;
> use HTTP::Response;
>
> $ua = new LWP::UserAgent;
> $req = HTTP::Request->new("GET", 'http://www.math.grin.edu/');
>
> $response = $ua->request($req);
>
> print "Response is:".$response->as_string()."\n";
> ==========================================================================
>
> In this case, the response returns two 'Content-Type' fields with two
> different values, and according to the folks at Mozilla "the BNF
> definition of Content-Type in RFC 2616, Section 14.17 does not allow multiple
> values for Content-Type." I suspect that what is happening here is that perl
> is parsing the HTML and extracting a second Content-Type declaration from one
> of the Meta tags in the html document, and then storing that as a header.
This is very likely to be what is happening. You have two options for
dealing with that:
1) tell LWP not to add headers from the <head> of the HTML by turning
off the 'parse_head' attribute:
$ua->parse_head(0);
2) post-process the response to remove the extra header with something like:
$response->content_type(($response->header("Content-Type"))[0]);
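To make the second option concrete without touching the network, here is a minimal sketch that builds a response by hand with two Content-Type values and then keeps only the first. The header values are made up for illustration, and the sketch assumes the server's value is the first one stored, as the [0] index above implies. With a live request you would apply the same lines to the object returned by $ua->request($req).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Response;

# Build a response offline that mimics the reported symptom: one
# Content-Type from the server and a second one from a <meta> tag.
# Both values are hypothetical examples.
my $response = HTTP::Response->new(200, 'OK');
$response->push_header('Content-Type' => 'text/html');
$response->push_header('Content-Type' => 'text/html; charset=iso-8859-1');

# In list context, header() returns every value stored for the field.
my @types = $response->header('Content-Type');

# Setting the field through header() replaces all existing values,
# so only the first value remains afterwards.
$response->header('Content-Type' => $types[0]);

print $response->as_string;
```

The first option needs no post-processing at all: calling $ua->parse_head(0) before the request keeps LWP from reading the <head> section in the first place.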
> I believe that this behavior is due to the UserAgent because using telnet I
> do not get multiple 'Content-Type' definitions in the response from the
> server. The link at the top of this message has more information on the
> matter. I am working out a workaround in the proxy server, but I wonder if
> this is not something that should be addressed in the libwww-perl codebase.
What do you think it should do? We could have the Content-type in the
<head> always override the content-type in the response headers, but
that might throw out information and I don't like that. We could have
LWP not override the header, but then you often lose the extra
charset parameter that is often what is added in this <head> version
of the header. The current way might give you surprises, but it does
not throw away information.
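For a proxy that must emit a single Content-Type, one possible middle ground is to keep the first value and borrow only the charset parameter from a later one. This is a sketch of that idea, not something LWP does itself; merge_content_type is a hypothetical helper and the header values are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: keep the first Content-Type value, but copy a
# charset parameter from a later value if the first one has none.
sub merge_content_type {
    my ($response) = @_;
    my @types = $response->header('Content-Type');
    return unless @types > 1;    # nothing to merge

    my $merged = $types[0];
    if ($merged !~ /charset\s*=/i) {
        for my $extra (@types[1 .. $#types]) {
            if ($extra =~ /(charset\s*=\s*"?[^\s;"]+"?)/i) {
                $merged .= "; $1";
                last;
            }
        }
    }

    # header() with a value replaces every existing value for the field.
    $response->header('Content-Type' => $merged);
    return $merged;
}

# Offline stand-in for a response that carries two Content-Type values.
my $response = HTTP::Response->new(200, 'OK');
$response->push_header('Content-Type' => 'text/html');
$response->push_header('Content-Type' => 'text/html; charset=iso-8859-1');

merge_content_type($response);
print scalar($response->header('Content-Type')), "\n";
```

This still discards the media type from the later value, so it trades away a little information in exchange for a header that strict clients accept.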
Regards,
Gisle