Thanks for this, Gisle.

Now that I know where to start, I'll take a look at the code and see if it's within my abilities to patch it. I notice that doctypes and XML declarations are OK, so it must be choosy about what it allows and disallows (sensibly enough!).

Phil.


----- Original Message -----
From: "Gisle Aas" <[EMAIL PROTECTED]>
To: "Phil Archer" <[EMAIL PROTECTED]>
Cc: "libwww list" <[EMAIL PROTECTED]>
Sent: Thursday, October 07, 2004 10:26 AM
Subject: Re: Byte Order Mark mucks up headers



"Phil Archer" <[EMAIL PROTECTED]> writes:

I've read Sean Burke's book, I've looked through the archives of this
list and done other searches but can't find an answer to a problem I
have found with LWP. If the character coding for a website has a byte
order mark (things like utf-16, all that "big endian/little endian"
stuff) then LWP can't interpret HTML headers in the usual way. Does
anyone know a way around this?

HTML::HeadParser needs to be fixed. It will assume that there is no <head> section when it sees text before anything else. The part of the code responsible for this currently allows whitespace, but needs to be taught that a BOM is harmless too. Look at the 'text' method.
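
Until the parser itself is patched, one possible workaround is to remove the BOM before the content reaches the parser. The following is only a minimal sketch, assuming the raw response bytes are available; decode_without_bom is a hypothetical helper name, not part of LWP or HTML::HeadParser, and UTF-32 is deliberately not handled.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of LWP or HTML::HeadParser): decode raw
# bytes and drop a leading byte order mark, if there is one.
sub decode_without_bom {
    my ($bytes) = @_;

    # UTF-8 BOM is the byte sequence EF BB BF: strip it, then decode.
    if ($bytes =~ s/\A\xEF\xBB\xBF//) {
        return decode('UTF-8', $bytes);
    }

    # UTF-16 BOM is FE FF (big endian) or FF FE (little endian).
    # Encode's "UTF-16" decoder uses the BOM to pick the byte order
    # and leaves it out of the decoded string.
    if ($bytes =~ /\A(?:\xFE\xFF|\xFF\xFE)/) {
        return decode('UTF-16', $bytes);
    }

    # No BOM found: return the content unchanged (still undecoded).
    return $bytes;
}
```

The returned string should then start with the markup itself rather than with stray text, so it could be handed to the parser's parse method. This only sidesteps the problem on the caller's side; it does not replace the fix to the 'text' method.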

Do you want to try to provide a patch?

Regards,
Gisle



