Hi all,

I wonder if any of you have experienced the following phenomenon:

I am trying to download TSV (tab-separated values) content from an HTTP
server using the WWW::Mechanize package. On the server the encoding is
UTF-8.

When I capture the exchange between the script and the server using
Wireshark, it is definitely the Windows end-of-line '\r\n' (0x0d0a) that
flows across the interface.

In the script I also have:
        <$mech->add_header('Accept-Charset','utf-8;q=0.7,*;q=0.7');>

Unfortunately, using for example $mech->content(...), I get content with a
three-byte end-of-line '\r\r\n' (0x0d0d0a) in place of each '\r\n' in the TSV.

I get the same result when I use the <$mech->save_content($filename);>
method.

However, when I use <$mech->get($url, ':content_file' => $filename);> I get
a file with the *correct* end-of-line!

Yes, I know! It is not difficult to fix the content with a regex. But still,
maybe there is a bug lurking around here...
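
One possible cause, which I have not confirmed: the data may be passing
through a text-mode filehandle somewhere, where Perl's :crlf layer (the
default on Windows) turns every '\n' into '\r\n' and so turns '\r\n' into
'\r\r\n'. Below is a minimal sketch of that effect, without WWW::Mechanize or
the network. The sample data and the helper name bytes_written are made up
for illustration, and whether Mechanize actually writes through such a handle
is only an assumption on my part.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Made-up sample standing in for the downloaded TSV (CRLF line endings).
my $tsv = "a\tb\r\nc\td\r\n";

# Hypothetical helper: write $data through the given PerlIO layer,
# then read the file back as raw bytes.
sub bytes_written {
    my ($layer, $data) = @_;
    my ($tmp, $path) = tempfile(UNLINK => 1);
    close $tmp;

    open my $out, ">$layer", $path or die "Cannot write $path: $!";
    print {$out} $data;
    close $out or die "Cannot close $path: $!";

    open my $in, '<:raw', $path or die "Cannot read $path: $!";
    local $/;                      # slurp the whole file
    my $bytes = <$in>;
    close $in;
    return $bytes;
}

# Text mode as on Windows: each "\n" is written as "\r\n".
my $text_mode = bytes_written(':crlf', $tsv);

# Binary mode: bytes are written unchanged.
my $raw_mode = bytes_written(':raw', $tsv);

# The regex workaround applied to the damaged content.
(my $fixed = $text_mode) =~ s/\r\r\n/\r\n/g;

printf "%-10s %s\n", 'original:',  unpack('H*', $tsv);
printf "%-10s %s\n", 'text mode:', unpack('H*', $text_mode);
printf "%-10s %s\n", 'raw mode:',  unpack('H*', $raw_mode);
printf "%-10s %s\n", 'fixed:',     unpack('H*', $fixed);
```

If this is what is happening, calling binmode on the output handle (or
opening it with ':raw') before printing $mech->content should give the same
bytes that Wireshark shows.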

Anyone care to elucidate?

Regards,
Meir
