Hello Georg,
On Friday, April 1, 2005 at 12:01:15 PM +0200, Georg Bauhaus wrote:
The apostrophe might have been typed as an accent (acute) really
Most probably the RIGHT SINGLE QUOTATION MARK U+2019, ’, encoded
in UTF-8, then wrongly seen as being CP-1252. It would look like
(a
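To illustrate the mechanism described above, here is a minimal Python sketch. The function names are made up for this example and are not part of Wget or any library. It encodes U+2019 as UTF-8, misreads the bytes as CP-1252, and then reverses the mistake.

```python
def misread_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode those bytes as CP-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair_cp1252_misread(garbled: str) -> str:
    """Undo the mistake: recover the original bytes, then decode as UTF-8.

    This only works if every character survived the wrong decoding step.
    """
    return garbled.encode("cp1252").decode("utf-8")


original = "area\u2019s"                    # U+2019 RIGHT SINGLE QUOTATION MARK
garbled = misread_as_cp1252(original)       # three characters replace the quote
restored = repair_cp1252_misread(garbled)   # back to the original string
```

U+2019 is the three bytes E2 80 99 in UTF-8, and CP-1252 maps each of those bytes to a separate character, which is why one quote turns into three symbols.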
The solution is to explicitly set the character encoding to utf-8. I do this
in the aspx file's head section and it works fine.
This is kinda weird though as with an aspx file, it seems that dotnet will
always insert this charset header for you by default (you can see this by
running wget
-----Original Message-----
From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
Sent: March 31, 2005 3:19 PM
To: Alan Hunter
Cc: 'wget@sunsite.dk'
Subject: Re: Character encoding
I'm not sure what causes this problem, but I suspect it does not come
from Wget doing something wrong. That Notepad opens the file
areaâ€™s = area's. The spidered html page
does then not display properly in IE/Firefox as the char is not decoded.
Is this correct behaviour, and any idea how to fix it? It happens on a
number of chars, not just the quote, but I use that as an example. I am not
an expert on character encoding so go easy.
Thanks.
Wget shouldn't alter the page contents, except for converted links.
Is the funny character in places which Wget should know about
(e.g. URLs in links) or in the page text? Could you post a minimal
excerpt from the page, before and after garbling done by Wget?
Alternately, could you post a URL
Hi,
Thanks for the reply. It is the page text that is the problem.
When I started to investigate it further I found that it actually only
happens when the page being wgot is a .aspx (.net asp) file.
I made 3 identical files (as below), one with .html ext, 1 with .aspx ext
and one with .zzz
I'm not sure what causes this problem, but I suspect it does not come
from Wget doing something wrong. That Notepad opens the file
correctly is indicative enough.
Maybe those browsers don't understand UTF-8 (or other) encoding of
Unicode when the file is opened on-disk?
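One possible explanation, offered as an assumption rather than a diagnosis: a file opened from disk arrives without an HTTP Content-Type header, so the browser can only rely on what is inside the file (a byte order mark or a meta tag) or fall back to a default. The sketch below checks a saved page for such a declaration. The helper name declared_charset and the 1024-byte window are choices made for this example.

```python
import re

# Matches both <meta charset="utf-8"> and
# <meta http-equiv="Content-Type" content="text/html; charset=utf-8">.
META_CHARSET = re.compile(
    rb"<meta[^>]+charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)",
    re.IGNORECASE,
)


def declared_charset(html_bytes: bytes):
    """Return the charset a saved page declares in-band, or None."""
    # A UTF-8 byte order mark at the very start counts as a declaration.
    if html_bytes.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    # Only look near the top of the file, where the head section lives.
    match = META_CHARSET.search(html_bytes[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    return None
```

If this returns None for the spidered page, the browser is guessing the encoding, and adding the charset declaration to the head section (as reported elsewhere in this thread) would remove the guesswork.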