On Fri, Dec 26, 2008 at 15:03, John Refior <jref...@gmail.com> wrote:
snip
> The problem I am having is that a number of these webpages have special
> multibyte characters on them, such as the trademark symbol and registered
> trademark symbol.  For example, in the CSV, the trademark (TM) symbol
> shows up like
>
>   â„¢
>
> Now that's fine in a way, because if I redisplay them on a webpage with
> <meta charset='utf-8'>, Firefox and Internet Explorer display them as
> intended.
snip

The file is already in UTF-8; otherwise it wouldn't display properly
in Firefox or IE.  The problem is that either your display or perl
doesn't know that the file is in UTF-8.
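
To illustrate, here is a minimal sketch of how a single trademark
character can turn into the three characters you are seeing.  It assumes
the bytes are being misread as Windows-1252 (cp1252); that is my guess
based on the garbled output, not something I can confirm from here.  The
variable names are just for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# U+2122 TRADE MARK SIGN as a Perl character string.
my $tm = "\x{2122}";

# Encoded as UTF-8, the character becomes the three bytes E2 84 A2.
my $bytes = encode('UTF-8', $tm);

# ASSUMPTION: the viewer reads the bytes as Windows-1252.  Each byte then
# becomes its own character: U+00E2, U+201E, U+00A2.
my $garbled = decode('cp1252', $bytes);

# Decoding the same bytes as UTF-8 gives back the one original character.
my $fixed = decode('UTF-8', $bytes);

binmode(STDOUT, ':encoding(UTF-8)');
printf "bytes:   %s\n", join ' ', map { sprintf '%02X', ord } split //, $bytes;
printf "garbled: %s\n", join ' ', map { sprintf 'U+%04X', ord } split //, $garbled;
printf "fixed:   U+%04X\n", ord $fixed;
```

The data is the same in both cases; only the interpretation of the bytes
differs.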

The first step is to make sure perl knows it is working with UTF-8.  Add

export PERL_UNICODE=SDL

to your .profile, .bashrc, or whatever file your shell reads at login,
then log out and log back in.  This tells perl to use UTF-8 for STDIN,
STDOUT, and STDERR (the S) and as the default layer for other input and
output streams (the D), but only if your locale says you are using UTF-8
(the L).  Because of the L, the next thing to check is the value of your
LANG environment variable.  It should be something like en_US.UTF-8.
(LC_ALL and LC_CTYPE take precedence over LANG if they are set.)  If you
are still having problems, check whether your terminal is expecting
something other than UTF-8 (this is highly dependent on the terminal, so
you will need to tell us what terminal you are using).
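
If changing the environment is not convenient, the encoding can also be
named in the script itself.  This is a rough sketch, not a drop-in fix:
the temp file and the "Widget" row are hypothetical stand-ins for your
CSV, and unlike PERL_UNICODE=SDL the layers here are applied regardless
of locale.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for the CSV: a temp file holding one UTF-8 line.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode($tmp_fh, ':encoding(UTF-8)');
print {$tmp_fh} "Widget\x{2122},9.99\n";
close $tmp_fh or die "Cannot close $path: $!";

# Name the encoding when opening, so perl decodes the bytes into characters.
open my $in, '<:encoding(UTF-8)', $path or die "Cannot open $path: $!";
my $line = <$in>;
close $in;
chomp $line;

my ($name) = split /,/, $line;

# Encode again on the way out so the terminal receives UTF-8 bytes.
binmode(STDOUT, ':encoding(UTF-8)');

# With the input layer, the name is 7 characters long; read as raw bytes
# it would be 9, because the trademark sign takes three bytes in UTF-8.
printf "%s has %d characters\n", $name, length $name;
```

Whether the result looks right on screen still depends on the terminal
being set to UTF-8.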

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.
