On Sep 27, 12:27 pm, [EMAIL PROTECTED] wrote:
> I am trying to use perl on the command line to process text files in
> various ways, one of which is to decode html entities. As far as I can
> see, the following line should work
>
> perl -MHTML::Entities -p -e 'decode_entities($_)' <input.txt >output.txt
>
> it does indeed change the html entities, but not into the required
> characters, rather into pairs of unusual characters; and the command
> line returns this:
>
> Wide character in print, <> line 1.
>
> It seems to me it is something to do with internal character encoding
> being messed up but I can't work out how to control it.

Before you can control it you need to know what it is.

> The text files
> processed have MacOS character encoding which is required in the
> finished file,

What is "MacOS character encoding"?

> but perhaps I need to convert to UTF8 before processing
> and back again after?

Perl will do this automatically if you tell it the encoding of the
input and output.

> perl -MHTML::Entities -p -e 'decode_entities($_)'  <input.txt

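I think you need something like this (single-quoted on the outside, as
in your original command, so that the shell does not expand $_ before
Perl sees it):

perl -MHTML::Entities -p -e 'BEGIN { binmode STDIN,
":encoding(whatever)"; binmode STDOUT, ":encoding(whatever)" }
decode_entities($_)' <input.txt >output.txt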

Where "whatever" is the name Perl uses for that which you are calling
"MacOS character encoding".

For a list of supported encodings:

perldoc Encode::Supported

