On Wed, Jun 26, 2013 at 7:45 AM, Peter Gordon <[email protected]> wrote:
> On Wed, 26 Jun 2013 12:36:01 +1200, Gregory Machin wrote:
> >
> >Looks like the data already is utf8, but the header of the XML
> >specifies otherwise.
> >How do you parse the data? Can you give us a short example file?
> >
> >Jenda
>
> This is a bit of code I adapt to whichever encoding I require.
>
> use open ":encoding(UTF-16le)";
> while( <> ) {
>     s/^\x{FEFF}//;      # Remove BOM (U+FEFF once the line is decoded).
>     s/[\x0A\x0D]+$//;   # Remove trailing CR/LF.
>     # ... process the line ...
> }
>
> If you can get the data into a text editor which has a "convert" option,
> you can use it to find out the encoding and/or change it to utf8.
> If you have a file with mixed encodings, you have my sympathies.
>
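If no editor with a "convert" option is at hand, the same conversion can be
done in Perl once the source encoding is known. A minimal sketch, assuming
the data is UTF-16LE with a leading BOM; to_utf8() is a made-up helper name,
not something from Encode:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: re-encode a byte string from a known encoding to UTF-8.
sub to_utf8 {
    my ($bytes, $from) = @_;
    my $text = decode($from, $bytes);   # bytes -> Perl characters
    $text =~ s/^\x{FEFF}//;             # drop a leading BOM, if present
    return encode('UTF-8', $text);      # characters -> UTF-8 bytes
}

# Sample input: BOM followed by "Gré" as little-endian 16-bit units.
my $utf16le = pack('v*', 0xFEFF, ord('G'), ord('r'), 0xE9);
my $utf8    = to_utf8($utf16le, 'UTF-16LE');

# Show the resulting bytes in hex rather than printing them raw.
print join(' ', map { sprintf '%02X', ord } split //, $utf8), "\n";   # 47 72 C3 A9
```

For a whole file, the same decode/encode pair can be applied to the slurped
contents, or the layers can be set on the input and output handles instead.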
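Encode::Guess may occasionally be useful:

use Encode::Guess;
# guess() returns an encoding object on success, or an error string on failure.
my $decoder = Encode::Guess->guess("Grégoire");   # source file saved as UTF-8
die $decoder unless ref $decoder;
print $decoder->name; #---> utf8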
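
Once the guess succeeds, the returned object can also decode the data. A
rough sketch, assuming the input is a byte string and the default suspects
(ascii, utf8, and UTF-16/32 with a BOM) are enough; decode_guessed() is a
made-up helper name:

```perl
use strict;
use warnings;
use Encode::Guess;

# Hypothetical helper: guess the encoding of a byte string, then decode it.
# Returns the encoding name and the decoded character string.
sub decode_guessed {
    my ($bytes) = @_;
    my $decoder = Encode::Guess->guess($bytes);
    # On failure guess() returns an error message (a plain string), not an object.
    ref $decoder or die "Cannot guess encoding: $decoder";
    return ($decoder->name, $decoder->decode($bytes));
}

my ($name, $text) = decode_guessed("Gr\xC3\xA9goire");   # "Grégoire" as UTF-8 bytes
print "$name, ", length($text), " characters\n";
```

Guessing works by trial and error, so it tends to be unreliable for
single-byte encodings such as the ISO-8859 family; treat the result as a
hint, not a guarantee.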
--
Charles DeRykus