Re: bug in perl or in my head ;-)

Jay Savage Tue, 19 Jun 2007 05:33:16 -0700

On 6/18/07, Martin Barth <[EMAIL PROTECTED]> wrote:

Hi there,
have a look at:


<snip>
% cat datei
eine test datei
die "u "a "o
% file datei
datei: ASCII text
% cp datei datei.bk
% perl -wpi -e 'use encoding "utf8"; s/"a/ä/' datei
% file datei
datei: ISO-8859 text
% perl -wp -e 'use encoding "utf8"; s/"a/ä/' datei.bk > datei.neu
% file datei.neu
datei.neu: UTF-8 Unicode text
</snip>

I'm a bit confused. Both files should be utf8??
( my xterm is utf8 )

Regards
Martin


Martin,

You haven't told us what Perl thinks the encoding of the first file
is. file is a system command that uses a number of different
approaches to determine file type; on some systems, I think it even
makes use of metadata. Actually examining the data in the file is
time-consuming, and therefore a method of last resort, employed only
when some other context doesn't match. It also returns the first
match, not all matches.

Since the -i switch is processed prior to any data being written, it's
entirely possible that file's view of the file doesn't match the
actual encoding of the stream being written. Read some data into a
Perl script and see what Perl thinks it is. My guess is that the data
is actually utf-8, but file mistakenly assumes it's in the default
local encoding for some reason.
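
One way to check from inside Perl is to read the raw bytes and try a
strict UTF-8 decode. What follows is only a rough sketch: the helper
name guess_encoding is made up for illustration, and anything that
fails the decode is merely reported as "not UTF-8", since a single-byte
encoding such as ISO-8859-1 can't be positively identified this way.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: classify a byte string as ASCII, valid UTF-8,
# or neither. It cannot tell which 8-bit encoding a non-UTF-8 file uses.
sub guess_encoding {
    my ($bytes) = @_;

    # No byte above 0x7F means the data is plain ASCII
    return 'ASCII' unless $bytes =~ /[^\x00-\x7F]/;

    # decode() may modify its input when a CHECK value is given,
    # so work on a copy. FB_CROAK makes malformed input die.
    my $copy = $bytes;
    my $ok = eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 };

    return $ok ? 'UTF-8' : 'not UTF-8 (possibly ISO-8859-1)';
}

# Report on each file named on the command line
for my $path (@ARGV) {
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;                 # slurp the whole file
    my $bytes = <$fh>;
    close $fh;
    printf "%s: %s\n", $path, guess_encoding($bytes);
}
```

Saved under some name of your choosing and run against datei and
datei.neu, it should show whether the bytes on disk agree with what
file reports.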

At the command line, you can use the -C switch to avoid confusion.

Best,

-- jay
--------------------------------------------------
This email and attachment(s): [  ] blogable; [ x ] ask first; [  ]
private and confidential

daggerquill [at] gmail [dot] com
http://www.tuaw.com  http://www.downloadsquad.com  http://www.engatiki.org

values of β will give rise to dom!
