NTFS stores filenames in Microsoft's rather liberal interpretation
of UTF-16. [1] Windows Explorer makes that easy to test: just save
a file under a Chinese name and there you go.

[1] http://blogs.msdn.com/b/michkap/archive/2006/09/10/748699.aspx

How do you save a file whose name requires Unicode, Chinese or
otherwise, using ActivePerl?

Here's a script that tries to save a file as Катюша.txt (Katyusha).
It fails with each of four encodings, and with no encoding at all!
What do I have to do in order to just save my Катюша.txt?

          \,,,/
          (o o)
------oOOo-(_)-oOOo------
use 5.010;
use utf8;
use strict;
use warnings;
use Encode qw/encode/;

my $chars = 'Катюша'; # say length $chars;

my $count = 0;
for ( '', qw/UTF-16 UTF-16BE UTF-16LE UTF-8/ ) {
        say 'encoding: ', $_;
        my $n1 = $chars . '.' . ++$count . '.txt';
        my $n2 = $_ ? encode( $_, $n1 ) : $n1;
        if ( open my $fh, '>:encoding(UTF-16)', $n2 ) {
                print $fh $chars, "\n";
                close $fh;
        }
        else {
                warn "open $n2: $!";
        }
}

The output of this script in cmd.exe using CHCP 1252 is:

encoding:
encoding: UTF-16
open þÿBNH0 . 2 . t x t: Invalid argument at ntfs_uni_filename.pl line
19.
encoding: UTF-16BE
open BNH0 . 3 . t x t: Invalid argument at ntfs_uni_filename.pl line
19.
encoding: UTF-16LE
open BNH. 4 . t x t : Invalid argument at ntfs_uni_filename.pl line
19.
encoding: UTF-8

The filenames it manages to save are disfigured:

  07.01.2012  20:44                17 Катюша.1.txt
  07.01.2012  20:44                17 Катюша.5.txt
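
To see where names like "þÿBNH0 . 2 . t x t" in the error messages come
from, it helps to look at the octets that encode() hands to open(). The
following is only a sketch for inspection; the helper name hex_octets is
made up for this illustration and is not part of the script above.

```perl
use utf8;
use strict;
use warnings;
use Encode qw/encode/;

# Hypothetical helper: return the octets of $name in $encoding
# as space-separated, two-digit hex values.
sub hex_octets {
    my ( $encoding, $name ) = @_;
    return join ' ', map { sprintf '%02x', ord } split //, encode( $encoding, $name );
}

my $name = 'Катюша.2.txt';

# Dump the same filename in each encoding the script tried.
for my $encoding (qw/UTF-16 UTF-16BE UTF-16LE UTF-8/) {
    printf "%-8s %s\n", $encoding, hex_octets( $encoding, $name );
}
```

The UTF-16 line starts with fe ff, the byte order mark, which CP1252
renders as "þÿ". The octets 42 4e 48 30 are the low bytes of т, ю, ш, а
and show up as "BNH0". Every ASCII character is preceded or followed by
a 00 octet, which is presumably what open() chokes on when the name is
passed along as a plain byte string.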

Tested versions, outcome always as described:

*  v5.10.1  built for MSWin32-x86-multi-thread
* (v5.12.3) built for MSWin32-x64-multi-thread
* (v5.12.4) built for MSWin32-x86-multi-thread
* (v5.14.0) built for MSWin32-x64-multi-thread
* (v5.14.1) built for MSWin32-x86-multi-thread

Note that DIR in cmd.exe otherwise has no trouble displaying Russian
or Greek filenames; there are problems only with Chinese, Arabic and
other such scripts, and only because the font I'm using doesn't
support them.

Cygwin perl 5.10.1, by the way, displayed no errors and got the
unencoded and the UTF-8 names right:

  07.01.2012  20:51                16 Катюша.1.txt
  07.01.2012  20:51                16 0BNH0
  07.01.2012  20:51                16 0BNH0
  07.01.2012  20:51                16 0BNH0.
  07.01.2012  20:51                16 Катюша.5.txt

You can feed either a character string or a UTF-8 octet string to this
Cygwin perl.exe open() and it creates the proper filename, proving that
it's not technically impossible. :)
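
To make the two kinds of input concrete, here is a minimal sketch of
what "character string" and "UTF-8 octet string" mean for this name.
It only builds and measures the two strings; it does not call open(),
and the variable names are chosen for this illustration.

```perl
use utf8;
use strict;
use warnings;
use Encode qw/encode/;

# Character string: one element per character, thanks to "use utf8".
my $chars = 'Катюша.txt';

# Octet string: the same name as UTF-8 bytes, two per Cyrillic letter.
my $octets = encode( 'UTF-8', $chars );

printf "character string: %d characters\n", length $chars;
printf "UTF-8 octets:     %d octets\n",     length $octets;
```

The two strings have different lengths (10 characters versus 16 octets),
yet the Cygwin perl.exe maps both to the same filename.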

-- 
Michael Ludwig