The script below reduces the problem to its simplest form. Note the
deadly caveats. In my experience (and I have war stories too), the
harder you try with Perl/Unicode, the worse the mess you get into.
You can probably forget about locale -- try “use encoding (":locale")”
in the script below and see what you get! -- and lots of other things.
It's certainly a jungle, and it's growing, but it's getting tidier.
#!/usr/bin/perl
#
# In BBEdit/TextWrangler set this document's
# encoding to Japanese (Shift JIS); always open/reopen
# as Japanese (Shift JIS).
#
# In BBEdit/TextWrangler Preferences/Unix Scripting
# check “use UTF-8” for Unix Script I/O.
#
# When running in Terminal set Window Settings...
# [Display] [Character Set Encoding] to “Unicode (UTF-8)”.
#
### use utf8; # NO !!
# no encoding; # OK, optional
# binmode STDOUT, "UTF-8"; # OK, optional
### binmode STDOUT, ":utf8"; ### NO !! Quite different !!
use Encode qw~from_to~;
while (<DATA>) { /^#/ and next;
from_to ($_, "Shift_JIS", "utf8");
print
}
__DATA__
# Must not contain non-Shift_JIS characters
空欄を埋めたり、完全な文書で質問に答えたり、
一番適切に思う解答を〇で記したりする。
##################################################
That's a nice little script to have on the list, for reference.
Now, as far as my little problem goes, I was able to get some success
with the following:
-----------------snippet------------------
use encoding( 'Shift_JIS' );
...
my $query = CGI->new;
...
my $fileToSend = $query->param( 'file-to-send' );
my $FileSent = $query->param( 'FileSent' );
...
elsif ( $FileSent )
{
my $fh;
if ( !defined( $fileToSend ) || length( $fileToSend ) < 1
|| !( $fh = $query->upload( 'file-to-send' ) ) )
{ print $query->header(-status=>$error),
$query->start_html( 'Bad request' ),
$query->h2( 'Failed to find or open file, maybe bad file name selected.' ),
$query->strong( "Upload request for $fileToSend not processed." );
exit 0;
}
my $type = $query->uploadInfo( $fileToSend )->{ 'Content-Type' };
if ( $type ne 'text/plain' )
{ print $query->header(-status=>$error),
$query->start_html( 'Bad file type' ),
$query->h2( 'File type must be plain text.' ),
$query->strong( 'Request not processed.' );
exit 0;
}
# One line at a time is STILL not safe if length not already checked.
# Doing this one line at a time to handle the shift JIS problem, somehow.
my @fileLines = ();
my $line = '';
# binmode( $fh, ":raw :encoding(Shift_JIS)" );
binmode( $fh, ":raw :utf8" ); # As best as I understand, this should be wrong.
# binmode( $fh, ":raw" );
while ( $line = <$fh> )
{
my @hexdump = unpack( 'C256', $line ); # debug
my $hexdumpstring = ''; # debug
foreach my $byte ( @hexdump ) # debug
{ $hexdumpstring .= sprintf( '%02x ', $byte ); # debug YUCK!
} # debug
push( @fileLines, $line );
push( @fileLines, $hexdumpstring . "\n" ); # debug
}
@words = @fileLines;
...
---------------end-snippet----------------
That ":raw :utf8" gets anywhere at all is in spite of the headers, the
XML declaration, and the HTML header meta declaration all declaring the
document to be Shift_JIS, and the source itself declaring
"use encoding( 'Shift_JIS' );". I suspect I muffed something when I
compiled perl, but I'll need to push the whole thing onto my Linux/BSD
box, bring up Apache over there, and compare notes to have a decent idea
of what's going on.
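One way to find out what the browser is actually sending, independent of
how perl was built, is to look at the raw bytes and see which encoding
they decode cleanly in. A rough sketch only: first_clean_decoding is a
name made up for illustration, and the sample byte strings are the
Shift_JIS and UTF-8 spellings of U+65E5 U+672C U+8A9E, not anything taken
from the real upload.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of CGI.pm or Encode): return the name of
# the first encoding in which the raw bytes decode without error, or
# nothing if neither fits.
sub first_clean_decoding {
    my ($bytes) = @_;
    for my $name ( 'UTF-8', 'Shift_JIS' ) {
        my $copy = $bytes;    # decode() with a CHECK value may modify its argument
        my $ok = eval { decode( $name, $copy, Encode::FB_CROAK() ); 1 };
        return $name if $ok;
    }
    return;
}

# Shift_JIS bytes for U+65E5 U+672C U+8A9E
print first_clean_decoding("\x93\xFA\x96\x7B\x8C\xEA"), "\n";
# UTF-8 bytes for the same three characters
print first_clean_decoding("\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E"), "\n";
```

UTF-8 is tried first because some UTF-8 byte sequences also happen to be
valid Shift_JIS. Pure ASCII passes both checks, so a result of UTF-8 for
an ASCII-only line tells you nothing.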
In the meantime, Firefox on Linux is no longer uploading the file at
all.
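Going back to the binmode lines in the snippet: another way to take the
I/O layers out of the picture is to read raw bytes and decode each line
by hand. A minimal sketch, assuming the file really is Shift_JIS;
read_sjis_lines is a made-up name, and the in-memory filehandle is only a
stand-in for the upload handle. I have not run this against a CGI upload.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: read lines as raw bytes and decode each one as
# Shift_JIS. FB_CROAK makes malformed input die instead of being
# silently replaced.
sub read_sjis_lines {
    my ($fh) = @_;
    binmode( $fh, ':raw' );
    my @lines;
    while ( defined( my $line = <$fh> ) ) {
        push @lines, decode( 'Shift_JIS', $line, Encode::FB_CROAK() );
    }
    return @lines;
}

# Stand-in for the upload handle: Shift_JIS bytes for
# U+65E5 U+672C U+8A9E followed by a newline.
my $data = "\x93\xFA\x96\x7B\x8C\xEA\n";
open( my $fh, '<', \$data ) or die "open: $!";
my @lines = read_sjis_lines($fh);
printf "%d line(s), first has %d characters\n", scalar @lines, length $lines[0];
```

Splitting on newline before decoding should be safe here, since 0x0A never
appears as the second byte of a Shift_JIS double-byte character.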
Joel