The script below reduces the problem to its simplest form. Notice the deadly caveats. In my experience (and I have war stories too), the harder you try with Perl and Unicode, the worse the mess you get into. You can probably forget about locale (try “use encoding (":locale")” in the script below and see what you get!) and lots of other things. It's certainly a jungle, and it's growing, but it's getting tidier.

#!/usr/bin/perl
#
#  In BBEdit/TextWrangler set this document's
#  encoding to Japanese (Shift JIS); always open/reopen
#  as Japanese (Shift JIS).
#
#  In BBEdit/TextWrangler Preferences/Unix Scripting
#  check “use UTF-8” for Unix Script I/O.
#
#  When running in Terminal set Window Settings...
# [Display] [Character Set Encoding] to “Unicode (UTF-8)”.
#
### use utf8; # NO !!
# no encoding; # OK, optional
# binmode STDOUT, "UTF-8"; # OK, optional
### binmode STDOUT, ":utf8"; ### NO !! Quite different !!
use Encode qw~from_to~;
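while (<DATA>) {
        next if /^#/;                       # skip comment lines in the data
        from_to ($_, "Shift_JIS", "utf8");  # convert in place; result is UTF-8 bytes
        print;                              # no output layer, bytes go out as-is
}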
__DATA__
# Must not contain non-Shift_JIS characters
空欄を埋めたり、完全な文書で質問に答えたり、
一番適切に思う解答を〇で記したりする。
##################################################


That's a nice little script to have on the list, for reference.
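
For anyone wondering why the ":utf8" layer on STDOUT is marked NO in that script: as best I understand it, from_to leaves UTF-8 bytes (not characters) in the string, and a ":utf8" output layer then encodes each of those bytes a second time as if it were a Latin-1 character. Here is a minimal sketch of the effect, assuming in-memory filehandles as stand-ins for STDOUT; the variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(from_to encode);

# Build the Shift_JIS bytes from code points (the first two characters of
# the sample data), so this file itself can stay pure ASCII.
my $sjis = encode('Shift_JIS', "\x{7A7A}\x{6B04}");

# from_to converts in place; afterwards $bytes holds UTF-8 octets.
my $bytes = $sjis;
from_to($bytes, 'Shift_JIS', 'utf8');

# Print the same string to two in-memory handles (stand-ins for STDOUT):
# one with no layer, one with the :utf8 layer.
my ($plain, $layered) = ('', '');
open my $out_plain, '>',      \$plain   or die "open: $!";
open my $out_utf8,  '>:utf8', \$layered or die "open: $!";
print {$out_plain} $bytes;
print {$out_utf8}  $bytes;
close $out_plain;
close $out_utf8;

printf "Shift_JIS input: %d bytes\n", length $sjis;
printf "no layer:        %d bytes\n", length $plain;
printf ":utf8 layer:     %d bytes\n", length $layered;
```

If I have this right, it reports 4, 6, and 12 bytes. The last figure is every UTF-8 octet encoded a second time, which is the garbage you would see in the terminal.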

Now, as far as my little problem goes, I was able to get some success with the following:

-----------------snippet------------------
use encoding( 'Shift_JIS' );
...
my $query = CGI->new;
...
my $fileToSend = $query->param( 'file-to-send' );
my $FileSent = $query->param( 'FileSent' );
...
elsif ( $FileSent )
{
    my $fh;
    if ( !defined( $fileToSend ) || length( $fileToSend ) < 1 || !( $fh = $query->upload( 'file-to-send' ) ) )
    {   print $query->header(-status=>$error),
                $query->start_html( 'Bad request' ),
                $query->h2( 'Failed to find or open file, maybe bad file name selected.' ),
                $query->strong( "Upload request for $fileToSend not processed." );
        exit 0;
    }
    my $type = $query->uploadInfo( $fileToSend )->{ 'Content-Type' };
    if ( $type ne 'text/plain' )
    {   print $query->header(-status=>$error),
                $query->start_html( 'Bad file type' ),
                $query->h2( 'File type must be plain text.' ),
                $query->strong( 'Request not processed.' );
        exit 0;
    }

    # One line at a time is STILL not safe if length not already checked.
    # Doing this one line at a time to handle the shift JIS problem, somehow.
    my @fileLines = ();
    my $line = '';
    # binmode( $fh, ":raw :encoding(Shift_JIS)" );
    binmode( $fh, ":raw :utf8" ); # As best as I understand, this should be wrong.
    # binmode( $fh, ":raw" );
    while ( $line = <$fh> )
    {
        my @hexdump = unpack( 'C256', $line );  # debug
        my $hexdumpstring = ''; # debug
        foreach my $byte ( @hexdump )   # debug
        {       $hexdumpstring .= sprintf( '%02x ', $byte );    # debug YUCK!
        }       # debug
        push( @fileLines, $line );
        push( @fileLines, $hexdumpstring . "\n" );    # debug
    }

    @words = @fileLines;
...
---------------end-snippet----------------

The ":raw :utf8" binmode is the one that works, in spite of the HTTP headers, the XML declaration, and the HTML meta declaration all declaring the document to be Shift_JIS, and the source itself declaring "use encoding( 'Shift_JIS' );". I suspect I muffed something when I compiled perl, but I'll need to push the whole thing onto my Linux/BSD box, bring up Apache over there, and compare notes to have a decent idea of what's going on.
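
One way I can think of to narrow this down without trusting any of the declarations is to read the upload as raw bytes and ask Encode which encoding they strictly decode as. Below is a rough sketch of that idea. The helper names guess_upload_encoding and hexdump are hypothetical (they are not part of CGI.pm or Encode), and the two test strings stand in for a line read from the upload handle after binmode( $fh, ":raw" ).

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK LEAVE_SRC);

# Hypothetical helper: return the first candidate encoding under which the
# raw bytes decode without error, or undef if none of them fits.
sub guess_upload_encoding {
    my ($bytes, @candidates) = @_;
    for my $name (@candidates) {
        # FB_CROAK makes decode die on malformed input;
        # LEAVE_SRC keeps $bytes intact for the next candidate.
        my $ok = eval { decode($name, $bytes, FB_CROAK | LEAVE_SRC); 1 };
        return $name if $ok;
    }
    return;
}

# Hypothetical helper: hex dump of every byte, without the explicit loop.
sub hexdump {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', $_ } unpack 'C*', $bytes;
}

# Stand-ins for lines read raw from the upload handle.
my $as_utf8 = encode('UTF-8',     "\x{7A7A}\x{6B04}");
my $as_sjis = encode('Shift_JIS', "\x{7A7A}\x{6B04}");

for my $line ($as_utf8, $as_sjis) {
    my $enc = guess_upload_encoding($line, 'UTF-8', 'Shift_JIS');
    printf "%-20s %s\n", hexdump($line), defined $enc ? $enc : 'unknown';
}
```

This is only a heuristic: pure ASCII decodes cleanly under either encoding, and some byte sequences are valid in both, so UTF-8 is tried first because its strict decoder rejects most Shift_JIS text. If the browser turns out to be sending UTF-8 regardless of what the page declares, that would explain why the ":utf8" layer is the one that works.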

In the meantime, Firefox on Linux is no longer uploading the file at all.

Joel
