At 12:00 am +0900 27/12/05, Joel Rees wrote:
> I'll have to tell you a war story or two, sometime.
> Unicode is a kludge. It's one of the better kludges, and evidence
> that kludges make the world go round...
Just as well it doesn't rely on iso-2022-jp or us-ascii. It's not
Unicode that is the kludge; Unicode is simply the assignment of a
unique character to each number in a large range, rather than the
assignment of an arbitrary number of characters to whatever range any
American president can conceive of. The present, temporary problems
with Unicode arise only from a long, anarchic heritage of monumental
kludges.
> ...The frustrating thing about this is that I've been here before,
> about three years back when the perl implementation wasn't quite as
> complete, but I can't remember what I did, and I don't have access
> to the code I built then anymore.
I have the same problem again and again, at intervals of a mere hour!
The script below reduces the problem to its simplest form. Notice the
deadly caveats. In my experience (and I have war stories too), the
harder one tries with Perl/Unicode, the worse the mess you get into.
You can probably forget about locale (try “use encoding (":locale")”
in the script below and see what you get!) and about lots of other
things. It's certainly a jungle, and it's growing, but it's getting
tidier.
#!/usr/bin/perl
#
# In BBEdit/TextWrangler set this document's
# encoding to Japanese (Shift JIS); always open/reopen
# as Japanese (Shift JIS).
#
# In BBEdit/TextWrangler Preferences/Unix Scripting
# check “use UTF-8” for Unix Script I/O.
#
# When running in Terminal set Window Settings...
# [Display] [Character Set Encoding] to “Unicode (UTF-8)”.
#
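### use utf8;               # NO !! This file is Shift_JIS, not UTF-8
# no encoding;              # OK, optional
# binmode STDOUT, "UTF-8";  # OK, optional: not a valid layer, so STDOUT is left as is
### binmode STDOUT, ":utf8"; ### NO !! Quite different !! $_ already holds
###                          ### UTF-8 bytes, and a :utf8 layer would encode them twice
use Encode qw~from_to~;
while (<DATA>) {
    /^#/ and next;                      # skip comment lines in the data
    from_to ($_, "Shift_JIS", "utf8");  # convert in place: Shift_JIS bytes -> UTF-8 bytes
    print;                              # print the UTF-8 bytes unchanged
}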
__DATA__
# Must not contain non-Shift_JIS characters
空欄を埋めたり、完全な文書で質問に答えたり、
一番適切に思う解答を〇で記したりする。
##################################################
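For comparison, here is a minimal sketch of the other usual approach: decode to Perl characters on input and let an `:encoding(UTF-8)` layer encode once on output. It assumes only the core `Encode` module. The two sample characters (空欄, U+7A7A U+6B04) and the in-memory filehandle are hypothetical stand-ins for the DATA lines and STDOUT, used so the two results can be compared. The last step simulates with `utf8::encode` what the forbidden `:utf8` layer does to bytes that are already UTF-8.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode from_to);

# Hypothetical sample input: 空欄 (U+7A7A U+6B04) plus a newline, encoded
# as Shift_JIS octets, standing in for a line read from DATA.
my $sjis = encode('Shift_JIS', "\x{7A7A}\x{6B04}\n");

# 1. Byte approach, as in the script above: convert the octets in place
#    and print them to a handle that has no encoding layer.
my $bytes = $sjis;
from_to($bytes, 'Shift_JIS', 'UTF-8');

# 2. Character approach: decode at input, then let an :encoding(UTF-8)
#    layer encode exactly once at output (an in-memory handle here).
my $text = decode('Shift_JIS', $sjis);
open my $out, '>:encoding(UTF-8)', \my $buffer or die "open: $!";
print {$out} $text;
close $out;

# Both approaches should yield the same UTF-8 octets.
print $buffer eq $bytes ? "same\n" : "different\n";

# The mistake flagged in the script: UTF-8 octets written through a :utf8
# layer are treated as Latin-1 characters and encoded a second time.
# utf8::encode performs the same upgrade, so it simulates that layer.
my $double = $bytes;
utf8::encode($double);
print length($double) > length($bytes) ? "double-encoded\n" : "unchanged\n";
```

Running it should print `same` and then `double-encoded`: the byte approach and the character approach agree, while adding a `:utf8` layer on top of `from_to` output garbles every non-ASCII byte.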