I'm working on it in my spare time; I attach my current prototype patch.
I have almost completed this; it's only about 400 lines of new code,
mostly in i18n/Sets.st. I have defined a new UnicodeString class, and
modified Character to have support for characters whose Unicode code
point is > 255. Also, for ease of testing and usage, I've defined a
syntax $<279> that allows you to refer to a Character by its Unicode
code point. It's equivalent to "279 asCharacter" -- I could instead have
inlined that expression at compile-time, but I prefer to also have a
more compact syntax.
The changes are mostly backwards compatible, but characters should *not*
be compared with ==, but with = unless you're sure the code point is <=
255. Similarly, they should *not* be printed with nextPut:, but with
display:, unless you're sure the code point is <= 127.
What follows are some use cases. They were run in a UTF-8 locale but
(subject to the capabilities of your system's iconv function) they work
just as well in every other locale.
I am no expert on the *needs* of people using Unicode, so can you
please confirm that it is (close to) what you need? In particular, I'd
like feedback on what to do when transcoding is not enabled, because
right now the behavior is inconsistent: see the notes preceded by ***.
Without the I18N package, the behavior is incomplete: you can store
Unicode characters, but not print them correctly:
Printing a Unicode character:
st> $<279> printNl!
$<16r0117>
Converting a Unicode character to String:
*** maybe should consider returning '?'
st> $<279> asString printNl!
error: Invalid argument <16r0117>: argument must be between $<0> and
$<16r00FF>
Converting a Unicode character to a UTF-32 String:
st> ($<279> asUnicodeString) printNl!
'<16r0117>'
Converting a UTF-32 String with a Unicode character to a byte-encoded
String:
*** maybe should give an error instead
st> $<279> asUnicodeString asString printNl!
'?'
Asking the resulting Strings for their number of characters:
st> $<279> asUnicodeString numberOfCharacters printNl!
1
st> $<279> asUnicodeString asString numberOfCharacters printNl!
error: should not be implemented in this class
Converting ByteArrays or Strings to UnicodeStrings:
st> #[196 151] asUnicodeString first printNl!
error: should not be implemented in this class
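The two options in the *** notes above (substitute '?' or signal an
error) are the same choice most transcoding APIs expose. As a rough
illustration only, written in Python rather than Smalltalk and unrelated
to the patch itself, the sketch below encodes code point 279 to Latin-1,
which can only hold code points up to 255. The helper name
to_byte_string is made up for this example.

```python
def to_byte_string(text, replace=False):
    """Encode text to Latin-1 (one byte per character, code points 0..255).

    Hypothetical helper for illustration: with replace=True, characters
    that do not fit become b'?'; otherwise UnicodeEncodeError is raised.
    """
    return text.encode("latin-1", errors="replace" if replace else "strict")


e_dot = chr(279)  # U+0117, the character written $<279> above

# Policy 1: lossy conversion, substitutes '?' for the character.
print(to_byte_string(e_dot, replace=True))

# Policy 2: strict conversion, refuses and reports an error.
try:
    to_byte_string(e_dot)
except UnicodeEncodeError as err:
    print("error:", err.reason)
```

Whichever policy is chosen, applying it to both asString cases would
remove the inconsistency.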
-----
After loading the I18N package, everything is much better:
Printing a Unicode character:
st> $<279> printNl!
$ė
Converting a Unicode character to String:
st> $<279> asString printNl!
'ė'
Converting a Unicode character to a UTF-32 String, and then back just by
printing it:
st> ($<279> asUnicodeString) printNl!
'ė'
Converting a UTF-32 String with a Unicode character to a byte-encoded
String:
st> $<279> asUnicodeString asString printNl!
'ė'
Asking the resulting Strings for their number of characters:
st> $<279> asUnicodeString numberOfCharacters printNl!
1
st> $<279> asUnicodeString asString numberOfCharacters printNl!
1
Converting ByteArrays or Strings to UnicodeStrings:
st> #[196 151] asUnicodeString first printNl!
$ė
st> #[196 151] asUnicodeString size printNl!
1
st> #[196 151] asUnicodeString numberOfCharacters printNl!
1
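For anyone checking the byte values used above: #[196 151] is the UTF-8
encoding of code point 279 (16r0117). A small Python sketch (not
Smalltalk, and independent of the patch) of the same round trip, showing
why the character count is 1 while the byte count is 2:

```python
raw = bytes([196, 151])          # same bytes as #[196 151]
text = raw.decode("utf-8")       # decode the UTF-8 byte sequence

code_point = ord(text[0])
print(code_point, hex(code_point))   # code point of the first character
print(len(raw), len(text))           # number of bytes vs. characters
print(text.encode("utf-8") == raw)   # re-encoding gives the same bytes
```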
Paolo