I don't know why this mail came to me, but since it did, let me reply.
>>>>> "MM" == Motomichi Matsuzaki <[EMAIL PROTECTED]> writes:
MM> * filenames recorded on Unix filesystems (e.g. FFS, MFS) use
MM> an arbitrary codeset, for example Unicode.
Rather, let's record "codepage + codeset" information, so that we can
distinguish the Chinese "BONE" character from the Japanese "BONE"
(the glyphs differ even though Unicode unifies the code point).
WE NEED THEM TO BE DIFFERENT, YOU KNOW.
For example, store filenames using 64 bits per character: a 32-bit
codepage identifier, plus the code point itself in UCS-4.
# We have enough disk space and memory to handle them, don't worry.
# And even if we don't now, we will within 2 years.
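The 64-bit character described above might be sketched as a simple C
struct; the type and field names here are illustrative only, not from
any actual filesystem:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 64-bit filename character: a 32-bit codepage tag
 * plus a 32-bit UCS-4 code point, as proposed above. */
struct fn_char {
    uint32_t codepage;  /* identifies the source coded character set */
    uint32_t codepoint; /* UCS-4 value */
};

/* Two characters match only if both fields agree, so the Chinese
 * and Japanese variants of a unified glyph stay distinct. */
static bool fn_char_eq(struct fn_char a, struct fn_char b)
{
    return a.codepage == b.codepage && a.codepoint == b.codepoint;
}
```

With this layout, U+9AA8 ("bone") tagged with a Chinese codepage
compares unequal to the same code point tagged with a Japanese one.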
Normalize the codepage field for those characters that are not
affected by codepage, so that comparison becomes easier.
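A minimal sketch of that normalization, assuming (purely for
illustration) that plain ASCII is the codepage-independent range and
that codepage 0 is the canonical tag for it:

```c
#include <stdint.h>

struct fn_char {
    uint32_t codepage;  /* coded character set tag */
    uint32_t codepoint; /* UCS-4 value */
};

/* Hypothetical normalization: code points that render the same in
 * every codepage (here, ASCII) get the canonical codepage 0, so a
 * plain field-by-field comparison of normalized characters works. */
static struct fn_char fn_char_normalize(struct fn_char c)
{
    if (c.codepoint < 0x80)  /* ASCII is codepage-independent */
        c.codepage = 0;
    return c;
}
```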
Many might say it's ridiculous to have a filename encoding different
from the system-call interface's coding system. But that is only
because BUGGY UNICODE is the current trend.
If we had a codeset that did not need a codepage, the problem would
not occur. And the very reason we ended up with this BUGGY UNICODE
is that they skimped on bits. We should not make the same mistake.
So, there are only two options.
1) Use Unicode for the interface, and use enough bits per
character internally... say, 256 bits/character.
2) Create a truly unified coding system, one that lets us describe
not only the currently used languages, but also extinct scripts
such as cuneiform, and use it both internally and at the
interface.
( This also requires a much larger coding space than we have now;
I think we need 256 bits anyway. )
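Option 1 could be sketched like this; the names are hypothetical, and
the point is only that a fixed 256-bit internal unit is trivial to
define and to fill from a UCS-4 interface value:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 256-bit internal character, per option 1:
 * Unicode at the interface, a much wider fixed-size unit
 * internally so we never run out of coding space again. */
struct wide_char {
    uint64_t word[4];  /* 4 x 64 = 256 bits */
};

/* Widen a UCS-4 interface code point into the internal form:
 * the code point goes in the low word, the rest is zeroed. */
static struct wide_char wide_from_ucs4(uint32_t cp)
{
    struct wide_char w;
    memset(&w, 0, sizeof w);
    w.word[0] = cp;
    return w;
}
```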
Kenichi Okuyama@Tokyo Research Lab, IBM-Japan, Co.
To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message