begin quoting Christopher Smith as of Tue, Oct 25, 2005 at 05:27:38PM -0700:
> Stewart Stremler wrote:
[snip]
> > I don't think the problem is ASCII -- that's the sort of simple mapping
> > that's capable of being well-defined and standardized.
>
> You're right that the problem isn't ASCII. The problem is that there
> isn't really a canonical spelling of Tchaikovsky in ASCII, and
> standardizing on one is harder and more problematic than simply using
> the Cyrillic representation, particularly when you need to standardize
> on all the Tchaikovskys out there.
Even within ASCII, there's more than one way to spell Shakespeare. That
problem isn't really resolved by choosing a glyph-set. And if you have
glyphs that look similar in some font, the problem comes back... and so
allowing all glyphs wasn't really a solution anyway.

> > i18n and l10n are examples of that simple mapping *within* a language.
> > (It's not like "internationalization" is hard to spell or anything. Or
> > type, if you aren't hunting-and-pecking your way around the keyboard.)
>
> I'm not a hunt-and-pecker, and I make enough typeohs without spelling
> out internationalization all the time.

Presumably you have software that can help. :)

> More importantly, I can glance at
> i18n and recognize what it represents much more quickly than
> internationalization.

If you can do that, then remapping into ASCII should be a simple thing.

[snip]

> > Including the use of it. Is that microsoft.com with an oh, or some
> > other glyph that *looks* exactly like an oh?
> >
> > Take the confusion we have with fonts where 1 and l look the same --
> > that's one of the major issues of Unicode writ small.
>
> Yes, but those problems don't go away in a world with multiple
> character sets.

Indeed. But they do go away if there's a default representation in a
non-ambiguous character set.

[snip]

> > I think the problem is that unicode tried to solve the wrong problem.
> > Or perhaps, people looked for it as the solution to the wrong problem.

Hm...

> > The real problem wasn't "how do we let everyone have single-character
> > glyphs", but "how do we let people write in their own language on a
> > computer".
>
> Once you deal with the problem for a while, you discover that having a
> way to represent glyphs as distinct entities (which is what a character
> really is) is very much a needed capability in software, and not really
> separable from the problem of letting people write in their own
> language on a computer.
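As an aside, the 1-versus-l and o-versus-oh worry above is easy to make
concrete. A quick Python sketch (the spoofed domain string here is a
made-up example for illustration, not a real registered name):

```python
import unicodedata

# Two strings that render identically in many fonts but are
# different code point sequences.
latin = "microsoft.com"           # ordinary ASCII
spoofed = "micr\u043esoft.com"    # U+043E CYRILLIC SMALL LETTER O

# They look the same when printed...
print(latin, spoofed)

# ...but compare unequal, which is exactly the ambiguity problem.
print(latin == spoofed)           # False

# The Unicode character names make the difference visible.
print(unicodedata.name(latin[4]))    # LATIN SMALL LETTER O
print(unicodedata.name(spoofed[4]))  # CYRILLIC SMALL LETTER O
```

In a pure-ASCII default representation the second string simply can't be
written, which is the point about a non-ambiguous character set.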
I assert that (English) words can be considered glyphs (think cursive),
and therefore deserve the same sort of treatment.

> > Since we're ready to accept bloat at the outset, a better
> > approach (to my way of thinking) would be to toss out ANSI, stick
> > with ASCII, and redefine those ANSI characters as indicators for
> > variable-length strings that should constitute a glyph.
>
> That is pretty similar to what UTF-8 is. The problem is that that isn't
> the entire problem.

UTF-8 is *almost* what I want. :)

> > Old software still works -- and given the correct display smarts
> > (e.g. rewrite printf), works transparently.
>
> /me falls out of chair
>
> No, it breaks the first time it makes the assumption that a character
> or glyph is exactly one byte, or at the very least fixed width (sadly
> there is almost as much software that thinks fixed-width 16-bit
> characters are all you need as there is software that thinks
> fixed-width 8-bit characters are all you need). Think of how many C
> programs you've seen that look for a specific byte in a string
> somewhere, without considering the possibility that it might be part
> of a multiple-byte character. Indeed, a lot of old lexers suffer from
> this problem.

If I'm on an old system, I *want* that. For me, that's a FEATURE, not a
problem.

> > You could look at the raw data if you wanted, in an unambiguous
> > format that would still be readable if it were only a character here
> > or there. Everyone wins, except for those who will need five-or-more
> > character strings to represent a glyph.
>
> Yup. It turns out that basically most Asian countries hate UTF-8
> because it makes their characters bigger than local character sets.

And when we go to UCS-16 or UCS-32, we'll all hate *that*. Plus, they're
still dealing with simplified character sets, so what we obviously need
is UCS-64, right? (I haven't gone and looked up how big Unicode actually
gets...)
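For what it's worth, Unicode's code space tops out at U+10FFFF (about
1.1 million code points, 21 bits). And both complaints above -- the
one-byte-per-character assumption breaking, and CJK text growing under
UTF-8 -- can be shown in a few lines of Python:

```python
# A character outside ASCII occupies more than one byte in UTF-8.
s = "naïve"
raw = s.encode("utf-8")
print(len(s), len(raw))    # 5 characters, 6 bytes

# Slicing the byte string mid-character corrupts the text -- the same
# breakage byte-at-a-time C code hits when it treats bytes as characters.
broken = raw[:3]           # cuts the two-byte 'ï' in half
print(broken.decode("utf-8", errors="replace"))   # 'na' + replacement char

# The size complaint: a character that is 2 bytes in a local encoding
# like Shift-JIS becomes 3 bytes in UTF-8.
ch = "日"
print(len(ch.encode("shift_jis")))   # 2
print(len(ch.encode("utf-8")))       # 3
```

So for East Asian text UTF-8 really is a ~50% size penalty over the
local encodings, while ASCII stays exactly one byte per character.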
Presumably, we could stop when everyone on the planet gets their own
n-bit character space. That would be fair.

> > (But as those languages use a glyph-per-word, more or less, this
> > shouldn't be a problem -- nobody was demanding that a sizable subset
> > of the english dictionary be mapped into unicode space. Fair's fair.)
>
> Actually, not all of those languages use glyph-per-word, and the issue
> is that there is a more compact and efficient representation.

...so we can avoid bloat in our XML documents...

> People tend to feel slighted when they are forced into such things
> while you don't see much of a negative impact.

And they're surprised when I feel I'm being forced into such things
because they don't acknowledge a negative impact on me?

[snip]

> Unicode has evolved significantly from the early days, and as such it
> has done a reasonable job of addressing needs that have emerged since
> its inception. I find its problems are mostly of the "design by
> committee" nature and the tendency to see it as the solution to all
> things, and that's hard to avoid with standards.

True, true. Design by committee tends to aim at making everyone equally
unhappy.

-Stewart

--
[email protected]
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-lpsg
