On Thu, Aug 03, 2006 at 03:40:29PM +1000, George W Gerrity wrote:
> Please. Let's not have yet another *NIX font encoding and presenting
> scheme! Why don't you set up a team to rationalise the existing
> encodings and presentation methods.
This is the sort of mentality that sickens me. "Please oh please don't make something good because there's so much crap out there that you should fix instead!" This is the sort of mentality that led to abominations like BIND and Sendmail surviving as long as they did, OpenSSH (in all its glory of vulnerabilities) being forked from the old SSH code instead of rewritten from scratch, etc.

> The biggest headache in *NIX
> (with the exception of Mac OS X's underlying version) is the
> haphazard way that handling of non-ASCII characters and the I18n has
> developed. It is especially grotty at the system level, and as you

The system level has nothing to do with fonts... Until you get to fonts and rendering, m17n and i18n are extremely trivial.

> commented below, one of the reasons is that (English-only speaking)
> *NIX systems people think that handling of non-ASCII charsets should
> somehow be trivial and not bulky in code.

I'm not English-only speaking, yet I'm quite confident that it should be trivial and not bulky in code, and that applications should not even have to think about it. The difference between your approach (and the approach of the people who wrote most of the existing applications with extensive script support) and mine is exactly the same as the difference between the early efforts at converting to Unicode (especially by MS) and UTF-8.

The MS/Unicode approach was to pull the rug out from under everyone and force them to drop C, drop UNIX, drop all existing internet protocols, and store text as 16-bit characters in UCS-2.

The UTF-8 approach, on the other hand, recognizes that most of the time when programs are dealing with text they don't care about the encoding or meaning of the text at all. At most they care about a few codepoints in the first 128 positions that have special meaning to the software, and since UTF-8 never uses a byte below 0x80 inside a multibyte sequence, those bytes keep their meaning. Thus Unicode can be supported _without_ any special effort from developers.
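As a minimal sketch of that point (not code from this thread; the helper name split_colon is hypothetical), here is C that splits a colon-separated line, /etc/passwd style, without ever decoding UTF-8. It handles UTF-8 input unchanged because every byte of a multibyte UTF-8 sequence is 0x80 or above, so a 0x3A byte can only be a real ':'. The same byte-oriented code would break on UCS-2 text, where ASCII characters contain NUL bytes and other characters can contain a 0x3A byte.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split a NUL-terminated line in place at every ':'
 * byte. Pointers to the first `max` fields are stored in out[]; the return
 * value is the total number of fields, which may exceed `max`.
 *
 * No UTF-8 decoding happens here. It is still correct for UTF-8 input,
 * because lead and continuation bytes of multibyte sequences are all
 * >= 0x80, so the byte 0x3A can only ever be the ASCII ':' itself. */
size_t split_colon(char *line, char **out, size_t max)
{
    size_t n = 0;
    char *field = line;

    for (;;) {
        char *sep = strchr(field, ':');
        if (n < max)
            out[n] = field;
        n++;
        if (sep == NULL)
            break;          /* last field runs to the terminating NUL */
        *sep = '\0';        /* terminate the current field */
        field = sep + 1;
    }
    return n;
}
```

For example, splitting the UTF-8 line "résumé:日本語:x" yields three fields, "résumé", "日本語" and "x", with every multibyte character passed through byte-for-byte.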
The obvious exception to this comes when it's time to display the text on a visual device for the user. :) Terminals, if they work correctly with the necessary scripts, provide a very clean solution to the problem because the application doesn't have to think about the presentation of the text. Historically it meant the application could just assume 1 byte == 1 character position for non-control characters. Now the same thing requires mbtowc() to decode each character and wcwidth() to find how many columns it occupies (0 for combining marks, 2 for most CJK ideographs), but that's not any huge burden. Surely it's a lot less of a burden than doing the text rendering yourself.

But what about applications that _do_ want/need to do the text rendering themselves? This must include at least the terminal emulator :) and also imaging programs, presentation apps, visually-oriented web browsers, ... As long as the program does not need to do its _own_ text display, it may be able to rely on a widget set, which basically gives all the same advantages as using a terminal with regard to implementation simplicity. (However, now we need to add widget sets to the list of things that need to do text rendering.) This whole line of questioning raises a lot more questions than it answers, and I could easily get sidetracked if I continue...

> I am no longer up-to-date with kernel and system details in *NIX, and
> am not a developer (perhaps an interested bystander is where I fit
> in), but I used to do a lot of coding in that area, so I know how
> difficult it can be. My view is that what is needed is a modular (and

Why modular? "Modular" is the magic panacea word among people writing this bloatware, and all it does is massively increase memory requirements and complexity.

> unified) way of slotting in support for handling various alphabets
> and languages,

The view that supporting a new alphabet or language requires a new module is fundamentally wrong. All it should require is proper information in the font.

> based on Unicode categories, that can be easily set up
> at system build time.
So at build time you either choose "bloatware with m17n" or "legacy ascii/latin1 crap"? Sounds like the current problem we're stuck with. The bloatware distros will mostly choose the former, and the ones targeting more advanced users who dislike bloat will choose the latter, perpetuating the problem that competent developers despise m17n and i18n and therefore do not include support in their programs.

> Moreover, *NIX is greatly in need of a way of
> unifying all the various ways for formatting and representing
> characters at all levels, using system-level code.

Huh? What does this even mean? Are you confusing glyphs with characters? Representing characters is trivial.

> This may even imply
> some minor tweaking of the POSIX standard.
> .....
> I know that a real-life problem (with a deadline?) has got you

No deadline except being tired of having a legacy system.

> energised to tackle this can of worms, but a quick fix or
> re-invention of the wheel is just not the way to go.

Someone once said: "when the wheel is square you need to reinvent it".

> Someone with energy
> and know-how has got to get a team together and fix what is broken in
> the guts of *NIX so that it presents a good, clean interface for I18n
> and multiple character set representation.

Absolutely not. This is the bloatware doctrine: that new interfaces and libs are a panacea, that they're best designed by teams and committees, etc. What's needed is _simplicity_. When you have simplicity, everything else follows.

There is a possibility here to solve a simple, almost-trivial unsolved problem. What you propose is abandoning the simple problem and trying to solve much more difficult problems instead, many of which will not be solved anytime in the near future, due as much to personal and political reasons as to technical ones. Moreover, even if the more difficult problem is solved, the solution will not be useful to anyone except the ./configure --enable-bloat crowd (myself included).
Why would I want to abandon a real, solvable problem in order to attempt to solve a problem that's uninteresting to me?

Rich

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
