On Thu, Aug 03, 2006 at 03:40:29PM +1000, George W Gerrity wrote:
> Please. Let's not have yet another *NIX font encoding and presenting  
> scheme! Why don't you set up a team to rationalise the existing  
> encodings and presentation methods.

This is the sort of mentality that sickens me. "Please oh please don't
make something good because there's so much crap out there that you
should fix instead!" This is the sort of mentality that led to
abominations like BIND and Sendmail surviving as long as they did,
OpenSSH (in all its glory of vulnerabilities) being forked from the
old SSH code instead of rewritten from scratch, etc.

> The biggest headache in *NIX  
> (with the exception of Mac OS X's underlying version) is the  
> haphazard way that handling of non-ASCII characters and the I18n has  
> developed. It is especially grotty at the system level, and as you  

The system level has nothing to do with fonts... Until you get to
fonts and rendering, m17n and i18n are extremely trivial.

> commented below, one of the reasons is that (English-only speaking)  
> *NIX systems people think that handling of non-ASCII charsets should  
> somehow be trivial and not bulky in code.

I'm not English-only speaking yet I'm quite confident that it should
be trivial and not bulky in code, and that applications should not
even have to think about it.

The difference between your approach (and the approach of people who
have written most of the existing applications with extensive script
support) and mine is exactly the same as the difference between the
early efforts at converting to Unicode (especially by MS) and UTF-8:
The MS/Unicode approach was to pull the rug out from under everyone
and force them to drop C, drop UNIX, drop all existing internet
protocols, and store text as 16bit characters in UCS-2. The UTF-8
approach on the other hand recognizes that most of the time when
programs are dealing with text they don't care about the encoding or
meaning of the text at all. At most they care about some codepoints in
the first 128 positions that have special meaning to the software.
Thus Unicode can be supported _without_ any special effort from
developers.
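The point about the first 128 positions can be made concrete. In UTF-8, every byte of a multibyte sequence has its high bit set, so a byte-oriented routine that only looks for an ASCII delimiter handles UTF-8 text correctly without ever decoding it. Here is a minimal sketch in C. `split_ascii` is a hypothetical helper written for illustration, not an existing library function:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split a NUL-terminated string on one ASCII
 * delimiter byte, storing each field's start and byte length.
 * No decoding is needed for UTF-8 input: bytes of multibyte sequences
 * are all >= 0x80, so an ASCII byte like ':' is always a real delimiter.
 * Returns the number of fields stored (at most max). */
size_t split_ascii(const char *s, char delim,
                   const char **starts, size_t *lens, size_t max)
{
    assert(delim != '\0' && (unsigned char)delim < 0x80); /* ASCII only */
    size_t n = 0;
    const char *field = s;
    while (n < max) {
        const char *end = strchr(field, delim);
        starts[n] = field;
        lens[n] = end ? (size_t)(end - field) : strlen(field);
        n++;
        if (end == NULL)
            break;              /* last field reached */
        field = end + 1;        /* skip the delimiter byte */
    }
    return n;
}
```

For example, splitting "ñame:日本:x" on ':' gives three fields of 5, 6 and 1 bytes. The multibyte characters pass through untouched, and the routine never needed to know the encoding.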

The obvious exception to this comes when it's time to display the
text on a visual device for the user. :) Terminals, if they work
correctly with the necessary scripts, provide a very clean solution to
the problem because the application doesn't have to think about the
presentation of the text. Historically it meant the application could
just assume 1 byte == 1 character position for non-control characters.
Now, the same requires mbtowc/wcwidth, but it's not any huge burden.
Surely a lot less burden than doing the text rendering yourself.
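Below is a sketch of the mbtowc/wcwidth step for a terminal application. It assumes the program has already selected a UTF-8 locale with setlocale() and that the platform provides the XSI wcwidth() function. `display_width` is a hypothetical name chosen for illustration:

```c
#define _XOPEN_SOURCE 700  /* expose wcwidth() on glibc and similar */
#include <assert.h>
#include <locale.h>        /* caller is expected to have run setlocale() */
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: number of terminal columns needed to show s in
 * the current LC_CTYPE locale, or -1 if s holds an invalid multibyte
 * sequence or a non-printable character. */
int display_width(const char *s)
{
    assert(s != NULL);
    size_t left = strlen(s);
    int cols = 0;
    mbtowc(NULL, NULL, 0);               /* reset any conversion state */
    while (left > 0) {
        wchar_t wc;
        int len = mbtowc(&wc, s, left);  /* bytes consumed, or -1 */
        if (len <= 0)
            return -1;
        int w = wcwidth(wc);             /* 0, 1, 2, or -1 if unprintable */
        if (w < 0)
            return -1;
        cols += w;
        s += len;
        left -= (size_t)len;
    }
    return cols;
}
```

Under a UTF-8 locale, "été" should measure 3 columns and two CJK ideographs should measure 4, because wcwidth() reports 2 for East Asian wide characters. The exact widths depend on the C library's locale tables.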

But what about applications that _do_ want/need to do the text
rendering themselves? This must include at least the terminal
emulator, :) and also imaging programs, presentation apps,
visually-oriented web browsers, ... As long as the program does not
need to do its _own_ text display it may be able to rely on a widget
set, which basically gives all the same advantages as using a terminal
with regard to implementation simplicity. (However, now we need to add
widget sets to the list of things that need to do text rendering.)

This whole line of questioning raises a lot more questions than it
answers and I'm going to easily get sidetracked if I continue...

> I am no longer up-to-date with kernel and system details in *NIX, and  
> am not a developer (perhaps an interested bystander is where I fit  
> in), but I used to do a lot of coding in that area, so I know how  
> difficult it can be. My view is that what is needed is a modular (and  

Why modular? "Modular" is the magic panacea word among people writing
this bloatware, and all it does is massively increase memory
requirements and complexity.

> unified) way of slotting in support for handling various alphabets  
> and languages,

The view that supporting a new alphabet or language requires a new
module is fundamentally wrong. All it should require is proper
information in the font.

> based on Unicode categories, that can be easily set up  
> at system build time.

So at build time you either choose "bloatware with m17n" or "legacy
ascii/latin1 crap"? Sounds like the current problem we're stuck with.
The bloatware distros will mostly choose the former and the ones
targeting more advanced users who dislike bloat will choose the
latter, perpetuating the problem that competent developers despise
m17n and i18n and therefore do not include support in their programs.

> Moreover, *NIX is greatly in need of a way of  
> unifying all the various ways for formatting and representing  
> characters at all levels, using system-level code.

Huh? What does this even mean? Are you confusing glyphs with
characters? Representing characters is trivial.
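Representing a character, as opposed to drawing a glyph for it, needs no font or locale at all. The UTF-8 form of a code point is a few shifts and masks (RFC 3629). Here is a minimal sketch. `utf8_encode` is a hypothetical helper, not a standard function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8 into out
 * (room for 4 bytes). Returns the number of bytes written, or 0 if cp
 * is a UTF-16 surrogate or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                      /* ASCII: one byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                         /* surrogates are not characters */
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 11110xxx + three continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* outside the Unicode range */
}
```

Everything that makes text hard (shaping, combining, bidi, wide glyphs) happens later, at the font and rendering stage.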

> This may even imply  
> some minor tweaking of the POSIX standard.

.....

> I know that a real-life problem (with a deadline?) has got you  

No deadline except being tired of having a legacy system.

> energised to tackle this can of worms, but a quick fix or re- 
> invention of the wheel is just not the way to go.

Someone once said: "when the wheel is square you need to reinvent it".

> Someone with energy  
> and know-how has got to get a team together and fix what is broken in  
> the guts of *NIX so that it presents a good, clean interface for I18n  
> and multiple character set representation.

Absolutely not. This is the bloatware doctrine, that new interfaces
and libs are a panacea, that they're best designed by teams and
committees, etc. What's needed is _simplicity_. When you have
simplicity everything else follows.

There is a possibility here to solve a simple, almost-trivial unsolved
problem. What you propose is abandoning the simple problem and trying
to solve much more difficult problems instead, many of which will not
be solved anytime in the near future due as much to personal and
political reasons as to technical ones. Moreover, even if the more
difficult problem is solved, the solution will not be useful to anyone
except the ./configure --enable-bloat crowd (myself included). Why
would I want to abandon a real, solvable problem in order to attempt to
solve a problem that's uninteresting to me?

Rich


--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
