Re: suit file

2009-05-04 Thread Rich Felker
On Mon, May 04, 2009 at 01:01:52PM +0200, Jan Willem Stumpel wrote: Ben Wiley Sittler wrote: It's a font suitcase, and IIRC the font data is actually in the resource fork. At least under Mac OS X, fontforge seems to be able to deal with these. If you have the file on a non-Mac OS

Re: suit file

2009-05-03 Thread Rich Felker
On Sun, May 03, 2009 at 08:02:40AM +0200, Jan Willem Stumpel wrote: I have a font for an exotic language (Javanese) that I want to convert to UTF-8 encoding. Problem is, the font file was made on a Macintosh using Fontographer, and it has a .suit file extension that Fontforge doesn't know how

Re: i18n fonts

2007-12-18 Thread Rich Felker
On Wed, Dec 19, 2007 at 02:01:26PM +1100, Russell Shaw wrote: Russell Shaw wrote: Rich Felker wrote: On Mon, Dec 03, 2007 at 02:16:00PM +1100, Russell Shaw wrote: Hi, I can parse in the gsub tables. I was trying to do the gpos tables, but the OpenType spec doesn't define ValueRecord

Re: i18n fonts

2007-12-02 Thread Rich Felker
On Mon, Dec 03, 2007 at 02:16:00PM +1100, Russell Shaw wrote: Hi, I was thinking of making a multilingual text editor. I don't get how glyphs are done outside of English. I've read the Unicode Standard book. When a paragraph of Unicode characters is processed, the glyphs are laid out

Re: Unicode, ISO/IEC 10646 Synchronization Issues for UTF-8

2007-04-27 Thread Rich Felker
On Fri, Apr 27, 2007 at 05:15:16PM +0600, Christopher Fynn wrote: N3266 was discussed and rejected by WG2 yesterday. As you pointed out there are all sorts of problems with this proposal, and accepting it would break many existing implementations. That's good to hear. In followup, I think the

Re: Unicode, ISO/IEC 10646 Synchronization Issues for UTF-8

2007-04-27 Thread Rich Felker
On Fri, Apr 27, 2007 at 12:41:22PM -0700, Ben Wiley Sittler wrote: glad it was rejected. the only really sensible approach i have yet seen is utf-8b (see my take on it here: http://bsittler.livejournal.com/10381.html and another implementation here: http://hyperreal.org/~est/utf-8b/ ) the

Re: Unicode, ISO/IEC 10646 Synchronization Issues for UTF-8

2007-04-26 Thread Rich Felker
On Thu, Apr 26, 2007 at 03:44:33PM +0600, Christopher Fynn wrote: N3266 UCS Transformation Formats summary, non-error and error sequences – feedback on N3248 http://std.dkuug.dk/jtc1/sc2/wg2/docs/N3266.doc I must say this is a rather stupid looking proposal. The C0 controls already have

Re: terminal status [Re: wcwidth and locale]

2007-04-24 Thread Rich Felker
On Mon, Apr 23, 2007 at 12:16:29AM +0800, Abel Cheung wrote: On 4/17/07, Rich Felker [EMAIL PROTECTED] wrote: What is the output of: echo -e '日本語\b\bhello' Wait. Quick question: how much should '\b' backstep when wide characters are encountered? - a whole wide character? - a single byte

Re: Questions about Unicode-aware C programs under Linux

2007-04-17 Thread Rich Felker
On Tue, Apr 17, 2007 at 08:47:19AM +, Ali Majdzadeh wrote: The program does not print the line read from the file to stdout (some junks are printed). I also used cat ./persian.txt | iconv -t utf-8 in.txt to produce a UTF-8 oriented file. If your native encoding is not UTF-8 then of course

Re: Questions about Unicode-aware C programs under Linux

2007-04-17 Thread Rich Felker
On Tue, Apr 17, 2007 at 03:17:48PM +, Ali Majdzadeh wrote: Hi Rich Thanks for your attention. I do use UTF-8 but the files I am dealing with are encoded using a strange encoding system, I used iconv to convert them into UTF-8. By the way, another question, if all those stdio.h and

Re: wcwidth and locale

2007-04-16 Thread Rich Felker
On Tue, Apr 17, 2007 at 12:11:12AM +0800, Abel Cheung wrote: On 4/11/07, Rich Felker [EMAIL PROTECTED] wrote: Indeed, glibc's character data is horribly outdated and incorrect. There are plenty of unsupported nonspacing characters, even characters that were present in Unicode 4.0. It also

terminal status [Re: wcwidth and locale]

2007-04-16 Thread Rich Felker
On Tue, Apr 17, 2007 at 02:04:32AM +0800, Abel Cheung wrote: This is only an issue on character-cell devices which use wcwidth. I'm exactly talking about those apps, like terminals. Given how utterly abysmal current terminals' Unicode support is, this seems like a relatively minor issue. I

Re: wcwidth and locale

2007-04-10 Thread Rich Felker
On Mon, Apr 09, 2007 at 12:26:51PM -0400, SrinTuar wrote: Just a question: Does anyone know of locales where ambiguous char-cell width characters, such as ※☠☢☣☤ ♀♂★☆ are treated as double width rather than single width? Ambiguous width from a Unicode perspective means just that the

Re: wcwidth and locale

2007-04-10 Thread Rich Felker
On Tue, Apr 10, 2007 at 12:36:28PM +0200, Egmont Koblinger wrote: Though I cannot answer your original question, I've just found recently that glibc's wcwidth database suffers from problems. There are a lot of letters or letter-like symbols that are unprintable according to glibc (wcwidth

Re: perl unicode support [BACK OFF-TOPIC]

2007-04-07 Thread Rich Felker
On Sat, Apr 07, 2007 at 01:46:22PM +0200, Marcin 'Qrczak' Kowalczyk wrote: For example in my language Kogut a string is a sequence of Unicode code points. My implementation uses two string representations internally: if it contains no characters above U+00FF, then it’s stored as a sequence of

Re: perl unicode support [BACK OFF-TOPIC]

2007-04-07 Thread Rich Felker
On Sat, Apr 07, 2007 at 08:21:25PM +0200, Marcin 'Qrczak' Kowalczyk wrote: Using UTF-8 would have accomplished the same thing without special-casing. Then iterating over strings and specifying string fragments could not be done by code point indices, and it’s not obvious how a good

Re: perl unicode support [BACK ON-TOPIC]

2007-04-05 Thread Rich Felker
On Wed, Apr 04, 2007 at 11:56:35PM -0400, Daniel B. wrote: Rich Felker wrote: Null termination is not the security problem. Broken languages that DON'T use null-termination are the security problem, particularly mixing them with C. C is the language that handles one out of 256

Re: perl unicode support [BACK OFF-TOPIC]

2007-04-05 Thread Rich Felker
On Thu, Apr 05, 2007 at 12:54:54PM +0200, Marcin 'Qrczak' Kowalczyk wrote: Dnia 05-04-2007, czw o godzinie 02:04 -0400, Rich Felker napisał(a): Just look how much that already happens anyway... the use of : as a separator in PATH-type strings, the use of spaces to separate command line

Re: perl unicode support

2007-03-31 Thread Rich Felker
On Sat, Mar 31, 2007 at 06:56:05PM -0400, Daniel B. wrote: Normally, you should not have to ever convert strings between encodings. Then how do you process, say, a multi-part MIME body that has parts in different character encodings? Excellent example. Email is absolutely

Re: perl unicode support

2007-03-31 Thread Rich Felker
On Sat, Mar 31, 2007 at 07:44:39PM -0400, Daniel B. wrote: Rich Felker wrote: Again, software which does not handle corner cases correctly is crap. Why are you confusing special-case with corner case? I never said that software shouldn't handle corner cases such as illegal UTF-8

Re: perl unicode support [off]

2007-03-30 Thread Rich Felker
On Fri, Mar 30, 2007 at 11:56:56AM +0200, Egmont Koblinger wrote: On Thu, Mar 29, 2007 at 04:46:14PM -0400, Rich Felker wrote: I am a mathematician I nearly became a mathematician, too. Just a few weeks before I had to choose university I changed my mind and went to study informatics

Re: perl unicode support

2007-03-30 Thread Rich Felker
On Fri, Mar 30, 2007 at 01:30:58PM +0200, Egmont Koblinger wrote: On Fri, Mar 30, 2007 at 05:07:55PM +0600, Christopher Fynn wrote: Hi, IMO these days all browsers should come with their default encoding set to UTF-8 What do you mean by a browser's default encoding? Is it the

Re: Perl Unicode support

2007-03-30 Thread Rich Felker
On Fri, Mar 30, 2007 at 05:17:32PM +0200, Fredrik Jervfors wrote: I say that his browser must show è correctly, it doesn't matter what its locale is. That depends on the configuration of the browser. The browser should by default (programmer's choice really) think in the encoding X

Re: Perl Unicode support

2007-03-30 Thread Rich Felker
On Fri, Mar 30, 2007 at 06:44:49PM +0200, Egmont Koblinger wrote: On Fri, Mar 30, 2007 at 05:17:32PM +0200, Fredrik Jervfors wrote: If Y's computer supports the encoding X used [...] Yes, I assumed in my examples that both computers support both encodings. Glibc supports all well-known

Re: Perl Unicode support

2007-03-30 Thread Rich Felker
On Fri, Mar 30, 2007 at 07:06:52PM +0200, Egmont Koblinger wrote: On Fri, Mar 30, 2007 at 11:46:12AM -0400, Rich Felker wrote: What does “supports the encoding” mean? Applications cannot select the locale they run in, aside from requesting the “C” or “POSIX” locale. This isn't so. First

Re: perl unicode support

2007-03-29 Thread Rich Felker
On Thu, Mar 29, 2007 at 12:01:28PM +0200, Egmont Koblinger wrote: On Wed, Mar 28, 2007 at 02:35:32PM -0400, Rich Felker wrote: matches or not _does_ depend on the character set that you use. It's not perl's flaw that it couldn't decide, it's impossible to decide in theory unless you

Re: perl unicode support

2007-03-29 Thread Rich Felker
On Thu, Mar 29, 2007 at 12:24:43PM +0200, Egmont Koblinger wrote: On Wed, Mar 28, 2007 at 05:57:35PM -0400, SrinTuar wrote: The regex library can ask the locale what encoding things are in, just like everybody else The locale tells you which encoding your system uses _by default_. This

Re: perl unicode support

2007-03-29 Thread Rich Felker
On Thu, Mar 29, 2007 at 07:15:37PM +0200, Egmont Koblinger wrote: or failing that ask the programmer to explicitly qualify them as one of its supported encodings. I do not think the strings should have built in machinery that does this work behind the scenes implicitly. If you have the

Re: perl unicode support

2007-03-28 Thread Rich Felker
On Wed, Mar 28, 2007 at 07:49:57PM +0200, Egmont Koblinger wrote: matches or not _does_ depend on the character set that you use. It's not perl's flaw that it couldn't decide, it's impossible to decide in theory unless you know the charset. It is perl's flaw. The LC_CTYPE category of the

Re: perl unicode support

2007-03-28 Thread Rich Felker
On Wed, Mar 28, 2007 at 02:24:26PM -0500, David Starner wrote: On 3/27/07, Rich Felker [EMAIL PROTECTED] wrote: On Tue, Mar 27, 2007 at 06:44:42PM -0500, David Starner wrote: On 3/27/07, Rich Felker [EMAIL PROTECTED] wrote: This is one of the very few places where a computer should ever

Re: perl unicode support

2007-03-28 Thread Rich Felker
On Wed, Mar 28, 2007 at 10:39:49PM -0400, Daniel B. wrote: Well of course you need to think in bytes when you're interpreting the stream of bytes as a stream of characters, which includes checking for invalid UTF-8 sequences. And what do you do if they're present? Of course, it
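
A minimal sketch of the byte-level check mentioned above, assuming Python 3 and its built-in strict UTF-8 decoder; neither appears in the thread, and the helper name `first_invalid_utf8` is hypothetical. It only reports where the first invalid sequence starts; what to do with such input is the question the thread is debating, and this sketch does not settle it.

```python
def first_invalid_utf8(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        # The strict decoder rejects stray continuation bytes, truncated
        # sequences, overlong forms and encoded surrogates.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


print(first_invalid_utf8("日本語".encode("utf-8")))  # valid input
print(first_invalid_utf8(b"abc\xff def"))            # 0xFF never occurs in UTF-8
print(first_invalid_utf8(b"\xc0\x80"))               # overlong encoding of NUL
```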

Re: perl unicode support

2007-03-28 Thread Rich Felker
On Wed, Mar 28, 2007 at 11:05:56PM -0400, Daniel B. wrote: wrote: 2007/3/28, Egmont Koblinger [EMAIL PROTECTED]: ...if you only handle _texts_ then probably the best approach is to convert every string as soon as they arrive at your application to some Unicode

Re: perl unicode support [BACK ON-TOPIC]

2007-03-28 Thread Rich Felker
On Mon, Mar 26, 2007 at 05:28:43PM -0400, SrinTuar wrote: I frequently run into problems with utf-8 in perl, and I was wondering if anyone else had encountered similar things. [...] Can we get back on-topic with this, and look for solutions to the problems? Maybe Larry has some thoughts for us?

Re: orthographic imperialism

2007-03-28 Thread Rich Felker
On Thu, Mar 29, 2007 at 12:41:06AM -0400, William J Poser wrote: [EMAIL PROTECTED] has made several claims about writing systems for indigenous languages that I, as a linguist with a strong interest in writing systems and substantial experience working with indigenous people, not only as a

Re: perl unicode support

2007-03-27 Thread Rich Felker
On Tue, Mar 27, 2007 at 06:31:11PM +0200, Egmont Koblinger wrote: On Tue, Mar 27, 2007 at 11:16:58AM -0400, SrinTuar wrote: That would be contradictory to the whole concept of Unicode. A human-readable string should never be considered an array of bytes, it is an array of characters!

Re: perl unicode support

2007-03-27 Thread Rich Felker
On Tue, Mar 27, 2007 at 06:44:42PM -0500, David Starner wrote: On 3/27/07, Rich Felker [EMAIL PROTECTED] wrote: This is not a simple task at all, and in fact it's a task that a computer should (almost) never do... Of course. Why shouldn't an editor go through and change 257 headings

Re: perl unicode support

2007-03-27 Thread Rich Felker
On Tue, Mar 27, 2007 at 10:07:11PM -0400, Daniel B. wrote: wrote: That would be contradictory to the whole concept of Unicode. A human-readable string should never be considered an array of bytes, it is an array of characters! Hrm, that statement I think I would

Re: perl unicode support

2007-03-27 Thread Rich Felker
On Tue, Mar 27, 2007 at 11:53:15PM -0400, SrinTuar wrote: 2007/3/27, Daniel B. [EMAIL PROTECTED]: What about when it breaks a string into substrings at some delimiter, say, using a regular expression? It has to break the underlying byte string at a character boundary. Unless you pass

Re: perl unicode support

2007-03-26 Thread Rich Felker
On Mon, Mar 26, 2007 at 05:28:43PM -0400, SrinTuar wrote: I frequently run into problems with utf-8 in perl, and I was wondering if anyone else had encountered similar things. One thing I've noticed is that when processing characters, I often get warnings about wide characters in print, or

Re: Non-ASCII characters in file names

2007-03-18 Thread Rich Felker
On Sun, Mar 18, 2007 at 08:41:48AM -0700, Ben Wiley Sittler wrote: awesome, and thank you! however, utf-8 filenames given on the command line still do not work... they get turned into iso-8859-1, which is then utf-8 encoded before saving (?!) here's my (partial) utf-8 workaround for emacs so

Re: High-Speed UTF-8 to UTF-16 Conversion

2007-03-17 Thread Rich Felker
On Fri, Mar 16, 2007 at 07:16:55PM -0700, Ben Wiley Sittler wrote: I believe it's more DHTML that is the problem. DOMString is specified to be UTF-16. Likewise for ECMAScript strings, IIRC, although they may still be officially UCS-2. Indeed, this was what I was thinking of. Thanks for

Re: How to enter accented UTF-8 character on GNOME terminal

2007-03-17 Thread Rich Felker
On Sat, Mar 17, 2007 at 07:05:01AM +, Colin Paul Adams wrote: I can't find this in the GNOME help, so I thought I'd try asking here. I want to rename a file so it has an a-umlaut (lower case) in the name. My LANG is en_GB.UTF-8. I don't know how to type the accented character.

Re: Non-ASCII characters in file names

2007-03-17 Thread Rich Felker
On Sat, Mar 17, 2007 at 09:51:53AM -0700, Ben Wiley Sittler wrote: emacs seems not to handle utf-8 filenames at all, regardless of locale. (setq file-name-coding-system 'utf-8) ~Rich -- Linux-UTF8: i18n of Linux on all levels Archive: http://mail.nl.linux.org/linux-utf8/

Re: Non-ASCII characters in file names

2007-03-17 Thread Rich Felker
On Sat, Mar 17, 2007 at 08:25:43AM +, Colin Paul Adams wrote: Now this is where it gets interesting. My URI resolver translates the file name (the URI is relative to a base file: URI) into a UTF-8 byte sequence which gets passed to the fopen call (the program is supposed to work on other

Re: High-Speed UTF-8 to UTF-16 Conversion

2007-03-15 Thread Rich Felker
On Wed, Mar 14, 2007 at 02:01:04PM -0700, Rob Cameron wrote: As part of my research program into high-speed XML/Unicode/text processing using SIMD techniques, I have experimented extensively with the UTF-8 to UTF-16 conversion problem. I've generally been comparing performance of my

Re: High-Speed UTF-8 to UTF-16 Conversion

2007-03-15 Thread Rich Felker
on leveraging patents against them. Sincerely, Rich Felker

Re: High-Speed UTF-8 to UTF-16 Conversion

2007-03-15 Thread Rich Felker
On Thu, Mar 15, 2007 at 11:43:51AM -0700, Rob Cameron wrote: Rich, I would agree that the abuse of software patents is fundamentally wrong and that patent reform is highly overdue. I am doing something about it. The use of software patents is the abuse of software patents. There is

Re: High-Speed UTF-8 to UTF-16 Conversion

2007-03-15 Thread Rich Felker
On Thu, Mar 15, 2007 at 01:28:58PM -0700, Rob Cameron wrote: Simon, You asked about relevance. The UTF-8 to UTF-16 bottleneck is widely cited in literature on XML processing performance. And why would you do this? Simply keep the data as UTF-8. There's no good reason for using UTF-16 at

Re: c++ strings and UTF-8 (other charsets)

2007-03-08 Thread Rich Felker
On Thu, Mar 08, 2007 at 10:18:55PM -0500, Daniel B. wrote: wrote: I have yet to encounter a case where a character count is useful. Well, if in an editor the user tries to move forward three characters, you probably want to increment a character count (an offset from the

Re: c++ strings and UTF-8 (other charsets)

2007-03-01 Thread Rich Felker
On Thu, Mar 01, 2007 at 09:41:44AM +0100, Marcel Ruff wrote: Are you thinking of Java's _modified_ version of UTF-8 (http://en.wikipedia.org/wiki/UTF-8#Java)? Uhg, disgusting... Yes - this is an open serious issue for my approach! Has anybody some practical advice on this?

Re: c++ strings and UTF-8 (other charsets)

2007-03-01 Thread Rich Felker
On Thu, Mar 01, 2007 at 07:53:52PM +0100, Marcel Ruff wrote: Are you thinking of Java's _modified_ version of UTF-8 (http://en.wikipedia.org/wiki/UTF-8#Java)? The first sentence from the above wiki says: In normal usage, the Java programming language

Re: c++ strings and UTF-8 (other charsets)

2007-02-28 Thread Rich Felker
On Tue, Feb 27, 2007 at 07:49:17PM -0500, Daniel B. wrote: Marcel Ruff wrote: As UTF-8 may not contain '\0' ... Yes it can. No, I think he just meant to say a string of non-NUL _characters_ may not contain a 0 _byte_. The NUL character is not valid text or a valid part of a string
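
To illustrate the distinction drawn above, here is a small sketch assuming Python 3 and its standard codecs (not something used in the thread; the sample string is arbitrary). In UTF-8 every byte of a non-NUL character is nonzero, so a 0 byte can only come from U+0000 itself, which is why NUL-terminated C string functions keep working. UTF-16 is shown only as a contrast.

```python
text = "naïve 日本語 𝄞"  # arbitrary sample: ASCII, Latin-1, CJK and a non-BMP character

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")

# No 0 byte appears in the UTF-8 form of non-NUL characters.
print(0 in utf8)

# UTF-16 pads ASCII characters with a 0 byte, so strlen()-style code breaks.
print(0 in utf16)

# The only source of a 0 byte in UTF-8 is the NUL character itself.
print("\0".encode("utf-8"))
```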

Re: c++ strings and UTF-8 (other charsets)

2007-02-27 Thread Rich Felker
On Tue, Feb 27, 2007 at 09:49:50AM -0500, SrinTuar wrote: On Mon, Feb 26, 2007 at 03:35:05PM +0100, Stephane Bortzmeyer wrote: Old code doesn't need to be ported. Very strange advice, indeed. You might want to read up on the history of UTF-8. Here are some references for anyone wanting

Re: c++ strings and UTF-8 (other charsets)

2007-02-26 Thread Rich Felker
On Mon, Feb 26, 2007 at 03:35:05PM +0100, Stephane Bortzmeyer wrote: On Mon, Feb 26, 2007 at 08:10:59AM +0100, Marcel Ruff [EMAIL PROTECTED] wrote a message of 65 lines which said: As UTF-8 may not contain '\0' you can simply use all functions as before (strcmp(), std::string etc.).

Re: c++ strings and UTF-8 (other charsets)

2007-02-25 Thread Rich Felker
On Sat, Feb 24, 2007 at 06:13:37PM +0100, Julien Claassen wrote: Hi! What I meant about UTF-8-strings in c++: I mean in c and c++ they're not standard like in Java. UTF-16, used by Java, is also variable-width. It can be either 2 bytes or 4 bytes per character. Support for the characters
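
The variable-width claim can be checked with a short sketch, assuming Python 3 and its `utf-16-le` codec (an illustration only; the thread itself concerns C++ and Java, and the helper name is hypothetical). Characters in the Basic Multilingual Plane take one 16-bit unit, while anything above U+FFFF takes a surrogate pair.

```python
def utf16_byte_length(ch: str) -> int:
    """Number of bytes one character occupies in UTF-16 (without a BOM)."""
    return len(ch.encode("utf-16-le"))


for ch in ("A", "語", "\U0001D11E"):  # U+0041, U+8A9E, U+1D11E (musical G clef)
    print(f"U+{ord(ch):04X}: {utf16_byte_length(ch)} bytes")
```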

Re: A call for fixing aterm/rxvt/etc...

2007-02-25 Thread Rich Felker
On Sat, Feb 24, 2007 at 01:39:25AM -0500, Rich Felker wrote: using luit for this sounds appealing, but in my experience luit (a) crashes frequently and (b) is easily confused by escape sequences and has no user interface for resetting all its iso-2022 state, so in practice it works

A call for fixing aterm/rxvt/etc...

2007-02-23 Thread Rich Felker
These days we have at least xterm, urxvt, mlterm, gnome-terminal, and konsole which support utf-8 fairly well, but on the flip side there's still a huge number of terminal emulators which do not respect the user's encoding at all and always behave in a legacy-8bit-codepage way. Trying to help

Re: A call for fixing aterm/rxvt/etc...

2007-02-23 Thread Rich Felker
On Fri, Feb 23, 2007 at 04:24:29PM -0800, Ben Wiley Sittler wrote: just two cents: i did this some years back for the links and elinks web browsers (it's the utf-8 i/o option available in some versions FWIW: ELinks has since been fixed (in the development versions, not yet released but working

Re: c++ strings and UTF-8 (other charsets)

2007-02-20 Thread Rich Felker
On Mon, Feb 19, 2007 at 06:49:20PM +0100, Julien Claassen wrote: Hello! I've got one question. I'm writing a library in c++, which needs to handle different character sets. I suppose for internal purposes UTF-8 is quite sufficient. So is there a standard string class in the libstdc++ which

Re: Do combinations need to be defined in advance?

2006-12-12 Thread Rich Felker
On Tue, Dec 12, 2006 at 08:56:06PM +0600, Christopher Fynn wrote: Rich Felker wrote: Whether it's possible to support all combinations efficiently, I don't know. The OpenType system is very poorly designed from what I can tell. In the Tibetan fonts I've examined, rather than just saying

Re: Do combinations need to be defined in advance?

2006-12-11 Thread Rich Felker
On Mon, Dec 11, 2006 at 05:49:22PM +0100, Andries Brouwer wrote: On Mon, Dec 11, 2006 at 05:06:23PM +0100, Jan Willem Stumpel wrote: I am beginning to think that the responsibility for correct combining accents behaviour rests primarily with the rendering engine, rather than with the

Re: Xlib UTF-8 support

2006-12-07 Thread Rich Felker
On Thu, Dec 07, 2006 at 01:36:01PM +0900, Jiro SEKIBA wrote: At Thu, 07 Dec 2006 03:17:32 +0100, Mirco Bakker wrote: The program (written in C) uses only the standard Xlib. The writing is done using XmbDrawString() (AFAIK function of choice). I also tried Xutf8DrawString

Re: Xlib UTF-8 support

2006-12-06 Thread Rich Felker
On Wed, Dec 06, 2006 at 10:06:09PM -0500, Michael B Allen wrote: Two things. First, I believe Pango is becoming the defacto method for rendering non-Latin1 text in general purpose applications (I've never I'm hoping we can remedy this situation. Xft/pango is extremely slow compared to the core

Re: utf8 and solaris

2006-11-18 Thread Rich Felker
On Sat, Nov 18, 2006 at 07:43:54PM +0530, Balaji.Ramdoss wrote: Folks, well this is not on linux. I have an issue in Sun Solaris box where octal values get displayed instead of symbols like ^,| as \136, \075. This happens if I set my LC_CTYPE to en_US.UTF-8 locale and I have the set

Re: Proposed fix for Malayalam ( other Indic?) chars and wcwidth

2006-11-09 Thread Rich Felker
On Tue, Nov 07, 2006 at 01:13:24AM -0800, rajeev joseph sebastian wrote: Well, I think I misunderstood ... No problem. --- In the first para, I asked whether it was possible to use TrueType in the terminal. If we cannot, then we need to use some hybrid of bitmap fonts and OT fonts,

Re: Proposed fix for Malayalam ( other Indic?) chars and wcwidth

2006-11-06 Thread Rich Felker
On Mon, Nov 06, 2006 at 10:14:20AM -0800, rajeev joseph sebastian wrote: I can say that you have done a good job. My point has so far been that some kind of special font system should be created. In any case, the use of straight TTF or OTF is not possible. (is it?). in that case, it may be

Re: Proposed fix for Malayalam ( other Indic?) chars and wcwidth

2006-11-05 Thread Rich Felker
On Sun, Nov 05, 2006 at 12:59:03PM -0800, rajeev joseph sebastian wrote: Well, most correctly implemented Unicode-aware applications do this also: have 2 backing stores, one for text and the other for glyphs. Use the glyph representation for display. When a selection is done, the map between

Re: Proposed fix for Malayalam ( other Indic?) chars and wcwidth

2006-11-02 Thread Rich Felker
On Wed, Nov 01, 2006 at 01:34:14PM +0600, Christopher Fynn wrote: Yes, Indic scripts like Malayalam need specific console fonts. I think for console applications legibility is more important than beauty. Why not use the typefaces used in old-fashioned Indian typewriters as a starting

Re: Proposed fix for Malayalam ( other Indic?) chars and wcwidth

2006-10-31 Thread Rich Felker
On Tue, Oct 31, 2006 at 09:37:34AM -0800, rajeev joseph sebastian wrote: Hi Rich Felker, I find your work to provide support for Indic text on console/terminal to be admirable, and yes, any kind of display is far better than none at all (and I do not consider your statement insulting

Re: Proposed fix for Malayalam ( other Indic?) chars and wcwidth

2006-10-30 Thread Rich Felker
On Mon, Oct 30, 2006 at 04:17:54AM -0800, rajeev joseph sebastian wrote: Hello Rich Felker, It is impossible to fit Malayalam glyphs into a given width class, if you want even barely aesthetic text. This is because a given sequence of Unicode characters may map into somewhat different

Re: Proposed fix for Malayalam ( other Indic?) chars and wcwidth

2006-10-16 Thread Rich Felker
Sorry I originally replied off-list to Bruno because the list mail was slow coming thru and I thought he was just mailing me in private.. On Mon, Oct 16, 2006 at 05:38:45PM -0700, Ben Wiley Sittler wrote: just tried this in a few terminals, here are the results: GNOME Terminal 2.16.1: U+0D30

Proposed fix for Malayalam ( other Indic?) chars and wcwidth

2006-10-13 Thread Rich Felker
Working on uuterm[1], I've run into a problem with the characters 0D4A-0D4C and possibly others like them, in regards to wcwidth(3) behavior. These characters are combining marks that attach on both sides of a cluster, and have canonical equivalence to the two separate pieces from which they are
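
The canonical equivalence mentioned above can be inspected with a short sketch, assuming Python 3 and its `unicodedata` module (the post itself is about wcwidth(3) in C; the helper name `decompose` is hypothetical). NFD normalization splits each two-part vowel sign into the two separate pieces.

```python
import unicodedata


def decompose(cp: int) -> list[int]:
    """Code points of the canonical (NFD) decomposition of one character."""
    return [ord(c) for c in unicodedata.normalize("NFD", chr(cp))]


for cp in (0x0D4A, 0x0D4B, 0x0D4C):
    parts = " + ".join(f"U+{p:04X}" for p in decompose(cp))
    print(f"U+{cp:04X} = {parts}")
```

The output depends on the Unicode data shipped with the Python build, but these decompositions have been stable since the characters were encoded.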

Re: Announcing uuterm and ucf (universal charcell font)

2006-10-09 Thread Rich Felker
On Mon, Oct 09, 2006 at 12:37:24PM -0600, Wesley J. Landaker wrote: On Thursday 05 October 2006 16:03, Rich Felker wrote: A few comments on Why not just use OpenType??: - The GSUB model does not adapt well to a character cell device where characters are organized into cells and where

Re: Announcing uuterm and ucf (universal charcell font)

2006-10-06 Thread Rich Felker
[cc'ing the list since i think it's relevant] On Fri, Oct 06, 2006 at 04:55:51PM -0400, Daniel Glassey wrote: btw there is discussion about trying to integrate as much as possible on http://live.gnome.org/UnifiedTextLayoutEngine that you might like to contribute to. well sadly i think the

Announcing uuterm and ucf (universal charcell font)

2006-10-05 Thread Rich Felker
After much work, I finally have a working (but still experimental) version of uuterm and the ucf bitmap font format I proposed in August. Source for uuterm is browsable at http://svn.mplayerhq.hu/uuterm/ and a sample ucf font is linked from the included README. Since ucf is probably more

Re: Bidi considered harmful? :)

2006-09-05 Thread Rich Felker
On Tue, Sep 05, 2006 at 12:57:08AM -0500, David Starner wrote: On 9/5/06, Rich Felker [EMAIL PROTECTED] wrote: In all seriousness, though, unless you're dealing with image, music, or movie files, text weighs in quite heavy in size. As opposed to what? The vast majority of content is one

Re: Bidi considered harmful? :)

2006-09-05 Thread Rich Felker
On Tue, Sep 05, 2006 at 08:07:14AM -0600, Mark Leisher wrote: Rich Felker wrote: On Mon, Sep 04, 2006 at 08:19:02PM -0600, Mark Leisher wrote: My last gasp on this conversation: I don't think you really understand what you are talking about and won't until you get some hands-on experience

Re: Bidi considered harmful? :)

2006-09-04 Thread Rich Felker
On Mon, Sep 04, 2006 at 08:19:02PM -0600, Mark Leisher wrote: Rich Felker wrote: It went farther because it imposed language-specific semantics in places where they do not belong. These semantics are correct with sentences written in human languages which would not have been hard

Re: Bidi considered harmful? :)

2006-09-04 Thread Rich Felker
On Mon, Sep 04, 2006 at 11:44:26PM -0500, David Starner wrote: On 9/1/06, Rich Felker [EMAIL PROTECTED] wrote: IMO the answer is common sense. Languages that have a low information per character density (lots of letters/marks per word, especially Indic) should be in 2-byte range and those

Re: Bidi considered harmful? :)

2006-09-01 Thread Rich Felker
On Fri, Sep 01, 2006 at 04:32:40PM +1000, George W Gerrity wrote: I did try to tell you that doing a terminal emulation properly would be complex. I don't know if the algorithm is broken: I doubt it. But it is difficult getting it to work properly and it essentially requires internal

Re: Bidi considered harmful? :)

2006-09-01 Thread Rich Felker
On Fri, Sep 01, 2006 at 09:36:44AM -0600, Mark Leisher wrote: Rich Felker wrote: If that were the problem it would be trivial. The problems are much more fundamental. The key examples you should look at are things like: printf("%s %d %d %s\n", string1, number2, number3, string4); where

Re: Bidi considered harmful? :)

2006-09-01 Thread Rich Felker
On Fri, Sep 01, 2006 at 03:46:44PM -0600, Mark Leisher wrote: Did it ever occur to you that it wasn't the word processing mentality of the Unicode designers that led to ambiguities surviving in plain text? It is simply the fact that there is no nice neat solution. Unicode went farther than

Bidi considered harmful? :)

2006-08-31 Thread Rich Felker
I read an old thread on the XFree86 i18n list started by Markus Kuhn suggesting (rather strongly) that bidi should not be supported at the terminal level, as well as accusations (from other sources) by the author of Yudit that UAX#9 bidi algo results in serious security issues due to the

Re: About Mongolian written horizontally

2006-08-18 Thread Rich Felker
On Fri, Aug 18, 2006 at 06:25:16AM +0200, Werner LEMBERG wrote: I've now received an answer from Dr. Oliver Korff, an expert for Mongolian who has written MonTeX. Here a rough translation; see below for the German version. Classical Mongolian _can_ be written horizontally if you have

Re: Indic scripts and wcwidth: comments?

2006-08-18 Thread Rich Felker
On Fri, Aug 18, 2006 at 03:39:17AM -0700, rajeev joseph sebastian wrote: Hello Rich Felker, start quote 1. Does any existing character cell application (terminal emulator) both display correctly-rendered Indic text and conform to WI1, i.e. does it update column position

Re: Next Generation Console Font?

2006-08-17 Thread Rich Felker
On Sun, Aug 06, 2006 at 07:34:16AM -0400, Chris Heath wrote: To my knowledge there is still no official standard as to which characters have which width, but POSIX specifies the function used to obtain the width of each character (and defines the results as 'locale-specific'), and Markus

Re: Next Generation Console Font?

2006-08-05 Thread Rich Felker
On Fri, Aug 04, 2006 at 02:16:04PM +1000, George W Gerrity wrote: Actually, that is what I was opposing. But any solution to console representation has to handle three things together (localisation, internationalisation, and multilingualisation) or

Re: Next Generation Console Font?

2006-08-05 Thread Rich Felker
On Fri, Aug 04, 2006 at 09:04:43AM +0200, Werner LEMBERG wrote: With my proposed context system it doesn't save but a few bytes total in the font file since the context rules can be shared by all the characters that need them. Details, please. I've got an email I was preparing to send

Re: Next Generation Console Font?

2006-08-05 Thread Rich Felker
On Sat, Aug 05, 2006 at 09:11:39AM +0200, Werner LEMBERG wrote: A terminal is a character-cell device, with fixed-width character cells. This is not open to discussion, but fear not, it's not a problem! Actually, this limitation makes some things more complicated, because you have to

Re: Next Generation Console Font?

2006-08-05 Thread Rich Felker
To follow up on my original proposal and some of the alterations and simplifications I've made as a result of these discussions and discussions with other people outside of this list, here's a summary of the problem I'm trying to solve and how I plan to solve it: Practical problems: - no

Re: Next Generation Console Font?

2006-08-05 Thread Rich Felker
On Sat, Aug 05, 2006 at 11:11:02AM +0200, Werner LEMBERG wrote: BTW another issue of the substitution rules is that, as far as I can tell, they can delete or insert extra glyphs arbitrarily. Of course. How would you handle a ligature? `f' + `l' = `fl' -- this means that a character has

Re: Next Generation Console Font?

2006-08-04 Thread Rich Felker
On Fri, Aug 04, 2006 at 02:05:00PM +1000, Russell Shaw wrote: Subpixel only works on LCDs, which produce ugly output. I think sub-pixel rendering also works for a crt, but a sudden change in pixel value (such as the edge of a black square on a white background) is smeared (convolved with the

Re: Next Generation Console Font?

2006-08-03 Thread Rich Felker
On Thu, Aug 03, 2006 at 08:41:35AM +0200, Werner LEMBERG wrote: What about using bitmap-only TrueType fonts, as planned by the X Windows people? Could you direct me to good information? I have serious doubts but I'd at least like to read what they have to say.

Re: Next Generation Console Font?

2006-08-03 Thread Rich Felker
On Thu, Aug 03, 2006 at 03:40:29PM +1000, George W Gerrity wrote: Please. Let's not have yet another *NIX font encoding and presenting scheme! Why don't you set up a team to rationalise the existing encodings and presentation methods. This is the sort of mentality that sickens me. Please

Re: Next Generation Console Font?

2006-08-03 Thread Rich Felker
On Thu, Aug 03, 2006 at 03:07:09PM +1000, Russell Shaw wrote: Rich Felker wrote: ... snip long stuff I agree on the total crappiness of current mainstream GUI implementations. Thanks. It's refreshing to have some support from the non-bloat crowd in m17n issues. Usually there's the standard

Re: Next Generation Console Font?

2006-08-03 Thread Rich Felker
On Fri, Aug 04, 2006 at 03:46:29AM +1000, Russell Shaw wrote: One possible approach I've considered is having the client application provide an X font server to serve its own fonts, the sole purpose being to allow them to be cached on the server side. The same thing can be done with serverside

Re: Next Generation Console Font?

2006-08-03 Thread Rich Felker
On Fri, Aug 04, 2006 at 01:30:34AM +0200, Werner LEMBERG wrote: What you probably mean is that some language data needs to be preprocessed into a normalized form before it is fed into the font, for example Indic and Arabic scripts. What sort of preprocessing? Reordering vowels?

Re: Next Generation Console Font?

2006-08-02 Thread Rich Felker
A revised, simplified file format proposal based on my original sketch, some of Markus's ideas for NCF, and an evaluation of which optimizations were likely to benefit actual font data. Definitions: All numeric fields are variable length coded, using the high bit of each byte as a continuation
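
A rough sketch of the kind of variable-length coding described above, written in Python 3 for brevity. The snippet is cut off before the details, so the specifics here are assumptions rather than the proposal's actual layout: 7 payload bits per byte, most significant group first, and a set high bit meaning "more bytes follow". The function names are hypothetical.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, big-endian groups (assumed)."""
    if n < 0:
        raise ValueError("only non-negative integers can be encoded")
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append(n & 0x7F)
        n >>= 7
    groups.reverse()
    # Every byte except the last carries the continuation bit.
    return bytes([g | 0x80 for g in groups[:-1]] + [groups[-1]])


def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one integer starting at pos; return (value, next position)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:  # high bit clear: this was the last byte
            return value, pos


for n in (0, 127, 128, 300, 0x10FFFF):
    print(n, encode_varint(n).hex())
```

Values below 128 take a single byte under this scheme, which suits a font format where most fields are small.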

Re: Next Generation Console Font?

2006-08-02 Thread Rich Felker
On Thu, Aug 03, 2006 at 12:21:56AM +0200, Werner LEMBERG wrote: A revised, simplified file format proposal based on my original sketch, some of Markus's ideas for NCF, and an evaluation of which optimizations were likely to benefit actual font data. What about using bitmap-only TrueType

Next Generation Console Font?

2006-08-01 Thread Rich Felker
To Markus et al.: I read in the ancient archives for this list some ideas regarding a so-called next generation console font, supporting Unicode level-3 combining in a character cell environment. I'm presently working on a new terminal emulator called uuterm (think of the uu as µ-ucs or
