On Mon, May 04, 2009 at 01:01:52PM +0200, Jan Willem Stumpel wrote:
Ben Wiley Sittler wrote:
It's a font suitcase, and IIRC the font data is actually in
the resource fork. At least under Mac OS X, fontforge seems
to be able to deal with these. If you have the file on a
non-Mac OS
On Sun, May 03, 2009 at 08:02:40AM +0200, Jan Willem Stumpel wrote:
I have a font for an exotic language (Javanese) that I want to
convert to UTF-8 encoding. Problem is, the font file was made on a
Macintosh using Fontographer, and it has a .suit file extension
that Fontforge doesn't know how
On Wed, Dec 19, 2007 at 02:01:26PM +1100, Russell Shaw wrote:
Russell Shaw wrote:
Rich Felker wrote:
On Mon, Dec 03, 2007 at 02:16:00PM +1100, Russell Shaw wrote:
Hi,
I can parse in the gsub tables. I was trying to do the gpos tables,
but the OpenType spec doesn't define ValueRecord
On Mon, Dec 03, 2007 at 02:16:00PM +1100, Russell Shaw wrote:
Hi,
I was thinking of making a multilingual text editor.
I don't get how glyphs are done outside of English.
I've read the Unicode Standard book.
When a paragraph of Unicode characters is processed, the glyphs
are laid out
On Fri, Apr 27, 2007 at 05:15:16PM +0600, Christopher Fynn wrote:
N3266 was discussed and rejected by WG2 yesterday. As you pointed out
there are all sorts of problems with this proposal, and accepting it
would break many existing implementations.
That's good to hear. In followup, I think the
On Fri, Apr 27, 2007 at 12:41:22PM -0700, Ben Wiley Sittler wrote:
glad it was rejected. the only really sensible approach i have yet
seen is utf-8b (see my take on it here:
http://bsittler.livejournal.com/10381.html and another implementation
here: http://hyperreal.org/~est/utf-8b/ )
the
On Thu, Apr 26, 2007 at 03:44:33PM +0600, Christopher Fynn wrote:
N3266
UCS Transformation Formats summary, non-error and error sequences –
feedback on N3248
http://std.dkuug.dk/jtc1/sc2/wg2/docs/N3266.doc
I must say this is a rather stupid looking proposal. The C0 controls
already have
On Mon, Apr 23, 2007 at 12:16:29AM +0800, Abel Cheung wrote:
On 4/17/07, Rich Felker [EMAIL PROTECTED] wrote:
What is the output of:
echo -e '日本語\b\bhello'
Wait. Quick question: how much should '\b' backstep when wide characters are
encountered?
- a whole wide character?
- a single byte
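[Editorial note: a rough way to see why the question arises is that a character's cursor advance depends on its East Asian Width property. A minimal Python sketch, approximating wcwidth; real wcwidth also returns 0 for combining marks, which this ignores:]

```python
import unicodedata

def cell_width(ch: str) -> int:
    # 'W' (wide) and 'F' (fullwidth) East Asian Width classes occupy
    # two terminal cells; everything else is treated as one cell here.
    return 2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1

# '日' is East Asian Wide: erasing it cleanly spans two columns,
# so a '\b' that steps back one column only reaches its second half.
assert cell_width('日') == 2
assert cell_width('A') == 1
```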
On Tue, Apr 17, 2007 at 08:47:19AM +, Ali Majdzadeh wrote:
The program does not print the line read from the file to stdout (some junk
is printed). I also used cat ./persian.txt | iconv -t utf-8 in.txt to
produce a UTF-8 encoded file.
If your native encoding is not UTF-8 then of course
On Tue, Apr 17, 2007 at 03:17:48PM +, Ali Majdzadeh wrote:
Hi Rich
Thanks for your attention. I do use UTF-8 but the files I am dealing with
are encoded using a strange encoding system, I used iconv to convert them
into UTF-8. By the way, another question, if all those stdio.h and
On Tue, Apr 17, 2007 at 12:11:12AM +0800, Abel Cheung wrote:
On 4/11/07, Rich Felker [EMAIL PROTECTED] wrote:
Indeed, glibc's character data is horribly outdated and incorrect.
There are plenty of unsupported nonspacing characters, even characters
that were present in Unicode 4.0. It also
On Tue, Apr 17, 2007 at 02:04:32AM +0800, Abel Cheung wrote:
This is only an issue on character-cell devices which use wcwidth.
I'm exactly talking about those apps, like terminals.
Given how utterly abysmal current terminals' Unicode support is, this
seems like a relatively minor issue. I
On Mon, Apr 09, 2007 at 12:26:51PM -0400, SrinTuar wrote:
Just a question:
Does anyone know of locales where ambiguous char-cell width
characters, such as ※☠☢☣☤ ♀♂★☆ are treated as double
width rather than
single width?
Ambiguous width from a Unicode perspective means just that the
On Tue, Apr 10, 2007 at 12:36:28PM +0200, Egmont Koblinger wrote:
Though I cannot answer your original question, I've just found recently that
glibc's wcwidth database suffers from problems. There are a lot of letters
or letter-like symbols that are unprintable according to glibc (wcwidth
On Sat, Apr 07, 2007 at 01:46:22PM +0200, Marcin 'Qrczak' Kowalczyk wrote:
For example in my language Kogut a string is a sequence of Unicode code
points. My implementation uses two string representations internally:
if it contains no characters above U+00FF, then it’s stored as a
sequence of
On Sat, Apr 07, 2007 at 08:21:25PM +0200, Marcin 'Qrczak' Kowalczyk wrote:
Using UTF-8 would have accomplished the same thing without
special-casing.
Then iterating over strings and specifying string fragments could not be
done by code point indices, and it’s not obvious how a good
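[Editorial note: the indexing trade-off is easy to see in Python — in UTF-8 a code-point index does not map to a fixed byte offset, so random access by code point needs a scan from the start:]

```python
s = 'héllo'
b = s.encode('utf-8')

# Five code points, but six bytes: 'é' (U+00E9) occupies bytes 1..2.
assert len(s) == 5
assert len(b) == 6
assert b[1:3].decode('utf-8') == 'é'

def byte_offset(s: str, i: int) -> int:
    # Byte offset of code point i: encode the prefix and count bytes.
    return len(s[:i].encode('utf-8'))

# 'l' is code point 2 but starts at byte 3, not byte 2.
assert byte_offset(s, 2) == 3
```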
On Wed, Apr 04, 2007 at 11:56:35PM -0400, Daniel B. wrote:
Rich Felker wrote:
Null termination is not the security problem. Broken languages that
DON'T use null-termination are the security problem, particularly
mixing them with C.
C is the language that handles one out of 256
On Thu, Apr 05, 2007 at 12:54:54PM +0200, Marcin 'Qrczak' Kowalczyk wrote:
On Thu, 05 Apr 2007 at 02:04 -0400, Rich Felker wrote:
Just look how much that already happens anyway... the use
of : as a separator in PATH-type strings, the use of spaces to
separate command line
On Sat, Mar 31, 2007 at 06:56:05PM -0400, Daniel B. wrote:
Normally, you should not have to ever convert strings between
encodings.
Then how do you process, say, a multi-part MIME body that has parts
in different character encodings?
Excellent example. Email is absolutely
On Sat, Mar 31, 2007 at 07:44:39PM -0400, Daniel B. wrote:
Rich Felker wrote:
Again, software which does not handle corner cases correctly is crap.
Why are you confusing special-case with corner case?
I never said that software shouldn't handle corner cases such as illegal
UTF-8
On Fri, Mar 30, 2007 at 11:56:56AM +0200, Egmont Koblinger wrote:
On Thu, Mar 29, 2007 at 04:46:14PM -0400, Rich Felker wrote:
I am a mathematician
I nearly became a mathematician, too. Just a few weeks before I had to choose
university I changed my mind and went to study informatics
On Fri, Mar 30, 2007 at 01:30:58PM +0200, Egmont Koblinger wrote:
On Fri, Mar 30, 2007 at 05:07:55PM +0600, Christopher Fynn wrote:
Hi,
IMO these days all browsers should come with their default encoding set
to UTF-8
What do you mean by a browser's default encoding? Is it the
On Fri, Mar 30, 2007 at 05:17:32PM +0200, Fredrik Jervfors wrote:
I say that his browser must show è correctly, it doesn't matter what its
locale is.
That depends on the configuration of the browser.
The browser should by default (programmer's choice really) think in the
encoding X
On Fri, Mar 30, 2007 at 06:44:49PM +0200, Egmont Koblinger wrote:
On Fri, Mar 30, 2007 at 05:17:32PM +0200, Fredrik Jervfors wrote:
If Y's computer supports the encoding X used [...]
Yes, I assumed in my examples that both computers support both encodings.
Glibc supports all well-known
On Fri, Mar 30, 2007 at 07:06:52PM +0200, Egmont Koblinger wrote:
On Fri, Mar 30, 2007 at 11:46:12AM -0400, Rich Felker wrote:
What does “supports the encoding” mean? Applications cannot select the
locale they run in, aside from requesting the “C” or “POSIX” locale.
This isn't so. First
On Thu, Mar 29, 2007 at 12:01:28PM +0200, Egmont Koblinger wrote:
On Wed, Mar 28, 2007 at 02:35:32PM -0400, Rich Felker wrote:
matches or not _does_ depend on the character set that you use. It's not
perl's flaw that it couldn't decide, it's impossible to decide in theory
unless you
On Thu, Mar 29, 2007 at 12:24:43PM +0200, Egmont Koblinger wrote:
On Wed, Mar 28, 2007 at 05:57:35PM -0400, SrinTuar wrote:
The regex library can ask the locale what encoding things are in, just
like everybody else
The locale tells you which encoding your system uses _by default_. This
On Thu, Mar 29, 2007 at 07:15:37PM +0200, Egmont Koblinger wrote:
or failing that ask the programmer to explicitly qualify them as one of
its supported encodings. I do not think the strings should have built in
machinery that does this work behind the scenes implicitly.
If you have the
On Wed, Mar 28, 2007 at 07:49:57PM +0200, Egmont Koblinger wrote:
matches or not _does_ depend on the character set that you use. It's not
perl's flaw that it couldn't decide, it's impossible to decide in theory
unless you know the charset.
It is perl's flaw. The LC_CTYPE category of the
On Wed, Mar 28, 2007 at 02:24:26PM -0500, David Starner wrote:
On 3/27/07, Rich Felker [EMAIL PROTECTED] wrote:
On Tue, Mar 27, 2007 at 06:44:42PM -0500, David Starner wrote:
On 3/27/07, Rich Felker [EMAIL PROTECTED] wrote:
This is one of the very few
places where a computer should ever
On Wed, Mar 28, 2007 at 10:39:49PM -0400, Daniel B. wrote:
Well of course you need to think in bytes when you're interpreting the
stream of bytes as a stream of characters, which includes checking for
invalid UTF-8 sequences.
And what do you do if they're present?
Of course, it
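[Editorial note: the usual options when invalid sequences are present are to reject the input outright or to substitute U+FFFD and continue. A quick Python sketch, not from the thread:]

```python
data = b'abc\xff\xfedef'   # 0xFF and 0xFE can never occur in valid UTF-8

# Option 1: reject the input, reporting where it went wrong.
try:
    data.decode('utf-8')
except UnicodeDecodeError as e:
    first_bad = e.start    # offset of the first invalid byte

# Option 2: replace each invalid byte with U+FFFD and carry on.
cleaned = data.decode('utf-8', errors='replace')

assert first_bad == 3
assert cleaned == 'abc\ufffd\ufffddef'
```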
On Wed, Mar 28, 2007 at 11:05:56PM -0400, Daniel B. wrote:
wrote:
2007/3/28, Egmont Koblinger [EMAIL PROTECTED]:
...f you only handle _texts_ then
probably the best approach is to convert every string as soon as they
arrive
at your application to some Unicode
On Mon, Mar 26, 2007 at 05:28:43PM -0400, SrinTuar wrote:
I frequently run into problems with utf-8 in perl, and I was wondering
if anyone else
had encountered similar things.
[...]
Can we get back on-topic with this, and look for solutions to the
problems? Maybe Larry has some thoughts for us?
On Thu, Mar 29, 2007 at 12:41:06AM -0400, William J Poser wrote:
[EMAIL PROTECTED] has made several claims about writing systems
for indigenous languages that I, as a linguist with a strong
interest in writing systems and substantial experience working
with indigenous people, not only as a
On Tue, Mar 27, 2007 at 06:31:11PM +0200, Egmont Koblinger wrote:
On Tue, Mar 27, 2007 at 11:16:58AM -0400, SrinTuar wrote:
That would be contradictory to the whole concept of Unicode. A
human-readable string should never be considered an array of bytes, it is
an
array of characters!
On Tue, Mar 27, 2007 at 06:44:42PM -0500, David Starner wrote:
On 3/27/07, Rich Felker [EMAIL PROTECTED] wrote:
This is not a simple task at all, and in fact it's a task that a
computer should (almost) never do...
Of course. Why shouldn't an editor go through and change 257 headings
On Tue, Mar 27, 2007 at 10:07:11PM -0400, Daniel B. wrote:
wrote:
That would be contradictory to the whole concept of Unicode. A
human-readable string should never be considered an array of bytes, it is
an
array of characters!
Hrm, that statement I think I would
On Tue, Mar 27, 2007 at 11:53:15PM -0400, SrinTuar wrote:
2007/3/27, Daniel B. [EMAIL PROTECTED]:
What about when it breaks a string into substrings at some delimiter,
say, using a regular expression? It has to break the underlying byte
string at a character boundary.
Unless you pass
On Mon, Mar 26, 2007 at 05:28:43PM -0400, SrinTuar wrote:
I frequently run into problems with utf-8 in perl, and I was wondering
if anyone else
had encountered similar things.
One thing I've noticed is that when processing characters, I often get
warnings about
wide characters in print, or
On Sun, Mar 18, 2007 at 08:41:48AM -0700, Ben Wiley Sittler wrote:
awesome, and thank you! however, utf-8 filenames given on the command
line still do not work... the get turned into iso-8859-1, which is
then utf-8 encoded before saving (?!)
here's my (partial) utf-8 workaround for emacs so
On Fri, Mar 16, 2007 at 07:16:55PM -0700, Ben Wiley Sittler wrote:
I believe it's more DHTML that is the problem.
DOMString is specified to be UTF-16. Likewise for ECMAScript strings,
IIRC, although they may still be officially UCS-2.
Indeed, this was what I was thinking of. Thanks for
On Sat, Mar 17, 2007 at 07:05:01AM +, Colin Paul Adams wrote:
I can't find this in the GNOME help, so I thought I'd try asking here.
I want to rename a file so it has an a-umlaut (lower case) in the
name.
My LANG is en_GB.UTF-8.
I don't know how to type the accented character.
On Sat, Mar 17, 2007 at 09:51:53AM -0700, Ben Wiley Sittler wrote:
emacs seems not to handle utf-8 filenames at all, regardless of locale.
(setq file-name-coding-system 'utf-8)
~Rich
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
On Sat, Mar 17, 2007 at 08:25:43AM +, Colin Paul Adams wrote:
Now this is where it gets interesting.
My URI resolver translates the file name (the URI is relative to a
base file: URI) into a UTF-8 byte sequence which gets passed to the
fopen call (the program is supposed to work on other
On Wed, Mar 14, 2007 at 02:01:04PM -0700, Rob Cameron wrote:
As part of my research program into high-speed XML/Unicode/text
processing using SIMD techniques, I have experimented extensively
with the UTF-8 to UTF-16 conversion problem. I've generally been
comparing performance of my
on
leveraging patents against them.
Sincerely,
Rich Felker
On Thu, Mar 15, 2007 at 11:43:51AM -0700, Rob Cameron wrote:
Rich,
I would agree that the abuse of software patents is fundamentally
wrong and that patent reform is highly overdue. I am doing
something about it.
The use of software patents is the abuse of software patents. There is
On Thu, Mar 15, 2007 at 01:28:58PM -0700, Rob Cameron wrote:
Simon,
You asked about relevance. The UTF-8 to UTF-16 bottleneck
is widely cited in literature on XML processing performance.
And why would you do this? Simply keep the data as UTF-8. There's no
good reason for using UTF-16 at
On Thu, Mar 08, 2007 at 10:18:55PM -0500, Daniel B. wrote:
wrote:
I have yet to encounter a case where a character count is useful.
Well, if an an editor the user tries to move forward three characters,
you probably want to increment a character count (an offset from
the
On Thu, Mar 01, 2007 at 09:41:44AM +0100, Marcel Ruff wrote:
Are you thinking of Java's _modified_ version of UTF-8
(http://en.wikipedia.org/wiki/UTF-8#Java)?
Uhg, disgusting...
Yes - this is an open serious issue for my approach!
Has anybody some practical advice on this?
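[Editorial note: to make the issue concrete — Java's modified UTF-8 encodes U+0000 as the overlong two-byte sequence C0 80, so serialized strings never contain a zero byte, but a strict UTF-8 decoder must reject that form. A quick check in Python:]

```python
# Standard UTF-8: NUL is a single zero byte.
assert '\x00'.encode('utf-8') == b'\x00'

# Java's modified UTF-8 writes the overlong form C0 80 instead; strict
# decoders reject overlong sequences as malformed.
rejected = False
try:
    b'\xc0\x80'.decode('utf-8')
except UnicodeDecodeError:
    rejected = True
assert rejected
```

So data written by Java's DataOutput cannot simply be handed to a strict UTF-8 consumer; it has to be re-encoded first.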
On Thu, Mar 01, 2007 at 07:53:52PM +0100, Marcel Ruff wrote:
Are you thinking of Java's _modified_ version of UTF-8
(http://en.wikipedia.org/wiki/UTF-8#Java)?
The first sentence from the above wiki says:
In normal usage, the Java programming language
On Tue, Feb 27, 2007 at 07:49:17PM -0500, Daniel B. wrote:
Marcel Ruff wrote:
As UTF-8 may not contain '\0' ...
Yes it can.
No, I think he just meant to say a string of non-NUL _characters_ may
not contain a 0 _byte_. The NUL character is not valid text or a
valid part of a string
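[Editorial note: this property is easy to verify exhaustively — the UTF-8 encoding of every scalar value other than U+0000 contains no zero byte, which is exactly why NUL-terminated C strings remain safe:]

```python
# Every Unicode scalar value except NUL encodes without a zero byte.
for cp in range(1, 0x110000):
    if 0xD800 <= cp <= 0xDFFF:   # surrogates are not Unicode scalar values
        continue
    assert 0 not in chr(cp).encode('utf-8')
```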
On Tue, Feb 27, 2007 at 09:49:50AM -0500, SrinTuar wrote:
On Mon, Feb 26, 2007 at 03:35:05PM +0100, Stephane Bortzmeyer wrote:
Old code doesn't need to be ported.
Very strange advice, indeed.
You might want to read up on the history of UTF-8.
Here are some references for anyone wanting
On Mon, Feb 26, 2007 at 03:35:05PM +0100, Stephane Bortzmeyer wrote:
On Mon, Feb 26, 2007 at 08:10:59AM +0100,
Marcel Ruff [EMAIL PROTECTED] wrote
a message of 65 lines which said:
As UTF-8 may not contain '\0' you can simply use all functions as
before (strcmp(), std::string etc.).
On Sat, Feb 24, 2007 at 06:13:37PM +0100, Julien Claassen wrote:
Hi!
What I meant about UTF-8-strings in C++: I mean in C and C++ they're not
standard like in Java.
UTF-16, used by Java, is also variable-width. It can be either 2 bytes
or 4 bytes per character. Support for the characters
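[Editorial note: the variable width is easy to demonstrate — BMP characters take one 16-bit code unit, while characters above U+FFFF need a surrogate pair, i.e. two units, four bytes:]

```python
# 'A' (U+0041) is one UTF-16 code unit; U+1D11E (MUSICAL SYMBOL G CLEF)
# needs a surrogate pair D834 DD1E.
assert len('A'.encode('utf-16-be')) == 2
assert len('\U0001D11E'.encode('utf-16-be')) == 4
assert '\U0001D11E'.encode('utf-16-be') == b'\xd8\x34\xdd\x1e'
```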
On Sat, Feb 24, 2007 at 01:39:25AM -0500, Rich Felker wrote:
using luit for this sounds appealing, but in my experience luit (a)
crashes frequently and (b) is easily confused by escape sequences and
has no user interface for resetting all its iso-2022 state, so in
practice it works
These days we have at least xterm, urxvt, mlterm, gnome-terminal, and
konsole which support utf-8 fairly well, but on the flip side there's
still a huge number of terminal emulators which do not respect the
user's encoding at all and always behave in a legacy-8bit-codepage
way.
Trying to help
On Fri, Feb 23, 2007 at 04:24:29PM -0800, Ben Wiley Sittler wrote:
just two cents: i did this some years back for the links and elinks
web browsers (it's the utf-8 i/o option available in some versions
FWIW: ELinks has since been fixed (in the development versions, not
yet released but working
On Mon, Feb 19, 2007 at 06:49:20PM +0100, Julien Claassen wrote:
Hello!
I've got one question. I'm writing a library in C++, which needs to handle
different character sets. I suppose for internal purposes UTF-8 is quite
sufficient. So is there a standard string class in the libstdc++ which
On Tue, Dec 12, 2006 at 08:56:06PM +0600, Christopher Fynn wrote:
Rich Felker wrote:
Whether it's possible to support all combinations efficiently, I don't
know. The OpenType system is very poorly designed from what I can
tell. In the Tibetan fonts I've examined, rather than just saying
On Mon, Dec 11, 2006 at 05:49:22PM +0100, Andries Brouwer wrote:
On Mon, Dec 11, 2006 at 05:06:23PM +0100, Jan Willem Stumpel wrote:
I am beginning to think that the responsibility for correct
combining accents behaviour rests primarily with the rendering
engine, rather than with the
On Thu, Dec 07, 2006 at 01:36:01PM +0900, Jiro SEKIBA wrote:
At Thu, 07 Dec 2006 03:17:32 +0100,
Mirco Bakker wrote:
The programm (written in C) uses only the standard Xlib. The
writing is done using XmbDrawString() (AFAIK function of choice).
I also tried Xutf8DrawString
On Wed, Dec 06, 2006 at 10:06:09PM -0500, Michael B Allen wrote:
Two things. First, I believe Pango is becoming the defacto method for
rendering non-Latin1 text in general purpose applications (I've never
I'm hoping we can remedy this situation. Xft/pango is extremely slow
compared to the core
On Sat, Nov 18, 2006 at 07:43:54PM +0530, Balaji.Ramdoss wrote:
Folks, well this is not on linux. I have an issue on a Sun Solaris box
where octal values get displayed instead of symbols like ^,| as
\136, \075.
This happens if I set my LC_CTYPE to en_US.UTF-8 locale and I have the
set
On Tue, Nov 07, 2006 at 01:13:24AM -0800, rajeev joseph sebastian wrote:
Well, I think I misunderstood ...
No problem.
---
In the first para, I asked whether it was possible to use TrueType
in the terminal. If we cannot, then we need to use some hybrid of
bitmap fonts and OT fonts,
On Mon, Nov 06, 2006 at 10:14:20AM -0800, rajeev joseph sebastian wrote:
I can say that you have done a good job. My point has so far been
that some kind of special font system should be created. In any
case, the use of straight TTF or OTF is not possible. (is it?). in
that case, it may be
On Sun, Nov 05, 2006 at 12:59:03PM -0800, rajeev joseph sebastian wrote:
Well, most correctly implemented Unicode-aware applications do this also:
have 2 backing stores, one for text and the other for glyphs. Use
the glyph representation for display. When a selection is done, the
map between
On Wed, Nov 01, 2006 at 01:34:14PM +0600, Christopher Fynn wrote:
Yes, Indic scripts like Malayalam need specific console fonts. I think
for console applications legibility is more important than beauty.
Why not use the typefaces used in old-fashioned Indian typewriters as a
starting
On Tue, Oct 31, 2006 at 09:37:34AM -0800, rajeev joseph sebastian wrote:
Hi Rich Felker,
I find your work to provide support for Indic text on
console/terminal to be admirable, and yes, any kind of display is
far better than none at all (and I do not consider your statement
insulting
On Mon, Oct 30, 2006 at 04:17:54AM -0800, rajeev joseph sebastian wrote:
Hello Rich Felker,
It is impossible to fit Malayalam glyphs into a given width class,
if you want even barely aesthetic text. This is because a given
sequence of Unicode characters may map into somewhat different
Sorry I originally replied off-list to Bruno because the list mail was
slow coming thru and I thought he was just mailing me in private..
On Mon, Oct 16, 2006 at 05:38:45PM -0700, Ben Wiley Sittler wrote:
just tried this in a few terminals, here are the results:
GNOME Terminal 2.16.1:
U+0D30
Working on uuterm[1], I've run into a problem with the characters
0D4A-0D4C and possibly others like them, in regards to wcwidth(3)
behavior. These characters are combining marks that attach on both
sides of a cluster, and have canonical equivalence to the two separate
pieces from which they are
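[Editorial note: the canonical equivalence mentioned here can be checked directly — U+0D4A MALAYALAM VOWEL SIGN O decomposes to the pair U+0D46 + U+0D3E, the two pieces that attach on either side of the cluster:]

```python
import unicodedata

# NFD splits the two-part vowel sign into its left and right pieces;
# NFC recomposes them, so the two spellings are canonically equivalent.
assert unicodedata.normalize('NFD', '\u0d4a') == '\u0d46\u0d3e'
assert unicodedata.normalize('NFC', '\u0d46\u0d3e') == '\u0d4a'
```

Any wcwidth-style width assignment therefore has to give the precomposed character and the decomposed sequence the same total advance.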
On Mon, Oct 09, 2006 at 12:37:24PM -0600, Wesley J. Landaker wrote:
On Thursday 05 October 2006 16:03, Rich Felker wrote:
A few comments on Why not just use OpenType??:
- The GSUB model does not adapt well to a character cell device where
characters are organized into cells and where
[cc'ing the list since i think it's relevant]
On Fri, Oct 06, 2006 at 04:55:51PM -0400, Daniel Glassey wrote:
btw there is discussion about trying to integrate as much as possible on
http://live.gnome.org/UnifiedTextLayoutEngine that you might like to
contribute to.
well sadly i think the
After much work, I finally have a working (but still experimental)
version of uuterm and the ucf bitmap font format I proposed in
August. Source for uuterm is browsable at
http://svn.mplayerhq.hu/uuterm/ and a sample ucf font is linked from
the included README.
Since ucf is probably more
On Tue, Sep 05, 2006 at 12:57:08AM -0500, David Starner wrote:
On 9/5/06, Rich Felker [EMAIL PROTECTED] wrote:
In all seriousness, though, unless you're dealing with image, music,
or movie files, text weighs in quite heavy in size.
As opposed to what? The vast majority of content is one
On Tue, Sep 05, 2006 at 08:07:14AM -0600, Mark Leisher wrote:
Rich Felker wrote:
On Mon, Sep 04, 2006 at 08:19:02PM -0600, Mark Leisher wrote:
My last gasp on this conversation: I don't think you really understand
what you are talking about and won't until you get some hands-on
experience
On Mon, Sep 04, 2006 at 08:19:02PM -0600, Mark Leisher wrote:
Rich Felker wrote:
It went farther because it imposed language-specific semantics in
places where they do not belong. These semantics are correct with
sentences written in human languages which would not have been hard
On Mon, Sep 04, 2006 at 11:44:26PM -0500, David Starner wrote:
On 9/1/06, Rich Felker [EMAIL PROTECTED] wrote:
IMO the answer is common sense. Languages that have a low information
per character density (lots of letters/marks per word, especially
Indic) should be in 2-byte range and those
On Fri, Sep 01, 2006 at 04:32:40PM +1000, George W Gerrity wrote:
I did try to tell you that doing a terminal emulation properly would
be complex. I don't know if the algorithm is broken: I doubt it. But
it is difficult getting it to work properly and it essentially
requires internal
On Fri, Sep 01, 2006 at 09:36:44AM -0600, Mark Leisher wrote:
Rich Felker wrote:
If that were the problem it would be trivial. The problems are much
more fundamental. The key examples you should look at are things like:
printf("%s %d %d %s\n", string1, number2, number3, string4); where
On Fri, Sep 01, 2006 at 03:46:44PM -0600, Mark Leisher wrote:
Did it ever occur to you that it wasn't the word processing mentality
of the Unicode designers that led to ambiguities surviving in plain
text? It is simply the fact that there is no nice neat solution. Unicode
went farther than
I read an old thread on the XFree86 i18n list started by Markus Kuhn
suggesting (rather strongly) that bidi should not be supported at the
terminal level, as well accusations (from other sources) by the author
of Yudit that UAX#9 bidi algo results in serious security issues due
to the
On Fri, Aug 18, 2006 at 06:25:16AM +0200, Werner LEMBERG wrote:
I've now received an answer from Dr. Oliver Korff, an expert for
Mongolian who has written MonTeX. Here a rough translation; see below
for the German version.
Classical Mongolian _can_ be written horizontally if you have
On Fri, Aug 18, 2006 at 03:39:17AM -0700, rajeev joseph sebastian wrote:
Hello Rich Felker,
start quote
1. Does any existing character cell application (terminal emulator)
both display correctly-rendered Indic text and conform to WI1, i.e.
does it update column position
On Sun, Aug 06, 2006 at 07:34:16AM -0400, Chris Heath wrote:
To my knowledge there is still no official standard as to which
characters have which width, but POSIX specifies the function used to
obtain the width of each character (and defines the results as
'locale-specific'), and Markus
On Fri, Aug 04, 2006 at 02:16:04PM +1000, George W Gerrity wrote:
Actually, that is what I was opposing. But any solution to console
representation has to handle three things together -- localisation,
internationalisation, and multilingualisation -- or
On Fri, Aug 04, 2006 at 09:04:43AM +0200, Werner LEMBERG wrote:
With my proposed context system it doesn't save but a few bytes
total in the font file since the context rules can be shared by all
the characters that need them.
Details, please.
I've got an email I was preparing to send
On Sat, Aug 05, 2006 at 09:11:39AM +0200, Werner LEMBERG wrote:
A terminal is a character-cell device, with fixed-width character
cells. This is not open to discussion, but fear not, it's not a
problem!
Actually, this limitation makes some things more complicated, because
you have to
To follow up on my original proposal and some of the alterations and
simplifications I've made as a result of these discussions and
discussions with other people outside of this list, here's a summary
of the problem I'm trying to solve and how I plan to solve it:
Practical problems:
- no
On Sat, Aug 05, 2006 at 11:11:02AM +0200, Werner LEMBERG wrote:
BTW another issue of the substitution rules is that, as far as I can
tell, they can delete or insert extra glyphs arbitrarily.
Of course. How would you handle a ligature? `f' + `l' = `fl' -- this
means that a character has
On Fri, Aug 04, 2006 at 02:05:00PM +1000, Russell Shaw wrote:
Subpixel only works on LCDs, which produce ugly output.
I think sub-pixel rendering also works for a crt, but a sudden change
in pixel value (such as the edge of a black square on a white background)
is smeared (convolved with the
On Thu, Aug 03, 2006 at 08:41:35AM +0200, Werner LEMBERG wrote:
What about using bitmap-only TrueType fonts, as planned by the X
Windows people?
Could you direct me to good information? I have serious doubts but
I'd at least like to read what they have to say.
On Thu, Aug 03, 2006 at 03:40:29PM +1000, George W Gerrity wrote:
Please. Let's not have yet another *NIX font encoding and presenting
scheme! Why don't you set up a team to rationalise the existing
encodings and presentation methods.
This is the sort of mentality that sickens me. Please
On Thu, Aug 03, 2006 at 03:07:09PM +1000, Russell Shaw wrote:
Rich Felker wrote:
... snip long stuff
I agree on the total crappiness of current mainstream GUI implementations.
Thanks. It's refreshing to have some support from the non-bloat crowd
in m17n issues. Usually there's the standard
On Fri, Aug 04, 2006 at 03:46:29AM +1000, Russell Shaw wrote:
One possible approach I've considered is having the client application
provide an X font server to serve its own fonts, the sole purpose
being to allow them to be cached on the server side. The same thing
can be done with serverside
On Fri, Aug 04, 2006 at 01:30:34AM +0200, Werner LEMBERG wrote:
What you probably mean is that some language data needs to be
preprocessed into a normalized form before it is fed into the
font, for example Indic and Arabic scripts.
What sort of preprocessing? Reordering vowels?
A revised, simplified file format proposal based on my original
sketch, some of Markus's ideas for NCF, and an evaluation of which
optimizations were likely to benefit actual font data.
Definitions:
All numeric fields are variable length coded, using the high bit of
each byte as a continuation
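[Editorial note: a sketch of that variable-length scheme in Python, assuming LEB128-style little-endian 7-bit groups; the excerpt only specifies the continuation bit, not the byte order:]

```python
def encode_varint(n: int) -> bytes:
    # Emit 7 bits per byte, low bits first; set the high bit on every
    # byte except the last to mark continuation.
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(data: bytes) -> int:
    # Accumulate 7-bit groups until a byte with the high bit clear.
    result = shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            break
    return result

assert encode_varint(300) == b'\xac\x02'
assert decode_varint(encode_varint(300)) == 300
```

Values below 128 stay a single byte, which is the point of using such coding for small numeric fields in a font file.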
On Thu, Aug 03, 2006 at 12:21:56AM +0200, Werner LEMBERG wrote:
A revised, simplified file format proposal based on my original
sketch, some of Markus's ideas for NCF, and an evaluation of which
optimizations were likely to benefit actual font data.
What about using bitmap-only TrueType
To Markus et al.:
I read in the ancient archives for this list some ideas regarding a
so-called next generation console font, supporting unicode level-3
combining in a character cell environment. I'm presently working on a
new terminal emulator called uuterm (think of the uu as µ-ucs or