i was really surprised by this, too.
i did a bit of work for a company that does searchable literature a couple
of months ago. they were having trouble with "bad unicode". the problem
was stuff like this:
CA: Corporate Author
Nizhegorodskai͡a͡ gosudarstvennai͡a͡
selʹskokhozi͡a͡ĭstvennai͡a͡ akademii͡a͡
the character that probably doesn't look right is a combining double breve.
it's actually good data. i tracked down the cover of this book and it's really
spelled like that.
the problem is that the unicode folk didn't have the foresight to include
precomposed codepoints for stuff like this, so it only exists as a
base letter plus combining marks.
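
a rough sketch of the difference, in python rather than anything the
company was actually running. it assumes the mark in the record above is
U+0361 (combining double inverted breve, the tie used in library
romanization); the helper name compose is made up for illustration.
normalizing to NFC folds a + combining acute into the single codepoint á,
but leaves i + tie + a as three codepoints because there is nothing
precomposed to fold it into.

```python
import unicodedata

def compose(s):
    # hypothetical helper: apply canonical composition (NFC)
    return unicodedata.normalize("NFC", s)

# 'a' followed by U+0301 COMBINING ACUTE ACCENT
a_acute = compose("a\u0301")

# 'i', then U+0361 (assumed to be the mark in the record), then 'a'
ia_tie = compose("i\u0361a")

print(len(a_acute), [hex(ord(c)) for c in a_acute])
print(len(ia_tie), [hex(ord(c)) for c in ia_tie])
```

so software that assumes one codepoint per visible letter will handle
the first case and trip over the second, even though the data is fine.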
- erik
On Fri May 19 17:05:24 CDT 2006, [EMAIL PROTECTED] wrote:
> isn't there enough space to keep all of them there?
>
> On 5/19/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > á is a single codepoint. sure. but there are useful letters that don't
> > exist in unicode unless they are composed. e.g. romanized russian,
> > accented cyrillic, etc.
> >
> > - erik
> >
> > On Fri May 19 17:00:38 CDT 2006, [EMAIL PROTECTED] wrote:
> > > I think that á is just a single rune, not two different ones composed. If
> > > you have to type several keys to type them, it's just a keyboard issue,
> > > isn't it? I don't understand why this should go to an upper layer. Is there
> > > any other problem? (besides having to use utf8 for i/o, I mean).