On Fri May 19 19:13:39 CDT 2006, [EMAIL PROTECTED] wrote:
> > no.  the unicode sequences (e.g. U+0069 U+0361) are correct.
> > i checked this and several other examples with the actual books.
> 
>   How did you check it ? Visual inspection ? 

since these were actual books, i know of no other way. ;-)

>   Since I'm no expert
>   in UNICODE I'm quite curious to know how one is supposed to
>   tell between a real character and a combination of a diacritic
>   and some other character when they are visually indistinguishable ?

say i have a random accented letter.  suppose that U+x is the cp for
the letter.  suppose U+y is the cp for the accent.  suppose that we're lucky
and there exists U+w ≡ U+xU+y.  then U+w should be the same glyph
as U+xU+y.

canonical composition would yield
        compose(U+xU+y)         U+w
        compose(U+w)            U+w
while canonical decomposition would yield
        decompose(U+xU+y)       U+xU+y
        decompose(U+w)          U+xU+y
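
as a rough sketch, the table above can be checked with python's standard
unicodedata module, assuming NFC stands in for canonical composition and
NFD for canonical decomposition.  the concrete pair (U+0065 U+0301 for
U+xU+y, U+00E9 for U+w) is picked only for illustration, and compose,
decompose and cps are hypothetical helper names, not anything from the
thread.

```python
import unicodedata

def compose(s):
    # canonical composition (normalization form NFC)
    return unicodedata.normalize("NFC", s)

def decompose(s):
    # canonical decomposition (normalization form NFD)
    return unicodedata.normalize("NFD", s)

def cps(s):
    # list the codepoints of a string as U+XXXX
    return " ".join(f"U+{ord(c):04X}" for c in s)

pair = "\u0065\u0301"    # U+x U+y: e followed by combining acute accent
single = "\u00e9"        # U+w: precomposed e with acute

for name, f in (("compose", compose), ("decompose", decompose)):
    for s in (pair, single):
        print(f"{name}({cps(s)})\t{cps(f(s))}")

# i followed by U+0361 has no precomposed codepoint, so composition
# leaves it as two codepoints
nopre = "\u0069\u0361"
print(f"compose({cps(nopre)})\t{cps(compose(nopre))}")
```

both spellings normalize to the same sequence under either form, which
is the sense in which they are the same character.  the last line shows
the unlucky case where no U+w exists.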


>   I would expect unicode to always favor single glyphs from a particular 
>   page over anything else.

it's always a single glyph.  don't confuse letters, codepoints, and glyphs.

> 
>   btw, could you send me a .png with the actual title ?

i'll send you a png of the character.  i don't have the books.

what language rule are you trying to get at?

- erik

> 
> > i think you misunderstand how unicode works.  
> 
>   That could very well be the case ;-) But I know how Russian language
>   works regardless of what committee members think.
> 
> > a base cp like U+0069 followed by a combining cp like U+0361 
> > make a single character.  this identification is called "composition".
> > unicode contains some precomposed cps, but not U+0069 U+0361.
> 
>   That's ok. My only point is -- I would expect anybody who enters 
>   titles into a database adhere to the rules of the language the
>   title is written in. Maybe it's too much to expect, though.
> 
> Thanks,
> Roman.
> 
