> So... if you want to help make people more aware of the grapheme_* 
> functions, one place to start would be editing the documentation for the 
> various string, mbstring, and grapheme functions to use consistent 
> terminology, and sign-post each other more clearly. 
> http://doc.php.net/tutorial/

Yes, I agree. I have also edited the documentation before, back in the svn
days, and I already planned to read up on how that works nowadays.

I'm also planning an outline for a conference talk on the subject. I've
educated people on Unicode-related subjects before, and I think I have a few
very good stories that can give unsuspecting developers insight into this.

I love an analogy that most Europeans understand. For the city of Cologne,
there are two equally valid ways to write its German name: Köln and Koeln.
(The latter is used when hindered by technical limitations, or maybe in
informal conversation.) Every German can extra_e_decode() and
extra_e_encode(). The same goes for Straße and Strasse.
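
To make the analogy concrete, here is a minimal sketch in Python (picked only
for brevity). extra_e_encode() is the made-up name from above, not a function
in any real library, and the mapping table is my own assumption. I left out
the decode direction on purpose: a naive reverse mapping would mangle words
like "Poet".

```python
import unicodedata

# Hypothetical mapping table: umlauts and sharp s spelled with plain ASCII.
_EXTRA_E = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def extra_e_encode(text: str) -> str:
    """Rewrite German umlauts and sharp s in their ASCII fallback spelling."""
    # Normalize to NFC first, so that "o" followed by a combining diaeresis
    # is handled the same way as the precomposed "ö".
    return unicodedata.normalize("NFC", text).translate(_EXTRA_E)


print(extra_e_encode("Köln"))    # Koeln
print(extra_e_encode("Straße"))  # Strasse
```

The NFC step is there because the same visible "ö" can arrive as one code
point or as two, which is the same kind of "two valid spellings" problem one
layer further down.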

Ligatures in fonts make it harder, though. Sometimes they intentionally
obfuscate what's happening in the Unicode layer. You might know this from
special programming fonts with glyphs for ===, <> and such.

Some Dutch fonts do a special ligature that combines ij, which was in the
Dutch alphabet when I was a kid; 'y' was not. Unicode U+0132 and U+0133
describe this symbol, but I've never seen them used. Fonts combining ij into
one visual entity are more common.
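
For anyone who wants to poke at those two code points, a small sketch using
Python's standard unicodedata module. The helper name fold_ij() is my own
invention. Note that NFKC folds many other compatibility characters as well,
so this illustrates the relationship and is not a recommendation to normalize
everything this way.

```python
import unicodedata


def fold_ij(text: str) -> str:
    """Hypothetical helper: fold compatibility characters, which includes
    turning U+0132/U+0133 into the two-letter sequences IJ/ij."""
    return unicodedata.normalize("NFKC", text)


ligature = "\u0133"  # the single-code-point ij
print(unicodedata.name(ligature))
print(len(ligature), len(fold_ij(ligature)))

# Canonical normalization (NFC) leaves the ligature alone, so the two
# spellings only compare equal after compatibility folding.
print(unicodedata.normalize("NFC", ligature) == "ij")
print(fold_ij(ligature) == "ij")
```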

I imagine most languages and cultures have these kinds of edge cases.
