On Wednesday, 27 November 2013 at 16:18:34 UTC, Wyatt wrote:
I agree with the assertion that people SHOULD know how Unicode
works if they want to work with it, but the way our docs are
now is off-putting enough that most probably won't learn
anything. If they know, they know; if they don't, the wall of
jargon is intimidating and hard to grasp (it would help to have
more examples up front of the things you'd actually use std.uni
for).
Even though I'm decently familiar with Unicode, I was having
trouble following all that (e.g. Isn't "noe\u0308l" a grapheme
cluster according to std.uni?). On the flip side, std.utf has
a serious dearth of examples and the relationship between the
two isn't clear.
I thought it was nice that std.uni had a proper terminology
section, complete with links to Unicode documents to kick-start
beginners to Unicode. It mentions its relationship with std.utf
right at the top.
Maybe the first paragraph is just too thin, and it's hard to see
the big picture. Maybe it should include a small leading
paragraph detailing the three levels of Unicode granularity that
D/Phobos chooses: arrays of code units -> ranges of code points
-> std.uni for graphemes and algorithms.
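Something along these lines, as a rough and untested sketch,
would make the three levels concrete (using the
combining-diaeresis spelling of "noël"):

import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

void main()
{
    // "noël" spelled as n, o, e, U+0308 (combining diaeresis), l
    string s = "noe\u0308l";

    writeln(s.length);                // 6 -- UTF-8 code units (array of char)
    writeln(s.walkLength);            // 5 -- code points (dchar), via decoding
    writeln(s.byGrapheme.walkLength); // 4 -- graphemes, via std.uni
}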
Yes, please. While operations on single code points and
characters seem pretty robust (i.e. you can do lots of things
with and to them), it feels like it just falls apart when you
try to work with strings. It honestly surprised me how many
things in std.uni don't seem to work on ranges.
-Wyatt
Most string code is Unicode-correct as long as it works on code
points and all inputs are in the same normalization form;
explicit grapheme awareness is rarely a necessity. By that I
mean that the most common string operations, such as searching
or taking a substring, will work without any special grapheme
decoding (beyond normalization).
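For example, roughly (normalize defaults to NFC):

import std.algorithm : canFind;
import std.stdio : writeln;
import std.uni : normalize;

void main()
{
    string precomposed = "no\u00EBl";  // ë as a single code point
    string decomposed  = "noe\u0308l"; // e followed by a combining diaeresis

    // A code-point-level search misses the match while the forms differ...
    writeln(decomposed.canFind(precomposed)); // false

    // ...but works once both sides are normalized -- no grapheme decoding.
    writeln(decomposed.normalize.canFind(precomposed.normalize)); // true
}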
The hiccups appear when code points are shuffled around or
reordered. Apart from such comparatively rare string-manipulation
cases, grapheme awareness is only really necessary in rendering
code.
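The classic example is reversing a string: reversing by code
point detaches a combining mark from its base, while reversing
by grapheme keeps the cluster intact. A quick sketch:

import std.array : array;
import std.conv : text;
import std.range : retro;
import std.stdio : writeln;
import std.uni : byCodePoint, byGrapheme;

void main()
{
    string s = "noe\u0308l"; // "noël" with a combining diaeresis

    // Reversing by code point moves the diaeresis from 'e' onto 'l'.
    writeln(s.retro.text);                              // "l\u0308eon"

    // Reversing by grapheme keeps the mark attached to its base.
    writeln(s.byGrapheme.array.retro.byCodePoint.text); // "le\u0308on"
}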