On Wednesday, October 24, 2012 04:54:36 Jonathan M Davis wrote:
> That being said, there _is_ a bug in commonPrefix that I just noticed when
> looking it over. It currently operates on code units rather than code
> points. It can operate on strings just fine like it's doing now (even
> returning a slice), but it needs to decode the code points as it iterates
> over them, and it's not doing that.
Wait. No. I think that it's (mostly) okay. I was thinking that you could have different sequences of code units which resolve to the same code point, and upon reflection, I don't think that you can (in valid UTF-8 and UTF-16, each code point has exactly one encoding). It's graphemes which can be represented by multiple sequences of code points, not code points which can be represented by multiple sequences of code units (Unicode is overly confusing, to say the least).

There's still an issue with the predicate, though (hence the "mostly" above). If anything _other_ than == or != is used, then the code units would have to be decoded in order to pass dchars to the predicate. So, commonPrefix should be fine as-is in all cases except when a custom predicate is given and it's operating on narrow strings.

- Jonathan M Davis
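As a minimal sketch of the decoding that the custom-predicate case needs, here is a hypothetical `commonPrefixDecoded` (not Phobos' actual `commonPrefix`). It assumes both inputs are valid UTF-8 `string`s, decodes one code point at a time with `std.utf.decode`, hands `dchar`s to the predicate, and still returns a slice of the first string:

```d
import std.functional : binaryFun;
import std.utf : decode;

// Hypothetical helper, not Phobos' commonPrefix: returns the longest common
// prefix of two UTF-8 strings as a slice of `a`, decoding both strings so
// that `pred` receives whole code points (dchars) rather than code units.
string commonPrefixDecoded(alias pred = "a == b")(string a, string b)
{
    alias eq = binaryFun!pred;
    size_t i = 0, j = 0; // byte offsets into a and b
    while (i < a.length && j < b.length)
    {
        size_t ni = i, nj = j;
        dchar ca = decode(a, ni); // decode advances ni past one code point
        dchar cb = decode(b, nj);
        if (!eq(ca, cb))
            break;
        i = ni; // commit only after a full code point has matched
        j = nj;
    }
    return a[0 .. i];
}

unittest
{
    import std.uni : toLower;

    assert(commonPrefixDecoded("hello", "help") == "hel");
    // Case-insensitive predicate: toLower needs whole code points.
    assert(commonPrefixDecoded!((x, y) => toLower(x) == toLower(y))("ÄBCd", "äbcX") == "ÄBC");
}
```

Because `i` only advances past whole code points, the returned slice always ends on a code point boundary. Comparing code points still treats a precomposed "é" (U+00E9) and "e" followed by U+0301 as different, since that equivalence lives at the grapheme level.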