On Thu, Nov 17, 2011 at 10:52 PM, Marvin Humphrey <[email protected]> wrote:
>
> OK, I remain at least academically interested in what sort of performance
> advantages 'simple' case folding affords us, and at what penalty in terms of
> relevancy.
>

I think it depends on how it's implemented; I'm not sure there is really a
performance advantage to the simpler one. In ICU at least, the recursive part
of nfkc_cf is computed up front, into the data files, and you get
normalization plus case folding at runtime in one pass (versus utf8proc's
multiple passes, and it's not clear that all the corner cases work there).

As far as relevance goes, I think realistically only German users (ß/SS) or
anyone working with Ancient Greek would care if you cheated and used the
simple one instead, especially if you are already normalizing anyway.

But that was just my point: if you are normalizing anyway, why not just choose
a normalization form that does the case folding too?

--
lucidimagination.com
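To make the ß/SS point concrete, here is a minimal Python sketch. It is not ICU's nfkc_cf and not utf8proc; it uses only the standard library and does the multi-pass variant (normalize, fold, normalize, fold) in the shape of Unicode's compatibility caseless matching. The helper names fold_full and fold_simple_like are made up for illustration, and str.lower() is only an assumed, rough stand-in for 'simple' case folding, which Python does not expose directly.

```python
import unicodedata


def fold_full(text: str) -> str:
    """Normalization plus full case folding, done as separate passes.

    Shape: NFKC(casefold(NFKC(casefold(NFD(text))))).
    Unlike a precomputed NFKC_Casefold table, this walks the string
    several times and does not remove default-ignorable code points.
    """
    text = unicodedata.normalize("NFD", text)
    for _ in range(2):
        # str.casefold() applies full case folding (e.g. U+00DF -> "ss").
        text = unicodedata.normalize("NFKC", text.casefold())
    return text


def fold_simple_like(text: str) -> str:
    """Assumption: str.lower() approximates 'simple' folding for these samples."""
    return unicodedata.normalize("NFKC", text.lower())


if __name__ == "__main__":
    pairs = [("Stra\u00dfe", "STRASSE"), ("\ufb01le", "FILE")]
    for a, b in pairs:
        print(
            repr(a),
            repr(b),
            "full:", fold_full(a) == fold_full(b),
            "simple-like:", fold_simple_like(a) == fold_simple_like(b),
        )
```

With full folding the two spellings of "Straße" compare equal; with the simple-like stand-in they do not, which is the relevance cost being discussed.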
