Hello everybody (long-time reader, first-time poster),

I came across an interesting post a few days ago describing a method for counting the number of characters in a UTF-8 string: http://www.daemonology.net/blog/2008-06-05-faster-utf8-strlen.html
Rather than checking byte by byte, it loads as many bytes as fit into the word size of the system and works on them in parallel.

Does anyone know any good resources that can help explain how one might come up with lines of code like the following two?

    u = ((u & (ONEMASK * 0x80)) >> 7) & ((~u) >> 6);
    count += (u * ONEMASK) >> ((sizeof(size_t) - 1) * 8);

In particular, I'm not sure how the multiplications fit in. Taking "ONEMASK * 0x80" as a constant, the first line is pretty straightforward, but I haven't a clue about the second.

Apologies if I've managed to push the entire content of one of my early CS classes out of my head. :)

Thanks!
Mark
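For anyone following along, here is a minimal sketch of how I currently read the two lines, assuming ONEMASK is the constant 0x0101...01 (one set bit at the bottom of every byte of a size_t). The function names are my own and purely illustrative; they are not from the linked post. The first step leaves a 1 in the low bit of every byte that looks like a UTF-8 continuation byte (10xxxxxx: bit 7 set, bit 6 clear). For the second step, my understanding is that multiplying by 0x0101...01 is the same as adding u + (u << 8) + (u << 16) + ..., so the top byte of the product ends up holding the sum of all the bytes, and the final shift pulls that top byte out. Since each byte is 0 or 1, the sum is at most sizeof(size_t) and no byte overflows into its neighbour.

```c
#include <stddef.h>
#include <string.h>

/* Assumed definition: 0x0101...01, i.e. a 1 in the lowest bit of every byte. */
#define ONEMASK ((size_t)(-1) / 0xFF)

/* Step 1 (first quoted line): after this, each byte of the result is 1 if the
 * corresponding byte of u is a UTF-8 continuation byte (10xxxxxx), else 0.
 * (u & 0x8080...80) >> 7 moves bit 7 of each byte down to bit 0;
 * (~u) >> 6 moves the inverse of bit 6 down to bit 0; the AND keeps bytes
 * where bit 7 is set and bit 6 is clear. */
static size_t mark_continuation_bytes(size_t u)
{
    return ((u & (ONEMASK * 0x80)) >> 7) & ((~u) >> 6);
}

/* Step 2 (second quoted line): horizontal sum of the bytes. Multiplying by
 * 0x0101...01 adds shifted copies of marks, so the most significant byte of
 * the product is the sum of all bytes; the shift extracts that byte.
 * Only valid while the sum fits in one byte, which holds for 0/1 marks. */
static size_t sum_bytes(size_t marks)
{
    return (marks * ONEMASK) >> ((sizeof(size_t) - 1) * 8);
}

/* Hypothetical helper: count continuation bytes in the sizeof(size_t) bytes
 * starting at p. memcpy avoids alignment and aliasing problems. */
static size_t count_continuation_in_word(const unsigned char *p)
{
    size_t u;
    memcpy(&u, p, sizeof u);
    return sum_bytes(mark_continuation_bytes(u));
}
```

If this reading is right, the number of characters in a word's worth of bytes is the byte count minus the number of continuation bytes, which would explain why the loop only needs to accumulate this one count per word.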
