On Wed, Mar 14, 2007 at 02:01:04PM -0700, Rob Cameron wrote:
> As part of my research program into high-speed XML/Unicode/text
> processing using SIMD techniques, I have experimented extensively
> with the UTF-8 to UTF-16 conversion problem. I've generally been
> comparing performance of my software with that of iconv under
> Linux and Mac OS X. Are there any substantially faster implementations
> available? Currently our u8u16-0.9 software runs about 3X to 25X faster
> than iconv depending on platform and data characteristics.
GNU iconv is an extremely poor baseline for performance comparisons.
It has high overhead per call (so it will only be remotely fast on
very large runs, not on individual character conversions), and even
then I doubt it would be very fast.
Why not just write the naive conversion algorithm yourself? For the
UTF-8 decoding, refer to uClibc's implementation of mbrtowc for UTF-8
locales, which is probably the fastest I've seen. I also have an
implementation in i386 asm which might be slightly faster.
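To make "naive conversion algorithm" concrete, here is a rough sketch in
portable C. It is not uClibc's mbrtowc code and not the u8u16 method; the
names utf8_decode and naive_u8_to_u16 are made up for illustration. It
assumes the whole input is in memory, writes UTF-16 code units in native
byte order, and simply fails on malformed input rather than substituting
a replacement character.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s (n bytes available) into *cp.
 * Returns the number of bytes consumed, or 0 if the input is
 * invalid or truncated. Hypothetical helper, for illustration only. */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    /* Smallest code point allowed for each sequence length,
     * used to reject overlong encodings. */
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len, i;
    uint32_t c;

    if (n == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }              /* ASCII */
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else return 0;                     /* stray continuation or bad lead byte */

    if (n < len) return 0;             /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;   /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, values past U+10FFFF, and surrogates. */
    if (c < min_cp[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    *cp = c;
    return len;
}

/* Convert inlen bytes of UTF-8 into UTF-16 code units (native byte order).
 * Returns the number of code units written, or (size_t)-1 on invalid
 * input or if out is too small. */
size_t naive_u8_to_u16(const unsigned char *in, size_t inlen,
                       uint16_t *out, size_t outcap)
{
    size_t i = 0, o = 0;

    while (i < inlen) {
        uint32_t cp;
        size_t k = utf8_decode(in + i, inlen - i, &cp);
        if (k == 0) return (size_t)-1;

        if (cp < 0x10000) {
            /* BMP code point: one code unit. */
            if (o + 1 > outcap) return (size_t)-1;
            out[o++] = (uint16_t)cp;
        } else {
            /* Supplementary plane: high and low surrogate pair. */
            if (o + 2 > outcap) return (size_t)-1;
            cp -= 0x10000;
            out[o++] = (uint16_t)(0xD800 | (cp >> 10));
            out[o++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
        i += k;
    }
    return o;
}
```

A caller would pass the whole buffer in one call; an output array with as
many uint16_t slots as there are input bytes is always large enough, since
no UTF-8 sequence produces more code units than it has bytes. A loop like
this has no per-call setup cost, so it should be a fairer baseline than
iconv, though I have not timed it.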
> u8u16-0.9 is available as open source software under an OSL 3.0 license
> at http://u8u16.costar.sfu.ca/
Thanks. I'll take a look.
Linux-UTF8: i18n of Linux on all levels