Re: High-Speed UTF-8 to UTF-16 Conversion

ＳｒｉｎＴｕａｒ Thu, 15 Mar 2007 13:54:30 -0800

> You asked about relevance. The UTF-8 to UTF-16 bottleneck
> is widely cited in literature on XML processing performance.


The problem then appears to be UTF-16. You spend far less time
transcoding if you simply leave the XML in UTF-8, as it should be
(even if this means either dropping Java or proposing a fix to the
language). UTF-16 is the worst possible choice of Unicode encoding,
having all the downsides of the other choices and none of their
upsides.

This type of transcoding is the height of triviality, and I can only
imagine it being significant if implemented in a highly inefficient
manner. (Perhaps a naive implementation in native Java is not so
swift.)
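
To make the "trivial" claim concrete, here is a minimal scalar sketch of the
conversion in Python. It is not the implementation under discussion and says
nothing about speed; the function name utf8_to_utf16 and the choice to return
a list of 16-bit code units are assumptions made only for illustration. It
decodes one code point at a time, rejects malformed input, and emits a
surrogate pair for anything above U+FFFF.

```python
def utf8_to_utf16(data: bytes) -> list[int]:
    """Convert UTF-8 bytes to a list of UTF-16 code units.

    Hypothetical helper for illustration only; raises ValueError on
    malformed input instead of substituting U+FFFD.
    """
    # Smallest code point allowed for each count of continuation bytes,
    # used to reject overlong encodings.
    min_cp = (0x0, 0x80, 0x800, 0x10000)

    out = []
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:                 # 0xxxxxxx: ASCII, no continuation bytes
            cp, extra = b0, 0
        elif 0xC2 <= b0 <= 0xDF:      # 110xxxxx: one continuation byte
            cp, extra = b0 & 0x1F, 1
        elif 0xE0 <= b0 <= 0xEF:      # 1110xxxx: two continuation bytes
            cp, extra = b0 & 0x0F, 2
        elif 0xF0 <= b0 <= 0xF4:      # 11110xxx: three continuation bytes
            cp, extra = b0 & 0x07, 3
        else:
            raise ValueError(f"invalid lead byte at offset {i}")

        if i + extra >= n:
            raise ValueError(f"truncated sequence at offset {i}")

        # Fold in six payload bits from each continuation byte (10xxxxxx).
        for j in range(1, extra + 1):
            c = data[i + j]
            if (c & 0xC0) != 0x80:
                raise ValueError(f"invalid continuation byte at offset {i + j}")
            cp = (cp << 6) | (c & 0x3F)

        # Reject overlong forms, surrogate code points, and values past U+10FFFF.
        if cp < min_cp[extra] or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"ill-formed sequence at offset {i}")

        if cp < 0x10000:
            out.append(cp)            # BMP: a single 16-bit unit
        else:
            cp -= 0x10000             # supplementary plane: surrogate pair
            out.append(0xD800 | (cp >> 10))
            out.append(0xDC00 | (cp & 0x3FF))

        i += extra + 1
    return out
```

For example, utf8_to_utf16("€".encode("utf-8")) yields the single unit 0x20AC,
while a character outside the BMP such as U+1F600 comes back as two units.
The per-character work is a handful of shifts and masks, which is the point
being made above.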

> Giuseppe Psaila, "On the problem of coupling Java Algorithms and XML Parsers",
> 17th Int. Conf. on Database Systems and Applications, 2006.


The article does not seem to be freely readable.


In any case, a patent-pending vector implementation is not exactly interesting.

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
