Changjian Sun said:

> For cross-platform software (NT, Solaris, HP, AIX), the only third-party
> Unicode support I have found so far is IBM ICU. It provides very good
> support for cross-platform software internationalization. However, ICU
> internally uses UTF-16. Since our application uses UTF-8 as input and
> output, I have to convert from UTF-8 to UTF-16 before calling ICU
> functions (such as ucol_strcoll()).
>
> I'm worried about the performance overhead of this conversion.

You shouldn't be.

The conversion from UTF-8 to UTF-16 and back is algorithmic and very
fast.
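
For concreteness, here is a rough sketch of the round trip you describe,
using ICU's C API -- u_strFromUTF8() for the conversion and ucol_strcoll()
for the comparison. This is only an illustration, not code lifted from the
ICU documentation: the fixed buffers and error handling are deliberately
simplified, and it assumes an ICU version that has u_strFromUTF8().

    #include <unicode/ustring.h>   /* u_strFromUTF8() */
    #include <unicode/ucol.h>      /* UCollator, ucol_strcoll() */

    UCollationResult compare_utf8(const UCollator *coll,
                                  const char *a, const char *b)
    {
        /* Fixed buffers keep the sketch short; real code would size
           them from the input and check for truncation. */
        UChar ua[256];
        UChar ub[256];
        UErrorCode status = U_ZERO_ERROR;

        /* Algorithmic UTF-8 -> UTF-16 conversion; no mapping tables
           are involved, just a tight loop over the bytes. */
        u_strFromUTF8(ua, 256, NULL, a, -1, &status);
        u_strFromUTF8(ub, 256, NULL, b, -1, &status);
        if (U_FAILURE(status)) {
            return UCOL_EQUAL;     /* real code would report the error */
        }

        /* The collation call is where the real work happens. */
        return ucol_strcoll(coll, ua, -1, ub, -1);
    }

In a profile of something like this, the collation weighting will dominate;
the conversion step is a small fraction of the total cost.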

If you are expecting better performance from a library that exposes UTF-8
APIs and then does all its internal processing in UTF-8 *without*
converting to UTF-16, then I think you are mistaken. UTF-8 is a poor
form for much of the internal processing that ICU has to do --
collation weighting, for example. Any library worth its salt would
*first* convert to UTF-16 (or UTF-32) internally, anyway, before doing
any significant semantic manipulation of the characters.

> Are there any other cross-platform third-party Unicode libraries with
> better UTF-8 handling?

In my opinion, it is unlikely that there are *any* good Unicode libraries
that work purely in UTF-8, inside and out. It is just more efficient and
elegant to take the form-conversion hit and then use a better processing
form for manipulating the characters.

UTF-8 shines as a legacy API and protocol compatibility form.
But it stinks as a processing form.

--Ken

> Thanks a lot.
> 
> -Changjian Sun
