I would like to improve the performance of string handling in Julia 
(and also correct a number of flaws in its handling of Unicode).
Currently, many string operations are very slow, partly because of the way 
strings are represented and partly because the algorithms used are not as 
fast as they could be.
Conversions between UTF-16 and the ASCIIString/UTF8String types that Julia 
normally uses for strings can be critical for an application's performance.
This comes up when communicating with Java, where the String type is always 
UTF-16; with C/C++, where wide strings are either UTF-16 or UTF-32; with 
databases, which frequently use UTF-16 in Asia because UTF-8 can take more 
space, depending on the data; and when using the Unicode APIs in Windows or 
the ICU libraries, which use UTF-16.
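
To make the point about space concrete, here is a minimal sketch that 
compares the encoded size of a few sample strings in UTF-8 and UTF-16. 
It is written in Python rather than Julia so that it does not depend on 
any particular Julia version's string API; the helper name encoded_sizes 
and the sample strings are my own illustration, not part of the PR.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, with no byte order mark."""
    # "utf-16-le" is used so that the 2-byte BOM is not counted.
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Illustrative samples only: ASCII, Greek, and Japanese text.
samples = {"ASCII": "hello", "Greek": "αβγ", "Japanese": "日本語"}

for label, text in samples.items():
    utf8_bytes, utf16_bytes = encoded_sizes(text)
    print(f"{label}: UTF-8 = {utf8_bytes} bytes, UTF-16 = {utf16_bytes} bytes")
```

ASCII text is half the size in UTF-8, while most CJK characters take three 
bytes in UTF-8 but only two in UTF-16, which is why the preferred encoding 
depends on the data.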

I have a PR (#11004) to fix the performance issue I raised (#10959). The 
code is on the spj/fast_utf branch of the ScottPJones/julia repository on 
GitHub, if anybody would like to try it out.
My own testing shows roughly a 2-10x improvement (most of the time, 10x). 
My latest benchmark results, along with the code I used to benchmark it, 
are in my gist: https://gist.github.com/ScottPJones/bb712f7b85d1d8d91a9a

I am curious whether these sorts of performance improvements (along with 
better error handling and Unicode input validation) would be important to 
anybody else, as I am trying to convince people to merge this PR into 
Julia Base.

Thanks, Scott

