Re: [classlib][icu] Bringing ICU level up to 3.8

Tim Ellison Wed, 17 Oct 2007 04:47:40 -0700

Alexei Zakharov wrote:
> I've created a small benchmark too. It takes Leo Tolstoy's "War and
> Peace" Book One as input and converts it from Russian CP-1251 to
> UTF-16 (10 times) and back (also 10 times). You may find the
> benchmark's source code and a build file at [1].  The first difference
> from your benchmark is the language & encoding - Russian in my case.
> The second difference is the set of tested VMs - I've run the
> benchmark on RI, J9 and DRLVM.


Interesting numbers.  How common do you think it is to convert that much
data in real-world applications?  I'm only guessing, but I would think
that most conversions are of short strings, with perhaps the occasional
long XML document.  While the converters should not go pathological on
such a long string, I am concerned that we optimize for the right case.

> You may find results below. BTW the results show that in this
> particular test our internal providers (from
> org.apache.harmony.niochar.charset package) are faster than both
> versions of ICU. Another interesting fact is terrible ICU performance
> on DRLVM. However, on J9 it works rather fast. And this is something
> that should be fixed IMO (bad performance on DRLVM I mean). And
> finally, yes, ICU4JNI is a little bit faster than ICU4J in this test.
> However, "War and Peace" is a rather big book (paper version of the
> first part contains about 400 pages, if repeated 10 times = 4000
> pages), but difference in numbers is not so big.
> 
> [1] http://people.apache.org/~ayza/icu_experiments/

I had a quick look at the benchmark you were using, and have a couple of
observations:

I see that it uses a MappedByteBuffer (which is a type of direct
ByteBuffer), so that will exercise the *native* code encoding/decoding
loop in Harmony.  I wonder how things change if you use a Java-heap
ByteBuffer so it uses the *Java* code encoding/decoding loop.
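
To make the comparison concrete, here is a minimal sketch of feeding the
same bytes to a decoder once from a heap buffer and once from a direct
buffer.  The class and method names (BufferKindDemo, copyOf, decode) are
hypothetical and not taken from the benchmark at [1]; it also assumes the
runtime provides a "windows-1251" charset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

class BufferKindDemo {
    // Copies the remaining bytes of src into a new heap or direct buffer.
    // src is read through a duplicate, so its position is left unchanged.
    static ByteBuffer copyOf(ByteBuffer src, boolean direct) {
        ByteBuffer dst = direct
                ? ByteBuffer.allocateDirect(src.remaining())
                : ByteBuffer.allocate(src.remaining());
        dst.put(src.duplicate());
        dst.flip();
        return dst;
    }

    // Decodes the remaining bytes of in with the named charset.
    // Charset.decode replaces malformed input rather than throwing.
    static String decode(ByteBuffer in, String charsetName) {
        return Charset.forName(charsetName).decode(in.duplicate()).toString();
    }

    public static void main(String[] args) {
        // Assumption: "windows-1251" is available on this runtime.
        Charset cp1251 = Charset.forName("windows-1251");
        // "Война" written with escapes so the source file encoding does not matter.
        byte[] sample = "\u0412\u043e\u0439\u043d\u0430".getBytes(cp1251);

        ByteBuffer heap = copyOf(ByteBuffer.wrap(sample), false);
        ByteBuffer direct = copyOf(ByteBuffer.wrap(sample), true);

        System.out.println("heap   isDirect=" + heap.isDirect()
                + " -> " + decode(heap, "windows-1251"));
        System.out.println("direct isDirect=" + direct.isDirect()
                + " -> " + decode(direct, "windows-1251"));
    }
}
```

Both calls should produce the same text; only the code path taken inside
the charset provider is expected to differ, and which path that is
depends on the provider under test.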

It would also be interesting to vary the inputs across a number of
string lengths to see if there is a reasonable heuristic we should add
to avoid incurring the JNI overhead even when the buffer is direct.

So the numbers you show are useful, but it would be even more useful to
see graphs of time vs. input size for direct and non-direct byte buffers
too.
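
As a starting point, a rough sketch of such a sweep might look like the
following.  It is not the benchmark from [1]: the class SizeSweep, its
methods, the size range and the repetition counts are all hypothetical
choices, the input is synthetic Cyrillic bytes rather than real text, and
it assumes a "windows-1251" charset is available.  It prints CSV that can
be fed to any plotting tool.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

class SizeSweep {
    // Fills a heap or direct buffer with bytes 0xC0..0xFF repeating,
    // which are the Cyrillic letters in CP-1251.
    static ByteBuffer fill(int size, boolean direct) {
        ByteBuffer buf = direct
                ? ByteBuffer.allocateDirect(size)
                : ByteBuffer.allocate(size);
        for (int i = 0; i < size; i++) {
            buf.put((byte) (0xC0 + i % 64));
        }
        buf.flip();
        return buf;
    }

    // Average wall-clock nanoseconds per decode over reps repetitions.
    // A duplicate is decoded each time so the input is never consumed.
    static long averageDecodeNanos(Charset cs, ByteBuffer input, int reps) {
        long start = System.nanoTime();
        for (int i = 0; i < reps; i++) {
            cs.decode(input.duplicate());
        }
        return (System.nanoTime() - start) / reps;
    }

    public static void main(String[] args) {
        // Assumption: "windows-1251" is available on this runtime.
        Charset cs = Charset.forName("windows-1251");
        System.out.println("size,heap_ns,direct_ns");
        // Sizes from 16 bytes to 1 MB, multiplying by 4 each step.
        for (int size = 16; size <= (1 << 20); size *= 4) {
            ByteBuffer heap = fill(size, false);
            ByteBuffer direct = fill(size, true);
            // More repetitions for small inputs so each point is measurable.
            int reps = Math.max(10, (1 << 22) / size);
            // Crude warm-up pass; results before this are discarded.
            averageDecodeNanos(cs, heap, reps);
            averageDecodeNanos(cs, direct, reps);
            System.out.println(size + ","
                    + averageDecodeNanos(cs, heap, reps) + ","
                    + averageDecodeNanos(cs, direct, reps));
        }
    }
}
```

The warm-up here is deliberately crude, so numbers for the smallest sizes
should be treated with caution.  If the heap and direct curves cross, the
crossover size would be a candidate threshold for the heuristic mentioned
above.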

(Left as an exercise for the reader <g>)

Regards,
Tim

> RI
> ---
> Built-in
> <sun.nio.cs.MS1251$Decoder> Decoding time: 571 millis
> <sun.nio.cs.MS1251$Encoder> Encoding time: 351 millis
> 
> ICU4j
> <com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 430 millis
> <com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 551 millis
> 
> ICU4JNI
> <com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 401 millis
> <com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 540 millis
> 
> J9
> ---
> Built-in
> <org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time: 231 millis
> <org.apache.harmony.niochar.charset.CP_1251$Encoder> Encoding time: 430 millis
> 
> ICU4j
> <com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 781 millis
> <com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 620 millis
> 
> ICU4JNI
> <com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 561 millis
> <com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 371 millis
> 
> 
> DRLVM
> ---
> Built-in
> <org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time: 351 millis
> <org.apache.harmony.niochar.charset.CP_1251$Encoder> Encoding time: 540 millis
> 
> ICU4j
> <com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 6660 millis
> <com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 1071 millis
> 
> ICU4JNI
> <com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 6179 millis
> <com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 451 millis
> 
> With Best Regards,
> Alexei
> 
> 2007/10/11, Oliver Deakin <[EMAIL PROTECTED]>:
>> Tony Wu wrote:
>>> On 10/8/07, Oliver Deakin <[EMAIL PROTECTED]> wrote:
>>>> Are there any particular
>>>> benchmarks you had in mind for this?
>>>>
>>>>
>>> ya, there is a micro benchmark on HARMONY-3709
>>>
>>>
>> <SNIP!>
>>
>> I have run the micro benchmark on Harmony with its current ICU
>> configuration (icu4jni 3.4.4) and on Harmony with pure icu4j 3.8. The
>> results are pretty much as expected - for small jobs icu4j is
>> significantly faster, for large jobs icu4jni comes out on top (full
>> results at the end of this email). It seems that performance-wise there
>> are benefits on both sides depending on the work we are doing.
>>
>> Regards,
>> Oliver
>
