Thanks for these results, Alexei - it's interesting to see that icu4j
does not lag far behind icu4jni even on such a large conversion.
I have discovered that ICU4J 3.8 does not currently support ISO-2022
charsets [1], which causes one test
(tests.api.java.io.InputStreamReaderTest.test_read()) to fail. This
should only be temporary, and I do not see it as a major issue.
However, I am not familiar with this charset and so cannot fully
gauge the impact of its absence on the community. Would this lack of
support be an issue?
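A quick way to see whether a given runtime resolves the ISO-2022 names
is to ask java.nio.charset.Charset directly. The sketch below is only
illustrative: the class name Iso2022Check and the list of charset names
are assumptions of mine, and the answer depends on which charset
providers are installed in the runtime being tested.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

class Iso2022Check {
    // Assumed list of ISO-2022 family names to probe; extend as needed.
    static final String[] NAMES = {"ISO-2022-JP", "ISO-2022-KR", "ISO-2022-CN"};

    // Returns the names from NAMES that the current runtime cannot resolve.
    static List<String> missing() {
        List<String> out = new ArrayList<String>();
        for (String name : NAMES) {
            if (!Charset.isSupported(name)) {
                out.add(name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> missing = missing();
        if (missing.isEmpty()) {
            System.out.println("All probed ISO-2022 charsets are supported");
        } else {
            System.out.println("Unsupported: " + missing);
        }
    }
}
```

Running this on a build with and without icu4jni would show which of
the probed names disappear with the switch.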
If the short-term lack of ISO-2022 support is not a problem, then I'd
like to move ahead and switch completely to icu4j 3.8, removing the
icu4jni and icu4c dependencies in classlib. I will give it a couple of
days and, if there are no objections, I will go ahead and apply the
changes required.
Regards,
Oliver
[1] http://bugs.icu-project.org/trac/ticket/5791
Alexei Zakharov wrote:
Hi Oliver,
I've created a small benchmark too. It takes Leo Tolstoy's "War and
Peace" Book One as input and converts it from Russian CP-1251 to
UTF-16 (10 times) and back (also 10 times). You may find the
benchmark's source code and a build file at [1]. The first difference
from your benchmark is the language & encoding - Russian in my case.
The second difference is the set of tested VMs - I've run the
benchmark on RI, J9 and DRLVM.
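The general shape of such a benchmark can be sketched roughly as
follows. This is not the code from [1]: the class name
CharsetRoundTripBench, the helper methods and the short repeated
phrase standing in for the book text are all assumptions made for
illustration. Java chars are UTF-16 in memory, so decoding into a
CharBuffer is treated here as the conversion to UTF-16.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;

class CharsetRoundTripBench {
    static final int REPEATS = 10; // each direction is run 10 times

    // Hypothetical stand-in for the book: a short Russian phrase, repeated.
    static String sampleText(int copies) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < copies; i++) {
            sb.append("\u0412\u043e\u0439\u043d\u0430 \u0438 \u043c\u0438\u0440. ");
        }
        return sb.toString();
    }

    // Decodes the bytes REPEATS times with a fresh decoder; returns the last result.
    static CharBuffer decodeRepeatedly(Charset cs, byte[] input)
            throws CharacterCodingException {
        CharBuffer last = null;
        for (int i = 0; i < REPEATS; i++) {
            last = cs.newDecoder().decode(ByteBuffer.wrap(input));
        }
        return last;
    }

    // Encodes the text REPEATS times with a fresh encoder; returns the last result.
    static ByteBuffer encodeRepeatedly(Charset cs, CharSequence text)
            throws CharacterCodingException {
        ByteBuffer last = null;
        for (int i = 0; i < REPEATS; i++) {
            last = cs.newEncoder().encode(CharBuffer.wrap(text));
        }
        return last;
    }

    public static void main(String[] args) throws Exception {
        Charset cs = Charset.forName("windows-1251");
        byte[] bytes = sampleText(100000).getBytes(cs);

        long t0 = System.nanoTime();
        CharBuffer decoded = decodeRepeatedly(cs, bytes);
        long t1 = System.nanoTime();
        encodeRepeatedly(cs, decoded);
        long t2 = System.nanoTime();

        // Print the provider class names so it is clear which implementation ran.
        System.out.println("<" + cs.newDecoder().getClass().getName()
                + "> Decoding time: " + (t1 - t0) / 1000000 + " millis");
        System.out.println("<" + cs.newEncoder().getClass().getName()
                + "> Encoding time: " + (t2 - t1) / 1000000 + " millis");
    }
}
```

A sketch like this has no warm-up phase, so JIT compilation time is
included in the measured numbers, which may matter when comparing VMs.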
You may find results below. BTW the results show that in this
particular test our internal providers (from the
org.apache.harmony.niochar.charset package) are faster than both
versions of ICU. Another interesting fact is the terrible ICU
performance on DRLVM, whereas on J9 it works rather fast. The bad
performance on DRLVM is something that should be fixed IMO. And
finally, yes, ICU4JNI is a little bit faster than ICU4J in this test.
However, "War and Peace" is a rather big book (the paper version of
the first part is about 400 pages, so 10 repetitions amount to about
4000 pages), and the difference in numbers is still not that big.
[1] http://people.apache.org/~ayza/icu_experiments/
RI
---
Built-in
<sun.nio.cs.MS1251$Decoder> Decoding time: 571 millis
<sun.nio.cs.MS1251$Encoder> Encoding time: 351 millis
ICU4j
<com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 430 millis
<com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 551 millis
ICU4JNI
<com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 401 millis
<com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 540 millis
J9
---
Built-in
<org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time: 231 millis
<org.apache.harmony.niochar.charset.CP_1251$Encoder> Encoding time: 430 millis
ICU4j
<com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 781 millis
<com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 620 millis
ICU4JNI
<com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 561 millis
<com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 371 millis
DRLVM
---
Built-in
<org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time: 351 millis
<org.apache.harmony.niochar.charset.CP_1251$Encoder> Encoding time: 540 millis
ICU4j
<com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 6660 millis
<com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 1071 millis
ICU4JNI
<com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 6179 millis
<com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 451 millis
With Best Regards,
Alexei
2007/10/11, Oliver Deakin <[EMAIL PROTECTED]>:
Tony Wu wrote:
On 10/8/07, Oliver Deakin <[EMAIL PROTECTED]> wrote:
Are there any particular
benchmarks you had in mind for this?
ya, there is a micro benchmark on HARMONY-3709
<SNIP!>
I have run the micro benchmark on Harmony with its current ICU
configuration (icu4jni 3.4.4) and on Harmony with pure icu4j 3.8. The
results are pretty much as expected - for small jobs icu4j is
significantly faster, while for large jobs icu4jni comes out on top
(full results at the end of this email). It seems that performance-wise
there are benefits on both sides depending on the work we are doing.
Regards,
Oliver
--
Oliver Deakin