[ http://issues.apache.org/jira/browse/XERCESC-1444?page=all ]
Axel Weiss updated XERCESC-1444:
--------------------------------
Attachment: transcoding-bench-results3.txt
iconv-transcoder-190956.diff
Attached are
1. some benchmark results (details explained below)
2. a patch that markedly improves the iconv transcoder.
Benchmark investigations: I compared three variants of the iconv transcoder
against the ICU transcoder. The first is the current per-symbol transcoder with
dynamic buffer growth. The second is a naive variant that pre-calculates the
needed size (calcRequiredSize) and calls the transcoding methods with a fixed
buffer size. The third is the new block-oriented iconv transcoder algorithm.
Each transcoder was run on different data sets: different sizes (3, 20, 100,
3000, 55000 and 700000 chars in the UTF-8 encoding) and different symbol mixes
(irregularity = a measure of the degree of multi-byte symbols). The data was
produced by a random string generator with built-in properties (e.g. the
targeted irregularity).
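To illustrate the difference between the per-symbol and the block-oriented
approach, here is a minimal sketch. It is not the code from the attached patch:
it uses the standard mbrtowc/mbsrtowcs functions instead of the iconv
transcoder classes, and the function names and the block size of 256 are
made up for the example.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <string>

// Per-symbol variant: one library call per character, output grows as needed.
std::wstring transcodePerSymbol(const char* src) {
    std::wstring out;
    std::mbstate_t state = std::mbstate_t();
    std::size_t remaining = std::strlen(src);
    while (remaining > 0) {
        wchar_t wc;
        std::size_t n = std::mbrtowc(&wc, src, remaining, &state);
        // (size_t)-1: invalid sequence, (size_t)-2: incomplete sequence.
        if (n == static_cast<std::size_t>(-1) ||
            n == static_cast<std::size_t>(-2) || n == 0)
            break;  // a real transcoder would report the error instead
        out.push_back(wc);
        src += n;
        remaining -= n;
    }
    return out;
}

// Block-oriented variant: one library call per block of up to kBlock symbols.
std::wstring transcodeBlockwise(const char* src) {
    const std::size_t kBlock = 256;  // hypothetical block size
    wchar_t block[kBlock];
    std::wstring out;
    std::mbstate_t state = std::mbstate_t();
    // mbsrtowcs sets src to a null pointer once the terminator is consumed.
    while (src != nullptr) {
        std::size_t n = std::mbsrtowcs(block, &src, kBlock, &state);
        if (n == static_cast<std::size_t>(-1))
            break;  // invalid sequence; a real transcoder would report it
        out.append(block, n);
    }
    return out;
}
```

Both functions should return the same result for valid input in the current
locale; the block variant simply makes far fewer calls into the library.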
The benchmark was run on a 1050-MHz Athlon SMP (SuSE-Linux 9.3) with
statistical timing analysis. Due to the nature of the operating system, the
measured timings are somewhat unreliable, but they should be comparable with
each other to within an uncertainty of a few percent.
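For anyone who wants to repeat such measurements, one common way to reduce
scheduling noise is to repeat each run and take the median. The sketch below
assumes that approach; it is not the harness used for the attached results,
and medianMicroseconds is a made-up name.

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

// Runs work() `repeats` times (repeats >= 1) and returns the median
// wall-clock duration of a single run in microseconds.
template <typename Work>
double medianMicroseconds(Work work, int repeats) {
    std::vector<double> samples;
    samples.reserve(repeats);
    for (int i = 0; i < repeats; ++i) {
        auto start = std::chrono::steady_clock::now();
        work();
        auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::micro>(stop - start).count());
    }
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];  // upper median for even counts
}
```

The median is less sensitive than the mean to the occasional run that gets
preempted by another process.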
The results are assembled in four tables (see attachment). Here is a brief
summary, so you can get an idea of the timing properties of the different
transcoder variants.
1. Comparing ICU with the current per-symbol iconv transcoder:
+ for very small strings (up to ~10 symbols), iconv is faster than ICU
+ for longer strings, ICU performs about five times better than iconv
+ ICU takes longer to convert char to XMLCh than vice versa (comparing by the
number of symbols, not strlen)
2. Comparing the current iconv transcoder with a naive approach:
+ the naive algorithm almost always takes twice as long (which was expected,
but is now verified)
3. Comparing the current iconv transcoder with the new, block-oriented algorithm:
+ only for very small strings are the execution times almost the same
+ in all other cases, the new algorithm is six to ten times faster
4. Comparing the ICU transcoder with the new iconv algorithm:
+ up to a length of 3000, the new algorithm outperforms ICU (!)
+ for longer strings, ICU is better, especially in the char->XMLCh direction
So, I hope you like the new iconv transcoder. Could you test it on the different
platforms (esp. to eliminate compile issues like the current one)?
Cheers,
Axel
> "xercesc/util/Transcoders/Iconv/IconvTransService.cpp", line 414: Error: An
> integer constant expression is required within the array subscript operator
> -------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: XERCESC-1444
> URL: http://issues.apache.org/jira/browse/XERCESC-1444
> Project: Xerces-C++
> Type: Bug
> Components: Utilities
> Versions: Nightly build (please specify the date)
> Environment: % uname -a
> SunOS merlin.sce.carleton.ca 5.9 Generic_118558-04 sun4u sparc
> SUNW,Sun-Blade-100
> % which CC
> /opt/SUNWspro/bin/CC
> %
> Reporter: Greg Franks
> Attachments: diff.out, iconv-transcoder-190956.diff,
> transcoding-bench-results3.txt
>
> Compiled from
> % svn info
> Path: .
> URL: http://svn.apache.org/repos/asf/xerces/c/trunk
> Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
> Revision: 191274
> Compiling xercesc/util/Transcoders/Iconv/IconvTransService.cpp
> "xercesc/util/Transcoders/Iconv/IconvTransService.cpp", line 414: Error: An
> integer constant expression is required within$
> 1 Error(s) detected.
> make[2]: *** [xercesc/util/Transcoders/Iconv/IconvTransService.lo] Error 1
> The Sunpro compiler doesn't like MB_CUR_MAX.
> while (toTranscode[srcCursor])
> {
> char mbBuf[MB_CUR_MAX];
> int len = wctomb(mbBuf, toTranscode[srcCursor++]), j;
> if (len < 0)
> MB_CUR_MAX is defined in /usr/include/iso/stdlib_iso.h. I don't know if this
> should/should not be dragged in automagically based on compiler flags. I'll
> poke around a bit more.
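Some background on the error quoted above: standard C++ requires an array
bound to be an integer constant expression, while MB_CUR_MAX depends on the
current locale and is not required to be a constant (it may expand to a
function call). MB_LEN_MAX from <climits> is a compile-time constant upper
bound for all supported locales. The sketch below shows one possible way to
avoid the error by sizing the buffer with MB_LEN_MAX. It is only an
illustration, not necessarily the change made in the attached patch, and the
function name wideToMultibyte is made up.

```cpp
#include <climits>   // MB_LEN_MAX
#include <cstddef>
#include <cstdlib>   // wctomb
#include <string>

// Converts a null-terminated wide string to the locale's multibyte encoding,
// one symbol at a time, using a buffer whose size is a compile-time constant.
std::string wideToMultibyte(const wchar_t* toTranscode) {
    std::string out;
    (void)std::wctomb(nullptr, 0);  // reset any shift state
    for (std::size_t srcCursor = 0; toTranscode[srcCursor]; ++srcCursor) {
        char mbBuf[MB_LEN_MAX];     // constant bound, unlike MB_CUR_MAX
        int len = std::wctomb(mbBuf, toTranscode[srcCursor]);
        if (len < 0)
            break;  // symbol not representable; real code would report this
        out.append(mbBuf, static_cast<std::size_t>(len));
    }
    return out;
}
```

The buffer may be slightly larger than the current locale needs, but it lives
on the stack and costs nothing extra per symbol.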