[ http://issues.apache.org/jira/browse/XERCESC-1444?page=all ]

Axel Weiss updated XERCESC-1444:
--------------------------------

    Attachment: transcoding-bench-results3.txt
                iconv-transcoder-190956.diff

Attached are
1. some benchmark results (details explained below)
2. a patch that markedly improves the iconv transcoder.

Benchmark investigations: I compared three variants of the iconv transcoder 
against the icu transcoder. The first is the current per-symbol transcoder with 
dynamic buffer growth; the second is a naive variant that pre-calculates the 
needed size (calcRequiredSize) and calls the transcoding methods with a fixed 
buffer size; the third is the new block-oriented iconv transcoder algorithm. 
Each transcoder was run on different data sets: different sizes (3, 20, 100, 
3000, 55000 and 700000 chars in the utf-8 encoding) and different symbol 
mixes (irregularity = a measure of the degree to which multi-byte symbols 
occur). The data was produced by a random string generator with built-in 
properties (e.g. the targeted irregularity).
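
To illustrate what "block-oriented" means here, a minimal sketch follows. It 
is not the code from the attached patch; the function name transcodeBlockwise 
and the block size of 256 are assumptions made for illustration only. It uses 
the standard wcsrtombs call to convert a whole block per call instead of one 
symbol per call.

```cpp
#include <cstddef>
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of Xerces-C++ or the attached patch:
// converts a wide string to the current locale's multibyte encoding,
// one fixed-size block per library call.
std::string transcodeBlockwise(const wchar_t* src)
{
    std::string out;
    std::mbstate_t state = std::mbstate_t();  // initial conversion state
    char block[256];                          // assumed block size

    while (src != 0)
    {
        // wcsrtombs converts as many wide characters as fit into the block,
        // advances src past them, and sets src to null once the terminating
        // null character has been converted.
        std::size_t n = std::wcsrtombs(block, &src, sizeof(block), &state);
        if (n == static_cast<std::size_t>(-1))
            throw std::runtime_error("character cannot be transcoded");
        out.append(block, n);                 // n excludes the terminator
    }
    return out;
}
```

The point of the sketch is only that the per-call overhead is paid once per 
block rather than once per symbol; how the patch handles buffers and errors 
may well differ.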

The benchmark was run on a 1050-MHz Athlon SMP (SuSE-Linux 9.3) with 
statistical timing analysis. Due to the nature of the operating system, the 
measured timings are somewhat unreliable, but they should be comparable with 
each other to within an uncertainty of a few percent.
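
For readers who want to repeat this kind of measurement, here is a minimal 
sketch of one way to reduce scheduling noise: run the same work several times 
and take the median. This is an assumption about how such an analysis could 
be done, not a description of the benchmark program used for the attached 
results; the name medianMicroseconds is hypothetical.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: runs work() the given number of times (at least one)
// and returns the median wall-clock duration in microseconds.
template <typename F>
double medianMicroseconds(F work, int repetitions)
{
    std::vector<double> samples;
    samples.reserve(repetitions);
    for (int i = 0; i < repetitions; ++i)
    {
        std::chrono::steady_clock::time_point start =
            std::chrono::steady_clock::now();
        work();
        std::chrono::steady_clock::time_point stop =
            std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::micro>(stop - start).count());
    }
    // The median is less sensitive to occasional preemption than the mean.
    std::size_t mid = samples.size() / 2;
    std::nth_element(samples.begin(), samples.begin() + mid, samples.end());
    return samples[mid];
}
```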

The results have been assembled in four tables (see attachment). I give a brief 
summary of the results, so you can have an idea about the timing properties of 
the different transcoder variants.

1. Comparing ICU with the current per-symbol iconv transcoder:
+ when transcoding very small strings (up to ~10 symbols), iconv is faster 
than icu
+ for longer strings, icu performs about five times better than iconv
+ icu takes longer to convert char to xmlch than vice versa (comparing the 
number of symbols, not strlen)

2. Comparing the current iconv transcoder with a naive approach:
+ the naive algorithm almost always takes twice as long (which was expected, 
but is now verified)

3. Comparing the current iconv transcoder with the new, block-oriented algorithm:
+ only for very small strings are the execution times almost the same
+ in other cases, the new algorithm is six to ten times faster

4. Comparing the icu transcoder with the new iconv algorithm:
+ up to a length of 3000, the new algorithm outperforms icu (!)
+ for longer strings, icu is better, especially for the char->xmlch direction

So, I hope you like the new iconv transcoder. Can you test it on the different 
platforms (esp. to eliminate compile issues like the current one)?

Cheers,
                            Axel

> "xercesc/util/Transcoders/Iconv/IconvTransService.cpp", line 414: Error: An 
> integer constant expression is required within the array subscript operator
> -------------------------------------------------------------------------------------------------------------------------------------------------------
>
>          Key: XERCESC-1444
>          URL: http://issues.apache.org/jira/browse/XERCESC-1444
>      Project: Xerces-C++
>         Type: Bug
>   Components: Utilities
>     Versions: Nightly build (please specify the date)
>  Environment: % uname -a
> SunOS merlin.sce.carleton.ca 5.9 Generic_118558-04 sun4u sparc 
> SUNW,Sun-Blade-100
> % which CC
> /opt/SUNWspro/bin/CC
> % 
>     Reporter: Greg Franks
>  Attachments: diff.out, iconv-transcoder-190956.diff, 
> transcoding-bench-results3.txt
>
> Compiled from
> % svn info
> Path: .
> URL: http://svn.apache.org/repos/asf/xerces/c/trunk
> Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
> Revision: 191274
> Compiling xercesc/util/Transcoders/Iconv/IconvTransService.cpp
> "xercesc/util/Transcoders/Iconv/IconvTransService.cpp", line 414: Error: An 
> integer constant expression is required within$
> 1 Error(s) detected.
> make[2]: *** [xercesc/util/Transcoders/Iconv/IconvTransService.lo] Error 1
> The Sunpro compiler doesn't like MB_CUR_MAX.
>         while (toTranscode[srcCursor])
>         {
>                 char mbBuf[MB_CUR_MAX];
>                 int len = wctomb(mbBuf, toTranscode[srcCursor++]), j;
>                 if (len < 0)
> MB_CUR_MAX is defined in /usr/include/iso/stdlib_iso.h.  I don't know if this 
> should/should not be dragged in automagically based on compiler flags.  I'll 
> poke around a bit more.
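
One possible way around the error quoted above, sketched here under stated 
assumptions and not taken from the attached patch: the C standard does not 
require MB_CUR_MAX to be a constant expression (its value depends on the 
current locale), whereas MB_LEN_MAX from <climits> is a compile-time constant 
upper bound and can therefore be used as an array size. The function name 
wideToMultibyte is hypothetical.

```cpp
#include <climits>    // MB_LEN_MAX: compile-time upper bound
#include <cstddef>
#include <cstdlib>    // wctomb
#include <stdexcept>
#include <string>

// Hypothetical helper showing only the changed buffer declaration;
// the loop mirrors the per-symbol code quoted in the report.
std::string wideToMultibyte(const wchar_t* toTranscode)
{
    std::string result;
    std::size_t srcCursor = 0;
    while (toTranscode[srcCursor])
    {
        char mbBuf[MB_LEN_MAX];   // constant size, unlike MB_CUR_MAX
        int len = std::wctomb(mbBuf, toTranscode[srcCursor++]);
        if (len < 0)
            throw std::runtime_error("character cannot be transcoded");
        result.append(mbBuf, static_cast<std::size_t>(len));
    }
    return result;
}
```

The buffer may be a few bytes larger than strictly needed for the current 
locale, which seems a small price for portability across compilers.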

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

