Boris Kolpackov wrote:
>> I'd prefer https://issues.apache.org/jira/browse/XERCESC-1242 to be
>> fixed in this release also.
>
> Alberto (who is a local regex guru ;-)) and I discussed applying this
> patch to 2.8.0. However, the changes are extensive and it is unclear
> what impact they will have on the performance of "normal" cases (short
> strings). After all, the patch essentially replaces the program stack
> (which is quite fast) with a heap-based stack (which is quite slow).
> So we decided not to apply it to 2.8.0. It is, however, in 3.0.0, which
> will hopefully be released soon.
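
For readers unfamiliar with the trade-off described above, here is a minimal C++ sketch of it. It is not code from the XERCESC-1242 patch or from Xerces: it assumes a hypothetical toy pattern language (a sequence of character ranges, each optionally followed by `*`), and the names `Item`, `matchRecursive` and `matchHeap` are made up for illustration. Both functions perform the same greedy backtracking search; the first keeps its backtrack points on the program stack via recursion, the second in a heap-allocated `std::vector`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical pattern item: an inclusive character range such as [A-Z],
// optionally repeated zero or more times ("[A-Z]*").
struct Item {
    char lo, hi;
    bool star;
    bool accepts(char c) const { return c >= lo && c <= hi; }
};

// Recursive backtracking: every character consumed by a starred item adds a
// stack frame, because the "skip" alternative must be tried after it returns.
static bool matchFrom(const std::vector<Item>& p, std::size_t i,
                      const std::string& s, std::size_t pos) {
    if (i == p.size()) return pos == s.size();  // whole string must be consumed
    const Item& it = p[i];
    if (!it.star)
        return pos < s.size() && it.accepts(s[pos]) && matchFrom(p, i + 1, s, pos + 1);
    // Greedy: try one more repetition first, then fall back to leaving the star.
    if (pos < s.size() && it.accepts(s[pos]) && matchFrom(p, i, s, pos + 1))
        return true;
    return matchFrom(p, i + 1, s, pos);
}

bool matchRecursive(const std::vector<Item>& p, const std::string& s) {
    return matchFrom(p, 0, s, 0);
}

// Same search with an explicit, heap-allocated stack of pending alternatives.
bool matchHeap(const std::vector<Item>& p, const std::string& s) {
    struct State { std::size_t i, pos; };
    std::vector<State> pending;
    pending.push_back({0, 0});
    while (!pending.empty()) {
        State st = pending.back();
        pending.pop_back();
        if (st.i == p.size()) {
            if (st.pos == s.size()) return true;
            continue;  // dead end, backtrack to the next pending alternative
        }
        const Item& it = p[st.i];
        bool canTake = st.pos < s.size() && it.accepts(s[st.pos]);
        if (!it.star) {
            if (canTake) pending.push_back({st.i + 1, st.pos + 1});
            continue;
        }
        // Push the fallback first so the greedy choice is popped (tried) first.
        pending.push_back({st.i + 1, st.pos});
        if (canTake) pending.push_back({st.i, st.pos + 1});
    }
    return false;
}
```

The recursive version needs one stack frame per character consumed by a starred range, so a long enough attribute value can overflow the thread stack. The explicit-stack version is bounded only by available memory, but pays for `std::vector` growth and an extra push/pop per step, which is the kind of overhead that matters most for short strings.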
To make the situation clearer, here are some numbers from my performance test
(a modified "SAX2Count -v=always -f -p" that parses a MemBufInputSource
many times, with a cached grammar, a simple schema and the simple regex "[A-Z]*"):

1 element with 1 attribute - BCB5
  attr. length   performance change (gain if +)
         10000   +5.03%
          1000   -5.67%
           100   -3.61%
            10   -1.56%

1 element with 1 attribute - gcc 4.1.2
  attr. length   performance change (gain if +)
        100000   -13.05%
         10000   +0.8%
          1000   -12.9%
           100   -7.27%
            10   -4.84%

128 elements with 1 attribute - BCB5
  attr. length   performance change (gain if +)
         10000   +3.37%
          1000   -7.31%
           100   -13.78%
            10   -23.09%

128 elements with 1 attribute - gcc 4.1.2
  attr. length   performance change (gain if +)
         10000   -5.51%
          1000   -15.67%
           100   -19.92%
            10   -20.77%

So you are right, there is some performance loss. In real-life
applications, though, the change should be smaller.
Good luck!
Vitaly