[ https://issues.apache.org/jira/browse/XERCESJ-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730844#action_12730844 ]
Bene commented on XERCESJ-1382:
-------------------------------
Thank you for the information.
Then we are using the default setting (coalescing=false), since we don't call
setCoalescing(true) anywhere.
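
For reference, here is a minimal sketch of what the coalescing setting changes, assuming the parser is created through the standard JAXP DocumentBuilderFactory (the class name CoalescingDemo and the tiny sample document are made up for illustration, they are not from our application). With coalescing=false a CDATA section stays a separate node between its neighbouring text nodes; with coalescing=true it is merged into the adjacent text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of Xerces or of the reported application.
class CoalescingDemo {

    // Sample document: text, then a CDATA section with embedded XML, then text.
    static final String XML = "<a>x<![CDATA[<b/>]]>z</a>";

    // Parses XML and returns the number of child nodes of the root element.
    public static int childCount(boolean coalescing) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setCoalescing(coalescing); // JAXP default is false
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(XML)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("children, coalescing=false: " + childCount(false));
        System.out.println("children, coalescing=true:  " + childCount(true));
    }
}
```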
All times below are parsing durations for the same document
(xerces_performance_problem.xml), which contains XML embedded within a
CDATA section.
I can confirm that removing all newlines from the entire file improves
parsing time by about a factor of 8.
With newlines (with_newlines.png): Parse duration for 247628 characters: 5125 ms.
Without newlines (without_newlines.png): Parse duration for 241316 characters: 640 ms.
Unfortunately, removing the newlines is not a workaround for us, since we
rely heavily on filesystem diffs of XML documents. Besides, the XML documents
must be easily readable by humans.
So I did some further analysis. What I found is that the performance
problem only occurs if the document is also being validated against an XSD
(resource.xsd) during parsing.
Please see the two screenshots:
no_xsd_validation.png: 172 ms
xsd_validation.png: 5016 ms
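
The comparison can be reproduced roughly with the sketch below. It is only an illustration under several assumptions: it goes through the JAXP API instead of calling Xerces classes directly, it uses a generated document and a one-element inline schema instead of the attached xerces_performance_problem.xml and resource.xsd, and the class and method names (ValidationTiming, sampleDocument, parse, timeMillis) are made up. The absolute numbers will differ from the ones above depending on the parser version on the classpath.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical timing harness, not the code used for the screenshots.
class ValidationTiming {

    // Stand-in schema: a single string-typed root element.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='resource' type='xs:string'/>"
      + "</xs:schema>";

    // Builds a document whose CDATA section holds many lines of embedded XML.
    public static String sampleDocument(int lines) {
        StringBuilder sb = new StringBuilder("<resource><![CDATA[\n");
        for (int i = 0; i < lines; i++) {
            sb.append("<item>").append(i).append("</item>\n");
        }
        return sb.append("]]></resource>").toString();
    }

    // Parses into a DOM, optionally validating against XSD while parsing.
    public static Document parse(String xml, boolean validate) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for schema validation
            if (validate) {
                SchemaFactory sf =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
                Schema schema = sf.newSchema(new StreamSource(new StringReader(XSD)));
                factory.setSchema(schema);
            }
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Wall-clock duration of one parse, in milliseconds.
    public static long timeMillis(String xml, boolean validate) {
        long start = System.nanoTime();
        parse(xml, validate);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        String xml = sampleDocument(10_000);
        System.out.println("Characters:         " + xml.length());
        System.out.println("No XSD validation:  " + timeMillis(xml, false) + " ms");
        System.out.println("XSD validation:     " + timeMillis(xml, true) + " ms");
    }
}
```

A single run like this includes JIT warm-up, so repeating each measurement a few times gives more trustworthy numbers.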
I assume that some performance optimizations that have been made for the
non-validating parser are not yet used by the XSD-validating parser
(in particular, not by org.apache.xerces.impl.xs.XMLSchemaValidator).
I hope this information is helpful.
> Performance problem in org.apache.xerces.dom.CharacterDataImpl appendData
> -------------------------------------------------------------------------
>
> Key: XERCESJ-1382
> URL: https://issues.apache.org/jira/browse/XERCESJ-1382
> Project: Xerces2-J
> Issue Type: Bug
> Components: DOM (Level 3 Core)
> Affects Versions: 2.6.0, 2.9.1
> Environment: Windows XP SP2; JRE 1.6.0_13; Xerces2 Java Parser 2.9.1
> Release (Xerces-J-bin.2.9.1.zip)
> Reporter: Bene
> Priority: Critical
> Attachments: xerces_performance_problem.png,
> xerces_performance_problem.xml
>
>
> It takes too long to parse a large XML document if the document contains
> CDATA sections that contain embedded XML.
> The problem initially occurred with Xerces 2.6.0, where it took about 30
> seconds to parse an XML document of about 250 KB.
> So we upgraded to Xerces 2.9.1, which improves parse time to about 5 seconds.
> Unfortunately, this is still much too slow!
> I tried to find similar bug reports and there are many:
> XERCESJ-102
> XERCESJ-1268
> XALANJ-2398
> Unfortunately the issue is still not fixed, so I decided to create this
> report.