[ 
https://issues.apache.org/jira/browse/XERCESJ-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730844#action_12730844
 ] 

Bene commented on XERCESJ-1382:
-------------------------------

Thank you for the information. 
Then we are using the default setting (coalescing=false), since we don't call 
setCoalescing(true) anywhere.
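
For reference, a minimal sketch of how the coalescing setting can be checked, assuming the parser is created through the standard JAXP DocumentBuilderFactory (our real setup code may differ). The class name CoalescingCheck and the sample string are made up for illustration only. With coalescing off, the CDATA section should stay a separate CDATA node in the DOM; with coalescing on, it should be converted to a text node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CoalescingCheck {

    // Parses the XML string and returns the DOM node type of the root
    // element's first child (hypothetical helper, for illustration only).
    static short firstChildType(String xml, boolean coalescing) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Assumption: the factory is otherwise left at its defaults.
            factory.setCoalescing(coalescing);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getFirstChild().getNodeType();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        // Reports the setting that applies when setCoalescing is never called.
        System.out.println("default coalescing: "
                + DocumentBuilderFactory.newInstance().isCoalescing());

        // Made-up sample: XML embedded in a CDATA section.
        String xml = "<root><![CDATA[<inner>text</inner>]]></root>";
        System.out.println("node type, coalescing off: " + firstChildType(xml, false));
        System.out.println("node type, coalescing on:  " + firstChildType(xml, true));
    }
}
```

The node types can be compared against org.w3c.dom.Node.CDATA_SECTION_NODE and org.w3c.dom.Node.TEXT_NODE.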

All times below are parse durations for the same document 
(xerces_performance_problem.xml), which contains
XML embedded in a CDATA section.

I can confirm that removing all newlines from the entire file reduces the 
parse time by a factor of about 8:
With newlines    (with_newlines.png):    247628 characters parsed in 5125 ms
Without newlines (without_newlines.png): 241316 characters parsed in  640 ms

Unfortunately, removing the newlines is not a workaround for us, since we 
rely heavily on
filesystem diffs of XML documents. Besides, the XML documents must be easily 
readable by humans.

So I did some further analysis. I found that the performance 
problem only occurs if the document is also 
validated against an XSD (resource.xsd) during parsing.
Please see the two screenshots:
no_xsd_validation.png     172 ms
xsd_validation.png       5016 ms
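
A rough sketch of how such a comparison can be reproduced, assuming validation is switched on through the standard JAXP DocumentBuilderFactory.setSchema call (our application may enable it differently). The class ValidationTiming, the tiny inline schema, and the generated document are hypothetical stand-ins for resource.xsd and xerces_performance_problem.xml, so the absolute numbers will not match the screenshots.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ValidationTiming {

    // Hypothetical stand-in for resource.xsd: one string-typed root element.
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='resource' type='xs:string'/>"
          + "</xs:schema>";

    // Builds a made-up document: XML lines embedded in one CDATA section,
    // each terminated by a newline.
    static String sampleDocument(int lines) {
        StringBuilder sb = new StringBuilder("<resource><![CDATA[\n");
        for (int i = 0; i < lines; i++) {
            sb.append("<item id=\"").append(i).append("\">value</item>\n");
        }
        return sb.append("]]></resource>").toString();
    }

    // Parses the document into a DOM and returns the elapsed time in ms.
    static long parseMillis(String xml, boolean validate) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            if (validate) {
                SchemaFactory schemaFactory =
                        SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
                factory.setSchema(schemaFactory.newSchema(
                        new StreamSource(new StringReader(XSD))));
            }
            long start = System.nanoTime();
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            if (!"resource".equals(doc.getDocumentElement().getLocalName())) {
                throw new IllegalStateException("unexpected root element");
            }
            return elapsedMs;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = sampleDocument(5000);
        System.out.println("without XSD validation: " + parseMillis(xml, false) + " ms");
        System.out.println("with XSD validation:    " + parseMillis(xml, true) + " ms");
    }
}
```

A single run like this includes JIT warm-up, so each variant should be repeated a few times before the numbers are compared.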

I assume that some performance optimizations that were made for the 
non-validating parser
are not yet used by the XSD-validating parser (in particular, not by 
org.apache.xerces.impl.xs.XMLSchemaValidator).

I hope this information is helpful.


> Performance problem in org.apache.xerces.dom.CharacterDataImpl appendData
> -------------------------------------------------------------------------
>
>                 Key: XERCESJ-1382
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1382
>             Project: Xerces2-J
>          Issue Type: Bug
>          Components: DOM (Level 3 Core)
>    Affects Versions: 2.6.0, 2.9.1
>         Environment: Windows XP SP2; JRE 1.6.0_13; Xerces2 Java Parser 2.9.1 
> Release (Xerces-J-bin.2.9.1.zip)
>            Reporter: Bene
>            Priority: Critical
>         Attachments: xerces_performance_problem.png, 
> xerces_performance_problem.xml
>
>
> It takes too long to parse a large XML document if the document contains 
> CDATA sections that contain embedded XML.
> The problem initially occurred with Xerces 2.6.0, where it took about 30 
> seconds to parse an XML document of about 250 KB.
> So we upgraded to Xerces 2.9.1, which improves the parse time to about 5 seconds. 
> Unfortunately, this is still much too slow!
> I tried to find similar bug reports and there are many:
> XERCESJ-102
> XERCESJ-1268
> XALANJ-2398
> Unfortunately the issue is still not fixed, so I decided to create this 
> report.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

