[ https://issues.apache.org/jira/browse/XERCESJ-970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Simmons updated XERCESJ-970:
----------------------------------

    Attachment: comments.txt

Actually, the buffer doesn't double in size when you append a char array; it grows linearly. This char array append method is the one used when parsing comments.

The attached patch includes a test harness and a fix. On my machine, parsing a large comment takes over 8 seconds without the fix and about 1/8 second with it. We have a real-life case where a client commented out 40k lines of XML and saw a similar performance hit.

> Large comments are extremely slow to parse
> ------------------------------------------
>
>                 Key: XERCESJ-970
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-970
>             Project: Xerces2-J
>          Issue Type: Bug
>          Components: XNI
>    Affects Versions: 2.2.0, 2.2.1, 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.6.1, 2.6.2
>         Environment: Windows XP running Java 1.4.2
>            Reporter: Sean Griffin
>            Priority: Minor
>         Attachments: comments.txt
>
>
> Very large comments drastically increase the parsing time for both the SAX and
> DOM implementations. Running the sax.Counter and dom.Counter samples with a
> 410KB file in which nothing is commented out gives parse times in the 100ms
> to 300ms range. However, if I comment out 95% of the file and run the same
> samples, the parse times jump to between 40 and 50 seconds. I ran the same
> samples using the Aelfred parser shipped with Saxon 7.9 and, while the file
> with the large comment was slower than the one without it, the parse time
> rose by only about 100ms.
> I briefly compared the code of the two parsers, and they don't look
> significantly different in how they handle comments. The only notable
> difference I noticed was in the low/high byte character checks. I suspect
> an inefficiency in the XMLStringBuffer class, but I can't find it.
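The point about linear versus doubling growth can be illustrated with a small, hypothetical Java sketch. It is not the actual XMLStringBuffer code or the attached patch; the CharBuffer class, the initial capacity of 32, the 32-char slack added on each linear resize, and the chunk sizes are all assumptions chosen for illustration. It appends a long comment as many small chunks (roughly how a scanner feeds text into a buffer) and counts how many characters are copied during reallocations.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, not Xerces code: compares a buffer that grows only by
 * what each append needs (linear growth) with one that at least doubles its
 * capacity (geometric growth), counting chars copied on each reallocation.
 */
public class GrowthDemo {

    /** Minimal append-only char buffer that tracks reallocation cost. */
    static final class CharBuffer {
        private final boolean doubling;
        char[] ch = new char[32]; // assumed initial capacity
        int length = 0;
        long copied = 0;          // total chars moved by reallocations

        CharBuffer(boolean doubling) {
            this.doubling = doubling;
        }

        void append(char[] src, int off, int len) {
            if (length + len > ch.length) {
                int needed = length + len;
                int newCap = doubling
                        ? Math.max(needed, ch.length * 2) // geometric: at least double
                        : needed + 32;                    // linear: just enough plus small slack
                char[] newch = new char[newCap];
                System.arraycopy(ch, 0, newch, 0, length); // move existing contents
                copied += length;
                ch = newch;
            }
            System.arraycopy(src, off, ch, length, len);
            length += len;
        }
    }

    /** Appends `chunks` pieces of `chunkSize` chars and returns the chars copied on resize. */
    static long simulate(boolean doubling, int chunks, int chunkSize) {
        CharBuffer buf = new CharBuffer(doubling);
        char[] piece = new char[chunkSize];
        Arrays.fill(piece, 'x');
        for (int i = 0; i < chunks; i++) {
            buf.append(piece, 0, chunkSize);
        }
        return buf.copied;
    }

    public static void main(String[] args) {
        int chunks = 2000;
        int chunkSize = 64; // 128,000 chars of comment text in total
        System.out.println("linear growth,   chars copied: " + simulate(false, chunks, chunkSize));
        System.out.println("doubling growth, chars copied: " + simulate(true, chunks, chunkSize));
    }
}
```

With 2,000 appends of 64 chars, the linear buffer reallocates on every append and copies 127,936,000 chars in total, while the doubling buffer copies 131,008. Linear growth makes total copying quadratic in the comment length, whereas doubling keeps it linear, which is the kind of gap that would explain a large comment taking seconds rather than milliseconds.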