[ https://issues.apache.org/jira/browse/XERCESJ-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276496#comment-16276496 ]
Antti S. Lankila commented on XERCESJ-1276:
-------------------------------------------

Does a me-too help at all towards finally resolving this bug report?

I discovered this same hot spot while profiling slow XML validation with VisualVM on JDK 1.9, using the pure-Java Xerces parser (rather than the native C++ Xerces shipping in the JDK, because with that one all I could see was that some native code was taking all the time). In my case, the pure-Java Xerces was only slightly slower than the C++ version: validating the XML file took 1m50s using the native code and 2m5s using the pure-Java version. However, with the Java code, I could replace xercesImpl with the one supplied in the 01/Feb/14 version, which dropped the runtime from 2m5s to just 5s. A significant improvement!

Incidentally, libxml2-based validation was about as slow as Xerces, so I imagine that library is also using a linear scan for identity-constraint validation.

At first, I thought it was just that schema validation is slow. However, I learnt that .NET's XML validator was quite fast, doing the same work in about 10s, which told me that there is a quality-of-implementation issue in Xerces.

> Improve performance of XML Schema Identity-constraint validation ---
> XMLSchemaValidator$ValueStoreBase.contains() is painfully slow.
> ------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: XERCESJ-1276
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1276
>             Project: Xerces2-J
>          Issue Type: Bug
>          Components: XML Schema 1.0 Structures
>    Affects Versions: 2.6.2, 2.9.1
>            Reporter: Kenny MacLeod
>              Labels: gsoc, gsoc2013, mentor
>         Attachments: XMLSchemaValidator.java,
> Xerces-J-src.2.11.0_patch1276.txt, xerces-binaries-patched-over-2.11.0.zip,
> xerces-value-store.txt
>
>
> Under certain conditions, the contains() method in
> XMLSchemaValidator$ValueStoreBase can cripple the performance of parsing and
> validation.
> I'm not sure what those conditions are, but as a guideline figure I was using
> JAXB2 to deserialize a 22meg XML file. Without schema validation, it took 5
> seconds. With validation, it took over 3 minutes (JDK 1.5.0_10 on win32). My
> profiler pointed the finger squarely at that method XMLSchemaValidator.
> Suspicions were aroused further when seeing this comment in the source:
>         public boolean contains() {
>             // REVISIT: we can improve performance by using hash codes, instead of
>             // traversing global vector that could be quite large.
> This is present in Xerces 2.6.2 contained with JDK1.5.0_10, and also in the
> source for 2.9.1.
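
For anyone wondering why swapping a linear scan for hashing can make this much difference, here is a minimal sketch of the two approaches. It is not the Xerces code and not the attached patch: the class and method names (IdentityConstraintSketch, LinearValueStore, HashedValueStore) are hypothetical, and key values are modelled as plain strings, whereas a real validator would have to compare typed schema values. The point is only that a uniqueness check which walks every previously seen tuple costs O(n^2) comparisons over n keys, while a hash-based set keeps each check at roughly constant time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: none of these names come from Xerces.
class IdentityConstraintSketch {

    // Linear-scan store: every duplicate check walks all tuples seen so far,
    // so checking n key tuples costs O(n^2) comparisons in total.
    static final class LinearValueStore {
        private final List<List<String>> tuples = new ArrayList<>();

        boolean contains(List<String> tuple) {
            for (List<String> seen : tuples) {
                if (seen.equals(tuple)) {
                    return true;
                }
            }
            return false;
        }

        // Returns false when the tuple is a duplicate (a uniqueness violation).
        boolean add(List<String> tuple) {
            if (contains(tuple)) {
                return false;
            }
            tuples.add(tuple);
            return true;
        }
    }

    // Hash-based store: List.hashCode()/equals() are content-based, so a
    // HashSet finds duplicates in roughly constant time per lookup.
    static final class HashedValueStore {
        private final Set<List<String>> tuples = new HashSet<>();

        boolean contains(List<String> tuple) {
            return tuples.contains(tuple);
        }

        // Set.add already reports whether the tuple was new.
        boolean add(List<String> tuple) {
            return tuples.add(tuple);
        }
    }

    public static void main(String[] args) {
        int n = 20_000; // assumed number of key values, for illustration
        LinearValueStore linear = new LinearValueStore();
        HashedValueStore hashed = new HashedValueStore();

        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            linear.add(Arrays.asList("id-" + i));
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            hashed.add(Arrays.asList("id-" + i));
        }
        long t2 = System.nanoTime();

        System.out.printf("linear: %d ms, hashed: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```

Running main with a larger n should show the linear store's time growing roughly quadratically while the hashed store stays close to linear; the exact numbers depend on the machine and JVM.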