[ https://issues.apache.org/jira/browse/XERCESJ-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108416#comment-13108416 ]
Natan Cox edited comment on XERCESJ-1276 at 9/20/11 7:26 AM:
-------------------------------------------------------------

I too had the same problem. The reason is that the comparison loops over a Vector.

I created a fix that improves performance from 5000s to 27s on my laptop for a big file. I have an xs:key constraint with 500,000 values.

Solution
* I added an fValueSet to do faster lookups.
* I fill and clear it everywhere fValues is filled or cleared.
* I added an "isContainsCandidate" check to rule out any candidates that do not have all their values in fValueSet.
* If a candidate cannot be ruled out (which should be the exception), the main algorithm kicks in again.

Note: I took the 2.11.0 code as the base! And if the formatting is not up to the Xerces-J standard, I sincerely apologize.

> XMLSchemaValidator$ValueStoreBase.contains() is painfully slow
> --------------------------------------------------------------
>
>                 Key: XERCESJ-1276
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1276
>             Project: Xerces2-J
>          Issue Type: Bug
>          Components: XML Schema 1.0 Structures
>    Affects Versions: 2.6.2, 2.9.1
>            Reporter: Kenny MacLeod
>         Attachments: XMLSchemaValidator.java
>
>
> Under certain conditions, the contains() method in
> XMLSchemaValidator$ValueStoreBase can cripple the performance of parsing and
> validation.
> I'm not sure what those conditions are, but as a guideline figure I was using
> JAXB2 to deserialize a 22 MB XML file. Without schema validation, it took 5
> seconds. With validation, it took over 3 minutes (JDK 1.5.0_10 on win32). My
> profiler pointed the finger squarely at that method in XMLSchemaValidator.
> Suspicions were aroused further when seeing this comment in the source:
>     public boolean contains() {
>         // REVISIT: we can improve performance by using hash codes, instead of
>         // traversing global vector that could be quite large.
> This is present in Xerces 2.6.2 bundled with JDK 1.5.0_10, and also in the
> source for 2.9.1.
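
The approach described in the comment (a hash set kept in step with the value vector, used as a cheap pre-check before the linear scan) can be sketched roughly as follows. This is not the attached patch and not the Xerces source: the class ValueStoreSketch, its fields, and the use of Object.equals for value comparison are all assumptions made for illustration. Only the names fValues, fValueSet and isContainsCandidate come from the comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a value store; NOT the actual Xerces class.
class ValueStoreSketch {
    private final int fieldCount;
    // Flat list of stored values, fieldCount entries per tuple (plays the role of fValues).
    private final List<Object> values = new ArrayList<>();
    // Every individual value stored so far (plays the role of fValueSet).
    private final Set<Object> valueSet = new HashSet<>();

    ValueStoreSketch(int fieldCount) {
        if (fieldCount < 1) {
            throw new IllegalArgumentException("fieldCount must be >= 1");
        }
        this.fieldCount = fieldCount;
    }

    // Fill both structures together so they never drift apart.
    void add(Object[] tuple) {
        if (tuple.length != fieldCount) {
            throw new IllegalArgumentException("expected " + fieldCount + " fields");
        }
        for (Object v : tuple) {
            values.add(v);
            valueSet.add(v);
        }
    }

    // Clear both structures together.
    void clear() {
        values.clear();
        valueSet.clear();
    }

    // Cheap pre-check: a tuple can only be present if each of its values
    // has been stored somewhere. A "true" result is not proof of presence.
    boolean isContainsCandidate(Object[] tuple) {
        for (Object v : tuple) {
            if (!valueSet.contains(v)) {
                return false;
            }
        }
        return true;
    }

    boolean contains(Object[] tuple) {
        // Fast path: rule the tuple out without touching the list.
        if (!isContainsCandidate(tuple)) {
            return false;
        }
        // Slow path: the original linear scan, one tuple at a time.
        List<Object> wanted = Arrays.asList(tuple);
        for (int i = 0; i + fieldCount <= values.size(); i += fieldCount) {
            if (values.subList(i, i + fieldCount).equals(wanted)) {
                return true;
            }
        }
        return false;
    }
}
```

When validating a key constraint, most lookups are for values not seen before, so they should be answered by the set alone; only true duplicates and partial matches fall through to the scan. The real validator may compare values differently from Object.equals, so the set's notion of equality would need to match whatever the scan uses.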