[ 
https://issues.apache.org/jira/browse/XERCESJ-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108416#comment-13108416
 ] 

Natan Cox edited comment on XERCESJ-1276 at 9/20/11 7:26 AM:
-------------------------------------------------------------

I ran into the same problem. The cause is that, for each comparison, contains() 
loops over a Vector.

I created a fix that cuts the run time from 5000s to 27s on my laptop for a 
big file. 
I have an xs:key constraint with 500,000 values.

Solution
* I added an fValueSet to do faster lookups. 
* I fill and clear it everywhere that fValues is filled or cleared. 
* I added an "isContainsCandidate" check to rule out any candidate whose 
values are not all present in fValueSet. 
* If a candidate cannot be ruled out (which should be the exception), the main 
algorithm kicks in again.
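
To illustrate the steps above, here is a minimal sketch of the idea. It is not 
the attached patch: ValueStoreSketch, its Object[] tuples and the flat List 
layout are simplifications assumed for this example, and only the names 
fValues, fValueSet and isContainsCandidate come from the description above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for ValueStoreBase; NOT the Xerces class or the patch.
class ValueStoreSketch {
    private final int fieldCount;                            // fields per key tuple
    private final List<Object> fValues = new ArrayList<>();  // flat list, fieldCount entries per tuple
    private final Set<Object> fValueSet = new HashSet<>();   // every single value stored so far

    ValueStoreSketch(int fieldCount) {
        this.fieldCount = fieldCount;
    }

    // Fill fValueSet wherever fValues is filled.
    void add(Object[] tuple) {
        if (tuple.length != fieldCount) {
            throw new IllegalArgumentException("expected " + fieldCount + " values");
        }
        for (Object v : tuple) {
            fValues.add(v);
            fValueSet.add(v);
        }
    }

    // Clear fValueSet wherever fValues is cleared.
    void clear() {
        fValues.clear();
        fValueSet.clear();
    }

    // Cheap pre-check: a tuple can only be stored if every one of its values
    // has been seen before. A "true" result is not proof of containment.
    boolean isContainsCandidate(Object[] tuple) {
        for (Object v : tuple) {
            if (!fValueSet.contains(v)) {
                return false;
            }
        }
        return true;
    }

    boolean contains(Object[] tuple) {
        if (!isContainsCandidate(tuple)) {
            return false; // ruled out without touching the list
        }
        // Could not rule it out: fall back to the linear scan.
        for (int i = 0; i + fieldCount <= fValues.size(); i += fieldCount) {
            boolean match = true;
            for (int j = 0; j < fieldCount; j++) {
                if (!fValues.get(i + j).equals(tuple[j])) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return true;
            }
        }
        return false;
    }
}
```

With an xs:key, nearly every new tuple contains at least one value that has 
not been seen before, so the hash lookup rejects it and the scan is skipped.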

Note: I used the 2.11.0 code as the base!
And if the formatting does not follow the Xerces-J standard, I sincerely apologize.

> XMLSchemaValidator$ValueStoreBase.contains() is painfully slow
> --------------------------------------------------------------
>
>                 Key: XERCESJ-1276
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1276
>             Project: Xerces2-J
>          Issue Type: Bug
>          Components: XML Schema 1.0 Structures
>    Affects Versions: 2.6.2, 2.9.1
>            Reporter: Kenny MacLeod
>         Attachments: XMLSchemaValidator.java
>
>
> Under certain conditions, the contains() method in 
> XMLSchemaValidator$ValueStoreBase can cripple the performance of parsing and 
> validation.
> I'm not sure what those conditions are, but as a guideline figure, I was using 
> JAXB2 to deserialize a 22 MB XML file.  Without schema validation, it took 5 
> seconds.  With validation, it took over 3 minutes (JDK 1.5.0_10 on win32). My 
> profiler pointed the finger squarely at that method.
> Suspicions were aroused further when seeing this comment in the source:
> public boolean contains() {
>             // REVISIT: we can improve performance by using hash codes, instead of
>             // traversing global vector that could be quite large.
> This is present in the Xerces 2.6.2 bundled with JDK 1.5.0_10, and also in the 
> source for 2.9.1.
