[ 
https://issues.apache.org/jira/browse/XERCESJ-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425287#comment-17425287
 ] 

Christopher Sahnwaldt commented on XERCESJ-1227:
------------------------------------------------

For what it's worth, I finally uploaded the sparse array version of CMStateSet 
I wrote 15 years ago to GitHub: 
[https://github.com/jcsahnwaldt/xerces-sparse-CMStateSet/blob/master/src/org/apache/xerces/impl/dtd/models/CMStateSet.java]
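
For readers unfamiliar with the idea, a sparse state set can be sketched as below. This is only an illustration, not the code in the linked repository: the class name {{SparseStateSet}} and the {{cardinality}} method are hypothetical, and the choice of a sorted int array is an assumption. The point is that memory grows with the number of set bits rather than with the total number of DFA states.

```java
import java.util.Arrays;

/**
 * Hypothetical sparse bit set (illustration only, not the linked CMStateSet).
 * Stores only the indices of the bits that are set, in ascending order.
 */
final class SparseStateSet {
    private int[] bits = new int[4]; // sorted indices of set bits
    private int size = 0;            // number of used entries in bits

    /** Returns true if the bit at the given index is set. */
    boolean getBit(int index) {
        return Arrays.binarySearch(bits, 0, size, index) >= 0;
    }

    /** Sets the bit at the given index, keeping the array sorted. */
    void setBit(int index) {
        int pos = Arrays.binarySearch(bits, 0, size, index);
        if (pos >= 0) {
            return; // already set
        }
        int insertAt = -(pos + 1);
        if (size == bits.length) {
            bits = Arrays.copyOf(bits, Math.max(4, size * 2));
        }
        // Shift the tail one slot to the right to make room.
        System.arraycopy(bits, insertAt, bits, insertAt + 1, size - insertAt);
        bits[insertAt] = index;
        size++;
    }

    boolean isEmpty() {
        return size == 0;
    }

    /** Number of set bits. */
    int cardinality() {
        return size;
    }

    /** Adds every bit of other to this set by merging two sorted arrays. */
    void union(SparseStateSet other) {
        int[] merged = new int[size + other.size];
        int i = 0, j = 0, n = 0;
        while (i < size && j < other.size) {
            int a = bits[i], b = other.bits[j];
            if (a < b) {
                merged[n++] = a; i++;
            } else if (a > b) {
                merged[n++] = b; j++;
            } else {
                merged[n++] = a; i++; j++; // bit present in both sets
            }
        }
        while (i < size) merged[n++] = bits[i++];
        while (j < other.size) merged[n++] = other.bits[j++];
        bits = merged;
        size = n;
    }
}
```

The trade-off in this sketch is that {{setBit}} and {{getBit}} cost a binary search instead of a constant-time word access, so a dense set may well be slower than with a plain bit array.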

I copied the XML and XSD test files provided by [~mukul_gandhi] above. On my 
machine, validating test.xml with maxOccurs.xsd takes about 7.5 to 8 seconds 
with the original Xerces code, and 1.7 to 2 seconds with the sparse CMStateSet. 
Still much slower than the {{assert count(*) le 5000}} solution, but maybe it 
helps.
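
A timing like the one above could be reproduced with a small harness along these lines. This is a rough sketch, not the harness actually used for the numbers quoted: the class and method names are hypothetical, it uses only the standard {{javax.xml.validation}} API, and which parser implementation gets picked up (the JDK's built-in one or a Xerces jar on the classpath) depends on the environment. A single run also includes JIT warm-up, so repeated runs would give steadier figures.

```java
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

/** Hypothetical helper for timing one schema compile plus one validation. */
final class ValidationTimer {

    /**
     * Compiles the schema, validates the instance document against it and
     * returns the elapsed wall-clock time in milliseconds. With the default
     * error handling, an invalid document makes validate() throw a
     * SAXException.
     */
    static long timeValidationMillis(Source schemaSource, Source xmlSource)
            throws Exception {
        SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        long start = System.nanoTime();
        // Schema compilation and validation are both timed, since the
        // content model may be built during either step.
        Schema schema = factory.newSchema(schemaSource);
        Validator validator = schema.newValidator();
        validator.validate(xmlSource);
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

For the files mentioned above, the call would be {{ValidationTimer.timeValidationMillis(new StreamSource(new File("maxOccurs.xsd")), new StreamSource(new File("test.xml")))}}, with {{StreamSource}} from {{javax.xml.transform.stream}} and {{File}} from {{java.io}}.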

Disclaimer: I haven't tested the CMStateSet code thoroughly. It seems to work 
well, but it may also be slower than the original version in other use cases.

> Poor performance / OutOfMemoryError for sequences, choices and nested with 
> large minOccurs/maxOccurs
> ----------------------------------------------------------------------------------------------------
>
>                 Key: XERCESJ-1227
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1227
>             Project: Xerces2-J
>          Issue Type: Bug
>          Components: XML Schema 1.0 Structures, XML Schema 1.1 Structures
>    Affects Versions: 2.9.0
>            Reporter: Michael Glavassevich
>            Priority: Minor
>              Labels: gsoc, gsoc2014, mentor
>
> We now handle large minOccurs/maxOccurs on element/wildcard particles more 
> gracefully by creating a compact representation in the DFA and using counters 
> to check the occurrence constraints. However, we will still fully expand the 
> content model for minOccurs/maxOccurs on sequences and choices, which could 
> still lead to an OutOfMemoryError or very poor performance (e.g. it could 
> take several minutes to build the DFA).  Sequences, choices and nested 
> minOccurs/maxOccurs are somewhat trickier to handle. We would need a more 
> general solution than the one implemented for elements and wildcards to 
> improve those.
> With the introduction of XML Schema 1.1 support we would also need to 
> consider how to improve this for the enhanced xs:all model groups.



