Hi all,
    I made the findings below using the Xerces-J 2.12.1 XML Schema 1.1
distribution (available at http://xerces.apache.org/mirrors.cgi), running
my XML Schema validations on JRE 1.8.

In the past, many XML Schema 1.1 users on the Xerces-J forums have
expressed concern that xs:assert requires a lot of memory and run time
during XML Schema validation with Xerces-J. I thought I should analyze
this aspect in more depth and share my findings with list members here.
For this, I ran various kinds of XML Schema 1.1 validations, some
involving xs:assert and some without it.

If you're interested in this topic, please download the XML and XSD
sample documents I've uploaded at
https://drive.google.com/drive/folders/13lYOY-ECK8_AxbBLq9EcN56dK63Y-KCN?usp=sharing
[1] (the downloadable zip archive is about 3.9 MB when you choose
'download all').

The XML documents that I've posted have data with the following pattern,

<?xml version="1.0" encoding="UTF-8"?>
<result>
   <AnalyticsArrangementKey id="5">8833857916</AnalyticsArrangementKey>
   <AnalyticsArrangementKey id="5">8833857923</AnalyticsArrangementKey>
   <AnalyticsArrangementKey id="5">8833857947</AnalyticsArrangementKey>
   <AnalyticsArrangementKey id="5">8833857949</AnalyticsArrangementKey>
   <AnalyticsArrangementKey id="5">8833858104</AnalyticsArrangementKey>
   ... more sibling 'AnalyticsArrangementKey' elements
</result>

The file input_large.xml is about 65 MB in size and has 979224 sibling
'AnalyticsArrangementKey' elements (all of them very shallow). The file
input_small.xml conforms to the same schemas but is very small (it has
10 sibling 'AnalyticsArrangementKey' elements, all of them very shallow).

To start with, I'll mention that the file input_small.xml validates very
quickly against the XSD documents that I've posted, in all the scenarios
that I've analyzed. Therefore, there is nothing to worry about in this
case.

I do XSD validations in the following two ways,

1) Using the Xerces-J jaxp.SourceValidator sample.

2) Using the Java file XS11Validator.java, which I've provided at the link
[1] mentioned above.

I find that XS11Validator.java performs better than the sample
jaxp.SourceValidator, and I share a few run-time details about both
further down in this mail.
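
For readers who have not downloaded the files yet, here is a minimal
sketch of what a standalone JAXP-based validator can look like. It is not
the XS11Validator.java from link [1]; the class name, the counting error
handler and the command-line arguments are hypothetical. The XSD 1.1
schema language URI only works when the Xerces-J XSD 1.1 jars are on the
classpath.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical class: NOT the XS11Validator.java from the download link.
class StandaloneValidatorSketch {

    // Schema language URI for XSD 1.1; requires the Xerces-J XSD 1.1
    // distribution on the classpath (assumption about your setup).
    static final String XSD_1_1 = "http://www.w3.org/XML/XSD/1.1";

    // Counts validation problems instead of stopping at the first one.
    static final class CountingHandler implements ErrorHandler {
        int problems;
        @Override public void warning(SAXParseException e) { /* ignored */ }
        @Override public void error(SAXParseException e) { problems++; }
        @Override public void fatalError(SAXParseException e) throws SAXException {
            problems++;
            throw e; // not well-formed, so validation cannot continue
        }
    }

    // Validates instanceDoc against schemaDoc and returns the problem count.
    static int validate(String schemaLanguage, Source schemaDoc, Source instanceDoc)
            throws SAXException, IOException {
        SchemaFactory factory = SchemaFactory.newInstance(schemaLanguage);
        Schema schema = factory.newSchema(schemaDoc);
        Validator validator = schema.newValidator();
        CountingHandler handler = new CountingHandler();
        validator.setErrorHandler(handler);
        validator.validate(instanceDoc);
        return handler.problems;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java StandaloneValidatorSketch <schema.xsd> <input.xml>
        int problems = validate(XSD_1_1,
                new StreamSource(new File(args[0])),
                new StreamSource(new File(args[1])));
        System.out.println(problems == 0 ? "valid" : "invalid, problems: " + problems);
    }
}
```

Passing the schema language as a parameter keeps the same code usable
with the JDK's built-in XSD 1.0 factory, which is handy for a quick
sanity check without the Xerces-J jars.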

Below are my findings when using the XML input document input_large.xml
for XML Schema validations,

1) Using test_1.xsd (this is an XSD 1.0 style schema). Not using JVM
options -Xms & -Xmx. In this case, the default value for -Xmx is used
(which I think is 256 MB). With jaxp.SourceValidator, the time taken to
complete validation is 24 minutes.

2) Using assert_1.xsd. Not using JVM options -Xms & -Xmx. With
jaxp.SourceValidator, the time taken to complete validation is 23 minutes.

3) Using test_1.xsd. Not using JVM options -Xms & -Xmx. With
XS11Validator.java, the time taken to complete validation is 10 minutes.

4) Using assert_1.xsd. Not using JVM options -Xms & -Xmx. With
XS11Validator.java, the time taken to complete validation is 17 minutes.

5) Using test_1.xsd. Using JVM options -Xms1024m and -Xmx4096m (that I can
comfortably provide on my workstation). With jaxp.SourceValidator, the time
taken to complete validation is 21 minutes.

6) Using assert_1.xsd. Using JVM options -Xms1024m and -Xmx4096m. With
XS11Validator.java, the time taken to complete validation is 12 minutes.

7) Using assert_2.xsd. Using JVM options -Xms1024m and -Xmx4096m. With
XS11Validator.java, the time taken to complete validation is 4 minutes.

8) Using assert_3.xsd. Not using JVM options -Xms & -Xmx. With
XS11Validator.java, the time taken to complete validation is 6 minutes.

9) Using assert_3.xsd. Using JVM options -Xms1024m and -Xmx4096m. With
XS11Validator.java, the time taken to complete validation is 6 minutes.
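
Since I was not sure above what the default -Xmx is, here is a small
sketch of how the effective maximum heap and the elapsed time of a run
can be reported from inside the JVM. The class and method names are
hypothetical, and this is not how the timings above were taken; it only
uses Runtime.maxMemory() and System.nanoTime() from the JDK.

```java
// Hypothetical helper for reporting heap limit and elapsed time of a run.
class RunStatsSketch {

    // Maximum heap the JVM will attempt to use, in MiB. This reflects
    // -Xmx when it is given, or the JVM's own default otherwise.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // Runs the task once and returns the elapsed wall-clock time in ms.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
        long ms = timeMillis(() -> {
            // Call the validator of your choice here.
        });
        System.out.println("Elapsed: " + ms + " ms");
    }
}
```

Running it once without any options and once with -Xms1024m -Xmx4096m
shows which heap limit each test configuration actually gets on a given
machine.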

Following are my significant observations from the nine tests above,

a) Using an XML Schema validator like XS11Validator.java, instead of the
sample jaxp.SourceValidator, should be the preferred approach for
production-like deployments.

b) Use the JVM options -Xms & -Xmx whenever possible when validating
large XML documents.

c) Compare results (1) and (2) above. xs:assert doesn't take more time
than the corresponding XSD 1.0 schema.

d) Compare the results (1) and (3), and (2) and (4) above. Using
XS11Validator.java improves run-time as compared to jaxp.SourceValidator.

e) Consider result (7). The validation outcome is valid in this case.
Valid outcomes take less time to complete than invalid outcomes. I think
there is an overhead in printing a large number of results to the
console.

f) Consider results (8) and (9). In this case, the xs:assert works on a
larger XDM tree than in the other xs:assert cases mentioned in this
mail.
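
Regarding (e), if console printing is indeed the overhead, one way to
check would be an error handler that counts every problem but prints
only the first few. The sketch below is an assumption on my part and is
not part of XS11Validator.java; the class name and the cap parameter are
hypothetical. It implements the standard org.xml.sax.ErrorHandler
interface, so it can be set on a javax.xml.validation.Validator.

```java
import java.io.PrintStream;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical handler: counts all problems, prints at most maxPrinted.
class CappedErrorHandlerSketch implements ErrorHandler {
    private final int maxPrinted;
    private final PrintStream out;
    private long total;
    private int printed;

    CappedErrorHandlerSketch(int maxPrinted, PrintStream out) {
        this.maxPrinted = maxPrinted;
        this.out = out;
    }

    // Always counts the problem; prints it only while under the cap.
    private void record(SAXParseException e) {
        total++;
        if (printed < maxPrinted) {
            printed++;
            out.println(e.getLineNumber() + ":" + e.getColumnNumber()
                    + " " + e.getMessage());
        }
    }

    @Override public void warning(SAXParseException e) { /* ignored */ }

    @Override public void error(SAXParseException e) { record(e); }

    @Override public void fatalError(SAXParseException e) throws SAXParseException {
        record(e);
        throw e; // not well-formed, so validation cannot continue
    }

    long total() { return total; }

    int printed() { return printed; }
}
```

Comparing the run time of an invalid document with a small cap against
the run time with unlimited printing would show how much of the
difference in (e) comes from console output.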

Finally, I think that all xs:assert cases except assert_3.xsd (in the
context of this mail) will run to completion however large the number of
iterations is, and that their memory requirements do not grow beyond a
certain maximum.

I hope this mail has been useful.



-- 
Regards,
Mukul Gandhi
