[jira] [Commented] (XERCESJ-1745) Save/Restore serialized "compiled" parser-validator

Mike Beckerle (Jira) Wed, 25 May 2022 11:28:05 -0700


    [ 
https://issues.apache.org/jira/browse/XERCESJ-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542182#comment-17542182
 ]


Mike Beckerle commented on XERCESJ-1745:
----------------------------------------

Thank you for the link. I looked into the faq-grammars page and related example 
source code. 

Alas, none of these data structures are serializable, so using grammar pools 
and preloaded grammars seems to achieve only the "compile once at start-up" 
behavior, which I believe we're already getting via the factory patterns that 
support providing the schema once and then creating parsers from that factory. 
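
For reference, the factory pattern mentioned above can be sketched with the 
standard JAXP validation API. This is only an illustration, not Daffodil's 
actual code: the class name, the tiny inline schema, and the helper methods 
are made up for the example, and whether newSchema() does all of the 
"compilation" eagerly depends on the implementation in use.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical sketch: provide the schema once, then create validators from it.
class CompileOnceSketch {
    // Tiny stand-in for the real (multi-megabyte) schema set.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='root' type='xs:int'/></xs:schema>";

    // Build the Schema object once, at start-up. The timing printed here only
    // measures this call; it is an assumption that the implementation does
    // its schema processing inside newSchema().
    static Schema compile() {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            long start = System.nanoTime();
            Schema schema =
                factory.newSchema(new StreamSource(new StringReader(XSD)));
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("newSchema took " + micros + " us");
            return schema;
        } catch (SAXException e) {
            throw new IllegalStateException("schema could not be loaded", e);
        }
    }

    // Each validation gets a fresh Validator from the shared Schema.
    static boolean isValid(Schema schema, String xml) {
        try {
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // document does not conform to the schema
        } catch (IOException e) {
            throw new IllegalStateException("could not read document", e);
        }
    }

    public static void main(String[] args) {
        Schema schema = compile();
        System.out.println(isValid(schema, "<root>42</root>"));
        System.out.println(isValid(schema, "<root>abc</root>"));
    }
}
```

The Schema object is reused across documents, but it still has to be rebuilt 
from the XSD sources on every JVM start, which is the cost this issue is about. 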

With respect to the serialization form, I wanted to clarify that our need is 
simpler than what many people would think is needed for serializability. We do 
not need any compatibility of the serializations across Xerces versions/builds. 

For our needs the saved serialized representation can be completely tied to 
exactly the same version/build of Xerces that created it. Reloading a 
serialization created from a different version/build of Xerces can just be a 
fatal error. 

This actually removes the need for a great deal of the maintenance complexity 
associated with serializability. 
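
To make the "tied to one build" idea concrete, here is a minimal sketch using 
plain Java serialization. Everything in it is hypothetical: the class and 
method names are invented, the "compiled" object is any Serializable stand-in 
(no Xerces class is involved), and the build identifier is simply a string the 
caller supplies. How Xerces would expose such an identifier is left open.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

// Hypothetical sketch: a saved form that is only valid for the exact build
// that wrote it. No cross-version compatibility is attempted.
class VersionTiedStore {
    // Write the build identifier first, then the serialized object.
    static void save(Serializable compiled, String buildId, OutputStream out) {
        try {
            ObjectOutputStream oos = new ObjectOutputStream(out);
            oos.writeUTF(buildId);
            oos.writeObject(compiled);
            oos.flush();
        } catch (IOException e) {
            throw new IllegalStateException("could not save compiled form", e);
        }
    }

    // Read the identifier back and refuse to continue on any mismatch.
    static Object restore(String buildId, InputStream in) {
        try {
            ObjectInputStream ois = new ObjectInputStream(in);
            String savedBy = ois.readUTF();
            if (!savedBy.equals(buildId)) {
                // Treated as fatal: the caller is expected to recompile.
                throw new IllegalStateException(
                    "saved by build '" + savedBy + "', running '" + buildId + "'");
            }
            return ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("could not restore compiled form", e);
        }
    }
}
```

Because a mismatch is simply an error, the serialized layout can change freely 
from one build to the next without any migration logic. 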

> Save/Restore serialized "compiled" parser-validator
> ---------------------------------------------------
>
>                 Key: XERCESJ-1745
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1745
>             Project: Xerces2-J
>          Issue Type: New Feature
>          Components: Other, Serialization
>    Affects Versions: 2.12.2
>            Reporter: Mike Beckerle
>            Priority: Major
>
> Feature requested by Apache Daffodil project PMC.
>  
> We use Xerces-J to validate XML files. 
>  
> The schemas of these files are huge. Think 300+ fairly large XSD files all 
> included/imported together. Megabytes of XSD. 
>  
> In order to validate+parse faster, we know Xerces does something akin to 
> "compiling" the XSD into lower-level data structures. 
>  
> The requested feature is to make this "compilation" step of the large XSD 
> schema explicit, and then be able to serialize the resulting java object to a 
> file. Subsequently one can reload this pre-compiled object so as not to face 
> this compiling overhead at startup time.
>  
> An API call to explicitly force this compilation step, so that the time taken 
> to do it can be measured, is an important part of this feature. This 
> compilation can also occur automatically on first use, without requiring an 
> explicit "compile it now" API call, and that would retain perfect 
> compatibility with Xerces APIs today. 
>  
> But for very large XSD, it is of value to be able to time this compile 
> activity, so a new API method to cause Xerces to do this compilation step 
> explicitly (and which is separate from the serialization of the resulting 
> object) is of value. 
>  
> In summary I think numerous internal data structures within Xerces would have 
> to be made Serializable, and a compileParser(), 
> saveParser(java.io.OutputStream) and restoreParser(java.io.InputStream) or 
> something along those lines are needed. 
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: j-dev-unsubscr...@xerces.apache.org
For additional commands, e-mail: j-dev-h...@xerces.apache.org
