This is an automated email from the ASF dual-hosted git repository.

slawrence pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/daffodil.git


The following commit(s) were added to refs/heads/main by this push:
     new 5dea1ca  Do not store the XMLSchemaFactory in a DaffodilXMLLoader
5dea1ca is described below

commit 5dea1caffebd89b91b538218dc751c7b62c63773
Author: Steve Lawrence <[email protected]>
AuthorDate: Fri Oct 15 15:13:13 2021 -0400

    Do not store the XMLSchemaFactory in a DaffodilXMLLoader
    
    The XMLSchemaFactory keeps references to lots of large objects when it
    loads XML. By making this a val in the DaffodilXMLLoader, anything that
    uses and stores a DaffodilXMLLoader will have a large retained size that
    can never be garbage collected. And each DFDLSchemaFile has a
    DaffodilXMLLoader for that file, so we end up holding on to a large
    amount of memory that cannot be freed, especially when there are lots of
    schema files. This has been shown to create up to a 7x increase in
    memory usage during schema compilation, which can easily lead to
    OutOfMemoryErrors.
    
    To resolve this, this changes the schemaFactory member from a val to a
    def. This allows each created schemaFactory to be garbage collected
    after it is used to load and verify a DFDL schema file, which
    drastically reduces overall memory usage when there are lots of
    individual schema files. Each DaffodilXMLLoader only uses this
    schemaFactory for validation, and schemas are only validated once, so
    there are no benefits to storing each factory--it is only ever used
    once.
    
    DAFFODIL-2572
---
 .../scala/org/apache/daffodil/xml/DaffodilXMLLoader.scala   | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilXMLLoader.scala b/daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilXMLLoader.scala
index b4dbd27..65050a7 100644
--- a/daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilXMLLoader.scala
+++ b/daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilXMLLoader.scala
@@ -41,6 +41,7 @@ import scala.xml.SAXParseException
 import scala.xml.SAXParser
 import scala.xml.parsing.NoBindingFactoryAdapter
 
+import org.apache.xerces.jaxp.validation.XMLSchemaFactory
 import org.apache.xerces.xni.parser.XMLInputSource
 
 import org.apache.xml.resolver.Catalog
@@ -426,9 +427,17 @@ class DaffodilXMLLoader(val errorHandler: org.xml.sax.ErrorHandler)
    * right into Xerces. This is accomplished by
    * using the below SchemaFactory and SchemaFactory.newSchema calls.  The
    * newSchema call is what forces schema validation to take place.
+   *
+   * Note that this is intentionally a def rather than a val because an
+   * XMLSchemaFactory holds on to many large objects used when loading a
+   * schema, which can amount to a massive memory leak. By making it a def,
+   * this factory and all its references can be garbage collected after a
+   * schema is loaded. The schemaFactory is only used when validating a DFDL
+   * schema, which happens once per DaffodilXMLLoader, so saving it as a val
+   * does not gain anything.
    */
-  private lazy val schemaFactory = {
-    val sf = new org.apache.xerces.jaxp.validation.XMLSchemaFactory()
+  private def schemaFactory: XMLSchemaFactory = {
+    val sf = new XMLSchemaFactory()
     sf.setResourceResolver(resolver)
     //
     // despite setting the errorHandler here, the validator
