This is an automated email from the ASF dual-hosted git repository.
slawrence pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/daffodil.git
The following commit(s) were added to refs/heads/main by this push:
new 5dea1ca Do not store the XMLSchemaFactory in a DaffodilXMLLoader
5dea1ca is described below
commit 5dea1caffebd89b91b538218dc751c7b62c63773
Author: Steve Lawrence <[email protected]>
AuthorDate: Fri Oct 15 15:13:13 2021 -0400
Do not store the XMLSchemaFactory in a DaffodilXMLLoader
The XMLSchemaFactory keeps references to lots of large objects when it
loads XML. By making this a val in the DaffodilXMLLoader, anything that
uses and stores a DaffodilXMLLoader will have a large retained size that
can never be garbage collected. And each DFDLSchemaFile has a
DaffodilXMLLoader for that file, so we end up holding on to a large
amount of memory that cannot be freed, especially when there are lots of
schema files. This has been shown to create up to a 7x increase in
memory usage during schema compilation, which can easily lead to
OutOfMemoryErrors.
To resolve this, this changes the schemaFactory member from a val to a
def. This allows each created schemaFactory to be garbage collected
after it is used to load and verify a DFDL schema file, which
drastically reduces overall memory usage when there are lots of
individual schema files. Each DaffodilXMLLoader only uses this
schemaFactory for validation, and schemas are only validated once, so
there are no benefits to storing each factory--it is only ever used
once.
DAFFODIL-2572
---
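The val-to-def change described above can be illustrated with a minimal,
self-contained sketch. Nothing in it is Daffodil code: HeavyFactory,
LoaderWithVal, and LoaderWithDef are hypothetical stand-ins, and the
1 MiB buffer only mimics the state a real XMLSchemaFactory is said to
retain after loading a schema.

```scala
// Hypothetical stand-in for XMLSchemaFactory: each instance holds a
// buffer to mimic the state a real factory keeps after loading a schema.
final class HeavyFactory {
  HeavyFactory.created += 1
  val retained: Array[Byte] = new Array[Byte](1024 * 1024)
}

object HeavyFactory {
  // Counts how many factories have been constructed so far.
  var created: Int = 0
}

// Before: a lazy val stores the factory in a field, so it stays
// reachable for as long as the loader itself is reachable.
final class LoaderWithVal {
  private lazy val factory: HeavyFactory = new HeavyFactory
  def validate(): Int = factory.retained.length
}

// After: a def builds a fresh factory on every call. The loader keeps
// no reference to it, so it is eligible for garbage collection once
// validate() returns.
final class LoaderWithDef {
  private def factory: HeavyFactory = new HeavyFactory
  def validate(): Int = factory.retained.length
}

object ValVersusDef {
  def main(args: Array[String]): Unit = {
    val a = new LoaderWithVal
    a.validate()
    a.validate()
    val builtByVal = HeavyFactory.created // one factory, cached in a

    val b = new LoaderWithDef
    b.validate()
    b.validate()
    val builtByDef = HeavyFactory.created - builtByVal // one per call

    println(s"lazy val built $builtByVal, def built $builtByDef")
  }
}
```

The trade-off is that a def constructs a new factory on every access, so
it only pays off when the factory is used once per loader, which is the
situation the commit message describes.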
.../scala/org/apache/daffodil/xml/DaffodilXMLLoader.scala | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilXMLLoader.scala b/daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilXMLLoader.scala
index b4dbd27..65050a7 100644
--- a/daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilXMLLoader.scala
+++ b/daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilXMLLoader.scala
@@ -41,6 +41,7 @@ import scala.xml.SAXParseException
import scala.xml.SAXParser
import scala.xml.parsing.NoBindingFactoryAdapter
+import org.apache.xerces.jaxp.validation.XMLSchemaFactory
import org.apache.xerces.xni.parser.XMLInputSource
import org.apache.xml.resolver.Catalog
@@ -426,9 +427,17 @@ class DaffodilXMLLoader(val errorHandler: org.xml.sax.ErrorHandler)
* right into Xerces. This is accomplished by
* using the below SchemaFactory and SchemaFactory.newSchema calls. The
* newSchema call is what forces schema validation to take place.
+ *
+ * Note that this is intentionally a def rather than a val because an
+ * XMLSchemaFactory actually holds on to many large objects used when loading
+ * a schema, which can be a massive memory leak. By making it a def, this
+ * factory and all its references can be garbage collected after a schema is
+ * loaded. And the schemaFactory is only used when validating a DFDL schema,
+ * so we should only create one factory per DaffodilXMLLoader, so saving it
+ * as val does not gain anything.
*/
- private lazy val schemaFactory = {
- val sf = new org.apache.xerces.jaxp.validation.XMLSchemaFactory()
+ private def schemaFactory: XMLSchemaFactory = {
+ val sf = new XMLSchemaFactory()
sf.setResourceResolver(resolver)
//
// despite setting the errorHandler here, the validator