This is an automated email from the ASF dual-hosted git repository.
slawrence pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/daffodil-vscode.git
The following commit(s) were added to refs/heads/main by this push:
new 848cfce Enable reproducible JAXB generated TDML code
848cfce is described below
commit 848cfcecf9d763316aab600a476e63d645b54c39
Author: Steve Lawrence
AuthorDate: Mon Feb 10 10:04:35 2025 -0500
Enable reproducible JAXB generated TDML code
The resource generator in build.sbt currently extracts all schemas from
the daffodil-lib jar and uses xjc to generate JAXB code that allows
reading and writing TDML files. Note that this includes the large and
complex XML schema for DFDL because that schema is referenced and used
by the TDML schema to support embedded DFDL schemas and configurations.
However, because the XML Schema for DFDL is so large and complex, it
requires special xjc bindings and also causes xjc to generate Java code
that is not deterministic, making reproducible builds difficult.
Fortunately, VS Code doesn't really care about embedded schemas in TDML
files--it really only uses JAXB for modifying TDML test cases. So it
does not need the many JAXB generated classes that cause the above
issues related to embedded schemas.
To avoid generating these classes, after the TDML schema is extracted,
an XSLT is run to replace the embedded DFDL schema and configuration
complex types with "any" elements. This prevents xjc from generating
special types for embedded schemas and configurations, instead
generating code that just treats them as generic XML Objects. This is
sufficient for VS Code uses.
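For reference, the XSLT approach is an identity transform plus one override template per complex type being replaced. Below is a minimal standalone Scala sketch that uses only the JDK's javax.xml.transform API. The object name ReplaceWithAny is hypothetical. For brevity the sketch overrides only defineSchemaType, while the build.sbt change also overrides defineConfigType in the same way.

```scala
import java.io.{ByteArrayInputStream, StringWriter}
import java.nio.charset.StandardCharsets
import javax.xml.transform.TransformerFactory
import javax.xml.transform.stream.{StreamResult, StreamSource}

// Hypothetical helper illustrating the identity-transform-with-override idea.
object ReplaceWithAny {
  // The first template copies every attribute and node unchanged. The second,
  // more specific template wins for xs:complexType[@name='defineSchemaType'] and
  // emits a wildcard <xs:any/> sequence plus <xs:anyAttribute/> instead.
  val xslt: String =
    """<xsl:stylesheet version="1.0"
      |    xmlns:xs="http://www.w3.org/2001/XMLSchema"
      |    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      |  <xsl:template match="@*|node()">
      |    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
      |  </xsl:template>
      |  <xsl:template match="xs:complexType[@name='defineSchemaType']">
      |    <xs:complexType name="defineSchemaType">
      |      <xs:sequence><xs:any/></xs:sequence>
      |      <xs:anyAttribute/>
      |    </xs:complexType>
      |  </xsl:template>
      |</xsl:stylesheet>""".stripMargin

  // Applies the stylesheet to an XSD given as a string and returns the result.
  def transform(schema: String): String = {
    def source(s: String) =
      new StreamSource(new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8)))
    val transformer = TransformerFactory.newInstance().newTransformer(source(xslt))
    val out = new StringWriter()
    transformer.transform(source(schema), new StreamResult(out))
    out.toString
  }
}
```

Everything outside the matched complex type passes through untouched. As a result, xjc sees a normal TDML schema whose embedded-schema slot is just a wildcard.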
Additionally, this allows us to only extract tdml.xsd from daffodil-lib
since that is the only schema we need now. We can also remove the
bindings.xjb file needed only for the no longer used XML Schema for
DFDL.
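In build.sbt the extraction uses sbt's IO.unzip with a NameFilter that accepts only org/apache/daffodil/xsd/tdml.xsd. The sketch below shows roughly how a single entry could be pulled out of a jar with plain JDK classes. It is an illustration under that assumption, not the build code, and ExtractEntry is a hypothetical name.

```scala
import java.io.File
import java.nio.file.{Files, StandardCopyOption}
import java.util.zip.ZipFile

// Hypothetical helper: extract exactly one named entry from a jar (a zip archive).
object ExtractEntry {
  // Copies `entryName` from `jar` into `destDir`, keeping the entry's relative
  // path. Returns the extracted file, or None if the jar has no such entry.
  def extract(jar: File, entryName: String, destDir: File): Option[File] = {
    val zip = new ZipFile(jar)
    try {
      Option(zip.getEntry(entryName)).map { entry =>
        val target = new File(destDir, entryName)
        target.getParentFile.mkdirs()
        val in = zip.getInputStream(entry)
        try Files.copy(in, target.toPath, StandardCopyOption.REPLACE_EXISTING)
        finally in.close()
        target
      }
    } finally zip.close()
  }
}
```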
This also changes how we get the TDML namespace for the DocumentPart
element, since something in this change affected how namespaces work. It
looks like JAXB no longer maintains namespace declarations.
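The updated TDMLSuite now expects the root testSuite element to declare only the TDML namespace. Below is a minimal sketch of collecting those root-level declarations with the JDK DOM parser. RootNamespaces is a hypothetical helper, not code from the test suite.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory

// Hypothetical helper: list namespace URIs declared directly on the root element.
object RootNamespaces {
  def declared(xml: String): Set[String] = {
    val factory = DocumentBuilderFactory.newInstance()
    factory.setNamespaceAware(true)
    val doc = factory.newDocumentBuilder()
      .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
    val attrs = doc.getDocumentElement.getAttributes
    // Namespace declarations show up as xmlns / xmlns:prefix attributes in the DOM.
    (0 until attrs.getLength)
      .map(i => attrs.item(i))
      .collect {
        case a if a.getNodeName == "xmlns" || a.getNodeName.startsWith("xmlns:") =>
          a.getNodeValue
      }
      .toSet
  }
}
```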
This also moves the resource generator into the xjc/sources task. The
two tasks actually ran in parallel, so the resource generator might not
have completed when xjc/sources was accessed, leading to inconsistent
results. The XSD files are really resources anyway, so this is probably
the more accurate way to handle this. Additionally, caching is added so
we aren't constantly extracting and generating JAXB files; instead we
only generate if daffodil-lib changes, which should be rare.
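The caching relies on sbt's FileFunction.cached, keyed on the daffodil-lib jar. The same idea can be sketched without sbt. This version is an assumption for illustration: it keys the cache on a SHA-256 hash of the input file and records the outputs in a stamp file. sbt's own implementation is not reproduced here, and CachedByInput is a hypothetical name.

```scala
import java.io.File
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.Files
import java.security.MessageDigest

// Hypothetical helper: rerun a generator only when its single input changes.
object CachedByInput {
  // Hex-encoded SHA-256 digest of a file's contents.
  private def digest(f: File): String =
    MessageDigest.getInstance("SHA-256")
      .digest(Files.readAllBytes(f.toPath))
      .map(b => "%02x".format(b & 0xff))
      .mkString

  // Runs `action` only if the hash of `input` differs from the one recorded in
  // `stamp`, or if a previously generated output has gone missing. Returns the
  // generated files and whether `action` actually ran.
  def run(input: File, stamp: File)(action: () => Set[File]): (Set[File], Boolean) = {
    val current = digest(input)
    val previous: List[String] =
      if (stamp.isFile) new String(Files.readAllBytes(stamp.toPath), UTF_8).split("\n").toList
      else Nil
    previous match {
      case `current` :: outputs if outputs.forall(p => new File(p).exists) =>
        (outputs.map(p => new File(p)).toSet, false)
      case _ =>
        val outputs = action()
        // Stamp format: first line is the input hash, then one output path per line.
        val lines = current +: outputs.toSeq.map(_.getAbsolutePath)
        Option(stamp.getParentFile).foreach(_.mkdirs())
        Files.write(stamp.toPath, lines.mkString("\n").getBytes(UTF_8))
        (outputs, true)
    }
  }
}
```

Because the generator runs only when the input changes, its outputs keep their timestamps between builds. That is what keeps xjc from being triggered on every compile.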
Closes #1142
---
build.sbt | 112 +++++++++++++++------
debugger/src/main/resources/bindings.xjb | 84 ----------------
.../main/scala/org.apache.daffodil.tdml/TDML.scala | 8 +-
.../scala/org.apache.daffodil.tdml/TDMLSuite.scala | 11 +-
4 files changed, 84 insertions(+), 131 deletions(-)
diff --git a/build.sbt b/build.sbt
index dc1ad5f..5188b69 100644
--- a/build.sbt
+++ b/build.sbt
@@ -18,6 +18,12 @@
import com.github.retronym.sbtxjc.SbtXjcPlugin
import Classpaths.managedJars
+import java.io.ByteArrayOutputStream
+import java.io.ByteArrayInputStream
+import javax.xml.transform.TransformerFactory
+import javax.xml.transform.stream.StreamResult
+import javax.xml.transform.stream.StreamSource
+
//Fixes build issues on java11+
run / fork := true
Global / lintUnusedKeysOnLoad := false
@@ -136,48 +142,88 @@ lazy val xjcSettings =
xjcCommandLine += "-p",
xjcCommandLine += "org.apache.daffodil.tdml",
xjcCommandLine += "-no-header",
- xjcBindings += "debugger/src/main/resources/bindings.xjb",
xjcJvmOpts ++= extraJvmOptions,
xjcLibs := Seq(
"com.sun.xml.bind" % "jaxb-impl" % "2.3.9",
"org.glassfish.jaxb" % "jaxb-xjc" % "2.3.9",
"javax.activation" % "activation" % "1.1.1"
),
- Compile / xjc / sources := Seq(
- file(
- Seq(resourceManaged.value, "xsd")
- .mkString(java.io.File.separator)
- )
- ),
Compile / doc / sources := Seq(file("")),
- Compile / resourceGenerators += Def.task {
- // This is going to be the directory that contains the DFDL schema files. We extract the files from the jar to this directory,
- // but the directory structure is maintained. The directory structure will be flattened so that the DFDL schema files are
- // directly contained by this directory.
- //
- // Note that baseDirectory is ${workspaceDir}/server/sbtXjc/
- lazy val xsdDir = Seq(resourceManaged.value, "xsd").mkString(java.io.File.separator)
-
- // Get the daffodil-lib jar from the dependencies.
- val jarsToExtract: Seq[File] =
- managedJars(Test, Set[String]("jar"), update.value) map { _.data } filter { _.getName.contains("daffodil-lib") }
-
- // Extract the DFDL schema files from the daffodil-lib jar. We ignore the XMLSchema.xsd file because it contains a DTD, and
- // the JAXB process is not happy with DTDs without a particular setting being set. Consequently, this file is not strictly
- // necessary for the generation of Java classes.
- jarsToExtract foreach { jar =>
+ Compile / xjc / sources := {
+ val stream = (Compile / xjc / streams).value
+
+ // We are going to extract XSD files from Daffodil jars needed by xjc to generate JAXB
+ // classes
+ lazy val xjcSourceDir = crossTarget.value / "xjc"
+
+ // Get the daffodil-lib jar from the dependencies, this is the only jar we need to extract
+ // files from
+ val daffodilLibJar = managedJars(Test, Set[String]("jar"), update.value)
+ .map(_.data)
+ .find(_.getName.contains("daffodil-lib"))
+ .get
+
+ // cache the results of jar extraction so we don't keep extracting files (which would
+ // trigger xjc again) everytime we compile.
+ val cachedFun = FileFunction.cached(stream.cacheDirectory / "xjcSources") { _ =>
+ // Extract the DFDL TDML schema file used for JAXB generation.
IO.unzip(
- jar,
- new File(xsdDir),
- NameFilter.fnToNameFilter(f =>
- !f.endsWith("XMLSchema.xsd") && f.endsWith(".xsd") && f.startsWith(
- Seq("org", "apache", "daffodil", "xsd").mkString("/")
- )
- )
+ daffodilLibJar,
+ xjcSourceDir,
+ NameFilter.fnToNameFilter(f => f == "org/apache/daffodil/xsd/tdml.xsd")
)
+
+ // The TDML schema supports embedded DFDL schemas and configurations, and it references
+ // the schema for DFDL schema when doing so. This DFDL schema is pretty complex, which
+ // requires extra complexity like the need for an xjc bindings file and also hits edge
+ // cases where xjc generates non-deterministic java code, leading to non-reproducible
+ // builds.
+ //
+ // Fortunately, VS Code does not need embedded DFDL schemas or config converted to
+ // specific objects, so we use XSLT to replace those parts of the schema with <any>
+ // elements. This allows JAXB to read TDML files containing embedded DFDL schemas, but
+ // they are just converted to generic XML Objects and avoids those complex edge cases.
+ val tdmlFile = xjcSourceDir / "org" / "apache" / "daffodil" / "xsd" / "tdml.xsd"
+ val tdmlXslt = """
+ |<xsl:stylesheet
+ | xmlns:xs="http://www.w3.org/2001/XMLSchema"
+ | xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
+ | <xsl:template match="@*|node()">
+ | <xsl:copy>
+ | <xsl:apply-templates select="@*|node()"/>
+ | </xsl:copy>
+ | </xsl:template>
+ | <xsl:template match="xs:complexType[@name='defineSchemaType']">
+ | <xs:complexType name='defineSchemaType'>
+ | <xs:sequence>
+ | <xs:any />
+ | </xs:sequence>
+ | <xs:anyAttribute />
+ | </xs:complexType>
+ | </xsl:template>
+ | <xsl:template match="xs:complexType[@name='defineConfigType']">
+ | <xs:complexType name='defineConfigType'>
+ | <xs:sequence>
+ | <xs:any />
+ | </xs:sequence>
+ | <xs:anyAttribute />
+ | </xs:complexType>
+ | </xsl:template>
+ |</xsl:stylesheet>""".stripMargin
+ val xslt = new StreamSource(new ByteArrayInputStream(tdmlXslt.getBytes))
+ val input = new StreamSource(tdmlFile)
+ val output = new ByteArrayOutputStream()
+ val factory = TransformerFactory.newInstance()
+ val transformer = factory.newTransformer(xslt);
+ transformer.transform(input, new StreamResult(output))
+ IO.write(tdmlFile, output.toByteArray())
+
+ val xsdFiles = (xjcSourceDir ** "*.xsd").get
+ xsdFiles.toSet
}
- // Get File objects for each DFDL schema file that was extracted.
- new File(Seq(xsdDir, "org", "apache", "daffodil", "xsd").mkString("/")).listFiles().toSeq
- }.taskValue
+ // only invalidate the cache if the daffodil lib jar changed and so there could be a
+ // chance the tdml.xsd file has been updated
+ cachedFun(Set(daffodilLibJar)).toSeq
+ }
)
diff --git a/debugger/src/main/resources/bindings.xjb b/debugger/src/main/resources/bindings.xjb
deleted file mode 100644
index 43d1a84..0000000
--- a/debugger/src/main/resources/bindings.xjb
+++ /dev/null
@@ -1,84 +0,0 @@
-<!--
- Licensed to the Apache Software Foundation (ASF) under one or more
- contributor license agreements. See the NOTICE file distributed with
- this work for additional information regarding copyright ownership.
- The ASF licenses this file to You under the Apache License, Version 2.0
- (the "License"); you may not use this file except in compliance with
- the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
--->
-
-<jxb:bindings version="2.1"
- xmlns:tdml="http://www.ibm.com/xmlns/dfdl/testData"
- xmlns:jxb="http://java.sun.com/xml/ns/jaxb"
- xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
- xmlns:xs="http://www.w3.org/2001/XMLSchema">
- <jxb:globalBindings>
- <jxb:serializable uid="1" />
- </jxb:globalBindings>
- <jxb:bindings schemaLocation="../../../target/scala-2.12/resource_managed/xsd/org/apache/daffodil/xsd/DFDL_part3_model.xsd" version="1.0">
- <jxb:bindings node="//xs:attributeGroup[@name='SetVariableAG']/xs:attribute[@name='value']">
- <jxb:property name="ValueAttribute"/>
- </jxb:bindings>
- <jxb:bindings node="//xs:element[@name='simpleType']">
- <jxb:class name="simpleTypeClass"/>
- </jxb:bindings>
- <jxb:bindings node="//xs:element[@name='group']">
- <jxb:class name="groupClass"/>
- </jxb:bindings>
- <jxb:bindings node="//xs:element[@name='choice']">
- <jxb:class name="choiceClass"/>
- </jxb:bindings>
- <jxb:bindings node="//xs:element[@name='sequence']">
- <jxb:class name="sequenceClass"/>
- </jxb:bindings>
- <jxb:bindings node="//xs:element[@name='element']">
- <jxb:class name="elementClass"/>
- </jxb:bindings>
- </jxb:bindings>
- <jxb:bindings schemaLocation="../../../target/scala-2.12/resource_managed/xsd/org/apache/daffodil/xsd/DFDL_part2_attributes.xsd" version="1.0">
- <jxb:bindings node="//xs:attributeGroup[@name='BaseAGQualified']/xs:attribute[@name='ref']">
- <jxb:property name="RefAttribute"/>
- </jxb:bindings>
- <jxb:bindings node="//xs:attributeGroup[@name='CommonAG']/xs:attribute[@name='emptyElementParsePolicy']">
- <jxb:property name="EmptyElementParsePolicyAttribute"/>
- </jxb:bindings>
- </jxb:bindings>
- <jxb:bindings schemaLocation="../../../target/scala-2.12/resource_managed/xsd/org/apache/daffodil/xsd/DFDL_part1_simpletypes.xsd" version="1.0">
- <jxb:bindings node="//xs:simpleType[@name='ByteOrderEnum']">
- <jxb:typesafeEnumClass name="ByteOrderEnumType"/>
- </jxb:bindings>
- <jxb:bindings node="//xs:simpleType[@name='BitOrderEnum']">
- <jxb:typesafeEnumClass name="BitOrderEnumType"/>
- </jxb:bindings>
- <jxb:bindings node="//xs:simpleType[@name='EmptyElementParsePolicyEnum']">
- <jxb:typesafeEnumClass name="EmptyElementParsePolicyEnumType"/>
- </jxb:bindings>
- </jxb:bindings>
- <jxb:bindings schemaLocation="../../../target/scala-2.12/resource_managed/xsd/org/apache/daffodil/xsd/dafext.xsd" version="1.0">
- <jxb:bindings node="//xs:attribute[@name='parseUnparsePolicy']">
- <jxb:property name="ParseUnparsePolicyExt"/>
- </jxb:bindings>
- <jxb:bindings node="//xs:simpleType[@name='PropertyNameType']">
- <jxb:typesafeEnumClass name="PropertyNameTypeExt"/>
- </jxb:bindings>
- <jxb:bindings node="//xs:complexType[@name='PropertyType']">
- <jxb:class name="PropertyTypeClass"/>
- </jxb:bindings>
- <jxb:bindings node="//xs:element[@name='property']">
- <jxb:class name="propertyClass"/>
- </jxb:bindings>
- </jxb:bindings>
- <jxb:bindings schemaLocation="../../../target/scala-2.12/resource_managed/xsd/org/apache/daffodil/xsd/dfdlx.xsd" version="1.0">
- <jxb:bindings node="//xs:simpleType[@name='PropertyNameType']">
- <jxb:typesafeEnumClass name="PropertyNameTypeX"/>
- </jxb:bindings>
- </jxb:bindings>
-</jxb:bindings>
diff --git a/debugger/src/main/scala/org.apache.daffodil.tdml/TDML.scala b/debugger/src/main/scala/org.apache.daffodil.tdml/TDML.scala
index 3fa6f40..960e25d 100644
--- a/debugger/src/main/scala/org.apache.daffodil.tdml/TDML.scala
+++ b/debugger/src/main/scala/org.apache.daffodil.tdml/TDML.scala
@@ -23,8 +23,8 @@ import javax.xml.bind.JAXBContext
import javax.xml.bind.JAXBElement
import javax.xml.bind.Marshaller
import javax.xml.namespace.QName
-import javax.xml.bind.annotation.XmlType
import scala.collection.JavaConverters._
+import org.apache.daffodil.lib.xml.XMLUtils
object TDML {
// Create a ParserTestCaseType object that can be put into a TestSuite
@@ -71,12 +71,8 @@ object TDML {
// These lines are necessary because there is no @XmlRootElement annotation on the DocumentPartType class in JAXB
// Ideally, we would want to have JAXB add the annotation - probably with the bindings.xjb file. The only way I found
// that did that required an external plugin just to add the annotation (https://github.com/highsource/jaxb2-annotate-plugin).
- // We are getting the namespace from the JAXB class so that we don't have to hard-code it here
- // Unfortunately, it seems like hard-coding the class name isn't an easy thing to avoid. There is a name in the XmlType
- // annotation, but it is documentPartType instead of documentPart. We would need to remove the Type from this anyway.
- val tdmlNamespacePrefix = classOf[DocumentPartType].getAnnotation(classOf[XmlType]).namespace()
val docPartElement = new JAXBElement[DocumentPartType](
- new QName(tdmlNamespacePrefix, "documentPart"),
+ new QName(XMLUtils.TDML_NAMESPACE.toString, "documentPart"),
classOf[DocumentPartType],
docPart
)
diff --git a/debugger/src/test/scala/org.apache.daffodil.tdml/TDMLSuite.scala b/debugger/src/test/scala/org.apache.daffodil.tdml/TDMLSuite.scala
index d0b8b52..460fa67 100644
--- a/debugger/src/test/scala/org.apache.daffodil.tdml/TDMLSuite.scala
+++ b/debugger/src/test/scala/org.apache.daffodil.tdml/TDMLSuite.scala
@@ -37,15 +37,10 @@ class TDMLSuite extends munit.FunSuite {
val tdmlDescription = "Test TDML Description"
val tdmlPath = Paths.get("./testTDML.tdml").toAbsolutePath()
val expectedNSHashSet = HashSet[String](
- "http://www.ibm.com/xmlns/dfdl/testData",
- "urn:ogf:dfdl:2013:imp:daffodil.apache.org:2018:ext",
- "http://www.ogf.org/dfdl/dfdl-1.0/",
- "http://www.ogf.org/dfdl/dfdl-1.0/extensions",
- "http://www.w3.org/2001/XMLSchema",
- "urn:ogf:dfdl:2013:imp:daffodil.apache.org:2018:int"
+ "http://www.ibm.com/xmlns/dfdl/testData"
)
val tdmlSingleTestCase = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
-<ns1:testSuite xmlns:ns1="http://www.ibm.com/xmlns/dfdl/testData" xmlns:ns2="urn:ogf:dfdl:2013:imp:daffodil.apache.org:2018:ext" xmlns:ns3="http://www.ogf.org/dfdl/dfdl-1.0/" xmlns:ns4="http://www.ogf.org/dfdl/dfdl-1.0/extensions" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:ns6="urn:ogf:dfdl:2013:imp:daffodil.apache.org:2018:int" suiteName="TestTDMLName" defaultRoundTrip="onePass">
+<ns1:testSuite xmlns:ns1="http://www.ibm.com/xmlns/dfdl/testData" suiteName="TestTDMLName" defaultRoundTrip="onePass">
<ns1:parserTestCase name="TestTDMLName" root="file" model="debugger/src/test/data/emptySchema.xml" roundTrip="onePass" description="Test TDML Description">
<ns1:document>
<ns1:documentPart type="file">debugger/src/test/data/emptyData.xml</ns1:documentPart>
@@ -56,7 +51,7 @@ class TDMLSuite extends munit.FunSuite {
</ns1:parserTestCase>
</ns1:testSuite>"""
val tdmlDoubleTestCase = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
-<ns1:testSuite xmlns:ns1="http://www.ibm.com/xmlns/dfdl/testData" xmlns:ns2="urn:ogf:dfdl:2013:imp:daffodil.apache.org:2018:ext" xmlns:ns3="http://www.ogf.org/dfdl/dfdl-1.0/" xmlns:ns4="http://www.ogf.org/dfdl/dfdl-1.0/extensions" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:ns6="urn:ogf:dfdl:2013:imp:daffodil.apache.org:2018:int" suiteName="TestTDMLName" defaultRoundTrip="onePass">
+<ns1:testSuite xmlns:ns1="http://www.ibm.com/xmlns/dfdl/testData" suiteName="TestTDMLName" defaultRoundTrip="onePass">
<ns1:parserTestCase name="TestTDMLName" root="file" model="debugger/src/test/data/emptySchema.xml" roundTrip="onePass" description="Test TDML Description">
<ns1:document>
<ns1:documentPart type="file">debugger/src/test/data/emptyData.xml</ns1:documentPart>