This is an automated email from the ASF dual-hosted git repository.

slawrence pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/daffodil-vscode.git


The following commit(s) were added to refs/heads/main by this push:
     new 848cfce  Enable reproducible JAXB generated TDML code
848cfce is described below

commit 848cfcecf9d763316aab600a476e63d645b54c39
Author: Steve Lawrence <[email protected]>
AuthorDate: Mon Feb 10 10:04:35 2025 -0500

    Enable reproducible JAXB generated TDML code
    
    The resource generator in build.sbt currently extracts all schemas from
    the daffodil-lib jar and uses xjc to generate JAXB code that allows
    reading and writing TDML files. Note that this includes the large and
    complex XML schema for DFDL because that schema is referenced and used
    by the TDML schema to support embedded DFDL schemas and configurations.
    
    However, because the XML Schema for DFDL is so large and complex, it
    requires special xjc bindings, and also generates java that is not
    deterministic, making reproducible builds difficult.
    
    Fortunately, VS Code doesn't really care about embedded schemas in TDML
    files--it really only uses JAXB for modifying TDML test cases. So it
    does not need the many JAXB generated classes that cause the above
    issues related to embedded schemas.
    
    To avoid generating these classes, after the TDML schema is extracted,
    an XSLT is run to replace the embedded DFDL schema and configuration
    complex types with "any" elements. This prevents xjc from generating
    special types for embedded schemas and configurations, instead
    generating code that just treats them as generic XML Objects. This is
    sufficient for VS Code uses.
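
    As a rough illustration of the XSLT step described above (a sketch, not
    the build.sbt code itself): an identity template copies the schema
    unchanged, and a more specific template swaps one named complex type for
    a wildcard. The object name AnyTypeSketch, the sample schema, and the
    type untouchedType are hypothetical; only defineSchemaType comes from
    this commit.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.nio.charset.StandardCharsets
import javax.xml.transform.TransformerFactory
import javax.xml.transform.stream.{StreamResult, StreamSource}

object AnyTypeSketch {
  // Identity transform plus one override: the named complex type is
  // replaced by a type that accepts any child element and any attribute.
  val stylesheet: String =
    """<xsl:stylesheet
      |  xmlns:xs="http://www.w3.org/2001/XMLSchema"
      |  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      |  <xsl:template match="@*|node()">
      |    <xsl:copy>
      |      <xsl:apply-templates select="@*|node()"/>
      |    </xsl:copy>
      |  </xsl:template>
      |  <xsl:template match="xs:complexType[@name='defineSchemaType']">
      |    <xs:complexType name="defineSchemaType">
      |      <xs:sequence>
      |        <xs:any/>
      |      </xs:sequence>
      |      <xs:anyAttribute/>
      |    </xs:complexType>
      |  </xsl:template>
      |</xsl:stylesheet>""".stripMargin

  // Hypothetical stand-in for tdml.xsd, kept tiny for the example.
  val sampleSchema: String =
    """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      |  <xs:complexType name="defineSchemaType">
      |    <xs:sequence>
      |      <xs:element name="format" type="xs:string"/>
      |    </xs:sequence>
      |  </xs:complexType>
      |  <xs:complexType name="untouchedType">
      |    <xs:sequence>
      |      <xs:element name="kept" type="xs:string"/>
      |    </xs:sequence>
      |  </xs:complexType>
      |</xs:schema>""".stripMargin

  // Apply the stylesheet to a schema held in memory and return the result.
  def simplify(schema: String): String = {
    def source(s: String) =
      new StreamSource(new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8)))
    val output = new ByteArrayOutputStream()
    val transformer = TransformerFactory.newInstance().newTransformer(source(stylesheet))
    transformer.transform(source(sampleOrGiven(schema)), new StreamResult(output))
    new String(output.toByteArray, StandardCharsets.UTF_8)
  }

  // Trivial helper so the call above reads clearly; returns its argument.
  private def sampleOrGiven(schema: String): String = schema
}
```

    Calling AnyTypeSketch.simplify(AnyTypeSketch.sampleSchema) should return
    a schema in which defineSchemaType holds only the wildcard while
    untouchedType is copied through as-is.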
    
    Additionally, this allows us to only extract tdml.xsd from daffodil-lib
    since that is the only schema we need now. We can also remove the
    bindings.xjb file needed only for the no longer used XML Schema for
    DFDL.
    
    Also changes how we get the TDML namespace for the DocumentPart element,
    because something in this change altered how namespaces work. It looks
    like JAXB no longer maintains namespace declarations.
    
    This also moves the resource generator into the xjc/sources task. The
    two tasks actually ran in parallel, which could lead to a situation
    where the resource generator hadn't completed when xjc/sources was
    accessed, producing inconsistent results. The XSD files are really
    resources anyways, so this is probably the more accurate way to handle
    this. Additionally, caching is added so we aren't constantly extracting
    files and regenerating JAXB code--instead we only generate if
    daffodil-lib changes, which should be rare.
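
    The caching idea can be sketched outside of sbt as follows. This is not
    sbt's FileFunction.cached (which the build actually uses); it swaps in a
    plain SHA-256 stamp file to show the same "only redo the work when the
    input jar changed" behavior. CachedStep, runIfChanged, and the stamp
    file are hypothetical names for this example.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import java.security.MessageDigest

object CachedStep {
  // Hex-encoded SHA-256 of the file contents.
  private def sha256(p: Path): String =
    MessageDigest
      .getInstance("SHA-256")
      .digest(Files.readAllBytes(p))
      .map("%02x".format(_))
      .mkString

  // Runs `action` only when `input` differs from the hash recorded in
  // `stamp`. Returns true if the action ran, false if it was skipped.
  def runIfChanged(input: Path, stamp: Path)(action: => Unit): Boolean = {
    val current = sha256(input)
    val previous =
      if (Files.exists(stamp))
        Some(new String(Files.readAllBytes(stamp), StandardCharsets.UTF_8))
      else None
    if (previous.contains(current)) false
    else {
      action
      Files.write(stamp, current.getBytes(StandardCharsets.UTF_8))
      true
    }
  }
}
```

    With this shape, the expensive extraction and code generation would go
    in the action block, and repeated compiles against an unchanged jar
    skip it.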
    
    Closes #1142
---
 build.sbt                                          | 112 +++++++++++++++------
 debugger/src/main/resources/bindings.xjb           |  84 ----------------
 .../main/scala/org.apache.daffodil.tdml/TDML.scala |   8 +-
 .../scala/org.apache.daffodil.tdml/TDMLSuite.scala |  11 +-
 4 files changed, 84 insertions(+), 131 deletions(-)

diff --git a/build.sbt b/build.sbt
index dc1ad5f..5188b69 100644
--- a/build.sbt
+++ b/build.sbt
@@ -18,6 +18,12 @@
 import com.github.retronym.sbtxjc.SbtXjcPlugin
 import Classpaths.managedJars
 
+import java.io.ByteArrayOutputStream
+import java.io.ByteArrayInputStream
+import javax.xml.transform.TransformerFactory
+import javax.xml.transform.stream.StreamResult
+import javax.xml.transform.stream.StreamSource
+
 //Fixes build issues on java11+
 run / fork := true
 Global / lintUnusedKeysOnLoad := false
@@ -136,48 +142,88 @@ lazy val xjcSettings =
     xjcCommandLine += "-p",
     xjcCommandLine += "org.apache.daffodil.tdml",
     xjcCommandLine += "-no-header",
-    xjcBindings += "debugger/src/main/resources/bindings.xjb",
     xjcJvmOpts ++= extraJvmOptions,
     xjcLibs := Seq(
       "com.sun.xml.bind" % "jaxb-impl" % "2.3.9",
       "org.glassfish.jaxb" % "jaxb-xjc" % "2.3.9",
       "javax.activation" % "activation" % "1.1.1"
     ),
-    Compile / xjc / sources := Seq(
-      file(
-        Seq(resourceManaged.value, "xsd")
-          .mkString(java.io.File.separator)
-      )
-    ),
     Compile / doc / sources := Seq(file("")),
-    Compile / resourceGenerators += Def.task {
-      // This is going to be the directory that contains the DFDL schema files. We extract the files from the jar to this directory,
-      //   but the directory structure is maintained. The directory structure will be flattened so that the DFDL schema files are
-      //   directly contained by this directory.
-      //
-      // Note that baseDirectory is ${workspaceDir}/server/sbtXjc/
-      lazy val xsdDir = Seq(resourceManaged.value, "xsd").mkString(java.io.File.separator)
-
-      // Get the daffodil-lib jar from the dependencies.
-      val jarsToExtract: Seq[File] =
-        managedJars(Test, Set[String]("jar"), update.value) map { _.data } filter { _.getName.contains("daffodil-lib") }
-
-      // Extract the DFDL schema files from the daffodil-lib jar. We ignore the XMLSchema.xsd file because it contains a DTD, and
-      //   the JAXB process is not happy with DTDs without a particular setting being set. Consequently, this file is not strictly
-      //   necessary for the generation of Java classes.
-      jarsToExtract foreach { jar =>
+    Compile / xjc / sources := {
+      val stream = (Compile / xjc / streams).value
+
+      // We are going to extract XSD files from Daffodil jars needed by xjc to generate JAXB
+      // classes
+      lazy val xjcSourceDir = crossTarget.value / "xjc"
+
+      // Get the daffodil-lib jar from the dependencies, this is the only jar we need to extract
+      // files from
+      val daffodilLibJar = managedJars(Test, Set[String]("jar"), update.value)
+        .map(_.data)
+        .find(_.getName.contains("daffodil-lib"))
+        .get
+
+      // cache the results of jar extraction so we don't keep extracting files (which would
+      // trigger xjc again) everytime we compile.
+      val cachedFun = FileFunction.cached(stream.cacheDirectory / "xjcSources") { _ =>
+        // Extract the DFDL TDML schema file used for JAXB generation.
         IO.unzip(
-          jar,
-          new File(xsdDir),
-          NameFilter.fnToNameFilter(f =>
-            !f.endsWith("XMLSchema.xsd") && f.endsWith(".xsd") && f.startsWith(
-              Seq("org", "apache", "daffodil", "xsd").mkString("/")
-            )
-          )
+          daffodilLibJar,
+          xjcSourceDir,
+          NameFilter.fnToNameFilter(f => f == "org/apache/daffodil/xsd/tdml.xsd")
         )
+
+        // The TDML schema supports embedded DFDL schemas and configurations, and it references
+        // the schema for DFDL schema when doing so. This DFDL schema is pretty complex, which
+        // requires extra complexity like the need for an xjc bindings file and also hits edge
+        // cases where xjc generates non-deterministic java code, leading to non-reproducible
+        // builds.
+        //
+        // Fortunately, VS Code does not need embedded DFDL schemas or config converted to
+        // specific objects, so we use XSLT to replace those parts of the schema with <any>
+        // elements. This allows JAXB to read TDML files containing embedded DFDL schemas, but
+        // they are just converted to generic XML Objects and avoids those complex edge cases.
+        val tdmlFile = xjcSourceDir / "org" / "apache" / "daffodil" / "xsd" / "tdml.xsd"
+        val tdmlXslt = """
+          |<xsl:stylesheet
+          |  xmlns:xs="http://www.w3.org/2001/XMLSchema"
+          |  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
+          |  <xsl:template match="@*|node()">
+          |    <xsl:copy>
+          |      <xsl:apply-templates select="@*|node()"/>
+          |    </xsl:copy>
+          |  </xsl:template>
+          |  <xsl:template match="xs:complexType[@name='defineSchemaType']">
+          |    <xs:complexType name='defineSchemaType'>
+          |      <xs:sequence>
+          |        <xs:any />
+          |      </xs:sequence>
+          |      <xs:anyAttribute />
+          |    </xs:complexType>
+          |  </xsl:template>
+          |  <xsl:template match="xs:complexType[@name='defineConfigType']">
+          |    <xs:complexType name='defineConfigType'>
+          |      <xs:sequence>
+          |        <xs:any />
+          |      </xs:sequence>
+          |      <xs:anyAttribute />
+          |    </xs:complexType>
+          |  </xsl:template>
+          |</xsl:stylesheet>""".stripMargin
+        val xslt = new StreamSource(new ByteArrayInputStream(tdmlXslt.getBytes))
+        val input = new StreamSource(tdmlFile)
+        val output = new ByteArrayOutputStream()
+        val factory = TransformerFactory.newInstance()
+        val transformer = factory.newTransformer(xslt);
+        transformer.transform(input, new StreamResult(output))
+        IO.write(tdmlFile, output.toByteArray())
+
+        val xsdFiles = (xjcSourceDir ** "*.xsd").get
+        xsdFiles.toSet
       }
 
-      // Get File objects for each DFDL schema file that was extracted.
-      new File(Seq(xsdDir, "org", "apache", "daffodil", "xsd").mkString("/")).listFiles().toSeq
-    }.taskValue
+      // only invalidate the cache if the daffodil lib jar changed and so there could be a
+      // chance the tdml.xsd file has been updated
+      cachedFun(Set(daffodilLibJar)).toSeq
+    }
   )
diff --git a/debugger/src/main/resources/bindings.xjb b/debugger/src/main/resources/bindings.xjb
deleted file mode 100644
index 43d1a84..0000000
--- a/debugger/src/main/resources/bindings.xjb
+++ /dev/null
@@ -1,84 +0,0 @@
-<!--
-  Licensed to the Apache Software Foundation (ASF) under one or more
-  contributor license agreements.  See the NOTICE file distributed with
-  this work for additional information regarding copyright ownership.
-  The ASF licenses this file to You under the Apache License, Version 2.0
-  (the "License"); you may not use this file except in compliance with
-  the License.  You may obtain a copy of the License at
-
-      http://www.apache.org/licenses/LICENSE-2.0
-
-  Unless required by applicable law or agreed to in writing, software
-  distributed under the License is distributed on an "AS IS" BASIS,
-  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-  See the License for the specific language governing permissions and
-  limitations under the License.
--->
-
-<jxb:bindings version="2.1"
-          xmlns:tdml="http://www.ibm.com/xmlns/dfdl/testData"
-          xmlns:jxb="http://java.sun.com/xml/ns/jaxb"
-          xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
-          xmlns:xs="http://www.w3.org/2001/XMLSchema">
-    <jxb:globalBindings>
-        <jxb:serializable uid="1" />
-    </jxb:globalBindings>
-    <jxb:bindings schemaLocation="../../../target/scala-2.12/resource_managed/xsd/org/apache/daffodil/xsd/DFDL_part3_model.xsd" version="1.0">
-        <jxb:bindings node="//xs:attributeGroup[@name='SetVariableAG']/xs:attribute[@name='value']">
-            <jxb:property name="ValueAttribute"/>
-        </jxb:bindings>
-        <jxb:bindings node="//xs:element[@name='simpleType']">
-            <jxb:class name="simpleTypeClass"/>
-        </jxb:bindings>
-        <jxb:bindings node="//xs:element[@name='group']">
-            <jxb:class name="groupClass"/>
-        </jxb:bindings>
-        <jxb:bindings node="//xs:element[@name='choice']">
-            <jxb:class name="choiceClass"/>
-        </jxb:bindings>
-        <jxb:bindings node="//xs:element[@name='sequence']">
-            <jxb:class name="sequenceClass"/>
-        </jxb:bindings>
-        <jxb:bindings node="//xs:element[@name='element']">
-            <jxb:class name="elementClass"/>
-        </jxb:bindings>
-    </jxb:bindings>
-    <jxb:bindings schemaLocation="../../../target/scala-2.12/resource_managed/xsd/org/apache/daffodil/xsd/DFDL_part2_attributes.xsd" version="1.0">
-        <jxb:bindings node="//xs:attributeGroup[@name='BaseAGQualified']/xs:attribute[@name='ref']">
-            <jxb:property name="RefAttribute"/>
-        </jxb:bindings>
-         <jxb:bindings node="//xs:attributeGroup[@name='CommonAG']/xs:attribute[@name='emptyElementParsePolicy']">
-            <jxb:property name="EmptyElementParsePolicyAttribute"/>
-        </jxb:bindings>
-    </jxb:bindings>
-    <jxb:bindings schemaLocation="../../../target/scala-2.12/resource_managed/xsd/org/apache/daffodil/xsd/DFDL_part1_simpletypes.xsd" version="1.0">
-        <jxb:bindings node="//xs:simpleType[@name='ByteOrderEnum']">
-            <jxb:typesafeEnumClass name="ByteOrderEnumType"/>
-        </jxb:bindings>
-        <jxb:bindings node="//xs:simpleType[@name='BitOrderEnum']">
-            <jxb:typesafeEnumClass name="BitOrderEnumType"/>
-        </jxb:bindings>
-        <jxb:bindings node="//xs:simpleType[@name='EmptyElementParsePolicyEnum']">
-            <jxb:typesafeEnumClass name="EmptyElementParsePolicyEnumType"/>
-        </jxb:bindings>
-    </jxb:bindings>
-    <jxb:bindings schemaLocation="../../../target/scala-2.12/resource_managed/xsd/org/apache/daffodil/xsd/dafext.xsd" version="1.0">
-        <jxb:bindings node="//xs:attribute[@name='parseUnparsePolicy']">
-            <jxb:property name="ParseUnparsePolicyExt"/>
-        </jxb:bindings>
-        <jxb:bindings node="//xs:simpleType[@name='PropertyNameType']">
-            <jxb:typesafeEnumClass name="PropertyNameTypeExt"/>
-        </jxb:bindings>
-        <jxb:bindings node="//xs:complexType[@name='PropertyType']">
-            <jxb:class name="PropertyTypeClass"/>
-        </jxb:bindings>
-        <jxb:bindings node="//xs:element[@name='property']">
-            <jxb:class name="propertyClass"/>
-        </jxb:bindings>
-    </jxb:bindings>
-    <jxb:bindings schemaLocation="../../../target/scala-2.12/resource_managed/xsd/org/apache/daffodil/xsd/dfdlx.xsd" version="1.0">
-        <jxb:bindings node="//xs:simpleType[@name='PropertyNameType']">
-            <jxb:typesafeEnumClass name="PropertyNameTypeX"/>
-        </jxb:bindings>
-    </jxb:bindings>
-</jxb:bindings>
diff --git a/debugger/src/main/scala/org.apache.daffodil.tdml/TDML.scala b/debugger/src/main/scala/org.apache.daffodil.tdml/TDML.scala
index 3fa6f40..960e25d 100644
--- a/debugger/src/main/scala/org.apache.daffodil.tdml/TDML.scala
+++ b/debugger/src/main/scala/org.apache.daffodil.tdml/TDML.scala
@@ -23,8 +23,8 @@ import javax.xml.bind.JAXBContext
 import javax.xml.bind.JAXBElement
 import javax.xml.bind.Marshaller
 import javax.xml.namespace.QName
-import javax.xml.bind.annotation.XmlType
 import scala.collection.JavaConverters._
+import org.apache.daffodil.lib.xml.XMLUtils
 
 object TDML {
   // Create a ParserTestCaseType object that can be put into a TestSuite
@@ -71,12 +71,8 @@ object TDML {
     // These lines are necessary because there is no @XmlRootElement annotation on the DocumentPartType class in JAXB
     // Ideally, we would want to have JAXB add the annotation - probably with the bindings.xjb file. The only way I found
     //   that did that required an external plugin just to add the annotation (https://github.com/highsource/jaxb2-annotate-plugin).
-    // We are getting the namespace from the JAXB class so that we don't have to hard-code it here
-    // Unfortunately, it seems like hard-coding the class name isn't an easy thing to avoid. There is a name in the XmlType
-    //   annotation, but it is documentPartType instead of documentPart. We would need to remove the Type from this anyway.
-    val tdmlNamespacePrefix = classOf[DocumentPartType].getAnnotation(classOf[XmlType]).namespace()
     val docPartElement = new JAXBElement[DocumentPartType](
-      new QName(tdmlNamespacePrefix, "documentPart"),
+      new QName(XMLUtils.TDML_NAMESPACE.toString, "documentPart"),
       classOf[DocumentPartType],
       docPart
     )
diff --git a/debugger/src/test/scala/org.apache.daffodil.tdml/TDMLSuite.scala b/debugger/src/test/scala/org.apache.daffodil.tdml/TDMLSuite.scala
index d0b8b52..460fa67 100644
--- a/debugger/src/test/scala/org.apache.daffodil.tdml/TDMLSuite.scala
+++ b/debugger/src/test/scala/org.apache.daffodil.tdml/TDMLSuite.scala
@@ -37,15 +37,10 @@ class TDMLSuite extends munit.FunSuite {
   val tdmlDescription = "Test TDML Description"
   val tdmlPath = Paths.get("./testTDML.tdml").toAbsolutePath()
   val expectedNSHashSet = HashSet[String](
-    "http://www.ibm.com/xmlns/dfdl/testData",
-    "urn:ogf:dfdl:2013:imp:daffodil.apache.org:2018:ext",
-    "http://www.ogf.org/dfdl/dfdl-1.0/",
-    "http://www.ogf.org/dfdl/dfdl-1.0/extensions",
-    "http://www.w3.org/2001/XMLSchema",
-    "urn:ogf:dfdl:2013:imp:daffodil.apache.org:2018:int"
+    "http://www.ibm.com/xmlns/dfdl/testData"
   )
   val tdmlSingleTestCase = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
-<ns1:testSuite xmlns:ns1="http://www.ibm.com/xmlns/dfdl/testData" xmlns:ns2="urn:ogf:dfdl:2013:imp:daffodil.apache.org:2018:ext" xmlns:ns3="http://www.ogf.org/dfdl/dfdl-1.0/" xmlns:ns4="http://www.ogf.org/dfdl/dfdl-1.0/extensions" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:ns6="urn:ogf:dfdl:2013:imp:daffodil.apache.org:2018:int" suiteName="TestTDMLName" defaultRoundTrip="onePass">
+<ns1:testSuite xmlns:ns1="http://www.ibm.com/xmlns/dfdl/testData" suiteName="TestTDMLName" defaultRoundTrip="onePass">
     <ns1:parserTestCase name="TestTDMLName" root="file" model="debugger/src/test/data/emptySchema.xml" roundTrip="onePass" description="Test TDML Description">
         <ns1:document>
             <ns1:documentPart type="file">debugger/src/test/data/emptyData.xml</ns1:documentPart>
@@ -56,7 +51,7 @@ class TDMLSuite extends munit.FunSuite {
     </ns1:parserTestCase>
 </ns1:testSuite>"""
   val tdmlDoubleTestCase = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
-<ns1:testSuite xmlns:ns1="http://www.ibm.com/xmlns/dfdl/testData" xmlns:ns2="urn:ogf:dfdl:2013:imp:daffodil.apache.org:2018:ext" xmlns:ns3="http://www.ogf.org/dfdl/dfdl-1.0/" xmlns:ns4="http://www.ogf.org/dfdl/dfdl-1.0/extensions" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:ns6="urn:ogf:dfdl:2013:imp:daffodil.apache.org:2018:int" suiteName="TestTDMLName" defaultRoundTrip="onePass">
+<ns1:testSuite xmlns:ns1="http://www.ibm.com/xmlns/dfdl/testData" suiteName="TestTDMLName" defaultRoundTrip="onePass">
     <ns1:parserTestCase name="TestTDMLName" root="file" model="debugger/src/test/data/emptySchema.xml" roundTrip="onePass" description="Test TDML Description">
         <ns1:document>
             <ns1:documentPart type="file">debugger/src/test/data/emptyData.xml</ns1:documentPart>
