[GitHub] [daffodil] mbeckerle commented on a diff in pull request #912: WIP: Draft - Config feature to allow XMLConversionControl to preserve CRLF and CR in data

GitBox Tue, 17 Jan 2023 13:51:10 -0800


mbeckerle commented on code in PR #912:
URL: https://github.com/apache/daffodil/pull/912#discussion_r1071656027



##########
daffodil-cli/src/it/resources/org/apache/daffodil/CLI/trace_input.dfdl.xsd:
##########
@@ -1,56 +1,56 @@
-<?xml version="1.0" encoding="UTF-8"?>

Review Comment:
   This file is also identical. I don't know why it is in this changeset.



##########
daffodil-cli/src/it/resources/org/apache/daffodil/CLI/input/test_DFDL-714.txt:
##########
@@ -1 +1 @@
-Hello World 
+Hello World 

Review Comment:
   I don't know why this file showed up here. It is identical to the prior version.



##########
daffodil-cli/src/it/scala/org/apache/daffodil/xml/TestXMLConversionControl.scala:
##########
@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.xml
+
+import org.apache.commons.io.FileUtils
+import org.junit.Test
+import org.apache.daffodil.CLI.Util._
+import org.apache.daffodil.Main.ExitCode
+import org.junit.Assert.assertTrue
+
+import java.nio.charset.StandardCharsets
+
+class TestXMLConversionControl {
+
+  //
+  // To run tests conveniently under IntelliJ IDEA,
+  // rename the src/test dir to src/test1. Rename the src/it dir to src/test.
+  // Then modify this val to be "test".
+  // Then you can run these as ordinary junit-style tests under the IDE.
+  val test = "it"

Review Comment:
   I'm of the opinion that the 267 integration tests now run fast enough that we should just lump them in under the regular src/test and not bother with the separate src/it.
   
   However, it would be good to move the stage build tasks so that you don't have to wait for the javadoc/scaladoc to build every time you make a small change and rerun the tests.



##########
daffodil-lib/src/main/scala/org/apache/daffodil/util/CharacterSetRemapper.scala:
##########
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.util
+
+/**
+ * A abstract base for Remappers which convert strings.
+ *
+ * The public interface is just `def remap(s: String): String`.
+ *
+ * There are protected methods that implementations must provide.
+ *
+ * Contains shared implementation methods also.
+ *
+ * NOTE: This is inner loop stuff. Keep it and derived classes lean and fast.
+ * Use a java-like coding style. While loops, not map/flatmap/etc. avoid tuples.
+ */
+trait CharacterSetRemapper {
+
+  /**
+   * Remaps the string. Returns the original string object if no remapping is required.
+   */
+  def remap(s: String): String = remapImpl(s)
+
+  /**
+   * Remaps 1 character, does not consider any context.
+   */
+  def remapChar(c: Char): Char = remap(0, c, 0).toChar
+
+  /**
+   * Remaps characters. Provides the previous and following characters since some remappings
+   * require this context.
+   *
+   * Plays a trick with negating the return value in order to avoid having to
+   * return more than one value, which is potentially less efficient.
+   *
+   * @param prev The character prior to the one being considered. (Needed for surrogates)
+   * @param curr The character under consideration for remapping.
+   * @param next The next character afterwards. (Needed for surrogates and CRLF pairs)
+   * @return The remapped character (as an Int) or that same remapped character Int
+   *         value negated, which signals that curr+next was remapped to a single character.
+   *         Such as is needed if CRLF is remapped to just LF.
+   */
+  protected def remap (prev: Int, curr: Int, next: Int): Int
+
+  private def needsRemapping(s: String): Boolean = {
+    // a one liner in scala,
+    //
+    //    `s.exists{ remapChar(_) != _ }`
+    //
+    // but we need a fast java-like while loop...
+    var pos = 0
+    var c = 0.toChar
+    val len = s.length
+    if (len != 0)
+      while (pos < len) {
+        c = s(pos)
+        if (remapChar(c) != c)

Review Comment:
   Bug. Yes, a lone surrogate (which can only be detected from context) must be remapped.
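   A minimal sketch of a context-aware check, under stated assumptions: only unpaired surrogates need remapping, and they are replaced with U+FFFD. The object name `SurrogateAwareCheck` and the U+FFFD replacement are hypothetical, not the PR's code. The point is that `remapChar(c)` supplies no context, so it cannot tell a lone surrogate from half of a valid pair; passing the real prev/curr/next into `remap` can.

```scala
object SurrogateAwareCheck {

  // Hypothetical stand-in for a concrete remap(prev, curr, next).
  // Assumption: unpaired surrogates become U+FFFD; everything else is unchanged.
  def remap(prev: Int, curr: Int, next: Int): Int = {
    val c = curr.toChar
    if (Character.isHighSurrogate(c) && !Character.isLowSurrogate(next.toChar))
      0xFFFD // high surrogate not followed by a low surrogate
    else if (Character.isLowSurrogate(c) && !Character.isHighSurrogate(prev.toChar))
      0xFFFD // low surrogate not preceded by a high surrogate
    else
      curr
  }

  // Java-style while loop, as the trait's NOTE asks, but with real context.
  // A negated (pair-collapsing) result also differs from curr, so it counts too.
  def needsRemapping(s: String): Boolean = {
    val len = s.length
    var pos = 0
    var found = false
    while (pos < len && !found) {
      val prev = if (pos > 0) s.charAt(pos - 1).toInt else 0
      val curr = s.charAt(pos).toInt
      val next = if (pos + 1 < len) s.charAt(pos + 1).toInt else 0
      if (remap(prev, curr, next) != curr) found = true
      pos += 1
    }
    found
  }
}
```

   A valid pair such as "a\uD83D\uDE00b" is left alone, while "a\uD83Db" (lone high surrogate) is flagged.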



##########
daffodil-lib/src/main/scala/org/apache/daffodil/util/CharacterSetRemapper.scala:
##########
@@ -0,0 +1,138 @@
+package org.apache.daffodil.util
+
+/**
+ * A abstract base for Remappers which convert strings.
+ *
+ * The public interface is just `def remap(s: String): String`.
+ *
+ * There are protected methods that implementations must provide.
+ *
+ * Contains shared implementation methods also.
+ *
+ * NOTE: This is inner loop stuff. Keep it and derived classes lean and fast.
+ * Use a java-like coding style. While loops, not map/flatmap/etc. avoid tuples.
+ */
+trait CharacterSetRemapper {
+
+  /**
+   * Remaps the string. Returns the original string object if no remapping is required.
+   */
+  def remap(s: String): String = remapImpl(s)
+
+  /**
+   * Remaps 1 character, does not consider any context.
+   */
+  def remapChar(c: Char): Char = remap(0, c, 0).toChar
+
+  /**
+   * Remaps characters. Provides the previous and following characters since some remappings
+   * require this context.
+   *
+   * Plays a trick with negating the return value in order to avoid having to
+   * return more than one value, which is potentially less efficient.
+   *
+   * @param prev The character prior to the one being considered. (Needed for surrogates)
+   * @param curr The character under consideration for remapping.
+   * @param next The next character afterwards. (Needed for surrogates and CRLF pairs)
+   * @return The remapped character (as an Int) or that same remapped character Int
+   *         value negated, which signals that curr+next was remapped to a single character.
+   *         Such as is needed if CRLF is remapped to just LF.
+   */
+  protected def remap (prev: Int, curr: Int, next: Int): Int
+
+  private def needsRemapping(s: String): Boolean = {

Review Comment:
   That's a good fix. It avoids, on average, half a pass over the data, and it fixes the surrogate bug in needsRemapping.
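   One way to get that single-pass behavior, sketched under assumptions: scan until the first character that actually changes, return the original string object if none does, otherwise copy the untouched prefix once and keep remapping from that point. The `CrLfToLfRemapper` object and its CRLF/CR-to-LF rule are hypothetical; they only illustrate the trait's negated-return convention for collapsing a pair into one character.

```scala
object CrLfToLfRemapper {

  // Follows the trait's contract: the result is the remapped char as an Int,
  // or that value negated when curr and next collapse into one character.
  def remap(prev: Int, curr: Int, next: Int): Int =
    if (curr == '\r' && next == '\n') -('\n'.toInt) // CRLF -> LF (consumes 2 chars)
    else if (curr == '\r') '\n'.toInt                // lone CR -> LF
    else curr

  // Character at i as an Int, or 0 meaning "no character" outside the string.
  private def at(s: String, i: Int): Int =
    if (i >= 0 && i < s.length) s.charAt(i).toInt else 0

  def remapString(s: String): String = {
    val len = s.length
    // Partial scan: stop at the first character that would change.
    var pos = 0
    var firstChange = -1
    while (pos < len && firstChange < 0) {
      if (remap(at(s, pos - 1), at(s, pos), at(s, pos + 1)) != at(s, pos)) firstChange = pos
      else pos += 1
    }
    if (firstChange < 0) s // no change needed: hand back the original object
    else {
      // Copy the unchanged prefix once, then continue from firstChange.
      val sb = new java.lang.StringBuilder(len)
      sb.append(s, 0, firstChange)
      pos = firstChange
      while (pos < len) {
        val r = remap(at(s, pos - 1), at(s, pos), at(s, pos + 1))
        if (r < 0) { sb.append((-r).toChar); pos += 2 } // pair collapsed to one char
        else { sb.append(r.toChar); pos += 1 }
      }
      sb.toString
    }
  }
}
```

   One consequence of the negation trick worth a comment in the trait: a pair that collapses to U+0000 cannot be signaled, because -0 == 0. That is harmless for CRLF handling.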



##########
daffodil-cli/src/it/scala/org/apache/daffodil/xml/TestXMLConversionControl.scala:
##########
@@ -0,0 +1,132 @@
+
+package org.apache.daffodil.xml
+
+import org.apache.commons.io.FileUtils
+import org.junit.Test
+import org.apache.daffodil.CLI.Util._
+import org.apache.daffodil.Main.ExitCode
+import org.junit.Assert.assertTrue
+
+import java.nio.charset.StandardCharsets
+
+class TestXMLConversionControl {
+
+  //
+  // To run tests conveniently under IntelliJ IDEA,
+  // rename the src/test dir to src/test1. Rename the src/it dir to src/test.
+  // Then modify this val to be "test".
+  // Then you can run these as ordinary junit-style tests under the IDE.
+  val test = "it"
+
+  @Test def test_CLI_XMLConversionControlConvertCR(): Unit = {
+    withTempFile { output =>
+      val schema = path(s"daffodil-cli/src/$test/resources/org/apache/daffodil/CLI/aString.dfdl.xsd")
+      val config = path(s"daffodil-cli/src/$test/resources/org/apache/daffodil/CLI/config-convertCR.cfg.xml")
+      val input = path(s"daffodil-cli/src/$test/resources/org/apache/daffodil/CLI/input/inputWithCRLFs.txt.dat")
+
+      runCLI(args"parse -s $schema -c $config --root a -o $output $input") {
+        cli => cli.expect("")
+      }(ExitCode.Success)
+
+      val res = FileUtils.readFileToString(output.toFile, StandardCharsets.UTF_8)
+      assertTrue(res.contains("<ex:a xmlns:ex=\"urn:ex\">abc\ndef\nghi</ex:a>"))
+    }
+  }
+
+  @Test def test_CLI_XMLConversionControlPreserveCRParse(): Unit = {
+    withTempFile { output =>
+      val schema = path(s"daffodil-cli/src/$test/resources/org/apache/daffodil/CLI/aString.dfdl.xsd")
+      val config = path(s"daffodil-cli/src/$test/resources/org/apache/daffodil/CLI/config-preserveCR.cfg.xml")
+      val input = path(s"daffodil-cli/src/$test/resources/org/apache/daffodil/CLI/input/inputWithCRLFs.txt.dat")
+
+      runCLI(args"parse -s $schema -c $config --root a -o $output $input") { cli =>
+        cli.expect("")
+      }(ExitCode.Success)
+
+      val res = FileUtils.readFileToString(output.toFile, StandardCharsets.UTF_8)
+      assertTrue(res.contains("<ex:a xmlns:ex=\"urn:ex\">abc\uE00D\ndef\uE00D\nghi</ex:a>"))
+    }
+  }
+
+  @Test def test_CLI_XMLConversionControlPreserveCRRoundTrip(): Unit = {
+    withTempFile { output =>
+      withTempFile { xmlOut =>
+        val schema = path(s"daffodil-cli/src/$test/resources/org/apache/daffodil/CLI/aString.dfdl.xsd")
+        val config = path(s"daffodil-cli/src/$test/resources/org/apache/daffodil/CLI/config-preserveCR.cfg.xml")
+        val input = path(s"daffodil-cli/src/$test/resources/org/apache/daffodil/CLI/input/inputWithCRLFs.txt.dat")
+
+        var cmd = args"parse -s $schema -c $config --root a -o $xmlOut $input "
+        runCLI(cmd) { cli =>
+          cli.expect(s"")
+        }(ExitCode.Success)
+
+        cmd = args"unparse -s $schema -c $config --root a -o $output $xmlOut"
+        runCLI(cmd) { cli =>
+          cli.expect(s"")
+        }(ExitCode.Success)
+
+
+        val xml = FileUtils.readFileToString(xmlOut.toFile, StandardCharsets.UTF_8)
+        println(xml)
+        assertTrue(xml.toString.contains("abc\uE00D\ndef\uE00D\nghi"))
+      }
+
+      val xml = FileUtils.readFileToString(output.toFile, StandardCharsets.UTF_8)
+      assertTrue(xml.toString.contains("abc\r\ndef\r\nghi"))
+    }
+  }
+
+  @Test def test_CLI_XMLConversionControlPreserveCRUnparseToFile(): Unit = {
+    withTempFile { output =>
+      val schema = path(s"daffodil-cli/src/$test/resources/org/apache/daffodil/CLI/aString.dfdl.xsd")
+      val config = path(s"daffodil-cli/src/$test/resources/org/apache/daffodil/CLI/config-preserveCR.cfg.xml")
+
+      runCLI(args"unparse -s $schema -c $config --root a -o $output ") { cli =>
+        cli.send("<ex:a xmlns:ex='urn:ex'>abc\uE00D\ndef\uE00D\nghi</ex:a>", inputDone = true)
+      }(ExitCode.Success)
+
+      val res = FileUtils.readFileToString(output.toFile, StandardCharsets.UTF_8)
+      assertTrue(res.contains("abc\r\ndef\r\nghi"))
+    }
+  }
+
+  //
+  // Illustrates a problem with the expect library (perhaps?)

Review Comment:
   Working as designed then. I will just remove this test. 
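   For reference, the strings these tests expect can be modeled in a few lines. XML 1.0 parsers normalize CRLF and lone CR to LF, so a CR only survives a parse/unparse round trip if it is carried as some other character; the expected U+E00D is consistent with offsetting the CR code point (0x0D) by 0xE000 into the Private Use Area. This is a hypothetical sketch (`CrHandlingSketch` and its methods are made-up names), not the PR's implementation, and it ignores how the config file selects the mode.

```scala
object CrHandlingSketch {

  // U+E00D: the CR code point (0x0D) offset by 0xE000 into the Private Use Area.
  val PreservedCR: Char = (0xE000 + 0x0D).toChar

  // convertCR-style parse: CRLF becomes LF, then any remaining lone CR becomes LF.
  def convertCR(data: String): String =
    data.replace("\r\n", "\n").replace('\r', '\n')

  // preserveCR-style parse: every CR is carried as U+E00D, so XML end-of-line
  // normalization has nothing to rewrite; the LF after it is kept as-is.
  def preserveCR(data: String): String =
    data.replace('\r', PreservedCR)

  // preserveCR-style unparse: turn U+E00D back into CR.
  def restoreCR(xmlText: String): String =
    xmlText.replace(PreservedCR, '\r')
}
```

   Applied to the test input "abc\r\ndef\r\nghi", these produce the three strings the tests assert on: the convertCR parse output, the preserveCR parse output, and the original bytes after unparse.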



##########
daffodil-core/src/test/scala/org/apache/daffodil/dsom/TestExternalVariables.scala:
##########
@@ -332,7 +332,7 @@ class TestExternalVariables {
     val dp1 = pf.onPath("/")
     val dp2 = pf.onPath("/").withExternalVariables(variables)
 
-    val outputter = new ScalaXMLInfosetOutputter()
+    val outputter = new ScalaXMLInfosetOutputter(dp2.daffodilConfig.xmlConversionControl)

Review Comment:
   This is a good point. 
   
   I only attached the config to the DP because I learned, by trying it, that the xmlConversionControl object needed to be passed to exactly the places where the DP was already being passed. To run something in the CLI or in the TDML runner, you have to configure both the Daffodil DP and at least one infoset inputter/outputter, which usually means one of the XML ones. So the two objects (the DP and the config info) go together everywhere. 
   
   But I will look into refactoring this so that the DP object doesn't carry the config at all. 
   
   That will straighten out the NullInfosetInputter as well. 
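   A rough sketch of that refactoring direction, with every name hypothetical (none of these are Daffodil classes): the processor carries no XML conversion settings, the outputter is the only component that consumes them, and callers such as the CLI or TDML runner pair the two explicitly.

```scala
object ConfigAlongsideProcessor {

  // Hypothetical stand-in for the XML conversion settings.
  final case class XmlConversionSettings(preserveCR: Boolean)

  // Hypothetical outputter: the only component that reads the settings.
  final class XmlTextOutputter(settings: XmlConversionSettings) {
    private val sb = new StringBuilder
    def text(s: String): Unit = {
      val converted =
        if (settings.preserveCR) s.replace('\r', '\uE00D')
        else s.replace("\r\n", "\n").replace('\r', '\n')
      sb.append(converted)
    }
    def result: String = sb.toString
  }

  // Hypothetical processor: it knows nothing about XML conversion.
  final class Processor {
    def parse(data: String, out: XmlTextOutputter): Unit = out.text(data)
  }

  // Callers (CLI, TDML runner) pair the processor and the settings explicitly.
  def run(p: Processor, settings: XmlConversionSettings, data: String): String = {
    val out = new XmlTextOutputter(settings)
    p.parse(data, out)
    out.result
  }
}
```

   With this shape, an outputter that ignores the settings (the null-inputter case) simply never takes them, instead of reaching through the DP for a config it does not use.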
   
   



##########
daffodil-propgen/src/main/resources/org/apache/daffodil/xsd/dafext.xsd:
##########
@@ -98,504 +117,1034 @@
       - minExclusive
       - maxExclusive
   -->
-  <xs:element name="tunables">
-    <xs:complexType>
-      <xs:all>
-        <xs:element name="allowExpressionResultCoercion" type="xs:boolean" default="true" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              Defines how Daffodil coerces expressions where the result type differs
-              from the expected type. As an example, assume the expected type of an
-              expression is an xs:string, but the expression is { 3 }. In this case, the
-              expression result is an xs:int, which should not be automatically coerced
-              to an xs:string. Instead, the expression should be { xs:string(3) } or { "3" }
-              If the value of this tunable is false, these types of expressions will
-              result in a schema definition error. If the value is true, Daffodil will
-              provide a warning and attempt to coerce the result type to the expected
-              type.
-            </xs:documentation>
-          </xs:annotation>
-        </xs:element>
-        <xs:element name="allowExternalPathExpressions" type="xs:boolean" default="false" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              By default, path expressions in Daffodil will only work correctly if path
-              steps are used in an expression defined in the schema when compiled. To
-              enable the use of other expressions (e.g. during debugging, where not all
-              expressions are known at schema compile time), set this tunable to true.
-              This may cause a degredation of performance in path expression evaluation,
-              so this should be avoided when in production. This flag is automatically
-              enabled when debugging is enabled.
-            </xs:documentation>
-          </xs:annotation>
-        </xs:element>
-        <xs:element name="blobChunkSizeInBytes" default="4096" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              When reading/writing blob data, the maximum number of bytes to read/write
-              at a time. This is also used when parsing xs:hexBinary data.
-            </xs:documentation>
-          </xs:annotation>
-          <xs:simpleType>
-            <xs:restriction base="xs:int">
-              <xs:minInclusive value="1" />
-              <xs:maxInclusive value="268435455" /> <!-- Limit to (MaxInt / 8) because some places convert this tunable to bits -->
-            </xs:restriction>
-          </xs:simpleType>
-        </xs:element>
-        <xs:element name="defaultEmptyElementParsePolicy" type="daf:TunableEmptyElementParsePolicy" default="treatAsEmpty" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              Defines the default empty element parse policy to use if it is not defined
-              in a schema. This is only used if requireEmptyElementParsePolicyProperty is
-              false.
-            </xs:documentation>
-          </xs:annotation>
-        </xs:element>
-        <xs:element name="defaultInitialRegexMatchLimitInChars" default="32" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              Deprecated. This tunable no longer has any affect and is only kept for
-              backwards compatability.
-            </xs:documentation>
-          </xs:annotation>
-          <xs:simpleType>
-            <xs:restriction base="xs:int">
-              <xs:minInclusive value="1" />
-            </xs:restriction>
-          </xs:simpleType>
-        </xs:element>
-        <xs:element name="errorOnUnsupportedJavaVersion" type="xs:boolean" default="true" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              Deprecated. This tunable no longer has any affect and is only kept for
-              backwards compatability.
-            </xs:documentation>
-          </xs:annotation>
-        </xs:element>
-        <xs:element name="generatedNamespacePrefixStem" type="xs:string" default="tns" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              Stem to use when generating a namespace prefix when one is not defined for
-              the target naespace.
-            </xs:documentation>
-          </xs:annotation>
-        </xs:element>
-        <xs:element name="initialElementOccurrencesHint" default="10" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              Initial array buffer size allocated for recurring elements/arrays.
-            </xs:documentation>
-          </xs:annotation>
-          <xs:simpleType>
-            <xs:restriction base="xs:int">
-              <xs:minInclusive value="1" />
-            </xs:restriction>
-          </xs:simpleType>
-        </xs:element>
-        <xs:element name="initialRegexMatchLimitInCharacters" default="64" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              Initial number of characters to match when performing regular expression
-              matches on input data. When a regex fails to match, more data may be
-              consumed up to the maximumRegexMatchLengthInCharacters tunable.
-            </xs:documentation>
-          </xs:annotation>
-          <xs:simpleType>
-            <xs:restriction base="xs:int">
-              <xs:minInclusive value="1" />
-            </xs:restriction>
-          </xs:simpleType>
-        </xs:element>
-        <xs:element name="infosetWalkerSkipMin" default="32" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              Daffodil periodically walks the internal infoset to send events to the configured
-              InfosetOutputter, skipping at least this number of walk attempts. Larger values
-              mean delayed InfosetOutputter events and more memory usage; Smaller values mean
-              more CPU usage. Set this value to zero to never skip any walk attempts. This is
-              specifically for advanced testing behavior and should not need to be changed by users.
-            </xs:documentation>
-          </xs:annotation>
-          <xs:simpleType>
-            <xs:restriction base="xs:int">
-              <xs:minInclusive value="0" />
-            </xs:restriction>
-          </xs:simpleType>
-        </xs:element>
-        <xs:element name="infosetWalkerSkipMax" default="2048" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              Daffodil periodically walks the internal infoset to send events to the configured
-              InfosetOutputter. On walks where no progress is made, the number of walks to skip
-              is increased with the assumption that something is blocking it (like an
-              unresolved point of uncertainty), up to this maximum value. Higher values mean
-              less attempts are made when blocked for a long time, but with potentially more
-              delays and memory usage before InfosetOutputter events are created. This is
-              specifically for advanced testing behavior and should not need to be changed by users.
-            </xs:documentation>
-          </xs:annotation>
-          <xs:simpleType>
-            <xs:restriction base="xs:int">
-              <xs:minInclusive value="0" />
-            </xs:restriction>
-          </xs:simpleType>
-        </xs:element>
-        <xs:element name="inputFileMemoryMapLowThreshold" type="xs:int" default="33554432" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              Deprecated. This tunable no longer has any affect and is only kept for
-              backwards compatability.
-            </xs:documentation>
-          </xs:annotation>
-        </xs:element>
-        <xs:element name="maxBinaryDecimalVirtualPoint" default="200" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              The largest allowed value of the dfdl:binaryDecimalVirtualPoint property.
-            </xs:documentation>
-          </xs:annotation>
-          <xs:simpleType>
-            <xs:restriction base="xs:int">
-              <xs:minInclusive value="1" />
-            </xs:restriction>
-          </xs:simpleType>
-        </xs:element>
-        <xs:element name="maxByteArrayOutputStreamBufferSizeInBytes" default="2097152000" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              When unparsing, this is the maximum size of the buffer that the
-              ByteArrayOutputStream can grow to before switching to a file based
-              output stream.
-            </xs:documentation>
-          </xs:annotation>
-          <xs:simpleType>
-            <xs:restriction base="xs:int">
-              <xs:minInclusive value="0" />
-            </xs:restriction>
-          </xs:simpleType>
-        </xs:element>
-        <xs:element name="maxDataDumpSizeInBytes" default="256" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              The maximum size of data to retrive when when getting data to display
-              for debugging.
-            </xs:documentation>
-          </xs:annotation>
-          <xs:simpleType>
-            <xs:restriction base="xs:int">
-              <xs:minInclusive value="1" />
-            </xs:restriction>
-          </xs:simpleType>
-        </xs:element>
-        <xs:element name="maxFieldContentLengthInBytes" type="xs:int" default="1048576" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              Deprecated. This tunable no longer has any affect and is only kept for
-              backwards compatability.
-            </xs:documentation>
-          </xs:annotation>
-        </xs:element>
-        <xs:element name="maxHexBinaryLengthInBytes" default="2147483647" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              The maximum size allowed for an xs:hexBinary element.
-            </xs:documentation>
-          </xs:annotation>
-          <xs:simpleType>
-            <xs:restriction base="xs:int">
-              <xs:minInclusive value="1" />
-            </xs:restriction>
-          </xs:simpleType>
-        </xs:element>
-        <xs:element name="maxLengthForVariableLengthDelimiterDisplay" default="10" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              When unexpected text is found where a delimiter is expected, this is the maximum
-              number of bytes (characters) to display when the expected delimiter is a variable
-              length delimiter.
-            </xs:documentation>
-          </xs:annotation>
-          <xs:simpleType>
-            <xs:restriction base="xs:int">
-              <xs:minInclusive value="1" />
-            </xs:restriction>
-          </xs:simpleType>
-        </xs:element>
-        <xs:element name="maxLookaheadFunctionBits" default="512" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              Max distance that the DPath lookahead function is permitted to look.
-              Distance is defined by the distance to the last bit accessed, and
-              so it is offset+bitsize.
-            </xs:documentation>
-          </xs:annotation>
-          <xs:simpleType>
-            <xs:restriction base="xs:long">
-              <xs:minInclusive value="1" />
-            </xs:restriction>
-          </xs:simpleType>
-        </xs:element>
-        <xs:element name="maxOccursBounds" default="2147483647" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              Maximum number of occurances of an array element.
-            </xs:documentation>
-          </xs:annotation>
-          <xs:simpleType>
-            <xs:restriction base="xs:long">
-              <xs:minInclusive value="1" />
-            </xs:restriction>
-          </xs:simpleType>
-        </xs:element>
-        <xs:element name="maxSkipLengthInBytes" default="1024" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              Maximum number of bytes allowed to skip in a skip region.
-            </xs:documentation>
-          </xs:annotation>
-          <xs:simpleType>
-            <xs:restriction base="xs:int">
-              <xs:minInclusive value="1" />
-            </xs:restriction>
-          </xs:simpleType>
-        </xs:element>
-        <xs:element name="maxValidYear" default="9999" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              Due to differences in the DFDL spec and ICU4J SimpleDateFormat, we must
-              have SimpleDateFormat parse in lenient mode, which allows the year value
-              to overflow with very large years into possibly negative years. This
-              tunable tunable sets an upper limit for values to prevent overflow.
-            </xs:documentation>
-          </xs:annotation>
-          <xs:simpleType>
-            <xs:restriction base="xs:int">
-              <xs:minInclusive value="1" />
-            </xs:restriction>
-          </xs:simpleType>
-        </xs:element>
-        <xs:element name="maximumRegexMatchLengthInCharacters" default="1048576" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              Maximum number of characters to match when performing regular expression
-              matches on input data.
-            </xs:documentation>
-          </xs:annotation>
-          <xs:simpleType>
-            <xs:restriction base="xs:int">
-              <xs:minInclusive value="1" />
-            </xs:restriction>
-          </xs:simpleType>
-        </xs:element>
-        <xs:element name="maximumSimpleElementSizeInCharacters" default="1048576" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              Maximum number of characters to parse when parsing string data.
-            </xs:documentation>
-          </xs:annotation>
-          <xs:simpleType>
-            <xs:restriction base="xs:int">
-              <xs:minInclusive value="1" />
-            </xs:restriction>
-          </xs:simpleType>
-        </xs:element>
-        <xs:element name="minBinaryDecimalVirtualPoint" default="-200" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              The smallest allowed value of the dfdl:binaryDecimalVirtualPoint property.
-            </xs:documentation>
-          </xs:annotation>
-          <xs:simpleType>
-            <xs:restriction base="xs:int">
-              <xs:maxInclusive value="-1" />
-            </xs:restriction>
-          </xs:simpleType>
-        </xs:element>
-        <xs:element name="minValidYear" type="xs:int" default="0" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              Due to differences in the DFDL spec and ICU4J SimpleDateFormat, we must
-              have SimpleDateFormat parse in lenient mode, which allows the year value
-              to overflow with very large years into possibly negative years. This
-              tunable tunable sets an upper limit for values to prevent underflow.
-            </xs:documentation>
-          </xs:annotation>
-        </xs:element>
-        <xs:element name="outputStreamChunkSizeInBytes" default="65536" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              When writing file data to the output stream during unparse, this
-              is the maximum number of bytes to write at a time.
-            </xs:documentation>
-          </xs:annotation>
-          <xs:simpleType>
-            <xs:restriction base="xs:int">
-              <xs:minInclusive value="1" />
-            </xs:restriction>
-          </xs:simpleType>
-        </xs:element>
-        <xs:element name="parseUnparsePolicy" type="daf:TunableParseUnparsePolicyTunable" default="fromRoot" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              Whether to compile a schema to support only parsing, only unparsing, both, or to
-              use the daf:parseUnparsePolicy from the root node. All child elements of the root
-              must have a compatable daf:parseUnaprsePolicy property.
-            </xs:documentation>
-          </xs:annotation>
-        </xs:element>
-        <xs:element name="readerByteBufferSize" type="xs:int" default="8192" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              Deprecated. This tunable no longer has any affect and is only kept for
-              backwards compatability.
-            </xs:documentation>
-          </xs:annotation>
-        </xs:element>
-        <xs:element name="releaseUnneededInfoset" type="xs:boolean" default="true" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              Daffodil will periodically release internal infoset elements that it determines
-              are no longer needed, thus freeing memory. Setting this value to false will
-              prevent this from taking place. This should usually only be used while debugging
-              or with very specific tests.
-            </xs:documentation>
-          </xs:annotation>
-        </xs:element>
-        <xs:element name="requireBitOrderProperty" type="xs:boolean" default="false" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              If true, require that the dfdl:bitOrder property is specified. If false, use a
-              default value if the property is not defined in the schema.
-            </xs:documentation>
-          </xs:annotation>
-        </xs:element>
-        <xs:element name="requireEmptyElementParsePolicyProperty" type="xs:boolean" default="false" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              If true, require that the dfdl:emptyElementParsePolicy property is specified in
-              the schema. If false and the property is not defined in the schema, use the
-              defaultEmptyElementParsePolicy as the value of emptyElementParsePolicy.
-            </xs:documentation>
-          </xs:annotation>
-        </xs:element>
-        <xs:element name="requireEncodingErrorPolicyProperty" type="xs:boolean" default="false" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              If true, require that the dfdl:encodingErrorPolicy property is specified. If
-              false, use a default value if the property is not defined in the schema.
-            </xs:documentation>
-          </xs:annotation>
-        </xs:element>
-        <xs:element name="requireFloatingProperty" type="xs:boolean" default="false" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              If true, require that the dfdl:floating property is specified. If
-              false, use a default value if the property is not defined in the schema.
-            </xs:documentation>
-          </xs:annotation>
-        </xs:element>
-        <xs:element name="requireTextBidiProperty" type="xs:boolean" default="false" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              If true, require that the dfdl:textBidi property is specified. If
-              false, use a default value if the property is not defined in the schema.
-            </xs:documentation>
-          </xs:annotation>
-        </xs:element>
-        <xs:element name="requireTextStandardBaseProperty" type="xs:boolean" default="false" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              If true, require that the dfdl:textStandardBase property is specified. If false
-              and the property is missing, behave as if the property is set to 10.
-            </xs:documentation>
-          </xs:annotation>
-        </xs:element>
-        <xs:element name="saxUnparseEventBatchSize" default="100" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              Daffodil's SAX Unparse API allows events to be batched in memory to minimize the
-              frequency of context switching between the SAXInfosetInputter thread that processes
-              the events and the DaffodilUnparseContentHandler thread that generates the events.
-              Setting this value to a low number will increase the frequency of context switching,
-              but will reduce the memory footprint. Setting it to a high number will decrease the
-              frequency of context switching, but increase the memory footprint.
-            </xs:documentation>
-          </xs:annotation>
-          <xs:simpleType>
-            <xs:restriction base="xs:int">
-              <xs:minInclusive value="1" />
-            </xs:restriction>
-          </xs:simpleType>
-        </xs:element>
-        <xs:element name="suppressSchemaDefinitionWarnings" type="daf:TunableSuppressSchemaDefinitionWarnings" default="emptyElementParsePolicyError" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              Space-separated list of schema definition warnings that should be ignored,
-              or "all" to ignore all warnings.
-            </xs:documentation>
-          </xs:annotation>
-        </xs:element>
-        <xs:element name="tempFilePath" type="xs:string" default="This string is ignored. Default value is taken from java.io.tmpdir property"  minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              When unparsing, use this path to store temporary files that may be generated.
-              The default value (empty string) results in the java.io.tmpdir
-              property being used as the path.
-            </xs:documentation>
-          </xs:annotation>
-        </xs:element>
-        <xs:element name="unqualifiedPathStepPolicy" type="daf:TunableUnqualifiedPathStepPolicy" default="noNamespace" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              Defines how to look up DFDL expression path steps that do not include a
-              namespace prefix. Values are:
-              - noNamespace: only match elements that do not have a namespace
-              - defaultNamespace: only match elements defined in the default namespace
-              - preferDefaultNamespace: match elements defined in the default namespace;
-                  if none are found, match elements that do not have a namespace
-            </xs:documentation>
-          </xs:annotation>
-        </xs:element>
-        <xs:element name="unparseSuspensionWaitOld" default="100" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              While unparsing, some unparse actions require "suspending" which
-              requires buffering unparse output until the suspension can be
-              evaluated. Daffodil periodically attempts to reevaluate these
-              suspensions so that these buffers can be released. We attempt to
-              evaluate young suspensions shortly after creation with the hope
-              that it will succeed and we can release associated buffers. But if
-              a young suspension fails, it is moved to the old suspension list.
-              Old suspensions are evaluated less frequently since they are less
-              likely to succeed. This minimizes the overhead related to
-              evaluating suspensions that are likely to fail. The
-              unparseSuspensionWaitYoung and unparseSuspensionWaitOld
-              values determine how many elements are unparsed before evaluating
-              young and old suspensions, respectively.
-            </xs:documentation>
-          </xs:annotation>
-          <xs:simpleType>
-            <xs:restriction base="xs:int">
-              <xs:minInclusive value="1" />
-            </xs:restriction>
-          </xs:simpleType>
-        </xs:element>
-        <xs:element name="unparseSuspensionWaitYoung" default="5" minOccurs="0">
-          <xs:annotation>
-            <xs:documentation>
-              See unparseSuspensionWaitOld
-            </xs:documentation>
-          </xs:annotation>
-          <xs:simpleType>
-            <xs:restriction base="xs:int">
-              <xs:minInclusive value="1" />
-            </xs:restriction>
-          </xs:simpleType>
-        </xs:element>
-      </xs:all>
-    </xs:complexType>
-  </xs:element>
+  <xs:element name="tunables" type="tns:tunablesType"/>

Review Comment:
   Yeah, I'm inclined to remove all changes to the config file stuff, switch to a simple properties file for the XML controls, and treat the existing config file stuff as legacy.
   
   However, it has become clear to me that some schemas simply require a config file in order to be used, and the same will hold for these XML conversion control parameters. For such schemas, CLI parse/unparse, the TDML runner, and API use will always require a config file that supplies certain XML conversion controls.
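For comparison with the properties-file idea, here is a minimal sketch of the existing (legacy) XML config file format. It assumes the `daf:dfdlConfig` root element and `daf:tunables` child described in the Daffodil configuration documentation, and it sets a few of the tunables whose definitions appear in the diff. The values are illustrative only. Any XML conversion control settings this PR might add are not shown, because their element names are not given here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of a Daffodil config file; values are illustrative, not recommendations. -->
<daf:dfdlConfig xmlns:daf="urn:ogf:dfdl:2013:imp:daffodil.apache.org:2018:ext">
  <daf:tunables>
    <!-- Space-separated list of schema definition warnings to ignore, or "all" -->
    <daf:suppressSchemaDefinitionWarnings>all</daf:suppressSchemaDefinitionWarnings>
    <!-- Require dfdl:bitOrder to be specified rather than falling back to a default -->
    <daf:requireBitOrderProperty>true</daf:requireBitOrderProperty>
    <!-- Number of elements unparsed before re-evaluating young and old suspensions -->
    <daf:unparseSuspensionWaitYoung>10</daf:unparseSuspensionWaitYoung>
    <daf:unparseSuspensionWaitOld>200</daf:unparseSuspensionWaitOld>
  </daf:tunables>
</daf:dfdlConfig>
```

A file like this is typically passed to the CLI with its `--config` option. A schema that depends on particular settings cannot be used correctly without such a file, which is the concern raised in the comment.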
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
