mbeckerle commented on code in PR #912: URL: https://github.com/apache/daffodil/pull/912#discussion_r1071656027
########## daffodil-cli/src/it/resources/org/apache/daffodil/CLI/trace_input.dfdl.xsd: ########## @@ -1,56 +1,56 @@ -<?xml version="1.0" encoding="UTF-8"?> Review Comment: This file is also identical. I don't know why it is here in this changeset. ########## daffodil-cli/src/it/resources/org/apache/daffodil/CLI/input/test_DFDL-714.txt: ########## @@ -1 +1 @@ -Hello World +Hello World Review Comment: I don't know why this file showed up here. It is identical to the prior version. ########## daffodil-cli/src/it/scala/org/apache/daffodil/xml/TestXMLConversionControl.scala: ########## @@ -0,0 +1,132 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.daffodil.xml + +import org.apache.commons.io.FileUtils +import org.junit.Test +import org.apache.daffodil.CLI.Util._ +import org.apache.daffodil.Main.ExitCode +import org.junit.Assert.assertTrue + +import java.nio.charset.StandardCharsets + +class TestXMLConversionControl { + + // + // To run tests conveniently under IntelliJ IDEA, + // rename the src/test dir to src/test1. Rename the src/it dir to src/test. + // Then modify this val to be "test". + // Then you can run these as ordinary junit-style tests under the IDE. 
+ val test = "it" Review Comment: I'm of the opinion that the 267 integration tests run fast enough now that we should just lump them in under regular src/test, and not bother with the separate src/it. However, it would be good to move the stage build tasks so that you don't have to wait for the javadoc/scaladoc to build every time you want to make a small change and then rerun the tests. ########## daffodil-lib/src/main/scala/org/apache/daffodil/util/CharacterSetRemapper.scala: ########## @@ -0,0 +1,138 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.daffodil.util + +/** + * A abstract base for Remappers which convert strings. + * + * The public interface is just `def remap(s: String): String`. + * + * There are protected methods that implementations must provide. + * + * Contains shared implementation methods also. + * + * NOTE: This is inner loop stuff. Keep it and derived classes lean and fast. + * Use a java-like coding style. While loops, not map/flatmap/etc. avoid tuples. + */ +trait CharacterSetRemapper { + + /** + * Remaps the string. Returns the original string object if no remapping is required. 
+ */ + def remap(s: String): String = remapImpl(s) + + /** + * Remaps 1 character, does not consider any context. + */ + def remapChar(c: Char): Char = remap(0, c, 0).toChar + + /** + * Remaps characters. Provides the previous and following characters since some remappings + * require this context. + * + * Plays a trick with negating the return value in order to avoid having to + * return more than one value, which is potentially less efficient. + * + * @param prev The character prior to the one being considered. (Needed for surrogates) + * @param curr The character under consideration for remapping. + * @param next The next character afterwards. (Needed for surrogates and CRLF pairs) + * @return The remapped character (as an Int) or that same remapped character Int + * value negated, which signals that curr+next was remapped to a single character. + * Such as is needed if CRLF is remapped to just LF. + */ + protected def remap (prev: Int, curr: Int, next: Int): Int + + private def needsRemapping(s: String): Boolean = { + // a one liner in scala, + // + // `s.exists{ remapChar(_) != _ }` + // + // but we need a fast java-like while loop... + var pos = 0 + var c = 0.toChar + val len = s.length + if (len != 0) + while (pos < len) { + c = s(pos) + if (remapChar(c) != c) Review Comment: Bug. Yes, a lone surrogate (which you can only tell from context) must be remapped. ########## daffodil-lib/src/main/scala/org/apache/daffodil/util/CharacterSetRemapper.scala: ########## @@ -0,0 +1,138 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.daffodil.util + +/** + * A abstract base for Remappers which convert strings. + * + * The public interface is just `def remap(s: String): String`. + * + * There are protected methods that implementations must provide. + * + * Contains shared implementation methods also. + * + * NOTE: This is inner loop stuff. Keep it and derived classes lean and fast. + * Use a java-like coding style. While loops, not map/flatmap/etc. avoid tuples. + */ +trait CharacterSetRemapper { + + /** + * Remaps the string. Returns the original string object if no remapping is required. + */ + def remap(s: String): String = remapImpl(s) + + /** + * Remaps 1 character, does not consider any context. + */ + def remapChar(c: Char): Char = remap(0, c, 0).toChar + + /** + * Remaps characters. Provides the previous and following characters since some remappings + * require this context. + * + * Plays a trick with negating the return value in order to avoid having to + * return more than one value, which is potentially less efficient. + * + * @param prev The character prior to the one being considered. (Needed for surrogates) + * @param curr The character under consideration for remapping. + * @param next The next character afterwards. (Needed for surrogates and CRLF pairs) + * @return The remapped character (as an Int) or that same remapped character Int + * value negated, which signals that curr+next was remapped to a single character. + * Such as is needed if CRLF is remapped to just LF. 
+ */ + protected def remap (prev: Int, curr: Int, next: Int): Int + + private def needsRemapping(s: String): Boolean = { Review Comment: That's a good fix. Avoids on average 1/2 a pass over the data. Fixes surrogate bug with needsRemapping. ########## daffodil-cli/src/it/scala/org/apache/daffodil/xml/TestXMLConversionControl.scala: ########## @@ -0,0 +1,132 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.daffodil.xml + +import org.apache.commons.io.FileUtils +import org.junit.Test +import org.apache.daffodil.CLI.Util._ +import org.apache.daffodil.Main.ExitCode +import org.junit.Assert.assertTrue + +import java.nio.charset.StandardCharsets + +class TestXMLConversionControl { + + // + // To run tests conveniently under IntelliJ IDEA, + // rename the src/test dir to src/test1. Rename the src/it dir to src/test. + // Then modify this val to be "test". + // Then you can run these as ordinary junit-style tests under the IDE. 
+ val test = "it" + + @Test def test_CLI_XMLConversionControlConvertCR(): Unit = { + withTempFile { output => + val schema = path(s"daffodil-cli/src/$test/resources/org/apache/daffodil/CLI/aString.dfdl.xsd") + val config = path(s"daffodil-cli/src/$test/resources/org/apache/daffodil/CLI/config-convertCR.cfg.xml") + val input = path(s"daffodil-cli/src/$test/resources/org/apache/daffodil/CLI/input/inputWithCRLFs.txt.dat") + + runCLI(args"parse -s $schema -c $config --root a -o $output $input") { + cli => cli.expect("") + }(ExitCode.Success) + + val res = FileUtils.readFileToString(output.toFile, StandardCharsets.UTF_8) + assertTrue(res.contains("<ex:a xmlns:ex=\"urn:ex\">abc\ndef\nghi</ex:a>")) + } + } + + @Test def test_CLI_XMLConversionControlPreserveCRParse(): Unit = { + withTempFile { output => + val schema = path(s"daffodil-cli/src/$test/resources/org/apache/daffodil/CLI/aString.dfdl.xsd") + val config = path(s"daffodil-cli/src/$test/resources/org/apache/daffodil/CLI/config-preserveCR.cfg.xml") + val input = path(s"daffodil-cli/src/$test/resources/org/apache/daffodil/CLI/input/inputWithCRLFs.txt.dat") + + runCLI(args"parse -s $schema -c $config --root a -o $output $input") { cli => + cli.expect("") + }(ExitCode.Success) + + val res = FileUtils.readFileToString(output.toFile, StandardCharsets.UTF_8) + assertTrue(res.contains("<ex:a xmlns:ex=\"urn:ex\">abc\uE00D\ndef\uE00D\nghi</ex:a>")) + } + } + + @Test def test_CLI_XMLConversionControlPreserveCRRoundTrip(): Unit = { + withTempFile { output => + withTempFile { xmlOut => + val schema = path(s"daffodil-cli/src/$test/resources/org/apache/daffodil/CLI/aString.dfdl.xsd") + val config = path(s"daffodil-cli/src/$test/resources/org/apache/daffodil/CLI/config-preserveCR.cfg.xml") + val input = path(s"daffodil-cli/src/$test/resources/org/apache/daffodil/CLI/input/inputWithCRLFs.txt.dat") + + var cmd = args"parse -s $schema -c $config --root a -o $xmlOut $input " + runCLI(cmd) { cli => + cli.expect(s"") + 
}(ExitCode.Success) + + cmd = args"unparse -s $schema -c $config --root a -o $output $xmlOut" + runCLI(cmd) { cli => + cli.expect(s"") + }(ExitCode.Success) + + + val xml = FileUtils.readFileToString(xmlOut.toFile, StandardCharsets.UTF_8) + println(xml) + assertTrue(xml.toString.contains("abc\uE00D\ndef\uE00D\nghi")) + } + + val xml = FileUtils.readFileToString(output.toFile, StandardCharsets.UTF_8) + assertTrue(xml.toString.contains("abc\r\ndef\r\nghi")) + } + } + + @Test def test_CLI_XMLConversionControlPreserveCRUnparseToFile(): Unit = { + withTempFile { output => + val schema = path(s"daffodil-cli/src/$test/resources/org/apache/daffodil/CLI/aString.dfdl.xsd") + val config = path(s"daffodil-cli/src/$test/resources/org/apache/daffodil/CLI/config-preserveCR.cfg.xml") + + runCLI(args"unparse -s $schema -c $config --root a -o $output ") { cli => + cli.send("<ex:a xmlns:ex='urn:ex'>abc\uE00D\ndef\uE00D\nghi</ex:a>", inputDone = true) + }(ExitCode.Success) + + val res = FileUtils.readFileToString(output.toFile, StandardCharsets.UTF_8) + assertTrue(res.contains("abc\r\ndef\r\nghi")) + } + } + + // + // Illustrates a problem with the expect library (perhaps?) Review Comment: Working as designed then. I will just remove this test. ########## daffodil-core/src/test/scala/org/apache/daffodil/dsom/TestExternalVariables.scala: ########## @@ -332,7 +332,7 @@ class TestExternalVariables { val dp1 = pf.onPath("/") val dp2 = pf.onPath("/").withExternalVariables(variables) - val outputter = new ScalaXMLInfosetOutputter() + val outputter = new ScalaXMLInfosetOutputter(dp2.daffodilConfig.xmlConversionControl) Review Comment: This is a good point. I only attached the config to the dp because i learned by trying it, that the xmlConversionControl object needed to get passed to exactly everywhere the dp was already being passed. 
That is because, to run something in the CLI or in the TDML runner, you have to configure both the Daffodil DP and at least one infoset inputter/outputter, which usually means one of the XML ones. So the two objects (DP and config info) just go together everywhere. But I will look into refactoring this so that the DP object doesn't carry the config at all. That will straighten out the NullInfosetInputter as well. ########## daffodil-propgen/src/main/resources/org/apache/daffodil/xsd/dafext.xsd: ########## @@ -98,504 +117,1034 @@ - minExclusive - maxExclusive --> - <xs:element name="tunables"> - <xs:complexType> - <xs:all> - <xs:element name="allowExpressionResultCoercion" type="xs:boolean" default="true" minOccurs="0"> - <xs:annotation> - <xs:documentation> - Defines how Daffodil coerces expressions where the result type differs - from the expected type. As an example, assume the expected type of an - expression is an xs:string, but the expression is { 3 }. In this case, the - expression result is an xs:int, which should not be automatically coerced - to an xs:string. Instead, the expression should be { xs:string(3) } or { "3" } - If the value of this tunable is false, these types of expressions will - result in a schema definition error. If the value is true, Daffodil will - provide a warning and attempt to coerce the result type to the expected - type. - </xs:documentation> - </xs:annotation> - </xs:element> - <xs:element name="allowExternalPathExpressions" type="xs:boolean" default="false" minOccurs="0"> - <xs:annotation> - <xs:documentation> - By default, path expressions in Daffodil will only work correctly if path - steps are used in an expression defined in the schema when compiled. To - enable the use of other expressions (e.g. during debugging, where not all - expressions are known at schema compile time), set this tunable to true. - This may cause a degredation of performance in path expression evaluation, - so this should be avoided when in production. 
This flag is automatically - enabled when debugging is enabled. - </xs:documentation> - </xs:annotation> - </xs:element> - <xs:element name="blobChunkSizeInBytes" default="4096" minOccurs="0"> - <xs:annotation> - <xs:documentation> - When reading/writing blob data, the maximum number of bytes to read/write - at a time. This is also used when parsing xs:hexBinary data. - </xs:documentation> - </xs:annotation> - <xs:simpleType> - <xs:restriction base="xs:int"> - <xs:minInclusive value="1" /> - <xs:maxInclusive value="268435455" /> <!-- Limit to (MaxInt / 8) because some places convert this tunable to bits --> - </xs:restriction> - </xs:simpleType> - </xs:element> - <xs:element name="defaultEmptyElementParsePolicy" type="daf:TunableEmptyElementParsePolicy" default="treatAsEmpty" minOccurs="0"> - <xs:annotation> - <xs:documentation> - Defines the default empty element parse policy to use if it is not defined - in a schema. This is only used if requireEmptyElementParsePolicyProperty is - false. - </xs:documentation> - </xs:annotation> - </xs:element> - <xs:element name="defaultInitialRegexMatchLimitInChars" default="32" minOccurs="0"> - <xs:annotation> - <xs:documentation> - Deprecated. This tunable no longer has any affect and is only kept for - backwards compatability. - </xs:documentation> - </xs:annotation> - <xs:simpleType> - <xs:restriction base="xs:int"> - <xs:minInclusive value="1" /> - </xs:restriction> - </xs:simpleType> - </xs:element> - <xs:element name="errorOnUnsupportedJavaVersion" type="xs:boolean" default="true" minOccurs="0"> - <xs:annotation> - <xs:documentation> - Deprecated. This tunable no longer has any affect and is only kept for - backwards compatability. - </xs:documentation> - </xs:annotation> - </xs:element> - <xs:element name="generatedNamespacePrefixStem" type="xs:string" default="tns" minOccurs="0"> - <xs:annotation> - <xs:documentation> - Stem to use when generating a namespace prefix when one is not defined for - the target naespace. 
- </xs:documentation> - </xs:annotation> - </xs:element> - <xs:element name="initialElementOccurrencesHint" default="10" minOccurs="0"> - <xs:annotation> - <xs:documentation> - Initial array buffer size allocated for recurring elements/arrays. - </xs:documentation> - </xs:annotation> - <xs:simpleType> - <xs:restriction base="xs:int"> - <xs:minInclusive value="1" /> - </xs:restriction> - </xs:simpleType> - </xs:element> - <xs:element name="initialRegexMatchLimitInCharacters" default="64" minOccurs="0"> - <xs:annotation> - <xs:documentation> - Initial number of characters to match when performing regular expression - matches on input data. When a regex fails to match, more data may be - consumed up to the maximumRegexMatchLengthInCharacters tunable. - </xs:documentation> - </xs:annotation> - <xs:simpleType> - <xs:restriction base="xs:int"> - <xs:minInclusive value="1" /> - </xs:restriction> - </xs:simpleType> - </xs:element> - <xs:element name="infosetWalkerSkipMin" default="32" minOccurs="0"> - <xs:annotation> - <xs:documentation> - Daffodil periodically walks the internal infoset to send events to the configured - InfosetOutputter, skipping at least this number of walk attempts. Larger values - mean delayed InfosetOutputter events and more memory usage; Smaller values mean - more CPU usage. Set this value to zero to never skip any walk attempts. This is - specifically for advanced testing behavior and should not need to be changed by users. - </xs:documentation> - </xs:annotation> - <xs:simpleType> - <xs:restriction base="xs:int"> - <xs:minInclusive value="0" /> - </xs:restriction> - </xs:simpleType> - </xs:element> - <xs:element name="infosetWalkerSkipMax" default="2048" minOccurs="0"> - <xs:annotation> - <xs:documentation> - Daffodil periodically walks the internal infoset to send events to the configured - InfosetOutputter. 
On walks where no progress is made, the number of walks to skip - is increased with the assumption that something is blocking it (like an - unresolved point of uncertainty), up to this maximum value. Higher values mean - less attempts are made when blocked for a long time, but with potentially more - delays and memory usage before InfosetOutputter events are created. This is - specifically for advanced testing behavior and should not need to be changed by users. - </xs:documentation> - </xs:annotation> - <xs:simpleType> - <xs:restriction base="xs:int"> - <xs:minInclusive value="0" /> - </xs:restriction> - </xs:simpleType> - </xs:element> - <xs:element name="inputFileMemoryMapLowThreshold" type="xs:int" default="33554432" minOccurs="0"> - <xs:annotation> - <xs:documentation> - Deprecated. This tunable no longer has any affect and is only kept for - backwards compatability. - </xs:documentation> - </xs:annotation> - </xs:element> - <xs:element name="maxBinaryDecimalVirtualPoint" default="200" minOccurs="0"> - <xs:annotation> - <xs:documentation> - The largest allowed value of the dfdl:binaryDecimalVirtualPoint property. - </xs:documentation> - </xs:annotation> - <xs:simpleType> - <xs:restriction base="xs:int"> - <xs:minInclusive value="1" /> - </xs:restriction> - </xs:simpleType> - </xs:element> - <xs:element name="maxByteArrayOutputStreamBufferSizeInBytes" default="2097152000" minOccurs="0"> - <xs:annotation> - <xs:documentation> - When unparsing, this is the maximum size of the buffer that the - ByteArrayOutputStream can grow to before switching to a file based - output stream. - </xs:documentation> - </xs:annotation> - <xs:simpleType> - <xs:restriction base="xs:int"> - <xs:minInclusive value="0" /> - </xs:restriction> - </xs:simpleType> - </xs:element> - <xs:element name="maxDataDumpSizeInBytes" default="256" minOccurs="0"> - <xs:annotation> - <xs:documentation> - The maximum size of data to retrive when when getting data to display - for debugging. 
- </xs:documentation> - </xs:annotation> - <xs:simpleType> - <xs:restriction base="xs:int"> - <xs:minInclusive value="1" /> - </xs:restriction> - </xs:simpleType> - </xs:element> - <xs:element name="maxFieldContentLengthInBytes" type="xs:int" default="1048576" minOccurs="0"> - <xs:annotation> - <xs:documentation> - Deprecated. This tunable no longer has any affect and is only kept for - backwards compatability. - </xs:documentation> - </xs:annotation> - </xs:element> - <xs:element name="maxHexBinaryLengthInBytes" default="2147483647" minOccurs="0"> - <xs:annotation> - <xs:documentation> - The maximum size allowed for an xs:hexBinary element. - </xs:documentation> - </xs:annotation> - <xs:simpleType> - <xs:restriction base="xs:int"> - <xs:minInclusive value="1" /> - </xs:restriction> - </xs:simpleType> - </xs:element> - <xs:element name="maxLengthForVariableLengthDelimiterDisplay" default="10" minOccurs="0"> - <xs:annotation> - <xs:documentation> - When unexpected text is found where a delimiter is expected, this is the maximum - number of bytes (characters) to display when the expected delimiter is a variable - length delimiter. - </xs:documentation> - </xs:annotation> - <xs:simpleType> - <xs:restriction base="xs:int"> - <xs:minInclusive value="1" /> - </xs:restriction> - </xs:simpleType> - </xs:element> - <xs:element name="maxLookaheadFunctionBits" default="512" minOccurs="0"> - <xs:annotation> - <xs:documentation> - Max distance that the DPath lookahead function is permitted to look. - Distance is defined by the distance to the last bit accessed, and - so it is offset+bitsize. - </xs:documentation> - </xs:annotation> - <xs:simpleType> - <xs:restriction base="xs:long"> - <xs:minInclusive value="1" /> - </xs:restriction> - </xs:simpleType> - </xs:element> - <xs:element name="maxOccursBounds" default="2147483647" minOccurs="0"> - <xs:annotation> - <xs:documentation> - Maximum number of occurances of an array element. 
- </xs:documentation> - </xs:annotation> - <xs:simpleType> - <xs:restriction base="xs:long"> - <xs:minInclusive value="1" /> - </xs:restriction> - </xs:simpleType> - </xs:element> - <xs:element name="maxSkipLengthInBytes" default="1024" minOccurs="0"> - <xs:annotation> - <xs:documentation> - Maximum number of bytes allowed to skip in a skip region. - </xs:documentation> - </xs:annotation> - <xs:simpleType> - <xs:restriction base="xs:int"> - <xs:minInclusive value="1" /> - </xs:restriction> - </xs:simpleType> - </xs:element> - <xs:element name="maxValidYear" default="9999" minOccurs="0"> - <xs:annotation> - <xs:documentation> - Due to differences in the DFDL spec and ICU4J SimpleDateFormat, we must - have SimpleDateFormat parse in lenient mode, which allows the year value - to overflow with very large years into possibly negative years. This - tunable tunable sets an upper limit for values to prevent overflow. - </xs:documentation> - </xs:annotation> - <xs:simpleType> - <xs:restriction base="xs:int"> - <xs:minInclusive value="1" /> - </xs:restriction> - </xs:simpleType> - </xs:element> - <xs:element name="maximumRegexMatchLengthInCharacters" default="1048576" minOccurs="0"> - <xs:annotation> - <xs:documentation> - Maximum number of characters to match when performing regular expression - matches on input data. - </xs:documentation> - </xs:annotation> - <xs:simpleType> - <xs:restriction base="xs:int"> - <xs:minInclusive value="1" /> - </xs:restriction> - </xs:simpleType> - </xs:element> - <xs:element name="maximumSimpleElementSizeInCharacters" default="1048576" minOccurs="0"> - <xs:annotation> - <xs:documentation> - Maximum number of characters to parse when parsing string data. 
- </xs:documentation> - </xs:annotation> - <xs:simpleType> - <xs:restriction base="xs:int"> - <xs:minInclusive value="1" /> - </xs:restriction> - </xs:simpleType> - </xs:element> - <xs:element name="minBinaryDecimalVirtualPoint" default="-200" minOccurs="0"> - <xs:annotation> - <xs:documentation> - The smallest allowed value of the dfdl:binaryDecimalVirtualPoint property. - </xs:documentation> - </xs:annotation> - <xs:simpleType> - <xs:restriction base="xs:int"> - <xs:maxInclusive value="-1" /> - </xs:restriction> - </xs:simpleType> - </xs:element> - <xs:element name="minValidYear" type="xs:int" default="0" minOccurs="0"> - <xs:annotation> - <xs:documentation> - Due to differences in the DFDL spec and ICU4J SimpleDateFormat, we must - have SimpleDateFormat parse in lenient mode, which allows the year value - to overflow with very large years into possibly negative years. This - tunable tunable sets an upper limit for values to prevent underflow. - </xs:documentation> - </xs:annotation> - </xs:element> - <xs:element name="outputStreamChunkSizeInBytes" default="65536" minOccurs="0"> - <xs:annotation> - <xs:documentation> - When writing file data to the output stream during unparse, this - is the maximum number of bytes to write at a time. - </xs:documentation> - </xs:annotation> - <xs:simpleType> - <xs:restriction base="xs:int"> - <xs:minInclusive value="1" /> - </xs:restriction> - </xs:simpleType> - </xs:element> - <xs:element name="parseUnparsePolicy" type="daf:TunableParseUnparsePolicyTunable" default="fromRoot" minOccurs="0"> - <xs:annotation> - <xs:documentation> - Whether to compile a schema to support only parsing, only unparsing, both, or to - use the daf:parseUnparsePolicy from the root node. All child elements of the root - must have a compatable daf:parseUnaprsePolicy property. 
- </xs:documentation> - </xs:annotation> - </xs:element> - <xs:element name="readerByteBufferSize" type="xs:int" default="8192" minOccurs="0"> - <xs:annotation> - <xs:documentation> - Deprecated. This tunable no longer has any affect and is only kept for - backwards compatability. - </xs:documentation> - </xs:annotation> - </xs:element> - <xs:element name="releaseUnneededInfoset" type="xs:boolean" default="true" minOccurs="0"> - <xs:annotation> - <xs:documentation> - Daffodil will periodically release internal infoset elements that it determines - are no longer needed, thus freeing memory. Setting this value to false will - prevent this from taking place. This should usually only be used while debugging - or with very specific tests. - </xs:documentation> - </xs:annotation> - </xs:element> - <xs:element name="requireBitOrderProperty" type="xs:boolean" default="false" minOccurs="0"> - <xs:annotation> - <xs:documentation> - If true, require that the dfdl:bitOrder property is specified. If false, use a - default value if the property is not defined in the schema. - </xs:documentation> - </xs:annotation> - </xs:element> - <xs:element name="requireEmptyElementParsePolicyProperty" type="xs:boolean" default="false" minOccurs="0"> - <xs:annotation> - <xs:documentation> - If true, require that the dfdl:emptyElementParsePolicy property is specified in - the schema. If false, and not defined in the schema, uses the - defaultEmptyElementParsePolicy as the value of emptyElementParsePolicy. - </xs:documentation> - </xs:annotation> - </xs:element> - <xs:element name="requireEncodingErrorPolicyProperty" type="xs:boolean" default="false" minOccurs="0"> - <xs:annotation> - <xs:documentation> - If true, require that the dfdl:encodingErrorPolicy property is specified. If - false, use a default value if the property is not defined in the schema. 
- </xs:documentation> - </xs:annotation> - </xs:element> - <xs:element name="requireFloatingProperty" type="xs:boolean" default="false" minOccurs="0"> - <xs:annotation> - <xs:documentation> - If true, require that the dfdl:floating property is specified. If - false, use a default value if the property is not defined in the schema. - </xs:documentation> - </xs:annotation> - </xs:element> - <xs:element name="requireTextBidiProperty" type="xs:boolean" default="false" minOccurs="0"> - <xs:annotation> - <xs:documentation> - If true, require that the dfdl:testBidi property is specified. If - false, use a default value if the property is not defined in the schema. - </xs:documentation> - </xs:annotation> - </xs:element> - <xs:element name="requireTextStandardBaseProperty" type="xs:boolean" default="false" minOccurs="0"> - <xs:annotation> - <xs:documentation> - If true, require that the dfdl:textStandardBase property is specified. If false - and the property is missing, behave as if the property is set to 10. - </xs:documentation> - </xs:annotation> - </xs:element> - <xs:element name="saxUnparseEventBatchSize" default="100" minOccurs="0"> - <xs:annotation> - <xs:documentation> - Daffodil's SAX Unparse API allows events to be batched in memory to minimize the - frequency of context switching between the SAXInfosetInputter thread that processes - the events, and the DaffodilUnparseContentHandler thread that generates the events. - Setting this value to a low number will increase the frequency of context switching, - but will reduce the memory footprint. Swtting it to a high number will decrease the - frequency of context switching, but increase the memory footprint. 
- </xs:documentation> - </xs:annotation> - <xs:simpleType> - <xs:restriction base="xs:int"> - <xs:minInclusive value="1" /> - </xs:restriction> - </xs:simpleType> - </xs:element> - <xs:element name="suppressSchemaDefinitionWarnings" type="daf:TunableSuppressSchemaDefinitionWarnings" default="emptyElementParsePolicyError" minOccurs="0"> - <xs:annotation> - <xs:documentation> - Space-separated list of schema definition warnings that should be ignored, - or "all" to ignore all warnings. - </xs:documentation> - </xs:annotation> - </xs:element> - <xs:element name="tempFilePath" type="xs:string" default="This string is ignored. Default value is taken from java.io.tmpdir property" minOccurs="0"> - <xs:annotation> - <xs:documentation> - When unparsing, use this path to store temporary files that may be genrated. - The default value (empty string) will result in the use of the java.io.tmpdir - property being used as the path. - </xs:documentation> - </xs:annotation> - </xs:element> - <xs:element name="unqualifiedPathStepPolicy" type="daf:TunableUnqualifiedPathStepPolicy" default="noNamespace" minOccurs="0"> - <xs:annotation> - <xs:documentation> - Defines how to lookup DFDL expression path steps that to not include a - namespace prefix. Values are: - - noNamespace: only match elements that do not have a namespace - - defaultNamespace: only match elements defined in the default namespace - - preferDefaultNamespace: match elements defined in the default namespace; - if non are found, match elemnts that do not have a namespace - </xs:documentation> - </xs:annotation> - </xs:element> - <xs:element name="unparseSuspensionWaitOld" default="100" minOccurs="0"> - <xs:annotation> - <xs:documentation> - While unparsing, some unparse actions require "suspending" which - requires buffering unparse output until the suspension can be - evaluated. Daffodil periodically attempts to reevaluate these - suspensions so that these buffers can be released. 
We attempt to - evaluate young suspensions shortly after creation with the hope - that it will succeed and we can release associated buffers. But if - a young suspension fails it is moved to the old suspension list. - Old suspensions are evaluated less frequently since they are less - likely to succeeded. This minimizes the overhead related to - evaluating suspensions that are likely to fail. The - unparseSuspensionWaitYoung and unparseSuspensionWaitOld - values determine how many elements are unparsed before evaluating - young and old suspensions, respectively. - </xs:documentation> - </xs:annotation> - <xs:simpleType> - <xs:restriction base="xs:int"> - <xs:minInclusive value="1" /> - </xs:restriction> - </xs:simpleType> - </xs:element> - <xs:element name="unparseSuspensionWaitYoung" default="5" minOccurs="0"> - <xs:annotation> - <xs:documentation> - See unparseSuspensionWaitOld - </xs:documentation> - </xs:annotation> - <xs:simpleType> - <xs:restriction base="xs:int"> - <xs:minInclusive value="1" /> - </xs:restriction> - </xs:simpleType> - </xs:element> - </xs:all> - </xs:complexType> - </xs:element> + <xs:element name="tunables" type="tns:tunablesType"/> Review Comment: Yeah, I'm inclined to remove all changes to the config file stuff, switch to a simple properties file for the XML controls, and treat the existing config file stuff as legacy. However, it has become clear to me that some schemas simply require a config file in order to be used, and the same will hold for these XML conversion control parameters. For such schemas, CLI parse/unparse, the TDML runner, and use via the API will always require a config along with certain XML conversion controls.
