mbeckerle commented on code in PR #799:
URL: https://github.com/apache/daffodil/pull/799#discussion_r885969943
##########
daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala:
##########
@@ -2364,6 +2361,35 @@ sealed abstract class DocumentPart(part: Node, parent:
Document) {
}
}.trim.toUpperCase()
+ /*
+ * Description: Compare ByteDocumentPart/BitsDocumentPart string to a string
of valid characters.
+ * If the string runs into non-valid characters, skip over them.
+ * If the string runs into "//", any character following it will
be skipped (including valid ones).
+ * Characters on a new line will be considered again, unless
they are found after a "//"
+ * Parameters: String containing the acceptable characters (0,1 for bits)
(0-9,A-F,a-f for bytes)
+ * Return: ListBuffer[Char] of the accepted characters from the
ByteDocumentPart/BitsDocumentPart element
+ */
+ def commentCompatibleFormat(validCharacters: String) : ListBuffer[Char] = {
Review Comment:
I would have named this removeComments and have it return String.
Returning ListBuffer[Char] is an artifact of your algorithm.
##########
daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala:
##########
@@ -2364,6 +2361,35 @@ sealed abstract class DocumentPart(part: Node, parent:
Document) {
}
}.trim.toUpperCase()
+ /*
+ * Description: Compare ByteDocumentPart/BitsDocumentPart string to a string
of valid characters.
+ * If the string runs into non-valid characters, skip over them.
+ * If the string runs into "//", any character following it will
be skipped (including valid ones).
+ * Characters on a new line will be considered again, unless
they are found after a "//"
+ * Parameters: String containing the acceptable characters (0,1 for bits)
(0-9,A-F,a-f for bytes)
Review Comment:
The parameter arg contains not just acceptable chars, but also can be
comment chars or comment syntax introduced by //.
##########
daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala:
##########
@@ -2364,6 +2361,35 @@ sealed abstract class DocumentPart(part: Node, parent:
Document) {
}
}.trim.toUpperCase()
+ /*
+ * Description: Compare ByteDocumentPart/BitsDocumentPart string to a string
of valid characters.
+ * If the string runs into non-valid characters, skip over them.
+ * If the string runs into "//", any character following it will
be skipped (including valid ones).
+ * Characters on a new line will be considered again, unless
they are found after a "//"
+ * Parameters: String containing the acceptable characters (0,1 for bits)
(0-9,A-F,a-f for bytes)
+ * Return: ListBuffer[Char] of the accepted characters from the
ByteDocumentPart/BitsDocumentPart element
+ */
+ def commentCompatibleFormat(validCharacters: String) : ListBuffer[Char] = {
+ val tempListBuffer = new ListBuffer[Char]()
+ var noCommentBool = true
+
+ for (i <- 0 to partRawContent.length()-1){
Review Comment:
In general, iterating things with an index is frowned upon by many
experienced programmers now, as it's too hard to tell if you've done the
iteration just right or not. For someting that needs to be super-fast maybe.
But for this usage clarity is way more important than speed.
A way to do this that is much easier to review is more functional.
E.g.,
first split the string at line-endings to get a list of strings, one per
line.
Then split each line at "//" discarding all but the first part.
Then replace any char matching regular expression "[^0-9a-fA-F]" with "".
(for bytes case)
Then concatenate and mkstring from all the remaining pieces.
Each of those operations should be 1 line of code.
##########
daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala:
##########
@@ -2364,6 +2361,35 @@ sealed abstract class DocumentPart(part: Node, parent:
Document) {
}
}.trim.toUpperCase()
+ /*
+ * Description: Compare ByteDocumentPart/BitsDocumentPart string to a string
of valid characters.
Review Comment:
Instead of describing your algorithm here, imagine the user doesn't care
about the algorithm they just need to know how to call it properly. I don't
think you need to say more than "removes comments from '//' to end of line or
end of document part element. Any characters other than those explicitly
allowed are also considered comments and are removed."
##########
daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala:
##########
@@ -2214,20 +2215,16 @@ class ByteDocumentPart(part: Node, parent: Document)
extends DataDocumentPart(pa
bits
}
- // Note: anything that is not a valid hex digit (or binary digit for binary)
is simply skipped
- // TODO: we should check for whitespace and other characters we want to
allow, and verify them.
- // TODO: Or better, validate this in the XML Schema for tdml via a pattern
facet
- // TODO: Consider whether to support a comment syntax. When showing data
examples this may be useful.
- //
- lazy val hexDigits = partRawContent.flatMap { ch => if
(validHexDigits.contains(ch)) List(ch) else Nil }
+ lazy val tempListBuffer = commentCompatibleFormat(validHexDigits)
+ lazy val hexDigits = tempListBuffer.mkString
}
class BitsDocumentPart(part: Node, parent: Document) extends
DataDocumentPart(part, parent) {
- lazy val bitDigits = {
- val res = partRawContent.split("[^01]").mkString
- res
- }
+ val validBits = "01"
+
+ lazy val tempListBuffer = commentCompatibleFormat(validBits)
Review Comment:
naming. How about dataWithoutComments?
##########
daffodil-tdml-lib/src/test/scala/org/apache/daffodil/tdml/UnitTestTDMLRunner.scala:
##########
@@ -345,6 +345,46 @@ class UnitTestTDMLRunner {
assertEquals(expected, actual)
}
+ @Test def testCommentBit(): Unit = {
Review Comment:
Corner case to add a test for:
```
<documentPart type="bits">
// this doc part contains no bits
// at all. It is just comments.
// 101010101
</documentPart>
```
Also, I think this should not be an error:
```
<documentPart type="bits">01011010 // just a comment here no line ending
</documentPart>
```
We should put a test in for this.
The reason I like this is that it lets short things stay on a single line.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]