mbeckerle commented on code in PR #799:
URL: https://github.com/apache/daffodil/pull/799#discussion_r888244033
##########
daffodil-tdml-lib/src/test/scala/org/apache/daffodil/tdml/UnitTestTDMLRunner.scala:
##########
@@ -530,4 +490,126 @@ class UnitTestTDMLRunner {
assertTrue(dataElem ne null)
runner.reset
}
+
+ @Test def testCommentBit(): Unit = {
+ val xml = <document bitOrder="LSBFirst"><documentPart type="bits">00000010
//this is a label111</documentPart></document>
Review Comment:
Oh wow, these unit tests are going to get tricky becase "//" begins a scala
line comment, and you are embedding them within XML embedded inside scala.
It seems to be ok, in that your tests seem to be doing ok here, i.e., inside
of XML syntax context within Scala it doesn't seem to be interpreting //
incorrectly.
If that is at all flakey (e.g., in an IDE it gets goofed up sometimes
because of the IDE's scala + xml support) then you might want to do these tests
from strings. I.e., instead of using XML directly embedded in scala, use
strings like:
```
import scala.xml.XML
val xml = XML.loadString("""<document bitOrder="LSBFirst" ..... // this is a
label111</documentPart></document>""")
```
Note scala's triplequoted string can contain single quotes without escaping
them.
Also a triple quoted string can contain '//' without it being interpreted as
starting a comment line.
##########
daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala:
##########
@@ -2362,32 +2359,32 @@ sealed abstract class DocumentPart(part: Node, parent:
Document) {
}.trim.toUpperCase()
/*
- * Description: Compare ByteDocumentPart/BitsDocumentPart string to a string
of valid characters.
- * If the string runs into non-valid characters, skip over them.
- * If the string runs into "//", any character following it will
be skipped (including valid ones).
- * Characters on a new line will be considered again, unless
they are found after a "//"
- * Parameters: String containing the acceptable characters (0,1 for bits)
(0-9,A-F,a-f for bytes)
- * Return: ListBuffer[Char] of the accepted characters from the
ByteDocumentPart/BitsDocumentPart element
+ * Allow "//" and "/* */" (inline) to act as comments.
+ * Any characters not explicitly allowed are also considered comments and are
removed.
*/
- def commentCompatibleFormat(validCharacters: String) : ListBuffer[Char] = {
- val tempListBuffer = new ListBuffer[Char]()
- var noCommentBool = true
-
- for (i <- 0 to partRawContent.length()-1){
- //Check if valid
- if(noCommentBool && validCharacters.contains(partRawContent(i))){
- tempListBuffer += partRawContent(i)
- }
- else if(partRawContent(i) == '/' && i < partRawContent.length()-1){
- if(partRawContent(i+1) == '/'){
- noCommentBool = false
- }
- }
- else if(partRawContent(i) == '\n'){
- noCommentBool = true
- }
+ def removeComments(validCharactersSet : String, userData : String ) : String
= {
+ lazy val doubleForwardPattern = "//.*".r
Review Comment:
Use lazy val for a member of a class.
Use val inside a method.
These first two lazy val can become private lazy val class members.
The next two lazy val should be just be val.
##########
daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala:
##########
@@ -2362,32 +2359,32 @@ sealed abstract class DocumentPart(part: Node, parent:
Document) {
}.trim.toUpperCase()
/*
- * Description: Compare ByteDocumentPart/BitsDocumentPart string to a string
of valid characters.
- * If the string runs into non-valid characters, skip over them.
- * If the string runs into "//", any character following it will
be skipped (including valid ones).
- * Characters on a new line will be considered again, unless
they are found after a "//"
- * Parameters: String containing the acceptable characters (0,1 for bits)
(0-9,A-F,a-f for bytes)
- * Return: ListBuffer[Char] of the accepted characters from the
ByteDocumentPart/BitsDocumentPart element
+ * Allow "//" and "/* */" (inline) to act as comments.
+ * Any characters not explicitly allowed are also considered comments and are
removed.
*/
- def commentCompatibleFormat(validCharacters: String) : ListBuffer[Char] = {
- val tempListBuffer = new ListBuffer[Char]()
- var noCommentBool = true
-
- for (i <- 0 to partRawContent.length()-1){
- //Check if valid
- if(noCommentBool && validCharacters.contains(partRawContent(i))){
- tempListBuffer += partRawContent(i)
- }
- else if(partRawContent(i) == '/' && i < partRawContent.length()-1){
- if(partRawContent(i+1) == '/'){
- noCommentBool = false
- }
- }
- else if(partRawContent(i) == '\n'){
- noCommentBool = true
- }
+ def removeComments(validCharactersSet : String, userData : String ) : String
= {
+ lazy val doubleForwardPattern = "//.*".r
+ lazy val openClosePattern = "/[*].*[*]/".r
+
+ //Remove the comments (//) and (/* */)
+ lazy val noCommentsForward = doubleForwardPattern.replaceAllIn(userData,"")
+ lazy val noCommentsBothFormats =
openClosePattern.replaceAllIn(noCommentsForward,"")
+
+ //Throw exception if /* or */ still found. This means user input was not
formatted correctly.
+ if(noCommentsBothFormats.contains("/*") ||
noCommentsBothFormats.contains("*/")){
+ throw TDMLException("Improper formatting of /* */ style comment", None)
}
- tempListBuffer
+
+ //Remove the new lines - concatenate after
+ lazy val noCommentsString = noCommentsBothFormats.split('\n').mkString
+
+ //Iterate over and check value of the characters, if invalid character
found create log and skip over it
Review Comment:
I would like this check to not issue a warning if the invalid character is a
benign one like any of '|()[]'.
They want to be removed, but no warning issued. This will avoid a bunch of
daffodil tests issuing warnings where I would NOT want to remove the characters
because they are useful in understanding the bits.
Basically, when dealing with bit fields that cross byte boundaries, it is
often useful to be able to do this:
`(010010)(10 1010)(1110 10)(110111)`
That's 3 bytes containing 4 6-bit-wide characters. We do this in a bunch of
tests. I'd rather not be advised by warnings not to do this.
But use of letters other than A-F, a-f, or in the case of binary using
digits other than 0 or 1, and use of odd punctuation characters ... clearly
those things should draw a warning, and hopefully get eliminated over time with
real comments replacing them.
I suggest just add a
```
val noWarnChars = "|()[]"
var doWarning: Boolean = false
val validData = noCommentsString.flatMap{ ch =>
if (validCharacterSet.contains(ch)) ch
else {
if (!noWarnCharSet.contains(ch)) doWarning = true
""
}
}
```
Then only issue the warning if doWarning is true.
##########
daffodil-tdml-lib/src/test/scala/org/apache/daffodil/tdml/UnitTestTDMLRunner.scala:
##########
@@ -530,4 +490,126 @@ class UnitTestTDMLRunner {
assertTrue(dataElem ne null)
runner.reset
}
+
+ @Test def testCommentBit(): Unit = {
+ val xml = <document bitOrder="LSBFirst"><documentPart type="bits">00000010
//this is a label111</documentPart></document>
+
+ val doc = new Document(xml, null)
+ val dp = doc.documentParts.collect { case x: BitsDocumentPart => x }
+ val firstPart = dp(0)
+ assertEquals("00000010", firstPart.bitDigits)
+ }
+
+ @Test def testCommentBitWithNewLine(): Unit = {
+ val xml = <document bitOrder="LSBFirst">
+ <documentPart type="bits">01 01 11 //flagByte1
+ 1 //bool2</documentPart>
+ </document>
+
+ val doc = new Document(xml, null)
+ val dp = doc.documentParts.collect { case x: BitsDocumentPart => x }
+ val firstPart = dp(0)
+ assertEquals("0101111", firstPart.bitDigits)
+ }
+
+ @Test def testCommentBitJustComments(): Unit = {
+ val xml = <document bitOrder="LSBFirst">
+ <documentPart type="bits">
+ // this doc part contains no bits
+ // at all. It is just comments.
+ // 101010101
+ </documentPart>
+ </document>
+ val doc = new Document(xml, null)
+ val dp = doc.documentParts.collect { case x: BitsDocumentPart => x }
+ val firstPart = dp(0)
+ assertEquals("", firstPart.bitDigits)
+ }
+
+ @Test def testCommentBitNoLineEnding(): Unit = {
+ val xml = <document bitOrder="LSBFirst"><documentPart type="bits">01011010
// just a comment here no line ending </documentPart>
+ </document>
+ val doc = new Document(xml, null)
+ val dp = doc.documentParts.collect { case x: BitsDocumentPart => x }
+ val firstPart = dp(0)
+ assertEquals("01011010", firstPart.bitDigits)
+ }
+
+ @Test def testCommentBitBothCommentFormatsNewLine(): Unit = {
+ val xml = <document bitOrder="LSBFirst">
+ <documentPart type="bits">0100110110 /*C0mment 01011111 _01*/11
+100111//D1fferent sty1e c0mment</documentPart>
+ </document>
+ val doc = new Document(xml, null)
+ val dp = doc.documentParts.collect { case x: BitsDocumentPart => x }
+ val firstPart = dp(0)
+ assertEquals("010011011011100111", firstPart.bitDigits)
+ }
+
+ @Test def testCommentBitBothCommentFormats(): Unit = {
+ val xml = <document bitOrder="LSBFirst">
+ <documentPart type="bits">0100110110 /*C0mment 01011111
_01*/100111//D1fferent sty1e c0mment</documentPart>
+ </document>
+ val doc = new Document(xml, null)
+ val dp = doc.documentParts.collect { case x: BitsDocumentPart => x }
+ val firstPart = dp(0)
+ assertEquals("0100110110100111", firstPart.bitDigits)
+ }
+
+ @Test def testBadCommentFormatException(): Unit = {
+ val xml = <document bitOrder="LSBFirst">
+ <documentPart type="bits">0100110110 C0mment 01011111
_01100*/111//D1fferent sty1e c0mment</documentPart>
+ </document>
+ val exc = intercept[TDMLException] {
+ val doc = new Document(xml, null)
+ doc.documentParts.collect { case x: BitsDocumentPart => x }
+ val dp = doc.documentParts.collect { case x: BitsDocumentPart => x }
+ val firstPart = dp(0).bitDigits
+ }
+ assertTrue(exc.getMessage().contains("Improper formatting of /* */ style
comment"))
+ }
+
+ @Test def testCommentByte(): Unit = {
+ val xml = <document><documentPart type="byte">12 3A BC.abc //Label
(ABCDEF123456789</documentPart></document>
+ val doc = new Document(xml, null)
+ val dp = doc.documentParts.collect { case x: ByteDocumentPart => x }
+ val hexDigits = dp(0).hexDigits
+ assertEquals("123ABCabc", hexDigits)
+ }
+
+ @Test def testCommentByteWithNewLine(): Unit = {
+ val xml = <document><documentPart type="byte">123ABCabc //Label
(ABCDEF123456789
+ 456DEFdef //New Label</documentPart></document>
+ val doc = new Document(xml, null)
+ val dp = doc.documentParts.collect { case x: ByteDocumentPart => x }
+ val hexDigits = dp(0).hexDigits
+ assertEquals("123ABCabc456DEFdef", hexDigits)
+ }
+
+ @Test def testCommentByteBothCommentFormatsNewLine(): Unit = {
+ val xml = <document><documentPart type="byte">12AB3C /*Comment ABC123 ** */
+ 45D6d//Different style comment</documentPart></document>
+ val doc = new Document(xml, null)
+ val dp = doc.documentParts.collect { case x: ByteDocumentPart => x }
+ val hexDigits = dp(0).hexDigits
+ assertEquals("12AB3C45D6d", hexDigits)
+ }
+
+ @Test def testCommentByteBothCommentFormats(): Unit = {
+ val xml = <document><documentPart type="byte">12AB3C /*Comment ABC123 **
*/45D6d//Different style comment</documentPart></document>
+ val doc = new Document(xml, null)
+ val dp = doc.documentParts.collect { case x: ByteDocumentPart => x }
+ val hexDigits = dp(0).hexDigits
+ assertEquals("12AB3C45D6d", hexDigits)
+ }
+
+ @Test def testByteBadCommentFormatException(): Unit = {
+ val xml = <document><documentPart type="byte">12AB3C Comment ABC123 **
*/45D6d//Different style comment</documentPart></document>
+ val exc = intercept[TDMLException] {
+ val doc = new Document(xml, null)
+ val dp = doc.documentParts.collect { case x: ByteDocumentPart => x }
+ val hexDigits = dp(0).hexDigits
+ }
+ assertTrue(exc.getMessage().contains("Improper formatting of /* */ style
comment"))
+ }
Review Comment:
Include the "noWarning" characters like '|' in some test, just so you don't
get code-coverage missing on that code path.
##########
daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala:
##########
@@ -2215,16 +2214,14 @@ class ByteDocumentPart(part: Node, parent: Document)
extends DataDocumentPart(pa
bits
}
- lazy val tempListBuffer = commentCompatibleFormat(validHexDigits)
- lazy val hexDigits = tempListBuffer.mkString
+ lazy val hexDigits = removeComments(validHexDigits, partRawContent)
}
class BitsDocumentPart(part: Node, parent: Document) extends
DataDocumentPart(part, parent) {
Review Comment:
I think this can be 'final class'. No other class derives from this, nor
should it ever.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]