bsloane1650 commented on a change in pull request #245: Added
X-DFDL-5-BIT-DFI-1661-DUI-001 char encoding
URL: https://github.com/apache/incubator-daffodil/pull/245#discussion_r296903319
##########
File path:
daffodil-io/src/main/scala/org/apache/daffodil/processors/charset/X_DFDL_MIL_STD.scala
##########
@@ -43,7 +43,15 @@ object BitsCharset6BitDFI264DUI001 extends {
object BitsCharset6BitDFI311DUI002 extends {
override val name = "X-DFDL-6-BIT-DFI-311-DUI-002"
override val bitWidthOfACodeUnit = 6
- override val decodeString =
"""\u00A0ABCDEFGHIJKLMNOPQRSTuVWXYZ\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD
\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD0123456789\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD"""
+ override val decodeString =
"""\u00A0ABCDEFGHIJKLMNOPQRSTUVWXYZ\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD
\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD0123456789\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD"""
+ override val replacementCharCode = 0x0
+ override val requiredBitOrder = BitOrder.LeastSignificantBitFirst
+} with BitsCharsetNonByteSize
+
+object BitsCharset5BitDFI1661DUI001 extends {
+ override val name = "X-DFDL-5-BIT-DFI-1661-DUI-001"
+ override val bitWidthOfACodeUnit = 5
+ override val decodeString =
"""\u00A0ABCDEFGHIJKLMNOPQRSTUVWXYZ\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD"""
Review comment:
I am trying to write a test case for this. What I came up with is:
```
<xs:element name="fiveBitDFI1661DUI001" type="xs:string"
dfdl:encoding="X-DFDL-5-BIT-DFI-1661-DUI-001"
dfdl:bitOrder="leastSignificantBitFirst" dfdl:byteOrder="littleEndian"/>
<tdml:parserTestCase name="fiveBitDFI1661DUI001"
root="fiveBitDFI1661DUI001" model="enc1"
description="X-DFDL-5-BIT-DFI-1661-DUI-001">
<tdml:document>
<tdml:documentPart type="bits" bitOrder="LSBFirst" byteOrder="RTL">
00000
00001
00010
00011
00100
00101
00110
00111
01000
01001
01010
01011
01100
01101
01110
01111
10000
10001
10010
10011
10100
10101
10110
10111
11000
11001
11010
11011
11100
11101
11110
11111
</tdml:documentPart>
</tdml:document>
<tdml:infoset>
<tdml:dfdlInfoset>
<!--
Note, the space below is actually \u00A0 no-break space
-->
<tns:fiveBitDFI1661DUI001><![CDATA[
ABCDEFGHIJKLMNOPQRSTUVWXYZ�����]]></tns:fiveBitDFI1661DUI001>
</tdml:dfdlInfoset>
</tdml:infoset>
</tdml:parserTestCase>
```
This fails, with the actual result of the parse being:
```
<ex:fiveBitDFI1661DUI001
xmlns:ex="http://example.com">�����ZYXWVUTSRQPONMLKJIHGFEDCBA </ex:fiveBitDFI1661DUI001>
```
Note that the string has the characters in reverse order.
Given my experience in the area, I assume the problem is with my
understanding of LSBF bit ordering.
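
One possible explanation, sketched below as a small model rather than anything taken from Daffodil itself: if `byteOrder="RTL"` together with `bitOrder="LSBFirst"` means that the rightmost written bit is the first bit of the data stream, then the last 5-bit group in the document part is the first code unit parsed. The object and method names here are hypothetical, and the RTL reading is an assumption about TDML, not something confirmed by the diff.

```scala
object RtlLsbfSketch {
  // Decode table copied from the 5-bit charset in the diff:
  // code 0 is U+00A0, codes 1..26 are A..Z, codes 27..31 are U+FFFD.
  val decodeString: String = "\u00A0ABCDEFGHIJKLMNOPQRSTUVWXYZ" + ("\uFFFD" * 5)

  // The document part as written in the test: 00000, 00001, ..., 11111.
  val writtenGroups: List[String] =
    (0 until 32).map(i => ("00000" + i.toBinaryString).takeRight(5)).toList

  // ASSUMPTION: with LSBFirst + RTL, the rightmost written bit is the first
  // bit in the stream, and the first bit of a code unit is its least
  // significant bit. Reversing the text gives stream order; reversing each
  // chunk again puts the most significant bit on the left for parseInt.
  def codeUnitsInStreamOrder(writtenBits: String, width: Int): List[Int] =
    writtenBits.reverse
      .grouped(width)
      .map(chunk => Integer.parseInt(chunk.reverse, 2))
      .toList

  def decode(codes: Seq[Int]): String =
    codes.map(c => decodeString.charAt(c)).mkString

  def main(args: Array[String]): Unit = {
    // Groups written in ascending order decode to the table reversed.
    val asWritten = decode(codeUnitsInStreamOrder(writtenGroups.mkString, 5))
    println(asWritten == decodeString.reverse) // true

    // Writing the groups in descending order yields the table in order.
    val flipped = decode(codeUnitsInStreamOrder(writtenGroups.reverse.mkString, 5))
    println(flipped == decodeString) // true
  }
}
```

Under this model the reversed string is what the test data asks for, so either the groups in the document part would need to be listed from 11111 down to 00000, or the expected infoset would need to be reversed.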
As a secondary concern, Daffodil is outputting the non-breaking space as a literal U+00A0 character.
This is technically correct, but it is not clear if this is desirable or not.
Also, I would expect this test to fail to round-trip because several
"undefined" code points are all mapped to U+FFFD. Once I figure out parse-only
for this one, I will add a second test that round-trips without those characters.
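
A minimal sketch of why the full table cannot round-trip, assuming a hypothetical encoder that maps each character back to the lowest code that decodes to it. Which code Daffodil's encoder actually picks for U+FFFD is not shown in the diff; the argument only needs it to pick a single one. The names below are illustrative.

```scala
object RoundTripSketch {
  // Decode table copied from the 5-bit charset in the diff.
  val decodeString: String = "\u00A0ABCDEFGHIJKLMNOPQRSTUVWXYZ" + ("\uFFFD" * 5)

  def decode(codes: Seq[Int]): String =
    codes.map(c => decodeString.charAt(c)).mkString

  // ASSUMPTION: the encoder picks the lowest code for each character.
  // Characters outside the table would give -1; the sketch never passes any.
  def encode(s: String): List[Int] =
    s.map(ch => decodeString.indexOf(ch)).toList

  def roundTrips(codes: List[Int]): Boolean =
    encode(decode(codes)) == codes

  def main(args: Array[String]): Unit = {
    println(roundTrips((0 until 32).toList)) // false
    println(roundTrips((0 to 26).toList))    // true
    // Codes 27..31 all decode to U+FFFD, so they collapse to one code.
    println(encode(decode(List(27, 28, 29, 30, 31))))
  }
}
```

Restricting the second test to codes 0 through 26 avoids the collapse, since each of those codes decodes to a distinct character.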