olabusayoT commented on a change in pull request #254: Adds hex/utf-8 data dump
on left over data
URL: https://github.com/apache/incubator-daffodil/pull/254#discussion_r298567599
##########
File path: daffodil-io/src/main/scala/org/apache/daffodil/io/Dump.scala
##########
@@ -598,13 +599,13 @@ class DataDumper {
val endByteAddress0b = math.max(startByteAddress0b + lengthInBytes - 1, 0)
// val cs = optEncodingName.map { Charset.forName(_) }
- val decoder = getReplacingDecoder(optEncodingName)
+ val decoder = getReportingDecoder(optEncodingName)
var i = startByteAddress0b
val sb = new StringBuilder
while (i <= endByteAddress0b) {
- val (cR, _, _) = convertToCharRepr(i - startByteAddress0b, endByteAddress0b, byteSource, decoder)
- sb += cR(0)
- i += 1
+ val (cR, nBytesConsumed, _) = convertToCharRepr(i - startByteAddress0b, endByteAddress0b, byteSource, decoder)
Review comment:
It looks like we do have code that checks for non-byte-aligned encodings,
and in those cases we fall back to per-byte decoding using windows-1252.
Depending on where this is called from, we get either text and hex (e.g.,
trace output) or text only (e.g., the leftover data dump).
It doesn't look like we have tests for decoding non-8-bit charset encodings.
This change was intended to add multibyte sequence support to the textOnly
dump, which the original code lacked.
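A minimal Scala sketch of the text-only dump loop described above, assuming plain `java.nio` charsets. A reporting decoder is asked for one character at a time, and the loop advances by however many bytes that character consumed, so a multibyte UTF-8 sequence comes out as one character rather than one character per byte. `TextOnlyDumpSketch`, `decodeOne`, `textOnlyDump` and the `byteAligned` flag are hypothetical names for illustration, not Daffodil's `DataDumper`/`convertToCharRepr` API; the windows-1252 fallback only mirrors the behaviour the comment describes.

```scala
import java.nio.{ByteBuffer, CharBuffer}
import java.nio.charset.{Charset, CharsetDecoder, CodingErrorAction}

// Hypothetical names throughout; this is a sketch, not Daffodil's DataDumper API.
object TextOnlyDumpSketch {

  // A decoder that reports bad input instead of silently replacing it.
  def reportingDecoder(charsetName: String): CharsetDecoder =
    Charset.forName(charsetName)
      .newDecoder()
      .onMalformedInput(CodingErrorAction.REPORT)
      .onUnmappableCharacter(CodingErrorAction.REPORT)

  // Decodes one character starting at byte offset `pos`.
  // Returns the character (two chars for a surrogate pair) and the number of
  // bytes it consumed. Malformed or unmappable input yields U+FFFD and 1 byte.
  def decodeOne(bytes: Array[Byte], pos: Int, decoder: CharsetDecoder): (String, Int) = {
    def attempt(capacity: Int): (String, Int) = {
      decoder.reset()
      val in = ByteBuffer.wrap(bytes, pos, bytes.length - pos)
      val out = CharBuffer.allocate(capacity) // room for exactly one character
      val result = decoder.decode(in, out, true)
      val consumed = in.position() - pos
      if (consumed > 0) {
        // One character was decoded; any error after it is left for the next call.
        out.flip()
        (out.toString, consumed)
      } else if (!result.isError && capacity == 1) {
        // Overflow with no progress: the next character needs a surrogate pair.
        attempt(2)
      } else {
        // Bad input at `pos`: substitute and skip one byte so decoding can resync.
        ("\uFFFD", 1)
      }
    }
    attempt(1)
  }

  // Text-only dump: step through the bytes by however many each character
  // consumed. Non-byte-aligned encodings fall back to per-byte windows-1252.
  def textOnlyDump(bytes: Array[Byte], charsetName: String,
                   byteAligned: Boolean = true): String = {
    val decoder = reportingDecoder(if (byteAligned) charsetName else "windows-1252")
    val sb = new StringBuilder
    var i = 0
    while (i < bytes.length) {
      val (text, nBytesConsumed) = decodeOne(bytes, i, decoder)
      sb.append(text)
      i += nBytesConsumed
    }
    sb.toString
  }
}
```

In this sketch, using `CodingErrorAction.REPORT` keeps error handling in the loop itself: the loop chooses to emit U+FFFD and skip a single byte, instead of the decoder silently substituting and deciding how much input to skip. That is one plausible reading of why the diff swaps `getReplacingDecoder` for `getReportingDecoder`, not a description of the PR's actual implementation.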
----------------------------------------------------------------
This is an automated message from the Apache Git Service.