mbeckerle commented on a change in pull request #254: Adds hex/utf-8 data dump
on left over data
URL: https://github.com/apache/incubator-daffodil/pull/254#discussion_r298380256
##########
File path: daffodil-io/src/main/scala/org/apache/daffodil/io/Dump.scala
##########
@@ -598,13 +599,13 @@ class DataDumper {
val endByteAddress0b = math.max(startByteAddress0b + lengthInBytes - 1, 0)
// val cs = optEncodingName.map { Charset.forName(_) }
- val decoder = getReplacingDecoder(optEncodingName)
+ val decoder = getReportingDecoder(optEncodingName)
var i = startByteAddress0b
val sb = new StringBuilder
while (i <= endByteAddress0b) {
- val (cR, _, _) = convertToCharRepr(i - startByteAddress0b,
endByteAddress0b, byteSource, decoder)
- sb += cR(0)
- i += 1
+ val (cR, nBytesConsumed, _) = convertToCharRepr(i - startByteAddress0b,
endByteAddress0b, byteSource, decoder)
Review comment:
I wonder what happens in this code if we have text large enough to warrant a
debug dump, but it is in one of our non-8-bit charset encodings?
Do we have tests of this, or alternatively, does this code exclude that
possibility somewhere? (I imagine the latter is more likely. A dump that shows
both text and hex makes little sense if the characters do not occupy
whole bytes.)
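
To make the whole-byte assumption concrete, here is a minimal sketch of
decoding one character at a time with a reporting decoder and counting the
bytes it consumed, which is the role nBytesConsumed plays in the hunk above.
It uses only the JDK's java.nio.charset API. The names DecodeOneChar and
decodeOne are hypothetical and are not Daffodil's convertToCharRepr or
getReportingDecoder. It handles only byte-aligned JDK charsets and only
characters that fit in a single Char.

```scala
import java.nio.{ByteBuffer, CharBuffer}
import java.nio.charset.{Charset, CodingErrorAction}

// Hypothetical helper for illustration; not part of Daffodil.
object DecodeOneChar {

  /** Tries to decode one character starting at the buffer's current position.
    *
    * Returns Some((char, bytesConsumed)) on success and advances the buffer.
    * Returns None if no character could be produced (malformed, unmappable,
    * or exhausted input, or a character that needs a surrogate pair).
    */
  def decodeOne(bytes: ByteBuffer, charset: Charset): Option[(Char, Int)] = {
    // REPORT makes the decoder signal errors instead of substituting U+FFFD.
    val decoder = charset.newDecoder()
      .onMalformedInput(CodingErrorAction.REPORT)
      .onUnmappableCharacter(CodingErrorAction.REPORT)

    // Room for exactly one Char, so the decoder stops after one character.
    val out = CharBuffer.allocate(1)
    val start = bytes.position()
    decoder.decode(bytes, out, true)

    out.flip()
    if (out.hasRemaining) Some((out.get(), bytes.position() - start))
    else None
  }
}
```

The consumed count is always a whole number of bytes here, because a
ByteBuffer position cannot point into the middle of a byte. An encoding whose
characters are not a multiple of 8 bits wide would need a bit-level position
instead, so this sketch says nothing about how the dump should treat that case.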