Re: RFR: 8170769 Provide a simple hexdump facility for binary data

Vincent Ryan Tue, 11 Dec 2018 13:22:45 -0800

Thanks for your review Stuart.

My preference is for PrintStream rather than Writer, for the same reason as 
Roger: it’s more convenient
for handling System.out. Does that address your concern?


I cannot simply cast 8859-1 characters into UTF-8 because UTF-8 is multi-byte 
charset so some 0x8X characters
Will trigger the multi-byte sequence and will end up being misinterpreted. 
Hence my rather awkward conversion to a String.
Is there a better way?

I’m not sure I’ve addressed your concern regarding IOExceptions - can you 
elaborate?


BTW updated webrev/javadoc available:
http://cr.openjdk.java.net/~vinnie/8170769/webrev.08/ 
<http://cr.openjdk.java.net/~vinnie/8170769/webrev.08/>
http://cr.openjdk.java.net/~vinnie/8170769/javadoc.08/api/java.base/java/util/HexFormat.html
 
<http://cr.openjdk.java.net/~vinnie/8170769/javadoc.08/api/java.base/java/util/HexFormat.html>



> On 11 Dec 2018, at 02:11, Stuart Marks <stuart.ma...@oracle.com> wrote:
> 
> On 12/7/18 10:22 AM, Vincent Ryan wrote:
>>> I'm not convinced that the overloads that send output to an OutputStream 
>>> pull their weight. They basically wrap the OutputStream in a PrintStream, 
>>> which conveniently doesn't declare IOException, making it easy to use from 
>>> a lambda passed to forEachOrdered(). If an error writing the output occurs, 
>>> this is recorded by the PrintStream wrapper; however, the wrapper is then 
>>> thrown away, making it impossible for the caller to check its error status.
>> The intent is to support a trivial convenience method call that generates 
>> the well-known hexdump format.
>> Especially for users that are interested in the hexdump data rather than the 
>> low-level details of how to terminate a stream.
>> The dumpAsStream methods are available to support cases that differ from 
>> that format.
>> Have you a suggestion to improve the dump() methods, or you’d like to see 
>> them omitted?
>>> The PrintStream wrapper also uses the platform default charset, and doesn't 
>>> provide any way for the caller to override the charset.
>> Is there a need for that? Originally the requirement was driven by the 
>> hexdump format which is ASCII-only.
>> Recently the class has been enhanced to also support the printable 
>> characters from ISO 8859-1.
>> A custom formatter be supplied to dumpAsStream() to cater for all other 
>> cases?
> 
> OK, let's step back from this a bit. I see this hexdump as a little subsystem 
> that has the following facets:
> 
> 1) a source of bytes
> 2) a converter to hex
> 3) a destination
> 
> The converter is HexDump.Formatter, which converts and formats a subrange of 
> byte[] to a String. Since the user can supply the Formatter function, the 
> result String can contain any unicode character. Thus, the destination needs 
> to handle any unicode character. It can be a Writer, which accepts String 
> data. Or if you want it to write bytes, it can be an OutputStream, which 
> raises the issue of encoding (charset). I would recommend against relying on 
> the platform default charset, as this has been a source of subtle bugs. The 
> preferred approach these days is to default to UTF-8 and provide an overload 
> that takes an explicit charset.
> 
> An alternative is PrintStream. (This overlaps somewhat with your recent 
> exchange with Roger on this topic.) PrintStream also does charset encoding, 
> and the charset it uses depends on how it's created. I think the same 
> approach should be applied as I described above with OutputStream, namely 
> avoid the platform default charset; default to UTF-8; and provide an overload 
> that takes an explicit charset.
> 
> I'm not sure which of these is the right thing. You should decide which is 
> the most convenient for the use cases you expect to see. However, the 
> solution needs to handle charset encoding. (And it should also properly deal 
> with I/O exceptions, per my previous message.)
> 
> **
> 
> ISO 8859-1 comes up in a different place. The toPrintableString() method 
> (used by the default formatter) considers a byte "printable" if it encodes a 
> valid ISO 8859-1 character. The byte is properly decoded to a String, so this 
> is ok. Note this is a distinct issue from the encoding of the OutputStream or 
> PrintStream as described above.
> 
> (As an aside I think that the encoding of ISO 8859-1 matches the 
> corresponding code units of UTF-16, so you don't have to do the new 
> String(..., ISO_8859_1) jazz. You can just cast the byte to a char and append 
> it to the StringBuilder.)
> 
> s'marks

Re: RFR: 8170769 Provide a simple hexdump facility for binary data

Reply via email to