Re: RFR: 8170769 Provide a simple hexdump facility for binary data

Roger Riggs Tue, 11 Dec 2018 08:47:09 -0800

Hi Stuart,

The APIs for streams of characters bifurcated a bit between PrintStream and Writers. Many common use cases would like to direct the output to System.out/err, which are PrintStreams. Hence, I lean toward PrintStream that can be used directly.


$.02, Roger



On 12/10/2018 09:11 PM, Stuart Marks wrote:

On 12/7/18 10:22 AM, Vincent Ryan wrote:
I'm not convinced that the overloads that send output to an OutputStream pull their weight. They basically wrap the OutputStream in a PrintStream, which conveniently doesn't declare IOException, making it easy to use from a lambda passed to forEachOrdered(). If an error writing the output occurs, this is recorded by the PrintStream wrapper; however, the wrapper is then thrown away, making it impossible for the caller to check its error status.
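
To illustrate the error-status point above, here is a minimal sketch. The class name CheckErrorSketch, the FailingStream helper, and the sample output line are made up for illustration and are not part of the proposed API. It shows that a PrintStream swallows the IOException and only records it, so the caller must still hold the wrapper to call checkError().

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;

class CheckErrorSketch {
    /** Hypothetical destination that always fails, standing in for a broken stream. */
    static final class FailingStream extends OutputStream {
        @Override
        public void write(int b) throws IOException {
            throw new IOException("simulated write failure");
        }
    }

    /** Writes one line through a PrintStream wrapper and returns its recorded error flag. */
    static boolean writeAndCheck(OutputStream out) {
        PrintStream ps = new PrintStream(out);
        ps.println("00000000  48 65 6c 6c 6f");  // no IOException surfaces here
        return ps.checkError();                  // flushes, then reports any recorded error
    }

    public static void main(String[] args) {
        System.out.println("error recorded: " + writeAndCheck(new FailingStream()));
    }
}
```

If the wrapper were created inside a library method and discarded, the flag returned by checkError() would be unreachable, which is the concern raised above.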
The intent is to support a trivial convenience method call that generates the well-known hexdump format, especially for users that are interested in the hexdump data rather than the low-level details of how to terminate a stream. The dumpAsStream methods are available to support cases that differ from that format.
Have you a suggestion to improve the dump() methods, or would you like to see them omitted?
The PrintStream wrapper also uses the platform default charset, and doesn't provide any way for the caller to override the charset.
Is there a need for that? Originally the requirement was driven by the hexdump format, which is ASCII-only. Recently the class has been enhanced to also support the printable characters from ISO 8859-1. Could a custom formatter be supplied to dumpAsStream() to cater for all other cases?
OK, let's step back from this a bit. I see this hexdump as a little subsystem that has the following facets:
1) a source of bytes
2) a converter to hex
3) a destination
The converter is HexDump.Formatter, which converts and formats a subrange of byte[] to a String. Since the user can supply the Formatter function, the result String can contain any Unicode character. Thus, the destination needs to handle any Unicode character. It can be a Writer, which accepts String data. Or if you want it to write bytes, it can be an OutputStream, which raises the issue of encoding (charset). I would recommend against relying on the platform default charset, as this has been a source of subtle bugs. The preferred approach these days is to default to UTF-8 and provide an overload that takes an explicit charset.
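
The "default to UTF-8, plus an explicit-charset overload" pattern for an OutputStream destination might look roughly like the sketch below. The class CharsetOverloadSketch, its dump methods, and the sample line are hypothetical and only illustrate the shape of the overloads; they are not the signatures under review. Note that IOException is declared rather than swallowed.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;

class CharsetOverloadSketch {
    /** Convenience overload: encodes with UTF-8 instead of the platform default. */
    static void dump(List<String> lines, OutputStream out) throws IOException {
        dump(lines, out, StandardCharsets.UTF_8);
    }

    /** Explicit-charset overload: the caller chooses the encoding. */
    static void dump(List<String> lines, OutputStream out, Charset cs) throws IOException {
        Writer w = new OutputStreamWriter(out, cs);
        for (String line : lines) {
            w.write(line);
            w.write('\n');
        }
        w.flush();  // flush but do not close: the caller owns the stream
    }

    public static void main(String[] args) throws IOException {
        // Already-formatted lines stand in for the output of a formatter.
        dump(List.of("00000000  48 65 6c 6c 6f  |Hello|"), System.out);
    }
}
```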
An alternative is PrintStream. (This overlaps somewhat with your recent exchange with Roger on this topic.) PrintStream also does charset encoding, and the charset it uses depends on how it's created. I think the same approach should be applied as I described above with OutputStream, namely avoid the platform default charset; default to UTF-8; and provide an overload that takes an explicit charset.
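
For the PrintStream alternative, a rough sketch of the same idea follows. The class PrintStreamCharsetSketch and its wrap methods are hypothetical names; the PrintStream(OutputStream, boolean, Charset) constructor it relies on was added in Java 10, so this assumes that release or later.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class PrintStreamCharsetSketch {
    /** Default: UTF-8 rather than the platform default charset. */
    static PrintStream wrap(OutputStream out) {
        return wrap(out, StandardCharsets.UTF_8);
    }

    /** Overload taking an explicit charset. */
    static PrintStream wrap(OutputStream out, Charset cs) {
        return new PrintStream(out, false, cs);  // constructor available since Java 10
    }

    public static void main(String[] args) {
        PrintStream ps = wrap(System.out);
        ps.println("00000000  48 65 6c 6c 6f  |Hello|");
        // The caller keeps the wrapper, so the error status remains checkable.
        if (ps.checkError()) {
            System.err.println("write failed");
        }
    }
}
```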
I'm not sure which of these is the right thing. You should decide which is the most convenient for the use cases you expect to see. However, the solution needs to handle charset encoding. (And it should also properly deal with I/O exceptions, per my previous message.)
ISO 8859-1 comes up in a different place. The toPrintableString() method (used by the default formatter) considers a byte "printable" if it encodes a valid ISO 8859-1 character. The byte is properly decoded to a String, so this is ok. Note this is a distinct issue from the encoding of the OutputStream or PrintStream as described above.
(As an aside I think that the encoding of ISO 8859-1 matches the corresponding code units of UTF-16, so you don't have to do the new String(..., ISO_8859_1) jazz. You can just cast the byte to a char and append it to the StringBuilder.)
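
A small sketch of that aside, comparing the decoder route with the cast route over all 256 byte values. The class and method names are made up for illustration. One caveat worth stating: Java bytes are signed, so a bare (char) cast of a negative byte sign-extends (for example, (byte) 0xE9 becomes U+FFE9); the sketch therefore masks with 0xff before casting.

```java
import java.nio.charset.StandardCharsets;

class Latin1CastSketch {
    /** Decodes via the charset machinery. */
    static String viaDecoder(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    /** Decodes by widening each byte to the char with the same unsigned value. */
    static String viaCast(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append((char) (b & 0xff));  // mask first: bytes are signed in Java
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        System.out.println("same result: " + viaDecoder(all).equals(viaCast(all)));
    }
}
```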
s'marks
