Thanks for your review Stuart. My preference is for PrintStream rather than Writer, for the same reason as Roger: it’s more convenient for handling System.out. Does that address your concern?
I cannot simply cast 8859-1 characters into UTF-8 because UTF-8 is multi-byte charset so some 0x8X characters Will trigger the multi-byte sequence and will end up being misinterpreted. Hence my rather awkward conversion to a String. Is there a better way? I’m not sure I’ve addressed your concern regarding IOExceptions - can you elaborate? BTW updated webrev/javadoc available: http://cr.openjdk.java.net/~vinnie/8170769/webrev.08/ <http://cr.openjdk.java.net/~vinnie/8170769/webrev.08/> http://cr.openjdk.java.net/~vinnie/8170769/javadoc.08/api/java.base/java/util/HexFormat.html <http://cr.openjdk.java.net/~vinnie/8170769/javadoc.08/api/java.base/java/util/HexFormat.html> > On 11 Dec 2018, at 02:11, Stuart Marks <stuart.ma...@oracle.com> wrote: > > On 12/7/18 10:22 AM, Vincent Ryan wrote: >>> I'm not convinced that the overloads that send output to an OutputStream >>> pull their weight. They basically wrap the OutputStream in a PrintStream, >>> which conveniently doesn't declare IOException, making it easy to use from >>> a lambda passed to forEachOrdered(). If an error writing the output occurs, >>> this is recorded by the PrintStream wrapper; however, the wrapper is then >>> thrown away, making it impossible for the caller to check its error status. >> The intent is to support a trivial convenience method call that generates >> the well-known hexdump format. >> Especially for users that are interested in the hexdump data rather than the >> low-level details of how to terminate a stream. >> The dumpAsStream methods are available to support cases that differ from >> that format. >> Have you a suggestion to improve the dump() methods, or you’d like to see >> them omitted? >>> The PrintStream wrapper also uses the platform default charset, and doesn't >>> provide any way for the caller to override the charset. >> Is there a need for that? Originally the requirement was driven by the >> hexdump format which is ASCII-only. >> Recently the class has been enhanced to also support the printable >> characters from ISO 8859-1. >> A custom formatter be supplied to dumpAsStream() to cater for all other >> cases? > > OK, let's step back from this a bit. I see this hexdump as a little subsystem > that has the following facets: > > 1) a source of bytes > 2) a converter to hex > 3) a destination > > The converter is HexDump.Formatter, which converts and formats a subrange of > byte[] to a String. Since the user can supply the Formatter function, the > result String can contain any unicode character. Thus, the destination needs > to handle any unicode character. It can be a Writer, which accepts String > data. Or if you want it to write bytes, it can be an OutputStream, which > raises the issue of encoding (charset). I would recommend against relying on > the platform default charset, as this has been a source of subtle bugs. The > preferred approach these days is to default to UTF-8 and provide an overload > that takes an explicit charset. > > An alternative is PrintStream. (This overlaps somewhat with your recent > exchange with Roger on this topic.) PrintStream also does charset encoding, > and the charset it uses depends on how it's created. I think the same > approach should be applied as I described above with OutputStream, namely > avoid the platform default charset; default to UTF-8; and provide an overload > that takes an explicit charset. > > I'm not sure which of these is the right thing. You should decide which is > the most convenient for the use cases you expect to see. However, the > solution needs to handle charset encoding. (And it should also properly deal > with I/O exceptions, per my previous message.) > > ** > > ISO 8859-1 comes up in a different place. The toPrintableString() method > (used by the default formatter) considers a byte "printable" if it encodes a > valid ISO 8859-1 character. The byte is properly decoded to a String, so this > is ok. Note this is a distinct issue from the encoding of the OutputStream or > PrintStream as described above. > > (As an aside I think that the encoding of ISO 8859-1 matches the > corresponding code units of UTF-16, so you don't have to do the new > String(..., ISO_8859_1) jazz. You can just cast the byte to a char and append > it to the StringBuilder.) > > s'marks