Hi Stuart,
The APIs for streams of characters bifurcated a bit between PrintStream
and Writers.
Many common use cases would like to direct the output to System.out/err
which are
PrintStreams. Hence, I lean toward PrintStream that can be used directly.
$.02, Roger
On 12/10/2018 09:11 PM, Stuart Marks wrote:
On 12/7/18 10:22 AM, Vincent Ryan wrote:
I'm not convinced that the overloads that send output to an
OutputStream pull their weight. They basically wrap the OutputStream
in a PrintStream, which conveniently doesn't declare IOException,
making it easy to use from a lambda passed to forEachOrdered(). If
an error writing the output occurs, this is recorded by the
PrintStream wrapper; however, the wrapper is then thrown away,
making it impossible for the caller to check its error status.
The intent is to support a trivial convenience method call that
generates the well-known hexdump format.
Especially for users that are interested in the hexdump data rather
than the low-level details of how to terminate a stream.
The dumpAsStream methods are available to support cases that differ
from that format.
Have you a suggestion to improve the dump() methods, or you’d like to
see them omitted?
The PrintStream wrapper also uses the platform default charset, and
doesn't provide any way for the caller to override the charset.
Is there a need for that? Originally the requirement was driven by
the hexdump format which is ASCII-only.
Recently the class has been enhanced to also support the printable
characters from ISO 8859-1.
A custom formatter be supplied to dumpAsStream() to cater for all
other cases?
OK, let's step back from this a bit. I see this hexdump as a little
subsystem that has the following facets:
1) a source of bytes
2) a converter to hex
3) a destination
The converter is HexDump.Formatter, which converts and formats a
subrange of byte[] to a String. Since the user can supply the
Formatter function, the result String can contain any unicode
character. Thus, the destination needs to handle any unicode
character. It can be a Writer, which accepts String data. Or if you
want it to write bytes, it can be an OutputStream, which raises the
issue of encoding (charset). I would recommend against relying on the
platform default charset, as this has been a source of subtle bugs.
The preferred approach these days is to default to UTF-8 and provide
an overload that takes an explicit charset.
An alternative is PrintStream. (This overlaps somewhat with your
recent exchange with Roger on this topic.) PrintStream also does
charset encoding, and the charset it uses depends on how it's created.
I think the same approach should be applied as I described above with
OutputStream, namely avoid the platform default charset; default to
UTF-8; and provide an overload that takes an explicit charset.
I'm not sure which of these is the right thing. You should decide
which is the most convenient for the use cases you expect to see.
However, the solution needs to handle charset encoding. (And it should
also properly deal with I/O exceptions, per my previous message.)
**
ISO 8859-1 comes up in a different place. The toPrintableString()
method (used by the default formatter) considers a byte "printable" if
it encodes a valid ISO 8859-1 character. The byte is properly decoded
to a String, so this is ok. Note this is a distinct issue from the
encoding of the OutputStream or PrintStream as described above.
(As an aside I think that the encoding of ISO 8859-1 matches the
corresponding code units of UTF-16, so you don't have to do the new
String(..., ISO_8859_1) jazz. You can just cast the byte to a char and
append it to the StringBuilder.)
s'marks