Re: RFR: 8170769 Provide a simple hexdump facility for binary data

Vincent Ryan Wed, 12 Dec 2018 04:35:46 -0800

Thanks for your proposal.
Comments below.

> On 12 Dec 2018, at 02:35, Stuart Marks <stuart.ma...@oracle.com> wrote:
> 
> 
> 
> On 12/11/18 1:21 PM, Vincent Ryan wrote:
>> My preference is for PrintStream rather than Writer, for the same reason as 
>> Roger: it’s more convenient
>> for handling System.out. Does that address your concern?
> 
> PrintStream is fine with me.
> 
>> I cannot simply cast 8859-1 characters into UTF-8 because UTF-8 is 
>> multi-byte charset so some 0x8X characters
>> Will trigger the multi-byte sequence and will end up being misinterpreted. 
>> Hence my rather awkward conversion to a String.
>> Is there a better way?
> 
> In toPrintableString(),
> 
> 259         StringBuilder printable = new StringBuilder(toIndex - fromIndex);
> 260         for (int i = fromIndex; i < toIndex; i++) {
> 261             if (bytes[i] > 0x1F && bytes[i] < 0x7F) {
> 262                 printable.append((char) bytes[i]);
> 263             } else if (bytes[i] > (byte)0x9F && bytes[i] <= (byte)0xFF) {
> 264                 printable.append(new String(new byte[]{bytes[i]}, 
> ISO_8859_1));
> 265
> 266             } else {
> 267                 printable.append('.');
> 268             }
> 269         }
> 
> It works to cast ASCII bytes char, because the 7-bit ASCII range overlaps the 
> low 7 bits of the UTF-16 char range. The bytes values of ISO 8859-1 overlap 
> the low 8 bits of UTF-16, so casts work for them too.
> 
> For any other charset, you'd need to do codeset conversion. But you're 
> cleverly supporting only ISO 8859-1, so you don't have to do any conversion. 
> :-)
> 
>> I’m not sure I’ve addressed your concern regarding IOExceptions - can you 
>> elaborate?
> 
> Taking out the OutputStream overloads addressed my concerns. In at least one 
> case the code would wrap the OutputStream into a PrintStream, print stuff to 
> it, and then throw away the PrintStream. If an output error occurred, any 
> error state in the PrintStream would also be thrown away. The creation of the 
> PrintStream wrapper would also use the system's default charset instead of 
> letting the caller control it.
> 
> The dump() overloads now all take PrintStream, so it's the caller's 
> responsibility to ensure that the PrintStream is using the right charset and 
> to check for errors after. So this is all OK now.
> 
> Note that the internal getPrintStream(), to wrap an OutputStream in a 
> PrintStream, is now obsolete and can be removed.
> 
> (Oh, I see Roger has said much the same things. Oh well, the peril of 
> parallel reviews.)
> 
> **
> 
>> BTW updated webrev/javadoc available:
>> http://cr.openjdk.java.net/~vinnie/8170769/webrev.08/
>> http://cr.openjdk.java.net/~vinnie/8170769/javadoc.08/api/java.base/java/util/HexFormat.html
> 
> Now we have a somewhat unsatisfying asymmetry in the APIs.
> 
> There are four kinds of inputs:
> 
> 1. byte[]
> 2. byte[] subrange
> 3. InputStream
> 4. ByteBuffer
> 
> and two kinds of outputs:
> 
> 1. PrintStream
> 2. Stream<String>
> 
> and two variations of formatters:
> 
> 1. default formatter
> 2. custom formatter + chunk size
> 
> This is a total of 16 combinations. But there are only eight methods: three 
> PrintStream methods with choice of input, two stream-output methods using the 
> default formatter, and three stream-output methods using custom 
> chunk+formatter.
> 
> You don't have to provide ALL combinations, but what's here is an odd subset 
> with some apparently arbitrary choices. For example, if I have a ByteBuffer 
> and I want to dump it to System.out using default formatting, I have to go 
> the Stream.forEachOrdered route AND provide the default chunk size and 
> formatter.
> 
>    HexFormat.dumpAsStream(buf, DEFAULT_CHUNK_SIZE, HEXDUMP_FORMATTER)
>             .forEachOrdered(System.out::println);
> 
> These aren't huge deals, but they're easily stumbled over.
> 
> One approach to organizing this is to have a HexFormat instance that contains 
> the setup information and then to have instance methods that either update 
> the setup or perform conversion and output. I'd use static factory methods 
> instead of constructors. For example, you could do this:
> 
> static factories methods:
> - from(byte[])
> - from(byte[], fromIndex, toIndex)
> - from(InputStream)
> - from(ByteBuffer)
> 
> formatter setup instance methods:
> - format(chunksize, formatter)
> 
> output:
> - void dump(PrintStream)
> - Stream<String> stream()
> 
> Using this approach my example from above could be performed as follows:
> 
>   HexFormat.from(buf).dump(System.out);
> 
> Note, I'm not saying that you HAVE to do it this way. (In particular, the 
> naming could use work.) This is quite possibly overkill. But it's something 
> you might consider, as it gets you all 16 combinations using seven methods, 
> compared to the eight static methods in the current proposal that cover only 
> half of the combinations.
> 
> Alternatively, pare down the set of static methods to a bare minimum. Provide 
> one that can do everything, and then provide one or two more that are 
> essentially the same as the first but with some hardwired defaults. For 
> example, to help minimize things, you can wrap a ByteBuffer around a byte 
> array subrange, or get an InputStream from a byte array subrange. But you 
> can't get an InputStream from a ByteBuffer or vice-versa, without a lot of 
> work.
> 
> (I haven't looked at the to* or from* methods.)
> 
> s’marks


Your chaining approach has merit and the method chaining is attractive but it 
would be a significant change to the API at this advanced stage.

I agree that there are gaps in the combinations and perhaps that could be 
addressed by introducing a few convenience methods.
I think ByteBuffer is under-represented and propose adding the following two 
methods to handle the simplified, default cases:

    public static Stream<String> dumpAsStream(ByteBuffer buffer)

    public static void dump(ByteBuffer buffer, PrintStream out)


That would be 10 combos for 10 methods versus 16 combos for 7 methods which is 
still not full coverage but perhaps more methods could
be added in the future if there is demand for them.

Re: RFR: 8170769 Provide a simple hexdump facility for binary data

Reply via email to