Re: RFR: 8170769 Provide a simple hexdump facility for binary data

Roger Riggs Thu, 20 Dec 2018 13:44:46 -0800

Hi Vincent,

This looks good.


The doc for HEXDUMP_FORMATTER should include an example of the formatting.
Its easier to see it than to reconstruct it from the example code. like:

00000000 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f 70|abcdefghijklmnop| 00000010 71 72 73 74 75 76 77 78 79 7a |qrstuvwxyz|


I'm not sure the reference to hexdump(1) is useful, it could be removed
without loss of precision or usefulness.

Thanks, Roger


On 12/12/2018 01:21 PM, Vincent Ryan wrote:

FYI I’ve updated the webrev/javadoc with latest edits including the 2 new 
ByteBuffer methods:

http://cr.openjdk.java.net/~vinnie/8170769/webrev.09/ 
<http://cr.openjdk.java.net/~vinnie/8170769/webrev.09/>
http://cr.openjdk.java.net/~vinnie/8170769/javadoc.09/api/java.base/java/util/HexFormat.html
 
<http://cr.openjdk.java.net/~vinnie/8170769/javadoc.09/api/java.base/java/util/HexFormat.html>
(NOTE: internal link to HexFormat.Formatter class is currently broken)


If that is an acceptable compromise then I believe all outstanding issues have 
now been addressed.
Thanks to all those who provided review comments.

On 12 Dec 2018, at 12:32, Vincent Ryan <vincent.x.r...@oracle.com> wrote:

Thanks for your proposal.
Comments below.

On 12 Dec 2018, at 02:35, Stuart Marks <stuart.ma...@oracle.com> wrote:



On 12/11/18 1:21 PM, Vincent Ryan wrote:

My preference is for PrintStream rather than Writer, for the same reason as 
Roger: it’s more convenient
for handling System.out. Does that address your concern?

PrintStream is fine with me.

I cannot simply cast 8859-1 characters into UTF-8 because UTF-8 is multi-byte 
charset so some 0x8X characters
Will trigger the multi-byte sequence and will end up being misinterpreted. 
Hence my rather awkward conversion to a String.
Is there a better way?

In toPrintableString(),

259         StringBuilder printable = new StringBuilder(toIndex - fromIndex);
260         for (int i = fromIndex; i < toIndex; i++) {
261             if (bytes[i] > 0x1F && bytes[i] < 0x7F) {
262                 printable.append((char) bytes[i]);
263             } else if (bytes[i] > (byte)0x9F && bytes[i] <= (byte)0xFF) {
264                 printable.append(new String(new byte[]{bytes[i]}, 
ISO_8859_1));
265
266             } else {
267                 printable.append('.');
268             }
269         }

It works to cast ASCII bytes char, because the 7-bit ASCII range overlaps the 
low 7 bits of the UTF-16 char range. The bytes values of ISO 8859-1 overlap the 
low 8 bits of UTF-16, so casts work for them too.

For any other charset, you'd need to do codeset conversion. But you're cleverly 
supporting only ISO 8859-1, so you don't have to do any conversion. :-)

I’m not sure I’ve addressed your concern regarding IOExceptions - can you 
elaborate?

Taking out the OutputStream overloads addressed my concerns. In at least one 
case the code would wrap the OutputStream into a PrintStream, print stuff to 
it, and then throw away the PrintStream. If an output error occurred, any error 
state in the PrintStream would also be thrown away. The creation of the 
PrintStream wrapper would also use the system's default charset instead of 
letting the caller control it.

The dump() overloads now all take PrintStream, so it's the caller's 
responsibility to ensure that the PrintStream is using the right charset and to 
check for errors after. So this is all OK now.

Note that the internal getPrintStream(), to wrap an OutputStream in a 
PrintStream, is now obsolete and can be removed.

(Oh, I see Roger has said much the same things. Oh well, the peril of parallel 
reviews.)

**

BTW updated webrev/javadoc available:
http://cr.openjdk.java.net/~vinnie/8170769/webrev.08/
http://cr.openjdk.java.net/~vinnie/8170769/javadoc.08/api/java.base/java/util/HexFormat.html

Now we have a somewhat unsatisfying asymmetry in the APIs.

There are four kinds of inputs:

1. byte[]
2. byte[] subrange
3. InputStream
4. ByteBuffer

and two kinds of outputs:

1. PrintStream
2. Stream<String>

and two variations of formatters:

1. default formatter
2. custom formatter + chunk size

This is a total of 16 combinations. But there are only eight methods: three 
PrintStream methods with choice of input, two stream-output methods using the 
default formatter, and three stream-output methods using custom chunk+formatter.

You don't have to provide ALL combinations, but what's here is an odd subset 
with some apparently arbitrary choices. For example, if I have a ByteBuffer and 
I want to dump it to System.out using default formatting, I have to go the 
Stream.forEachOrdered route AND provide the default chunk size and formatter.

   HexFormat.dumpAsStream(buf, DEFAULT_CHUNK_SIZE, HEXDUMP_FORMATTER)
            .forEachOrdered(System.out::println);

These aren't huge deals, but they're easily stumbled over.

One approach to organizing this is to have a HexFormat instance that contains 
the setup information and then to have instance methods that either update the 
setup or perform conversion and output. I'd use static factory methods instead 
of constructors. For example, you could do this:

static factories methods:
- from(byte[])
- from(byte[], fromIndex, toIndex)
- from(InputStream)
- from(ByteBuffer)

formatter setup instance methods:
- format(chunksize, formatter)

output:
- void dump(PrintStream)
- Stream<String> stream()

Using this approach my example from above could be performed as follows:

  HexFormat.from(buf).dump(System.out);

Note, I'm not saying that you HAVE to do it this way. (In particular, the 
naming could use work.) This is quite possibly overkill. But it's something you 
might consider, as it gets you all 16 combinations using seven methods, 
compared to the eight static methods in the current proposal that cover only 
half of the combinations.

Alternatively, pare down the set of static methods to a bare minimum. Provide 
one that can do everything, and then provide one or two more that are 
essentially the same as the first but with some hardwired defaults. For 
example, to help minimize things, you can wrap a ByteBuffer around a byte array 
subrange, or get an InputStream from a byte array subrange. But you can't get 
an InputStream from a ByteBuffer or vice-versa, without a lot of work.

(I haven't looked at the to* or from* methods.)

s’marks

Your chaining approach has merit and the method chaining is attractive but it 
would be a significant change to the API at this advanced stage.

I agree that there are gaps in the combinations and perhaps that could be 
addressed by introducing a few convenience methods.
I think ByteBuffer is under-represented and propose adding the following two 
methods to handle the simplified, default cases:

    public static Stream<String> dumpAsStream(ByteBuffer buffer)

    public static void dump(ByteBuffer buffer, PrintStream out)


That would be 10 combos for 10 methods versus 16 combos for 7 methods which is 
still not full coverage but perhaps more methods could
be added in the future if there is demand for them.

Re: RFR: 8170769 Provide a simple hexdump facility for binary data

Reply via email to