Re: RFR: 8170769 Provide a simple hexdump facility for binary data

Vincent Ryan Wed, 12 Dec 2018 10:24:47 -0800

FYI I’ve updated the webrev/javadoc with latest edits including the 2 new 
ByteBuffer methods:


http://cr.openjdk.java.net/~vinnie/8170769/webrev.09/ 
<http://cr.openjdk.java.net/~vinnie/8170769/webrev.09/>
http://cr.openjdk.java.net/~vinnie/8170769/javadoc.09/api/java.base/java/util/HexFormat.html
 
<http://cr.openjdk.java.net/~vinnie/8170769/javadoc.09/api/java.base/java/util/HexFormat.html>
(NOTE: internal link to HexFormat.Formatter class is currently broken)


If that is an acceptable compromise then I believe all outstanding issues have 
now been addressed.
Thanks to all those who provided review comments.


> On 12 Dec 2018, at 12:32, Vincent Ryan <vincent.x.r...@oracle.com> wrote:
> 
> Thanks for your proposal.
> Comments below.
> 
>> On 12 Dec 2018, at 02:35, Stuart Marks <stuart.ma...@oracle.com> wrote:
>> 
>> 
>> 
>> On 12/11/18 1:21 PM, Vincent Ryan wrote:
>>> My preference is for PrintStream rather than Writer, for the same reason as 
>>> Roger: it’s more convenient
>>> for handling System.out. Does that address your concern?
>> 
>> PrintStream is fine with me.
>> 
>>> I cannot simply cast 8859-1 characters into UTF-8 because UTF-8 is 
>>> multi-byte charset so some 0x8X characters
>>> Will trigger the multi-byte sequence and will end up being misinterpreted. 
>>> Hence my rather awkward conversion to a String.
>>> Is there a better way?
>> 
>> In toPrintableString(),
>> 
>> 259         StringBuilder printable = new StringBuilder(toIndex - fromIndex);
>> 260         for (int i = fromIndex; i < toIndex; i++) {
>> 261             if (bytes[i] > 0x1F && bytes[i] < 0x7F) {
>> 262                 printable.append((char) bytes[i]);
>> 263             } else if (bytes[i] > (byte)0x9F && bytes[i] <= (byte)0xFF) {
>> 264                 printable.append(new String(new byte[]{bytes[i]}, 
>> ISO_8859_1));
>> 265
>> 266             } else {
>> 267                 printable.append('.');
>> 268             }
>> 269         }
>> 
>> It works to cast ASCII bytes char, because the 7-bit ASCII range overlaps 
>> the low 7 bits of the UTF-16 char range. The bytes values of ISO 8859-1 
>> overlap the low 8 bits of UTF-16, so casts work for them too.
>> 
>> For any other charset, you'd need to do codeset conversion. But you're 
>> cleverly supporting only ISO 8859-1, so you don't have to do any conversion. 
>> :-)
>> 
>>> I’m not sure I’ve addressed your concern regarding IOExceptions - can you 
>>> elaborate?
>> 
>> Taking out the OutputStream overloads addressed my concerns. In at least one 
>> case the code would wrap the OutputStream into a PrintStream, print stuff to 
>> it, and then throw away the PrintStream. If an output error occurred, any 
>> error state in the PrintStream would also be thrown away. The creation of 
>> the PrintStream wrapper would also use the system's default charset instead 
>> of letting the caller control it.
>> 
>> The dump() overloads now all take PrintStream, so it's the caller's 
>> responsibility to ensure that the PrintStream is using the right charset and 
>> to check for errors after. So this is all OK now.
>> 
>> Note that the internal getPrintStream(), to wrap an OutputStream in a 
>> PrintStream, is now obsolete and can be removed.
>> 
>> (Oh, I see Roger has said much the same things. Oh well, the peril of 
>> parallel reviews.)
>> 
>> **
>> 
>>> BTW updated webrev/javadoc available:
>>> http://cr.openjdk.java.net/~vinnie/8170769/webrev.08/
>>> http://cr.openjdk.java.net/~vinnie/8170769/javadoc.08/api/java.base/java/util/HexFormat.html
>> 
>> Now we have a somewhat unsatisfying asymmetry in the APIs.
>> 
>> There are four kinds of inputs:
>> 
>> 1. byte[]
>> 2. byte[] subrange
>> 3. InputStream
>> 4. ByteBuffer
>> 
>> and two kinds of outputs:
>> 
>> 1. PrintStream
>> 2. Stream<String>
>> 
>> and two variations of formatters:
>> 
>> 1. default formatter
>> 2. custom formatter + chunk size
>> 
>> This is a total of 16 combinations. But there are only eight methods: three 
>> PrintStream methods with choice of input, two stream-output methods using 
>> the default formatter, and three stream-output methods using custom 
>> chunk+formatter.
>> 
>> You don't have to provide ALL combinations, but what's here is an odd subset 
>> with some apparently arbitrary choices. For example, if I have a ByteBuffer 
>> and I want to dump it to System.out using default formatting, I have to go 
>> the Stream.forEachOrdered route AND provide the default chunk size and 
>> formatter.
>> 
>>   HexFormat.dumpAsStream(buf, DEFAULT_CHUNK_SIZE, HEXDUMP_FORMATTER)
>>            .forEachOrdered(System.out::println);
>> 
>> These aren't huge deals, but they're easily stumbled over.
>> 
>> One approach to organizing this is to have a HexFormat instance that 
>> contains the setup information and then to have instance methods that either 
>> update the setup or perform conversion and output. I'd use static factory 
>> methods instead of constructors. For example, you could do this:
>> 
>> static factories methods:
>> - from(byte[])
>> - from(byte[], fromIndex, toIndex)
>> - from(InputStream)
>> - from(ByteBuffer)
>> 
>> formatter setup instance methods:
>> - format(chunksize, formatter)
>> 
>> output:
>> - void dump(PrintStream)
>> - Stream<String> stream()
>> 
>> Using this approach my example from above could be performed as follows:
>> 
>>  HexFormat.from(buf).dump(System.out);
>> 
>> Note, I'm not saying that you HAVE to do it this way. (In particular, the 
>> naming could use work.) This is quite possibly overkill. But it's something 
>> you might consider, as it gets you all 16 combinations using seven methods, 
>> compared to the eight static methods in the current proposal that cover only 
>> half of the combinations.
>> 
>> Alternatively, pare down the set of static methods to a bare minimum. 
>> Provide one that can do everything, and then provide one or two more that 
>> are essentially the same as the first but with some hardwired defaults. For 
>> example, to help minimize things, you can wrap a ByteBuffer around a byte 
>> array subrange, or get an InputStream from a byte array subrange. But you 
>> can't get an InputStream from a ByteBuffer or vice-versa, without a lot of 
>> work.
>> 
>> (I haven't looked at the to* or from* methods.)
>> 
>> s’marks
> 
> Your chaining approach has merit and the method chaining is attractive but it 
> would be a significant change to the API at this advanced stage.
> 
> I agree that there are gaps in the combinations and perhaps that could be 
> addressed by introducing a few convenience methods.
> I think ByteBuffer is under-represented and propose adding the following two 
> methods to handle the simplified, default cases:
> 
>    public static Stream<String> dumpAsStream(ByteBuffer buffer)
> 
>    public static void dump(ByteBuffer buffer, PrintStream out)
> 
> 
> That would be 10 combos for 10 methods versus 16 combos for 7 methods which 
> is still not full coverage but perhaps more methods could
> be added in the future if there is demand for them.
> 
> 
> 
>

Re: RFR: 8170769 Provide a simple hexdump facility for binary data

Reply via email to