On Fri, 21 Jul 2023 09:43:43 GMT, Uwe Schindler <uschind...@openjdk.org> wrote:

> 
> So have you thought of making these low-level classes public, so that we 
> outside users no longer need to deal with VarHandles?
> 
I believe this is beyond the scope of this PR.

As for what we do in the JDK, I can see a few options:

1. We keep things as they are in current mainline.
2. We keep changes in this PR.
3. We rewrite most uses of ByteArray in java.io to use ByteBuffer, and remove 
ByteArray.
4. We remove ByteArray and provide some static helper function to generate an 
unsafe offset from an array.
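For illustration, a ByteArray-style static wrapper (in the spirit of options 2-4) 
can be sketched on top of the public VarHandle API, avoiding Unsafe entirely. 
This is a hypothetical sketch - the class and method names below are made up, 
and the real jdk.internal.util.ByteArray is JDK-internal:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical ByteArray-style static wrappers, built on the public
// VarHandle API rather than on Unsafe.
final class ByteArraySketch {
    private static final VarHandle INT = MethodHandles
            .byteArrayViewVarHandle(int[].class, ByteOrder.BIG_ENDIAN);

    // Reads a big-endian int from b at byte offset off.
    static int getInt(byte[] b, int off) {
        return (int) INT.get(b, off);
    }

    // Writes value as a big-endian int into b at byte offset off.
    static void setInt(byte[] b, int off, int value) {
        INT.set(b, off, value);
    }

    private ByteArraySketch() {}
}
```

A byte-array view VarHandle like this is exact about bounds and endianness, so 
callers get safe access without each call site spelling out the VarHandle plumbing.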

I agree with @uschindler that wrapping stuff in ByteBuffer "on the fly" might 
be problematic for code that is not inlined, so I don't think we should do that.

I have to admit that I'm a little unclear as to what the goal of this PR is. 
Initially it started as an "improve startup" effort, which then morphed into a 
"let's make ByteArray more usable" effort, even for other clients (like the 
classfile API, or Long::toString). I'm unsure about the latter use cases, 
because (a) Long/Integer are core classes and should probably use Unsafe 
directly, where needed, and (b) for the classfile API, using ByteBuffer seems a 
good candidate on paper (of course there is the unknown of how well byte buffer 
access will optimize in the classfile API code - but if there's more than one 
access on the same buffer, we should be more than ok).

I'd like to add some more words of caution against the synthetic benchmarks 
that we tried above. These benchmarks are quite peculiar, for at least two 
reasons:

* we only ever access one element
* the accessed offset is always zero
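Concretely, the shape being measured is roughly the following (a hypothetical 
stand-in, not the actual JMH benchmark from this thread):

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for the synthetic benchmark shape: a single
// read, at a statically-known offset of zero. Under these conditions
// C2 can reduce a raw-array or Unsafe baseline to almost nothing, so
// any one-time setup cost in a general API shows up at full weight.
final class SingleAccessShape {
    static int viaBuffer(byte[] payload) {
        return ByteBuffer.wrap(payload).getInt(0); // one element, offset 0
    }
}
```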

No general API can match Unsafe under this set of conditions. When playing with 
the benchmark I realized that every little thing mattered (we're really 
measuring the number of instructions emitted by C2) - for instance, the fact 
that when access occurs through a byte buffer, the underlying array and limit 
have to be fetched from their fields has a cost. The fact that ByteBuffer has 
a class hierarchy has an even bigger cost (as C2 has to make sure you are really 
invoking HeapByteBuffer). The mutable endianness state in byte buffer also adds 
to the noise. All of the above is what ends up under a big fat "2x slower" label.

That said, all these "factors" are only relevant because we're looking at a 
_single_ buffer operation. In fact, all such costs can easily be amortized 
as soon as there is more than one access, or as soon as you start accessing 
offsets that are not known statically (unlike in the benchmark).
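As a sketch of the amortization point (illustrative only; the names here are 
made up): once the buffer's fields and concrete type have been checked, each 
additional absolute-offset read is close to a plain load:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative sketch: with several absolute-offset reads against the
// same buffer, the one-time costs (field loads, HeapByteBuffer type
// check, endianness state) are paid once, not once per access.
final class AmortizedAccess {
    static int sumInts(byte[] bytes) {
        ByteBuffer bb = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN);
        int sum = 0;
        for (int off = 0; off + Integer.BYTES <= bytes.length; off += Integer.BYTES) {
            sum += bb.getInt(off); // absolute get: no position mutation
        }
        return sum;
    }
}
```

Using the absolute getInt(int) overload (rather than the relative getInt()) also 
keeps the loop free of position updates, which tends to help the optimizer.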

So, there's a question of what's the code idiom that leads to the absolute 
fastest code (and I agree that Unsafe + static wrappers seems the best here). 
And then there's the question of "what do we need to get the performance 
numbers/startup behavior we want?". I feel the important question is the 
second, but we keep arguing about the first.

And, to assess that second question, we need to understand better what the 
goals are (which, so far, seems a bit fuzzy).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14636#discussion_r1270795253
