I tried one more approach using an interface with the appropriate 'getBytes', etc, methods, but unfortunately its allocation doesn't seem to be elided either: https://gist.github.com/885d36cc5e0097c4628f454d3deb23a6
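
For readers who don't open the gist, here is a minimal sketch of the kind of accessor interface described above. It is not the gist's code: the names Accessor, ByteArrayAccessor and ByteBufferAccessor are hypothetical, and little-endian byte order is an assumption. It runs on Java 8, but the accessor is still an object, so whether its allocation is elided has to be measured.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class AccessorSketch {
    /** Hypothetical read-only view over some backing store. */
    interface Accessor {
        short getShort(int index);
        int getInt(int index);
    }

    /** Little-endian reads straight from a byte[]. */
    static final class ByteArrayAccessor implements Accessor {
        private final byte[] buf;

        ByteArrayAccessor(byte[] buf) { this.buf = buf; }

        @Override public short getShort(int i) {
            return (short) ((buf[i] & 0xff) | ((buf[i + 1] & 0xff) << 8));
        }

        @Override public int getInt(int i) {
            return (buf[i] & 0xff)
                 | ((buf[i + 1] & 0xff) << 8)
                 | ((buf[i + 2] & 0xff) << 16)
                 | ((buf[i + 3] & 0xff) << 24);
        }
    }

    /** Same interface backed by a ByteBuffer, to keep back-compatibility. */
    static final class ByteBufferAccessor implements Accessor {
        private final ByteBuffer bb;

        ByteBufferAccessor(ByteBuffer bb) {
            // duplicate() so the caller's byte order and position stay untouched
            this.bb = bb.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        }

        @Override public short getShort(int i) { return bb.getShort(i); }
        @Override public int getInt(int i) { return bb.getInt(i); }
    }

    /** Flyweight that reads through whichever accessor it is given. */
    static final class FileDesc {
        private final Accessor acc;
        private final int pos;

        FileDesc(Accessor acc, int headerPosition) {
            this.acc = acc;
            this.pos = acc.getShort(headerPosition) + headerPosition;
        }

        int getVal() { return acc.getInt(pos); }
    }

    public static void main(String[] args) {
        byte[] bytes = {0x02, 0x00, (byte) 0xbe, (byte) 0xba, (byte) 0xfe, (byte) 0xca};
        System.out.printf("0x%x%n", new FileDesc(new ByteArrayAccessor(bytes), 0).getVal());
        System.out.printf("0x%x%n", new FileDesc(new ByteBufferAccessor(ByteBuffer.wrap(bytes)), 0).getVal());
    }
}
```

Both calls in main read the same six bytes, once through the byte[] accessor and once through the ByteBuffer one.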

  MyBenchmark.testWithByteBuffer                                        thrpt  2  164225320.729  ops/s
  MyBenchmark.testWithByteBuffer:·gc.alloc.rate.norm                    thrpt  2         56.000  B/op
  MyBenchmark.testWithBytes                                             thrpt  2  289869913.686  ops/s
  MyBenchmark.testWithBytes:·gc.alloc.rate.norm                         thrpt  2         ≈ 10⁻⁷  B/op
  MyBenchmark.testWithBytesWrappedInAccessor                            thrpt  2  202213942.822  ops/s
  MyBenchmark.testWithBytesWrappedInAccessor:·gc.alloc.rate.norm        thrpt  2         24.000  B/op
  MyBenchmark.testWithBytesWrappedInThreadLocalAccessor                 thrpt  2  183097145.922  ops/s
  MyBenchmark.testWithBytesWrappedInThreadLocalAccessor:·gc.alloc.rate  thrpt  2         ≈ 10⁻⁴  MB/sec

So while my little byte-array wrapper is smaller than ByteBuffer (and faster), it still isn't allocation-free. Using a ThreadLocal can eliminate the allocation but gives up a bit of performance.

So, does anyone have a clever idea to get the same performance as directly passing the byte array, but without any allocation, and in such a way that Java 8 is supported? (Clearly I could just locally hack the generator to only support byte[] and not ByteBuffer, but I would prefer to contribute a change back to the flatbuffers project that can maintain back-compatibility as well.)

I suppose storing an Object and using 'instanceof' checks is an option, though it makes me sad.

On Wed, Aug 8, 2018 at 8:35 AM Todd Lipcon <[email protected]> wrote:

> Thanks Gil. Unfortunately I'm stuck on Java 8 for now. And it sounds like
> I'll have to modify the flat buffers code generation either way to get rid
> of the byte buffer and replace it at least with some interface that could
> wrap a bytebuffer, unsafe, varhandle, etc.
>
> Todd
>
> On Tue, Aug 7, 2018, 11:45 PM Gil Tene <[email protected]> wrote:
>
>> Oh, and there is MethodHandles.byteBufferViewVarHandle
>> <https://docs.oracle.com/javase/10/docs/api/java/lang/invoke/MethodHandles.html#byteBufferViewVarHandle(java.lang.Class,java.nio.ByteOrder)>
>> if you (for some reason) want to do the same but keep ByteBuffers around.
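
For reference, the byteBufferViewVarHandle call mentioned just above might be used roughly like this. This is only a sketch under assumptions: it needs Java 9 or later, the class and method names are hypothetical, and because a ByteBuffer is still involved it does not by itself remove the per-instance wrapper object.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteBufferVarHandleSketch {
    // Obtained once, statically. The handle's byte order is used for every
    // access, independent of the ByteBuffer's own order() setting.
    static final VarHandle VH_INT =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);
    static final VarHandle VH_SHORT =
            MethodHandles.byteBufferViewVarHandle(short[].class, ByteOrder.LITTLE_ENDIAN);

    /** Reads the short offset at headerPosition, then the int it points to. */
    static int getVal(ByteBuffer bb, int headerPosition) {
        int pos = ((short) VH_SHORT.get(bb, headerPosition)) + headerPosition;
        return (int) VH_INT.get(bb, pos);
    }

    public static void main(String[] args) {
        byte[] bytes = {0x02, 0x00, (byte) 0xbe, (byte) 0xba, (byte) 0xfe, (byte) 0xca};
        System.out.printf("0x%x%n", getVal(ByteBuffer.wrap(bytes), 0));
    }
}
```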
>>
>> On Tuesday, August 7, 2018 at 9:41:01 PM UTC-7, Gil Tene wrote:
>>>
>>> *IF* you can use post-Java-8 stuff, VarHandles may have a more systemic
>>> and intentional/explicit answer for expressing what you are trying to do
>>> here, without resorting to Unsafe. Specifically, using a
>>> MethodHandles.byteArrayViewVarHandle
>>> <https://docs.oracle.com/javase/10/docs/api/java/lang/invoke/MethodHandles.html#byteArrayViewVarHandle(java.lang.Class,java.nio.ByteOrder)>()
>>> that you would get once (statically), you should be able to peek into your
>>> many different byte[] instances and extract a field of a different
>>> primitive type (int, long, etc.) at some arbitrary index, without having to
>>> wrap it up in the super-short-lived ByteBuffer in your example, and hope
>>> for escape analysis to take care of it...
>>>
>>> Here is a code example that does the same wrapping you were looking to
>>> do, using VarHandles:
>>>
>>> import java.lang.invoke.MethodHandles;
>>> import java.lang.invoke.VarHandle;
>>> import java.nio.ByteOrder;
>>>
>>> public class VarHandleExample {
>>>
>>>     // Little-endian short 0x0002 (offset), then little-endian int 0xcafebabe.
>>>     static final byte[] bytes =
>>>         {0x02, 0x00, (byte) 0xbe, (byte) 0xba, (byte) 0xfe, (byte) 0xca};
>>>
>>>     private static class FileDesc {
>>>         // Obtained once; shared by all FileDesc instances.
>>>         static final VarHandle VH_intArrayView =
>>>             MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);
>>>         static final VarHandle VH_shortArrayView =
>>>             MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.LITTLE_ENDIAN);
>>>         private final byte[] buf;
>>>         int bufPos;
>>>
>>>         FileDesc(byte[] buf, int headerPosition) {
>>>             bufPos = ((short) VH_shortArrayView.get(buf, headerPosition)) + headerPosition;
>>>             this.buf = buf;
>>>         }
>>>
>>>         public int getVal() {
>>>             return (int) VH_intArrayView.get(buf, bufPos);
>>>         }
>>>     }
>>>
>>>     public static void main(String[] args) {
>>>         FileDesc fd = new FileDesc(bytes, 0);
>>>         System.out.format("The int we get from fd.get() is: 0x%x\n", fd.getVal());
>>>     }
>>> }
>>>
>>> Running this results in the probably correct output of:
>>>
>>> The int we get from fd.get() is: *0xcafebabe*
>>>
>>> Which means that the byte offset reading in the backing byte[], using
>>> little endian, and even at not-4-byte-offset-aligned locations, seems to
>>> work.
>>>
>>> NOTE: I have NOT examined what it looks like in generated code, beyond
>>> verifying that everything seems to get inlined, but as stated, the code
>>> would not incur an allocation or need an intermediate object per buffer
>>> instance.
>>>
>>> Now, since this only works in Java 9+, you could code it that way for
>>> those versions, and revert to the Unsafe equivalent for Java 8-. You could
>>> even convert the code above to code that dynamically uses VarHandle (when
>>> available) without requiring javac to know anything about them (using
>>> reflection and MethodHandles), and uses Unsafe only if VarHandle is not
>>> supported. Ugly PortableVarHandleExample that does that (and would run on
>>> Java 7...10) *might* follow...
>>>
>>> On Tuesday, August 7, 2018 at 1:55:35 PM UTC-7, Todd Lipcon wrote:
>>>>
>>>> Hey folks,
>>>>
>>>> I'm working on reducing heap usage of a big server application that
>>>> currently holds on to tens of millions of generated FlatBuffer instances in
>>>> the old generation.
>>>> Each such instance looks more or less like this:
>>>>
>>>> private static class FileDesc {
>>>>   private final ByteBuffer bb;
>>>>   int bbPos;
>>>>
>>>>   FileDesc(ByteBuffer bb) {
>>>>     bbPos = bb.getShort(bb.position()) + bb.position();
>>>>     this.bb = bb;
>>>>   }
>>>>
>>>>   public int getVal() {
>>>>     return bb.getInt(bbPos);
>>>>   }
>>>> }
>>>>
>>>> (I've simplified the code, but the important bit is the ByteBuffer
>>>> member and the fact that it provides nice accessors which read data from
>>>> various parts of the buffer)
>>>>
>>>> Unfortunately, the heap usage of these buffers adds up quite a bit --
>>>> each ByteBuffer takes 56 bytes of heap, and each 'FileDesc' takes 32 bytes
>>>> after padding. The underlying buffers themselves are typically on the order
>>>> of 100 bytes, so it seems like almost 50% of the heap is being used by
>>>> wrapper objects instead of the underlying data itself. Additionally, 2/3 of
>>>> the object count are overhead, which I imagine contributes to GC
>>>> scanning/marking time.
>>>>
>>>> In practice, all of the ByteBuffers used by this app are simply
>>>> ByteBuffer.wrap(byteArray). I was figuring that an easy improvement here
>>>> would be to simply store the byte[] and whenever we need to access the
>>>> contents of the FlatBuffer, use it as a flyweight:
>>>>
>>>> new FileDesc(ByteBuffer.wrap(byteArray)).getVal();
>>>>
>>>> ... and let the magic of Escape Analysis eliminate those allocations.
>>>> Unfortunately, I've learned from this group that magic should be tested, so
>>>> I wrote a JMH benchmark:
>>>> https://gist.github.com/4b6ddf0febcc3620ccdf68e5f11c6c83 and found
>>>> that the ByteBuffer.wrap allocation is not eliminated.
>>>>
>>>> Has anyone faced this issue before? It seems like my only real option
>>>> here is to modify the flatbuffer code generator to generate byte[] members
>>>> instead of ByteBuffer members, so that the flyweight allocation would be
>>>> eliminated, but maybe I missed something more clever.
>>>>
>>>> -Todd
>>>>
>>> --
>> You received this message because you are subscribed to the Google Groups
>> "mechanical-sympathy" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
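
The "store an Object and use 'instanceof' checks" option raised at the top of this thread could look roughly like the sketch below. Everything here is an assumption for illustration: the class and helper names are hypothetical, the byte order is taken to be little-endian, and the ByteBuffer passed in is assumed to be in LITTLE_ENDIAN order already. It compiles on Java 8. Whether the JIT folds the type check and elides the flyweight would still need a JMH run with the GC profiler.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class InstanceofSketch {
    /** Little-endian short from either a byte[] or a ByteBuffer. */
    static short readShort(Object backing, int i) {
        if (backing instanceof byte[]) {
            byte[] b = (byte[]) backing;
            return (short) ((b[i] & 0xff) | ((b[i + 1] & 0xff) << 8));
        }
        // Assumption: the ByteBuffer is already in LITTLE_ENDIAN order.
        return ((ByteBuffer) backing).getShort(i);
    }

    /** Little-endian int from either a byte[] or a ByteBuffer. */
    static int readInt(Object backing, int i) {
        if (backing instanceof byte[]) {
            byte[] b = (byte[]) backing;
            return (b[i] & 0xff)
                 | ((b[i + 1] & 0xff) << 8)
                 | ((b[i + 2] & 0xff) << 16)
                 | ((b[i + 3] & 0xff) << 24);
        }
        return ((ByteBuffer) backing).getInt(i);
    }

    /** Flyweight holding an untyped backing store: byte[] or ByteBuffer. */
    static final class FileDesc {
        private final Object backing;
        private final int pos;

        FileDesc(Object backing, int headerPosition) {
            this.backing = backing;
            this.pos = readShort(backing, headerPosition) + headerPosition;
        }

        int getVal() { return readInt(backing, pos); }
    }

    public static void main(String[] args) {
        byte[] bytes = {0x02, 0x00, (byte) 0xbe, (byte) 0xba, (byte) 0xfe, (byte) 0xca};
        ByteBuffer bb = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        System.out.printf("0x%x%n", new FileDesc(bytes, 0).getVal());
        System.out.printf("0x%x%n", new FileDesc(bb, 0).getVal());
    }
}
```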
