Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1060#discussion_r164532290

--- Diff: exec/memory/base/src/main/java/io/netty/buffer/DrillBuf.java ---
@@ -703,7 +703,18 @@ protected void _setLong(int index, long value) {

   @Override
   public ByteBuf getBytes(int index, ByteBuf dst, int dstIndex, int length) {
-    udle.getBytes(index + offset, dst, dstIndex, length);
+    final int BULK_COPY_THR = 1024;
--- End diff --

You are right, I'll put more information on this optimization:
- During code profiling, I noticed that getBytes() doesn't perform well when called with a small length (less than 1 KB)
- Its throughput improves as the length increases

Analysis:
- Java exposes two intrinsics for writing to direct memory: putByte and copyMemory
- The JVM is able to inline memory access (no function call) for putByte
- copyMemory is a bulk API that internally invokes libc memcpy (requires a function call)
- The rationale is that a function call is worth incurring only when the associated processing outweighs the call overhead; this is almost never the case for small memory accesses.
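
The trade-off described above can be sketched as a threshold-based copy. This is only an illustration and not the code from the pull request: the class `ThresholdCopy` and its `copy` method are hypothetical names, and `java.nio.ByteBuffer` stands in for the putByte and copyMemory intrinsics so the example runs without internal JDK APIs. Only the `BULK_COPY_THR` value of 1024 comes from the diff.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not the DrillBuf implementation.
final class ThresholdCopy {
  // Same value as the BULK_COPY_THR constant shown in the diff.
  static final int BULK_COPY_THR = 1024;

  private ThresholdCopy() {}

  // Copies length bytes from src (starting at srcIndex) into dst (starting at dstIndex)
  // without changing the position or limit of either buffer.
  static void copy(ByteBuffer src, int srcIndex, ByteBuffer dst, int dstIndex, int length) {
    if (length < 0 || srcIndex < 0 || dstIndex < 0
        || srcIndex > src.limit() - length || dstIndex > dst.limit() - length) {
      throw new IndexOutOfBoundsException("copy range exceeds buffer bounds");
    }
    if (length < BULK_COPY_THR) {
      // Small copy: per-byte absolute get/put, standing in for the putByte path.
      for (int i = 0; i < length; i++) {
        dst.put(dstIndex + i, src.get(srcIndex + i));
      }
    } else {
      // Large copy: one bulk transfer, standing in for the copyMemory path.
      // Duplicates share the bytes but have their own position and limit.
      ByteBuffer s = src.duplicate();
      s.position(srcIndex);
      s.limit(srcIndex + length);
      ByteBuffer d = dst.duplicate();
      d.position(dstIndex);
      d.put(s);
    }
  }
}
```

For example, `ThresholdCopy.copy(src, 0, dst, 0, 64)` takes the per-byte branch, while a length of 4096 takes the bulk branch. Whether 1024 is the right cut-over for a given JVM and platform is an assumption that would need to be confirmed by profiling.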
---