yandrey321 commented on code in PR #10438:
URL: https://github.com/apache/ozone/pull/10438#discussion_r3383454418


##########
hadoop-hdds/common/src/test/java/org/apache/hadoop/ozone/common/ChunkBufferPutBenchmark.java:
##########
@@ -0,0 +1,346 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.common;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.text.DecimalFormat;
+import java.util.concurrent.ThreadLocalRandom;
+import org.apache.hadoop.ozone.common.JfrByteBufferAllocations.AllocationStats;
+
+/**
+ * Microbenchmark for ChunkBuffer.put(byte[]) direct copy vs ByteBuffer.wrap path.
+ *
+ * <p>Focused on the scenarios where HDDS-15485 shows the clearest benefit:
+ * <ul>
+ *   <li>Throughput: 4KB stream fill with incremental buffer (64KB increment)</li>
+ *   <li>Allocations: same 4KB / 64KB-increment incremental buffer path (JFR + wrap calls)</li>
+ * </ul>
+ *
+ * <p>Run from the repo root:
+ * <pre>
+ *   mvn -pl hadoop-hdds/common -q test-compile exec:java \
+ *     -Dexec.mainClass=org.apache.hadoop.ozone.common.ChunkBufferPutBenchmark \
+ *     -Dexec.classpathScope=test \
+ *     -Dexec.args="--add-opens jdk.jfr/jdk.jfr=ALL-UNNAMED --add-opens jdk.jfr/jdk.jfr.consumer=ALL-UNNAMED"
+ * </pre>
+ * JFR ByteBuffer counts are sampled; put-op count reports exact wrap calls.
+ * Wrap-path timings use a blackhole on each {@code ByteBuffer.wrap} so the JVM
+ * cannot eliminate short-lived wrapper allocations via escape analysis.
+ */
+public final class ChunkBufferPutBenchmark {
+
+  /** Prevents escape analysis from removing ByteBuffer.wrap allocations in the wrap path. */
+  private static volatile Object blackhole;

Review Comment:
   @szetszwo I removed the blackhole and reran the test; I still see the same improvement:
   ```
   mvn -pl hadoop-hdds/common -q test-compile exec:java \
     -Dexec.mainClass=org.apache.hadoop.ozone.common.ChunkBufferPutBenchmark \
     -Dexec.classpathScope=test \
     -Dexec.args="--add-opens jdk.jfr/jdk.jfr=ALL-UNNAMED --add-opens jdk.jfr/jdk.jfr.consumer=ALL-UNNAMED"
   ChunkBuffer.put(byte[]) microbenchmark (pre-allocated buffer, put-only)
   JVM: 17 on aarch64
   
   === Throughput showcase ===
   --- Incremental buffer showcase ---
   Config: ozone.client.stream.buffer.size=4MB, ozone.client.stream.buffer.increment=64KB, io.file.buffer.size=4KB
   Pattern: 4KB stream fill into IncrementalChunkBuffer (64KB steps)
   Chunk=4096KB increment=64KB write=4KB
     round 1:
       direct put(byte[]):   42,262.7 MB/s | 92 ns/op | 20.00s | 216,385,536 ops
       wrap put(ByteBuffer): 38,938.0 MB/s | 100 ns/op | 20.00s | 199,363,584 ops
       improvement: 8.5% faster (1.09x) per 4KB write; throughput 8.5% (1.09x)
     round 2:
       direct put(byte[]):   41,901.0 MB/s | 93 ns/op | 20.00s | 214,533,120 ops
       wrap put(ByteBuffer): 38,885.1 MB/s | 100 ns/op | 20.00s | 199,092,224 ops
       improvement: 7.8% faster (1.08x) per 4KB write; throughput 7.8% (1.08x)
     round 3:
       direct put(byte[]):   41,889.6 MB/s | 93 ns/op | 20.00s | 214,474,752 ops
       wrap put(ByteBuffer): 38,947.6 MB/s | 100 ns/op | 20.00s | 199,412,736 ops
       improvement: 7.6% faster (1.08x) per 4KB write; throughput 7.6% (1.08x)
     median improvement over 3 rounds: 7.8%
   
   === Allocation showcase ===
   --- Incremental buffer showcase ---
   Config: ozone.client.stream.buffer.size=4MB, ozone.client.stream.buffer.increment=64KB, io.file.buffer.size=4KB
   Pattern: 4KB stream fill into IncrementalChunkBuffer (64KB steps)
   Chunk=4096KB increment=64KB write=4KB
     direct put(byte[]):   53,298,176 put ops | 0 ByteBuffer TLAB allocs | 0 alloc bytes
     wrap put(ByteBuffer): 48,955,392 put ops | 724 ByteBuffer TLAB allocs | 40,544 alloc bytes
     ByteBuffer.wrap calls on wrap path (1 per put): 48,955,392
     direct path avoids 48,955,392 ByteBuffer.wrap calls per run
     JFR confirms zero ByteBuffer TLAB allocations on direct path
     JFR sampled ByteBuffer TLAB allocations avoided on direct path: 724
     (JFR samples TLAB events; put-op count is the exact wrap-call metric)
     ```
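   For anyone cross-checking the log, the per-round "improvement" figures appear to be the throughput ratio minus one. The sketch below recomputes them from the MB/s values printed above. `ImprovementCheck` and its method names are hypothetical helpers for this comment, not part of the benchmark class, and the formula is an assumption inferred from the output rather than taken from the benchmark source.

   ```java
   import java.util.Arrays;

   final class ImprovementCheck {
     /** Relative throughput gain of the direct path over the wrap path, in percent. */
     static double improvementPercent(double directMBps, double wrapMBps) {
       return (directMBps / wrapMBps - 1.0) * 100.0;
     }

     /** Median of the given values; the input array is left unmodified. */
     static double median(double[] values) {
       double[] sorted = values.clone();
       Arrays.sort(sorted);
       int mid = sorted.length / 2;
       return sorted.length % 2 == 1
           ? sorted[mid]
           : (sorted[mid - 1] + sorted[mid]) / 2.0;
     }

     public static void main(String[] args) {
       // MB/s pairs (direct, wrap) copied from rounds 1-3 of the run above.
       double[] rounds = {
           improvementPercent(42262.7, 38938.0),
           improvementPercent(41901.0, 38885.1),
           improvementPercent(41889.6, 38947.6)
       };
       for (int i = 0; i < rounds.length; i++) {
         System.out.printf("round %d: %.1f%%%n", i + 1, rounds[i]);
       }
       System.out.printf("median: %.1f%%%n", median(rounds));
     }
   }
   ```

   With the values from this run, the recomputed figures round to the 8.5%, 7.8% and 7.6% reported per round and to the 7.8% median.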
     
   Removing the `-Dexec.args="--add-opens jdk.jfr/jdk.jfr=ALL-UNNAMED --add-opens jdk.jfr/jdk.jfr.consumer=ALL-UNNAMED"` parameter roughly doubles the measured improvement (median 15.7% vs. 7.8%):
   ```
   mvn -pl hadoop-hdds/common -q test-compile exec:java \
     -Dexec.mainClass=org.apache.hadoop.ozone.common.ChunkBufferPutBenchmark \
     -Dexec.classpathScope=test  
   ChunkBuffer.put(byte[]) microbenchmark (pre-allocated buffer, put-only)
   JVM: 17 on aarch64
   
   === Throughput showcase ===
   --- Incremental buffer showcase ---
   Config: ozone.client.stream.buffer.size=4MB, ozone.client.stream.buffer.increment=64KB, io.file.buffer.size=4KB
   Pattern: 4KB stream fill into IncrementalChunkBuffer (64KB steps)
   Chunk=4096KB increment=64KB write=4KB
     round 1:
       direct put(byte[]):   42,609.4 MB/s | 92 ns/op | 20.00s | 218,161,152 ops
       wrap put(ByteBuffer): 36,828.7 MB/s | 106 ns/op | 20.00s | 188,563,456 ops
       improvement: 15.7% faster (1.16x) per 4KB write; throughput 15.7% (1.16x)
     round 2:
       direct put(byte[]):   42,486.2 MB/s | 92 ns/op | 20.00s | 217,530,368 ops
       wrap put(ByteBuffer): 37,137.1 MB/s | 105 ns/op | 20.00s | 190,142,464 ops
       improvement: 14.4% faster (1.14x) per 4KB write; throughput 14.4% (1.14x)
     round 3:
       direct put(byte[]):   42,399.2 MB/s | 92 ns/op | 20.00s | 217,084,928 ops
       wrap put(ByteBuffer): 36,557.4 MB/s | 107 ns/op | 20.00s | 187,174,912 ops
       improvement: 16.0% faster (1.16x) per 4KB write; throughput 16.0% (1.16x)
     median improvement over 3 rounds: 15.7%
   
   === Allocation showcase ===
   --- Incremental buffer showcase ---
   Config: ozone.client.stream.buffer.size=4MB, ozone.client.stream.buffer.increment=64KB, io.file.buffer.size=4KB
   Pattern: 4KB stream fill into IncrementalChunkBuffer (64KB steps)
   Chunk=4096KB increment=64KB write=4KB
     direct put(byte[]):   50,421,760 put ops | 0 ByteBuffer TLAB allocs | 0 alloc bytes
     wrap put(ByteBuffer): 48,524,288 put ops | 714 ByteBuffer TLAB allocs | 39,984 alloc bytes
     ByteBuffer.wrap calls on wrap path (1 per put): 48,524,288
     direct path avoids 48,524,288 ByteBuffer.wrap calls per run
     JFR confirms zero ByteBuffer TLAB allocations on direct path
     JFR sampled ByteBuffer TLAB allocations avoided on direct path: 714
     (JFR samples TLAB events; put-op count is the exact wrap-call metric)
     ```
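   To make it concrete which change was tested, here is a minimal sketch of the two wrap-path variants next to the direct path. It is an illustration only: a plain `ByteBuffer` stands in for `ChunkBuffer`, and `WrapPathSketch`, `sink`, `putWrapped` and `putDirect` are hypothetical names, not the code in the PR. Whether the JIT actually removes the wrapper allocation when the sink is absent depends on inlining and is not something this sketch demonstrates.

   ```java
   import java.nio.ByteBuffer;

   final class WrapPathSketch {
     /** Volatile sink: storing the wrapper here forces it to escape the method. */
     private static volatile Object sink;

     /** Wrap path: allocates a heap ByteBuffer view over src, then copies it. */
     static void putWrapped(ByteBuffer dst, byte[] src, boolean useSink) {
       ByteBuffer wrapper = ByteBuffer.wrap(src);
       if (useSink) {
         // With the sink, the wrapper is published and cannot be scalar-replaced.
         sink = wrapper;
       }
       dst.put(wrapper);
     }

     /** Direct path: copies the array without creating a wrapper object. */
     static void putDirect(ByteBuffer dst, byte[] src) {
       dst.put(src);
     }

     public static void main(String[] args) {
       byte[] src = new byte[4096];           // 4KB write, as in the benchmark
       ByteBuffer dst = ByteBuffer.allocate(64 * 1024); // 64KB increment stand-in

       putWrapped(dst, src, true);   // variant with the blackhole
       putWrapped(dst, src, false);  // variant with the blackhole removed
       putDirect(dst, src);          // direct put(byte[])

       System.out.println("bytes written: " + dst.position());
     }
   }
   ```

   All three calls copy the same 4KB; the only difference is whether a wrapper object is created and whether it is published to the volatile field.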



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

