carp84 commented on a change in pull request #9501: [FLINK-12697] [State Backends] Support on-disk state storage for spill-able heap backend URL: https://github.com/apache/flink/pull/9501#discussion_r319071340
########## File path: flink-state-backends/flink-statebackend-heap-spillable/src/main/java/org/apache/flink/runtime/state/heap/ByteBufferInputStreamWithPos.java ########## @@ -0,0 +1,117 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.runtime.state.heap; + +import org.apache.flink.util.Preconditions; + +import javax.annotation.Nonnull; + +import java.io.InputStream; +import java.nio.ByteBuffer; + +/** + * Un-synchronized input stream using the given byte buffer. + */ +public class ByteBufferInputStreamWithPos extends InputStream { Review comment: Here is the result of JMH benchmark, where "Orig" means the original implementation of `ByteArrayInputStreamWithPos` and "New" means the new one rebased on `ByteBufferInputStreamWithPos`. From the result we could see the initialization will indeed take more time due to constructing `HeapByteBuffer` but there's no regression on `read` functions, so I think the real impact of the refactor is ok. ``` # Run complete. Total time: 00:06:08 Benchmark Mode Cnt Score Error Units ByteArrayInputStreamBenchmark.testInitNew thrpt 30 51998.302 ± 2460.797 ops/ms ByteArrayInputStreamBenchmark.testInitOrig thrpt 30 102891.299 ± 6027.684 ops/ms ByteArrayInputStreamBenchmark.testReadIntoBufferNew thrpt 30 22985.900 ± 1437.231 ops/ms ByteArrayInputStreamBenchmark.testReadIntoBufferOrig thrpt 30 24436.653 ± 297.965 ops/ms ByteArrayInputStreamBenchmark.testReadNew thrpt 30 25100.684 ± 1466.822 ops/ms ByteArrayInputStreamBenchmark.testReadOrig thrpt 30 26607.799 ± 413.156 ops/ms # Run complete. Total time: 00:06:09 Benchmark Mode Cnt Score Error Units ByteArrayInputStreamBenchmark.testInitNew thrpt 30 49778.439 ± 2988.952 ops/ms ByteArrayInputStreamBenchmark.testInitOrig thrpt 30 100307.654 ± 16369.951 ops/ms ByteArrayInputStreamBenchmark.testReadIntoBufferNew thrpt 30 22838.932 ± 1315.758 ops/ms ByteArrayInputStreamBenchmark.testReadIntoBufferOrig thrpt 30 22266.972 ± 1438.928 ops/ms ByteArrayInputStreamBenchmark.testReadNew thrpt 30 25849.028 ± 1521.414 ops/ms ByteArrayInputStreamBenchmark.testReadOrig thrpt 30 25672.482 ± 1196.857 ops/ms # Run complete. Total time: 00:06:07 Benchmark Mode Cnt Score Error Units ByteArrayInputStreamBenchmark.testInitNew thrpt 30 53944.255 ± 1009.925 ops/ms ByteArrayInputStreamBenchmark.testInitOrig thrpt 30 117808.909 ± 2655.215 ops/ms ByteArrayInputStreamBenchmark.testReadIntoBufferNew thrpt 30 24388.177 ± 346.684 ops/ms ByteArrayInputStreamBenchmark.testReadIntoBufferOrig thrpt 30 24491.455 ± 323.040 ops/ms ByteArrayInputStreamBenchmark.testReadNew thrpt 30 27081.540 ± 471.671 ops/ms ByteArrayInputStreamBenchmark.testReadOrig thrpt 30 26694.972 ± 348.197 ops/ms ``` Attached is the patch of the benchmark, based on the current flink-benchmark master branch. Command used to run the benchmark is `mvn -Dflink.version=1.10-SNAPSHOT clean package exec:exec -Dexec.executable=java -Dexec.args="-jar target/benchmarks.jar -rf csv org.apache.flink.benchmark.memory.*" ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services