On Thu, 28 Apr 2022 20:02:36 GMT, Brian Burkhalter <b...@openjdk.org> wrote:

>> Modify native multi-byte read-write code used by the `java.io` classes to 
>> limit the size of the allocated native buffer thereby decreasing off-heap 
>> memory footprint and increasing throughput.
>
> Brian Burkhalter has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   6478546: Decrease malloc'ed buffer maximum size to 64 kB

Further performance testing was conducted for the case where the native read 
and write functions use a fixed, stack-allocated buffer of 8192 bytes and the 
loops are moved up into the Java code of `FileInputStream`, `FileOutputStream` 
and `RandomAccessFile`. This introduced some code duplication, because 
`RandomAccessFile` needs both the read and the write loop. Write throughput 
with this approach was approximately half of what it had been, so the approach 
was abandoned for writing.
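
For illustration only, a Java-level read loop of the kind described above might 
look roughly like the sketch below. It is not the code from the PR: the class 
name `ChunkedReadSketch`, the helper `readChunked` and the use of a generic 
`InputStream` in place of the native call are all assumptions made for the 
example; only the 8192-byte chunk size is taken from the text.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Objects;

final class ChunkedReadSketch {
    // Chunk size matching the 8192-byte stack buffer mentioned above.
    static final int CHUNK_SIZE = 8192;

    // Hypothetical helper: fills dst[off, off + len) using reads of at most
    // chunkSize bytes each, so the underlying read never sees a large request.
    // Returns the number of bytes read, or -1 if the stream was already at EOF.
    static int readChunked(InputStream in, byte[] dst, int off, int len, int chunkSize)
            throws IOException {
        Objects.checkFromIndexSize(off, len, dst.length);
        int total = 0;
        while (total < len) {
            int n = in.read(dst, off + total, Math.min(chunkSize, len - total));
            if (n < 0) {
                // End of stream: report -1 only if nothing was read at all.
                return total == 0 ? -1 : total;
            }
            if (n == 0) {
                break; // defensive: avoid spinning if the stream makes no progress
            }
            total += n;
        }
        return total;
    }
}
```

Each iteration crosses into the underlying read once per chunk, which is one 
plausible reason a Java-level loop could be slower than a loop kept entirely in 
native code.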

Here are some updated performance measurements:

<img width="721" alt="FileInputStream-read-perf" 
src="https://user-images.githubusercontent.com/71468245/167041493-6d4c421c-c2ec-4a8a-8b32-09b2a902a77c.png">

<img width="720" alt="FileOutputStream-write-perf" 
src="https://user-images.githubusercontent.com/71468245/167041541-94e5806c-de86-4e62-a117-4cfafac82e87.png">

The performance measurements shown are for the following cases:

1. Master: unmodified code as it exists in the mainline
2. Java: fixed-size stack buffer in native read, read loops in Java, write as 
in the mainline but with malloc buffer size limit
3. Native: read loop in native read with malloc buffer size limit, write as in 
the mainline but with malloc buffer size limit

The horizontal axis represents a variety of lengths from 8192 bytes to 1 GB; 
the vertical axis is throughput (ops/s) on a log10 scale. The Native lines in 
the charts correspond to the code proposed for integration.

As can be seen, the performance of reading is quite similar up to larger 
lengths. The mainline version presumably starts to suffer the effect of large 
allocations. The native read loop performs the best throughout: for lengths of 
10 MB and above, its throughput is roughly 1.5X to 3X that of the mainline 
version and about 40% higher than that of the Java read loop.

Due to the log scale of the charts, the reading performance detail cannot be 
seen exactly and so is given here for the larger lengths:


       Throughput of read(byte[]) (ops/s) by length (bytes)
   Length      Master         Java        Native
   1048576    11341.39      6124.482    11371.091
  10485760      356.893      376.326      557.906
 251503002       10.036       14.27        19.869
 524288000        5.005        6.857        9.552
1000000000        1.675        3.527        4.997
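
As a cross-check, the relative speedups can be recomputed from the rows of the 
table for lengths of 10 MB and above. The sketch below only does that 
arithmetic; the class and method names are illustrative and the numbers are 
copied from the table.

```java
final class ReadSpeedup {
    // Lengths in bytes, copied from the table (rows of 10 MB and above).
    static final long[] LENGTHS = {10485760L, 251503002L, 524288000L, 1000000000L};

    // Throughput in ops/s, copied from the table: {Master, Java, Native}.
    static final double[][] OPS = {
        {356.893, 376.326, 557.906},
        { 10.036,  14.27,   19.869},
        {  5.005,   6.857,   9.552},
        {  1.675,   3.527,   4.997},
    };

    // Ratio of two throughputs; a value of 1.5 means "50% faster".
    static double ratio(double faster, double slower) {
        return faster / slower;
    }

    public static void main(String[] args) {
        for (int i = 0; i < LENGTHS.length; i++) {
            double[] row = OPS[i];
            System.out.printf("%10d  native/master=%.2f  native/java=%.2f%n",
                    LENGTHS[i], ratio(row[2], row[0]), ratio(row[2], row[1]));
        }
    }
}
```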

The performance of writing is about the same for the Java and Native versions, 
as it should be since the implementations are the same. Any difference is 
likely due to measurement noise. The mainline version again suffers for larger 
lengths.

As the native write loop was already present in the mainline code, the 
principal complexity proposed to be added is the native read loop. Given the 
improved throughput and vastly reduced native memory allocation, this seems 
justified.

-------------

PR: https://git.openjdk.java.net/jdk/pull/8235
