https://bugs.openjdk.java.net/browse/JDK-8193832
http://cr.openjdk.java.net/~bpb/8193832/webrev.00/
The implementation of InputStream.readAllBytes() can be modified to be in
general faster and to require less memory. The current implementation starts
with an internal buffer of size 8192 bytes and doubles the size of the buffer
each time more capacity is required resulting in a geometric progression in the
buffer size. At the end if the buffer size is not exactly equal to the total
number of bytes read, then another allocation equal to the total number read is
performed. The amount of memory N required can be calculated for initial buffer
size B and total bytes read L as
M for L == B*(2^n)
N =
M + L for L != B*(2^n)
where M = B*(2^(n + 1) - 1) and n = ceil(log2(L/B)).
An alternative implementation is not to increase the size of the internal
buffer each time more capacity is needed, but to instead maintain a list of
buffers of up to B bytes each and gather those into an output buffer at the
end. If there is only a single buffer in the list then gathering is not needed
and it can be returned directly. The amount of memory N required in this case is
B + L for L <= B
N =
B + 2*L for L > B
A comparison of memory usage for a number of lengths L with a buffer size B of
8192 is:
L N_old N_new
8191 16383 16383
8192 8192 16384
8193 32769 24578
16383 40959 40958
16384 24576 40960
16385 73729 40962
32767 90111 73726
32768 57344 73728
32769 155649 73730
65535 188415 139262
65536 122880 139264
65537 319489 139266
131071 385023 270334
131072 253952 270336
131073 647169 270338
262143 778239 532478
262144 516096 532480
262145 1302529 532482
In general if the size of the last intermediate buffer in the old version does
not equal the total number of bytes read, then the old version will require
more memory thaw the new. The performance for the same set of lengths was
measured using JMH:
Benchmark (base) (offset) Mode Cnt
Score Error Units
BenchReadAllBytesParam.readAllBytes 8192 -1 thrpt 5
578064.253 ± 18955.667 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 8192 -1 thrpt 5
530963.634 ± 4799.923 ops/s
BenchReadAllBytesParam.readAllBytes 8192 0 thrpt 5
1414243.593 ± 68520.245 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 8192 0 thrpt 5
532019.149 ± 7051.738 ops/s
BenchReadAllBytesParam.readAllBytes 8192 1 thrpt 5
293586.365 ± 8196.939 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 8192 1 thrpt 5
361682.265 ± 27081.111 ops/s
BenchReadAllBytesParam.readAllBytes 16384 -1 thrpt 5
216809.517 ± 4853.852 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 16384 -1 thrpt 5
212346.244 ± 2980.916 ops/s
BenchReadAllBytesParam.readAllBytes 16384 0 thrpt 5
379757.470 ± 14524.249 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 16384 0 thrpt 5
218802.256 ± 4361.080 ops/s
BenchReadAllBytesParam.readAllBytes 16384 1 thrpt 5
123570.712 ± 1665.661 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 16384 1 thrpt 5
208801.577 ± 8005.196 ops/s
BenchReadAllBytesParam.readAllBytes 32768 -1 thrpt 5
93083.146 ± 1024.309 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 32768 -1 thrpt 5
104398.304 ± 1310.509 ops/s
BenchReadAllBytesParam.readAllBytes 32768 0 thrpt 5
151104.878 ± 3461.359 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 32768 0 thrpt 5
104849.088 ± 3841.224 ops/s
BenchReadAllBytesParam.readAllBytes 32768 1 thrpt 5
58039.827 ± 398.908 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 32768 1 thrpt 5
104489.685 ± 2118.496 ops/s
BenchReadAllBytesParam.readAllBytes 65536 -1 thrpt 5
43144.695 ± 440.349 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 65536 -1 thrpt 5
55583.338 ± 2115.162 ops/s
BenchReadAllBytesParam.readAllBytes 65536 0 thrpt 5
67216.536 ± 2055.057 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 65536 0 thrpt 5
55680.238 ± 1235.571 ops/s
BenchReadAllBytesParam.readAllBytes 65536 1 thrpt 5
27908.000 ± 568.473 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 65536 1 thrpt 5
55938.779 ± 813.991 ops/s
BenchReadAllBytesParam.readAllBytes 131072 -1 thrpt 5
20299.014 ± 568.616 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 131072 -1 thrpt 5
28115.036 ± 842.392 ops/s
BenchReadAllBytesParam.readAllBytes 131072 0 thrpt 5
31617.475 ± 467.378 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 131072 0 thrpt 5
28274.289 ± 259.699 ops/s
BenchReadAllBytesParam.readAllBytes 131072 1 thrpt 5
13640.406 ± 303.125 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 131072 1 thrpt 5
28221.298 ± 515.030 ops/s
BenchReadAllBytesParam.readAllBytes 262144 -1 thrpt 5
9995.183 ± 249.368 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 262144 -1 thrpt 5
14043.194 ± 138.026 ops/s
BenchReadAllBytesParam.readAllBytes 262144 0 thrpt 5
15309.238 ± 353.752 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 262144 0 thrpt 5
14048.569 ± 348.699 ops/s
BenchReadAllBytesParam.readAllBytes 262144 1 thrpt 5
5940.442 ± 206.855 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 262144 1 thrpt 5
14055.357 ± 412.470 ops/s
In each case the number of bytes read is the sum of the base and offset
parameters. Similar behavior with respect to data length is observed for
performance as for memory usage: if the data length is not equal to the size of
the last intermediate buffer used, then the performance of the old version is
in general worse than that of the new.
Thanks,
Brian