Hi,
For the case of reading 2^N bytes i believe you can avoid doing a last copy by
checking if “n < 0" within the “nread > 0” block when “nread ==
DEAFULT_BUFFER_SIZE”. That might close the perf gap for smaller cases. You can
also move "nread = 0” to the same block e.g.:
var copy = (n < 0 && nread == DEAFULT_BUFFER_SIZE) ? buf : Arrays.copyOf(buf,
nread);
list.add(copy)
nread = 0;
262 byte[] output = new byte[total];
263 int offset = 0;
264 int numCached = list.size();
265 for (int i = 0; i < numCached; i++) {
266 byte[] b = list.get(i);
267 System.arraycopy(b, 0, output, offset, b.length);
268 offset += b.length;
269 }
You can simplify to:
var result = new byte[total];
int offset = 0;
for (buf : list) {
System.arraycopy(buf, 0, result, offset, buf.length);
offset += buf.length;
}
s/list/bufs and then you can use var for the declarations at the start of the
method.
Paul.
> On 19 Dec 2017, at 11:57, Brian Burkhalter <[email protected]>
> wrote:
>
> https://bugs.openjdk.java.net/browse/JDK-8193832
> http://cr.openjdk.java.net/~bpb/8193832/webrev.00/
>
> The implementation of InputStream.readAllBytes() can be modified to be in
> general faster and to require less memory. The current implementation starts
> with an internal buffer of size 8192 bytes and doubles the size of the buffer
> each time more capacity is required resulting in a geometric progression in
> the buffer size. At the end if the buffer size is not exactly equal to the
> total number of bytes read, then another allocation equal to the total number
> read is performed. The amount of memory N required can be calculated for
> initial buffer size B and total bytes read L as
>
> M for L == B*(2^n)
> N =
> M + L for L != B*(2^n)
>
> where M = B*(2^(n + 1) - 1) and n = ceil(log2(L/B)).
>
> An alternative implementation is not to increase the size of the internal
> buffer each time more capacity is needed, but to instead maintain a list of
> buffers of up to B bytes each and gather those into an output buffer at the
> end. If there is only a single buffer in the list then gathering is not
> needed and it can be returned directly. The amount of memory N required in
> this case is
>
> B + L for L <= B
> N =
> B + 2*L for L > B
>
> A comparison of memory usage for a number of lengths L with a buffer size B
> of 8192 is:
>
> L N_old N_new
> 8191 16383 16383
> 8192 8192 16384
> 8193 32769 24578
> 16383 40959 40958
> 16384 24576 40960
> 16385 73729 40962
> 32767 90111 73726
> 32768 57344 73728
> 32769 155649 73730
> 65535 188415 139262
> 65536 122880 139264
> 65537 319489 139266
> 131071 385023 270334
> 131072 253952 270336
> 131073 647169 270338
> 262143 778239 532478
> 262144 516096 532480
> 262145 1302529 532482
>
> In general if the size of the last intermediate buffer in the old version
> does not equal the total number of bytes read, then the old version will
> require more memory thaw the new. The performance for the same set of lengths
> was measured using JMH:
>
> Benchmark (base) (offset) Mode Cnt
> Score Error Units
> BenchReadAllBytesParam.readAllBytes 8192 -1 thrpt 5
> 578064.253 ± 18955.667 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 8192 -1 thrpt 5
> 530963.634 ± 4799.923 ops/s
> BenchReadAllBytesParam.readAllBytes 8192 0 thrpt 5
> 1414243.593 ± 68520.245 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 8192 0 thrpt 5
> 532019.149 ± 7051.738 ops/s
> BenchReadAllBytesParam.readAllBytes 8192 1 thrpt 5
> 293586.365 ± 8196.939 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 8192 1 thrpt 5
> 361682.265 ± 27081.111 ops/s
> BenchReadAllBytesParam.readAllBytes 16384 -1 thrpt 5
> 216809.517 ± 4853.852 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 16384 -1 thrpt 5
> 212346.244 ± 2980.916 ops/s
> BenchReadAllBytesParam.readAllBytes 16384 0 thrpt 5
> 379757.470 ± 14524.249 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 16384 0 thrpt 5
> 218802.256 ± 4361.080 ops/s
> BenchReadAllBytesParam.readAllBytes 16384 1 thrpt 5
> 123570.712 ± 1665.661 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 16384 1 thrpt 5
> 208801.577 ± 8005.196 ops/s
> BenchReadAllBytesParam.readAllBytes 32768 -1 thrpt 5
> 93083.146 ± 1024.309 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 32768 -1 thrpt 5
> 104398.304 ± 1310.509 ops/s
> BenchReadAllBytesParam.readAllBytes 32768 0 thrpt 5
> 151104.878 ± 3461.359 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 32768 0 thrpt 5
> 104849.088 ± 3841.224 ops/s
> BenchReadAllBytesParam.readAllBytes 32768 1 thrpt 5
> 58039.827 ± 398.908 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 32768 1 thrpt 5
> 104489.685 ± 2118.496 ops/s
> BenchReadAllBytesParam.readAllBytes 65536 -1 thrpt 5
> 43144.695 ± 440.349 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 65536 -1 thrpt 5
> 55583.338 ± 2115.162 ops/s
> BenchReadAllBytesParam.readAllBytes 65536 0 thrpt 5
> 67216.536 ± 2055.057 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 65536 0 thrpt 5
> 55680.238 ± 1235.571 ops/s
> BenchReadAllBytesParam.readAllBytes 65536 1 thrpt 5
> 27908.000 ± 568.473 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 65536 1 thrpt 5
> 55938.779 ± 813.991 ops/s
> BenchReadAllBytesParam.readAllBytes 131072 -1 thrpt 5
> 20299.014 ± 568.616 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 131072 -1 thrpt 5
> 28115.036 ± 842.392 ops/s
> BenchReadAllBytesParam.readAllBytes 131072 0 thrpt 5
> 31617.475 ± 467.378 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 131072 0 thrpt 5
> 28274.289 ± 259.699 ops/s
> BenchReadAllBytesParam.readAllBytes 131072 1 thrpt 5
> 13640.406 ± 303.125 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 131072 1 thrpt 5
> 28221.298 ± 515.030 ops/s
> BenchReadAllBytesParam.readAllBytes 262144 -1 thrpt 5
> 9995.183 ± 249.368 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 262144 -1 thrpt 5
> 14043.194 ± 138.026 ops/s
> BenchReadAllBytesParam.readAllBytes 262144 0 thrpt 5
> 15309.238 ± 353.752 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 262144 0 thrpt 5
> 14048.569 ± 348.699 ops/s
> BenchReadAllBytesParam.readAllBytes 262144 1 thrpt 5
> 5940.442 ± 206.855 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 262144 1 thrpt 5
> 14055.357 ± 412.470 ops/s
>
> In each case the number of bytes read is the sum of the base and offset
> parameters. Similar behavior with respect to data length is observed for
> performance as for memory usage: if the data length is not equal to the size
> of the last intermediate buffer used, then the performance of the old version
> is in general worse than that of the new.
>
> Thanks,
>
> Brian