RFR 8193832: Performance of InputStream.readAllBytes() could be improved

Brian Burkhalter Tue, 19 Dec 2017 12:03:42 -0800

https://bugs.openjdk.java.net/browse/JDK-8193832
http://cr.openjdk.java.net/~bpb/8193832/webrev.00/


The implementation of InputStream.readAllBytes() can be modified to be in 
general faster and to require less memory. The current implementation starts 
with an internal buffer of size 8192 bytes and doubles the size of the buffer 
each time more capacity is required resulting in a geometric progression in the 
buffer size. At the end if the buffer size is not exactly equal to the total 
number of bytes read, then another allocation equal to the total number read is 
performed. The amount of memory N required can be calculated for initial buffer 
size B and total bytes read L as

        M for L == B*(2^n)
N =
        M + L for L != B*(2^n)

where M = B*(2^(n + 1) - 1) and n = ceil(log2(L/B)).

An alternative implementation is not to increase the size of the internal 
buffer each time more capacity is needed, but to instead maintain a list of 
buffers of up to B bytes each and gather those into an output buffer at the 
end. If there is only a single buffer in the list then gathering is not needed 
and it can be returned directly. The amount of memory N required in this case is

        B + L for L <= B
N =
        B + 2*L for L > B

A comparison of memory usage for a number of lengths L with a buffer size B of 
8192 is:

L          N_old      N_new       
8191       16383      16383     
8192       8192       16384     
8193       32769      24578     
16383      40959      40958     
16384      24576      40960     
16385      73729      40962     
32767      90111      73726     
32768      57344      73728     
32769      155649     73730     
65535      188415     139262    
65536      122880     139264    
65537      319489     139266    
131071     385023     270334    
131072     253952     270336    
131073     647169     270338    
262143     778239     532478    
262144     516096     532480    
262145     1302529    532482

In general if the size of the last intermediate buffer in the old version does 
not equal the total number of bytes read, then the old version will require 
more memory thaw the new. The performance for the same set of lengths was 
measured using JMH:

Benchmark                                     (base)  (offset)   Mode  Cnt      
  Score       Error  Units
BenchReadAllBytesParam.readAllBytes             8192        -1  thrpt    5   
578064.253 ± 18955.667  ops/s
BenchReadAllBytesParam.readAllBytesArrayList    8192        -1  thrpt    5   
530963.634 ±  4799.923  ops/s
BenchReadAllBytesParam.readAllBytes             8192         0  thrpt    5  
1414243.593 ± 68520.245  ops/s
BenchReadAllBytesParam.readAllBytesArrayList    8192         0  thrpt    5   
532019.149 ±  7051.738  ops/s
BenchReadAllBytesParam.readAllBytes             8192         1  thrpt    5   
293586.365 ±  8196.939  ops/s
BenchReadAllBytesParam.readAllBytesArrayList    8192         1  thrpt    5   
361682.265 ± 27081.111  ops/s
BenchReadAllBytesParam.readAllBytes            16384        -1  thrpt    5   
216809.517 ±  4853.852  ops/s
BenchReadAllBytesParam.readAllBytesArrayList   16384        -1  thrpt    5   
212346.244 ±  2980.916  ops/s
BenchReadAllBytesParam.readAllBytes            16384         0  thrpt    5   
379757.470 ± 14524.249  ops/s
BenchReadAllBytesParam.readAllBytesArrayList   16384         0  thrpt    5   
218802.256 ±  4361.080  ops/s
BenchReadAllBytesParam.readAllBytes            16384         1  thrpt    5   
123570.712 ±  1665.661  ops/s
BenchReadAllBytesParam.readAllBytesArrayList   16384         1  thrpt    5   
208801.577 ±  8005.196  ops/s
BenchReadAllBytesParam.readAllBytes            32768        -1  thrpt    5    
93083.146 ±  1024.309  ops/s
BenchReadAllBytesParam.readAllBytesArrayList   32768        -1  thrpt    5   
104398.304 ±  1310.509  ops/s
BenchReadAllBytesParam.readAllBytes            32768         0  thrpt    5   
151104.878 ±  3461.359  ops/s
BenchReadAllBytesParam.readAllBytesArrayList   32768         0  thrpt    5   
104849.088 ±  3841.224  ops/s
BenchReadAllBytesParam.readAllBytes            32768         1  thrpt    5    
58039.827 ±   398.908  ops/s
BenchReadAllBytesParam.readAllBytesArrayList   32768         1  thrpt    5   
104489.685 ±  2118.496  ops/s
BenchReadAllBytesParam.readAllBytes            65536        -1  thrpt    5    
43144.695 ±   440.349  ops/s
BenchReadAllBytesParam.readAllBytesArrayList   65536        -1  thrpt    5    
55583.338 ±  2115.162  ops/s
BenchReadAllBytesParam.readAllBytes            65536         0  thrpt    5    
67216.536 ±  2055.057  ops/s
BenchReadAllBytesParam.readAllBytesArrayList   65536         0  thrpt    5    
55680.238 ±  1235.571  ops/s
BenchReadAllBytesParam.readAllBytes            65536         1  thrpt    5    
27908.000 ±   568.473  ops/s
BenchReadAllBytesParam.readAllBytesArrayList   65536         1  thrpt    5    
55938.779 ±   813.991  ops/s
BenchReadAllBytesParam.readAllBytes           131072        -1  thrpt    5    
20299.014 ±   568.616  ops/s
BenchReadAllBytesParam.readAllBytesArrayList  131072        -1  thrpt    5    
28115.036 ±   842.392  ops/s
BenchReadAllBytesParam.readAllBytes           131072         0  thrpt    5    
31617.475 ±   467.378  ops/s
BenchReadAllBytesParam.readAllBytesArrayList  131072         0  thrpt    5    
28274.289 ±   259.699  ops/s
BenchReadAllBytesParam.readAllBytes           131072         1  thrpt    5    
13640.406 ±   303.125  ops/s
BenchReadAllBytesParam.readAllBytesArrayList  131072         1  thrpt    5    
28221.298 ±   515.030  ops/s
BenchReadAllBytesParam.readAllBytes           262144        -1  thrpt    5     
9995.183 ±   249.368  ops/s
BenchReadAllBytesParam.readAllBytesArrayList  262144        -1  thrpt    5    
14043.194 ±   138.026  ops/s
BenchReadAllBytesParam.readAllBytes           262144         0  thrpt    5    
15309.238 ±   353.752  ops/s
BenchReadAllBytesParam.readAllBytesArrayList  262144         0  thrpt    5    
14048.569 ±   348.699  ops/s
BenchReadAllBytesParam.readAllBytes           262144         1  thrpt    5     
5940.442 ±   206.855  ops/s
BenchReadAllBytesParam.readAllBytesArrayList  262144         1  thrpt    5    
14055.357 ±   412.470  ops/s

In each case the number of bytes read is the sum of the base and offset 
parameters. Similar behavior with respect to data length is observed for 
performance as for memory usage: if the data length is not equal to the size of 
the last intermediate buffer used, then the performance of the old version is 
in general worse than that of the new.

Thanks,

Brian

RFR 8193832: Performance of InputStream.readAllBytes() could be improved

Reply via email to