Re: RFR 8193832: Performance of InputStream.readAllBytes() could be improved

Paul Sandoz Tue, 19 Dec 2017 12:53:21 -0800

Hi,

For the case of reading 2^N bytes i believe you can avoid doing a last copy by 
checking if “n < 0" within the “nread > 0” block when “nread == 
DEAFULT_BUFFER_SIZE”. That might close the perf gap for smaller cases. You can 
also move "nread = 0” to the same block e.g.:


  var copy = (n < 0 && nread == DEAFULT_BUFFER_SIZE) ? buf : Arrays.copyOf(buf, 
nread);
  list.add(copy)
  nread = 0;


 262         byte[] output = new byte[total];
 263         int offset = 0;
 264         int numCached = list.size();
 265         for (int i = 0; i < numCached; i++) {
 266             byte[] b = list.get(i);
 267             System.arraycopy(b, 0, output, offset, b.length);
 268             offset += b.length;
 269         }

You can simplify to:

var result = new byte[total];
int offset = 0;
for (buf : list) {
  System.arraycopy(buf, 0, result, offset, buf.length);
  offset += buf.length;
}

s/list/bufs and then you can use var for the declarations at the start of the 
method.

Paul.


> On 19 Dec 2017, at 11:57, Brian Burkhalter <[email protected]> 
> wrote:
> 
> https://bugs.openjdk.java.net/browse/JDK-8193832
> http://cr.openjdk.java.net/~bpb/8193832/webrev.00/
> 
> The implementation of InputStream.readAllBytes() can be modified to be in 
> general faster and to require less memory. The current implementation starts 
> with an internal buffer of size 8192 bytes and doubles the size of the buffer 
> each time more capacity is required resulting in a geometric progression in 
> the buffer size. At the end if the buffer size is not exactly equal to the 
> total number of bytes read, then another allocation equal to the total number 
> read is performed. The amount of memory N required can be calculated for 
> initial buffer size B and total bytes read L as
> 
>       M for L == B*(2^n)
> N =
>       M + L for L != B*(2^n)
> 
> where M = B*(2^(n + 1) - 1) and n = ceil(log2(L/B)).
> 
> An alternative implementation is not to increase the size of the internal 
> buffer each time more capacity is needed, but to instead maintain a list of 
> buffers of up to B bytes each and gather those into an output buffer at the 
> end. If there is only a single buffer in the list then gathering is not 
> needed and it can be returned directly. The amount of memory N required in 
> this case is
> 
>       B + L for L <= B
> N =
>       B + 2*L for L > B
> 
> A comparison of memory usage for a number of lengths L with a buffer size B 
> of 8192 is:
> 
> L                N_old      N_new
> 8191       16383      16383
> 8192       8192       16384
> 8193       32769      24578
> 16383      40959      40958
> 16384      24576      40960
> 16385      73729      40962
> 32767      90111      73726
> 32768      57344      73728
> 32769      155649     73730
> 65535      188415     139262
> 65536      122880     139264
> 65537      319489     139266
> 131071     385023     270334
> 131072     253952     270336
> 131073     647169     270338
> 262143     778239     532478
> 262144     516096     532480
> 262145     1302529    532482
> 
> In general if the size of the last intermediate buffer in the old version 
> does not equal the total number of bytes read, then the old version will 
> require more memory thaw the new. The performance for the same set of lengths 
> was measured using JMH:
> 
> Benchmark                                     (base)  (offset)   Mode  Cnt    
>     Score       Error  Units
> BenchReadAllBytesParam.readAllBytes             8192        -1  thrpt    5   
> 578064.253 ± 18955.667  ops/s
> BenchReadAllBytesParam.readAllBytesArrayList    8192        -1  thrpt    5   
> 530963.634 ±  4799.923  ops/s
> BenchReadAllBytesParam.readAllBytes             8192         0  thrpt    5  
> 1414243.593 ± 68520.245  ops/s
> BenchReadAllBytesParam.readAllBytesArrayList    8192         0  thrpt    5   
> 532019.149 ±  7051.738  ops/s
> BenchReadAllBytesParam.readAllBytes             8192         1  thrpt    5   
> 293586.365 ±  8196.939  ops/s
> BenchReadAllBytesParam.readAllBytesArrayList    8192         1  thrpt    5   
> 361682.265 ± 27081.111  ops/s
> BenchReadAllBytesParam.readAllBytes            16384        -1  thrpt    5   
> 216809.517 ±  4853.852  ops/s
> BenchReadAllBytesParam.readAllBytesArrayList   16384        -1  thrpt    5   
> 212346.244 ±  2980.916  ops/s
> BenchReadAllBytesParam.readAllBytes            16384         0  thrpt    5   
> 379757.470 ± 14524.249  ops/s
> BenchReadAllBytesParam.readAllBytesArrayList   16384         0  thrpt    5   
> 218802.256 ±  4361.080  ops/s
> BenchReadAllBytesParam.readAllBytes            16384         1  thrpt    5   
> 123570.712 ±  1665.661  ops/s
> BenchReadAllBytesParam.readAllBytesArrayList   16384         1  thrpt    5   
> 208801.577 ±  8005.196  ops/s
> BenchReadAllBytesParam.readAllBytes            32768        -1  thrpt    5    
> 93083.146 ±  1024.309  ops/s
> BenchReadAllBytesParam.readAllBytesArrayList   32768        -1  thrpt    5   
> 104398.304 ±  1310.509  ops/s
> BenchReadAllBytesParam.readAllBytes            32768         0  thrpt    5   
> 151104.878 ±  3461.359  ops/s
> BenchReadAllBytesParam.readAllBytesArrayList   32768         0  thrpt    5   
> 104849.088 ±  3841.224  ops/s
> BenchReadAllBytesParam.readAllBytes            32768         1  thrpt    5    
> 58039.827 ±   398.908  ops/s
> BenchReadAllBytesParam.readAllBytesArrayList   32768         1  thrpt    5   
> 104489.685 ±  2118.496  ops/s
> BenchReadAllBytesParam.readAllBytes            65536        -1  thrpt    5    
> 43144.695 ±   440.349  ops/s
> BenchReadAllBytesParam.readAllBytesArrayList   65536        -1  thrpt    5    
> 55583.338 ±  2115.162  ops/s
> BenchReadAllBytesParam.readAllBytes            65536         0  thrpt    5    
> 67216.536 ±  2055.057  ops/s
> BenchReadAllBytesParam.readAllBytesArrayList   65536         0  thrpt    5    
> 55680.238 ±  1235.571  ops/s
> BenchReadAllBytesParam.readAllBytes            65536         1  thrpt    5    
> 27908.000 ±   568.473  ops/s
> BenchReadAllBytesParam.readAllBytesArrayList   65536         1  thrpt    5    
> 55938.779 ±   813.991  ops/s
> BenchReadAllBytesParam.readAllBytes           131072        -1  thrpt    5    
> 20299.014 ±   568.616  ops/s
> BenchReadAllBytesParam.readAllBytesArrayList  131072        -1  thrpt    5    
> 28115.036 ±   842.392  ops/s
> BenchReadAllBytesParam.readAllBytes           131072         0  thrpt    5    
> 31617.475 ±   467.378  ops/s
> BenchReadAllBytesParam.readAllBytesArrayList  131072         0  thrpt    5    
> 28274.289 ±   259.699  ops/s
> BenchReadAllBytesParam.readAllBytes           131072         1  thrpt    5    
> 13640.406 ±   303.125  ops/s
> BenchReadAllBytesParam.readAllBytesArrayList  131072         1  thrpt    5    
> 28221.298 ±   515.030  ops/s
> BenchReadAllBytesParam.readAllBytes           262144        -1  thrpt    5    
>  9995.183 ±   249.368  ops/s
> BenchReadAllBytesParam.readAllBytesArrayList  262144        -1  thrpt    5    
> 14043.194 ±   138.026  ops/s
> BenchReadAllBytesParam.readAllBytes           262144         0  thrpt    5    
> 15309.238 ±   353.752  ops/s
> BenchReadAllBytesParam.readAllBytesArrayList  262144         0  thrpt    5    
> 14048.569 ±   348.699  ops/s
> BenchReadAllBytesParam.readAllBytes           262144         1  thrpt    5    
>  5940.442 ±   206.855  ops/s
> BenchReadAllBytesParam.readAllBytesArrayList  262144         1  thrpt    5    
> 14055.357 ±   412.470  ops/s
> 
> In each case the number of bytes read is the sum of the base and offset 
> parameters. Similar behavior with respect to data length is observed for 
> performance as for memory usage: if the data length is not equal to the size 
> of the last intermediate buffer used, then the performance of the old version 
> is in general worse than that of the new.
> 
> Thanks,
> 
> Brian

Re: RFR 8193832: Performance of InputStream.readAllBytes() could be improved

Reply via email to