On Mon, Mar 21, 2022 at 8:19 PM Alan Bateman <alan.bate...@oracle.com> wrote: >
Hi Alan, Thanks for looking at this issue. Please find my answers to your questions inline: > On 04/03/2022 10:04, Volker Simonis wrote: > > : > > > > 1. Relax the specification of `InflaterInputStream::read(..)` and > > specifically note in the API documentation that a call to > > `InflaterInputStream::read(byte[] b, int off, int len)` may write more > > than *k* characters into `b` where *k* is the returned number of > > inflated bytes. Notice that there's a precedence for this approach in > > `java.io.ByteArrayInputStream::read(..)` which "*unlike the overridden > > method of InputStream, returns -1 instead of zero if the end of the > > stream has been reached and len==0*". > > > On the surface this is probably okay. It wouldn't really be relaxing the > specification, rather than it would say for a return value n, the bytes > in b[n] to b[len-1] are undefined as the read operate may have clobbered > their original values. Yes, that's exactly what I'd like to propose. > The risk is that it becomes a performance "trick" > to provide a larger buffer. A larger buffer is always beneficial because it saves us from multiple native down-calls to zlib's inflate function. That's already true today, with the current implementation, where ideally the whole inflated content fits into the output buffer such that it can be inflated by a single call to inflate(). The issue I'd like to solve here is really a corner case for the case where the size of the inflated content is `m` but the buffer size is `n` with `n < m`. In that case we need `(m + (n-1)) / n)` calls to inflate and if `(m % n) > 0` the last call will write fewer than ´n` content bytes (i.e. `m % n` bytes) to the output buffer. Only during the last call to inflate() some of the remaining bytes in the buffer after the `m % n` content bytes can be clobbered. Calling that out in the specification won't change the performance benefits of a larger output buffer except for the writing of the 16 (i.e. the size of a vector store) very last inflated bytes. This last write could be potentially optimized with a 16-byte vector store instruction if the output buffer was larger than the deflated content `m`. But that's really a corner case because in reality the output buffer is usually smaller than `m` and the size of the inflated content `m` is usually much, much larger than 16 (so optimizing the output of the last 16 bytes won't have any measurable effects). > That said, I think the main thing we need to > satisfy ourselves on is security. One part of that is whether anything > can be gleaned by reading from the byte array during or after an > inflate. I can't see any security risks here. The optimized vector store instructions are only used to copy repeated content from the history buffer (aka "sliding window") to the output (i.e. inflate LZ77 compressed data). So the "clobber" bytes will always only contain bytes which have already been written before. See [1, 2] for details. [1] https://www.zlib.net/manual.html [2] https://www.euccas.me/zlib/#deflate_sliding_window > The other part is how the implementation behaves when there is > a tampering of the array contents during an inflate. I don't really understand this concern? Do you mean what happens if another thread is changing the content of the output buffer during an inflate? I think such a use case has never been well-defined and amending the specification won't change anything for such a situation. As I've explained before, the extra clobber bytes won't contain any data which is required by the inflate algorithm for its correct operation. It just contains some additional data from the LZ77 history buffer which can be safely overwritten. The optimized inflate implementations are actually constantly doing this until they reach the end of the output buffer or the end of the deflated data (see [3] for a simple example). [3] https://github.com/simonis/zlib-chromium#inflater-improvements-in-zlib-chromium > > -Alan If you don't mind I'll send out a pull request with a draft of the amended API-doc and open a CSR for this issue. Thank you and best regards, Volker