On Mon, Mar 21, 2022 at 8:19 PM Alan Bateman <alan.bate...@oracle.com> wrote:
>

Hi Alan,

Thanks for looking at this issue. Please find my answers to your
questions inline:

> On 04/03/2022 10:04, Volker Simonis wrote:
> > :
> >
> > 1. Relax the specification of `InflaterInputStream::read(..)` and
> > specifically note in the API documentation that a call to
> > `InflaterInputStream::read(byte[] b, int off, int len)` may write more
> > than *k* characters into `b` where *k* is the returned number of
> > inflated bytes. Notice that there's a precedence for this approach in
> > `java.io.ByteArrayInputStream::read(..)` which "*unlike the overridden
> > method of InputStream, returns -1 instead of zero if the end of the
> > stream has been reached and len==0*".
> >
> On the surface this is probably okay. It wouldn't really be relaxing the
> specification, rather than it would say for a return value n, the bytes
> in b[n] to b[len-1] are undefined as the read operate may have clobbered
> their original values.

Yes, that's exactly what I'd like to propose.

> The risk is that it becomes a performance "trick"
> to provide a larger buffer.

A larger buffer is always beneficial because it saves us from multiple
native down-calls to zlib's inflate function. That's already true
today, with the current implementation, where ideally the whole
inflated content fits into the output buffer such that it can be
inflated by a single call to inflate().

The issue I'd like to solve here is really a corner case for the case
where the size of the inflated content is `m` but the buffer size is
`n` with `n < m`. In that case we need `(m + (n-1)) / n)` calls to
inflate and if `(m % n) > 0` the last call will write fewer than ´n`
content bytes (i.e. `m % n` bytes) to the output buffer. Only during
the last call to inflate() some of the remaining bytes in the buffer
after the `m % n` content bytes can be clobbered. Calling that out in
the specification won't change the performance benefits of a larger
output buffer except for the writing of the 16 (i.e. the size of a
vector store) very last inflated bytes. This last write could be
potentially optimized with a 16-byte vector store instruction if the
output buffer was larger than the deflated content `m`. But that's
really a corner case because in reality the output buffer is usually
smaller than `m` and the size of the inflated content `m` is usually
much, much larger than 16 (so optimizing the output of the last 16
bytes won't have any measurable effects).

> That said, I think the main thing we need to
> satisfy ourselves on is security. One part of that is whether anything
> can be gleaned by reading from the byte array during or after an
> inflate.

I can't see any security risks here. The optimized vector store
instructions are only used to copy repeated content from the history
buffer (aka "sliding window") to the output (i.e. inflate LZ77
compressed data). So the "clobber" bytes will always only contain
bytes which have already been written before. See [1, 2] for details.

[1] https://www.zlib.net/manual.html
[2] https://www.euccas.me/zlib/#deflate_sliding_window

> The other part is how the implementation behaves when there is
> a tampering of the array contents during an inflate.

I don't really understand this concern? Do you mean what happens if
another thread is changing the content of the output buffer during an
inflate? I think such a use case has never been well-defined and
amending the specification won't change anything for such a situation.
As I've explained before, the extra clobber bytes won't contain any
data which is required by the inflate algorithm for its correct
operation. It just contains some additional data from the LZ77 history
buffer which can be safely overwritten. The optimized inflate
implementations are actually constantly doing this until they reach
the end of the output buffer or the end of the deflated data (see [3]
for a simple example).

[3] 
https://github.com/simonis/zlib-chromium#inflater-improvements-in-zlib-chromium

>
> -Alan

If you don't mind I'll send out a pull request with a draft of the
amended API-doc and  open a CSR for this issue.

Thank you and best regards,
Volker

Reply via email to