Re: [IO] Java's InflaterInputStream not playing well with IOUtils.skipFully

2023-06-13 Thread Tim Allison
Thank you!  I opened https://issues.apache.org/jira/browse/IO-802

On 2023/06/13 02:14:51 Phil Steitz wrote:
> On Mon, Jun 12, 2023 at 6:52 PM Phil Steitz  wrote:
> 
> >
> >
> > On Mon, Jun 12, 2023 at 2:14 PM Tim Allison  wrote:
> >
> >> All,
> >>
> >>   Since the refactoring to avoid threadlocal in skipFully, we are now
> >> getting errors in one of our unit tests on Tika [0].  We're still on
> >> commons-io 2.11.0, but we found this problem with 2.12.0 and 2.13.0.
> >>
> >>I was able to reproduce the problem outside of Tika [1].  To reproduce
> >> the problem, the stream has to be wrapped in java's InflaterInputStream,
> >> and it has to be run multithreaded.  The problem is easily reproducible
> >> with only two threads, but there is variation on which iteration triggers
> >> the problem even with the same seed (what you'd expect from a
> >> multi-threading bug).
> >>
> >>   I don't think this is a bug in commons-io, per se, but it is mildly
> >> frightening from a data reliability perspective that the combination of
> >> IOUtils.skipFully on top of Java's InflaterInputStream can lead to
> >> different data.
> >>
> >>   Is this a bug in Java, a bug in commons-io or something else?  What
> >> should we do?  Thank you!
> >>
> >
> > Interesting.  Taking a quick look at the source for both IOUtils and
> > InflaterInputStream, it looks to me like the InflaterInputStream may be
> > effectively assuming that it has exclusive access to the buffer that it is
> > reading into.   I looked at a couple of jdk sources and they both look like
> > this:
> >
> > public int read(byte[] b, int off, int len) throws IOException {
> > ensureOpen();
> > if (b == null) {
> > throw new NullPointerException();
> > } else if (off < 0 || len < 0 || len > b.length - off) {
> > throw new IndexOutOfBoundsException();
> > } else if (len == 0) {
> > return 0;
> > }
> > try {
> > int n;
> > while ((n = inf.inflate(b, off, len)) == 0) {
> > if (inf.finished() || inf.needsDictionary()) {
> > reachEOF = true;
> > return -1;
> > }
> > if (inf.needsInput()) {
> > fill();
> > }
> > }
> > return n;
> > } catch (DataFormatException e) {
> > String s = e.getMessage();
> > throw new ZipException(s != null ? s : "Invalid ZLIB data
> > format");
> > }
> > }
> >
> > I can see how zero-ing the byte array it is reading to could cause
> > problems here.  I don't see any doc anywhere saying that either this Reader
> > or the interface in general has this constraint, so maybe it is a jdk bug,
> > but it is what it is and I think to be safe it would be better to add back
> > the ThreadLocal on the write only bufffer in IOUtills or change it to not
> > zero the array.  I don't see why the zeroing is necessary, but I don't know
> > much about [io].
> >
> 
> Sorry, just eliminating the zero-ing probably wont work because concurrent
> writes could make buffer segments not inflatable IIUC what is going on.  So
> I would just restore the protection.
> 
> >
> > Phil
> >
> >
> >
> >>Best,
> >>
> >>Tim
> >>
> >>
> >>
> >>
> >> [0] https://issues.apache.org/jira/browse/TIKA-4065
> >> [1]
> >>
> >> https://github.com/tballison/commons-io/blob/TIKA-4065/src/test/java/org/apache/commons/io/IOUtilsMultithreadedTest.java
> >>
> >
> 

-
To unsubscribe, e-mail: user-unsubscr...@commons.apache.org
For additional commands, e-mail: user-h...@commons.apache.org



Re: [IO] Java's InflaterInputStream not playing well with IOUtils.skipFully

2023-06-12 Thread Phil Steitz
On Mon, Jun 12, 2023 at 6:52 PM Phil Steitz  wrote:

>
>
> On Mon, Jun 12, 2023 at 2:14 PM Tim Allison  wrote:
>
>> All,
>>
>>   Since the refactoring to avoid threadlocal in skipFully, we are now
>> getting errors in one of our unit tests on Tika [0].  We're still on
>> commons-io 2.11.0, but we found this problem with 2.12.0 and 2.13.0.
>>
>>I was able to reproduce the problem outside of Tika [1].  To reproduce
>> the problem, the stream has to be wrapped in java's InflaterInputStream,
>> and it has to be run multithreaded.  The problem is easily reproducible
>> with only two threads, but there is variation on which iteration triggers
>> the problem even with the same seed (what you'd expect from a
>> multi-threading bug).
>>
>>   I don't think this is a bug in commons-io, per se, but it is mildly
>> frightening from a data reliability perspective that the combination of
>> IOUtils.skipFully on top of Java's InflaterInputStream can lead to
>> different data.
>>
>>   Is this a bug in Java, a bug in commons-io or something else?  What
>> should we do?  Thank you!
>>
>
> Interesting.  Taking a quick look at the source for both IOUtils and
> InflaterInputStream, it looks to me like the InflaterInputStream may be
> effectively assuming that it has exclusive access to the buffer that it is
> reading into.   I looked at a couple of jdk sources and they both look like
> this:
>
> public int read(byte[] b, int off, int len) throws IOException {
> ensureOpen();
> if (b == null) {
> throw new NullPointerException();
> } else if (off < 0 || len < 0 || len > b.length - off) {
> throw new IndexOutOfBoundsException();
> } else if (len == 0) {
> return 0;
> }
> try {
> int n;
> while ((n = inf.inflate(b, off, len)) == 0) {
> if (inf.finished() || inf.needsDictionary()) {
> reachEOF = true;
> return -1;
> }
> if (inf.needsInput()) {
> fill();
> }
> }
> return n;
> } catch (DataFormatException e) {
> String s = e.getMessage();
> throw new ZipException(s != null ? s : "Invalid ZLIB data
> format");
> }
> }
>
> I can see how zero-ing the byte array it is reading to could cause
> problems here.  I don't see any doc anywhere saying that either this Reader
> or the interface in general has this constraint, so maybe it is a jdk bug,
> but it is what it is and I think to be safe it would be better to add back
> the ThreadLocal on the write only bufffer in IOUtills or change it to not
> zero the array.  I don't see why the zeroing is necessary, but I don't know
> much about [io].
>

Sorry, just eliminating the zero-ing probably wont work because concurrent
writes could make buffer segments not inflatable IIUC what is going on.  So
I would just restore the protection.

>
> Phil
>
>
>
>>Best,
>>
>>Tim
>>
>>
>>
>>
>> [0] https://issues.apache.org/jira/browse/TIKA-4065
>> [1]
>>
>> https://github.com/tballison/commons-io/blob/TIKA-4065/src/test/java/org/apache/commons/io/IOUtilsMultithreadedTest.java
>>
>


Re: [IO] Java's InflaterInputStream not playing well with IOUtils.skipFully

2023-06-12 Thread Phil Steitz
On Mon, Jun 12, 2023 at 2:14 PM Tim Allison  wrote:

> All,
>
>   Since the refactoring to avoid threadlocal in skipFully, we are now
> getting errors in one of our unit tests on Tika [0].  We're still on
> commons-io 2.11.0, but we found this problem with 2.12.0 and 2.13.0.
>
>I was able to reproduce the problem outside of Tika [1].  To reproduce
> the problem, the stream has to be wrapped in java's InflaterInputStream,
> and it has to be run multithreaded.  The problem is easily reproducible
> with only two threads, but there is variation on which iteration triggers
> the problem even with the same seed (what you'd expect from a
> multi-threading bug).
>
>   I don't think this is a bug in commons-io, per se, but it is mildly
> frightening from a data reliability perspective that the combination of
> IOUtils.skipFully on top of Java's InflaterInputStream can lead to
> different data.
>
>   Is this a bug in Java, a bug in commons-io or something else?  What
> should we do?  Thank you!
>

Interesting.  Taking a quick look at the source for both IOUtils and
InflaterInputStream, it looks to me like the InflaterInputStream may be
effectively assuming that it has exclusive access to the buffer that it is
reading into.   I looked at a couple of jdk sources and they both look like
this:

public int read(byte[] b, int off, int len) throws IOException {
ensureOpen();
if (b == null) {
throw new NullPointerException();
} else if (off < 0 || len < 0 || len > b.length - off) {
throw new IndexOutOfBoundsException();
} else if (len == 0) {
return 0;
}
try {
int n;
while ((n = inf.inflate(b, off, len)) == 0) {
if (inf.finished() || inf.needsDictionary()) {
reachEOF = true;
return -1;
}
if (inf.needsInput()) {
fill();
}
}
return n;
} catch (DataFormatException e) {
String s = e.getMessage();
throw new ZipException(s != null ? s : "Invalid ZLIB data
format");
}
}

I can see how zero-ing the byte array it is reading to could cause problems
here.  I don't see any doc anywhere saying that either this Reader or the
interface in general has this constraint, so maybe it is a jdk bug, but it
is what it is and I think to be safe it would be better to add back the
ThreadLocal on the write only bufffer in IOUtills or change it to not zero
the array.  I don't see why the zeroing is necessary, but I don't know much
about [io].

Phil



>Best,
>
>Tim
>
>
>
>
> [0] https://issues.apache.org/jira/browse/TIKA-4065
> [1]
>
> https://github.com/tballison/commons-io/blob/TIKA-4065/src/test/java/org/apache/commons/io/IOUtilsMultithreadedTest.java
>


[IO] Java's InflaterInputStream not playing well with IOUtils.skipFully

2023-06-12 Thread Tim Allison
All,

  Since the refactoring to avoid threadlocal in skipFully, we are now
getting errors in one of our unit tests on Tika [0].  We're still on
commons-io 2.11.0, but we found this problem with 2.12.0 and 2.13.0.

   I was able to reproduce the problem outside of Tika [1].  To reproduce
the problem, the stream has to be wrapped in java's InflaterInputStream,
and it has to be run multithreaded.  The problem is easily reproducible
with only two threads, but there is variation on which iteration triggers
the problem even with the same seed (what you'd expect from a
multi-threading bug).

  I don't think this is a bug in commons-io, per se, but it is mildly
frightening from a data reliability perspective that the combination of
IOUtils.skipFully on top of Java's InflaterInputStream can lead to
different data.

  Is this a bug in Java, a bug in commons-io or something else?  What
should we do?  Thank you!

   Best,

   Tim




[0] https://issues.apache.org/jira/browse/TIKA-4065
[1]
https://github.com/tballison/commons-io/blob/TIKA-4065/src/test/java/org/apache/commons/io/IOUtilsMultithreadedTest.java