Re: [IO] Java's InflaterInputStream not playing well with IOUtils.skipFully
Thank you! I opened https://issues.apache.org/jira/browse/IO-802 On 2023/06/13 02:14:51 Phil Steitz wrote: > On Mon, Jun 12, 2023 at 6:52 PM Phil Steitz wrote: > > > > > > > On Mon, Jun 12, 2023 at 2:14 PM Tim Allison wrote: > > > >> All, > >> > >> Since the refactoring to avoid threadlocal in skipFully, we are now > >> getting errors in one of our unit tests on Tika [0]. We're still on > >> commons-io 2.11.0, but we found this problem with 2.12.0 and 2.13.0. > >> > >>I was able to reproduce the problem outside of Tika [1]. To reproduce > >> the problem, the stream has to be wrapped in java's InflaterInputStream, > >> and it has to be run multithreaded. The problem is easily reproducible > >> with only two threads, but there is variation on which iteration triggers > >> the problem even with the same seed (what you'd expect from a > >> multi-threading bug). > >> > >> I don't think this is a bug in commons-io, per se, but it is mildly > >> frightening from a data reliability perspective that the combination of > >> IOUtils.skipFully on top of Java's InflaterInputStream can lead to > >> different data. > >> > >> Is this a bug in Java, a bug in commons-io or something else? What > >> should we do? Thank you! > >> > > > > Interesting. Taking a quick look at the source for both IOUtils and > > InflaterInputStream, it looks to me like the InflaterInputStream may be > > effectively assuming that it has exclusive access to the buffer that it is > > reading into. I looked at a couple of jdk sources and they both look like > > this: > > > > public int read(byte[] b, int off, int len) throws IOException { > > ensureOpen(); > > if (b == null) { > > throw new NullPointerException(); > > } else if (off < 0 || len < 0 || len > b.length - off) { > > throw new IndexOutOfBoundsException(); > > } else if (len == 0) { > > return 0; > > } > > try { > > int n; > > while ((n = inf.inflate(b, off, len)) == 0) { > > if (inf.finished() || inf.needsDictionary()) { > > reachEOF = true; > > return -1; > > } > > if (inf.needsInput()) { > > fill(); > > } > > } > > return n; > > } catch (DataFormatException e) { > > String s = e.getMessage(); > > throw new ZipException(s != null ? s : "Invalid ZLIB data > > format"); > > } > > } > > > > I can see how zero-ing the byte array it is reading to could cause > > problems here. I don't see any doc anywhere saying that either this Reader > > or the interface in general has this constraint, so maybe it is a jdk bug, > > but it is what it is and I think to be safe it would be better to add back > > the ThreadLocal on the write only bufffer in IOUtills or change it to not > > zero the array. I don't see why the zeroing is necessary, but I don't know > > much about [io]. > > > > Sorry, just eliminating the zero-ing probably wont work because concurrent > writes could make buffer segments not inflatable IIUC what is going on. So > I would just restore the protection. > > > > > Phil > > > > > > > >>Best, > >> > >>Tim > >> > >> > >> > >> > >> [0] https://issues.apache.org/jira/browse/TIKA-4065 > >> [1] > >> > >> https://github.com/tballison/commons-io/blob/TIKA-4065/src/test/java/org/apache/commons/io/IOUtilsMultithreadedTest.java > >> > > > - To unsubscribe, e-mail: user-unsubscr...@commons.apache.org For additional commands, e-mail: user-h...@commons.apache.org
Re: [IO] Java's InflaterInputStream not playing well with IOUtils.skipFully
On Mon, Jun 12, 2023 at 6:52 PM Phil Steitz wrote: > > > On Mon, Jun 12, 2023 at 2:14 PM Tim Allison wrote: > >> All, >> >> Since the refactoring to avoid threadlocal in skipFully, we are now >> getting errors in one of our unit tests on Tika [0]. We're still on >> commons-io 2.11.0, but we found this problem with 2.12.0 and 2.13.0. >> >>I was able to reproduce the problem outside of Tika [1]. To reproduce >> the problem, the stream has to be wrapped in java's InflaterInputStream, >> and it has to be run multithreaded. The problem is easily reproducible >> with only two threads, but there is variation on which iteration triggers >> the problem even with the same seed (what you'd expect from a >> multi-threading bug). >> >> I don't think this is a bug in commons-io, per se, but it is mildly >> frightening from a data reliability perspective that the combination of >> IOUtils.skipFully on top of Java's InflaterInputStream can lead to >> different data. >> >> Is this a bug in Java, a bug in commons-io or something else? What >> should we do? Thank you! >> > > Interesting. Taking a quick look at the source for both IOUtils and > InflaterInputStream, it looks to me like the InflaterInputStream may be > effectively assuming that it has exclusive access to the buffer that it is > reading into. I looked at a couple of jdk sources and they both look like > this: > > public int read(byte[] b, int off, int len) throws IOException { > ensureOpen(); > if (b == null) { > throw new NullPointerException(); > } else if (off < 0 || len < 0 || len > b.length - off) { > throw new IndexOutOfBoundsException(); > } else if (len == 0) { > return 0; > } > try { > int n; > while ((n = inf.inflate(b, off, len)) == 0) { > if (inf.finished() || inf.needsDictionary()) { > reachEOF = true; > return -1; > } > if (inf.needsInput()) { > fill(); > } > } > return n; > } catch (DataFormatException e) { > String s = e.getMessage(); > throw new ZipException(s != null ? s : "Invalid ZLIB data > format"); > } > } > > I can see how zero-ing the byte array it is reading to could cause > problems here. I don't see any doc anywhere saying that either this Reader > or the interface in general has this constraint, so maybe it is a jdk bug, > but it is what it is and I think to be safe it would be better to add back > the ThreadLocal on the write only bufffer in IOUtills or change it to not > zero the array. I don't see why the zeroing is necessary, but I don't know > much about [io]. > Sorry, just eliminating the zero-ing probably wont work because concurrent writes could make buffer segments not inflatable IIUC what is going on. So I would just restore the protection. > > Phil > > > >>Best, >> >>Tim >> >> >> >> >> [0] https://issues.apache.org/jira/browse/TIKA-4065 >> [1] >> >> https://github.com/tballison/commons-io/blob/TIKA-4065/src/test/java/org/apache/commons/io/IOUtilsMultithreadedTest.java >> >
Re: [IO] Java's InflaterInputStream not playing well with IOUtils.skipFully
On Mon, Jun 12, 2023 at 2:14 PM Tim Allison wrote: > All, > > Since the refactoring to avoid threadlocal in skipFully, we are now > getting errors in one of our unit tests on Tika [0]. We're still on > commons-io 2.11.0, but we found this problem with 2.12.0 and 2.13.0. > >I was able to reproduce the problem outside of Tika [1]. To reproduce > the problem, the stream has to be wrapped in java's InflaterInputStream, > and it has to be run multithreaded. The problem is easily reproducible > with only two threads, but there is variation on which iteration triggers > the problem even with the same seed (what you'd expect from a > multi-threading bug). > > I don't think this is a bug in commons-io, per se, but it is mildly > frightening from a data reliability perspective that the combination of > IOUtils.skipFully on top of Java's InflaterInputStream can lead to > different data. > > Is this a bug in Java, a bug in commons-io or something else? What > should we do? Thank you! > Interesting. Taking a quick look at the source for both IOUtils and InflaterInputStream, it looks to me like the InflaterInputStream may be effectively assuming that it has exclusive access to the buffer that it is reading into. I looked at a couple of jdk sources and they both look like this: public int read(byte[] b, int off, int len) throws IOException { ensureOpen(); if (b == null) { throw new NullPointerException(); } else if (off < 0 || len < 0 || len > b.length - off) { throw new IndexOutOfBoundsException(); } else if (len == 0) { return 0; } try { int n; while ((n = inf.inflate(b, off, len)) == 0) { if (inf.finished() || inf.needsDictionary()) { reachEOF = true; return -1; } if (inf.needsInput()) { fill(); } } return n; } catch (DataFormatException e) { String s = e.getMessage(); throw new ZipException(s != null ? s : "Invalid ZLIB data format"); } } I can see how zero-ing the byte array it is reading to could cause problems here. I don't see any doc anywhere saying that either this Reader or the interface in general has this constraint, so maybe it is a jdk bug, but it is what it is and I think to be safe it would be better to add back the ThreadLocal on the write only bufffer in IOUtills or change it to not zero the array. I don't see why the zeroing is necessary, but I don't know much about [io]. Phil >Best, > >Tim > > > > > [0] https://issues.apache.org/jira/browse/TIKA-4065 > [1] > > https://github.com/tballison/commons-io/blob/TIKA-4065/src/test/java/org/apache/commons/io/IOUtilsMultithreadedTest.java >
[IO] Java's InflaterInputStream not playing well with IOUtils.skipFully
All, Since the refactoring to avoid threadlocal in skipFully, we are now getting errors in one of our unit tests on Tika [0]. We're still on commons-io 2.11.0, but we found this problem with 2.12.0 and 2.13.0. I was able to reproduce the problem outside of Tika [1]. To reproduce the problem, the stream has to be wrapped in java's InflaterInputStream, and it has to be run multithreaded. The problem is easily reproducible with only two threads, but there is variation on which iteration triggers the problem even with the same seed (what you'd expect from a multi-threading bug). I don't think this is a bug in commons-io, per se, but it is mildly frightening from a data reliability perspective that the combination of IOUtils.skipFully on top of Java's InflaterInputStream can lead to different data. Is this a bug in Java, a bug in commons-io or something else? What should we do? Thank you! Best, Tim [0] https://issues.apache.org/jira/browse/TIKA-4065 [1] https://github.com/tballison/commons-io/blob/TIKA-4065/src/test/java/org/apache/commons/io/IOUtilsMultithreadedTest.java