On Mon, Jun 12, 2023 at 6:52 PM Phil Steitz <phil.ste...@gmail.com> wrote:

>
>
> On Mon, Jun 12, 2023 at 2:14 PM Tim Allison <talli...@apache.org> wrote:
>
>> All,
>>
>>   Since the refactoring to avoid threadlocal in skipFully, we are now
>> getting errors in one of our unit tests on Tika [0].  We're still on
>> commons-io 2.11.0, but we found this problem with 2.12.0 and 2.13.0.
>>
>>    I was able to reproduce the problem outside of Tika [1].  To reproduce
>> the problem, the stream has to be wrapped in java's InflaterInputStream,
>> and it has to be run multithreaded.  The problem is easily reproducible
>> with only two threads, but there is variation on which iteration triggers
>> the problem even with the same seed (what you'd expect from a
>> multi-threading bug).
>>
>>   I don't think this is a bug in commons-io, per se, but it is mildly
>> frightening from a data reliability perspective that the combination of
>> IOUtils.skipFully on top of Java's InflaterInputStream can lead to
>> different data.
>>
>>   Is this a bug in Java, a bug in commons-io or something else?  What
>> should we do?  Thank you!
>>
>
> Interesting.  Taking a quick look at the source for both IOUtils and
> InflaterInputStream, it looks to me like the InflaterInputStream may be
> effectively assuming that it has exclusive access to the buffer that it is
> reading into.   I looked at a couple of jdk sources and they both look like
> this:
>
> public int read(byte[] b, int off, int len) throws IOException {
>         ensureOpen();
>         if (b == null) {
>             throw new NullPointerException();
>         } else if (off < 0 || len < 0 || len > b.length - off) {
>             throw new IndexOutOfBoundsException();
>         } else if (len == 0) {
>             return 0;
>         }
>         try {
>             int n;
>             while ((n = inf.inflate(b, off, len)) == 0) {
>                 if (inf.finished() || inf.needsDictionary()) {
>                     reachEOF = true;
>                     return -1;
>                 }
>                 if (inf.needsInput()) {
>                     fill();
>                 }
>             }
>             return n;
>         } catch (DataFormatException e) {
>             String s = e.getMessage();
>             throw new ZipException(s != null ? s : "Invalid ZLIB data
> format");
>         }
>     }
>
> I can see how zero-ing the byte array it is reading to could cause
> problems here.  I don't see any doc anywhere saying that either this Reader
> or the interface in general has this constraint, so maybe it is a jdk bug,
> but it is what it is and I think to be safe it would be better to add back
> the ThreadLocal on the write only bufffer in IOUtills or change it to not
> zero the array.  I don't see why the zeroing is necessary, but I don't know
> much about [io].
>

Sorry, just eliminating the zero-ing probably wont work because concurrent
writes could make buffer segments not inflatable IIUC what is going on.  So
I would just restore the protection.

>
> Phil
>
>
>
>>        Best,
>>
>>            Tim
>>
>>
>>
>>
>> [0] https://issues.apache.org/jira/browse/TIKA-4065
>> [1]
>>
>> https://github.com/tballison/commons-io/blob/TIKA-4065/src/test/java/org/apache/commons/io/IOUtilsMultithreadedTest.java
>>
>

Reply via email to