I fixed the hwp5 multithreading problem.
I looked into the tar files. In the handful I reviewed, the parser hit a
"skip the rest of the final block with x bytes" step, but fewer than x
bytes actually remained in the file. This doesn't harm extraction because
it only happens on the last block. Folks will get more exceptions, but
they will get the same content. I think this is OK on balance given the
improved safety we're getting from the skip->skipFully change in
TikaInputStream.
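
For anyone who wants the skip vs. skipFully difference spelled out, here
is a minimal sketch. It is not the actual TikaInputStream code: the class
name SkipFullySketch, the helper, and the 100/512 byte sizes are all made
up for illustration. It only shows why a short final block used to pass
silently and now surfaces as an exception.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class SkipFullySketch {

    // Hypothetical helper (an assumption, not Tika's implementation):
    // skip exactly n bytes or throw if the stream ends first.
    static void skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
                continue;
            }
            // skip() may return 0 without being at EOF, so probe with read().
            if (in.read() == -1) {
                throw new EOFException("tried to skip " + n
                        + " bytes, but " + remaining + " were missing");
            }
            remaining--;
        }
    }

    public static void main(String[] args) throws IOException {
        // Pretend the final block should have 512 bytes left but only has 100.
        byte[] shortBlock = new byte[100];

        // Plain skip() quietly skips what it can and reports the count.
        long skipped = new ByteArrayInputStream(shortBlock).skip(512);
        System.out.println("skip() skipped " + skipped + " bytes, no exception");

        // skipFully() refuses to pretend the missing bytes were there.
        try {
            skipFully(new ByteArrayInputStream(shortBlock), 512);
        } catch (EOFException e) {
            System.out.println("skipFully() threw: " + e.getMessage());
        }
    }
}
```

In both cases the bytes before the short block have already been read,
which is why the extracted content stays the same.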
We do have more exceptions in mp4, but I think that is mostly on truncated
files.
In short, I _think_ we're ready to go for 1.24.1. Please take a look at
the reports and let me know what you think.
Best,
Tim
On Tue, Apr 14, 2020 at 10:36 AM Tim Allison <[email protected]> wrote:
> All,
> We've made some important bug fixes since 1.24. I recently ran the
> regression tests locally. The reports are here:
>
>
> https://github.com/tballison/share/blob/master/tika_comparisons/tika_1_24_1_reports.tgz
>
> We're getting more exceptions with .tar on "read the rest of the
> block". I'll look into this; my initial impression is that these files are
> not truncated.
>
> We're also getting more exceptions on mp4 with 0-length records, which,
> I think, is a side effect of truncation.
>
> Let me know what else you see.
>
> Cheers,
>
> Tim
>