Hello Tim,

Could you please start another "B" batch + eval? I think we've fixed most of the issues, maybe all of them.

Thanks

Tilman

On 09.04.2021 at 20:11, Tim Allison wrote:
Apologies for the delay...

Reports are here:
https://corpora.tika.apache.org/base/reports/pdfbox-3.x-snapshot-reports.tgz

I added two new reports: new_catastrophic_exceptions_in_b and
fixed_catastrophic_exceptions_in_b.  The former shows which files had
a missing or 0-byte extract in B but not in A.  The latter shows the
opposite.  We get missing or 0-byte extracts when the app crashes
(a timeout, an OOM, or another fatal crash).  Because the run is
multithreaded, every file that is being parsed at the moment of a
catastrophic event ends up with a 0-byte or missing extract.  So some
of the files listed there are likely fine.

I ran the comparison before Tilman's fix this morning for the infinite
loop.  Note that the infinite loop surfaced as a regular IOException,
because TikaInputStream detected it from too many EOFs...so it did not
cause catastrophic problems.

Let me know if you have questions.  I haven't looked in great detail yet...

There's every chance that I need to make some more changes on the Tika side. :D

Cheers and happy 3.x!

Best,

       Tim

On Wed, Apr 7, 2021 at 9:23 AM Tim Allison wrote:
LOL...  K.  I'll build locally with the PDFBOX-5153 fix and kick it
off today or tomorrow.

On Wed, Apr 7, 2021 at 1:40 AM Tilman Hausherr wrote:
Yes, it would be useful, and no, I haven't done it. I'm optimistic about
the results despite PDFBOX-5153.

Tilman

On 06.04.2021 at 17:22, Tim Allison wrote:
Hi All,

    Would it be useful for me to run regression tests comparing 2.x with
3.0.0-RC1 now, or should I wait?  Or has someone already done this?

    See https://issues.apache.org/jira/browse/TIKA-3347 for integration
with Tika.  Many thanks!

        Cheers,

             Tim

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
