I'm done and have created two issues. I didn't report one difference because the file fails in different ways in the two runs (0 pages in one run and an exception in the other).

I haven't really understood the "new_catastrophic_exceptions_in_b" file: I can extract text from the files I tried. However, the first file, bug_trackers/libvips/libvips-LINK-1721-0.pdf, has problems rendering when the maximum heap size is set to -Xmx8g, but none when it is set to -Xmx4g.

Tilman

On 16.04.2021 at 23:16, Tim Allison wrote:
Hi All,
  I reran 2.0.23 with our added handling for flash files against the
3.0.0-SNAPSHOT that I ran yesterday.  The diffs look almost the same
as the reports I created yesterday, so I think those are accurate:
https://corpora.tika.apache.org/base/reports/pdfbox-2.0.23-richmedia.tgz

There are a handful of files that "lose" attachments going into
3.0.0-SNAPSHOT because I haven't added the richmedia handling in our
3.0.0 branch.

      Best,

            Tim

On Thu, Apr 15, 2021 at 7:15 PM Tim Allison <[email protected]> wrote:
Diffs look suspiciously small...I may have to rerun the analyses.

On Thu, Apr 15, 2021 at 7:08 PM Tim Allison <[email protected]> wrote:
Latest here: 
https://corpora.tika.apache.org/base/reports/pdfbox-3.0.0-20210415_reports.tgz

I haven't had a chance to look yet.  Will dig in tomorrow.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



