I'm done and have created two issues. I didn't report one difference
because the file fails in a different way in each run (0 pages in one
run and an exception in the other).
I haven't really understood the "new_catastrophic_exceptions_in_b" file.
I can extract text from the files I tried. However, the first file,
bug_trackers/libvips/libvips-LINK-1721-0.pdf, has problems rendering when
memory is set to -Xmx8g. There are no problems when it is set to -Xmx4g.
Tilman
On 16.04.2021 at 23:16, Tim Allison wrote:
Hi All,
I reran 2.0.23 with our added handling for flash files against the
3.0.0-SNAPSHOT that I ran yesterday. The diffs look almost the same
as the reports I created yesterday, so I think those are accurate:
https://corpora.tika.apache.org/base/reports/pdfbox-2.0.23-richmedia.tgz
There are a handful of files that "lose" attachments going into
3.0.0-SNAPSHOT because I haven't added the richmedia handling in our
3.0.0 branch.
Best,
Tim
On Thu, Apr 15, 2021 at 7:15 PM Tim Allison <[email protected]> wrote:
Diffs look suspiciously small...I may have to rerun the analyses.
On Thu, Apr 15, 2021 at 7:08 PM Tim Allison <[email protected]> wrote:
Latest here:
https://corpora.tika.apache.org/base/reports/pdfbox-3.0.0-20210415_reports.tgz
I haven't had a chance to look yet. Will dig in tomorrow.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
---------------------------------------------------------------------