On 05.06.21 at 20:09, Tilman Hausherr wrote:
Thanks!
I created one issue (PDFBOX-5207), but I don't consider it a blocker.
The other files where column T has text have problems related to matrix
multiplication. I suspect that some parser changes now produce larger numbers
than before.
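To illustrate why larger parsed numbers can surface as matrix-multiplication trouble, here is a minimal sketch. It assumes transforms are concatenated in float precision, as with PDF's [a b c d e f] matrices written as 3x3 arrays. The class FloatMatrixOverflow and its methods are hypothetical and are not PDFBox's Matrix class. Once an operand is large enough, the product overflows to Infinity, and any later arithmetic that touches it is no longer meaningful.

```java
/**
 * Hypothetical sketch (not PDFBox's Matrix class): concatenate two PDF-style
 * affine matrices in float precision and detect overflow.
 */
public class FloatMatrixOverflow {

    // Build a PDF-style [a b c d e f] matrix as a row-major 3x3 array
    // (row-vector convention: [x y 1] x M).
    static float[][] pdfMatrix(float a, float b, float c, float d, float e, float f) {
        return new float[][] { {a, b, 0f}, {c, d, 0f}, {e, f, 1f} };
    }

    // Plain 3x3 multiply in float precision: result = a x b
    static float[][] multiply(float[][] a, float[][] b) {
        float[][] r = new float[3][3];
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                float sum = 0f;
                for (int k = 0; k < 3; k++) {
                    sum += a[i][k] * b[k][j];
                }
                r[i][j] = sum;
            }
        }
        return r;
    }

    // True if no entry has overflowed to Infinity or become NaN
    static boolean allFinite(float[][] m) {
        for (float[] row : m) {
            for (float v : row) {
                if (!Float.isFinite(v)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A scale of 1e20 is still a finite float, but squaring it (1e40)
        // exceeds Float.MAX_VALUE (about 3.4e38).
        float[][] scale = pdfMatrix(1e20f, 0f, 0f, 1e20f, 0f, 0f);
        float[][] product = multiply(scale, scale);
        System.out.println(product[0][0]);     // Infinity
        System.out.println(allFinite(product)); // false
    }
}
```

A renderer could check allFinite after concatenation and skip or clamp the operation. Whether that is the right policy for PDFBox is left open here.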
The file
bug_trackers/poppler/poppler-84988-0.zip-3.pdf
has a different problem but I suspect it is related:
/MediaBox [0 170141183460469231731687303715884105728 612 792]
In 2.0.23, rendering worked (it seems the number was skipped and the
rectangle was then ignored), but in 2.0.24 it doesn't.
This is related to PDFBOX-5176, which changes the behaviour of the parser when
it comes to numerically valid but out-of-range values.
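The second MediaBox entry above is exactly 2^127. The sketch below checks that value against the two numeric ranges a parser is likely to care about. It is too large for a Java long but still converts to a finite float. The sketch then shows the page height a normalized rectangle would get. It does not reproduce what PDFBox 2.0.23 or 2.0.24 actually does, and the class MediaBoxRange and its methods are hypothetical. Normalization follows ISO 32000-1, section 7.9.5, which says readers should be prepared to handle any two diagonally opposite corners.

```java
import java.math.BigInteger;

/**
 * Hypothetical sketch (not PDFBox code): classify the oversized MediaBox
 * number and compute the height of the resulting normalized rectangle.
 */
public class MediaBoxRange {

    // Does the integer token fit into a Java long? (long needs <= 63 value bits)
    static boolean fitsInLong(String token) {
        return new BigInteger(token).bitLength() <= 63;
    }

    // Does the integer token convert to a finite float?
    // BigInteger.floatValue() yields Infinity when the magnitude is too large.
    static boolean fitsInFloat(String token) {
        return Float.isFinite(new BigInteger(token).floatValue());
    }

    // Height of a rectangle given by two diagonally opposite corners,
    // normalized so that lower-left y <= upper-right y.
    static float normalizedHeight(float y1, float y2) {
        return Math.max(y1, y2) - Math.min(y1, y2);
    }

    public static void main(String[] args) {
        // Second entry of /MediaBox [0 170141183460469231731687303715884105728 612 792]
        String token = "170141183460469231731687303715884105728";
        BigInteger value = new BigInteger(token);

        System.out.println(value.equals(BigInteger.ONE.shiftLeft(127))); // true: the value is 2^127
        System.out.println(fitsInLong(token));  // false
        System.out.println(fitsInFloat(token)); // true

        // If the value is kept as a float instead of being skipped, the page
        // becomes about 1.7e38 units tall, so any "fit page to N pixels"
        // scale factor collapses toward zero.
        float height = normalizedHeight(value.floatValue(), 792f);
        float scaleFor1000px = 1000f / height;
        System.out.println("height = " + height + ", scale = " + scaleFor1000px);
    }
}
```

Under these assumptions, treating the value as out of range and rejecting the number or the rectangle is one design choice. Keeping it as a float yields a valid but absurd page size. Which behaviour the parser should prefer is the open question raised above.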
Tilman
On 03.06.2021 at 14:24, Tim Allison wrote:
Reports are here:
https://corpora.tika.apache.org/base/reports/reports-pdfbox-2.0.24-SNAPSHOT.tgz
No new exceptions. Content looks better by a tiny amount. There are a
few files with some apparent regressions, but overall, the diffs are
negligible.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]