New report:
http://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.22_vs_2.0.23_3.tar.xz
Tilman
On 8 March 2021 at 10:35, Tilman Hausherr wrote:
I think we're good (despite the differences, most of which are caused
by the soft hyphen), but I'm now experimenting with a modified version
of tika-eval to see what happens.
Tilman
On 7 March 2021 at 19:47, Tilman Hausherr wrote:
New report at:
http://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.22_vs_2.0.23_2.tar.xz
Tilman
On 7 March 2021 at 11:43, Tilman Hausherr wrote:
On 7 March 2021 at 06:04, Tilman Hausherr wrote:
Report is here:
http://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.22_vs_2.0.23.tar.xz
Not much has changed, and there are no new exceptions. Regarding content,
the changes that seem important are all related to the soft hyphen.
https://issues.apache.org/jira/browse/PDFBOX-5115
I am currently fixing this, and then I'll run the tests again. The
text extraction differences will likely remain. A change in tika-eval
may be needed too.
Tilman
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org