Hi,
@Tilman
Thanks for running all those tests. The results look good to me, although
I have to admit that I don't understand every bit of those sheets, especially
those about the content ;-)
However, I'm planning to cut the 2.0.23 release next Monday.
Andreas
On 08.03.21 at 11:17, Tilman Hausherr wrote:
new report:
http://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.22_vs_2.0.23_3.tar.xz
Tilman
On 08.03.2021 at 10:35, Tilman Hausherr wrote:
I think we're good (despite the differences, most of which are because of the
soft hyphen), but I'm now experimenting with a modified version of tika-eval
to see what happens.
Tilman
On 07.03.2021 at 19:47, Tilman Hausherr wrote:
new report at
http://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.22_vs_2.0.23_2.tar.xz
Tilman
On 07.03.2021 at 11:43, Tilman Hausherr wrote:
On 07.03.2021 at 06:04, Tilman Hausherr wrote:
Report is here:
http://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.22_vs_2.0.23.tar.xz
Not much has changed. No new exceptions. Regarding content, the changes that
seem important are all related to the "soft hyphen":
https://issues.apache.org/jira/browse/PDFBOX-5115
I am currently fixing this, and then I'll run the tests again. The text
extraction differences will likely stay. It's possible that a change in
tika-eval is needed too.
Tilman
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org