new report:
http://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.22_vs_2.0.23_3.tar.xz

Tilman

Am 08.03.2021 um 10:35 schrieb Tilman Hausherr:
I think we're good (despite the differences, most of which are because of the soft hyphen), but I'm now experimenting with a modified version of tika-eval to see what happens.

Tilman

Am 07.03.2021 um 19:47 schrieb Tilman Hausherr:
new report at

http://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.22_vs_2.0.23_2.tar.xz

Tilman

Am 07.03.2021 um 11:43 schrieb Tilman Hausherr:
Am 07.03.2021 um 06:04 schrieb Tilman Hausherr:
Report is here:

http://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.22_vs_2.0.23.tar.xz

There's not much changed. No new exceptions. Re content, the changes that seem important are all related to "soft hyphen".

https://issues.apache.org/jira/browse/PDFBOX-5115

I am currently fixing this, and then I'll run the tests again. The text extraction differences will likely stay. It's possible that a change in tika-eval is needed too.

Tilman


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org

Reply via email to