New report:
http://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.22_vs_2.0.23_5.tar.xz
The content differences section is now the smallest so far, likely due to my
change in tika-eval (TIKA-3314) and to restoring a PDFBox code segment I
accidentally deleted (PDFBOX-5115).
There are three new exceptions. Two are in jempbox and one is in Tika
itself, so I suspect PDFBox isn't to blame. I'll look at them too if I have
the time.
Tilman
On 08.03.2021 at 11:17, Tilman Hausherr wrote:
new report:
http://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.22_vs_2.0.23_3.tar.xz
Tilman
On 08.03.2021 at 10:35, Tilman Hausherr wrote:
I think we're good (despite the differences, most of which are
because of the soft hyphen), but I'm now experimenting with a
modified version of tika-eval to see what happens.
Tilman
On 07.03.2021 at 19:47, Tilman Hausherr wrote:
new report at
http://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.22_vs_2.0.23_2.tar.xz
Tilman
On 07.03.2021 at 11:43, Tilman Hausherr wrote:
On 07.03.2021 at 06:04, Tilman Hausherr wrote:
Report is here:
http://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.22_vs_2.0.23.tar.xz
Not much has changed. No new exceptions. Regarding content, the
changes that seem important are all related to the soft hyphen.
https://issues.apache.org/jira/browse/PDFBOX-5115
I am currently fixing this, and then I'll run the tests again. The
text extraction differences will likely stay. It's possible that a
change in tika-eval is needed too.
Tilman
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org