On 11.03.2021 at 07:46, Andreas Lehmkuehler wrote:
On 11.03.21 at 07:24, Tilman Hausherr wrote:
New report:
http://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.22_vs_2.0.23_5.tar.xz
The content differences part is now the smallest ever, likely due to
my change in tika-eval (TIKA-3314) and restoring a PDFBox code
segment I accidentally deleted (PDFBOX-5115).
Cool!!
There are three new exceptions. Two are in jempbox and one is in Tika
itself, so I suspect PDFBox isn't to blame. I'll look at it too if I
have the time.
As far as I remember, the jempbox issue isn't new; Tim mentioned it
some time ago. Just out of curiosity, does it make sense to use an old
lib to extract metadata? Is there anything missing in xmpbox but
available in jempbox?
The three new exceptions weren't in earlier reports.
IIRC, the reason Tika uses Jempbox is that Xmpbox fails when there is
a non-standard schema.
Tilman