On Friday, 11.12.2020, at 14:58 +0100, Tilman Hausherr wrote:
> The exceptions are mostly about the acroform fixup.
> This fails when the font can't be used.
>
> bug_trackers/PDFBOX/PDFBOX-4086-0.pdf
> bug_trackers/PDFBOX/PDFBOX-4086-1.pdf
> bug_trackers/PDFBOX/PDFBOX-4086-2.pdf
> bug_trackers/PDFBOX/PDFBOX-3587-0.zip-5.pdf
> bug_trackers/PDFBOX/PDFBOX-3642-0.pdf

They should be fixed now.

> However I wonder if Tika should also be changed: it doesn't need the
> appearances for text extraction. However it could use the field
> repair.

That would be beneficial. It is also the reason why there are multiple processors, each with a single purpose.

> Tilman
>
> On 11.12.2020 at 13:07, Tilman Hausherr wrote:
> > I had a quick look
> > - 32 new exceptions
> > - content is a bit better, for NUM_COMMON_TOKENS the new version
> >   extracts 100.41% of the old one.
> >
> > Tilman
> >
> > On 11.12.2020 at 13:04, Tilman Hausherr wrote:
> > > http://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.21_vs_2.0.22.tar.xz

--
Maruan Sahyoun
FileAffairs GmbH
