On Friday, 11.12.2020, at 14:58 +0100, Tilman Hausherr wrote:
> The exceptions are mostly about the acroform fixup.
> This fails when the font can't be used.
> 
> bug_trackers/PDFBOX/PDFBOX-4086-0.pdf
> bug_trackers/PDFBOX/PDFBOX-4086-1.pdf
> bug_trackers/PDFBOX/PDFBOX-4086-2.pdf
> bug_trackers/PDFBOX/PDFBOX-3587-0.zip-5.pdf
> bug_trackers/PDFBOX/PDFBOX-3642-0.pdf

They should be fixed now.

> 
> 
> However I wonder if Tika should also be changed: it doesn't need the 
> appearances for text extraction. However it could use the field
> repair.

That would be beneficial. It is also the reason why there are multiple
processors, each with a single purpose.

> 
> Tilman
> 
> 
> On 11.12.2020 at 13:07, Tilman Hausherr wrote:
> > I had a quick look
> > - 32 new exceptions
> > - content is a bit better, for NUM_COMMON_TOKENS the new version 
> > extracts 100.41% of the old one.
> > 
> > Tilman
> > 
> > On 11.12.2020 at 13:04, Tilman Hausherr wrote:
> > >  
> > > http://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.21_vs_2.0.22.tar.xz
> > >  
> > 
> > 
> > 
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> > 
> 

-- 
Maruan Sahyoun

FileAffairs GmbH
Josef-Schappe-Straße 21
40882 Ratingen

Tel: +49 (2102) 89497 88
Fax: +49 (2102) 89497 91
[email protected]
www.fileaffairs.de

Managing Director: Maruan Sahyoun
Commercial register: AG Düsseldorf, HRB 53837
VAT ID: DE248275827

