On Thursday, 11.03.2021 at 07:56 +0100, Tilman Hausherr wrote:
> On 11.03.2021 at 07:46, Andreas Lehmkuehler wrote:
> > On 11.03.21 at 07:24, Tilman Hausherr wrote:
> > > new report
> > > http://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.22_vs_2.0.23_5.tar.xz
> > > 
> > > The content differences part is now the smallest ever, likely due to
> > > my change in tika-eval (TIKA-3314) and restoring a PDFBox code
> > > segment I accidentally deleted (PDFBOX-5115).
> > Cool!!
> > 
> > > There are three new exceptions. Two are in jempbox and one is in tika
> > > itself so I suspect PDFBox isn't to blame. I'll look at it too if I
> > > have the time.
> > As far as I remember, the jempbox issue isn't new; Tim mentioned it
> > some time ago. Just out of curiosity, does it make sense to use an old
> > lib to extract metadata? Is there anything missing in xmpbox but
> > available in jempbox?
> > 
> The three new exceptions weren't in earlier reports.
> 
> IIRC the reason Tika uses Jempbox is that Xmpbox fails when there is
> a non-standard schema.

Would it make sense to add that support? If yes, could we get samples of
various schemas to support that development? I could look into that if we
think it's worth the effort.

Maruan


> 
> Tilman
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
> For additional commands, e-mail: dev-h...@pdfbox.apache.org
> 

-- 
Maruan Sahyoun

FileAffairs GmbH
Josef-Schappe-Straße 21
40882 Ratingen

Tel: +49 (2102) 89497 88
Fax: +49 (2102) 89497 91
sahy...@fileaffairs.de
www.fileaffairs.de

Managing Director: Maruan Sahyoun
Commercial Register: AG Düsseldorf, HRB 53837
VAT ID: DE248275827

