[ 
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-1575:
------------------------------
    Attachment: PDFBox_1_8_8VPDFBox_1_8_9-SNAPSHOT_reports.zip
                PDFBox_1_8_8VPDFBox_1_8_9-SNAPSHOT.xlsx

[~tilman], thank you again for pinging me on the impending release of PDFBox 
1.8.9.  Also thanks to you, I've turned on the AccessChecker, so you shouldn't 
see any content from files that don't allow extraction.

I ran the most recent eval code against all files in govdocs1 that have a pdf 
extension.

The xlsx file includes every file that had an exception or any difference in 
attachment counts, metadata value counts, lang id or content.
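
As a rough illustration of that inclusion rule, here is a minimal sketch. It 
is not the actual eval code: the ExtractResult record, its field names and the 
should_report function are hypothetical, made up only to show the filter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExtractResult:
    """Hypothetical per-file summary of one parser run (not the eval code's schema)."""
    exception: Optional[str]   # exception class name, or None if parsing succeeded
    attachment_count: int
    metadata_value_count: int
    lang_id: str
    content: str


def should_report(a: ExtractResult, b: ExtractResult) -> bool:
    """Return True if either run threw, or if any compared field differs."""
    if a.exception is not None or b.exception is not None:
        return True
    return (
        a.attachment_count != b.attachment_count
        or a.metadata_value_count != b.metadata_value_count
        or a.lang_id != b.lang_id
        or a.content != b.content
    )
```

A file whose two runs match on every field and threw no exception would be 
left out of the spreadsheet under this rule; everything else would be listed.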

I've also included an example of a static dump of reports from the comparison 
database.  More work remains on that...

I haven't had a chance to join in your earlier comments from our work on the 
1.8.8 release.  Many apologies!

My quick impression:
1) no differences in attachments
2) no differences in metadata values
3) 1.8.9 fixed 3 null pointer exceptions, no new exceptions
4) Content-wise:
      a) with 1.8.9 we're getting less form field info (looks like internal 
         field names? More digging is required...)
      b) there might be actual modest regressions with:
         147/147012.pdf
         223/223704.pdf


> Upgrade to PDFBox 1.8.9 when available
> --------------------------------------
>
>                 Key: TIKA-1575
>                 URL: https://issues.apache.org/jira/browse/TIKA-1575
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Minor
>         Attachments: PDFBox_1_8_8VPDFBox_1_8_9-SNAPSHOT.xlsx, 
> PDFBox_1_8_8VPDFBox_1_8_9-SNAPSHOT_reports.zip
>
>
> The PDFBox community is about to release 1.8.9.  Let's use this issue to 
> track discussions before the release and to track Tika's upgrade to PDFBox 
> 1.8.9



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
