[
https://issues.apache.org/jira/browse/TIKA-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-1419:
----------------------------------
Attachment: compare_Tika-trunk-1.7_w_PDFBox1.8.6Vs.1.8.7.xlsx
Here's an Excel file. In the new column on the right I noted which files
improved once the three related PDFBox issues above were solved. I mostly
tested the files that had fewer tokens. I also tested a few that had more
tokens, but there the results are inconclusive: some have improved, and some
had more tokens due to a regression that has since been solved.
Would it be possible, next time, to test with the same set of files, and to
compare 1.8.8 against 1.8.6 rather than against 1.8.7? The reason is that if
there's an unknown regression in 1.8.7 that isn't solved, then 1.8.8 compared
against 1.8.7 would look as if it had the same quality, even though it is
still worse than 1.8.6.
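To illustrate why the baseline matters, here is a minimal sketch in Python. The
file names and token counts are made up for illustration and are not taken from
the attached spreadsheet, and `changed_files` is a hypothetical helper, not part
of Tika or PDFBox. It flags any file whose token count differs from the
baseline, since either more or fewer tokens can point to a regression.

```python
# Hypothetical per-file token counts; NOT taken from the attached spreadsheet.
tokens_186 = {"a.pdf": 100, "b.pdf": 250}
tokens_187 = {"a.pdf": 60, "b.pdf": 250}   # a.pdf regressed in 1.8.7
tokens_188 = {"a.pdf": 60, "b.pdf": 250}   # regression still present in 1.8.8


def changed_files(baseline, candidate):
    """Return the sorted names of files whose token count differs from the baseline."""
    return sorted(
        name
        for name, count in candidate.items()
        if name in baseline and count != baseline[name]
    )


# Comparing against 1.8.7 hides the problem; comparing against 1.8.6 shows it.
vs_187 = changed_files(tokens_187, tokens_188)
vs_186 = changed_files(tokens_186, tokens_188)
print("1.8.8 vs 1.8.7:", vs_187)
print("1.8.8 vs 1.8.6:", vs_186)
```

With these made-up numbers, the comparison against 1.8.7 reports no changed
files, while the comparison against 1.8.6 still reports a.pdf.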
> Upgrade to PDFBox 1.8.7
> -----------------------
>
> Key: TIKA-1419
> URL: https://issues.apache.org/jira/browse/TIKA-1419
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Assignee: Tim Allison
> Priority: Minor
> Attachments: compare_Tika-trunk-1.7_w_PDFBox1.8.6Vs.1.8.7.csv,
> compare_Tika-trunk-1.7_w_PDFBox1.8.6Vs.1.8.7.xlsx
>
>
> Will run against govdocs1 early next week and then upgrade if no major
> regressions are found.