[ https://issues.apache.org/jira/browse/TIKA-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461768#comment-17461768 ]
Tilman Hausherr commented on TIKA-3622:
---------------------------------------
From the PDFBox regression test a few days ago:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.24_vs_2.0.25.tar.xz
We have a 57% increase in tokens and an 8% increase in common tokens, likely due
to PDFBOX-5324 and PDFBOX-5331.
Lots of files now extract trash. Some "common tokens" are "lost"
because the "new trash" is attached to the token. There's lots of "good stuff"
where there was nothing before, but also a lot of trash where there was nothing.
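
To illustrate how trash glued to a word can raise the token count while a "common token" gets lost, here is a minimal sketch. It is not how the regression reports are actually computed: the TokenStats class, the tiny COMMON word list, the whitespace tokenizer and the sample strings are all assumptions made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of Tika or PDFBox.
class TokenStats {
    // Assumed stand-in for a "common tokens" list; a real list would be much larger.
    static final Set<String> COMMON = Set.of("the", "report", "shows", "text");

    // Naive tokenizer: split on whitespace and lowercase each token.
    static List<String> tokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return List.of();
        }
        return Arrays.stream(trimmed.split("\\s+"))
                .map(t -> t.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    // Number of tokens that exactly match an entry in the common list.
    static long countCommon(String text) {
        return tokenize(text).stream().filter(COMMON::contains).count();
    }

    // Relative change in percent; 'before' must be greater than zero.
    static double percentChange(long before, long after) {
        if (before <= 0) {
            throw new IllegalArgumentException("before must be > 0");
        }
        return 100.0 * (after - before) / before;
    }

    public static void main(String[] args) {
        // Made-up extraction results from an old and a new version.
        String oldText = "the report shows text";
        // "shows" now has trash attached, and two trash tokens were added.
        String newText = "the report shows#@x text q9 zz";

        long oldTokens = tokenize(oldText).size();
        long newTokens = tokenize(newText).size();
        System.out.println("tokens: " + oldTokens + " -> " + newTokens
                + " (" + percentChange(oldTokens, newTokens) + "%)");
        System.out.println("common tokens: " + countCommon(oldText)
                + " -> " + countCommon(newText));
    }
}
```

With these made-up strings the token count goes from 4 to 6, while the common-token count drops from 4 to 3 because "shows#@x" no longer matches "shows".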
> Upgrade PDFBox to 2.0.25
> ------------------------
>
> Key: TIKA-3622
> URL: https://issues.apache.org/jira/browse/TIKA-3622
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Minor
>
> Just released, not quite in Maven yet. I'm sorry we couldn't hold our
> releases for this, but we'll likely have new releases of Tika early in the
> new year.