[ https://issues.apache.org/jira/browse/TIKA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096866#comment-15096866 ]
Tilman Hausherr commented on TIKA-1830:
---------------------------------------
I can't reproduce the difference for the file 074531.pdf. ExtractText returns
identical results, which makes me doubt the entire test :-(
I can reproduce the difference for 290377.pdf; it is caused by a change in
decompression (rev 1709182) that tries to squeeze as much as possible out of
corrupt streams.
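
As a rough illustration of what "squeezing as much as possible" from a damaged
stream can mean, here is a minimal sketch using only java.util.zip. It is not
the PDFBox code from rev 1709182; the class and method names (LenientInflate,
inflateLeniently, deflate) are made up for this example. The idea is to keep
every byte decoded before the stream runs out or turns out to be corrupt,
instead of discarding the whole stream on the first error.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical example class, not part of PDFBox or Tika.
final class LenientInflate {

    // Inflate as much as possible; keep the bytes decoded by earlier
    // inflate() calls if the stream is truncated or corrupt.
    static byte[] inflateLeniently(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0) {
                    // No progress: input exhausted (truncated stream)
                    // or a preset dictionary would be required.
                    break;
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            // Corrupt data: stop and return what was recovered so far.
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Helper used only to build sample input for the demonstration.
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("line ").append(i).append(" of a sample stream\n");
        }
        byte[] original = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] compressed = deflate(original);
        // Simulate a damaged file by cutting the compressed data in half.
        byte[] truncated = Arrays.copyOf(compressed, compressed.length / 2);
        byte[] recovered = inflateLeniently(truncated);
        System.out.println("original bytes:  " + original.length);
        System.out.println("recovered bytes: " + recovered.length);
    }
}
```

A strict decoder would return nothing for the truncated input, while a lenient
one returns a prefix of the text. That is the kind of behaviour change that
can make extracted text differ between two versions for the same corrupt file.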
There may be some differences due to a bugfix related to "article beads". This
will mean improved results for files with correct beads, but worse results for
files where bead rectangles are incorrect.
> Upgrade to PDFBox 1.8.11 when available
> ---------------------------------------
>
> Key: TIKA-1830
> URL: https://issues.apache.org/jira/browse/TIKA-1830
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Attachments: reports_pdfbox_1_8_11-rc1.zip
>
>