[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365733#comment-14365733
]
Tim Allison commented on TIKA-1575:
-----------------------------------
If the multithreading hypothesis is correct, we would have had to get
_extremely_ lucky. We're now clearing the resources on PDFont after every
document, and it looks like the fonts are in the document, but they're clearly
broken. So thread B would have had to overwrite (correct) the font in thread A
after thread A read the fonts for p. 14 but before it processed p. 14, all
while threads C through J didn't happen to hit clearResources() between the
overwrite by thread B and the processing by thread A. Is this plausible? Are
there other static objects that could explain this behavior? Is something else
going on?
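
To make the required interleaving concrete, here is a minimal single-threaded
replay of it in Java. Everything in it is a hypothetical stand-in: the class
SharedFontCacheSketch, its static CACHE map, the key "F1", and the
"broken"/"correct" labels are illustrative assumptions, not PDFBox code, and
the real PDFont internals may well differ. It only shows that the lucky outcome
needs the overwrite to land inside the window and no clear to follow it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not PDFBox code: replays the interleaving step by step
// on one thread so the outcome is deterministic.
class SharedFontCacheSketch {
    // Assumed stand-in for a static (process-wide) font cache.
    static final Map<String, String> CACHE = new HashMap<>();

    // Returns the font state that "thread A" ends up rendering p. 14 with.
    static String replay(boolean clearInsideWindow) {
        CACHE.clear();

        // Thread A reads the fonts for p. 14; its document's font is broken.
        CACHE.put("F1", "broken");

        // Thread B, working on another document, overwrites the shared entry
        // with a correct font under the same key.
        CACHE.put("F1", "correct");

        // Any of threads C..J finishing a document would clear the cache.
        if (clearInsideWindow) {
            CACHE.clear();
        }

        // Thread A processes p. 14. If the entry is gone, it falls back to
        // its own document's (broken) font.
        String font = CACHE.get("F1");
        if (font == null) {
            font = "broken";
            CACHE.put("F1", font);
        }
        return font;
    }

    public static void main(String[] args) {
        System.out.println(replay(false)); // no clear inside the window
        System.out.println(replay(true));  // a clear lands inside the window
    }
}
```

In this sketch, thread A only sees the corrected font when no clear happens
between thread B's overwrite and thread A's processing, which is the narrow
window described above.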
> Upgrade to PDFBox 1.8.9 when available
> --------------------------------------
>
> Key: TIKA-1575
> URL: https://issues.apache.org/jira/browse/TIKA-1575
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Priority: Minor
> Attachments: 005937.pdf.json, 005937_1_8_9-SNAPSHOT.pdf.json,
> 10-814_Appendix B_v3.pdf, PDFBox_1_8_8VPDFBox_1_8_9-SNAPSHOT.xlsx,
> PDFBox_1_8_8VPDFBox_1_8_9-SNAPSHOT_reports.zip,
> PDFBox_1_8_8Vs1_8_9_20150316.zip, content_diffs_20150316.xlsx
>
>
> The PDFBox community is about to release 1.8.9. Let's use this issue to
> track discussions before the release and to track Tika's upgrade to PDFBox
> 1.8.9