[ https://issues.apache.org/jira/browse/PDFBOX-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler resolved PDFBOX-5682.
----------------------------------------
Resolution: Fixed
Set to resolved.
[~msahyoun] I'm planning to implement some performance tests in the context of
some improvements/refactorings w.r.t. the parser.
> Long/permanent hang in PDFBox 3.x
> ---------------------------------
>
> Key: PDFBOX-5682
> URL: https://issues.apache.org/jira/browse/PDFBOX-5682
> Project: PDFBox
> Issue Type: Bug
> Reporter: Tim Allison
> Assignee: Andreas Lehmkühler
> Priority: Minor
> Fix For: 3.0.1 PDFBox, 4.0.0
>
>
> I found two files in the regression tests where we're now getting timeouts at
> 3 minutes where we weren't before. Unfortunately, PDFBox's export:text works
> on both, so it is probably another structural feature, perhaps a problem in
> Tika?
> This file halts after printing out the header for Table 19 on page 46:
> https://corpora.tika.apache.org/base/docs/govdocs1/078/078656.pdf
> Pure PDFBox's export:text complains multiple times: "Page skipped due to an
> invalid or missing type null", but it does finish quickly.
> This file halts after extracting {{"854,793,592"}}:
> https://corpora.tika.apache.org/base/docs/commoncrawl3_refetched/G7/G7BO7PNCCREVF2BCY5YSYOPYDLMBYASY
> Pure PDFBox's export:text processes this without problem.