Tim Allison created PDFBOX-5682:
-----------------------------------

             Summary: Long/permanent hang in PDFBox 3.x
                 Key: PDFBOX-5682
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5682
             Project: PDFBox
          Issue Type: Bug
            Reporter: Tim Allison


I found two files in the regression tests that now time out at 3 minutes 
where they did not before.  Unfortunately, PDFBox's export:text works on 
both, so the cause is probably some other structural feature, or perhaps a problem in Tika?

This file halts after printing out the header for Table 19 on page 46: 
https://corpora.tika.apache.org/base/docs/govdocs1/078/078656.pdf

Pure PDFBox's export:text complains multiple times: "Page skipped due to an 
invalid or missing type null", but it does finish quickly.

This file halts after extracting {{"854,793,592"}}: 
https://corpora.tika.apache.org/base/docs/commoncrawl3_refetched/G7/G7BO7PNCCREVF2BCY5YSYOPYDLMBYASY

Pure PDFBox's export:text processes this without problem.
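
For anyone trying to reproduce the timeouts outside the regression harness, here is a minimal sketch of a timeout wrapper built only on java.util.concurrent. The class name TimeoutHarness and the method runWithTimeout are hypothetical and not part of Tika or PDFBox. The actual extraction call is assumed to be supplied by the caller as the Callable.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Tika or PDFBox.
public class TimeoutHarness {

    /**
     * Runs the task on a worker thread and waits at most timeoutMillis.
     * Returns the task's result, or null if the task did not finish in time.
     */
    public static String runWithTimeout(Callable<String> task, long timeoutMillis)
            throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "extraction-worker");
            t.setDaemon(true); // a task that never returns must not keep the JVM alive
            return t;
        });
        Future<String> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupts the worker; a CPU-bound loop that never checks
            // the interrupt flag will keep running in the background.
            future.cancel(true);
            return null;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Wrapping the text extraction for each of the two files in a Callable and passing 180000 as the timeout would mirror the 3-minute limit mentioned above. A null result would indicate the hang was reproduced.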



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
