Michael McCandless created PDFBOX-1297:
------------------------------------------
Summary: ExtractText fails to extract text from packaged PDFs
Key: PDFBOX-1297
URL: https://issues.apache.org/jira/browse/PDFBOX-1297
Project: PDFBox
Issue Type: Improvement
Components: Text extraction
Affects Versions: 1.6.0
Environment: Fedora 13 Linux
Reporter: Michael McCandless
Apparently a PDF can contain multiple files (like a Zip file); this is called
a PDF Package, described at
http://help.adobe.com/en_US/Reader/8.0/help.html?content=WSE034CA46-D08F-4fff-AA3C-FF04510DAEF0.html
I have a simple example PDF Package containing two sub-PDFs, but ExtractText
fails to extract their text.
It runs successfully (no exceptions), but the only text it extracts is the
boilerplate telling you to upgrade to Adobe Acrobat version 8 or later to
view this PDF.
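
As a rough way to recognize such a file before running ExtractText, the sketch
below scans the raw bytes for the /Collection and /EmbeddedFiles keys that a
PDF Package normally carries in its catalog and names dictionary. The class
PdfPackageSniffer and its method looksLikePackage are hypothetical names, not
part of PDFBox, and the check is only a heuristic: it will miss files whose
catalog sits in a compressed object stream or whose keys are written with
#-escapes, and it does not extract anything.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not part of PDFBox.
final class PdfPackageSniffer {

    private PdfPackageSniffer() {}

    // Heuristic: true if the raw bytes mention both keys a PDF Package
    // is expected to contain. Assumes the keys are stored uncompressed.
    static boolean looksLikePackage(byte[] pdfBytes) {
        // ISO-8859-1 maps each byte to exactly one char, so binary
        // content survives the conversion unchanged.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        return raw.contains("/Collection") && raw.contains("/EmbeddedFiles");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfPackageSniffer <file.pdf>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(looksLikePackage(bytes)
                ? "possible PDF Package"
                : "no package markers found");
    }
}
```

If the check reports a possible package, the boilerplate-only output described
above is the expected symptom, since the real content lives in the embedded
files rather than in the cover document's pages.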