[ https://issues.apache.org/jira/browse/JCR-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jukka Zitting updated JCR-1567:
-------------------------------
Affects Version/s: 1.4 (was: core 1.4.2)
Fix Version/s: 1.5
Assignee: Jukka Zitting
Issue Type: Improvement (was: Bug)
Summary: Upgrade to PDFBox 0.7.3 (was: IOException while extracting text from PDF)
Classifying this as an improvement, as the bug is in PDFBox and not in
Jackrabbit. The Jackrabbit improvement would be the upgrade to PDFBox 0.7.3.
PS. PDFBox is currently incubating to become an Apache project, so there's
still life there.
> Upgrade to PDFBox 0.7.3
> -----------------------
>
> Key: JCR-1567
> URL: https://issues.apache.org/jira/browse/JCR-1567
> Project: Jackrabbit
> Issue Type: Improvement
> Components: indexing, jackrabbit-text-extractors
> Affects Versions: 1.4
> Environment: Tomcat 6; JDK 1.6; Windows 2003;
> Reporter: Julio Castillo
> Assignee: Jukka Zitting
> Fix For: 1.5
>
>
> While trying to upload a PDF document (which I can view fine with Acrobat
> Reader once it is loaded), I get the following exception:
> 01.05.2008 12:24:44 *WARN * PdfTextExtractor: Failed to extract PDF text content (PdfTextExtractor.java, line 91)
> java.io.IOException: Error: Expected an integer type, actual='%%EOF'
> at org.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1159)
> at org.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:349)
> at org.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:132)
> at org.apache.jackrabbit.extractor.PdfTextExtractor.extractText(PdfTextExtractor.java:69)
> at org.apache.jackrabbit.extractor.CompositeTextExtractor.extractText(CompositeTextExtractor.java:90)
> at org.apache.jackrabbit.core.query.lucene.JackrabbitTextExtractor.extractText(JackrabbitTextExtractor.java:195)
> at org.apache.jackrabbit.core.query.lucene.NodeIndexer.addBinaryValue(NodeIndexer.java:393)
> ....
> I replaced the version of PDFBox (0.6.4) that is bundled with the Jackrabbit
> war file with a more recent version (PDFBox 0.7.3 and FontBox 0.1), and it
> worked fine. The bundled versions should be upgraded.
> On the other hand, PDFBox appears to be inactive. A different package should
> probably be selected in the long run, but for now a simple upgrade will do
> the trick.