Hi,
I need to index PDF files, and I am using PDFBox for that. But when I try to
extract text from a PDF file with PDFBox, I get the following error:
java.io.IOException: Error: No 'ToUnicode' and no 'Encoding' for Font
    at org.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:347)
    at org.pdfbox.util.PDFStreamEngine.showString(PDFStreamEngine.java:169)
    at org.pdfbox.util.PDFTextStripper.showString(PDFTextStripper.java:461)
    at org.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:692)
    at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:128)
    at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:268)
    at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:200)
    at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:172)
    at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:120)
    at org.pdfbox.ExtractText.main(ExtractText.java:213)
    at test.LuceneExampleIndexer.indexFile(LuceneExampleIndexer.java:67)
    at test.LuceneExampleIndexer.indexDirectory(LuceneExampleIndexer.java:47)
    at test.LuceneExampleIndexer.index(LuceneExampleIndexer.java:30)
    at test.LuceneExampleIndexer.main(LuceneExampleIndexer.java:118)
Could someone please tell me how to fix or work around this?
Thanks,
Ankur