[ https://issues.apache.org/jira/browse/PDFBOX-76?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler updated PDFBOX-76:
-------------------------------------
Attachment: PDFBOX76-Page1.pdf
> Text Extraction Unsuccessful with PDFBox
> ----------------------------------------
>
> Key: PDFBOX-76
> URL: https://issues.apache.org/jira/browse/PDFBOX-76
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Attachments: PDFBOX76-Page1.pdf
>
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1250097
> Originally submitted by salchow on 2005-08-02 03:07.
> Hi Ben,
> Here you will find the different files: the PDF, the text file obtained
> with PDFBox, and the text file obtained with TextFromPDF.
> Thanks!
> salchow
> [attachment on SourceForge]
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1250097&file_id=144104
> PDFBox.rar (application/octet-stream), 39931 bytes
> [comment on SourceForge]
> Originally sent by nobody.
> Logged In: NO
> I'm getting the following error while extracting text from a PDF file:
> Exception in thread "main" java.lang.NoSuchMethodError:
> org.pdfbox.util.operator.OperatorProcessor.setContext(Lorg/pdfbox/util/PDFStreamEngine;)V
> Please help me!
> [comment on SourceForge]
> Originally sent by benlitchfield.
> Logged In: YES
> user_id=601708
> A parser for Type1C CFF fonts needs to be written. Adobe
> technical spec 5176 (CFF) describes the file format. It is a
> binary format that looks pretty straightforward; it should be
> pretty fun to write a parser for it.
> This is not a high priority right now, but if someone is willing
> to write the parser, I will integrate it. If you would like to write
> a parser, please let me know and we can discuss it. It needs to
> be somewhat robust, because the data structures it creates should
> support modifying data so that the font can be written back to a
> stream, which will support embedding fonts back into the PDF
> document.
> If this parser is written, I may start a separate project that just
> deals with font files in Java. PDFBox currently has parsers for
> TTF/PFB and soon CFF, and I see more advanced font
> requirements in the pipeline. This type of functionality is really
> outside the scope of PDFBox, but I don't believe anything else
> exists. I am sure that if I start a font library project, other
> Java libraries could make use of it as well.
> Ben Litchfield
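Ben's comment describes CFF (Adobe Technical Note #5176) as a binary format whose parsed structures should also be writable back to a stream. As an illustration only, here is a minimal Java sketch of one CFF building block, the INDEX structure (a 16-bit count, a 1-byte offset size, count + 1 one-based offsets, then the object data), read and written in a round trip. The class `CffIndex` and all of its methods are hypothetical and not part of PDFBox; the layout follows the spec as I understand it, and a real parser would also need DICT, charset, and CharString handling.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not part of PDFBox: reads and writes a CFF INDEX
// (Card16 count, OffSize offSize, Offset[count + 1], then the object data).
final class CffIndex {

    // Reads an unsigned big-endian integer of 'size' bytes (1..4) at 'pos'.
    static int readOffset(byte[] data, int pos, int size) {
        int value = 0;
        for (int i = 0; i < size; i++) {
            value = (value << 8) | (data[pos + i] & 0xFF);
        }
        return value;
    }

    // Offsets are 1-based, relative to the byte just before the object data.
    private static int dataBase(int pos, int count, int offSize) {
        return pos + 3 + (count + 1) * offSize - 1;
    }

    // Parses the INDEX that starts at 'pos' and returns a copy of each entry.
    static List<byte[]> read(byte[] data, int pos) {
        int count = readOffset(data, pos, 2);
        List<byte[]> entries = new ArrayList<>();
        if (count == 0) {
            return entries; // an empty INDEX is only the 2-byte count
        }
        int offSize = data[pos + 2] & 0xFF;
        int base = dataBase(pos, count, offSize);
        for (int i = 0; i < count; i++) {
            int start = readOffset(data, pos + 3 + i * offSize, offSize);
            int end = readOffset(data, pos + 3 + (i + 1) * offSize, offSize);
            entries.add(Arrays.copyOfRange(data, base + start, base + end));
        }
        return entries;
    }

    // Returns the position of the first byte after the INDEX at 'pos',
    // i.e. where the next structure (such as the Top DICT INDEX) begins.
    static int end(byte[] data, int pos) {
        int count = readOffset(data, pos, 2);
        if (count == 0) {
            return pos + 2;
        }
        int offSize = data[pos + 2] & 0xFF;
        int last = readOffset(data, pos + 3 + count * offSize, offSize);
        return dataBase(pos, count, offSize) + last;
    }

    // Serializes entries back to INDEX form using the smallest offSize that fits,
    // so that edited entries can be written back to a stream.
    static byte[] write(List<byte[]> entries) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int count = entries.size();
        writeOffset(out, count, 2);
        if (count == 0) {
            return out.toByteArray();
        }
        int last = 1;
        for (byte[] e : entries) {
            last += e.length;
        }
        int offSize = last <= 0xFF ? 1 : last <= 0xFFFF ? 2 : last <= 0xFFFFFF ? 3 : 4;
        out.write(offSize);
        int offset = 1;
        writeOffset(out, offset, offSize);
        for (byte[] e : entries) {
            offset += e.length;
            writeOffset(out, offset, offSize);
        }
        for (byte[] e : entries) {
            out.write(e, 0, e.length);
        }
        return out.toByteArray();
    }

    // Writes 'value' as an unsigned big-endian integer of 'size' bytes.
    private static void writeOffset(ByteArrayOutputStream out, int value, int size) {
        for (int shift = (size - 1) * 8; shift >= 0; shift -= 8) {
            out.write((value >> shift) & 0xFF);
        }
    }

    // The header starts with major, minor, hdrSize, offSize (Card8 each);
    // the Name INDEX begins at hdrSize and holds the font names.
    static List<String> fontNames(byte[] cff) {
        int hdrSize = cff[2] & 0xFF;
        List<String> names = new ArrayList<>();
        for (byte[] raw : read(cff, hdrSize)) {
            names.add(new String(raw, StandardCharsets.US_ASCII));
        }
        return names;
    }
}
```

Writing an INDEX and reading it back is one way to check that modified entries survive serialization; a complete CFF writer would also have to recompute the offsets that the Top DICT stores for later structures such as the CharStrings INDEX.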
--
This message is automatically generated by JIRA.