[jira] Updated: (PDFBOX-76) Text Extraction Unsuccessful with PDFBox

JIRA Sat, 02 Oct 2010 08:26:55 -0700

     [ 
https://issues.apache.org/jira/browse/PDFBOX-76?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Andreas Lehmkühler updated PDFBOX-76:
-------------------------------------

    Attachment: PDFBOX76-Page1.pdf

> Text Extraction Unsuccessful with PDFBox
> ----------------------------------------
>
>                 Key: PDFBOX-76
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-76
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>         Attachments: PDFBOX76-Page1.pdf
>
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1250097
> Originally submitted by salchow on 2005-08-02 03:07.
> Hi Ben,
> Here you will find the different files: the PDF, the text file
> obtained with PDFBox, and the text file obtained with TextFromPDF.
> Thanks!
> salchow
> [attachment on SourceForge]
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1250097&file_id=144104
> PDFBox.rar (application/octet-stream), 39931 bytes
> [comment on SourceForge]
> Originally sent by nobody.
> Logged In: NO 
> I'm getting the following error
> Exception in thread "main" java.lang.NoSuchMethodError: 
> org.pdfbox.util.operator.OperatorProcessor.setContext(Lorg/pdfbox/util/PDFStreamEngine;)V
> while extracting text from pdf file.
> Please help me!
> [comment on SourceForge]
> Originally sent by benlitchfield.
> Logged In: YES 
> user_id=601708
> A parser for Type1C CFF fonts needs to be written.  Adobe 
> Technical Note 5176 (CFF) describes the file format.  It is a 
> binary format that looks pretty straightforward and should be 
> pretty fun to write a parser for.
> This is not a high priority right now, but if someone is willing 
> to write the parser I will integrate it.  If you would like to write 
> a parser, please let me know and we can discuss.  It needs to 
> be somewhat robust, as the data structures it creates should 
> support modifying data so the font can be written back to a 
> stream, which will support embedding fonts back into the PDF 
> document.
> If this parser is written, I may start a separate project that just 
> deals with font files in Java.  PDFBox currently has parsers for 
> TTF/PFB, and soon CFF.  I see more advanced font 
> requirements in the pipeline.  This type of functionality is really 
> outside the scope of PDFBox, but I don't believe anything else 
> exists.  I am sure that if I start a font library project there are other 
> Java libraries that could make use of it as well.
> Ben Litchfield
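
To make the parser request above more concrete, here is a minimal sketch of the first step such a parser might take: reading the four-byte CFF header (major, minor, hdrSize, offSize, as laid out in Adobe Technical Note 5176) into a mutable structure that can be written back to bytes. The class and method names (CffHeaderSketch, CffHeader, parseHeader, toBytes) are hypothetical and are not part of PDFBox; the remaining CFF structures (Name INDEX, Top DICT INDEX, String INDEX, and so on) are not handled here.

```java
public class CffHeaderSketch {

    /** Hypothetical holder for the CFF header fields; mutable so it can be edited and re-serialized. */
    static final class CffHeader {
        int major;    // format major version (Card8)
        int minor;    // format minor version (Card8)
        int hdrSize;  // header size in bytes (Card8)
        int offSize;  // size of absolute offsets, 1..4 (OffSize)

        /** Writes the four known header fields back out; any extra header bytes are zero-filled. */
        byte[] toBytes() {
            byte[] out = new byte[hdrSize];
            out[0] = (byte) major;
            out[1] = (byte) minor;
            out[2] = (byte) hdrSize;
            out[3] = (byte) offSize;
            return out;
        }
    }

    /** Parses the header from the start of a CFF font program. */
    static CffHeader parseHeader(byte[] data) {
        if (data == null || data.length < 4) {
            throw new IllegalArgumentException("CFF data too short for a header");
        }
        CffHeader h = new CffHeader();
        h.major = data[0] & 0xFF;   // mask to read the byte as unsigned
        h.minor = data[1] & 0xFF;
        h.hdrSize = data[2] & 0xFF;
        h.offSize = data[3] & 0xFF;
        if (h.hdrSize < 4 || h.hdrSize > data.length) {
            throw new IllegalArgumentException("Invalid hdrSize: " + h.hdrSize);
        }
        if (h.offSize < 1 || h.offSize > 4) {
            throw new IllegalArgumentException("Invalid offSize: " + h.offSize);
        }
        return h;
    }

    public static void main(String[] args) {
        // Synthetic header bytes for illustration, not taken from the attached PDF.
        byte[] sample = {1, 0, 4, 2};
        CffHeader h = parseHeader(sample);
        System.out.println("CFF " + h.major + "." + h.minor
                + ", hdrSize=" + h.hdrSize + ", offSize=" + h.offSize);
    }
}
```

Keeping the parsed fields mutable and pairing the parser with a writer is one way to approach the round-trip requirement mentioned in the comment; a real implementation would apply the same pattern to each INDEX and DICT structure.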

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
