[jira] Commented: (PDFBOX-890) Can't extract text from PDF

Martijn Brinkers (JIRA) Fri, 19 Nov 2010 09:48:37 -0800

    [ https://issues.apache.org/jira/browse/PDFBOX-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933883#action_12933883 ]


Martijn Brinkers commented on PDFBOX-890:
-----------------------------------------

The singleByteMappings contain all the characters ('E', 'x', 't', ...), but the
singleByteMappings are never used. I have attached a patch that fixes this. The
PDF gurus should check whether my patch is correct in general or whether it only
fixes this particular bug.
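To illustrate the point about unused single-byte mappings, here is a minimal sketch, assuming a simplified model in which a font's CMap keeps one table for one-byte character codes and another for two-byte codes. The class `ToyCMap`, its methods, the `consultSingleByte` flag and the sample codes (0x01 -> "E", 0x02 -> "x", 0x03 -> "t") are all hypothetical; this is not PDFBox's CMap implementation and not the attached patch. Passing `false` shows how skipping the one-byte table could leave codes unmapped so they come out as raw low-ASCII characters, similar to the symptom in the report.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified CMap model (NOT PDFBox's CMap class or the patch).
 * It keeps separate tables for one-byte and two-byte character codes.
 */
public class ToyCMap {
    private final Map<Integer, String> singleByteMappings = new HashMap<>();
    private final Map<Integer, String> doubleByteMappings = new HashMap<>();

    public void addSingle(int code, String text) { singleByteMappings.put(code, text); }

    public void addDouble(int code, String text) { doubleByteMappings.put(code, text); }

    /**
     * Decodes a byte string. If consultSingleByte is true, each position is
     * first looked up in the one-byte table; otherwise only the two-byte
     * table is tried. Unmapped bytes are emitted as raw chars.
     */
    public String decode(byte[] bytes, boolean consultSingleByte) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < bytes.length) {
            int one = bytes[i] & 0xFF;
            if (consultSingleByte) {
                String mapped = singleByteMappings.get(one);
                if (mapped != null) {
                    out.append(mapped);
                    i += 1;
                    continue;
                }
            }
            if (i + 1 < bytes.length) {
                int two = (one << 8) | (bytes[i + 1] & 0xFF);
                String mapped = doubleByteMappings.get(two);
                if (mapped != null) {
                    out.append(mapped);
                    i += 2;
                    continue;
                }
            }
            // No mapping found: fall back to the raw code as a char.
            out.append((char) one);
            i += 1;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        ToyCMap cmap = new ToyCMap();
        cmap.addSingle(0x01, "E");
        cmap.addSingle(0x02, "x");
        cmap.addSingle(0x03, "t");
        byte[] codes = {0x01, 0x02, 0x03};
        System.out.println(cmap.decode(codes, true));  // prints "Ext"
        System.out.println(cmap.decode(codes, false).length()); // prints 3 (control chars)
    }
}
```

The lookup order (one byte first, then two bytes) is just one simple choice for the sketch; a real CMap decides code lengths from its codespace ranges, which this toy model ignores.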

> Can't extract text from PDF
> ---------------------------
>
>                 Key: PDFBOX-890
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-890
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.3.1
>            Reporter: Igor Spasic
>         Attachments: test.pdf
>
>
> I have created a simple PDF using Bullzip PDF printer (a virtual Windows
> printer).
> PDFBox is not able to parse text from this PDF; it just returns some low-ASCII
> characters.
> Command:
> @java -jar pdfbox-app-1.3.1.jar ExtractText -console test.pdf

