[ 
https://issues.apache.org/jira/browse/PDFBOX-82?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler updated PDFBOX-82:
-------------------------------------

    Attachment: PDFBOX82-strangeEncoding.pdf

> Strange encoding in PDF-file
> ----------------------------
>
>                 Key: PDFBOX-82
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-82
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>         Attachments: PDFBOX82-strangeEncoding.pdf
>
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1261693
> Originally submitted by mai00 on 2005-08-16 23:16.
> I've got a PDF-file with a strange encoding. When I use
> ExtractText it gives me lines of 
> MT73MT110MT102MT111MT45MT66.... 
> I tried to use other extracting tools like pdftohtml
> and it could extract the text in plain English.
> Maybe I'm just doing something wrong with the encoding
> but I tried different things and could't extract the
> right text.
> [attachment on SourceForge]
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1261693&file_id=145929
> strangeEncoding.pdf (application/pdf), 155806 bytes
> PDF-File with strange encoding
> [comment on SourceForge]
> Originally sent by mai00.
> Logged In: YES 
> user_id=1310361
> Hello,
> I have found more files which are strange encoded but e.g.
> can be extracted by Adobe. I though it might have to do with
> the Font. All of these files don't provide a BaseFont and I
> think it might be because maybe this files were created on
> Unix systems. Maybe if somehow the font can be provided or
> someone can find out which font is used one can find out how
> to decode the text.
> [comment on SourceForge]
> Originally sent by salchow.
> Logged In: YES 
> user_id=1316887
> I have got exactly the same problem. I registred a bug (ID 
> 1250097) and asked a question in the forum at the begining 
> of August under the title : "Text Extraction unsuccessful with 
> PDFBox". Unfortunately, I haven't got any solution yet. 
> If you find a way to solve this problem, i am interested in ...
> Thanks
> Salchow

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to