[jira] Commented: (PDFBOX-860) 'fi' getting converted to '?'

Saurabh Mehrotra (JIRA) Tue, 12 Oct 2010 05:35:03 -0700

    [ 
https://issues.apache.org/jira/browse/PDFBOX-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920168#action_12920168
 ]


Saurabh Mehrotra commented on PDFBOX-860:
-----------------------------------------

Hi

It will not be possible to attach a sample PDF file. I will try to create a PDF 
file with those characters and upload the results.

However, one useful finding that might help you: the output is fine with 
PDFBox 0.8.0, but with version 1.2.1 the characters 'fi' are converted 
to '?'.

I will check if ligature characters are used in the PDF files.
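
One way to run that check is sketched below. It assumes the extracted text is 
already available as a Java String and that the cause is the single 'fi' 
ligature character (U+FB01) being written in an encoding that cannot represent 
it; neither assumption is confirmed for this issue. The class and method names 
are hypothetical and not part of PDFBox.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class LigatureCheck {

    // The Unicode "Alphabetic Presentation Forms" block holds the Latin
    // ligatures U+FB00 (ff) through U+FB06 (st); U+FB01 is the fi ligature.
    static boolean containsLigature(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= '\uFB00' && c <= '\uFB06') {
                return true;
            }
        }
        return false;
    }

    // NFKC compatibility normalization replaces a ligature with its
    // plain letters, so U+FB01 becomes "f" followed by "i".
    static String expandLigatures(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Stand-in for text returned by the extractor (assumption).
        String extracted = "classi\uFB01er";

        // ISO-8859-1 has no fi ligature, so encoding substitutes '?',
        // which matches the reported symptom.
        byte[] bytes = extracted.getBytes(StandardCharsets.ISO_8859_1);
        String roundTripped = new String(bytes, StandardCharsets.ISO_8859_1);

        System.out.println(roundTripped);                 // classi?er
        System.out.println(containsLigature(extracted));  // true
        System.out.println(expandLigatures(extracted));   // classifier
    }
}
```

If containsLigature returns true for the extracted text, writing the output as 
UTF-8 or expanding the ligatures first should avoid the '?'. Note that NFKC 
also rewrites other compatibility characters, so it may change more than the 
ligatures.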

Thanks & Regards
Saurabh

> 'fi' getting converted to '?'
> -----------------------------
>
>                 Key: PDFBOX-860
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-860
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.2.1
>         Environment: Solaris 10
>            Reporter: Saurabh Mehrotra
>
> Hi
> I am trying to use PDFBox 1.2.1 to extract text from PDF files.
> The following issue is observed in the extracted text:
> 1. The character combination 'fi' is converted to a '?'
> Examples: first becomes ?rst
>           classifier becomes classi?er
>           find becomes ?nd
> Is this a known bug? Can some PDFBox setting be turned off to prevent 
> this?
> Thanks & Regards
> Saurabh

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
