Hi Ahmet,

We’re currently looking into a similar problem:
https://issues.apache.org/jira/browse/PDFBOX-2259
If you think this is the *exact* same problem that you’re seeing, please attach your PDF file to that JIRA issue. If not, please open a new JIRA issue and attach your file there. (You can attach files in JIRA using More > Attach Files.) The mailing list does not support file attachments, so we can’t see your file unless it is on JIRA.

Thanks

-- John

On 25 Sep 2014, at 07:26, Ahmet Aker <[email protected]> wrote:

> Hi,
>
> I am using pdfBox (1.8.6) for converting Arabic pdf files (not images of
> texts but real texts) to html. PdfBox works really good in most cases
> however, it does have problems in recognizing compound characters. I am
> attaching you a sample pdf file. In that e.g. I get
> الفغاني but I should be getting الأفغاني. The
> pdfBox misses the bit highlighted red. The same is valid for:
>
> ا (pdfBox output) --- الله
>
> Has this maybe to do with the encodings? I hope you can help me on this
> matter.
>
> Many thanks,
> ahmet
