Hi, I am using PDFBox 1.8.6 to convert Arabic PDF files (real text, not scanned images of text) to HTML. PDFBox works really well in most cases; however, it has problems recognizing compound characters. I am attaching a sample PDF file. In it, for example, I get الفغاني when I should be getting الأفغاني: PDFBox drops the أ, which is the part highlighted in red.
The same happens with الله: PDFBox outputs only ا.
Could this have something to do with the encodings? I hope you can help me with this.
Many thanks,
ahmet
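A note on the encoding question, offered as an unconfirmed hypothesis. The missing pieces (لأ and الله) are exactly the letter groups that Arabic fonts usually draw as single ligature glyphs. Unicode assigns such ligatures "presentation form" code points, for example U+FEF7 (LAM WITH ALEF WITH HAMZA ABOVE, isolated form) and U+FDF2 (ALLAH, isolated form). If the extracted text contains those code points instead of the base letters, Unicode NFKC normalization expands them. The minimal Java sketch below uses only the standard `java.text.Normalizer`; the class and method names are hypothetical. If the letters are absent from the extracted string altogether, as in the sample above, normalization cannot restore them. In that case the cause more likely lies in how the font maps those glyphs to Unicode.

```java
import java.text.Normalizer;

// Hypothetical helper: expands Arabic ligature presentation forms in
// already-extracted text. Assumes the extractor emitted the ligature code
// points (e.g. U+FEF7, U+FDF2) rather than dropping the characters.
public class LigatureNormalizer {

    // NFKC first applies compatibility decomposition, which splits a
    // presentation-form ligature into its base letters. It then applies
    // canonical composition, which rejoins ALEF + HAMZA ABOVE into U+0623.
    public static String expandLigatures(String extracted) {
        return Normalizer.normalize(extracted, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // ALEF + LIGATURE LAM WITH ALEF WITH HAMZA ABOVE (isolated) + "فغاني"
        String withLamAlefHamza = "\u0627\uFEF7\u0641\u063A\u0627\u0646\u064A";
        // ARABIC LIGATURE ALLAH ISOLATED FORM
        String allahLigature = "\uFDF2";

        // Expected: الأفغاني (U+0627 U+0644 U+0623 U+0641 U+063A U+0627 U+0646 U+064A)
        System.out.println(expandLigatures(withLamAlefHamza)
                .equals("\u0627\u0644\u0623\u0641\u063A\u0627\u0646\u064A"));
        // Expected: الله (U+0627 U+0644 U+0644 U+0647)
        System.out.println(expandLigatures(allahLigature)
                .equals("\u0627\u0644\u0644\u0647"));
    }
}
```

Both lines print `true`. NFKC is used instead of NFKD so that the hamza is recombined with the alef into the single letter أ.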
