Many things can go wrong with Arabic script languages and PDFs. Some are
fixable; some are not. This _might_ be relevant:
https://issues.apache.org/jira/browse/PDFBOX-4531
If you know Arabic and can help, please chip in on that issue and/or open a
new one on PDFBox’s JIRA.
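One failure mode that is sometimes fixable after extraction is text that comes out as Arabic presentation forms (U+FB50-U+FDFF, U+FE70-U+FEFC) instead of the base letters. The rough Python sketch below is only an assumption about what might be happening, not a diagnosis of this file and not something taken from the issue above. The helper names are made up for illustration. It measures how much of the extracted text is presentation forms and folds them back with NFKC normalization. It will not help with reversed character order or with fonts that lack a usable Unicode mapping.

```python
import unicodedata


def presentation_form_ratio(text: str) -> float:
    """Fraction of Arabic-script characters that are presentation forms.

    Hypothetical helper: counts code points in the base Arabic block and
    in the two Arabic Presentation Forms blocks, nothing else.
    """
    arabic = 0
    presentation = 0
    for ch in text:
        cp = ord(ch)
        if 0x0600 <= cp <= 0x06FF:
            # Base Arabic block
            arabic += 1
        elif 0xFB50 <= cp <= 0xFDFF or 0xFE70 <= cp <= 0xFEFC:
            # Arabic Presentation Forms-A and -B
            arabic += 1
            presentation += 1
    return presentation / arabic if arabic else 0.0


def normalize_arabic(text: str) -> str:
    """Map presentation forms to base letters via NFKC normalization."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    # ALEF ISOLATED FORM + LAM INITIAL FORM, as might appear in extracted text
    extracted = "\uFE8D\uFEDF"
    print(presentation_form_ratio(extracted))
    print(presentation_form_ratio(normalize_arabic(extracted)))
```

If the ratio is high before normalization and the normalized text reads correctly, the problem was probably just presentation forms. If the text is still wrong, it is more likely one of the unfixable cases.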
Best,
Tim
On Fri, Apr 26, 2019 at 8:05 PM Tim Allison <[email protected]> wrote:
> https://wiki.apache.org/tika/Troubleshooting%20Tika#PDF_Text_Problems
>
>
> On Fri, Apr 26, 2019 at 4:00 PM Chris Mattmann <[email protected]>
> wrote:
>
>> Hi,
>>
>> This would be a good question to ask on the [email protected] list so I’m
>> CC’ing them.
>>
>> Cheers,
>> Chris
>>
>> From: Djari Imene <[email protected]>
>> Date: Friday, April 26, 2019 at 9:45 AM
>> To: "Mattmann, Chris A (1761)" <[email protected]>
>> Subject: [EXTERNAL] Tika script
>>
>> Good evening sir, I am writing to request more information about how I
>> can parse an Arabic PDF to XML. I tried to convert it to text using your
>> script, but it produces the wrong characters. I would be very thankful if
>> you could help me fix this problem.