noureldin-eg commented on PR #155:
URL: https://github.com/apache/pdfbox/pull/155#issuecomment-2543361791

> Any known side effects for this commit?

No known side effects for Arabic (and English) text extraction. I can't confirm its impact on other languages, but if you'd like, I could modify the implementation to apply this fix only when the Unicode code points fall within the Arabic ranges.

> whether the contents in PDFBOX-5487-arabic.pdf-sorted-diff.txt are better in the lines with "new"

Yes, the extracted contents are better after my commit. Specifically, the two key changes highlighted in the screenshots above and explained in the Jira issue have been addressed.
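
For illustration, restricting the fix to Arabic text could look roughly like the sketch below. This is not code from the PR: `ArabicRangeCheck`, `isArabic` and `containsArabic` are hypothetical names, and the choice of Unicode blocks (Arabic, Arabic Supplement, Arabic Presentation Forms-A and -B) is an assumption about which ranges would need to be covered. It relies only on the standard `Character.UnicodeBlock` API.

```java
import java.lang.Character.UnicodeBlock;

// Hypothetical helper, not part of PDFBox or of this PR.
final class ArabicRangeCheck {
    private ArabicRangeCheck() {
    }

    // True if the code point lies in one of the Arabic blocks assumed here.
    static boolean isArabic(int codePoint) {
        UnicodeBlock block = UnicodeBlock.of(codePoint);
        return block == UnicodeBlock.ARABIC
                || block == UnicodeBlock.ARABIC_SUPPLEMENT
                || block == UnicodeBlock.ARABIC_PRESENTATION_FORMS_A
                || block == UnicodeBlock.ARABIC_PRESENTATION_FORMS_B;
    }

    // True if any code point of the text is Arabic, so a caller could
    // apply the fix only to such text and leave everything else untouched.
    static boolean containsArabic(String text) {
        return text.codePoints().anyMatch(ArabicRangeCheck::isArabic);
    }
}
```

A caller would check `ArabicRangeCheck.containsArabic(text)` before applying the new handling and fall back to the existing behaviour otherwise. Whether the check belongs at the level of a single character or a whole line is a design question this sketch leaves open.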