[ https://issues.apache.org/jira/browse/PDFBOX-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17905817#comment-17905817 ]
ASF subversion and git services commented on PDFBOX-5487:
---------------------------------------------------------

Commit 1922517 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1922517 ]

PDFBOX-5487: Sonar fix

> extra whitespaces when extracting Arabic text
> ---------------------------------------------
>
>                 Key: PDFBOX-5487
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5487
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.32, 3.0.3 PDFBox
>            Reporter: Fatemeh Elyasi
>            Assignee: Tilman Hausherr
>            Priority: Major
>              Labels: Arabic
>             Fix For: 2.0.33, 3.0.4 PDFBox, 4.0.0
>
>         Attachments: Malpass-at-the-G7-Leaders-Summit-Media-Briefing-AR (withoutFixes).txt,
>                      Malpass-at-the-G7-Leaders-Summit-Media-Briefing-AR.pdf,
>                      Malpass-at-the-G7-Leaders-Summit-Media-Briefing-AR.txt,
>                      PDFBOX-3774-reduced.pdf-sorted-diff.txt,
>                      PDFBOX-5487-arabic.pdf-sorted-diff.txt,
>                      PDFBOX-5487_ اعلامية.png, PDFBOX-5487_ وفضلا.png,
>                      arabtest.pdf, meld1.png, meld2.png, meld3.png,
>                      screenshot-1.png
>
>
> Trying to extract text from an Arabic PDF. You may notice that some
> whitespace is extracted in the wrong place.
> Example:
> Original word: العالمية
> Extracted word: العالمي ة
>
> The PDF is attached; the example word is on the first line.