[ https://issues.apache.org/jira/browse/PDFBOX-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233567#comment-17233567 ]

Marco Barbi commented on PDFBOX-5018:
-------------------------------------

Thanks for the detailed explanation.

Then, on our end, we could implement logic that, whenever two characters share
the same location and one of them is a blank, always outputs the blank first.
This is not a general rule, but it may be effective for our use cases.
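The "blank first" rule can be sketched in Java. This is a minimal sketch under stated assumptions: `Glyph`, `BlankFirstOrdering`, `blankFirst` and the `EPSILON` tolerance are hypothetical names, not PDFBox API. In a PDFTextStripper subclass, the glyph fields would presumably be filled from each TextPosition (for example `getUnicode()`, `getXDirAdj()`, `getYDirAdj()`) before the text is written.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a positioned glyph (e.g. built from a PDFBox TextPosition). */
final class Glyph {
    final String text;
    final float x;
    final float y;

    Glyph(String text, float x, float y) {
        this.text = text;
        this.x = x;
        this.y = y;
    }
}

final class BlankFirstOrdering {
    // Assumed tolerance (user-space units) for treating two glyphs as co-located.
    static final float EPSILON = 0.01f;

    static boolean sameLocation(Glyph a, Glyph b) {
        return Math.abs(a.x - b.x) < EPSILON && Math.abs(a.y - b.y) < EPSILON;
    }

    // A glyph counts as blank if it contains only characters <= U+0020.
    static boolean isBlank(Glyph g) {
        return g.text.trim().isEmpty();
    }

    /**
     * Within every run of consecutive glyphs sharing a location, emits the
     * blanks first and then the other glyphs, keeping their relative order.
     */
    static List<Glyph> blankFirst(List<Glyph> glyphs) {
        List<Glyph> out = new ArrayList<>(glyphs.size());
        int i = 0;
        while (i < glyphs.size()) {
            // Find the run [i, j) of glyphs co-located with glyphs.get(i).
            int j = i + 1;
            while (j < glyphs.size() && sameLocation(glyphs.get(i), glyphs.get(j))) {
                j++;
            }
            for (int k = i; k < j; k++) {
                if (isBlank(glyphs.get(k))) out.add(glyphs.get(k));
            }
            for (int k = i; k < j; k++) {
                if (!isBlank(glyphs.get(k))) out.add(glyphs.get(k));
            }
            i = j;
        }
        return out;
    }

    static String join(List<Glyph> glyphs) {
        StringBuilder sb = new StringBuilder();
        for (Glyph g : glyphs) sb.append(g.text);
        return sb.toString();
    }
}
```

Grouping only consecutive glyphs assumes they already arrive in position order (for example with `setSortByPosition(true)` on the stripper). The sketch only reorders glyphs; it does not drop the blank.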


> Wrong extraction of blank character
> -----------------------------------
>
>                 Key: PDFBOX-5018
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5018
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.21
>            Reporter: Marco Barbi
>            Priority: Major
>         Attachments: IT00820340966_Z-SO-PO 5270213 (1).pdf, 
> image-2020-11-17-11-19-45-132.png
>
>
> When PDFTextStripper is applied to the attached PDF document, a nonexistent
> blank character is extracted from the following text:
>  
> !image-2020-11-17-11-19-45-132.png!
>  
> Instead of "O1AI7A", the text passed to the writeString callback is
> "O 1AI7A".
> Copying and pasting from Adobe Reader doesn't introduce any blank character.
>  



--
This message was sent by Atlassian Jira (v8.3.4#803005)
