[ https://issues.apache.org/jira/browse/PDFBOX-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612826#comment-13612826 ]
Andreas Lehmkühler commented on PDFBOX-1542:
--------------------------------------------
I ran something like this:
java -jar pdfbox-app-x.y.z.jar ExtractText -sort invoice1.pdf
with several different versions of PDFBox, and everything works fine. And yes,
ExtractText uses PDFTextStripper, which is a subclass of PDFStreamEngine.
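The -sort option tells ExtractText to order the extracted text by its position on the page rather than by the order in which it appears in the content stream; in the API this corresponds to calling setSortByPosition(true) on PDFTextStripper. The following is a minimal, self-contained sketch of that idea, not PDFBox's implementation: the Fragment type, the lineTolerance parameter and the rule of joining same-line fragments with a single space are hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortByPosition {

    // Hypothetical positioned text fragment (not a PDFBox class); y grows downward.
    public record Fragment(String text, double x, double y) {}

    // Groups fragments into lines (top to bottom) and orders each line left to right.
    // Fragments whose y values differ by at most lineTolerance count as one line.
    public static List<String> lines(List<Fragment> fragments, double lineTolerance) {
        List<Fragment> byY = new ArrayList<>(fragments);
        byY.sort(Comparator.comparingDouble(Fragment::y)); // stable sort, top to bottom

        List<List<Fragment>> rows = new ArrayList<>();
        for (Fragment f : byY) {
            List<Fragment> last = rows.isEmpty() ? null : rows.get(rows.size() - 1);
            if (last != null && Math.abs(f.y() - last.get(0).y()) <= lineTolerance) {
                last.add(f);            // close enough vertically: same line
            } else {
                List<Fragment> row = new ArrayList<>();
                row.add(f);
                rows.add(row);          // start a new line
            }
        }

        List<String> result = new ArrayList<>();
        for (List<Fragment> row : rows) {
            row.sort(Comparator.comparingDouble(Fragment::x)); // left to right
            StringBuilder sb = new StringBuilder();
            for (Fragment f : row) {
                if (sb.length() > 0) {
                    sb.append(' ');     // simplification: one space between fragments
                }
                sb.append(f.text());
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Content-stream order: "Price" is drawn before "Unit", and the title comes last.
        List<Fragment> stream = List.of(
                new Fragment("Price", 40, 100),
                new Fragment("Unit", 0, 100),
                new Fragment("Invoice", 0, 50));
        System.out.println(lines(stream, 2.0)); // prints [Invoice, Unit Price]
    }
}
```

Without sorting, the same fragments would come out in content-stream order ("Price", "Unit", "Invoice"), which is why position sorting matters for documents such as invoices whose drawing order differs from reading order.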
> Whitespaces between words are not created
> -----------------------------------------
>
> Key: PDFBOX-1542
> URL: https://issues.apache.org/jira/browse/PDFBOX-1542
> Project: PDFBox
> Issue Type: Wish
> Components: Text extraction
> Affects Versions: 1.7.1
> Reporter: Vitalie Bureanu
> Priority: Minor
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> Hello, I extract text from PDF files with PDFBox. I noticed that
> extraction from some PDF files is not as good as expected. I have a
> series of PDF invoices from which I extract the text with coordinates,
> and the result is pretty good, but I noticed something very strange: the
> words are extracted without whitespace between them. Example: if I
> try to extract "Unit Price", the result is "UnitPrice".
> But if I open the invoice in Adobe Reader and copy/paste it into
> Notepad, I get "Unit Price" with the whitespace!
> I think the whitespace is not present in the original PDF document, but
> Adobe Reader somehow "inserts" whitespace between words when it shows
> the content of the PDF.
>
> Guys, can you please suggest how I can get the strings with spaces after
> parsing?
> See example of invoice here: http://www.cloudforpeople.com/Invoice1.pdf
> PS: I want to try version 1.8.0 of PDFBox. How can I download it?
> Many thanks,
> Vitalie
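The reporter's guess is plausible: PDFs often position words individually without drawing a space character between them, so an extractor has to infer word breaks from the horizontal gap between glyphs. PDFTextStripper applies a gap-based heuristic of this general kind, though its actual thresholds and rules are more involved. Below is a minimal sketch under stated assumptions: the Glyph type, the spaceWidth value and the tolerance factor are hypothetical, and the glyphs are assumed to be already sorted left to right on a single line.

```java
import java.util.List;

public class GapSpacing {

    // Hypothetical positioned character (not a PDFBox class): x is the left edge.
    public record Glyph(char ch, double x, double width) {}

    // Joins glyphs on one line, inserting a space whenever the gap between the end of
    // one glyph and the start of the next exceeds tolerance * spaceWidth.
    // No extra space is added next to a glyph that already is a space.
    public static String join(List<Glyph> glyphs, double spaceWidth, double tolerance) {
        StringBuilder sb = new StringBuilder();
        Glyph prev = null;
        for (Glyph g : glyphs) {
            if (prev != null) {
                double gap = g.x() - (prev.x() + prev.width());
                if (gap > tolerance * spaceWidth && prev.ch() != ' ' && g.ch() != ' ') {
                    sb.append(' '); // inferred word break
                }
            }
            sb.append(g.ch());
            prev = g;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Unit" and "Price" drawn with no space glyph, but with a 3-unit gap after "t".
        List<Glyph> glyphs = List.of(
                new Glyph('U', 0, 5), new Glyph('n', 5, 5), new Glyph('i', 10, 5),
                new Glyph('t', 15, 5), new Glyph('P', 23, 5), new Glyph('r', 28, 5),
                new Glyph('i', 33, 5), new Glyph('c', 38, 5), new Glyph('e', 43, 5));
        System.out.println(join(glyphs, 2.5, 0.5)); // prints "Unit Price"
    }
}
```

With a spaceWidth of 2.5 and a tolerance of 0.5, any gap wider than 1.25 units becomes a space. A higher tolerance inserts fewer spaces (fewer false breaks in loosely spaced text, but more merged words); a lower one inserts more.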