[ https://issues.apache.org/jira/browse/PDFBOX-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908062#comment-14908062 ]
Ben McCann edited comment on PDFBOX-2991 at 10/15/15 10:19 PM:
---------------------------------------------------------------
The big problem here for me is that it's not recognizing them as different
words. It should be able to tell that whitespace belongs between "California"
and "[email protected]", right?
was (Author: chengas123):
I don't care whether it thinks they're on the same line or not. The big problem
here for me is that it's not recognizing them as different words. It should be
able to tell to put a whitespace between "CA" and "[email protected]", right?
> Improper word concatenation when extracting pdf
> -----------------------------------------------
>
> Key: PDFBOX-2991
> URL: https://issues.apache.org/jira/browse/PDFBOX-2991
> Project: PDFBox
> Issue Type: Bug
> Reporter: Ben McCann
> Attachments: sample-resume.pdf
>
>
> The code below outputs the text of a PDF. Words that are on different lines
> are concatenated together:
> PDDocument pdDoc = PDDocument.load(new File("sample-resume.pdf"));
> StringWriter writer = new StringWriter();
> new PDFTextStripper().writeText(pdDoc, writer);
> pdDoc.close();
> System.out.println(writer.toString());
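
For illustration only, here is a minimal sketch of the kind of gap-based
heuristic the comment above is asking for: insert a space whenever the
horizontal gap between two neighbouring glyphs exceeds some fraction of the
average glyph width. This is not PDFBox's actual algorithm. The class
GapWordJoiner, the Glyph type, the layout helper, and the tolerance values are
all hypothetical names and numbers made up for this example, and the sketch
assumes the glyphs are already sorted left to right on a single line.

```java
import java.util.ArrayList;
import java.util.List;

public class GapWordJoiner {

    /** Hypothetical glyph: one character with its left edge and width on the page. */
    static final class Glyph {
        final char ch;
        final double x;
        final double width;

        Glyph(char ch, double x, double width) {
            this.ch = ch;
            this.x = x;
            this.width = width;
        }
    }

    /**
     * Joins glyphs in order, inserting a space wherever the horizontal gap to
     * the previous glyph exceeds tolerance * (average glyph width).
     */
    static String join(List<Glyph> glyphs, double tolerance) {
        if (glyphs.isEmpty()) {
            return "";
        }
        double totalWidth = 0;
        for (Glyph g : glyphs) {
            totalWidth += g.width;
        }
        double threshold = tolerance * (totalWidth / glyphs.size());

        StringBuilder out = new StringBuilder();
        Glyph prev = null;
        for (Glyph g : glyphs) {
            // Gap between the right edge of the previous glyph and this one's left edge.
            if (prev != null && g.x - (prev.x + prev.width) > threshold) {
                out.append(' ');
            }
            out.append(g.ch);
            prev = g;
        }
        return out.toString();
    }

    /** Helper for the demo: lays out a string as fixed-width glyphs starting at startX. */
    static List<Glyph> layout(String text, double startX, double width) {
        List<Glyph> glyphs = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            glyphs.add(new Glyph(text.charAt(i), startX + i * width, width));
        }
        return glyphs;
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = layout("CA", 0.0, 6.0);   // occupies x = 0..12
        glyphs.addAll(layout("name", 15.0, 6.0));      // starts 3 units after "CA" ends

        System.out.println(join(glyphs, 0.3)); // gap 3 > 0.3 * 6, prints "CA name"
        System.out.println(join(glyphs, 0.6)); // gap 3 < 0.6 * 6, prints "CAname"
    }
}
```

The two calls in main show why the choice of tolerance matters: the same
3-unit gap is treated as a word break with a tolerance of 0.3 and swallowed
with a tolerance of 0.6, which is the symptom described in this issue.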