[
https://issues.apache.org/jira/browse/PDFBOX-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074944#comment-14074944
]
John Hewson edited comment on PDFBOX-2232 at 7/25/14 9:24 PM:
--------------------------------------------------------------
PDFBox's extracted text is not quite the same as copying to the clipboard from
Acrobat. Using Acrobat's "Save As Other..." and selecting plain text is what
you want to compare.
was (Author: jahewson):
Extracted text is not quite the same as copying to the clipboard from Acrobat.
Using Acrobat's "Save As Other..." and selecting plain text is what you want to
compare.
> Is there difference between character \n and character space(32) in pdf stream
> ------------------------------------------------------------------------------
>
> Key: PDFBOX-2232
> URL: https://issues.apache.org/jira/browse/PDFBOX-2232
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Reporter: huangchangan
>
> When extracting text from PDF files with PDFTextStripper, I get a space (32) at
> the end of each paragraph or table cell, while in the text copied from
> Adobe Reader the end character is \n. I wonder whether PDFBox converts the
> \n character to a space (32). I checked the processEncodedText function in
> PDFStreamEngine and found no useful information.
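
To see which character actually ends each paragraph, it can help to make the whitespace visible before comparing the two outputs. The following is a minimal sketch using only the Java standard library. The class WhitespaceReveal and the sample strings are hypothetical, invented for illustration; they are not part of PDFBox and do not reproduce its actual output.

```java
// Hypothetical helper (not part of PDFBox): makes whitespace visible so that
// two extracted strings can be compared character by character.
final class WhitespaceReveal {

    // Replaces every whitespace, space-separator, or control code point with
    // its decimal value in angle brackets, e.g. space -> <32>, line feed -> <10>.
    static String reveal(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (Character.isWhitespace(cp)
                    || Character.isSpaceChar(cp)
                    || Character.isISOControl(cp)) {
                out.append('<').append(cp).append('>');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up sample strings standing in for the two extraction results.
        String fromPdfBox = "First paragraph. \nSecond paragraph. \n";
        String fromAcrobat = "First paragraph.\nSecond paragraph.\n";
        System.out.println(reveal(fromPdfBox));
        System.out.println(reveal(fromAcrobat));
    }
}
```

In practice the two strings would be the text produced by PDFTextStripper and the contents of the plain-text file saved from Acrobat, each read into a String before being passed to reveal.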