[ 
https://issues.apache.org/jira/browse/PDFBOX-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074944#comment-14074944
 ] 

John Hewson edited comment on PDFBOX-2232 at 7/25/14 9:24 PM:
--------------------------------------------------------------

Text extracted by PDFBox is not quite the same as text copied to the clipboard 
from Acrobat. For a fair comparison, use Acrobat's "Save As Other..." and 
select plain text, then compare that output with PDFBox's.


was (Author: jahewson):
Extracted text is not quite the same as copying to the clipboard from Acrobat. 
Using Acrobat's "Save As Other..." and selecting plain text is what you want to 
compare.

> Is there difference between character \n and character space(32) in pdf stream
> ------------------------------------------------------------------------------
>
>                 Key: PDFBOX-2232
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2232
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>            Reporter: huangchangan
>
> When extracting text from PDF files with PDFTextStripper, I get a space (32) 
> at the end of each paragraph or table cell, while in text copied from Adobe 
> Reader the final character is \n. I wonder whether PDFBox converts the 
> character \n to a space (32). I checked the function processEncodedText in 
> PDFStreamEngine and found no useful information.
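
One way to check which character actually ends each line is to print its 
numeric code. The following is only a minimal sketch using the Java standard 
library: the class name TrailingChars, its helper methods, and the sample 
string are all hypothetical, and the string merely stands in for whatever 
PDFTextStripper or Acrobat's plain-text export returned. It does not call 
PDFBox itself.

```java
class TrailingChars {

    // Returns the numeric code of the last character of s, or -1 if s is empty.
    static int lastCharCode(String s) {
        return s.isEmpty() ? -1 : s.charAt(s.length() - 1);
    }

    // Gives a readable label for the character codes discussed in this issue.
    static String describe(int code) {
        switch (code) {
            case -1: return "(empty)";
            case 10: return "LF (10)";
            case 13: return "CR (13)";
            case 32: return "space (32)";
            default: return "other (" + code + ")";
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample; replace with the text obtained from the extractor.
        String extracted = "First paragraph \nCell A \nLast line";

        // Split on LF (keeping trailing empty strings) and report what precedes each break.
        for (String line : extracted.split("\n", -1)) {
            System.out.println("[" + line + "] ends with " + describe(lastCharCode(line)));
        }
    }
}
```

Running the same check on both the PDFBox output and the Acrobat plain-text 
export should show whether the two differ only in the trailing space before 
each line break.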



--
This message was sent by Atlassian JIRA
(v6.2#6252)
