[ https://issues.apache.org/jira/browse/PDFBOX-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15904464#comment-15904464 ]
Tilman Hausherr commented on PDFBOX-3715:
-----------------------------------------
I ran an ordinary ExtractText with 2.0.4 and with the trunk, and it's all there:
{code}
All rights reserved. No part of the contents of this book may be reproduced or
transmitted in any form or by any means
{code}
> Text Stripper trims last spaces - regression of 2.0
> ---------------------------------------------------
>
> Key: PDFBOX-3715
> URL: https://issues.apache.org/jira/browse/PDFBOX-3715
> Project: PDFBox
> Issue Type: Bug
> Reporter: Roman
> Attachments: WindowsPhone7.pdf_page1_qdf.pdf
>
>
> After migrating from 1.8 to 2.0, we noticed that some spaces have disappeared.
> Please see the attached PDF. The missing spaces are shown as blue boxes in it.
> Those spaces WERE present in version 1.8.
> Our app subclasses *PDFTextStripper*, overrides the *writePage()* method,
> and uses the *charactersByArticle* field, which is effectively a list of all
> *TextPosition* objects, one for every character in the document.
> Some trailing spaces have disappeared from it. At the same time, those spaces
> are present in the PDF via explicit declaration. For example, this piece of
> the attached PDF contains the space right after the word "contents":
> {code}
> [( the content)-7(s )-2(of t)...]TJ
> {code}
> PS
> I found that this bug occurs only when *sortExtractedTextByPosition* mode
> is set to *false*. The spaces have not actually disappeared, but have been
> moved to the beginning of the *charactersByArticle* list. Such behavior is
> not expected when sorting is off.
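
To make the "explicit declaration" point in the description concrete, here is a minimal sketch that decodes the quoted TJ fragment by hand. It uses no PDFBox code: the class name `TjArrayText` and the method `extractText` are hypothetical, the elided `...` part of the array is left out of the input, and the sketch assumes a simple font encoding where the string bytes read as plain ASCII. In a TJ array the numbers between the strings are position adjustments in thousandths of a text space unit, so values like -7 and -2 are only kerning; the space after "contents" comes from the string `(s )` itself.

```java
public class TjArrayText {

    /**
     * Concatenates the literal strings of a TJ array and ignores the numeric
     * adjustments between them. Simplified: handles nested parentheses and
     * backslash-escaped characters, but not octal escapes or hex strings.
     */
    static String extractText(String tjArray) {
        StringBuilder out = new StringBuilder();
        int depth = 0; // > 0 while inside a (...) literal string
        for (int i = 0; i < tjArray.length(); i++) {
            char c = tjArray.charAt(i);
            if (depth > 0 && c == '\\' && i + 1 < tjArray.length()) {
                // Keep the escaped character as-is (covers \( \) and \\ only).
                out.append(tjArray.charAt(++i));
            } else if (c == '(') {
                if (depth > 0) {
                    out.append(c); // balanced nested parenthesis is part of the text
                }
                depth++;
            } else if (c == ')' && depth > 0) {
                depth--;
                if (depth > 0) {
                    out.append(c);
                }
            } else if (depth > 0) {
                out.append(c);
            }
            // Anything outside a string (brackets, numbers, the TJ operator) is skipped.
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Fragment from the issue description, without the elided part.
        String text = extractText("[( the content)-7(s )-2(of t)]TJ");
        System.out.println("[" + text + "]");
    }
}
```

If a stripper returns the characters of this fragment in content-stream order, the space between "contents" and "of" would be expected at that position, which is the behavior the reporter describes for 1.8.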