[ https://issues.apache.org/jira/browse/PDFBOX-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Roman updated PDFBOX-3715:
--------------------------
Description:
After migrating from 1.8 to 2.0, we noticed that some spaces had disappeared.
Please see the attached PDF; the missing spaces are marked with blue boxes.
Those spaces WERE present in version 1.8.
Our app subclasses *PDFTextStripper*, overrides the *writePage()* method, and
uses the *charactersByArticle* property, which holds a *TextPosition* object
for every character of the current page, grouped by article.
Some trailing spaces are missing from it. At the same time, those spaces are
explicitly present in the PDF. For example, this piece of the attached PDF
contains a space right after the word "contents":
{code}
[( the content)-7(s )-2(of t)...]TJ
{code}
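To make "explicit declaration" concrete: per the PDF specification, each number in a TJ array is a position adjustment in thousandths of a unit of text space, subtracted from the current horizontal position, so -7 and -2 only nudge the following glyphs slightly to the right (kerning, not a word gap). The space after "contents" is a real glyph inside the string operand (s ). The sketch below is a hypothetical helper, not PDFBox API: it reads only the visible part of the array (the elided rest is left out), handles only unescaped literal strings and integer adjustments, and assumes the string bytes read as ASCII, as they appear to here.

```java
import java.util.ArrayList;
import java.util.List;

public class TjFragment {

    // Concatenates the literal-string operands of a TJ array body.
    // Sketch only: unescaped (...) strings, bytes treated as ASCII,
    // everything else (numbers, whitespace) is skipped.
    static String showText(String tjBody) {
        StringBuilder text = new StringBuilder();
        int i = 0;
        while (i < tjBody.length()) {
            if (tjBody.charAt(i) == '(') {
                int end = tjBody.indexOf(')', i + 1);
                if (end < 0) break;              // unbalanced string: stop
                text.append(tjBody, i + 1, end); // glyphs painted by this operand
                i = end + 1;
            } else {
                i++;                             // kerning number or whitespace
            }
        }
        return text.toString();
    }

    // Collects the integer position adjustments (thousandths of a text space unit).
    static List<Integer> adjustments(String tjBody) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        while (i < tjBody.length()) {
            char c = tjBody.charAt(i);
            boolean signedNumber = c == '-' && i + 1 < tjBody.length()
                    && Character.isDigit(tjBody.charAt(i + 1));
            if (c == '(') {
                int end = tjBody.indexOf(')', i + 1);
                if (end < 0) break;
                i = end + 1;                     // skip string contents entirely
            } else if (signedNumber || Character.isDigit(c)) {
                int start = i++;
                while (i < tjBody.length() && Character.isDigit(tjBody.charAt(i))) i++;
                result.add(Integer.parseInt(tjBody.substring(start, i)));
            } else {
                i++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Visible part of the array from the attached PDF.
        String fragment = "( the content)-7(s )-2(of t)";
        System.out.println("[" + showText(fragment) + "]"); // [ the contents of t]
        System.out.println(adjustments(fragment));          // [-7, -2]
    }
}
```

With sorting off, a text stripper would be expected to report these glyphs, including the space after "contents", in exactly this order.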
PS
I found that this bug occurs only when *sortExtractedTextByPosition* mode is
set to *false*. The spaces did not actually disappear; they were moved to the
beginning of the *charactersByArticle* list. This is not expected when sorting
is off.
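The pattern in the PS can be expressed as a regression check. The sketch below is hypothetical (the class and method names are illustrative, not part of PDFBox): it compares the text in content-stream order with the text actually extracted and flags the case where every non-space character keeps its order and the space count is unchanged, but more spaces sit at the very start. In a real test, the extracted string would be assumed to come from concatenating `TextPosition.getUnicode()` for each entry of *charactersByArticle* inside an overridden *writePage()*; the sample strings below are made up to mirror the report.

```java
public class SpaceOrderCheck {

    // Number of leading U+0020 spaces in s.
    static int leadingSpaces(String s) {
        int n = 0;
        while (n < s.length() && s.charAt(n) == ' ') n++;
        return n;
    }

    // Total number of U+0020 spaces in s.
    static long countSpaces(String s) {
        return s.chars().filter(c -> c == ' ').count();
    }

    // True when 'extracted' keeps every non-space character of 'expected' in order
    // and has the same number of spaces, but more of them at the start.
    static boolean spacesMovedToFront(String expected, String extracted) {
        if (expected.equals(extracted)) return false;
        boolean sameGlyphOrder = expected.replace(" ", "").equals(extracted.replace(" ", ""));
        boolean sameSpaceCount = countSpaces(expected) == countSpaces(extracted);
        return sameGlyphOrder && sameSpaceCount
                && leadingSpaces(extracted) > leadingSpaces(expected);
    }

    public static void main(String[] args) {
        String expected = " the contents of t";  // content-stream order
        String extracted = "  the contentsof t"; // hypothetical output showing the bug
        System.out.println(spacesMovedToFront(expected, extracted)); // true
        System.out.println(spacesMovedToFront(expected, expected));  // false
    }
}
```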
> Text Stripper trims last spaces - regression of 2.0
> ---------------------------------------------------
>
> Key: PDFBOX-3715
> URL: https://issues.apache.org/jira/browse/PDFBOX-3715
> Project: PDFBox
> Issue Type: Bug
> Reporter: Roman
> Attachments: WindowsPhone7.pdf_page1_qdf.pdf
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)