[ 
https://issues.apache.org/jira/browse/PDFBOX-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14969340#comment-14969340
 ] 

Tilman Hausherr edited comment on PDFBOX-3045 at 10/22/15 3:53 PM:
-------------------------------------------------------------------

The wrong space width is a duplicate of PDFBOX-2508; I'll write a comment 
there soon. The text extraction works with the current trunk, likely due to 
the fix for PDFBOX-2976, which allowed recovery of files with corrupt 
compression (your file is corrupt). Have you been using RC1? That one has the 
problem.
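
To illustrate what "recovery of files with corrupt compression" can mean, here is a minimal sketch using only java.util.zip. It is not PDFBox's FlateFilter code or the actual PDFBOX-2976 change; the class LenientInflate and its methods are hypothetical names made up for this example. The idea is to keep every byte that was decoded before the DataFormatException instead of discarding the whole stream. The sample corrupts only the checksum trailer, which produces the same "incorrect data check" message seen in the log.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, for illustration only (not part of PDFBox).
public class LenientInflate {

    /** Bytes recovered from a stream, plus a flag saying whether it was damaged. */
    static final class Result {
        final byte[] data;
        final boolean corrupt;
        Result(byte[] data, boolean corrupt) {
            this.data = data;
            this.corrupt = corrupt;
        }
    }

    /** Inflates zlib data, keeping whatever was decoded before an error. */
    static Result inflateLeniently(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[64]; // small chunks, so little is lost on failure
        boolean corrupt = false;
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n > 0) {
                    out.write(buf, 0, n);
                } else if (!inflater.finished()) {
                    corrupt = true; // no progress: truncated input
                    break;
                }
            }
        } catch (DataFormatException e) {
            corrupt = true; // e.g. "incorrect data check" (bad Adler-32 trailer)
        } finally {
            inflater.end();
        }
        return new Result(out.toByteArray(), corrupt);
    }

    /** Compresses bytes in zlib format; used here only to build test input. */
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Made-up content-stream text, repeated so it spans many output chunks. */
    static byte[] sampleStream() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 50; i++) {
            sb.append("BT /F1 10 Tf (MERCH BANKCARD NET SETLMT) Tj ET\n");
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] raw = sampleStream();
        byte[] damaged = deflate(raw);
        damaged[damaged.length - 1] ^= 0x7F; // break the checksum trailer
        Result r = inflateLeniently(damaged);
        System.out.println("corrupt=" + r.corrupt
                + ", recovered " + r.data.length + " of " + raw.length + " bytes");
    }
}
```

Bytes produced by the inflate call that throws are dropped in this sketch, so the recovered data is a prefix of the original rather than a complete copy. A text extractor built on such a decoder would see part of the page content instead of none.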


was (Author: tilman):
The wrong space width is a duplicate of PDFBOX-2508. The text extraction works 
with the current trunk, likely due to the fix for PDFBOX-2976, which allowed 
recovery of files with broken compression. Have you been using RC1? That one 
has the problem.

> File that read fine in 1.8 does not in 2.0
> ------------------------------------------
>
>                 Key: PDFBOX-3045
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3045
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.0
>            Reporter: Fred Andrews
>         Attachments: PDFbox2.pdf
>
>
> Attached is a page of a file that was parsed fine with PDFBox 1.8.
> In 2.0, using pdfbox/examples/util/PrintTextLocations.java,
> lots of the text is missing, for example all the text like
> "MERCH BANKCARD NET SETLMT".
> Also, width_of_space has a bad value: 561591.3
> Start of PrintTextLocations....
> Oct 21, 2015 10:36:22 PM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> Oct 21, 2015 10:36:22 PM org.apache.pdfbox.contentstream.PDFStreamEngine operatorException
> WARNING: java.util.zip.DataFormatException: incorrect data check
> Oct 21, 2015 10:36:22 PM org.apache.pdfbox.contentstream.PDFStreamEngine operatorException
> WARNING: Cannot execute restore, the graphics stack is empty
> String[161.94,422.1 fs=10.0 xscale=10.0 height=7.2857146 space=561591.3 width=6.6857147]B
> String[168.62572,422.1 fs=10.0 xscale=10.0 height=7.2857146 space=561591.3 width=4.457138]e
> String[173.08286,422.1 fs=10.0 xscale=10.0 height=7.2857146 space=561591.3 width=4.9714355]g
> String[178.05429,422.1 fs=10.0 xscale=10.0 height=7.2857146 space=561591.3 width=2.742859]i



