[ 
https://issues.apache.org/jira/browse/PDFBOX-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965748#comment-14965748
 ] 

Tilman Hausherr commented on PDFBOX-2069:
-----------------------------------------

Acrobat Reader (which is the "gold standard") gets this:
{code}
Ulster Savings
Deposits W i t h d r a w a l s
3 0 0 . 0 0 -
2 0 0 . 0 0 -
11 0 . 0 0 -
202.57-
36.26-
712.00-
29589.00 3
768.12-
300.00-
-
Banking • Loans • investments • Tax & Payroll • Insurance
JO
W 12498-0000
Acc
Beginning Balance on Apri l 8 , 2013
Deposits and Other Credits
Checks and Other Debits
Credit - Interest
25
Statement Date 05/07/13
YOU NOW HAVE ACCESS TO 509000 ALLPOINT ATMS WORLDWIDE.
TO FIND THE ONE NEAREST YOU, USE OUR ATM LOCATOR AT
ULSTERSAVINGS.COM
-- PREMIUM 50+ FREE INTEREST CHECKING - -
Ending Balance on May 7 , 2013 $ 6 5 5 . 2 6
Average Balance $ E n c l o s u r e s 3
YTD Interest $ 4 . 3 1
YTD Withholding $ 0 . 0 0
Transaction Activity
$
$ 0 . 7 2
Date D e s c r i p t i o n
0 4 / 0 8 B E G I N N I N G BALANCE
0 4 / 0 8 WITHDRAWAL-ATM 3 1 1 7
62 M I L L H I L L ROAD WOODSTOCK N Y
0 4 / 1 0 WITHDRAWAL-ACH
HUMAN RIGHTS WAT- B I L L PAYMT
04/12 C K # 1 2 7 3
04/15 WITHDRAWAL-ACH
NEW SOUTH INSURA-BILL PAYMT
04/15 WITHDRAWAL-ACH
WASTE CONNECTION-BILL PAYMT
0 4 / 1 7 WITHDRAWAL-ACH
N PYMT T
0 4 / 1 8 WITHDRAWAL-ACH
N PYMT T
0 4 / 1 9 WITHDRAWAL-ACH
S PAYBMI LTL
04/22 WITHDRAWAL-ATM 3 1 1 7 EFF 0 4 - 2 1
62 M I L L H I L L ROAD WOODSTOCK N Y
04/22 WITHDRAWAL-ACH
{code}


> PDF's with Tc before Tm are getting incorrect spacing in PDFTextArea
> --------------------------------------------------------------------
>
>                 Key: PDFBOX-2069
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2069
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.8.5
>         Environment: Windows
>            Reporter: Joel Hirsh
>              Labels: pdfbox
>         Attachments: PDFBOX-2609-visible.pdf, PDFBOX-2609.pdf, 
> PDFBox-2609-patch.zip
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The attached PDF gets incorrect spacing with the example program 
> ExtractTextByArea.java, as follows:
> Text in the area:java.awt.Rectangle[x=10,y=500,width=600,height=200]
> Transaction Activity
> Date D e s c r i p t i o n Deposits W i t h d r a w a l s
> 0 4 / 0 8  B E G I N N I N G  BALANCE
> 04 / 0 8  W I THDRAWAL - ATM  3 1 1 7 3 0 0 . 0 0 -
> 62 M I L L  H I L L  ROAD WOODSTOCK N Y
> 04 / 1 0  W I THDRAWAL - ACH 2 0 0 . 0 0 -
> HUMAN RIGHTS WAT-B I L L  PAYMT
> 04 / 12  C K #  1 2 7 3 11 0 . 0 0 -
> 0 4 / 1 5  W I THDRAWAL - ACH 2 0 2 . 5 7 -
> NEW SOUTH INSURA -B I LL PAYMT
> 04 / 1 5  W I THDRAWAL - ACH 3 6 . 2 6 -
> WASTE CONNECTION-BILL PAYMT
> 04 / 1 7  W I THDRAWAL - ACH 71 2 . 0 0 -
> N  PYMT T
> 04 / 1 8  W I THDRAWAL - ACH 2958 9 . 0 0 3
> N  PYMT T
> 04 / 1 9  W I THDRAWAL - ACH 76 8 . 1 2 -
> I believe this is because PDF streams with Tc before Tm are having the matrix 
> applied to the Tc, which is contrary to my experience with graphics pipelines. 
> Most PDF streams seem to have Tc after Tm, and thus do not hit this 
> situation.
> I have attached a patch to two files that corrects the problem for this file, 
> and also works correctly on my test suite of about 40 files from other 
> sources.  
> The result for the attached file now becomes:
> Text in the area:java.awt.Rectangle[x=10,y=500,width=600,height=200]
> Transaction  Activity
> Date  Description Deposits  Withdrawals
> 04/08  BEGINNING  BALANCE
> 04/08  WITHDRAWAL-ATM  3 117 300.00-
> 62 MILL  HILL  ROAD  WOODSTOCK  NY
> 04/10  WITHDRAWAL-ACH 200.00-
> HUMAN RIGHTS  WAT-BILL  PAYMT
> 04/12  CK#  1273 110.00-
> 04/15  WITHDRAWAL-ACH 202.57-
> NEW SOUTH  INSURA-BILL  PAYMT
> 04/15  WITHDRAWAL-ACH 36.26-
> WASTE CONNECTION-BILL  PAYMT
> 04/17  WITHDRAWAL-ACH 712.00-
> N  PYMT T
> 04/18  WITHDRAWAL-ACH 29589.00 3
> N  PYMT T
> 04/19  WITHDRAWAL-ACH 768.12-
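
For reference, the arithmetic behind the report can be sketched as follows. The PDF specification (ISO 32000-1, section 9.4.4) gives the horizontal glyph displacement in text space as tx = ((w0 - Tj/1000) * Tfs + Tc + Tw) * Th, so Tc is added in unscaled text space units and the text matrix is applied once, afterwards, whatever the order of the Tc and Tm operators. This is only a minimal sketch, not PDFBox code: the class and method names are hypothetical, the text matrix is assumed to be a pure uniform scale, and doubleScaledAdvance is an assumed model of the fault as the reporter describes it, not a confirmed description of what PDFBox 1.8.5 does.

```java
public class CharSpacingSketch {

    /**
     * Horizontal glyph displacement in text space (ISO 32000-1, 9.4.4):
     * tx = ((w0 - Tj/1000) * Tfs + Tc + Tw) * Th
     * w0 is the glyph width in text space units (font width / 1000).
     */
    public static double textSpaceAdvance(double w0, double tj, double fontSize,
                                          double charSpacing, double wordSpacing,
                                          double horizScale) {
        return ((w0 - tj / 1000.0) * fontSize + charSpacing + wordSpacing) * horizScale;
    }

    /** Maps a text-space advance to user space; assumes Tm is a pure uniform scale. */
    public static double userSpaceAdvance(double textSpaceAdvance, double matrixScale) {
        return textSpaceAdvance * matrixScale;
    }

    /**
     * Hypothetical faulty variant: Tc is pre-multiplied by the matrix scale
     * when it is set, and the whole advance is then scaled again.
     */
    public static double doubleScaledAdvance(double w0, double fontSize,
                                             double charSpacing, double matrixScale) {
        double scaledTc = charSpacing * matrixScale; // the assumed error
        return userSpaceAdvance(
                textSpaceAdvance(w0, 0, fontSize, scaledTc, 0, 1), matrixScale);
    }

    public static void main(String[] args) {
        // Illustrative values only: glyph width 500/1000, Tfs = 1, Tc = 0.1, Tm scale = 10.
        double w0 = 0.5, fontSize = 1.0, tc = 0.1, scale = 10.0;

        double correct = userSpaceAdvance(
                textSpaceAdvance(w0, 0, fontSize, tc, 0, 1), scale);
        double faulty = doubleScaledAdvance(w0, fontSize, tc, scale);

        System.out.println("advance with Tc applied once:  " + correct);
        System.out.println("advance with Tc scaled twice:  " + faulty);
    }
}
```

With these illustrative values the double-scaled advance is much larger than the glyph width, so an extractor that compares gaps against glyph widths would be expected to insert spaces between letters, consistent with output such as "W i t h d r a w a l s" above.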


