[
https://issues.apache.org/jira/browse/PDFBOX-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965748#comment-14965748
]
Tilman Hausherr commented on PDFBOX-2069:
-----------------------------------------
Acrobat Reader (which is the "gold standard") gets this:
{code}
Ulster Savings
Deposits W i t h d r a w a l s
3 0 0 . 0 0 -
2 0 0 . 0 0 -
11 0 . 0 0 -
202.57-
36.26-
712.00-
29589.00 3
768.12-
300.00-
-
Banking • Loans • investments • Tax & Payroll • Insurance
JO
W 12498-0000
Acc
Beginning Balance on Apri l 8 , 2013
Deposits and Other Credits
Checks and Other Debits
Credit - Interest
25
Statement Date 05/07/13
YOU NOW HAVE ACCESS TO 509000 ALLPOINT ATMS WORLDWIDE.
TO FIND THE ONE NEAREST YOU, USE OUR ATM LOCATOR AT
ULSTERSAVINGS.COM
-- PREMIUM 50+ FREE INTEREST CHECKING - -
Ending Balance on May 7 , 2013 $ 6 5 5 . 2 6
Average Balance $ E n c l o s u r e s 3
YTD Interest $ 4 . 3 1
YTD Withholding $ 0 . 0 0
Transaction Activity
$
$ 0 . 7 2
Date D e s c r i p t i o n
0 4 / 0 8 B E G I N N I N G BALANCE
0 4 / 0 8 WITHDRAWAL-ATM 3 1 1 7
62 M I L L H I L L ROAD WOODSTOCK N Y
0 4 / 1 0 WITHDRAWAL-ACH
HUMAN RIGHTS WAT- B I L L PAYMT
04/12 C K # 1 2 7 3
04/15 WITHDRAWAL-ACH
NEW SOUTH INSURA-BILL PAYMT
04/15 WITHDRAWAL-ACH
WASTE CONNECTION-BILL PAYMT
0 4 / 1 7 WITHDRAWAL-ACH
N PYMT T
0 4 / 1 8 WITHDRAWAL-ACH
N PYMT T
0 4 / 1 9 WITHDRAWAL-ACH
S PAYBMI LTL
04/22 WITHDRAWAL-ATM 3 1 1 7 EFF 0 4 - 2 1
62 M I L L H I L L ROAD WOODSTOCK N Y
04/22 WITHDRAWAL-ACH
{code}
> PDF's with Tc before Tm are getting incorrect spacing in PDFTextArea
> --------------------------------------------------------------------
>
> Key: PDFBOX-2069
> URL: https://issues.apache.org/jira/browse/PDFBOX-2069
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.8.5
> Environment: Windows
> Reporter: Joel Hirsh
> Labels: pdfbox
> Attachments: PDFBOX-2609-visible.pdf, PDFBOX-2609.pdf,
> PDFBox-2609-patch.zip
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> Attached PDF is getting incorrect spacing using example program
> ExtractTextByArea.java as follows:
> Text in the area:java.awt.Rectangle[x=10,y=500,width=600,height=200]
> Transaction Activity
> Date D e s c r i p t i o n Deposits W i t h d r a w a l s
> 0 4 / 0 8 B E G I N N I N G BALANCE
> 04 / 0 8 W I THDRAWAL - ATM 3 1 1 7 3 0 0 . 0 0 -
> 62 M I L L H I L L ROAD WOODSTOCK N Y
> 04 / 1 0 W I THDRAWAL - ACH 2 0 0 . 0 0 -
> HUMAN RIGHTS WAT-B I L L PAYMT
> 04 / 12 C K # 1 2 7 3 11 0 . 0 0 -
> 0 4 / 1 5 W I THDRAWAL - ACH 2 0 2 . 5 7 -
> NEW SOUTH INSURA -B I LL PAYMT
> 04 / 1 5 W I THDRAWAL - ACH 3 6 . 2 6 -
> WASTE CONNECTION-BILL PAYMT
> 04 / 1 7 W I THDRAWAL - ACH 71 2 . 0 0 -
> N PYMT T
> 04 / 1 8 W I THDRAWAL - ACH 2958 9 . 0 0 3
> N PYMT T
> 04 / 1 9 W I THDRAWAL - ACH 76 8 . 1 2 -
> I believe this is because PDF streams with Tc before Tm are having the matrix
> applied to the Tc, which is contrary to my experience with graphics pipelines.
> Most PDF streams seem to have Tc after Tm, and thus do not hit this
> situation.
> I have attached a patch to two files that corrects the problem for this file,
> and also works correctly on my test suite of about 40 files from other
> sources.
> The result for the attached file now becomes:
> Text in the area:java.awt.Rectangle[x=10,y=500,width=600,height=200]
> Transaction Activity
> Date Description Deposits Withdrawals
> 04/08 BEGINNING BALANCE
> 04/08 WITHDRAWAL-ATM 3 117 300.00-
> 62 MILL HILL ROAD WOODSTOCK NY
> 04/10 WITHDRAWAL-ACH 200.00-
> HUMAN RIGHTS WAT-BILL PAYMT
> 04/12 CK# 1273 110.00-
> 04/15 WITHDRAWAL-ACH 202.57-
> NEW SOUTH INSURA-BILL PAYMT
> 04/15 WITHDRAWAL-ACH 36.26-
> WASTE CONNECTION-BILL PAYMT
> 04/17 WITHDRAWAL-ACH 712.00-
> N PYMT T
> 04/18 WITHDRAWAL-ACH 29589.00 3
> N PYMT T
> 04/19 WITHDRAWAL-ACH 768.12-
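
For reference, the Tc/Tm ordering question in the quoted description can be
illustrated with a small stand-alone sketch. It is not PDFBox code: the class
TcOrderSketch, its TextState fields and the layout() helper are hypothetical
names made up for this illustration. It assumes the PDF specification's text
model, in which Tc is a text state parameter in unscaled text space units, the
per-glyph advance is (w0 * Tfs + Tc) * Th, and Tm only replaces the text
matrix. It also assumes an unrotated text matrix, an identity CTM, and no Tw
or TJ adjustments. Under those assumptions the glyph origins should come out
the same whether Tc appears before or after Tm.

```java
import java.util.ArrayList;
import java.util.List;

class TcOrderSketch {
    // Hypothetical minimal text state; these are not PDFBox classes.
    static final class TextState {
        double charSpacing = 0.0;      // Tc, unscaled text space units
        double fontSize = 1.0;         // Tfs
        double horizontalScale = 1.0;  // Th (1.0 means 100%)
        double a = 1.0;                // text matrix: horizontal scale
        double e = 0.0;                // text matrix: horizontal translation
    }

    // Apply one operator, given as {name, operand...}. Only Tc and Tm are handled.
    static void apply(TextState s, String[] op) {
        switch (op[0]) {
            case "Tc":
                s.charSpacing = Double.parseDouble(op[1]);
                break;
            case "Tm":
                // Operands are a b c d e f; this sketch only uses a and e.
                s.a = Double.parseDouble(op[1]);
                s.e = Double.parseDouble(op[5]);
                break;
            default:
                throw new IllegalArgumentException("unsupported operator: " + op[0]);
        }
    }

    // Run all operators, then return the x origin of each glyph in user space.
    static List<Double> layout(String[][] ops, double[] glyphWidths) {
        TextState s = new TextState();
        for (String[] op : ops) {
            apply(s, op);
        }
        List<Double> xs = new ArrayList<>();
        double cursor = 0.0; // accumulated advance in text space
        for (double w0 : glyphWidths) {
            xs.add(s.e + s.a * cursor); // map text space to user space via Tm
            cursor += (w0 * s.fontSize + s.charSpacing) * s.horizontalScale;
        }
        return xs;
    }

    public static void main(String[] args) {
        String[] tc = {"Tc", "2"};
        String[] tm = {"Tm", "10", "0", "0", "10", "100", "500"};
        double[] widths = {0.5, 0.5, 0.5};
        System.out.println("Tc before Tm: " + layout(new String[][] {tc, tm}, widths));
        System.out.println("Tc after Tm:  " + layout(new String[][] {tm, tc}, widths));
    }
}
```

In this model Tc is only read when a glyph is shown, so setting it earlier or
later than Tm makes no difference. An extractor whose spacing depends on the
operator order is therefore doing something beyond what this sketch does.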