Larry Evans wrote
> I'm trying to extract the characters from page 41 of:
> www.irs.gov/pub/irs-pdf/i1040.pdf
> However, using the attached, ExtractPageContentSorted.java, and the member
> function, at_page, where:
> reader was produced from:
> www.irs.gov/pub/irs-pdf/i1040.pdf
> and pageNum was:
> 41
> I only managed to produce the output shown in 2nd attachment,
> i1040p41.txt. The characters shown in i1040p41.txt are nothing like what
> appears on page 41 of i1040.pdf. Since the at_page member function
> essentially does what Listing 15.27 in the book does:
>
> What can be done to *properly* extract the text characters from page 41 of
> i1040.pdf.

When I apply

> final static File TEST_FILE_IRS1040 = new File("data/i1040.pdf");
> final static File TEST_FILE_IRS1040_TEXT = new File("data/out/i1040_41.txt");
>
> public void testExtractIrs1040() throws DocumentException, IOException
> {
>     System.out.printf("\n\nFile %s:", TEST_FILE_IRS1040);
>     PdfReader reader = new PdfReader(TEST_FILE_IRS1040.toString());
>     String text = PdfTextExtractor.getTextFromPage(reader, 41);
>
>     System.out.println();
>     System.out.printf(">>>%s<<<\n", text);
>
>     FileOutputStream fos = new FileOutputStream(TEST_FILE_IRS1040_TEXT);
>     fos.write(text.getBytes());
>     fos.close();
> }

I get

> Page 41 of 108 Fileid: … ions/I1040/2012/A/XML/Cycle10/source 21:06 -
> 18-Jan-2013
> The type and rule above prints on all proofs including departmental
> reproduction proofs. MUST be removed before printing.
> 2012 Form 1040—Line 44
> Qualified Dividends and Capital Gain Tax Worksheet—Line 44 Keep for Your
> Records
> Before you begin:
> See the earlier instructions for line 44 to see if you can use this
> worksheet to figure your tax.
> Before completing this worksheet, complete Form 1040 through line 43.
> If you do not have to file Schedule D and you received capital gain
> distributions, be sure you checked
> the box on line 13 of Form 1040.
> 1. Enter the amount from Form 1040, line 43. However, if you are filing
> Form
> 2555 or 2555-EZ (relating to foreign earned income), enter the amount from
>
> line 3 of the Foreign Earned Income Tax Worksheet .....................1.
> 2. Enter the amount from Form 1040, line 9b* .......
> 2.
> 3. Are you filing Schedule D?*
> Yes. Enter the smaller of line 15 or 16 of
> Schedule D. If either line 15 or line 16 is
>
> blank or a loss, enter -0- 3.
> No. Enter the amount from Form 1040, line 13
> 4. Add lines 2 and 3 ..............................
> 4.
> 5. If filing Form 4952 (used to figure investment
> interest expense deduction), enter any amount from
>
> line 4g of that form.
> Otherwise, enter -0- .......... 5.
> 6. Subtract line 5 from line 4. If zero or less, enter -0-
> ......................
> 6.
> 7. Subtract line 6 from line 1. If zero or less, enter -0-
> ......................
> 7.
> 8. Enter:
> $35,350 if single or married filing separately,
> $70,700 if married filing jointly or qualifying widow(er),
>
> ............. 8.
> $47,350 if head of household.
> 9. Enter the smaller of line 1 or line 8
> ....................................
> 9.
> 10. Enter the smaller of line 7 or line 9
> ....................................
> 10.
> 11. Subtract line 10 from line 9. This amount is taxed at 0%
> ..................
> 11.
> 12. Enter the smaller of line 1 or line 6
> ....................................
> 12.
> 13. Enter the amount from line 11 ........................................
> 13.
> 14. Subtract line 13 from line 12
> .........................................
> 14.
> 15. Multiply line 14 by 15% (.15)
> .........................................................
> 15.
> 16. Figure the tax on the amount on line 7. If the amount on line 7 is
> less than $100,000, use the Tax
> Table to figure this tax. If the amount on line 7 is $100,000 or more, use
> the Tax Computation
>
> Worksheet
> .........................................................................
> 16.
> 17. Add lines 15 and 16
> ..................................................................
> 17.
> 18. Figure the tax on the amount on line 1. If the amount on line 1 is
> less than $100,000, use the Tax
> Table to figure this tax. If the amount on line 1 is $100,000 or more, use
> the Tax Computation
>
> Worksheet
> .........................................................................
> 18.
> 19. Tax on all taxable income. Enter the smaller of line 17 or line 18.
> Also include this amount on
> Form 1040, line 44. If you are filing Form 2555 or 2555-EZ, do not enter
> this amount on Form
>
> 1040, line 44.
> Instead, enter it on line 4 of the Foreign Earned Income
> Tax Worksheet ......... 19.
> *If you are filing Form 2555 or 2555-EZ, see the footnote in the Foreign
> Earned Income Tax Worksheet before completing this line.
> -41-
> Need more information or forms? Visit IRS.gov.

which looks ok.

Thus, there is some other issue in your setup: maybe your PrintWriter uses a destructive encoding, or maybe some other program down the pipeline does something weird. Or maybe you are using some old iText version?

Regards, Michael

--
View this message in context: http://itext-general.2136553.n4.nabble.com/apparently-garbage-characters-extracted-from-i1040-pdf-page-41-tp4659185p4659192.html
Sent from the iText - General mailing list archive at Nabble.com.
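
To illustrate the "destructive encoding" possibility mentioned above, here is a minimal sketch that uses only the JDK (no iText). It assumes the extracted text contains characters outside the target charset, as the long dash in the page heading does. The class name EncodingCheck and its two helper methods are made up for this illustration; they are not part of iText or of the code in this thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {

    /** Encodes the text with the given charset and decodes it again. */
    public static String roundTrip(String text, Charset charset) {
        // String.getBytes(Charset) silently replaces unmappable characters
        // with the charset's replacement bytes instead of throwing.
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    /** Returns true if the charset can represent every character of the text. */
    public static boolean isLossless(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // One line of the extracted page; U+2014 (em dash) is not in US-ASCII.
        String sample = "2012 Form 1040\u2014Line 44";

        // The platform default is what text.getBytes() without arguments uses.
        System.out.println("default charset:     " + Charset.defaultCharset());
        System.out.println("UTF-8 lossless:      " + isLossless(sample, StandardCharsets.UTF_8));
        System.out.println("US-ASCII lossless:   " + isLossless(sample, StandardCharsets.US_ASCII));
        System.out.println("US-ASCII round trip: " + roundTrip(sample, StandardCharsets.US_ASCII));
    }
}
```

If the output file or writer is opened with a charset that cannot represent such characters, they are replaced on the way out. Passing an explicit charset such as UTF-8 when writing avoids depending on the platform default. Whether this is what happened in the original setup cannot be told from the thread.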