Bernhard Wellhöfer wrote:
> Hello,
>
> I have a PDF document with ~900 pages. The document lists in a long
> table the personal data of people. My job is it now to extract the first
> and last name into a structured document (e.g. XML, Excel, csv ...).
I hope you haven't accepted that assignment yet. Is there still time to
turn it down?

> I already managed to open the document via iText. After studying the
> Documentation I do not understand how to search in the document for the
> table and then process each table cell. Can somebody send me a hint or a
> link to the documentation how to find and process the table?

Is the PDF a Tagged PDF file? If not: congratulations, you have accepted
a mission impossible!

PDF is a page description language, not a word processing format. If you
add a table to a PDF file, it is painted on a canvas and all structure is
lost (unless the PDF is tagged). It's similar to creating an image (GIF,
JPG) from a table and then asking somebody to convert the image to XLS or
Word.

If you want to know what Tagged PDF is, please download the free chapters
from my book:
http://www.manning.com/affiliate/idevaffiliate.php?id=223_53
and read them carefully.

Your best chance lies with a product that does OCR (but I don't know of
any free ones).

best regards,
Bruno

_______________________________________________
iText-questions mailing list
https://lists.sourceforge.net/lists/listinfo/itext-questions
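To make the "painted on a canvas" point concrete, here is a minimal sketch in Python (standard library only, not iText). It assumes a hand-written, uncompressed content stream for a two-row table; the stream, the names in it, and the helpers `painted_fragments` and `guess_rows` are all hypothetical and exist only for illustration. Real content streams are usually compressed, may split a word across several operators, and often use font encodings that do not map directly to readable text, so this is not a general extractor.

```python
import re
from collections import defaultdict

# Hypothetical content stream: four text fragments, each placed at an
# absolute (x, y) position. Note that the paint order is NOT the reading
# order, and nothing in the stream says "row", "column" or "cell".
CONTENT_STREAM = b"""
BT /F1 10 Tf 1 0 0 1 200 686 Tm (Turing) Tj ET
BT /F1 10 Tf 1 0 0 1 72 700 Tm (Ada) Tj ET
BT /F1 10 Tf 1 0 0 1 72 686 Tm (Alan) Tj ET
BT /F1 10 Tf 1 0 0 1 200 700 Tm (Lovelace) Tj ET
"""

# Matches only the simple pattern used above: an unrotated, unscaled text
# matrix (1 0 0 1 x y Tm) followed by a literal string shown with Tj.
TEXT_OP = re.compile(
    rb"1 0 0 1 (\d+(?:\.\d+)?) (\d+(?:\.\d+)?) Tm\s*\(([^)]*)\)\s*Tj"
)


def painted_fragments(stream):
    """Return (x, y, text) tuples in the order they are painted."""
    return [
        (float(x.decode("ascii")), float(y.decode("ascii")), text.decode("latin-1"))
        for x, y, text in TEXT_OP.findall(stream)
    ]


def guess_rows(fragments):
    """Guess table rows by grouping fragments that share the same y value.

    This is pure guesswork based on coordinates: rows are ordered top to
    bottom (larger y is higher on the page), cells left to right by x.
    """
    rows = defaultdict(list)
    for x, y, text in fragments:
        rows[y].append((x, text))
    return [
        [text for _, text in sorted(cells)]
        for _, cells in sorted(rows.items(), reverse=True)
    ]


if __name__ == "__main__":
    print(guess_rows(painted_fragments(CONTENT_STREAM)))
```

All the file gives you is strings and coordinates. The grouping in `guess_rows` only works here because every fragment in a row shares exactly the same y value, which a real 900-page document is unlikely to guarantee.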
