Meher Saride wrote:
> Problem space: Extract values out of a table within the PDF.
There is no table in the PDF. There are only lines and glyphs drawn on a page. When a human looks at the page, they see a table. When a program looks at the page, there is none.

> Reference: I have a problem space where customers send us a formatted
> PDF; we need to extract values out of it, apply some business rules,
> and store them in the database. Unfortunately, I could not find a way to
> get this done using Java/iText. I could not post the original PDF
> because of the SLA, but I found a similar document with the same problem,
> attached here. The example PDF has a long table with some data cells in
> it; I would like to extract them.

You can extract the text if you write a TextExtractionStrategy. This will give you TextRenderInfo objects containing text snippets and their positions on the page, but that's not the same as "extracting a table".

> Another question: is there a way to recursively write the document
> dictionary with PDF object type by page?

Are you talking about creating Tagged PDF? Please be more specific.

--
This answer is provided by 1T3XT BVBA
http://www.1t3xt.com/ - http://www.1t3xt.info
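
To illustrate the gap between "text snippets with positions" and "a table", here is a minimal sketch of the step you would still have to write yourself. It is not iText code: SnippetTable, Snippet and toRows are hypothetical names, and Snippet merely stands in for whatever text and coordinates your own TextExtractionStrategy records from each TextRenderInfo. It assumes that cells in the same row share roughly the same baseline, which will not hold for every document.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of iText: rebuilds rows from positioned text.
class SnippetTable {

    // Assumed stand-in for the data a custom TextExtractionStrategy could
    // collect: the text plus the start of its baseline (x, y) in points.
    static final class Snippet {
        final String text;
        final float x;
        final float y;

        Snippet(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    // Groups snippets into rows. A snippet joins the current row when its
    // baseline lies within yTolerance of the row's first snippet. Rows are
    // returned top to bottom (PDF y grows upward), cells left to right.
    static List<List<String>> toRows(List<Snippet> snippets, float yTolerance) {
        List<Snippet> sorted = new ArrayList<>(snippets);
        sorted.sort(Comparator.comparingDouble((Snippet s) -> s.y).reversed());

        List<List<Snippet>> rows = new ArrayList<>();
        float rowY = 0;
        for (Snippet s : sorted) {
            if (rows.isEmpty() || Math.abs(s.y - rowY) > yTolerance) {
                rows.add(new ArrayList<>());
                rowY = s.y;
            }
            rows.get(rows.size() - 1).add(s);
        }

        List<List<String>> result = new ArrayList<>();
        for (List<Snippet> row : rows) {
            row.sort(Comparator.comparingDouble((Snippet s) -> s.x));
            List<String> cells = new ArrayList<>();
            for (Snippet s : row) {
                cells.add(s.text);
            }
            result.add(cells);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Snippet> snippets = new ArrayList<>();
        snippets.add(new Snippet("Qty", 200f, 700f));
        snippets.add(new Snippet("Item", 50f, 700.5f));
        snippets.add(new Snippet("2", 200f, 680f));
        snippets.add(new Snippet("Widget", 50f, 680.4f));
        System.out.println(toRows(snippets, 2f)); // [[Item, Qty], [Widget, 2]]
    }
}
```

Even this only recovers rows. Deciding where columns begin and end, merging a cell whose text arrives as several snippets, and handling cells that wrap onto multiple lines all depend on the layout of the specific document.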