It may be possible to do this using spatial analysis, as long as your input files are fairly uniform. I posted a zip file a couple of days ago with the beginnings of text extraction functionality. You could extend or re-implement the SimpleTextExtractingPdfContentStreamProcessor to capture the X position of your column headers, determine an 'average' bounding rectangle for each column, then extract only the text that falls into that bounding rectangle.
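To make the idea a bit more concrete, here is a rough sketch of the filtering step only. It does not use any iText API: it assumes your content stream processor has already collected each piece of text together with the position where it was drawn. The names Chunk, columnValues and TOLERANCE are made up for illustration, and instead of an averaged bounding rectangle it uses the simpler rule "from this header's X to the next header's X", which only holds for left-aligned columns.

```java
import java.util.ArrayList;
import java.util.List;

class ColumnExtractor {

    /** Hypothetical holder for one piece of text and where it was drawn. */
    static final class Chunk {
        final String text;
        final float x; // left edge of the text, in page units
        final float y; // baseline; in PDF user space y grows upward

        Chunk(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Assumed slack, in page units, for "same row" and "same column" tests. */
    static final float TOLERANCE = 2f;

    /**
     * Returns the text found below the given header, top to bottom.
     * Assumes left-aligned columns and a single header row.
     */
    static List<String> columnValues(List<Chunk> chunks, String header) {
        List<String> values = new ArrayList<String>();

        // Locate the header chunk by exact text match.
        Chunk target = null;
        for (Chunk c : chunks) {
            if (c.text.equals(header)) {
                target = c;
                break;
            }
        }
        if (target == null) {
            return values; // header not found: nothing to extract
        }

        // The column runs from this header's x to the next header's x
        // on the same row (or to the page edge if it is the last one).
        float left = target.x;
        float right = Float.MAX_VALUE;
        for (Chunk c : chunks) {
            boolean sameRow = Math.abs(c.y - target.y) <= TOLERANCE;
            if (sameRow && c.x > left && c.x < right) {
                right = c.x;
            }
        }

        // Keep chunks that sit below the header and inside the x range.
        List<Chunk> hits = new ArrayList<Chunk>();
        for (Chunk c : chunks) {
            boolean below = c.y < target.y - TOLERANCE;
            boolean inside = c.x >= left - TOLERANCE && c.x < right - TOLERANCE;
            if (below && inside) {
                hits.add(c);
            }
        }

        // Larger y is higher on the page, so sort descending for reading order.
        hits.sort((a, b) -> Float.compare(b.y, a.y));
        for (Chunk c : hits) {
            values.add(c.text);
        }
        return values;
    }
}
```

You would call columnValues(chunks, "Qty") with whatever header text you are after. Right-aligned or centred columns, wrapped cells and tables that span pages would all need more work than this.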
It would also be possible to analyze the graphic operations that draw the cell borders to construct the target rectangles. This kind of thing is *hard*, and impossible in the general case, but for a specific case it would probably be doable. No matter what, this wouldn't be 100% perfect (as Paulo says, PDF files do not have any sort of metadata that captures the concept of a 'table'), but it might be an option for you.

- K

----------------------- Original Message -----------------------

From: 1T3XT info <[EMAIL PROTECTED]>
To: Post all your questions about iText here <[email protected]>
Cc:
Date: Wed, 05 Nov 2008 08:41:46 +0100
Subject: Re: [iText-questions] Search PDF content using iText

Maheen Alaudeen wrote:
> Here is my requirement.
>
> A pdf document has table in it. I need to retrieve only "particular
> column values" from the table if I give input as "column header" name.
> Is there any way to do this using iText?
>
> Can some have any idea on this?

The concept of a Table is inexistent in PDF. A table on a page is just a bunch of glyphs and lines. You should consider PDF as a "one way process". Once a table is rendered in a PDF file, there's no way back.
--
This answer is provided by 1T3XT BVBA
http://www.1t3xt.com/ - http://www.1t3xt.info

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions

Buy the iText book: http://www.1t3xt.com/docs/book.php
