Re: [iText-questions] How to extract PDFTable content for process.

1T3XT info Thu, 15 Apr 2010 09:41:55 -0700

Meher Saride wrote:
> Problem space: Extract values out of a table within the PDF.


There is no table in the PDF.
There are lines and glyphs drawn on a page.

When a human looks at the page, he sees a table.
When a program looks at the page, there is none.

> Reference: I have a problem space where a customer sends us a formatted 
> PDF; we need to extract values out of it, apply some business rules, 
> and store them in the database. Unfortunately, I could not find a way to 
> get this done using Java/iText. I could not post the original PDF 
> because of the SLA, but I found a similar document with the same problem, 
> attached here. The example PDF has a long table with some data cells in 
> it; I would like to extract them.

You can extract the text if you write a TextExtractionStrategy.
This will give you TextRenderInfo objects containing text snippets and
their positions on the page, but that's not the same as "extracting
a table".
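
To make the gap concrete, here is a minimal sketch of the kind of
post-processing you would have to write yourself once the snippets are
collected. It is not iText API: Snippet and RowGrouper are hypothetical
names, Snippet stands in for whatever text and coordinates your strategy
records, and the tolerance is an assumption you would tune per document.
It only guesses rows from baseline positions; it knows nothing about
cell borders, merged cells, or wrapped text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in (not an iText class) for the data a text
// extraction strategy could record: a snippet and where it starts.
class Snippet {
    final String text;
    final float x;
    final float y;

    Snippet(String text, float x, float y) {
        this.text = text;
        this.x = x;
        this.y = y;
    }
}

// Hypothetical helper: guesses table rows from snippet positions.
class RowGrouper {
    // A snippet joins the current row if its y is within `tolerance`
    // of the first snippet in that row; otherwise it starts a new row.
    static List<List<String>> groupIntoRows(List<Snippet> snippets, float tolerance) {
        List<Snippet> sorted = new ArrayList<>(snippets);
        // In default PDF user space y grows upwards, so sort descending
        // to visit the top of the page first.
        sorted.sort((a, b) -> Float.compare(b.y, a.y));

        List<List<Snippet>> rows = new ArrayList<>();
        float rowY = 0f;
        for (Snippet s : sorted) {
            if (rows.isEmpty() || Math.abs(s.y - rowY) > tolerance) {
                rows.add(new ArrayList<>());
                rowY = s.y;
            }
            rows.get(rows.size() - 1).add(s);
        }

        // Within each row, order the snippets left to right.
        List<List<String>> result = new ArrayList<>();
        for (List<Snippet> row : rows) {
            row.sort((a, b) -> Float.compare(a.x, b.x));
            List<String> cells = new ArrayList<>();
            for (Snippet s : row) {
                cells.add(s.text);
            }
            result.add(cells);
        }
        return result;
    }
}
```

With a tolerance of 2, snippets at y = 720.5 and y = 720 land in the
same row, while one at y = 700 starts a new row. Whether those rows
really are table rows is still your interpretation, not something the
PDF tells you.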

> Another question: is there a way to recursively write the document 
> dictionary w/pdf object type by page?

Are you talking about creating Tagged PDF? Please be more specific.
-- 
This answer is provided by 1T3XT BVBA
http://www.1t3xt.com/ - http://www.1t3xt.info
