Kevin

    I tried to find the 'SimpleTextExtractingPdfContentStreamProcessor' class
(or interface) in the iText API documentation, but I could not find it. Can
you give me its fully qualified name? You also mentioned a zip file. I looked
for it in your earlier post, but I could not see any zip file attached. For
your reference, I have pasted the link to that post here:

http://www.mail-archive.com/[email protected]/msg08363.html

Thanks in advance.

Regards,
Maheen Alaudeen

On Wed, Nov 5, 2008 at 8:20 AM, 1T3XT info <[EMAIL PROTECTED]> wrote:

> Kevin Day wrote:
> > It may be possible to do this doing spatial analysis - as long as your
> > input files are fairly uniform.
> > I posted a zip file a couple of days ago with the beginnings of text
> > extraction functionality.
>
> It is now in SVN and there's some promising stuff in it.
> By the way: as I was going over the code, I added comments.
>
> > You could extend or re-implement the
> > SimpleTextExtractingPdfContentStreamProcessor to capture the
> > X position of your column headers, determine an 'average' bounding
> > rectangle for each column, then extract text that only falls into
> > that bounding rectangle.
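
A minimal sketch of the column-filtering idea quoted above, written in plain
Java with no iText calls. Chunk, Column, columnsFromHeaders and textInColumn
are hypothetical names made up for illustration; they are not part of iText.
The sketch assumes a content stream processor has already produced text
chunks with an X position, and it simplifies the 'average' bounding rectangle
to a horizontal range running from one header's X to the next header's X.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColumnTextFilter {

    /** Hypothetical type: a piece of text and the X coordinate where it starts. */
    public static final class Chunk {
        public final String text;
        public final float x;

        public Chunk(String text, float x) {
            this.text = text;
            this.x = x;
        }
    }

    /** Hypothetical type: the horizontal extent [left, right) of one column. */
    public static final class Column {
        public final String name;
        public final float left;
        public final float right;

        public Column(String name, float left, float right) {
            this.name = name;
            this.left = left;
            this.right = right;
        }

        public boolean contains(float x) {
            return x >= left && x < right;
        }
    }

    /**
     * Builds one column per header. Each column starts at its header's X and
     * ends at the next header's X; the last column ends at pageRight.
     */
    public static List<Column> columnsFromHeaders(List<Chunk> headers, float pageRight) {
        List<Chunk> sorted = new ArrayList<>(headers);
        sorted.sort((a, b) -> Float.compare(a.x, b.x));
        List<Column> columns = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            Chunk header = sorted.get(i);
            float right = (i + 1 < sorted.size()) ? sorted.get(i + 1).x : pageRight;
            columns.add(new Column(header.text, header.x, right));
        }
        return columns;
    }

    /** Returns the text of every chunk whose X falls inside the column, in input order. */
    public static List<String> textInColumn(Column column, List<Chunk> chunks) {
        List<String> result = new ArrayList<>();
        for (Chunk chunk : chunks) {
            if (column.contains(chunk.x)) {
                result.add(chunk.text);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for chunks captured from a page.
        List<Chunk> headers = Arrays.asList(new Chunk("Name", 50f), new Chunk("Amount", 300f));
        List<Chunk> body = Arrays.asList(
                new Chunk("Alice", 52f), new Chunk("12.50", 310f),
                new Chunk("Bob", 51f), new Chunk("7.00", 305f));
        for (Column column : columnsFromHeaders(headers, 600f)) {
            System.out.println(column.name + ": " + textInColumn(column, body));
        }
    }
}
```

A real page would need more care than this: right-aligned or centred cell
text can start to the left of its header, and the Y position is ignored here.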
>
> That's indeed possible, but a lot of work.
> Another interesting implementation of the abstract class
> PdfContentStreamProcessor would be a processor that retrieves
> the positions of images in a PDF page.
>
> > It would also be possible to analyze the graphic operations that draw
> > the cell borders to construct the target rectangles.  This kind of
> > thing is *hard*, and impossible in the general case - but for specific
> > cases it would probably be doable.
>
> Correct.
>
> > No matter what, this wouldn't be 100% perfect (as Paulo says, PDF files
> > do not have any sort of meta data that captures the concept of a
> > 'table'), but it might be an option for you.
>
> Actually it's Bruno, but I've been away from the list for a very
> long time.
>  --
> This answer is provided by 1T3XT BVBA
> http://www.1t3xt.com/ - http://www.1t3xt.info
>
> -------------------------------------------------------------------------
> This SF.Net email is sponsored by the Moblin Your Move Developer's
> challenge
> Build the coolest Linux based applications with Moblin SDK & win great
> prizes
> Grand prize is a trip for two to an Open Source event anywhere in the world
> http://moblin-contest.org/redirect.php?banner_id=100&url=/
> _______________________________________________
> iText-questions mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/itext-questions
>
> Buy the iText book: http://www.1t3xt.com/docs/book.php
>