Re: [iText-questions] Read PDF replacing whitespace with spaces

1T3XT info Sat, 08 Nov 2008 08:07:33 -0800

Eoin Hinchy wrote:
> Hi guys,
> 
> I was wondering if it's possible to use iText to read in a PDF and
> replace all the whitespace in it with spaces/tabs/newlines.
> For example:
> Read in the file http://www.plainsight.info/dev/example.pdf
> and output something along the lines of:
> http://www.plainsight.info/dev/desired.txt
> 
> I've been looking through the itext forums/mail lists for the answer
> to my question but I couldn't find it.
> Is it even possible?


A week ago, I'd have said: no, that's not possible.
Then Kevin Day contributed code to parse PDF content
(it will be in the next release).

You can't achieve what you want with his code yet,
but depending on how the PDF is organized internally
(that is, on how the text operators are laid out in its
content streams), you could write an implementation of
PdfContentStreamProcessor that gets close to what you need.
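
To give an idea of the layout step such an implementation would need,
here is a minimal sketch. It assumes the processor has already collected
each text fragment together with the x/y position at which it is shown;
it does not use any iText classes. LayoutSketch, Chunk, render, charWidth
and lineTolerance are hypothetical names made up for this illustration,
and the fixed character width is a simplifying assumption that will not
hold for every PDF.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names are part of iText.
final class LayoutSketch {

    /** A text fragment with its left edge (x) and baseline (y) in page units. */
    static final class Chunk {
        final String text;
        final double x;
        final double y;

        Chunk(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Places chunks on a character grid. Chunks whose baselines lie within
     * lineTolerance of the first chunk of a line share that line; each chunk
     * is padded with spaces up to column round(x / charWidth).
     */
    static String render(List<Chunk> chunks, double charWidth, double lineTolerance) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        // PDF y coordinates grow upward, so the largest y is the top line.
        sorted.sort((a, b) -> Double.compare(b.y, a.y));

        // Group chunks into lines by baseline.
        List<List<Chunk>> lines = new ArrayList<>();
        double lineY = 0;
        for (Chunk c : sorted) {
            if (lines.isEmpty() || lineY - c.y > lineTolerance) {
                lines.add(new ArrayList<>());
                lineY = c.y;
            }
            lines.get(lines.size() - 1).add(c);
        }

        StringBuilder out = new StringBuilder();
        for (List<Chunk> line : lines) {
            // Within a line, read left to right.
            line.sort((a, b) -> Double.compare(a.x, b.x));
            StringBuilder row = new StringBuilder();
            for (Chunk c : line) {
                int column = (int) Math.round(c.x / charWidth);
                // Pad with spaces; if the row is already past this column,
                // the chunk is simply appended without a gap.
                while (row.length() < column) {
                    row.append(' ');
                }
                row.append(c.text);
            }
            out.append(row).append('\n');
        }
        return out.toString();
    }
}
```

How well this works depends on how the PDF was produced: if words are
drawn glyph by glyph, or in an order unrelated to reading order, the
grouping and the column estimate both need more care than shown here.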
-- 
This answer is provided by 1T3XT BVBA
http://www.1t3xt.com/ - http://www.1t3xt.info
