I used Xpdf (http://www.foolabs.com/xpdf/) for indexing PDFs with Zend_Search_Lucene. The write-up is here:
http://www.kapustabrothers.com/2008/01/20/indexing-pdf-documents-with-zend_search_lucene/

Shaun

On Fri, Feb 6, 2009 at 9:08 AM, Matthias Buesing <[email protected]> wrote:

> Hi Jonathan,
> I found pdftohtml which is exactly what I've been searching for.
>
> Thank you very much.
> Matthias
>
>
> Jonathan Maron wrote:
> > Hello Matthias
> >
> > If you are running Linux, have you considered 'pdftotext'?
> >
> > http://linux.die.net/man/1/pdftotext
> >
> > It would be trivial to shell out using exec() and convert the text
> > that way.
> >
> > If you choose this route, it is very important to ensure all
> > parameters being sent to exec() have not been manipulated.
> >
> > Jonathan Maron
> >
> > On Fri, Feb 6, 2009 at 12:44 PM, Matthias Buesing
> > <[email protected]> wrote:
> >> Hello,
> >> is there any way to get the text from inside of a PDF with Zend_PDF?
> >> Or does anybody know a _free_ tool to do this?
> >>
> >> Greetings
> >> Matthias

--
---------------------------
Shaun J. Farrell
Washington, DC
http://www.livinginthedistrict.com/
http://www.kapustabrothers.com/
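
For anyone finding this thread later, here is a minimal sketch of the exec() route Jonathan describes, with the file path escaped before it reaches the shell. It assumes a Linux host with pdftotext installed and on the PATH. The function names buildPdfToTextCommand() and pdfToText() are made up for illustration and are not part of Zend Framework or Xpdf.

```php
<?php
// Hypothetical helpers for illustration only; assumes pdftotext is on the PATH.

/**
 * Build the shell command for a given PDF path.
 * escapeshellarg() quotes the path so shell metacharacters in it are inert.
 * The trailing "-" asks pdftotext to write the extracted text to stdout.
 */
function buildPdfToTextCommand(string $pdfPath): string
{
    return 'pdftotext ' . escapeshellarg($pdfPath) . ' -';
}

/**
 * Extract the text of a PDF by shelling out to pdftotext.
 */
function pdfToText(string $pdfPath): string
{
    // Resolve to an absolute path so the argument cannot start with "-"
    // and be mistaken for a command-line option.
    $realPath = realpath($pdfPath);
    if ($realPath === false || !is_file($realPath)) {
        throw new InvalidArgumentException("Not a readable file: $pdfPath");
    }

    exec(buildPdfToTextCommand($realPath), $outputLines, $exitCode);
    if ($exitCode !== 0) {
        throw new RuntimeException("pdftotext exited with status $exitCode");
    }

    // exec() returns the output split into lines; join them back together.
    return implode("\n", $outputLines);
}
```

Calling pdfToText('/path/to/file.pdf') should then return a string that can be handed to whatever indexer you use. Keeping the command construction in its own function makes it easy to check the escaping without actually running pdftotext.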
