Re: PDF Text extraction

Kelvin Tan Thu, 26 Dec 2002 23:02:53 -0800

Try this?

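// parseDocument() stands for whatever PDFBox parsing code you already use
InputStream input = new FileInputStream(file);
COSDocument document = parseDocument(input);
PDFTextStripper stripper = new PDFTextStripper();
// collect the extracted text in memory so it can be printed
StringWriter output = new StringWriter();
stripper.writeText(document, output);
System.out.println(output.toString());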


Errm... the code may not be 100% correct, but you get the idea.
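
For what it's worth, the java.io.InputStreamReader@127dc0 in your output looks like the default toString() of a Reader rather than its contents. If a Reader is what you have in hand, another option is to drain it into a String yourself before printing. A rough sketch using only java.io: ReaderDump and readAll are made-up names for illustration, and the StringReader in main just stands in for whatever reader your PDF parsing code gives you.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;

class ReaderDump {
    // Hypothetical helper: reads everything from the Reader into a String.
    // A Reader can only be consumed once, so call this before handing the
    // same Reader to anything else.
    static String readAll(Reader reader) throws IOException {
        StringWriter out = new StringWriter();
        char[] buffer = new char[4096];
        int n;
        // read() returns -1 once the end of the stream is reached
        while ((n = reader.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the reader that wraps the extracted PDF text
        Reader contents = new StringReader("text extracted from a PDF");
        System.out.println(readAll(contents));
    }
}
```

Keep in mind that once the reader has been drained like this it is empty, so if you still need to index the text you would have to create a fresh reader (or index the String instead).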

Regards,
Kelvin

--------
The book giving manifesto     - http://how.to/sharethisbook


On Fri, 27 Dec 2002 12:04:11 +0530, Suhas Indra said:
>Hello List
>
>I am using PDFBox to index some of the PDF documents. The parser
>works fine and I can read the summary. But the contents are
>displayed as java.io.InputStream.
>
>When I try the following:
>System.out.println(doc.getField("contents")) (where doc is the
>Document object)
>
>The result will be:
>
>Text<contents:java.io.InputStreamReader@127dc0>
>
>I want to print the extracted data.
>
>Can anyone please let me know how to extract the contents?
>
>Regards
>
>Suhas
>
>
>
>--------------------------------------------------------------
>Robosoft Technologies - Partners in Product Development



