Try this?
InputStream input = new FileInputStream(file);
// parseDocument stands for however you parse the stream into a COSDocument
COSDocument document = parseDocument(input);
PDFTextStripper stripper = new PDFTextStripper();
StringWriter output = new StringWriter();
stripper.writeText(document, output);
System.out.println(output.toString());
Errm... the code may not be 100% correct, but you get the idea.
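
The reason you see java.io.InputStreamReader@127dc0 is that printing a Reader only prints the object itself, not the text behind it. Here is a minimal sketch of draining a Reader into a String with nothing but the JDK. The class name ReaderDump and the helper readAll are made up for illustration, and the StringReader is only a stand-in for whatever Reader sits behind your "contents" field (if I remember right, Lucene's Field has a readerValue() method that hands it back, but treat that as an assumption and check your version).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;

// Hypothetical helper class, not part of PDFBox or Lucene.
class ReaderDump {

    // Copies everything the Reader yields into a String.
    static String readAll(Reader reader) {
        StringWriter output = new StringWriter();
        char[] buffer = new char[4096];
        try {
            int count;
            // read() returns -1 once the Reader is exhausted
            while ((count = reader.read(buffer)) != -1) {
                output.write(buffer, 0, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return output.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the Reader behind the "contents" field (assumption).
        Reader contents = new StringReader("text extracted from the PDF");
        System.out.println(readAll(contents));
    }
}
```

Keep in mind a Reader can only be consumed once, so if you drain it for printing before the document is indexed, the indexer will find it empty.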
Regards,
Kelvin
--------
The book giving manifesto - http://how.to/sharethisbook
On Fri, 27 Dec 2002 12:04:11 +0530, Suhas Indra said:
>Hello List
>
>I am using PDFBox to index some of the PDF documents. The parser
>works fine and I can read the summary. But the contents are
>displayed as java.io.InputStream.
>
>When I try the following:
>System.out.println(doc.getField("contents")) (where doc is the
>Document object)
>
>The result will be:
>
>Text<contents:java.io.InputStreamReader@127dc0>
>
>I want to print the extracted data.
>
>Can anyone please let me know how to extract the contents?
>
>Regards
>
>Suhas
>
>--------------------------------------------------------------
>Robosoft Technologies - Partners in Product Development