Where do I download the org.textmining package?
-Henry
Dmitry Goldenberg wrote:
Henry,
There are a few things you can try.
1. Take a look at org.textmining's Word text extractor:
org.textmining.text.extraction.WordExtractor
All you have to do is this (inputStream is an InputStream opened on the .doc file):
String text = new WordExtractor().extractText(inputStream);
2. There is also the POI extractor:
org.apache.poi.hdf.extractor.WordDocument
All you do is this:
// "is" is an InputStream opened on the .doc file
WordDocument wd = new WordDocument(is);
StringWriter docTextWriter = new StringWriter();
PrintWriter out = new PrintWriter(docTextWriter);
wd.writeAllText(out);
out.flush();
String text = docTextWriter.toString();
3. I'd also check out the following:
org.semanticdesktop.aperture.extractor.word.WordExtractor
here: http://aperture.sourceforge.net/doc/javadoc/index.html
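Each of the options above hands back the document text as a single String, so reading it "line by line" is a separate step. Here is a minimal sketch of that step using only the JDK (Java 8 or later). The class and method names are hypothetical and not part of POI, textmining, or Aperture, and the sample text stands in for real extractor output. As far as I know, extracted Word text often ends paragraphs with a bare \r, which is why the sketch relies on BufferedReader, whose line reading accepts \n, \r, and \r\n.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; not part of any of the libraries mentioned above.
class ExtractedTextLines {

    // Splits already-extracted document text into lines.
    // BufferedReader treats \n, \r, and \r\n as line terminators.
    static List<String> splitLines(String text) {
        BufferedReader reader = new BufferedReader(new StringReader(text));
        return reader.lines().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-in for the String returned by one of the extractors.
        String text = "First paragraph\r\nSecond paragraph\rThird paragraph";
        for (String line : splitLines(text)) {
            System.out.println(line);
        }
    }
}
```

In real use you would pass the extractor's result to splitLines instead of the sample string.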
Hope this helps,
- Dmitry
________________________________
From: Henry Lu [mailto:[EMAIL PROTECTED]
Sent: Thu 5/17/2007 1:19 PM
To: poi-user@jakarta.apache.org
Subject: reading MS word file
Is there example code for reading an MS Word file as text, line by line?
All I am interested in is the text, regardless of format, style, font...
-Henry
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
Mailing List: http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta Poi Project: http://jakarta.apache.org/poi/
---------------------------------------------------------------------