Re: reading MS word file
Where do I download the org.textmining package?

-Henry

Dmitry Goldenberg wrote:

Henry,

There are a few things you can try.

1. Take a look at org.textmining's Word text extractor:
   org.textmining.text.extraction.WordExtractor

   All you have to do is this:

   new WordExtractor().extractText(inputStream)

2. There is also the POI extractor:
   org.apache.poi.hdf.extractor.WordDocument

   All you do is:

   WordDocument wd = new WordDocument(is);
   StringWriter docTextWriter = new StringWriter();
   wd.writeAllText(new PrintWriter(docTextWriter));
   docTextWriter.close();
   text = docTextWriter.toString();

3. I'd also check out the following:
   org.semanticdesktop.aperture.extractor.word.WordExtractor
   here: http://aperture.sourceforge.net/doc/javadoc/index.html

Hope this helps,
- Dmitry

From: Henry Lu [mailto:[EMAIL PROTECTED]
Sent: Thu 5/17/2007 1:19 PM
To: poi-user@jakarta.apache.org
Subject: reading MS word file

Is there an example/code to read an MS Word file for text line by line? All I am interested in is the text, regardless of format, style, font...

-Henry

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
Mailing List: http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta Poi Project: http://jakarta.apache.org/poi/
Re: reading MS word file
Can someone tell me where to download the svn package for POI?

-Henry
Re: reading MS word file
On Fri, 18 May 2007, Henry Lu wrote:

Can someone tell me where to download the svn package for POI?

You can either do a svn checkout yourself and build it with ant:
http://jakarta.apache.org/site/cvsindex.html

Or download nightly builds from:
http://encore.torchbox.com/poi-svn-build/

Nick
RE: reading MS word file
http://textmining.org/

From: Henry Lu [mailto:[EMAIL PROTECTED]
Sent: Fri 5/18/2007 7:22 AM
To: POI Users List
Subject: Re: reading MS word file

Where do I download the org.textmining package?

-Henry
RE: reading MS word file
Henry,

There are a few things you can try.

1. Take a look at org.textmining's Word text extractor:
   org.textmining.text.extraction.WordExtractor

   All you have to do is this:

   new WordExtractor().extractText(inputStream)

2. There is also the POI extractor:
   org.apache.poi.hdf.extractor.WordDocument

   All you do is:

   WordDocument wd = new WordDocument(is);
   StringWriter docTextWriter = new StringWriter();
   wd.writeAllText(new PrintWriter(docTextWriter));
   docTextWriter.close();
   text = docTextWriter.toString();

3. I'd also check out the following:
   org.semanticdesktop.aperture.extractor.word.WordExtractor
   here: http://aperture.sourceforge.net/doc/javadoc/index.html

Hope this helps,
- Dmitry

From: Henry Lu [mailto:[EMAIL PROTECTED]
Sent: Thu 5/17/2007 1:19 PM
To: poi-user@jakarta.apache.org
Subject: reading MS word file

Is there an example/code to read an MS Word file for text line by line? All I am interested in is the text, regardless of format, style, font...

-Henry
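
The original question asked for the text line by line, while the snippets in this thread leave the whole document in a single String. A minimal sketch of the remaining step, assuming the text has already been extracted by one of the options above: the class name ExtractedTextLines and the method toLines are hypothetical (they are not part of POI or org.textmining), and the sample string in main only stands in for extractor output. How paragraph breaks appear in the extracted text depends on the extractor, so the sketch leans on BufferedReader.readLine(), which accepts \n, \r and \r\n as line terminators.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits already-extracted document text into lines.
final class ExtractedTextLines {

    private ExtractedTextLines() {
    }

    // Returns the lines of the given text, without their line terminators.
    public static List<String> toLines(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            // readLine() returns null once the end of the text is reached.
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // Not expected when reading from an in-memory String.
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Assumption: stand-in for the String produced by an extractor.
        String text = "First paragraph\r\nSecond paragraph\rThird paragraph\n";
        for (String line : toLines(text)) {
            System.out.println(line);
        }
    }
}
```

With the POI snippet above, the call would be toLines(docTextWriter.toString()); with the org.textmining extractor, toLines(new WordExtractor().extractText(inputStream)).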