Re: reading MS word file

2007-05-18 Thread Henry Lu

Where do I download the org.textmining package?

-Henry


Dmitry Goldenberg wrote:

Henry,
 
There are a few things you can try.
 
1. Take a look at org.textmining's Word text extractor:


org.textmining.text.extraction.WordExtractor

All you have to do is this: 


new WordExtractor().extractText(inputStream)

2. There is also the POI extractor:

org.apache.poi.hdf.extractor.WordDocument

All you do is:

WordDocument wd = new WordDocument(is);
StringWriter docTextWriter = new StringWriter();
wd.writeAllText(new PrintWriter(docTextWriter));
docTextWriter.close();
text = docTextWriter.toString();

3. I'd also check out the following:

org.semanticdesktop.aperture.extractor.word.WordExtractor

here: http://aperture.sourceforge.net/doc/javadoc/index.html
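
Since the original question asked for the text line by line, and the first two options above end with the document text in a single String, the remaining step might look roughly like the sketch below. It uses only the standard library. The class name ExtractedTextLines and the method splitLines are made up for illustration and are not part of POI or textmining. It also assumes the extracted text separates paragraphs with "\r", "\n", or "\r\n", which may differ between extractors.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of POI or textmining.
class ExtractedTextLines {

    // Splits already-extracted text into lines. "\r\n" is listed first in
    // the pattern so a Windows line ending is treated as one separator.
    // Trailing empty lines are dropped by String.split; blank lines in the
    // middle of the text are kept as empty strings.
    static List<String> splitLines(String text) {
        return Arrays.asList(text.split("\\r\\n|\\r|\\n"));
    }
}
```

Calling ExtractedTextLines.splitLines(text) on the String produced by either snippet above gives a list that can be iterated one line at a time.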

Hope this helps,
- Dmitry




From: Henry Lu [mailto:[EMAIL PROTECTED]
Sent: Thu 5/17/2007 1:19 PM
To: poi-user@jakarta.apache.org
Subject: reading MS word file



Is there an example or code to read the text of an MS Word file line by line?
All I am interested in is the text, regardless of format, style, font...

-Henry

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
Mailing List: http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/




  





Re: reading MS word file

2007-05-18 Thread Henry Lu

Can someone tell me where to download the svn package for POI?

-Henry
  





Re: reading MS word file

2007-05-18 Thread Nick Burch

On Fri, 18 May 2007, Henry Lu wrote:

Can someone tell me where to download the svn package for POI?


You can either do an svn checkout yourself and build it with Ant:
http://jakarta.apache.org/site/cvsindex.html

Or download nightly builds from:
http://encore.torchbox.com/poi-svn-build/

Nick




RE: reading MS word file

2007-05-18 Thread Dmitry Goldenberg
http://textmining.org/




RE: reading MS word file

2007-05-17 Thread Dmitry Goldenberg
Henry,
 
There are a few things you can try.
 
1. Take a look at org.textmining's Word text extractor:

org.textmining.text.extraction.WordExtractor

All you have to do is this: 

new WordExtractor().extractText(inputStream)

2. There is also the POI extractor:

org.apache.poi.hdf.extractor.WordDocument

All you do is:

WordDocument wd = new WordDocument(is);
StringWriter docTextWriter = new StringWriter();
wd.writeAllText(new PrintWriter(docTextWriter));
docTextWriter.close();
text = docTextWriter.toString();

3. I'd also check out the following:

org.semanticdesktop.aperture.extractor.word.WordExtractor

here: http://aperture.sourceforge.net/doc/javadoc/index.html
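
Because the three options above are alternatives, one way to "try a few things" in code is to run the candidates in order and keep the first one that yields text. The sketch below is only an illustration. The TextExtractor interface and the FallbackExtractor class are hypothetical names, not part of POI, textmining, or Aperture, and each real library would need its own small adapter, which is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.List;

// Hypothetical abstraction over "something that turns a .doc stream into text".
interface TextExtractor {
    String extract(InputStream in) throws Exception;
}

// Hypothetical helper for illustration only.
class FallbackExtractor {

    // Tries each extractor in order on a fresh stream over the same bytes
    // and returns the first non-empty result, or null if none succeeds.
    static String extractFirst(byte[] docBytes, List<TextExtractor> extractors) {
        for (TextExtractor extractor : extractors) {
            try {
                String text = extractor.extract(new ByteArrayInputStream(docBytes));
                if (text != null && text.length() > 0) {
                    return text;
                }
            } catch (Exception e) {
                // This extractor could not handle the file; try the next one.
            }
        }
        return null;
    }
}
```

The document is held as a byte array so that every extractor starts from the beginning of the file, since an InputStream that one extractor has partly consumed cannot in general be reused by the next.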

Hope this helps,
- Dmitry




From: Henry Lu [mailto:[EMAIL PROTECTED]
Sent: Thu 5/17/2007 1:19 PM
To: poi-user@jakarta.apache.org
Subject: reading MS word file



Is there an example or code to read the text of an MS Word file line by line?
All I am interested in is the text, regardless of format, style, font...

-Henry
