Hi, we had some problems using the POI Word filter. In one document set, everything would work fine, in another more than 50% documents refused to work with it (does not index). I am not an OLE2 pro and cannot see any apparent difference in the documents between the different sets. The version used was Word 97 in almost all the docs. For the moment, I switched to a native converter (that does not process metadata and must be run using Runtime.exec(), though) until I have time to revisit the problem.
I do not want to disrecommend the POI-filters, it's a very cool idea. Please do try your particular document set with it. For a quick test, you can use the Docco personal search tool by Peter Becker and colleagues (available from SourceForge). It has a current version of POI included as a plugin and Lucene running as indexing backend. So you don't have to write code to get answers... Cheers, gregor -----Original Message----- From: Pleasant, Tracy [mailto:[EMAIL PROTECTED] Sent: Monday, December 15, 2003 2:58 PM To: Lucene Users List Subject: Word Documents As a spinoff, I was wondering if anyone has been happy with indexing and searching Word docs. What about reading the contents? Any problems? -----Original Message----- From: Ryan Ackley [mailto:[EMAIL PROTECTED] Sent: Friday, December 12, 2003 5:59 PM To: Zhou, Oliver; Lucene Users List Subject: Re: textmining: document title Check out jakarta POI (http://jakarta.apache.org/poi ) particularly the HPSF API. It allows you to extract metadata like Title, Author, etc. from OLE documents. -Ryan ----- Original Message ----- From: "Zhou, Oliver" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Friday, December 12, 2003 5:26 PM Subject: textmining: document title > Ryan, > > I'm using textmining and lucene to index word documents but don't know how > to get word document title. Your advice on this matter is appreciated. > > Thanks, > Oliver Zhou > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
