Alrighty. :-) The council has spoken, or has it? This is not the first time I'm sharing my indexing method; there was little interest in the past, even though it has worked well for me with Word, Excel and PowerPoint files... Perhaps it's an inelegant solution, but heck, it works.
What I do is quite simple. Try opening an MS Word file in Notepad/TextPad: you'll see a bunch of text intermingled with binary and nonsense characters. My solution: throw away anything unreadable and index everything else. I use a regexp for the job. Try it, it works. :-)

--
[EMAIL PROTECTED] on 01/31/2003

On Fri, 31 Jan 2003 12:18:25 +0100, Massimo Mannino wrote:

>POI it's correct, but use a OLE
>If your application running under unix POI it's incorrect...
>
>"Ronnie Kolehmainen" <ronnie@sunstone.se>
>To: "Lucene Users List" <lucene-[EMAIL PROTECTED]>
>cc:
>Subject: RE: How to index a Word document
>31/01/03 11.17
>Please respond to "Lucene Users List"
>
>I've been using the POI-scratchpad package with a slightly altered (only
>interested in the text stuff) WordDocument class for a while.
>
>Results show that approx 50% of the Word documents are parsable with this
>package. This is not very good, but imo better than nothing, and yet the
>best(?) Java solution.
>
>/Ronnie
>
>>-----Original message-----
>>From: Nellai [mailto:[EMAIL PROTECTED]]
>>Sent: 31 January 2003 04:50
>>To: [EMAIL PROTECTED]
>>Subject: How to index a Word document
>>
>>Hi!
>>
>>Can anyone tell me how to include word document for indexing. Is
>>there any parser available for that.
>>
>>Thanks in advance
>>
>>Nellai...
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: [EMAIL PROTECTED]
>For additional commands, e-mail: [EMAIL PROTECTED]
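For anyone who wants to try the "throw away anything unreadable" idea from the top of this message, here is a minimal Java sketch. It is not the poster's actual code: the class name `ReadableTextExtractor`, the method `extract`, and the choice of "printable ASCII" as the definition of readable are all assumptions made for illustration. It also assumes that dropping NUL bytes is enough to recover text that the file stores as 16-bit characters, which only holds for plain ASCII content.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Pattern;

// Hypothetical helper: keeps only the readable parts of a binary document.
class ReadableTextExtractor {
    // Assumption: "unreadable" means anything outside printable ASCII (0x20-0x7E).
    // This also discards accented characters, so it is a crude filter.
    private static final Pattern UNREADABLE = Pattern.compile("[^\\x20-\\x7E]+");
    private static final Pattern SPACES = Pattern.compile(" {2,}");

    static String extract(byte[] raw) {
        // ISO-8859-1 maps every byte to exactly one char, so nothing is lost in decoding.
        String text = new String(raw, StandardCharsets.ISO_8859_1);
        // Drop NUL bytes so ASCII text stored as 16-bit characters collapses into words.
        text = text.replace("\0", "");
        // Replace each run of unreadable characters with a single space.
        text = UNREADABLE.matcher(text).replaceAll(" ");
        // Collapse the leftover runs of spaces and trim the ends.
        return SPACES.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ReadableTextExtractor <file>");
            return;
        }
        byte[] raw = Files.readAllBytes(Paths.get(args[0]));
        // The returned string is what you would hand to the indexer as the document body.
        System.out.println(extract(raw));
    }
}
```

The output will still contain some printable junk (font names, stray symbols), so a real setup would probably filter out very short or symbol-only tokens before indexing.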
