Alrighty. :-) The council has spoken, or has it?

This is not the first time I'm sharing my indexing method, but somehow
there was little interest in the past, even though I have had success
with Word, Excel and PowerPoint files... Perhaps it's an inelegant
solution, but heck, it works.

What I do is quite simple. Try opening an MS Word file in
Notepad/Textpad: you see a bunch of text intermingled with binary and
nonsense characters. My solution: throw away anything unreadable, and
index everything else. I use a regexp for the job. Try it, it works.
:-)
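
For illustration, the idea above could be sketched in Java roughly as
follows. This is not the exact code behind the method described here, only
a minimal sketch under assumptions: the class name ReadableText, the
method name extract, the "printable ASCII" definition of readable, and the
minimum run length of 4 are all hypothetical choices made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReadableText {
    // Runs of 4 or more printable ASCII characters (space through tilde).
    // The minimum length of 4 is an assumption, chosen to drop stray bytes
    // that happen to look like letters.
    private static final Pattern READABLE = Pattern.compile("[\\x20-\\x7E]{4,}");

    /** Returns the readable runs found in raw, separated by single spaces. */
    public static String extract(byte[] raw) {
        // ISO-8859-1 maps every byte to exactly one char, so decoding never fails.
        String decoded = new String(raw, StandardCharsets.ISO_8859_1);
        StringBuilder out = new StringBuilder();
        Matcher m = READABLE.matcher(decoded);
        while (m.find()) {
            String run = m.group().trim();
            if (run.isEmpty()) {
                continue; // skip runs that were only spaces
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(run);
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java ReadableText <file>");
            return;
        }
        byte[] raw = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(extract(raw));
    }
}
```

The returned string can then be handed to the indexer as the document
body. One caveat with this sketch: text that the file stores as two-byte
characters shows up with a zero byte after each letter, so this
particular pattern would skip it unless those bytes are handled first.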
--
, [EMAIL PROTECTED] on 01/31/2003


On Fri, 31 Jan 2003 12:18:25 +0100, Massimo Mannino wrote:
>
>POI is correct, but it uses OLE.
>If your application is running under Unix, POI is incorrect...
>
>"Ronnie Kolehmainen" <ronnie@sunstone.se>
>31/01/03 11.17
>
>To: "Lucene Users List" <lucene-[EMAIL PROTECTED]>
>cc:
>Subject: RE: How to index a Word document
>Please respond to "Lucene Users List"
>
>I've been using the POI-scratchpad package with a slightly altered
>(only interested in the text stuff) WordDocument class for a while.
>
>Results show that approx 50% of the Word documents are parsable with
>this package. This is not very good, but imo better than nothing, and
>yet the best(?) Java solution.
>
>
>/Ronnie
>
>
>
>
>>-----Original Message-----
>>From: Nellai [mailto:[EMAIL PROTECTED]]
>>Sent: 31 January 2003 04:50
>>To: [EMAIL PROTECTED]
>>Subject: How to index a Word document
>>
>>
>>Hi!
>>
>>Can anyone tell me how to include a Word document for indexing? Is
>>there any parser available for that?
>>
>>Thanks in advance
>>
>>Nellai...
>>
>
>
>---------------------------------------------------------------------

>To unsubscribe, e-mail: [EMAIL PROTECTED]
>For additional commands, e-mail: [EMAIL PROTECTED]




