This question surely shows how new I am to Lucene, but I'm interested in removing terms from a Lucene index. In particular, I'd like to delete all terms that appear in fewer than x documents (say, x=3). The goal is to reduce the feature set for some research I'm doing.

I found a post to this effect on the list from a while back:
   http://www.gossamer-threads.com/lists/lucene/java-user/9538#9538
but I couldn't find any responses to it.

The only approach I can think of is to re-index the collection, using the undesired words as a sort of stoplist. But surely there's a better way to do it (given the inverted index structure, this seems like it should be a natural operation). Any pointers would be most helpful.
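
To make the stoplist idea concrete, here is a minimal sketch in plain Java. It deliberately does not use the Lucene API: it assumes the documents are already tokenized into in-memory lists, and the class and method names (RareTermStoplist, documentFrequencies, rareTerms) are made up for illustration. Against a real index, the per-term document counts would presumably be read from the index's term dictionary rather than recomputed; that part depends on the Lucene version in use.

```java
import java.util.*;

// Hypothetical helper, not part of Lucene: builds a stoplist of terms
// that occur in fewer than minDocs documents.
class RareTermStoplist {

    /** Counts, for each term, how many documents contain it at least once. */
    static Map<String, Integer> documentFrequencies(List<List<String>> docs) {
        Map<String, Integer> df = new TreeMap<>();
        for (List<String> doc : docs) {
            // A set ensures each term is counted once per document,
            // no matter how often it repeats inside that document.
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    /** Returns the terms whose document frequency is below minDocs. */
    static Set<String> rareTerms(List<List<String>> docs, int minDocs) {
        Set<String> rare = new TreeSet<>();
        for (Map.Entry<String, Integer> e : documentFrequencies(docs).entrySet()) {
            if (e.getValue() < minDocs) {
                rare.add(e.getKey());
            }
        }
        return rare;
    }

    public static void main(String[] args) {
        // Toy corpus standing in for the tokenized collection (assumption).
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("lucene", "index", "term"),
            Arrays.asList("lucene", "term", "rare"),
            Arrays.asList("lucene", "term", "index", "index"),
            Arrays.asList("lucene", "query"));
        // With x = 3, terms found in fewer than 3 documents go on the stoplist.
        System.out.println(rareTerms(docs, 3)); // prints [index, query, rare]
    }
}
```

The resulting set would then be handed to whatever stop-word filtering the analyzer supports when the collection is re-indexed.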

Thanks,
-Miles

Andrzej Bialecki wrote:

Huinan wrote:

Thanks, Ronnie. But why does it work in some cases (when there is only a small
number of documents in the index)?


The Hits class retrieves the first 50 results, and caches them.


--
Miles Efron
http://www.ibiblio.org/mefron
[EMAIL PROTECTED]

