This question surely shows how new I am to Lucene... but I'm interested
in removing terms from a Lucene index. In particular, I'd like to be
able to delete all terms that appear in fewer than x documents (say
x=3). The goal is to reduce the feature set for some research I'm
doing.
I found a post to this effect on the list from a while back:
http://www.gossamer-threads.com/lists/lucene/java-user/9538#9538
but I couldn't find any responses to it.
The only thing I can think of is to re-index the collection, using the
undesired words as a sort of stoplist. But surely there's a better way
to do it (given the inverted index structure, this seems like it should
be a natural operation). Any pointers would be most helpful.
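To make the stoplist idea concrete, here is a rough sketch of what I have
in mind. It is plain Python over already-tokenized documents, not Lucene
code: the names rare_terms and prune and the toy documents are made up for
illustration, and in a real index I assume the counts would come from the
per-term document frequencies the index already stores.

```python
from collections import Counter


def rare_terms(docs, min_df=3):
    """Return the set of terms that occur in fewer than min_df documents."""
    df = Counter()
    for tokens in docs:
        # Count each term once per document: document frequency,
        # not raw term frequency.
        df.update(set(tokens))
    return {term for term, n in df.items() if n < min_df}


def prune(docs, stoplist):
    """Drop stoplisted terms from every tokenized document."""
    return [[t for t in tokens if t not in stoplist] for tokens in docs]


# Hypothetical toy collection, one token list per document.
docs = [
    ["lucene", "index", "term"],
    ["lucene", "index", "query"],
    ["lucene", "index", "term", "stoplist"],
    ["lucene", "term"],
]

stoplist = rare_terms(docs, min_df=3)  # terms seen in fewer than 3 documents
pruned = prune(docs, stoplist)         # what would be fed to the re-indexing pass
```

This takes two passes over the collection (one to count, one to filter),
which is exactly the cost I was hoping to avoid.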
Thanks,
-Miles
Andrzej Bialecki wrote:
Huinan wrote:
Thanks, Ronnie. But why does it work in some cases (when there is a
small number of documents inside the index)?
The Hits class retrieves the first 50 results, and caches them.
--
Miles Efron
http://www.ibiblio.org/mefron
[EMAIL PROTECTED]