Re: The best way forward

Otis Gospodnetic Tue, 04 Nov 2003 05:08:22 -0800

--- jt oob <[EMAIL PROTECTED]> wrote:
> Thank you for the replies!
> 
> My indexes currently look like they might reach 12GB by the end of
> the current run.
> 
> I have spotted a tool on the lucene site for listing the most
> frequently occurring words in the index. Currently I am using the
> defaultAnalyzer stoplist; I should probably use a more comprehensive
> list.
> 
> Is there a way of applying a stoplist after the index has been
> created, removing all occurrences of the new stoplist words?
> I could then write a new Analyzer with the new stoplist for adding
> new documents to the index.
> Am I doomed to reindexing with a better stoplist?


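I believe you'll need to re-index.
Well, if your old stop list is a subset of the new stop list, then you
may be able to get away without re-indexing: the extra stop words stay
in the index and take up space, but queries analyzed with the new stop
list will never look them up.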
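A minimal sketch of why the subset case can work, using a toy in-memory inverted index in plain Java rather than Lucene (the class `StopListSketch` and its methods are hypothetical, not Lucene's API). Documents are indexed with the old stop list; `topTerms` ranks terms by document frequency, roughly what a "most frequent words" listing gives you, to suggest candidates for a bigger stop list; queries are then analyzed with the new, larger list. Assumptions: a simple lower-case tokenizer and AND semantics for multi-term queries.

```java
import java.util.*;
import java.util.stream.*;

/** Toy inverted index (hypothetical, NOT Lucene's API). */
public class StopListSketch {
    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    // Lower-case, split on anything that is not a letter or digit, drop stop words.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty() && !stopWords.contains(t)) {
                terms.add(t);
            }
        }
        return terms;
    }

    void addDocument(int docId, String text, Set<String> stopWords) {
        for (String term : analyze(text, stopWords)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Top n terms by document frequency (ties broken alphabetically):
    // candidates for a more comprehensive stop list.
    List<String> topTerms(int n) {
        return postings.entrySet().stream()
            .sorted((a, b) -> {
                int byFreq = Integer.compare(b.getValue().size(), a.getValue().size());
                return byFreq != 0 ? byFreq : a.getKey().compareTo(b.getKey());
            })
            .limit(n)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    // AND query analyzed with the NEW stop list: words that are stop words
    // only in the new list are removed from the query, so their old postings
    // are never consulted even though they are still in the index.
    Set<Integer> search(String query, Set<String> stopWords) {
        Set<Integer> result = null;
        for (String term : analyze(query, stopWords)) {
            Set<Integer> docs = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? Collections.emptySet() : result;
    }

    boolean hasTerm(String term) {
        return postings.containsKey(term);
    }

    public static void main(String[] args) {
        Set<String> oldStops = Set.of("a", "the");
        Set<String> newStops = Set.of("a", "the", "on", "with"); // superset of oldStops

        StopListSketch index = new StopListSketch();
        index.addDocument(1, "Lucene on the Linux kernel", oldStops);
        index.addDocument(2, "Indexing NNTP traffic with Lucene", oldStops);
        index.addDocument(3, "A question on stop lists", oldStops);

        System.out.println(index.topTerms(2));                         // [lucene, on]
        System.out.println(index.search("Lucene on Linux", newStops)); // [1]
        System.out.println(index.hasTerm("on"));                       // true: still stored
    }
}
```

The trade-off in skipping the re-index is visible in the last line: postings for `on` are still stored and still cost space; only re-indexing with the new analyzer reclaims it.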

> In view of the index size, I am going to see how well the kernel
> caching performs, as the index probably won't fit entirely into memory
> once the operating system and other system processes have taken their
> bite of the available memory.
> 
> Eventually I am going to try to implement something similar to Google
> Groups, indexing lots of NNTP traffic. Has anyone done this before
> with Lucene?

Not that I know of, but people have used Lucene to index their email,
which is somewhat similar.
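Usenet articles (RFC 1036) use the same layout as Internet mail (RFC 822): a block of "Name: value" headers, a blank line, then the body. That shared layout is part of why email indexing is a close analogue. A rough sketch in plain Java, assuming each header would become its own searchable field (the class `ArticleSplitter` and its `parse` method are hypothetical, not part of Lucene):

```java
import java.util.*;

/** Hypothetical helper: split a news/email article into headers and body. */
public class ArticleSplitter {
    record Article(Map<String, String> headers, String body) {}

    static Article parse(String raw) {
        Map<String, String> headers = new LinkedHashMap<>();
        String[] lines = raw.split("\r?\n"); // trailing empty lines are dropped
        String currentHeader = null;
        int i = 0;
        for (; i < lines.length; i++) {
            String line = lines[i];
            if (line.isEmpty()) {          // blank line ends the header block
                i++;
                break;
            }
            if ((line.startsWith(" ") || line.startsWith("\t")) && currentHeader != null) {
                // Folded header: continuation of the previous header's value.
                headers.put(currentHeader, headers.get(currentHeader) + " " + line.trim());
                continue;
            }
            int colon = line.indexOf(':');
            if (colon > 0) {               // lines without a colon are ignored
                currentHeader = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
                // Repeated headers overwrite earlier ones in this sketch.
                headers.put(currentHeader, line.substring(colon + 1).trim());
            }
        }
        String body = String.join("\n", Arrays.copyOfRange(lines, i, lines.length));
        return new Article(headers, body);
    }

    public static void main(String[] args) {
        String raw = "From: someone@example.com\n"
            + "Newsgroups: comp.lang.java\n"
            + "Subject: Indexing NNTP\n"
            + "  traffic with Lucene\n"
            + "\n"
            + "Has anyone done this?\n";
        Article a = parse(raw);
        System.out.println(a.headers().get("subject")); // Indexing NNTP traffic with Lucene
        System.out.println(a.body());                   // Has anyone done this?
    }
}
```

Each header value (for example `newsgroups`, `from`, `subject`) and the body could then be added to the index as separate fields, so a search can be limited to one group or one sender. MIME decoding and encoded-word headers are deliberately left out.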

Otis

