--- jt oob <[EMAIL PROTECTED]> wrote:
> Thank you for the replies!
>
> My indexes are currently looking like they might be 12GB when finished
> on the current run.
>
> I have spotted a tool on the Lucene site for listing the most
> frequently occurring words in the index. Currently I am using the
> defaultAnalyzer stoplist; I should probably use a more comprehensive
> list.
>
> Is there a way of implementing a stoplist after the index has been
> created, removing all occurrences of the new stoplist words?
> I could then write a new Analyzer with the new stoplist for adding
> new documents to the index.
> Am I doomed to reindexing with a better stoplist?

I believe you'll need to re-index. Well, if your old stop list is a
subset of the new stop list, then you may be able to get away without
re-indexing.

> In view of the index size, I am going to see how well the kernel
> caching performs, as the index probably won't fit entirely into
> memory once the operating system and other system processes have
> taken their bite of the available memory.
>
> Eventually I am going to try to implement something similar to Google
> Groups, indexing lots of NNTP traffic. Has anyone done this before
> with Lucene?

Not that I know of, but people have used Lucene to index their email,
which is somewhat similar.

Otis
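A minimal plain-Java sketch of the subset condition mentioned in the reply, assuming the old and new analyzers differ only in their stop lists (same tokenization, same lower-casing). The class and method names are hypothetical, and no Lucene classes are used, so the sketch does not depend on any particular Lucene version. The reasoning: a word that the old list dropped but the new list keeps was never written to the existing index, so queries for it cannot match old documents; if no such word exists, queries analyzed with the new list stay consistent with the old index. The new stop words still sit in the old index as unused terms, so this avoids re-indexing but does not shrink the index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

public class StopListMigration {

    // Old stop words that the new list would stop dropping. They were never
    // indexed, so a non-empty result means old documents must be re-indexed.
    static Set<String> droppedOnlyByOld(Set<String> oldStops, Set<String> newStops) {
        Set<String> missing = new TreeSet<>(oldStops);
        missing.removeAll(newStops);
        return missing;
    }

    // True when the old stop list is a subset of the new one.
    static boolean isSubset(Set<String> oldStops, Set<String> newStops) {
        return droppedOnlyByOld(oldStops, newStops).isEmpty();
    }

    // Hypothetical query-side filter: lower-cases, splits on whitespace and
    // drops stop words. A real Lucene analyzer tokenizes differently; this
    // only illustrates removing the new stop words at query time.
    static List<String> filterQuery(String query, Set<String> stops) {
        List<String> kept = new ArrayList<>();
        for (String token : query.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty() && !stops.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> oldStops = Set.of("a", "the", "of");
        Set<String> newStops = Set.of("a", "the", "of", "wrote", "re");
        System.out.println(isSubset(oldStops, newStops));                   // true
        System.out.println(droppedOnlyByOld(oldStops, newStops));           // []
        System.out.println(filterQuery("the size of the index", newStops)); // [size, index]
    }
}
```

In Lucene itself, the query-side filtering would come from giving the query parser an analyzer built with the new stop list, the same analyzer used when adding new documents.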
