Thank you for the replies! My indexes currently look like they will be about 12GB when the current run finishes.
I have spotted a tool on the Lucene site for listing the most frequently occurring words in the index. Currently I am using the defaultAnalyzer stoplist; I should probably use a more comprehensive list.

Is there a way of applying a stoplist after the index has been created, removing all occurrences of the new stoplist words? I could then write a new Analyzer with the new stoplist for adding new documents to the index. Or am I doomed to reindexing with a better stoplist?

In view of the index size, I am going to see how well the kernel caching performs, as the index probably won't fit entirely into memory once the operating system and other system processes have taken their bite of the available memory.

Eventually I am going to try to implement something similar to Google Groups, indexing lots of NNTP traffic. Has anyone done this before with Lucene?

Thanks again,
jt
