Doug Cutting wrote:
Kevin A. Burton wrote:
So is it possible to fix this index now? Can I just delete the most recent segment that was created? I can find this with ls -alt.
Sorry, I forgot to answer your question: this should work fine. I don't think you should even have to delete that segment.
I'm worried about duplicate or missing content from the original index. I'd rather rebuild the index and waste another 6 hours (I've probably blown 100 hours of CPU time on this already) and have a correct index :)
During an optimize I assume Lucene starts writing to a new segment and leaves all others in place until everything is done and THEN deletes them?
Also, to elaborate on my previous comment, a mergeFactor of 5000 not only delays the work until the end, but it also makes the disk workload more seek-dominated, which is not optimal.
The only settings I use are:
targetIndex.mergeFactor=10; targetIndex.minMergeDocs=1000;
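For context, the settings above map onto the Lucene 1.4-era IndexWriter API, where mergeFactor and minMergeDocs were public fields. A minimal sketch, assuming that API; the index path and analyzer choice here are illustrative, not from the original mail:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class IndexTuning {
    public static void main(String[] args) throws Exception {
        // "index" is a hypothetical target directory.
        IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), true);

        // Merge whenever 10 segments accumulate at a level, rather than
        // deferring all merge work to the end as a huge mergeFactor would.
        writer.mergeFactor = 10;

        // Buffer 1000 documents in RAM before flushing a new on-disk
        // segment, so each flushed segment is reasonably large.
        writer.minMergeDocs = 1000;

        // ... addDocument() calls go here ...

        // Collapse everything into a single segment once indexing is done.
        writer.optimize();
        writer.close();
    }
}
```

The trade-off being discussed in the thread: a small mergeFactor spreads merge I/O throughout indexing and keeps the segment count (and disk seeks) down, while a large minMergeDocs moves more of the initial merging into RAM.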
the resulting index has 230k files in it :-/
I assume this is contributing to all the disk seeks.
So I suspect a smaller merge factor, together with a larger minMergeDocs, will be much faster overall, including the final optimize(). Please tell us how it goes.

This is what I did for this last round, but then I ended up with the highly fragmented index.
hm...
Thanks for all the help btw!
Kevin
--
Please reply using PGP.
http://peerfear.org/pubkey.asc NewsMonster - http://www.newsmonster.org/
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
AIM/YIM - sfburtonator, Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster
