Hi Lucene people! First off, thanks so much, Doug: it's a wonderful piece of software! Next, I have a severe optimization issue. On my Tecra 650 I'm getting indexing speeds of one record (smallish, say 1 KB) every 4 milliseconds. On some supposedly powerful Sun boxes, it looks like 32 milliseconds per record...
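To put these per-record rates in perspective, here is a small back-of-the-envelope calculation. It is a sketch only: it assumes a constant cost per record (only an approximation, since Lucene periodically merges segments), and it plugs in the 8 million record count and the 4, 7, 12 and 32 ms/record figures quoted in this post. The class and method names are hypothetical, chosen just for illustration.

```java
// Hypothetical helper: estimates total indexing time from a per-record latency,
// assuming every record costs the same (ignores bursty segment-merge work).
class IndexTimeEstimate {

    /** Total indexing time in hours for `records` documents at `msPerRecord` each. */
    static double hoursToIndex(long records, double msPerRecord) {
        double totalMs = records * msPerRecord;
        return totalMs / (1000.0 * 60 * 60);
    }

    /** Throughput in records per second at a given per-record latency. */
    static double recordsPerSecond(double msPerRecord) {
        return 1000.0 / msPerRecord;
    }

    public static void main(String[] args) {
        long records = 8_000_000L;        // record count mentioned in the post
        double[] rates = {4, 7, 12, 32};  // ms per record, as observed in the post
        for (double ms : rates) {
            System.out.printf("%5.1f ms/record -> %7.1f records/s, %6.1f hours for %d records%n",
                    ms, recordsPerSecond(ms), hoursToIndex(records, ms), records);
        }
    }
}
```

At 12 ms per record, 8 million records comes to 96,000 seconds, roughly 27 hours; at the 4 ms seen on the Tecra it would be just under 9 hours.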
So, I played with the merge factor, and this sped things up considerably until I hit the open file descriptor limit. I got it down to 7 milliseconds per record on the SPARC (at mergeFactor = 512). It seems that each unit of mergeFactor requires 9 file descriptors (plus 9 for the main index). So at a mergeFactor of 109 (we have a 1000 file descriptor limit), I get a speed of about 12 msec per record. Anyway, with 8 million records to index, that works out to more than a day (8 million × 12 ms is about 96,000 seconds, or roughly 27 hours).

So, three questions:

(1) Why does indexing on Sun boxes suck so much? :) I'm guessing there is a disk I/O problem, maybe with the JVM, but I don't know.

(2) How can I avoid the FD problem? I know about parallelizing the indexing, but I'd like to get an efficient single index before doing that. If I could set the merge factor really high, I think this would work.

(3) I have a ton of memory available (2-4 GB). Can this help? I can't just use a RAM disk, though, as the index is likely to be fairly large (1/16 of the index was 0.3 GB, so I estimate about 4.8 GB in total).

Can anyone help? Thanks again for such a wonderful open source project!

Cheers,
Winton
_______________________________________________
Lucene-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/lucene-users