Hi Lucene people!

   First off, thanks so much Doug -- it's a wonderful piece of software!
 
   Next -- I have a severe optimization issue. On my Tecra 650 I'm 
getting indexing speeds of 1 record (smallish, say 1K) every 4 
milliseconds. On some supposedly powerful Sun boxes, it looks like 32 
milliseconds a record...

  So, I played with the mergeFactor -- and this sped things up 
considerably, until I hit open file descriptor limits -- I got it 
down to 7 milliseconds on the SPARC (at mergeFactor = 512). It seems 
that each increment of 1 in mergeFactor requires 9 file descriptors 
(+ 9 for the main index).

  So -- at a mergeFactor of 109 (we have a 1000 fd limit) I get a 
speed of about 12 msec a record.
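
  To make the descriptor arithmetic concrete, here is a rough sketch in 
Java. The figure of 9 descriptors per mergeFactor step (plus 9 for the 
main index) is only what I observed, not a documented Lucene constant, 
and the class and method names are made up for illustration.

```java
public class FdBudget {
    // Assumption from my observation: about 9 descriptors per unit of
    // mergeFactor, plus 9 more for the main index.
    static final int FDS_PER_STEP = 9;
    static final int FDS_MAIN_INDEX = 9;

    // Estimated number of open file descriptors at a given mergeFactor.
    public static int estimatedFds(int mergeFactor) {
        return FDS_PER_STEP * mergeFactor + FDS_MAIN_INDEX;
    }

    // Largest mergeFactor whose estimate still fits under the fd limit.
    // This ignores descriptors the JVM itself holds, so leave headroom.
    public static int maxMergeFactor(int fdLimit) {
        return (fdLimit - FDS_MAIN_INDEX) / FDS_PER_STEP;
    }

    public static void main(String[] args) {
        System.out.println("fds at mergeFactor 109: " + estimatedFds(109));
        System.out.println("fds at mergeFactor 512: " + estimatedFds(512));
        System.out.println("max mergeFactor at 1000 fds: " + maxMergeFactor(1000));
    }
}
```

  By this estimate a mergeFactor of 109 needs 990 descriptors, which is 
just under our limit, while 512 would need 4617.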

  Anyway, with 8 million records to index this is something like 24 hours.
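
  For reference, the back-of-the-envelope calculation looks like this. 
It is only a sketch: it assumes the per-record time stays constant over 
the whole run, which merging may well violate, and the class and method 
names are made up for illustration.

```java
public class IndexingEstimate {
    // Total wall-clock hours to index `records` documents at a constant
    // cost of `msPerRecord` milliseconds each (a simplifying assumption).
    public static double hours(long records, double msPerRecord) {
        double seconds = records * msPerRecord / 1000.0;
        return seconds / 3600.0;
    }

    public static void main(String[] args) {
        long records = 8_000_000L;
        // Per-record timings quoted in this message, in milliseconds.
        double[] timings = {4.0, 7.0, 12.0, 32.0};
        for (double ms : timings) {
            System.out.printf("%.0f ms/record -> %.1f hours%n", ms, hours(records, ms));
        }
    }
}
```

  At 12 msec a record this comes to a bit under 27 hours, and at the 
original 32 msec it would be about 71 hours.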

  So three questions:

  (1) Why does indexing on Sun boxes suck so much :) I'm guessing 
there is a disk I/O problem, maybe with the JVM, but I don't know :)

  (2) How can I avoid the FD problem?  I know about parallelizing the 
indexing, but I'd like to get an efficient single index before doing 
that. If I could set the mergeFactor really high, then I think I'd 
be able to work

  (3) I have a ton of memory available (2-4 GB), so can this help? I 
can't just make a RAM disk though, as the index is likely to be fairly 
large (1/16 of the index was 0.3 GB, so I estimate 4.8 GB in total).

  Can anyone help?
 
Thanks again though for such a wonderful Open Source project!

  Cheers,
   Winton

_______________________________________________
Lucene-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/lucene-users
