Things to consider:
- disk speed, and whether the disk is busy satisfying other processes' requests
- CPU speed
- the amount of free RAM in the machine and the amount of RAM given to your JVM
- the actual bottleneck - it could be a slow XML parser, for instance; profile it (a rough timing sketch is below)
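One quick way to see where the time goes is to time the XML parsing and the Lucene addDocument() calls separately. Here is a minimal sketch of that idea; parseToDocument() is a hypothetical stand-in for whatever code you use to turn one XML file into a Lucene Document, and the only Lucene call it relies on is IndexWriter.addDocument().

  import java.io.File;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.index.IndexWriter;

  // Rough timing harness: measures wall-clock time spent in XML parsing
  // versus Lucene's addDocument(), so you know which side to tune.
  public class IndexTimer {

      public static void indexAll(File[] files, IndexWriter writer) throws Exception {
          long parseMillis = 0;
          long indexMillis = 0;
          for (int i = 0; i < files.length; i++) {
              long t0 = System.currentTimeMillis();
              Document doc = parseToDocument(files[i]);   // your XML parsing
              long t1 = System.currentTimeMillis();
              writer.addDocument(doc);                    // Lucene indexing
              long t2 = System.currentTimeMillis();
              parseMillis += (t1 - t0);
              indexMillis += (t2 - t1);
          }
          System.out.println("parsing: " + parseMillis
                  + " ms, indexing: " + indexMillis + " ms");
      }

      private static Document parseToDocument(File f) throws Exception {
          // hypothetical placeholder for the application's XML-to-Document code
          throw new UnsupportedOperationException("supply your own parser");
      }
  }

If parsing dominates, a faster parser (or reusing one parser instance instead of creating one per file) will buy you more than any IndexWriter tuning.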
I'm about to submit another Lucene article to Onjava.com. It covers indexing performance. I don't know exactly when it will be published, but when it is I'll send the URL to the list.

Otis

--- Michael Barry <[EMAIL PROTECTED]> wrote:
> All,
>     I'm in need of some pointers, hints or tips on indexing large
> collections of data. I know I saw some tips on this list before, but
> when I tried searching the list I came up blank.
>     I have a large collection of XML files (336,000 files, around 5K
> apiece) that I'm indexing, and it's taking quite a bit of time (27
> hours). I've played around with the mergeFactor, RAMDirectories and
> multiple threads (X threads each indexing a subset of the data and
> then merging the indexes at the end), but I cannot seem to bring the
> time down. I'm probably not doing these things properly, but from
> what I read I believe I am. Maybe this is the best I can do with this
> data, but I would be really grateful to hear how others have tackled
> this same issue.
>     As always, pointers to places in the mailing list archive or
> other places would be appreciated.
>
> Thanks, Mike.
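For the multi-threaded approach Mike describes (each thread indexing its own subset, merged at the end), the usual pattern is to have each worker fill a RAMDirectory and then merge everything into the on-disk index with a single IndexWriter.addIndexes() call. The sketch below assumes the Lucene 1.x-era API (the String-path IndexWriter constructor; mergeFactor is a public field in early releases and a setter later), and buildPartialIndex() is a hypothetical stand-in for your per-subset XML indexing loop.

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.RAMDirectory;

  // Sketch of the "index subsets in parallel, merge at the end" pattern.
  // Each worker fills its own RAMDirectory; the final on-disk index is
  // produced by one addIndexes() call over all the partial indexes.
  public class ParallelIndexer {

      public static void main(String[] args) throws Exception {
          final int numThreads = 4;                        // assumption: tune to your CPU count
          final Directory[] partials = new Directory[numThreads];
          Thread[] workers = new Thread[numThreads];

          for (int i = 0; i < numThreads; i++) {
              partials[i] = new RAMDirectory();
              final Directory partial = partials[i];
              final int slice = i;
              workers[i] = new Thread() {
                  public void run() {
                      try {
                          buildPartialIndex(partial, slice, numThreads);
                      } catch (Exception e) {
                          e.printStackTrace();
                      }
                  }
              };
              workers[i].start();
          }
          for (int i = 0; i < numThreads; i++) {
              workers[i].join();
          }

          // Merge all in-memory partial indexes into one index on disk.
          IndexWriter merged = new IndexWriter("/path/to/index",
                  new StandardAnalyzer(), true);
          // Raising mergeFactor here can also help (public field in early
          // releases, setMergeFactor() in later ones).
          merged.addIndexes(partials);
          merged.optimize();
          merged.close();
      }

      private static void buildPartialIndex(Directory dir, int slice, int numSlices)
              throws Exception {
          IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
          // ... parse this thread's share of the XML files and call
          // writer.addDocument(doc) for each one ...
          writer.close();
      }
  }

Keep in mind that with 336,000 x 5K files the RAMDirectories can grow large, so either give the JVM plenty of heap or have each worker flush to a temporary FSDirectory partway through; the merge step works the same either way.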
