Mike,

By way of comparison, I've got a collection of about 50,000 XML files, each of which averages about 8K. It takes about 1.25 hours to index (on a 1.8 GHz machine). I use basically the standard configuration (mergeFactor, etc.) and have about 30 fields per document. I add about 200 new documents per day. I don't recall how long it takes to index those 200 (I do it in a background task), but it takes a couple of minutes to merge the new 200-document index with the master index.
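In case it's useful to see, that nightly merge is really just a couple of calls on IndexWriter -- roughly the sketch below. The paths and the StandardAnalyzer are placeholders for whatever you're actually using, so treat it as a rough outline rather than my exact code:

    // Fold a freshly built daily index into the master index.
    // Paths and the analyzer are placeholders -- substitute your own.
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class MergeDailyIndex {
        public static void main(String[] args) throws Exception {
            // Open the existing master index (create = false so we append).
            IndexWriter master = new IndexWriter("/indexes/master",
                                                 new StandardAnalyzer(), false);

            // The small index the background task built for today's documents.
            Directory daily = FSDirectory.getDirectory("/indexes/daily", false);

            // Merge the daily segments in and leave the master in one piece.
            master.addIndexes(new Directory[] { daily });
            master.optimize();
            master.close();
        }
    }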
HTH,
Terry

----- Original Message -----
From: "Michael Barry" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Monday, February 24, 2003 2:00 PM
Subject: Indexing Tips and Hints

> All,
> I'm in need of some pointers, hints or tips on indexing large collections
> of data. I know I saw some tips on this list before, but when I tried
> searching the list, I came up blank.
> I have a large collection of XML files (336,000 files around 5K apiece)
> that I'm indexing, and it's taking quite a bit of time (27 hours). I've
> played around with the mergeFactor, RAMDirectories and multiple threads
> (X number of threads indexing a subset of the data and then merging the
> indexes at the end), but I cannot seem to bring the time down. I'm probably
> not doing these things properly, but from what I read I believe I am.
> Maybe this is the best I can do with this data, but I would be really
> grateful to hear how others have tackled this same issue.
> As always, pointers to places in the mailing list archive or other places
> would be appreciated.
>
> Thanks, Mike.
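P.S. For what it's worth, the RAMDirectory batching you mention usually boils down to something like the sketch below: build each batch of documents in memory, then fold the batch into the on-disk index with addIndexes(). I haven't benchmarked this exact loop myself, and the batch size, paths, field names and the loadAndStripXml() helper are all placeholders, so treat it as a rough starting point rather than a tested recipe:

    // Sketch: build documents into a RAMDirectory in batches, then flush
    // each batch into the on-disk index with addIndexes().
    // Batch size, paths, field names and loadAndStripXml() are placeholders.
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;

    public class BatchIndexer {
        private static final int BATCH_SIZE = 5000;   // tune to taste

        public static void main(String[] args) throws Exception {
            // On-disk index that accumulates every batch (create = true on the first run).
            IndexWriter diskWriter = new IndexWriter("/indexes/master",
                                                     new StandardAnalyzer(), true);

            RAMDirectory ramDir = new RAMDirectory();
            IndexWriter ramWriter = new IndexWriter(ramDir, new StandardAnalyzer(), true);

            int inBatch = 0;
            for (int i = 0; i < 336000; i++) {   // stand-in for iterating over your XML files
                Document doc = new Document();
                doc.add(Field.Keyword("path", "file" + i + ".xml"));
                doc.add(Field.Text("contents", loadAndStripXml(i)));
                ramWriter.addDocument(doc);

                if (++inBatch == BATCH_SIZE) {
                    // Flush the in-memory batch to disk in one merge, then start a new batch.
                    ramWriter.close();
                    diskWriter.addIndexes(new Directory[] { ramDir });
                    ramDir = new RAMDirectory();
                    ramWriter = new IndexWriter(ramDir, new StandardAnalyzer(), true);
                    inBatch = 0;
                }
            }

            // Flush whatever is left, then optimize once at the very end.
            ramWriter.close();
            diskWriter.addIndexes(new Directory[] { ramDir });
            diskWriter.optimize();
            diskWriter.close();
        }

        // Placeholder for whatever parses one of your XML files into indexable text.
        private static String loadAndStripXml(int i) { return ""; }
    }

The idea is that the small, frequent segment merges happen in RAM; the trade-off is that each addIndexes() call does a fair amount of disk work against the master, so very small batches can actually make things slower.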
