Hi Andrzej,

Thanks for the code. I'll try it as soon as I have time. If you had a copy of the modified FSDirectory implementation you could also share, that would make testing it a bit quicker and easier.

BTW, when you said it "supposedly increases I/O", I gather that you are not the author?

Regards,

Terry

----- Original Message -----
From: "Andrzej Bialecki" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Monday, February 24, 2003 3:59 PM
Subject: Re: Indexing Tips and Hints

> Hello,
>
> Since you are trying this anyway, and looking for ways to improve
> indexing times... Could you perhaps try to replace use of
> java.io.RandomAccessFile in FSDirectory implementation, with the
> attached implementation? It supposedly increases I/O throughput by
> orders of magnitude, by using partial buffering.
>
> Terry Steichen wrote:
> > Mike,
> >
> > By way of comparison, I've got a collection of about 50,000 XML files, each
> > of which averages about 8K. It takes about 1.25 hours to index (on a 1.8Ghz
> > machine). I use basically the standard configuration (mergeFactor, etc.)
> > and I've got about 30 fields per document. I add about 200 new ones per
> > day. I don't recall how long that it takes to index the 200 (I do it
> > through a background task), but it takes a couple of minutes to merge the
> > new 200 document index with the master index.
> >
> > HTH,
> >
> > Terry
> >
> > ----- Original Message -----
> > From: "Michael Barry" <[EMAIL PROTECTED]>
> > To: "Lucene Users List" <[EMAIL PROTECTED]>
> > Sent: Monday, February 24, 2003 2:00 PM
> > Subject: Indexing Tips and Hints
> >
> >>All,
> >>    I'm in need of some pointers, hints or tips on indexing large
> >>collections of data. I know I saw some tips on this list before but
> >>when I tried searching the list, I came up blank.
> >>    I have a large collection of XML files (336000 files around 5K
> >>apiece) that I'm indexing and its taking quite a bit of time (27
> >>hours). I've played around with the mergeFactor, RAMDirectories and
> >>multiple threads (X number of threads indexing a subset of the data
> >>and then merging the indexes at the end) but I cannot seem to bring
> >>the time down. I'm probably not doing these things properly but from
> >>what I read I believe I am. Maybe this is the best I can do with this
> >>data but I would be really grateful to hear how others have tackled
> >>this same issue.
> >>    As always pointers to places in the mailing list archive or other
> >>places would be appreciated.
> >>
> >>Thanks, Mike.
> >>
> >>---------------------------------------------------------------------
> >>To unsubscribe, e-mail: [EMAIL PROTECTED]
> >>For additional commands, e-mail: [EMAIL PROTECTED]
>
> --
> Best regards,
> Andrzej Bialecki
>
> -------------------------------------------------
> Software Architect, System Integration Specialist
> -------------------------------------------------
> FreeBSD developer (http://www.freebsd.org)
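
For readers following the thread: the attachment Andrzej mentions is not part of this message, so the following is only a rough sketch of what "partial buffering" on top of java.io.RandomAccessFile could look like, not his implementation and not anything taken from Lucene. The class name BufferedReadFile, the physicalReads counter and the buffer size are all made up for illustration. The idea is to serve small reads from an in-memory window and touch the underlying file only when the read position leaves that window. It is read-only; a version usable for indexing would also have to buffer writes.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical read-only RandomAccessFile that serves reads from a memory window. */
class BufferedReadFile extends RandomAccessFile {
    private final byte[] buffer;
    private long bufferStart = 0;  // file offset of buffer[0]
    private int bufferLength = 0;  // number of valid bytes in buffer
    private long position = 0;     // logical read position seen by callers
    long physicalReads = 0;        // how often the underlying file was read

    BufferedReadFile(File file, int bufferSize) throws IOException {
        super(file, "r");
        buffer = new byte[bufferSize];
    }

    /** True if the logical position falls inside the buffered window. */
    private boolean inWindow() {
        return position >= bufferStart && position < bufferStart + bufferLength;
    }

    /** Reloads the window at the logical position; returns false at end of file. */
    private boolean refill() throws IOException {
        super.seek(position);
        int n = super.read(buffer, 0, buffer.length);
        physicalReads++;
        bufferStart = position;
        bufferLength = Math.max(n, 0);
        return n > 0;
    }

    @Override
    public int read() throws IOException {
        if (!inWindow() && !refill()) {
            return -1;
        }
        return buffer[(int) (position++ - bufferStart)] & 0xFF;
    }

    @Override
    public int read(byte[] dest, int offset, int length) throws IOException {
        if (length == 0) {
            return 0;
        }
        if (!inWindow() && !refill()) {
            return -1;
        }
        // Copy at most what is left in the window; callers loop for the rest.
        int available = (int) (bufferStart + bufferLength - position);
        int count = Math.min(length, available);
        System.arraycopy(buffer, (int) (position - bufferStart), dest, offset, count);
        position += count;
        return count;
    }

    @Override
    public int read(byte[] dest) throws IOException {
        return read(dest, 0, dest.length);
    }

    @Override
    public void seek(long pos) throws IOException {
        // Only move the logical position; the file is touched on the next refill.
        position = pos;
    }

    @Override
    public long getFilePointer() throws IOException {
        return position;
    }
}
```

With something along these lines, a loop of single-byte read() calls should hit the operating system once per buffer-full rather than once per byte, which is presumably where a gain of the kind described above would come from. Whether it helps indexing time in practice would need measuring.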
