I didn't know that... explains a lot. Thanks, David
-----Original Message-----
From: Jeff [mailto:[EMAIL PROTECTED]]
Sent: Saturday, 21 July 2007 12:47 p.m.
To: [email protected]
Subject: Re: Performance of indexing....

One problem with using AddIndexes(Directory[] dirs) is that the index gets
optimized after the merge. This slows down indexing and uses more resources
because of the forced optimization. Lucene 2.1 added a new method,
AddIndexesNoOptimize(Directory[] dirs), which is much faster and uses far
fewer resources.

Jeff

On 7/18/07, David Smith <[EMAIL PROTECTED]> wrote:
>
> morning all,
>
> I have a reasonably sized index, approx 5 GB and 2 million documents,
> that I update daily. I use a number of worker threads to create several
> small indexes, which I merge together to get one index of about 100,000
> documents and 500 MB in size. I then merge this into the main index.
> This is where my problem lies: merging the main and temp indexes not
> only takes a long time, it causes an excessive amount of disk IO. The
> final merge results in 30+ GB of data read and 25+ GB of data written.
> This seems more than a bit excessive.
>
> My code goes along the lines of:
>
> Dim idxs As New System.Collections.Generic.List(Of _
>     Lucene.Net.Store.Directory)
>
> 1. Start 5 worker threads, each with its own index, reading from a
> message queue, into idxs.
>
> Dim tIndex As Lucene.Net.Index.IndexWriter = Nothing
> Dim fIndex As Lucene.Net.Index.IndexWriter = Nothing
> Dim tmpIdx As Lucene.Net.Store.Directory
>
> tmpIdx = Lucene.Net.Store.FSDirectory.GetDirectory( _
>     System.IO.Path.Combine(Configuration.TempIndexPath, "wrk"), True)
>
> 2. When done, merge the 5 work indexes into 1 temp index.
> (Up till now, disk IO etc. is as I would expect.)
>
> tIndex = New Lucene.Net.Index.IndexWriter(tmpIdx, _
>     New Lucene.Net.Analysis.Standard.StandardAnalyzer(sWords.ToArray), _
>     True)
> tIndex.AddIndexes(idxs.ToArray)
> tIndex.Close()
>
> 3. Merge the temp index with the main index.
> (Disk IO goes haywire here.)
>
> fIndex = New Lucene.Net.Index.IndexWriter(Configuration.IndexPath, _
>     New Lucene.Net.Analysis.Standard.StandardAnalyzer(sWords.ToArray), _
>     False)
> fIndex.SetMergeFactor(10000)
> fIndex.SetUseCompoundFile(True)
> fIndex.AddIndexes(New Lucene.Net.Store.Directory() {tmpIdx})
> fIndex.Close()
>
> Is there a better way of maintaining or implementing an index of this
> size, and growing?
>
> Thanks
> David
>
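For anyone following along: a sketch of what step 3 of David's code might look like using Jeff's suggestion. This reuses the names from the thread above (Configuration.IndexPath, sWords, tmpIdx) and assumes Lucene.Net 2.1 or later, where AddIndexesNoOptimize is available; it is illustrative only, not tested against this index.

```
' Sketch only: merge the temp index into the main index without the
' forced optimize that AddIndexes() performs.
Dim fIndex As New Lucene.Net.Index.IndexWriter( _
    Configuration.IndexPath, _
    New Lucene.Net.Analysis.Standard.StandardAnalyzer(sWords.ToArray), _
    False)

fIndex.SetUseCompoundFile(True)

' AddIndexesNoOptimize merges the segments in without rewriting the
' entire index, so the disk IO is proportional to the new data rather
' than to the whole 5 GB index.
fIndex.AddIndexesNoOptimize(New Lucene.Net.Store.Directory() {tmpIdx})
fIndex.Close()

' If a fully merged index matters for search speed, Optimize() can be
' run separately during an off-peak window instead of on every update.
```

The trade-off is that the main index carries more segments between optimizes, which can slow searches slightly until the next Optimize() or until background merging catches up.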
