Hi David,

Merging is a costly operation, so another solution could be to use the same index for both indexing and searching. But:

1. Using an unoptimized index may cause a small search-performance degradation, though I don't think it will be noticeable.

2. That approach may require some mutually exclusive synchronization mechanism between indexing and searching. (I am not sure to what extent Lucene.Net can provide it.)
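The synchronization point 2 raises could be handled outside of Lucene.Net with an ordinary reader-writer lock: many searches proceed in parallel, and an index update blocks them only while it commits. This is a minimal generic sketch in Java (not Lucene.Net API; `IndexGuard`, `search`, and `commit` are hypothetical names standing in for your own wrapper):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// A minimal sketch of coordinating concurrent searches with index updates.
// The int counter is a stand-in for the real index state.
public class IndexGuard {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private int committedDocs = 0;

    // Searches take the shared read lock; they never block each other.
    public int search() {
        lock.readLock().lock();
        try {
            return committedDocs;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Indexing takes the exclusive write lock only for the commit step,
    // so searches are blocked only briefly.
    public void commit(int added) {
        lock.writeLock().lock();
        try {
            committedDocs += added;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        IndexGuard g = new IndexGuard();
        g.commit(100);
        g.commit(50);
        System.out.println(g.search()); // prints 150
    }
}
```

The key design choice is to hold the exclusive lock only around the commit, not around the (much longer) document-analysis phase, so search latency stays flat while indexing runs.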
DIGY

-----Original Message-----
From: David Smith [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 19, 2007 1:17 AM
To: [email protected]
Subject: Performance of indexing....

Morning all,

I have a reasonably sized index, approx 5 Gig and 2 million documents, that I update daily. I use a number of worker threads to create a number of small indexes, which I merge together to get 1 index of about 100,000 documents and 500 Meg in size. I then merge this into the main index, and this is where my problem exists: the merging of the main and temp index not only takes a long time, but causes an excessive amount of disk IO. The final merge results in 30+ Gigs of data read and 25+ Gigs of data written. This seems more than a bit excessive.

My code goes along the lines of:

    Dim idxs As New System.Collections.Generic.List(Of Lucene.Net.Store.Directory)

1. Start 5 worker threads, each with its own index, reading from a message queue into idxs.

    Dim tIndex As Lucene.Net.Index.IndexWriter = Nothing
    Dim fIndex As Lucene.Net.Index.IndexWriter = Nothing
    Dim tmpIdx As Lucene.Net.Store.Directory
    tmpIdx = Lucene.Net.Store.FSDirectory.GetDirectory( _
        System.IO.Path.Combine(Configuration.TempIndexPath, "wrk"), True)

2. When done, merge the 5 work indexes into 1 temp index (up till now, disk IO etc. is as I would expect):

    tIndex = New Lucene.Net.Index.IndexWriter(tmpIdx, _
        New Lucene.Net.Analysis.Standard.StandardAnalyzer(sWords.ToArray), True)
    tIndex.AddIndexes(idxs.ToArray)
    tIndex.Close()

3. Merge the temp index with the main index (disk IO goes haywire here):

    fIndex = New Lucene.Net.Index.IndexWriter(Configuration.IndexPath, _
        New Lucene.Net.Analysis.Standard.StandardAnalyzer(sWords.ToArray), False)
    fIndex.SetMergeFactor(10000)
    fIndex.SetUseCompoundFile(True)
    fIndex.AddIndexes(New Lucene.Net.Store.Directory() {tmpIdx})
    fIndex.Close()

Is there a better way of maintaining or implementing an index of this size, and growing?

Thanks
David
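For intuition on why step 3 is so expensive: merging two indexes into one segment rewrites every byte of both inputs, and enabling compound files repacks the merged segment into a .cfs file, roughly doubling the IO again. The sketch below is a back-of-envelope model using the sizes from David's mail (assumed, not measured Lucene behavior; real IO can be higher still because of intermediate merges, which would be consistent with the 30+ Gig figure observed):

```java
// Rough model: a full merge reads both inputs and writes the merged
// segment; compound-file packing then reads and rewrites it once more.
public class MergeCost {
    static double mergeIoGb(double mainGb, double tempGb, boolean compound) {
        double read = mainGb + tempGb;   // both inputs read in full
        double write = mainGb + tempGb;  // merged segment written out
        if (compound) {                  // .cfs packing: read + rewrite
            read += mainGb + tempGb;
            write += mainGb + tempGb;
        }
        return read + write;             // total bytes moved, in GB
    }

    public static void main(String[] args) {
        // Sizes from the email: ~5 GB main index, ~0.5 GB daily batch.
        System.out.println(mergeIoGb(5.0, 0.5, true)); // prints 22.0
    }
}
```

The model suggests the IO bill is dominated by the size of the main index, not the size of the daily batch, which is why adding the day's documents directly to the main index (and optimizing only occasionally) tends to be cheaper than a daily full merge.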
