One problem with using AddIndexes(Directory[] dirs) is that the index gets
optimized after the merge. This slows down indexing and uses more resources
because of the forced optimization.

In Lucene 2.1 there is a new method, AddIndexesNoOptimize(Directory[] dirs),
which is much faster and uses far fewer resources because it skips that
forced optimization.
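As a rough sketch of how that could look for the final merge in your step 3 (assuming a Lucene.Net build that exposes AddIndexesNoOptimize; the path and analyzer here are placeholders for your own configuration, and tmpIdx is the temp-index Directory from your code):

Dim writer As New Lucene.Net.Index.IndexWriter( _
    Lucene.Net.Store.FSDirectory.GetDirectory(Configuration.IndexPath, False), _
    New Lucene.Net.Analysis.Standard.StandardAnalyzer(), False)

' Merge without the forced optimize that AddIndexes performs.
writer.AddIndexesNoOptimize(New Lucene.Net.Store.Directory() {tmpIdx})
writer.Close()

You can still call Optimize() explicitly later, e.g. during off-peak hours, so the expensive rewrite of the whole index happens when you choose rather than on every merge.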

Jeff

On 7/18/07, David Smith <[EMAIL PROTECTED]> wrote:


Morning all,

I have a reasonably sized index, approx 5 Gig and 2 million documents, that I
update daily.  I use a number of worker threads to create several small
indexes, which I merge together to get one index of about 100,000 documents
and 500 Meg in size.  I then merge this into the main index.  This is where
my problem lies.  Merging the main and temp index not only takes a long
time, it causes an excessive amount of disk IO: the final merge reads 30+
Gigs of data and writes 25+ Gigs.  This seems more than a bit excessive.

My code goes along the lines of:


Dim idxs As New System.Collections.Generic.List(Of Lucene.Net.Store.Directory)

1. Start 5 worker threads, each with its own index, reading from a message
queue; the resulting directories go into idxs.


Dim tIndex As Lucene.Net.Index.IndexWriter = Nothing
Dim fIndex As Lucene.Net.Index.IndexWriter = Nothing
Dim tmpIdx As Lucene.Net.Store.Directory

tmpIdx = Lucene.Net.Store.FSDirectory.GetDirectory( _
    System.IO.Path.Combine(Configuration.TempIndexPath, "wrk"), True)


2. When done, merge the 5 work indexes into 1 temp index.
(Up till now, disk IO is about what I would expect.)

tIndex = New Lucene.Net.Index.IndexWriter(tmpIdx, _
    New Lucene.Net.Analysis.Standard.StandardAnalyzer(sWords.ToArray), True)
tIndex.AddIndexes(idxs.ToArray)
tIndex.Close()


3. Merge the temp index into the main index.
(Disk IO goes haywire here.)

fIndex = New Lucene.Net.Index.IndexWriter(Configuration.IndexPath, _
    New Lucene.Net.Analysis.Standard.StandardAnalyzer(sWords.ToArray), False)
fIndex.SetMergeFactor(10000)
fIndex.SetUseCompoundFile(True)
fIndex.AddIndexes(New Lucene.Net.Store.Directory() {tmpIdx})
fIndex.Close()



Is there a better way of maintaining or implementing an index of this size,
and growing?

Thanks
David
