Hi David,

Merging is a costly operation. An alternative would be to use the same
index for both indexing and searching.  But:
1- Using an unoptimized index may cause a small search-performance
degradation, although I don't think it will be noticeable.
2- That approach may require a mutually exclusive synchronization
mechanism between indexing and searching.
(I am not sure to what extent Lucene.Net can provide this.)
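The mechanism in point 2 could be sketched with .NET's ReaderWriterLock: many
threads search concurrently, and a writer lock is taken only while swapping in
a fresh searcher after an indexing batch. This is only a sketch assuming
Lucene.Net 2.x; the SearchableIndex class and the Reopen policy are my own
illustration, not a Lucene.Net API.

Imports System.Threading
Imports Lucene.Net.Search

Public Class SearchableIndex
    Private ReadOnly rwLock As New ReaderWriterLock()
    Private ReadOnly indexPath As String
    Private searcher As IndexSearcher

    Public Sub New(ByVal path As String)
        indexPath = path
        searcher = New IndexSearcher(indexPath)
    End Sub

    ' Many threads may search at once under the reader lock.
    Public Function Search(ByVal query As Query) As Hits
        rwLock.AcquireReaderLock(Timeout.Infinite)
        Try
            Return searcher.Search(query)
        Finally
            rwLock.ReleaseReaderLock()
        End Try
    End Function

    ' Called after an indexing batch commits: block searches briefly
    ' and swap in a new searcher so new documents become visible.
    Public Sub Reopen()
        rwLock.AcquireWriterLock(Timeout.Infinite)
        Try
            searcher.Close()
            searcher = New IndexSearcher(indexPath)
        Finally
            rwLock.ReleaseWriterLock()
        End Try
    End Sub
End Class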

DIGY



-----Original Message-----
From: David Smith [mailto:[EMAIL PROTECTED] 
Sent: Thursday, July 19, 2007 1:17 AM
To: [email protected]
Subject: Performance of indexing....


Morning all,

I have a reasonably sized index, approx 5 GB and 2 million documents, that I
update daily.  I use a number of worker threads to create several small
indexes, which I merge together to get one index of about 100,000 documents
and 500 MB in size.  I then merge this into the main index.  This is where my
problem exists: the merge of the main and temp indexes not only takes a long
time, but causes an excessive amount of disk IO.  The final merge results
in 30+ GB of data read and 25+ GB of data written.  This seems more
than a bit excessive.

My code goes along the lines of:


Dim idxs As New System.Collections.Generic.List(Of Lucene.Net.Store.Directory)

1. Start 5 worker threads, each with its own index, reading from a message
queue into idxs.


Dim tIndex As Lucene.Net.Index.IndexWriter = Nothing
Dim fIndex As Lucene.Net.Index.IndexWriter = Nothing
Dim tmpIdx As Lucene.Net.Store.Directory

tmpIdx = Lucene.Net.Store.FSDirectory.GetDirectory( _
    System.IO.Path.Combine(Configuration.TempIndexPath, "wrk"), True)


2. When done, merge the 5 worker indexes into 1 temp index.
(Up till now, disk IO etc. is as I would expect.)

tIndex = New Lucene.Net.Index.IndexWriter(tmpIdx, _
    New Lucene.Net.Analysis.Standard.StandardAnalyzer(sWords.ToArray), True)
tIndex.AddIndexes(idxs.ToArray)
tIndex.Close()


3. Merge the temp index with the main index.
(Disk IO goes haywire here.)

fIndex = New Lucene.Net.Index.IndexWriter(Configuration.IndexPath, _
    New Lucene.Net.Analysis.Standard.StandardAnalyzer(sWords.ToArray), False)
fIndex.SetMergeFactor(10000)
fIndex.SetUseCompoundFile(True)
fIndex.AddIndexes(New Lucene.Net.Store.Directory() {tmpIdx})
fIndex.Close()
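One variation worth measuring (a sketch only, not tested against this index):
collapse steps 2 and 3 by passing the five worker directories straight to the
main writer, so the 500 MB temp index is never written to disk and re-read.
This assumes the same idxs list, sWords stop-word list, and Configuration
class as in the code above.

' Merge the worker indexes directly into the main index, skipping
' the intermediate temp index.
fIndex = New Lucene.Net.Index.IndexWriter(Configuration.IndexPath, _
    New Lucene.Net.Analysis.Standard.StandardAnalyzer(sWords.ToArray), False)
fIndex.SetUseCompoundFile(True)
fIndex.AddIndexes(idxs.ToArray())
fIndex.Close()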



Is there a better way of maintaining or implementing an index of this size,
and growing?

Thanks
David
