I didn't know that... explains a lot.

Thanks
David

-----Original Message-----
From: Jeff [mailto:[EMAIL PROTECTED]
Sent: Saturday, 21 July 2007 12:47 p.m.
To: [email protected]
Subject: Re: Performance of indexing....


One problem with using addIndexes(Directory[] dirs) is that the index gets
optimized after the merge. This slows down indexing and uses more resources
because of the forced optimization.

In Lucene 2.1 there is a new method, addIndexesNoOptimize(Directory[] dirs),
which is much faster and uses far fewer resources.
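
Assuming the Lucene.Net port David is using exposes the same method (as
AddIndexesNoOptimize), his final merge step could be adapted along these
lines -- a sketch only, reusing the tmpIdx, sWords and Configuration names
from his original code:

```vbnet
' Sketch: assumes a Lucene.Net build that ports Lucene 2.1's
' addIndexesNoOptimize. tmpIdx, sWords and Configuration.IndexPath
' are taken from the original post.
Dim fIndex As New Lucene.Net.Index.IndexWriter(Configuration.IndexPath, _
    New Lucene.Net.Analysis.Standard.StandardAnalyzer(sWords.ToArray), False)
fIndex.SetUseCompoundFile(True)
' Merges the temp index in without forcing a full optimize of the main index
fIndex.AddIndexesNoOptimize(New Lucene.Net.Store.Directory() {tmpIdx})
fIndex.Close()
```

The difference is only the AddIndexesNoOptimize call in place of AddIndexes;
the forced optimize (and its rewrite of the whole 5 GB index) is what was
driving the 30+ GB of disk IO.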

Jeff

On 7/18/07, David Smith <[EMAIL PROTECTED]> wrote:
>
>
> Morning all,
>
> I have a reasonably sized index, approx 5 GB and 2 million documents, that I
> update daily.  I use a number of worker threads to create a number of small
> indexes, which I merge together to get one index of about 100,000 documents
> and 500 MB in size.  I then merge this into the main index.  This is where
> my problem exists.  The merging of the main and temp index not only takes a
> long time, but causes an excessive amount of disk IO.  The final merge
> results in 30+ GB of data read and 25+ GB of data written.  This seems more
> than a bit excessive.
>
> My code goes along the lines of:
>
>
> Dim idxs As New System.Collections.Generic.List(Of Lucene.Net.Store.Directory)
>
> 1. Start 5 worker threads, each with its own index, reading from a message
> queue into idxs.
>
>
> Dim tIndex As Lucene.Net.Index.IndexWriter = Nothing
> Dim fIndex As Lucene.Net.Index.IndexWriter = Nothing
> Dim tmpIdx As Lucene.Net.Store.Directory
>
> tmpIdx = Lucene.Net.Store.FSDirectory.GetDirectory(System.IO.Path.Combine(Configuration.TempIndexPath, "wrk"), True)
>
>
> 2. When done, merge the 5 work indexes into 1 temp index.
> (Up till now, disk IO etc. is as I would expect.)
>
> tIndex = New Lucene.Net.Index.IndexWriter(tmpIdx, New Lucene.Net.Analysis.Standard.StandardAnalyzer(sWords.ToArray), True)
> tIndex.AddIndexes(idxs.ToArray)
> tIndex.Close()
>
>
> 3. Merge the temp index with the main index.
> (Disk IO goes haywire here.)
>
> fIndex = New Lucene.Net.Index.IndexWriter(Configuration.IndexPath, New Lucene.Net.Analysis.Standard.StandardAnalyzer(sWords.ToArray), False)
> fIndex.SetMergeFactor(10000)
> fIndex.SetUseCompoundFile(True)
> fIndex.AddIndexes(New Lucene.Net.Store.Directory() {tmpIdx})
> fIndex.Close()
>
>
>
> Is there a better way of maintaining or implementing an index of this size,
> and growing?
>
> Thanks
> David
>