The number of documents doesn't matter. The merge factor is the maximum number of documents that will be kept in memory, so with either 1000 or 200 documents you will have at most 50 documents (with their term vectors, etc.) in memory at once, and, as I said, you lose performance if you start hitting virtual memory.
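As a rough sanity check, the raw size of the buffered documents can be estimated from the figures in this thread. This is only a sketch: it assumes the merge factor really is the number of documents held in memory, as described above, and it uses the 400 KB average file size from your message. The in-memory representation of an indexed document can be larger or smaller than the file itself, so treat the result as an order of magnitude. The function name `buffered_bytes` is made up for this example.

```python
def buffered_bytes(docs_in_memory: int, avg_doc_kb: float) -> int:
    """Raw size in bytes of the documents held in memory at once.

    Assumption: each buffered document costs about its file size.
    """
    return int(docs_in_memory * avg_doc_kb * 1024)


# Figures from this thread: merge factor 50, about 400 KB per document.
estimate = buffered_bytes(50, 400)
print(f"{estimate / (1024 * 1024):.1f} MB")  # prints 19.5 MB
```

If the estimate is far below the free RAM on the machine, paging is less likely to be the cause, and tracing the actual memory usage becomes the more reliable check.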
Have you traced the memory usage, the page faults and so on?

Another thing that could help performance is the use of stop words. Have you looked at the resulting index with Luke to see whether the top terms include common words such as "and", "the" and "is"? These words are very common, and if you get rid of them, performance will also improve.

Hope it helps.

--
Jokin

On 2/11/07, Sairaj Sunil <[EMAIL PROTECTED]> wrote:
Hi,

The total size of the docs is up to 120 MB, and the average document size is around 400 KB. Just to check that this is not happening only for 1000 docs, I repeated the experiments with fewer docs (200), but increasing the merge factor still didn't improve the performance. Can you tell me which parameters affect the performance the most? I have just upgraded to Lucene.Net 2.0.

On 2/10/07, Jokin Cuadrado <[EMAIL PROTECTED]> wrote:
>
> What is the size of the documents?
>
> The documents are stored in main memory until the merge, so if you
> increase the merge factor very much, the memory could grow until
> virtual memory is used, with the penalty that it involves.
>
> --
> Jokin
>
>
> On 2/10/07, Sairaj Sunil <[EMAIL PROTECTED]> wrote:
> > Hi,
> > I saw an article and it tells me that increasing the mergeFactor speeds up
> > the indexing. But the reverse happened in my case.
> > To be more specific, I conducted some experiments with 1000 documents. The
> > time taken is quite large, due to PDF file indexing. I changed the
> > IndexWriter's parameters.
> >
> > MergeFactor – default (10)
> > minMergeDocs – default (10)
> > Time taken – 690 sec
> >
> > MergeFactor – 50
> > minMergeDocs – default (10)
> > Time taken – 765 sec
> >
> > MergeFactor – default (10)
> > minMergeDocs – 100
> > Time taken – 670 sec
> >
> > MergeFactor – 100
> > minMergeDocs – 100
> > Time taken – 738 sec
> >
> > Increasing the mergeFactor did not speed things up, but increasing the
> > minMergeDocs did improve the time. I am using Lucene.Net.
> > Can you explain this behavior? I am confused.
> > Just to give more info, I am using the Lucene.Net 1.3 version, and not the
> > 1.9 version. Can you tell me the best way to speed up the performance?
> > What are the parameters that I should set? I know that this depends on the
> > system, but which parameter exactly speeds up the indexing performance?
> >
> > thanks
> > --
> > Sairaj Sunil
> >
>

--
Sairaj Sunil
II Mtech(CS)
SSSIHL
Prashanthi Nilayam

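For reference, the four configurations quoted in this thread can be compared with a toy model of tiered segment merging. This is a sketch under stated assumptions, not Lucene's or Lucene.Net's actual code: it assumes documents are buffered until `min_merge_docs` have accumulated and are then flushed as one segment, and that whenever `merge_factor` segments of the same size exist they are rewritten into one larger segment. The function name `docs_written` is made up for this example, and the model counts document writes only, ignoring parsing, analysis and disk speed.

```python
def docs_written(total_docs: int, merge_factor: int, min_merge_docs: int) -> int:
    """Count document writes (flushes plus merges) in a toy tiered-merge model."""
    if merge_factor < 2 or min_merge_docs < 1:
        raise ValueError("merge_factor must be >= 2 and min_merge_docs >= 1")

    written = 0
    segments = []  # sizes of the segments on disk, oldest first

    for _ in range(total_docs // min_merge_docs):
        # Flush one full buffer as a new segment.
        segments.append(min_merge_docs)
        written += min_merge_docs
        # Cascade: merge while the newest merge_factor segments share a size.
        while (len(segments) >= merge_factor
               and len(set(segments[-merge_factor:])) == 1):
            merged = sum(segments[-merge_factor:])
            del segments[-merge_factor:]
            segments.append(merged)
            written += merged

    # Documents left in the buffer are flushed once when indexing ends.
    written += total_docs % min_merge_docs
    return written


# (merge_factor, min_merge_docs, measured seconds) as reported in the thread.
runs = [(10, 10, 690), (50, 10, 765), (10, 100, 670), (100, 100, 738)]
for merge_factor, min_merge_docs, seconds in runs:
    writes = docs_written(1000, merge_factor, min_merge_docs)
    print(f"mergeFactor={merge_factor:<3} minMergeDocs={min_merge_docs:<3} "
          f"writes={writes:<4} measured={seconds} sec")
```

Under these assumptions the model gives 3000, 2000, 2000 and 1000 document writes for the four runs, a threefold spread, while the measured times differ by only about 14 percent. That would be consistent with the remark in the original question that most of the time goes into handling the PDF files rather than into merging, although only profiling the real run can confirm it.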