Hi, I have not traced the memory usage. I have one question: what is the difference between batch indexing and interactive indexing? Maybe this is too silly to ask, but I want to make it clear, because when I reduce the merge factor below 10 (for example, to 5), performance improves slightly. I am indexing all the documents at once, i.e., I open the writer, add the documents, optimize at the end, and then close.
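For context, the Lucene `IndexWriter` javadoc for `setMergeFactor` frames the trade-off this way: larger values (> 10) suit batch index creation, where the whole index is built in one run as described above, and smaller values (< 10) suit indices that are interactively maintained, i.e. updated incrementally while searches run against the unoptimized index. The number of documents buffered in RAM before a new segment is written is set separately, with `setMaxBufferedDocs`. The sketch below is a hypothetical toy model, not Lucene's actual merge code: it assumes every `maxBufferedDocs` documents are flushed as one small segment and that `mergeFactor` segments on the same level are merged into one segment on the next level. It counts the segments left before `optimize()` and the documents rewritten by merges (a rough proxy for merge I/O). It ignores RAM, OS caching, and file handles, so it cannot by itself explain a timing difference on a real machine.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model of log-style segment merging (NOT Lucene's code).
 * Every maxBufferedDocs documents are flushed as one new level-0 segment; as soon
 * as mergeFactor segments sit on the same level, they are merged into a single
 * segment one level up.
 */
public class MergeFactorModel {

    /** Outcome of one simulated run, measured before any final optimize(). */
    public static final class Result {
        public final int segments;       // segments left on "disk"
        public final long docsRewritten; // docs copied by merges: rough proxy for merge I/O

        Result(int segments, long docsRewritten) {
            this.segments = segments;
            this.docsRewritten = docsRewritten;
        }
    }

    public static Result simulate(int numDocs, int maxBufferedDocs, int mergeFactor) {
        if (numDocs < 0 || maxBufferedDocs < 1 || mergeFactor < 2) {
            throw new IllegalArgumentException("bad parameters");
        }
        List<Integer> perLevel = new ArrayList<>(); // perLevel.get(k) = segments on level k
        long docsRewritten = 0;

        for (int flush = 0; flush < numDocs / maxBufferedDocs; flush++) {
            addSegment(perLevel, 0);          // RAM buffer full: write a small segment
            int level = 0;
            long segSize = maxBufferedDocs;   // docs in each segment on this level
            while (perLevel.get(level) == mergeFactor) {
                perLevel.set(level, 0);       // merge all segments on this level...
                docsRewritten += segSize * mergeFactor;
                level++;
                segSize *= mergeFactor;
                addSegment(perLevel, level);  // ...into one segment on the next level
            }
        }

        int segments = 0;
        for (int count : perLevel) {
            segments += count;
        }
        if (numDocs % maxBufferedDocs != 0) {
            segments++;                       // closing flushes the partly filled buffer
        }
        return new Result(segments, docsRewritten);
    }

    private static void addSegment(List<Integer> perLevel, int level) {
        while (perLevel.size() <= level) {
            perLevel.add(0);
        }
        perLevel.set(level, perLevel.get(level) + 1);
    }

    public static void main(String[] args) {
        for (int mf : new int[] {5, 10, 50}) {
            Result r = simulate(10_000, 10, mf);
            System.out.printf("mergeFactor=%d segments=%d docsRewritten=%d%n",
                    mf, r.segments, r.docsRewritten);
        }
    }
}
```

With 10,000 documents and 10 buffered documents per flush, this prints `segments=4 docsRewritten=36250` for a merge factor of 5, `segments=1 docsRewritten=30000` for 10, and `segments=20 docsRewritten=10000` for 50. In this model a smaller merge factor rewrites each document more often but leaves fewer segments, which is what matters when searches hit an unoptimized index. In a batch run that ends with `optimize()`, the leftover segments are merged anyway, which is why larger values are usually recommended there.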
On 2/12/07, Jokin Cuadrado <[EMAIL PROTECTED]> wrote:
The number of documents doesn't matter. The merge factor is the maximum number of documents that will be kept in memory, so with both 1,000 documents and 200 documents you will have at most 50 documents (with their term vectors, etc.) in memory; as I said, you lose performance if you start hitting virtual memory. Have you traced the memory usage, page faults, and so on? Another thing that could help performance is the use of stop words. Have you looked at the resulting index with Luke to see whether the top terms include common words such as "and", "the", and "is"? These words are very common, and if you get rid of them, performance will also improve. Hope it helps. -- Jokin.
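In Lucene itself, stop words are removed inside the analysis chain, for example with the `StopFilter` or `StopAnalyzer` classes. The plain-Java sketch below only illustrates the effect described above and does not use Lucene: it lower-cases the text, splits it on runs of characters that are neither letters nor digits, and drops terms found in a stop list. The method name `indexableTerms` and the tiny stop list are hypothetical, chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordSketch {
    // Hypothetical, deliberately tiny stop list for illustration only.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "are", "in", "is", "of", "the", "to"));

    /** Lower-cases the text, splits on non-letter/non-digit runs, and drops stop words. */
    public static List<String> indexableTerms(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Prints [index, large, merge, factor, 10]
        System.out.println(indexableTerms("The index is large, and the merge factor is 10."));
    }
}
```

Here 5 of the 10 tokens are dropped, and those are exactly the terms that would otherwise carry the longest postings lists. The usual trade-off is that queries made up mostly of stop words, such as some phrase searches, can no longer match on them.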
-- Sairaj Sunil II Mtech(CS)
