Hi
I have not traced the memory usage.
I have one question: what is the difference between batch indexing and
interactive indexing? Maybe this is too silly to ask, but nevertheless I
want to make it clear, because if I reduce the merge factor below 10 (for
example, to 5), the performance improves slightly.
I am indexing the documents all at once, i.e., I open the writer, add the
documents, optimize at the end, and then close it.
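To make the batch vs. interactive trade-off a bit more concrete, here is a simplified model of a logarithmic merge policy. It is an assumption, not Lucene's actual IndexWriter code. Every bufferedDocs documents are flushed as a new small segment, and whenever mergeFactor segments sit on one level they are merged into a single segment on the next level. The model only counts how many documents get rewritten by merges and how many segments remain before optimize(). It ignores RAM, term vectors and disk caching. The names MergeModel, simulate and bufferedDocs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified model (an assumption, not Lucene's IndexWriter code) of a
 * logarithmic merge policy: every bufferedDocs documents are flushed as a
 * new level-0 segment, and whenever mergeFactor segments sit on one level
 * they are merged into a single segment on the next level.
 */
public class MergeModel {

    /** Outcome of one simulated indexing run. */
    static final class Result {
        final long docsCopiedByMerges; // documents rewritten by all merges
        final int segmentsAtEnd;       // segments left before optimize()

        Result(long docsCopiedByMerges, int segmentsAtEnd) {
            this.docsCopiedByMerges = docsCopiedByMerges;
            this.segmentsAtEnd = segmentsAtEnd;
        }
    }

    static Result simulate(int totalDocs, int bufferedDocs, int mergeFactor) {
        if (bufferedDocs < 1 || mergeFactor < 2) {
            throw new IllegalArgumentException("need bufferedDocs >= 1 and mergeFactor >= 2");
        }
        // levels.get(i) holds the document counts of the segments on level i
        List<List<Long>> levels = new ArrayList<List<Long>>();
        levels.add(new ArrayList<Long>());
        long copied = 0;
        int remaining = totalDocs;
        while (remaining > 0) {
            int flushed = Math.min(bufferedDocs, remaining);
            remaining -= flushed;
            levels.get(0).add((long) flushed); // in-memory buffer becomes a new segment
            // Cascade upward: a merge on level i may fill up level i + 1.
            for (int level = 0; level < levels.size(); level++) {
                List<Long> segs = levels.get(level);
                if (segs.size() < mergeFactor) {
                    break; // this level is not full, so no level above it changed
                }
                long merged = 0;
                for (long size : segs) {
                    merged += size;
                }
                segs.clear();
                copied += merged; // every document in the merged segments is rewritten
                if (level + 1 == levels.size()) {
                    levels.add(new ArrayList<Long>());
                }
                levels.get(level + 1).add(merged);
            }
        }
        int segments = 0;
        for (List<Long> segs : levels) {
            segments += segs.size();
        }
        return new Result(copied, segments);
    }

    public static void main(String[] args) {
        for (int mergeFactor : new int[] {5, 10, 50}) {
            Result r = simulate(100_000, 10, mergeFactor);
            System.out.printf("mergeFactor=%d: docs rewritten by merges=%d, segments before optimize=%d%n",
                    mergeFactor, r.docsCopiedByMerges, r.segmentsAtEnd);
        }
    }
}
```

With 100,000 documents flushed 10 at a time, this model rewrites 493,750 documents with mergeFactor=5 and 200,000 with mergeFactor=50. In the model, a lower merge factor therefore means more merge I/O, not less. A speed-up at mergeFactor=5 would have to come from something the model leaves out, such as memory pressure.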


On 2/12/07, Jokin Cuadrado <[EMAIL PROTECTED]> wrote:

The number of documents doesn't matter: the merge factor is the maximum
number of documents that will be kept in memory, so with both 1000 documents
and 200 documents there will be at most 50 documents (with their term
vectors, etc.) in memory. As I said, you lose performance if you start
hitting virtual memory.

Have you traced the memory usage, the page faults, and so on?
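One crude way to trace memory from inside the indexing program is to sample the JVM heap every N documents. The sketch below uses only java.lang.Runtime. It sees the Java heap, but not page faults or the process's resident size, which need OS tools such as top or vmstat. The class name HeapProbe, the loop and the counts are hypothetical placeholders.

```java
/**
 * Minimal sketch: sample JVM heap usage from inside an indexing loop.
 * Covers only the Java heap; page faults and process RSS need OS tools.
 */
public class HeapProbe {

    /** Bytes currently used on the Java heap (allocated minus free). */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Upper bound the heap may grow to (controlled by -Xmx). */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        int totalDocs = 1000;   // hypothetical document count
        int reportEvery = 250;  // hypothetical sampling interval
        for (int i = 1; i <= totalDocs; i++) {
            // In a real indexing loop, writer.addDocument(doc) would go here.
            if (i % reportEvery == 0) {
                System.out.printf("after %d docs: heap used %d MB of max %d MB%n",
                        i, usedHeapBytes() >> 20, maxHeapBytes() >> 20);
            }
        }
    }
}
```

If the used heap keeps climbing toward the maximum as documents are added, the JVM (and eventually the OS) is under memory pressure. That is the situation where the swapping described above would hurt.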

Another thing that could help performance is the use of stop words. Have
you taken a look at the resulting index with Luke, to see whether the top
terms include common words such as "and", "the", and "is"? These words are
very common, and if you get rid of them, performance will also improve.
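As a rough illustration of what dropping stop words does to the term stream, here is a plain-Java sketch with no Lucene classes. The tiny stop list and the names StopWordSketch and indexableTerms are hypothetical. In Lucene itself this filtering is normally done in the analyzer, e.g. with StopFilter or StopAnalyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Minimal sketch of stop-word removal before terms reach the index.
 * The stop list is a tiny hypothetical example, not a real analyzer's list.
 */
public class StopWordSketch {

    static final Set<String> STOP_WORDS = Collections.unmodifiableSet(
            new HashSet<String>(Arrays.asList("a", "an", "and", "is", "of", "the", "to")));

    /** Lower-cases, splits on runs of non-letter/non-digit chars, drops stop words. */
    static List<String> indexableTerms(String text) {
        List<String> terms = new ArrayList<String>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String text = "The cat is on the mat, and the dog is in the house.";
        System.out.println(indexableTerms(text)); // prints [cat, on, mat, dog, in, house]
    }
}
```

Of the 13 tokens in the sample sentence, only 6 survive. Fewer postings have to be written and merged for every document.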


Hope it helps.

--
Jokin.





--
Sairaj Sunil
II Mtech(CS)
