Without seeing more information/code, I can't tell which part of your
system slows down with time, but I can tell you that Lucene's 'add'
does not slow over time (i.e. as the index gets larger). Therefore, I
would look elsewhere for causes of the slowdown.
Otis, can you point me to some proof that the time of the "insert" operation does not depend on the index size, please? The amortized time of "insert" is O(log(docsIndexed/mergeFactor)), I think, so I do not know how it could be O(1).
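For intuition, here is a toy simulation I put together (my own model of the merge policy, not Lucene's actual code; mergeFactor = 10 is Lucene's default). It counts how many times each document gets rewritten during merges, and the per-insert figure grows roughly as log(docsIndexed):

    // Toy model: a level-k segment holds mergeFactor^k docs; once mergeFactor
    // segments pile up at a level, they merge into one level-(k+1) segment,
    // rewriting every document they contain.
    int mergeFactor = 10;
    long totalDocs = 1000000L;
    long rewrites = 0;
    long[] segments = new long[64];      // segments[k] = number of level-k segments
    for (long d = 1; d <= totalDocs; d++) {
        segments[0]++;                   // each new doc starts as a level-0 segment
        int level = 0;
        long docsPerSegment = 1;
        while (segments[level] == mergeFactor) {
            rewrites += mergeFactor * docsPerSegment;  // merging copies all docs
            segments[level] = 0;
            segments[++level]++;
            docsPerSegment *= mergeFactor;
        }
    }
    // prints 6.0 for one million docs, i.e. log10(1000000)
    System.out.println("rewrites per insert: " + ((double) rewrites / totalDocs));

So each document is copied once per merge level it climbs through, which is where the log factor comes from.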
Thank you. Leo
AFAIK the issue with PDF files may lie in the PDF parser (I have already encountered this with PDFBox).
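A quick way to test that hypothesis (just a sketch, reusing the names from the post quoted below): run the parse step on its own, without touching the index, and see whether it still slows down after ~800 files:

    // Parse only, no indexing: if this loop also degrades around file 800,
    // the slowdown is in the PDF parsing, not in Lucene's add.
    for (int i = 0; i < fileNames.length; i++) {
        IndexFile.index(baseDirectory + documentRoot + fileNames[i]);
    }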
The easiest thing to do is add logging to suspicious portions of the code. That will narrow the scope of the code you need to analyze.
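For example, something like this (a sketch based on the loop quoted below, reusing its names) would show whether the time is going into the PDF parse or into Lucene's add:

    for (int i = 0; i < fileNames.length; i++) {
        long t0 = System.currentTimeMillis();
        // parse step (PDFBox)
        Document doc = IndexFile.index(baseDirectory + documentRoot + fileNames[i]);
        long t1 = System.currentTimeMillis();
        writer.addDocument(doc);         // Lucene add
        long t2 = System.currentTimeMillis();
        if (i % 100 == 0) {
            System.out.println(i + ": parse=" + (t1 - t0) + "ms, add=" + (t2 - t1) + "ms");
        }
    }

If the parse column grows while add stays flat, the parser (or file I/O) is the culprit.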
Otis
--- [EMAIL PROTECTED] wrote:
Hey Lucene-users,
I'm setting up a Lucene index on 5 GB of PDF files (full-text search). I've been really happy with Lucene so far, but I'm curious what tips and strategies I can use to optimize performance at this size.
So far I am using pretty much all of the defaults (I'm new to Lucene).
I am using PDFBox to add the documents to the index. I can usually add about 800 or so PDF files and then the add loop:
for (int i = 0; i < fileNames.length; i++) {
    Document doc = IndexFile.index(baseDirectory + documentRoot + fileNames[i]);
    writer.addDocument(doc);
}
really starts to slow down. Doesn't seem to be memory related. Thoughts anyone?
Thanks in advance, CK Hill
