Hi Mike,

> Was 24M (and not more) clearly the fastest performance?

No, 24M is just close to the optimum rather than a hard limit. Throwing more memory at it, up to 32M, makes things slightly faster, with the peak at 32M; beyond that, things slowly start getting slower again.
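For context, our writer setup is roughly the following (just a sketch typed from memory, not our real code -- the index path, the analyzer class, and the exact constructor overload are placeholders/assumptions):

    import org.apache.lucene.analysis.WhitespaceAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class IndexerSetup {
        public static void main(String[] args) throws Exception {
            Directory dir = FSDirectory.getDirectory("/path/to/index"); // placeholder path
            // autoCommit = false; only simple whitespace tokenization inside Lucene,
            // the heavy analysis runs in our own pre-processing phase
            IndexWriter writer = new IndexWriter(dir, false, new WhitespaceAnalyzer());
            // flush by RAM usage (LUCENE-843 trunk API); 24 MB was our sweet spot
            writer.setRAMBufferSizeMB(24.0);
            // ... add documents here ...
            writer.close();
        }
    }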
We are not yet completely done with tuning, especially with the two tips you mention in this mail. Fields are already reused, but:

1. Reusing Document: there is one new Vector() in there (and at these speeds, something like this makes a difference!), in Document:

       List fields = new Vector();

   By the way, does this have to be a synchronized Vector? Why not an ArrayList -- would that make any difference?

2. Reusing Field: excuse my ignorance, but how can I do it? With Document it is easy:

       luceneDocument.add(field)
       luceneDocument.removeFields(name)   // wouldn't it be better to have luceneDocument.removeAllFields()?

3. LUCENE-845: whoops, I totally overlooked this one! And am I sure my maxBufferedDocs is well under what fits into 24MB?! Any good tip on how to determine a good number? My idea: count added docs and see how high this number gets before a flush triggers (how do I detect when a flush by RAM gets triggered?), and then add 10% to that number... (rough sketch at the bottom of this mail)

----- Original Message ----
From: Michael McCandless <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Thursday, 12 July, 2007 9:22:48 PM
Subject: Re: Post mortem kudos for (LUCENE-843) :)

Thank you for the compliments, and thank you for being such early adopter
testers!  I'm very glad you didn't hit any issues :)

> before LUCENE-843 indexing speed was 5-6k records per second (and I
> believed this was already as fast as it gets)
> after (trunk version yesterday) 60-65k documents per second! All (exhaustive!) tests pass on this index.

Wow, 10X speedup is even faster than my fastest results!

> autocommit = false, 24M RAMBuffer, using char[] instead of String
> for Token (this was the reason we separated Analysis in two phases,
> leaving for Lucene Analyzer only simple whitespace tokenization)

Looks like you're doing everything right to get fastest performance.
You can also re-use the Document & Field instances, and also the Token
instance in your analyzer, and that should help more.

Was 24M (and not more) clearly the fastest performance?

Also note that you must work around LUCENE-845 (still open):

    http://issues.apache.org/jira/browse/LUCENE-845

You should set your maxBufferedDocs to something "close to but always
above" how many docs actually get flushed after 24 MB RAM is full, else
you could spend way too much time merging.  I'm working on LUCENE-845
now but not yet sure when it will be resolved...

Mike
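P.S. Here is the rough sketch I meant in point 3 above (it also reuses a single Document, as in point 1, though it still creates a new Field each time -- which is exactly my question in point 2). It is only a guess at an approach: I am assuming that a drop in IndexWriter.ramSizeInBytes() between two addDocument() calls means a RAM-triggered flush just happened, and the field name "body" and the input strings are made up for the example.

    import java.io.IOException;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class FlushCounter {

        // Counts how many documents go into the RAM buffer before a flush,
        // so maxBufferedDocs can be set "close to but always above" that
        // number (the LUCENE-845 workaround). "body" is a made-up field name.
        static int docsPerFlush(IndexWriter writer, String[] bodies) throws IOException {
            Document doc = new Document();   // reused across iterations (point 1)
            int docsSinceFlush = 0;
            int docsAtLastFlush = 0;
            long lastRamSize = 0;

            for (String body : bodies) {
                doc.removeFields("body");    // drop the previous value...
                doc.add(new Field("body", body, Field.Store.NO, Field.Index.TOKENIZED));
                writer.addDocument(doc);
                docsSinceFlush++;

                long ramSize = writer.ramSizeInBytes();
                if (ramSize < lastRamSize) { // buffer shrank => a flush by RAM was triggered
                    docsAtLastFlush = docsSinceFlush;
                    docsSinceFlush = 0;
                }
                lastRamSize = ramSize;
            }
            return docsAtLastFlush;
        }
    }

Then maxBufferedDocs would become something like (int) (docsPerFlush(...) * 1.1). Does that sound about right, or is there a less hacky way to see when the flush happens?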