"eks dev" <[EMAIL PROTECTED]> wrote:
> > Was 24M (and not more) clearly the fastest performance?
>
> No, this is kind of the optimum. Throwing more memory at it, up to 32M,
> makes things slightly faster, but only by a little, with the maximum at
> 32M. After that things start getting slower (slowly).
Interesting. This matches the experience Doron had, where adding more RAM
actually slowed things down a bit (posted to LUCENE-843).

> We are not yet completely done with tuning, especially with the two tips
> you mentioned in this mail.
> Fields are already reused, but Super.
> 1. Reusing Document: there is one new Vector() in there (and at these
> speeds, something like this makes a difference!!!)
>    in Document: List fields = new Vector();
> (By the way, must this be a synchronized Vector? Why not ArrayList?
> Would it make any difference?)

Oh yeah, it would be good to not "new Vector()" every time.

What I did in the benchmarking for LUCENE-843 was make a single Document,
make my N fields (using my own class that implements Fieldable but lets me
change the value), add these fields to the Document, and then hold onto
the fields as local variables (textField, titleField, idField, etc.).
Then for each doc I just set the field values (textField.setValue(...),
etc.) and then call writer.addDocument(doc).

> 2. Reusing Field: excuse my ignorance, but how can I do it? With
> Document it is easy with
>    luceneDocument.add(field)
>    luceneDocument.removeFields(name)
> //Wouldn't it be better to have luceneDocument.removeAllFields()?

Yeah, it's not so easy now: Field.java does not have setters. You have to
make your own class that implements Fieldable (or subclasses
AbstractField) and adds your own setters. Field.java is also [currently]
final, so you can't subclass it.

In the benchmarking code (see the patch in
http://issues.apache.org/jira/browse/LUCENE-947) I created a
ReusableStringField that lets you setStringValue(...). You could use that
as your Field class.

Alternatively, you can make a "ReusableStringReader" (there's one in
DocumentsWriter in the trunk now) and then use the normal Field class but
pass in your instance of ReusableStringReader. This approach could be
faster if you implemented it to use a char[] instead of a String (the
current one in DocumentsWriter reads a String).

> 3.
> "LUCENE-845" Whoops, I totally overlooked this one! And I am sure my
> maxBufferedDocs is well under what fits in 24MB?!? Any good tip on how
> to determine a good number? Count added docs and see how far this number
> goes before flush() triggers (how do I detect when a flush by RAM gets
> triggered?), and then add 10% to this number...

Whoa, OK.

First you need to figure out how many docs are "typically" getting
flushed at 24 MB. The easiest way would be to call
writer.setInfoStream(System.out) and look for the lines that say "flush
postings as segment XXX numDocs=YYY". Likely your YYY is "fairly" close
every time, since your docs are so predictable in size.

Then set your maxBufferedDocs anywhere above YYY and below 10 * YYY, and
you shouldn't hit LUCENE-845 (actually 5.5 * YYY is best, since it is the
midpoint of that range and so gives you the maximum safety margin).

Note that you should call setMaxBufferedDocs(...) first and then call
setRAMBufferSizeMB(...), in that order. If you do it backwards, the writer
will instead flush at exactly that number of buffered docs.

Mike
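A minimal sketch of the char[]-based variant of the ReusableStringReader idea described above. The class name ReusableCharArrayReader and its init(char[], int) method are hypothetical: this is not the class in DocumentsWriter and not a Lucene API, just a plain java.io.Reader that can be re-pointed at a new buffer for each document, so that a single instance can be passed to the normal Field class and reused across addDocument(...) calls.

```java
import java.io.Reader;

// Hypothetical sketch, not part of Lucene: a Reader that is re-pointed at a
// caller-owned char[] for each document instead of allocating a new
// StringReader (and a new String) per document.
class ReusableCharArrayReader extends Reader {
    private char[] buf = new char[0];
    private int len; // number of valid chars in buf for the current value
    private int pos; // index of the next char to hand out

    // Re-point this reader at the first len chars of buf. No copy is made,
    // so the caller must not overwrite buf while it is still being read.
    public void init(char[] buf, int len) {
        if (len < 0 || len > buf.length) {
            throw new IllegalArgumentException("bad len: " + len);
        }
        this.buf = buf;
        this.len = len;
        this.pos = 0;
    }

    @Override
    public int read(char[] cbuf, int off, int n) {
        if (off < 0 || n < 0 || n > cbuf.length - off) {
            throw new IndexOutOfBoundsException();
        }
        if (n == 0) {
            return 0;
        }
        if (pos >= len) {
            return -1; // end of the current value
        }
        int count = Math.min(n, len - pos);
        System.arraycopy(buf, pos, cbuf, off, count);
        pos += count;
        return count;
    }

    @Override
    public int read() {
        return pos < len ? buf[pos++] : -1;
    }

    @Override
    public void close() {
        // Nothing to release; calling init(...) again makes the reader usable.
    }
}
```

Not copying the caller's buffer is what saves the allocation here; the trade-off is that the buffer presumably has to stay untouched until addDocument(...) has consumed the reader.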
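The maxBufferedDocs sizing recipe above can be sketched as a small helper that scans captured infoStream output for "numDocs=YYY" and applies the 5.5 * YYY rule. The class FlushStats and its methods are hypothetical, and the regular expression assumes the line wording quoted in this thread ("flush postings as segment XXX numDocs=YYY"), which may differ between Lucene versions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: extracts per-flush doc counts from
// infoStream text and applies the "between YYY and 10 * YYY" rule of thumb.
class FlushStats {
    // Assumes lines of the form "flush postings as segment XXX numDocs=YYY".
    private static final Pattern FLUSH_LINE =
        Pattern.compile("flush postings as segment \\S+ numDocs=(\\d+)");

    // Returns the numDocs value from every matching line, in log order.
    static List<Integer> flushedDocCounts(List<String> logLines) {
        List<Integer> counts = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = FLUSH_LINE.matcher(line);
            if (m.find()) {
                counts.add(Integer.parseInt(m.group(1)));
            }
        }
        return counts;
    }

    // True if maxBufferedDocs lies strictly between YYY and 10 * YYY.
    static boolean inSafeRange(int maxBufferedDocs, int yyy) {
        return maxBufferedDocs > yyy && (long) maxBufferedDocs < 10L * yyy;
    }

    // Midpoint of that range, i.e. 5.5 * YYY rounded down; long math avoids overflow.
    static int suggestedMaxBufferedDocs(int yyy) {
        return (int) (11L * yyy / 2);
    }
}
```

Which observed count to treat as YYY is left to the caller; with docs as predictable in size as the ones described here, the counts should be close together anyway.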