Good question! It didn't actually change when we flush: inside balanceRAM() we only flush when numBytesUsed >= ramBufferSize. So previously the check was costing an additional (unnecessary) synchronized method call once we got to 95% of ramBufferSize.
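To make the difference concrete, here is a minimal sketch of the two checks. It is a toy model under stated assumptions, not the actual DocumentsWriter code: the class FlushThresholdSketch, the addDocument() and simulate() helpers, and the "reset to zero on flush" behaviour are all hypothetical. Only the two conditions and the flush test inside balanceRAM() come from the discussion above.

```java
// Toy model of the old and new checks; NOT Lucene's real DocumentsWriter.
class FlushThresholdSketch {
    // Stand-in for IndexWriter.DISABLE_AUTO_FLUSH (value assumed for this sketch).
    static final long DISABLE_AUTO_FLUSH = -1;

    private final long ramBufferSize;
    private final boolean oldCheck;   // true = 95% check, false = patched check
    private long numBytesUsed;
    private int balanceCalls;         // how often the synchronized method ran
    private int flushes;              // how often it actually flushed

    FlushThresholdSketch(long ramBufferSize, boolean oldCheck) {
        this.ramBufferSize = ramBufferSize;
        this.oldCheck = oldCheck;
    }

    // Hypothetical helper: account for one document's bytes, then run the check.
    void addDocument(long bytes) {
        numBytesUsed += bytes;
        boolean overThreshold = oldCheck
            ? numBytesUsed > 0.95 * ramBufferSize   // check before the patch
            : numBytesUsed >= ramBufferSize;        // check after the patch
        if (ramBufferSize != DISABLE_AUTO_FLUSH && overThreshold) {
            balanceRAM();
        }
    }

    // Simplified: the flush condition is the same under both checks.
    private synchronized void balanceRAM() {
        balanceCalls++;
        if (numBytesUsed >= ramBufferSize) {
            flushes++;
            numBytesUsed = 0;   // assumption: a flush frees the whole buffer
        }
    }

    // Returns {balanceRAM calls, flushes} after adding `docs` documents.
    static int[] simulate(boolean oldCheck, int docs, long bytesPerDoc, long ramBufferSize) {
        FlushThresholdSketch writer = new FlushThresholdSketch(ramBufferSize, oldCheck);
        for (int i = 0; i < docs; i++) {
            writer.addDocument(bytesPerDoc);
        }
        return new int[] {writer.balanceCalls, writer.flushes};
    }

    public static void main(String[] args) {
        int[] before = simulate(true, 192, 16, 1024);
        int[] after = simulate(false, 192, 16, 1024);
        System.out.println("old check: calls=" + before[0] + " flushes=" + before[1]);
        System.out.println("new check: calls=" + after[0] + " flushes=" + after[1]);
    }
}
```

With 16-byte documents and a 1024-byte buffer, both variants flush the same number of times; the old check simply enters the synchronized method on several extra documents per cycle without flushing.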
Mike

Felipe Albrecht <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I have a simple question about this patch.
>
> The following patch segment shows that the threshold for
> synchronizing the data changed.
>
>    if (ramBufferSize != IndexWriter.DISABLE_AUTO_FLUSH
> -      && numBytesUsed > 0.95 * ramBufferSize)
> +      && numBytesUsed >= ramBufferSize)
>      balanceRAM();
>
> Why was it changed, and might it be influencing the timing results?
> In other words, it's saying "use more RAM before flushing", and doing
> larger flushes, and fewer of them, may be influencing the final time.
>
> I am a bit new to Lucene, only 2 weeks, but it caught my attention.
>
> Thank you,
>
> Felipe Albrecht
>
>
> On Feb 11, 2008 5:30 PM, Michael McCandless <[EMAIL PROTECTED]> wrote:
> >
> > Grant Ingersoll wrote:
> >
> > > Also, perhaps we should spin off another thread to discuss how to
> > > make DocsWriter easier to maintain. My biggest concern is
> > > understanding how the various threads work together, and a few
> > > other areas but, like I said, let's spin up a separate thread to
> > > brainstorm what is needed.
> >
> > I agree we should work on simplifying it with time, and spreading
> > the knowledge of how it works.
> >
> > > Note that there is some risk in just using Wikipedia for profiling
> > > given its distribution of terms, etc.
> >
> > Good point. Previously I was using Europarl, but that corpus is
> > just too fast to index.
> >
> > Are you thinking Wikipedia is somewhat "dirty" (lots of extra terms
> > not normally seen with clean content)? Since I'm using
> > StandardAnalyzer and not an analyzer based on the new
> > WikipediaTokenizer, I'm getting even more extra terms. Also, I think
> > we'd need an HTMLFilter in the chain since Wikipedia content uses HTML
> > markup. Grant, what analyzer chain do you use when you index Wikipedia?
> >
> > > I also wonder if using the LineDocMaker is all that realistic a
> > > profiling scenario. While it is really useful in that it minimizes
> > > IO interaction, etc., I can't help but feel that it isn't at all
> > > close to typical usage. Most users are not going to have all their
> > > docs rolled up into a single file, 1 doc per line, so I wonder if
> > > we potentially lose insight into how Lucene performs given that
> > > other issues like I/O/memory used for loading files may force the
> > > JVM/Lucene to not have the resources it needs. Of course, I do
> > > know it is good to try to isolate things so we can focus just on
> > > Lucene, but we also should try to make some accounting for how it
> > > lives in the wild.
> >
> > I agree, this part is not realistic, and the intention is to measure
> > just the indexing time. In fact I expect most apps spend quite a bit
> > more time building up a Document (filtering binary docs, etc.) than
> > actually indexing it. The only real-world app that I can think of
> > that would be close to LineDocMaker is using Lucene to search big log
> > files, where one line = one Document.
> >
> > > Last, I think it would be good to always attach/check in the .alg
> > > file that is used when running the test, so that others can verify
> > > on different systems/configurations, etc.
> >
> > I did post the alg (under LUCENE-1172). Though I see I forgot to
> > {code} it and it looks messed up now. My recent test to try a single
> > quickSort(Object[]) used the same alg, just repeated 10 times instead
> > of 3.
> >
> > But I agree we should always post the alg for all tests...
> >
> > Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]