[ https://issues.apache.org/jira/browse/LUCENE-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520334 ]
Yonik Seeley commented on LUCENE-845: ------------------------------------- You may avoid the cost of a bunch of small merges, but then you pay the price in searching performance. I'm not sure that's the right tradeoff because if someone wanted to optimize for indexing performance, they would do more in batches. How does this work when flushing by MB? If you set setRamBufferSizeMB(32), are you guaranteed that you never have more than 10 segments less than 32MB (ignoring LEVEL_LOG_SPAN for now) if mergeFactor is 10? Almost seems like we need a minSegmentSize parameter too - using setRamBufferSizeMB confuses two different but related issues. > If you "flush by RAM usage" then IndexWriter may over-merge > ----------------------------------------------------------- > > Key: LUCENE-845 > URL: https://issues.apache.org/jira/browse/LUCENE-845 > Project: Lucene - Java > Issue Type: Bug > Components: Index > Affects Versions: 2.1 > Reporter: Michael McCandless > Assignee: Michael McCandless > Priority: Minor > Attachments: LUCENE-845.patch > > > I think a good way to maximize performance of Lucene's indexing for a > given amount of RAM is to flush (writer.flush()) the added documents > whenever the RAM usage (writer.ramSizeInBytes()) has crossed the max > RAM you can afford. > But, this can confuse the merge policy and cause over-merging, unless > you set maxBufferedDocs properly. > This is because the merge policy looks at the current maxBufferedDocs > to figure out which segments are level 0 (first flushed) or level 1 > (merged from <mergeFactor> level 0 segments). > I'm not sure how to fix this. Maybe we can look at net size (bytes) > of a segment and "infer" level from this? Still we would have to be > resilient to the application suddenly increasing the RAM allowed. > The good news is to workaround this bug I think you just need to > ensure that your maxBufferedDocs is less than mergeFactor * > typical-number-of-docs-flushed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]