[ https://issues.apache.org/jira/browse/LUCENE-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12526460 ]
Michael McCandless commented on LUCENE-845: ------------------------------------------- > Shound minMergeDocs default to maxBufferedDocs (that should yield > the old behavior)? Good idea -- I think we could do this dynamically so that whenever IndexWriter is flushing by doc count and the merge policy is LogDocMergePolicy we "write through" any changes to maxBufferedDocs --> LogDocMergePolicy.setMinMergeDocs? I'll take that approach to keep backwards compatibility. > Although 1000 isn't bad... much better to slow indexing a little in > the odd app than to break it by running it out of descriptors. Agreed, that's the right direction to "err" here. > If you "flush by RAM usage" then IndexWriter may over-merge > ----------------------------------------------------------- > > Key: LUCENE-845 > URL: https://issues.apache.org/jira/browse/LUCENE-845 > Project: Lucene - Java > Issue Type: Bug > Components: Index > Affects Versions: 2.1 > Reporter: Michael McCandless > Assignee: Michael McCandless > Priority: Minor > Fix For: 2.3 > > Attachments: LUCENE-845.patch > > > I think a good way to maximize performance of Lucene's indexing for a > given amount of RAM is to flush (writer.flush()) the added documents > whenever the RAM usage (writer.ramSizeInBytes()) has crossed the max > RAM you can afford. > But, this can confuse the merge policy and cause over-merging, unless > you set maxBufferedDocs properly. > This is because the merge policy looks at the current maxBufferedDocs > to figure out which segments are level 0 (first flushed) or level 1 > (merged from <mergeFactor> level 0 segments). > I'm not sure how to fix this. Maybe we can look at net size (bytes) > of a segment and "infer" level from this? Still we would have to be > resilient to the application suddenly increasing the RAM allowed. > The good news is to workaround this bug I think you just need to > ensure that your maxBufferedDocs is less than mergeFactor * > typical-number-of-docs-flushed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]