"Steven Parkes" <[EMAIL PROTECTED]> wrote: > I've been wondering about taking minMergeDocs out of LMP > (LogarithmicMergePolicy): if IW is doing maxBufferedDocs, can we get by > with > ceil(log(docs)) > rather than > ceil(log(ceil(docs/minMergeDocs)) > (That's not exactly right, but it's close). The simplicity appeals to > me, but ...
I think we could do that? Though if we change the default to be "by
#bytes used by each segment" (for the new default "by size" merge
policy) then we can disregard #docs in a segment during merging
entirely? (And then leave the "by #docs" legacy merge policy as is?)

> If we remove these from the MergePolicy interface then maybe we
> don't need MergePolicyBase? (Just to make things simpler).
>
> Just a DRY class. I have no strong feeling about this. In fact, I went
> back and forth on it. It's served as a placeholder while I experimented.

Got it. I was thinking once we removed these params from the base
then there was even less "repeating" to worry about.

> * I was a little spooked by this change to TestAddIndexesNoOptimize:
>
>     -    assertEquals(2, writer.getSegmentCount());
>     +    assertEquals(3, writer.getSegmentCount());
>
>   I think with just the refactoring, there should not need to be any
>   changes to unit tests right?
>
> I don't know if this got into what I wrote either in e-mail or in the
> start of the comments. I guess I've done two steps in one here: the
> factoring isn't just renaming methods and classes. I did create a
> MergePolicy interface that is a slight simplification of how the
> merge policy is currently implemented.

Ahhh, sorry, I missed that this was not a pure refactoring. I think
you did mention this.

OK now that I understand the issue better, I agree, let's keep the
merge policy interface simple. I think the merge policy should not
need to know the "history" of how the segments came to be in this
index (addIndexes, flush, etc); instead, it should look at them now
and decide 1) whether to merge, and 2) which specific segments to
merge.

> * It's interesting that you've pulled "useCompoundFile" into the
>   LegacyMergePolicy. I'm torn on whether it belongs in MergePolicy
>   at all, since this is really a file format issue?
>
> Well, the idea here was that you might want to use non-compound files
> for big segments (since you have few of them) and compound for smaller
> segments. It basically reflects the idea that to some extent, the merge
> policy is factoring the number of file descriptors required into its
> decision.

Ahh that's a good idea! I guess we could look at compound file as a
form of merging: you've merged many files into a single file in order
to save on file-descriptors.

OK I think that (moving the decision of CFS or not, both for a given
segment and for a newly flushed segment, into the merge policy) makes
sense.

Mike
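
To make the two points above concrete (a policy that only looks at the
segments as they are now, and a per-segment CFS decision), here is a
rough Java sketch. Everything in it is hypothetical: SegmentSketch,
MergePolicySketch, BySizeMergePolicySketch, the "similar size" rule and
the cfsThresholdBytes parameter are invented for illustration and are
not the interface from the patch or any Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a segment: just a name and its size on disk.
final class SegmentSketch {
    final String name;
    final long sizeInBytes;

    SegmentSketch(String name, long sizeInBytes) {
        if (sizeInBytes < 0) {
            throw new IllegalArgumentException("sizeInBytes must be >= 0");
        }
        this.name = name;
        this.sizeInBytes = sizeInBytes;
    }
}

// Hypothetical policy interface: it sees only the current segments,
// not how they were created (flush, addIndexes, ...).
interface MergePolicySketch {
    // Segments to merge now; an empty list means "do not merge".
    List<SegmentSketch> selectMerge(List<SegmentSketch> segments);

    // Whether this segment should be written in compound-file format.
    boolean useCompoundFile(SegmentSketch segment);
}

// Illustrative "by size" policy using an invented similarity heuristic.
final class BySizeMergePolicySketch implements MergePolicySketch {
    private final int mergeFactor;
    private final long cfsThresholdBytes;

    BySizeMergePolicySketch(int mergeFactor, long cfsThresholdBytes) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        this.mergeFactor = mergeFactor;
        this.cfsThresholdBytes = cfsThresholdBytes;
    }

    // Picks the first run of mergeFactor adjacent segments whose largest
    // member is at most mergeFactor times its smallest member.
    @Override
    public List<SegmentSketch> selectMerge(List<SegmentSketch> segments) {
        for (int start = 0; start + mergeFactor <= segments.size(); start++) {
            List<SegmentSketch> window = segments.subList(start, start + mergeFactor);
            long min = Long.MAX_VALUE;
            long max = 0;
            for (SegmentSketch s : window) {
                min = Math.min(min, s.sizeInBytes);
                max = Math.max(max, s.sizeInBytes);
            }
            // The first test guards min * mergeFactor against long overflow.
            boolean similar = min >= Long.MAX_VALUE / mergeFactor || max <= min * mergeFactor;
            if (similar) {
                return new ArrayList<>(window);
            }
        }
        return new ArrayList<>();
    }

    // Compound files for small segments (there are many of them),
    // separate files for big segments (there are few of them).
    @Override
    public boolean useCompoundFile(SegmentSketch segment) {
        return segment.sizeInBytes < cfsThresholdBytes;
    }
}
```

Given segments of 1 GB, 10, 12 and 9 bytes with mergeFactor = 3, this
sketch skips the window containing the big segment and selects the three
small ones; with a 1 MB threshold the small segments would be compound
and the 1 GB one would not.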