"Chris Hostetter" <[EMAIL PROTECTED]> wrote:
> I haven't really delved into the MergePolicy work that's been done, but a
> recent Jira comment got me poking around the javadocs -- MergePolicy is
> a public interface, which suggests clients are allowed to implement it,
> leading me to wonder about two things...
>
> 1) Writing a MergePolicy requires knowing about the package-protected
> SegmentInfos class ... how do we expect people to make that work? (I know
> we've said in the past that people shouldn't have to implement classes in
> the o.a.l namespace just to make things work for them.)
Good point. Currently your class (implementing MergePolicy) must be part of the o.a.l.index package so that it can see the package-protected SegmentInfos/SegmentInfo classes. I had thought that was OK. Is it really so bad to require users to put their class into the o.a.l.index package, when what they are doing is a very advanced thing? The only other option I can see is to make SegmentInfos/SegmentInfo public.

Maybe we should add API warning caveats in the javadocs ("this API is advanced & new & may change"), like we have now for Payloads, and leave the package protection in place for now to limit usage to brave early adopters (even if we intend to make things public later)?

> 2) Should we instead make this an abstract base class to help "future
> proof" ourselves against wanting to add support for more "optional"
> methods we might want to allow MergePolicies to specify?
>
> (This is the age-old interface vs. base class discussion ... providing a
> base class allows us to add support for new methods later by providing
> defaults; interfaces can never be changed except in major releases (i.e.,
> X.0).)
>
> For example: suppose down the road we want to support an option like Yonik
> describes here...
>
> https://issues.apache.org/jira/browse/LUCENE-1043?#action_12539675
>
> > More controversial: maybe even expand the number of docs that can be
> > bulk copied by not bothering to remove deleted docs if it's some very
> > small number (unless it's an optimize). This is probably not worth it.
>
> ...this is the kind of thing a MergePolicy could specify with some new
> method...
>
>   public float getMaxAllowedPercentageOfDeletedDocsIgnored() {
>     return 0.0f;
>   }
>
> ...that individual MergePolicies could override.

Switching to an abstract base class is a good idea. I think it's important to reserve the freedom to add default methods in between major releases. I'll work out a patch.
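To make the base-class argument concrete, here is a minimal sketch. All names in it (SketchMergePolicy, MergeEverythingPolicy, TolerantPolicy, findMerges) are hypothetical and are not Lucene's actual MergePolicy API; only getMaxAllowedPercentageOfDeletedDocsIgnored comes from the example in the thread. It shows a policy written before the optional method existed still compiling and inheriting the default, which a plain interface could not offer before Java 8 introduced default interface methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is NOT Lucene's real MergePolicy API.
abstract class SketchMergePolicy {
    /** Returns indices of the segments to merge, or an empty list for "no merge". */
    abstract List<Integer> findMerges(List<Integer> segmentDocCounts);

    /**
     * Optional setting imagined as added in a later release. Subclasses written
     * before it existed inherit this default, so they keep compiling unchanged.
     */
    float getMaxAllowedPercentageOfDeletedDocsIgnored() {
        return 0.0f;
    }
}

/** Written against the "old" API: it never mentions the newer method. */
class MergeEverythingPolicy extends SketchMergePolicy {
    private final int maxSegments;

    MergeEverythingPolicy(int maxSegments) {
        this.maxSegments = maxSegments;
    }

    @Override
    List<Integer> findMerges(List<Integer> segmentDocCounts) {
        List<Integer> toMerge = new ArrayList<>();
        // Merge every segment once there are more than maxSegments of them.
        if (segmentDocCounts.size() > maxSegments) {
            for (int i = 0; i < segmentDocCounts.size(); i++) {
                toMerge.add(i);
            }
        }
        return toMerge;
    }
}

/** A newer policy that opts into the added setting by overriding it. */
class TolerantPolicy extends MergeEverythingPolicy {
    TolerantPolicy(int maxSegments) {
        super(maxSegments);
    }

    @Override
    float getMaxAllowedPercentageOfDeletedDocsIgnored() {
        return 5.0f; // tolerate up to 5% deleted docs so segments can be bulk copied
    }
}
```

The design point is that adding a concrete method to an abstract class is source-compatible for existing subclasses, whereas adding an abstract method to an interface (pre-Java 8) breaks every existing implementor.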
> Perhaps the broader question is: do we really want/expect people to write
> their own MergePolicies, or is the interface just there to provide an
> abstraction for picking one of the provided impls? ... In that case, it
> seems like we should lock down the API a bit more (we can always open it
> up later).

I *think* people will want to implement their own merge policies, though it is of course hard to tell at this point :). Example use cases:

  - customize optimize to NOT merge the very large segments;
  - favor merging segments that have many pending deletes;
  - postpone heavy merging until overnight, when search traffic is low;
  - make a merge policy that's free to merge non-adjacent segments (though we can't do that one until we fix IndexWriter to accept such a MergeSpecification).

Mike
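As an illustration of the second use case (favoring segments with many pending deletes), here is a minimal sketch. It deliberately avoids Lucene's real classes: SegmentStats, DeletesFavoringPolicy, findMerge, windowSize and minDeletedFraction are all hypothetical. It only considers runs of adjacent segments, since IndexWriter cannot yet accept a MergeSpecification of non-adjacent segments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for per-segment metadata; Lucene's SegmentInfo differs.
final class SegmentStats {
    final String name;
    final int docCount;     // total docs in the segment, including deleted ones
    final int deletedCount; // docs marked deleted but not yet merged away

    SegmentStats(String name, int docCount, int deletedCount) {
        this.name = name;
        this.docCount = docCount;
        this.deletedCount = deletedCount;
    }
}

// Hypothetical policy: among all runs of `windowSize` ADJACENT segments, pick the
// run holding the most deleted docs, provided its deleted fraction reaches a threshold.
final class DeletesFavoringPolicy {
    private final int windowSize;
    private final double minDeletedFraction;

    DeletesFavoringPolicy(int windowSize, double minDeletedFraction) {
        this.windowSize = windowSize;
        this.minDeletedFraction = minDeletedFraction;
    }

    /** Returns the segments to merge, or an empty list if no run qualifies. */
    List<SegmentStats> findMerge(List<SegmentStats> segments) {
        int bestStart = -1;
        long bestDeleted = -1;
        for (int start = 0; start + windowSize <= segments.size(); start++) {
            long deleted = 0;
            long docs = 0;
            for (int i = start; i < start + windowSize; i++) {
                deleted += segments.get(i).deletedCount;
                docs += segments.get(i).docCount;
            }
            boolean qualifies = docs > 0 && (double) deleted / docs >= minDeletedFraction;
            if (qualifies && deleted > bestDeleted) {
                bestDeleted = deleted;
                bestStart = start;
            }
        }
        if (bestStart < 0) {
            return new ArrayList<>();
        }
        return new ArrayList<>(segments.subList(bestStart, bestStart + windowSize));
    }
}
```

For example, with four segments of 100 docs each holding 0, 50, 40 and 0 deleted docs, a window of 2 and a threshold of 0.2, this sketch picks the second and third segments (90 deleted docs out of 200).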