Committed the discussed fix as HBASE-28622 . Thank you Andrew for discussing this, and Duo for the review.
Istvan On Fri, May 31, 2024 at 2:36 PM Istvan Toth <st...@cloudera.com> wrote: > It turns out that ColumnPaginationFilter is both row stateful and can > return a seek hint. > I have removed the HintingFilter marker from it to preserve the correct > operation. > > With this change, ColumnPaginationFilter is no worse off than it was, but > the rest of the hinting > filters will work correctly. > > On Fri, May 31, 2024 at 9:32 AM Istvan Toth <st...@cloudera.com> wrote: > >> This is indeed quite a small change. >> The PR is up at https://github.com/apache/hbase/pull/5955 >> >> On Wed, May 29, 2024 at 10:07 AM Istvan Toth <st...@cloudera.com> wrote: >> >>> Thanks for the detailed reply, Andrew. >>> >>> I was also considering default methods, but it turns out that Filter is >>> not an interface, but an abstract class, so it doesn't apply. >>> >>> Children not implementing a marker interface or marker method would >>> inherit the marker method implementation from the closest parent the same >>> way they would inherit the marker interface, so I think they are equivalent >>> in this aspect, too. >>> >>> I think that marker interface(s) and overridable non-abstract getter(s) >>> in Filter are mostly equivalent from both logical and source compatibility >>> aspects. >>> The only difference is that the marker interfaces cannot be removed in a >>> subclass, while the getter can be overridden anywhere, but with well-chosen >>> defaults it shouldn't be much of a limitation. >>> >>> Now that I think about it, we could cache the markers' values in an >>> array when creating the filter lists, so even the cost of looking them up >>> doesn't matter as it wouldn't happen in the hot code path. >>> >>> Using the marker interfaces is more elegant, and discourages problematic >>> subclassing, so I am leaning towards that. >>> >>> Istvan >>> >>> On Wed, May 29, 2024 at 2:30 AM Andrew Purtell <apurt...@apache.org> >>> wrote: >>> >>>> Actually source compatibility with default methods would be fine too. I >>>> forget this is the main reason default methods were invented. The code >>>> of >>>> derived classes would not need to be changed, unless the returned value >>>> of >>>> the new method should be changed, and this is no worse than having a >>>> marker >>>> interface, which would also require code changes to implement >>>> non-default >>>> behaviors. >>>> >>>> A marker interface does remain as an option. It might make a difference >>>> in >>>> chained use cases. Consider a chain of filter instances that mixes >>>> derived >>>> code that is unaware of isHinting() and base code that is. The filter >>>> chain >>>> can be examined for the presence or absence of the marker interface and >>>> would not need to rely on every filter in the chain passing return >>>> values >>>> of isHinting back. >>>> >>>> Marker interfaces can also be added to denote stateful or stateless >>>> filters, if distinguishing between them would be useful, perhaps down >>>> the >>>> road. >>>> >>>> On Tue, May 28, 2024 at 5:13 PM Andrew Purtell <apurt...@apache.org> >>>> wrote: >>>> >>>> > I think you've clearly put a lot of time into the analysis and it is >>>> > plausible. >>>> > >>>> > Adding isHinting as a default method will preserve binary >>>> compatibility. >>>> > Source compatibility for derived custom filters would be broken >>>> though and >>>> > that probably prevents this going back into a releasing code line. >>>> > >>>> > Have you considered adding a marker interface instead? That would >>>> preserve >>>> > both source and binary compatibility. It wouldn't require any changes >>>> to >>>> > derived custom filters. A runtime instanceof test would determine if >>>> the >>>> > filter is a hinting filter or not. No need for a new method, default >>>> or >>>> > otherwise. >>>> > >>>> > On Tue, May 28, 2024 at 12:41 AM Istvan Toth <st...@apache.org> >>>> wrote: >>>> > >>>> >> I have recently opened HBASE-28622 >>>> >> <https://issues.apache.org/jira/browse/HBASE-28622> , which has >>>> turned >>>> >> out >>>> >> to be another aspect of the problem discussed in HBASE-20565 >>>> >> <https://issues.apache.org/jira/browse/HBASE-20565> . >>>> >> >>>> >> The problem is discussed in detail in HBASE-20565 >>>> >> <https://issues.apache.org/jira/browse/HBASE-20565> , but it boils >>>> down >>>> >> to >>>> >> the API design decision that the filters returning >>>> SEEK_NEXT_USING_HINT >>>> >> rely on filterCell() getting called. >>>> >> >>>> >> On the other hand, some filters maintain an internal row state that >>>> sets >>>> >> counters for calls of filterCell(), which interacts with the results >>>> of >>>> >> previous filters in a filterList. >>>> >> >>>> >> When filters return different results for filterRowkey(), then >>>> filters >>>> >> returning SEEK_NEXT_USING_HINT that have returned false must have >>>> >> filterCell() called, otherwise the scan will degenerate into a full >>>> scan. >>>> >> >>>> >> On the other hand, filters that maintain an internal row state must >>>> only >>>> >> be >>>> >> called if all previous filters have INCLUDEed the Cell, otherwise >>>> their >>>> >> internal state will be off. (This still has caveats, as described in >>>> >> HBASE-20565 <https://issues.apache.org/jira/browse/HBASE-20565>) >>>> >> >>>> >> In my opinion, the current code from HBASE-20565 >>>> >> <https://issues.apache.org/jira/browse/HBASE-20565> strikes a bad >>>> balance >>>> >> between features, as while it fixes some use cases for row stateful >>>> >> filters, it also often negates the performance benefits of the >>>> filters >>>> >> providing hints, which in practice makes them unusable in many >>>> filter list >>>> >> combinations. >>>> >> >>>> >> Without completely re-designing the filter system, I think that the >>>> best >>>> >> solution would be adding a method to distinguish the filters that can >>>> >> return hints from the rest of them. (This was also suggested in >>>> >> HBASE-20565 >>>> >> <https://issues.apache.org/jira/browse/HBASE-20565> , but it was not >>>> >> implemented) >>>> >> >>>> >> In theory, we have four combinations of hinting and row stateful >>>> filters, >>>> >> but currently we have no filters that are both hinting and row >>>> stateful, >>>> >> and I don't think that there is valid use case for those. The ones >>>> that >>>> >> are >>>> >> neither hinting nor stateful could be handled as either, but >>>> treating them >>>> >> as non-hinting seems faster. >>>> >> >>>> >> Once we have that, we can improve the filterList behaviour a lot: >>>> >> - in filterRowKey(), if any hinting filter returns false, then we >>>> could >>>> >> return false >>>> >> - in filterCell(), rather than returning on the first non-include >>>> result, >>>> >> we could process the remaining hinting filters, while skipping the >>>> >> non-hinting ones. >>>> >> >>>> >> The code changes are minimal, we just need to add a new method like >>>> >> isHinting() to the Filter class, and change the above two methods. >>>> >> >>>> >> We could add this even in 2.5, by defaulting isHinting() to return >>>> false >>>> >> in >>>> >> the Filter class, which would preserve the current API and behaviour >>>> for >>>> >> existing custom filters. >>>> >> >>>> >> I was looking at it from the AND filter perspective, but if needed, >>>> >> similar >>>> >> changes could be made to the OR filter. >>>> >> >>>> >> What do you think ? >>>> >> Is this a good idea ? >>>> >> >>>> >> Istvan >>>> >> >>>> > >>>> -- >>>> Best regards, >>>> Andrew >>>> >>>> Unrest, ignorance distilled, nihilistic imbeciles - >>>> It's what we’ve earned >>>> Welcome, apocalypse, what’s taken you so long? >>>> Bring us the fitting end that we’ve been counting on >>>> - A23, Welcome, Apocalypse >>>> >>> >>> >>> -- >>> *István Tóth* | Sr. Staff Software Engineer >>> *Email*: st...@cloudera.com >>> cloudera.com <https://www.cloudera.com> >>> [image: Cloudera] <https://www.cloudera.com/> >>> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image: >>> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: >>> Cloudera on LinkedIn] <https://www.linkedin.com/company/cloudera> >>> ------------------------------ >>> ------------------------------ >>> >> >> >> -- >> *István Tóth* | Sr. Staff Software Engineer >> *Email*: st...@cloudera.com >> cloudera.com <https://www.cloudera.com> >> [image: Cloudera] <https://www.cloudera.com/> >> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image: >> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: >> Cloudera on LinkedIn] <https://www.linkedin.com/company/cloudera> >> ------------------------------ >> ------------------------------ >> > > > -- > *István Tóth* | Sr. Staff Software Engineer > *Email*: st...@cloudera.com > cloudera.com <https://www.cloudera.com> > [image: Cloudera] <https://www.cloudera.com/> > [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image: > Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: > Cloudera on LinkedIn] <https://www.linkedin.com/company/cloudera> > ------------------------------ > ------------------------------ > -- *István Tóth* | Sr. Staff Software Engineer *Email*: st...@cloudera.com cloudera.com <https://www.cloudera.com> [image: Cloudera] <https://www.cloudera.com/> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image: Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera on LinkedIn] <https://www.linkedin.com/company/cloudera> ------------------------------ ------------------------------