It turns out that ColumnPaginationFilter is both row stateful and can return a seek hint. I have removed the HintingFilter marker from it to preserve the correct operation.
With this change, ColumnPaginationFilter is no worse off than it was, but the rest of the hinting filters will work correctly. On Fri, May 31, 2024 at 9:32 AM Istvan Toth <st...@cloudera.com> wrote: > This is indeed quite a small change. > The PR is up at https://github.com/apache/hbase/pull/5955 > > On Wed, May 29, 2024 at 10:07 AM Istvan Toth <st...@cloudera.com> wrote: > >> Thanks for the detailed reply, Andrew. >> >> I was also considering default methods, but it turns out that Filter is >> not an interface, but an abstract class, so it doesn't apply. >> >> Children not implementing a marker interface or marker method would >> inherit the marker method implementation from the closest parent the same >> way they would inherit the marker interface, so I think they are equivalent >> in this aspect, too. >> >> I think that marker interface(s) and overridable non-abstract getter(s) >> in Filter are mostly equivalent from both logical and source compatibility >> aspects. >> The only difference is that the marker interfaces cannot be removed in a >> subclass, while the getter can be overridden anywhere, but with well-chosen >> defaults it shouldn't be much of a limitation. >> >> Now that I think about it, we could cache the markers' values in an array >> when creating the filter lists, so even the cost of looking them up doesn't >> matter as it wouldn't happen in the hot code path. >> >> Using the marker interfaces is more elegant, and discourages problematic >> subclassing, so I am leaning towards that. >> >> Istvan >> >> On Wed, May 29, 2024 at 2:30 AM Andrew Purtell <apurt...@apache.org> >> wrote: >> >>> Actually source compatibility with default methods would be fine too. I >>> forget this is the main reason default methods were invented. The code of >>> derived classes would not need to be changed, unless the returned value >>> of >>> the new method should be changed, and this is no worse than having a >>> marker >>> interface, which would also require code changes to implement non-default >>> behaviors. >>> >>> A marker interface does remain as an option. It might make a difference >>> in >>> chained use cases. Consider a chain of filter instances that mixes >>> derived >>> code that is unaware of isHinting() and base code that is. The filter >>> chain >>> can be examined for the presence or absence of the marker interface and >>> would not need to rely on every filter in the chain passing return values >>> of isHinting back. >>> >>> Marker interfaces can also be added to denote stateful or stateless >>> filters, if distinguishing between them would be useful, perhaps down the >>> road. >>> >>> On Tue, May 28, 2024 at 5:13 PM Andrew Purtell <apurt...@apache.org> >>> wrote: >>> >>> > I think you've clearly put a lot of time into the analysis and it is >>> > plausible. >>> > >>> > Adding isHinting as a default method will preserve binary >>> compatibility. >>> > Source compatibility for derived custom filters would be broken though >>> and >>> > that probably prevents this going back into a releasing code line. >>> > >>> > Have you considered adding a marker interface instead? That would >>> preserve >>> > both source and binary compatibility. It wouldn't require any changes >>> to >>> > derived custom filters. A runtime instanceof test would determine if >>> the >>> > filter is a hinting filter or not. No need for a new method, default or >>> > otherwise. >>> > >>> > On Tue, May 28, 2024 at 12:41 AM Istvan Toth <st...@apache.org> wrote: >>> > >>> >> I have recently opened HBASE-28622 >>> >> <https://issues.apache.org/jira/browse/HBASE-28622> , which has >>> turned >>> >> out >>> >> to be another aspect of the problem discussed in HBASE-20565 >>> >> <https://issues.apache.org/jira/browse/HBASE-20565> . >>> >> >>> >> The problem is discussed in detail in HBASE-20565 >>> >> <https://issues.apache.org/jira/browse/HBASE-20565> , but it boils >>> down >>> >> to >>> >> the API design decision that the filters returning >>> SEEK_NEXT_USING_HINT >>> >> rely on filterCell() getting called. >>> >> >>> >> On the other hand, some filters maintain an internal row state that >>> sets >>> >> counters for calls of filterCell(), which interacts with the results >>> of >>> >> previous filters in a filterList. >>> >> >>> >> When filters return different results for filterRowkey(), then filters >>> >> returning SEEK_NEXT_USING_HINT that have returned false must have >>> >> filterCell() called, otherwise the scan will degenerate into a full >>> scan. >>> >> >>> >> On the other hand, filters that maintain an internal row state must >>> only >>> >> be >>> >> called if all previous filters have INCLUDEed the Cell, otherwise >>> their >>> >> internal state will be off. (This still has caveats, as described in >>> >> HBASE-20565 <https://issues.apache.org/jira/browse/HBASE-20565>) >>> >> >>> >> In my opinion, the current code from HBASE-20565 >>> >> <https://issues.apache.org/jira/browse/HBASE-20565> strikes a bad >>> balance >>> >> between features, as while it fixes some use cases for row stateful >>> >> filters, it also often negates the performance benefits of the filters >>> >> providing hints, which in practice makes them unusable in many filter >>> list >>> >> combinations. >>> >> >>> >> Without completely re-designing the filter system, I think that the >>> best >>> >> solution would be adding a method to distinguish the filters that can >>> >> return hints from the rest of them. (This was also suggested in >>> >> HBASE-20565 >>> >> <https://issues.apache.org/jira/browse/HBASE-20565> , but it was not >>> >> implemented) >>> >> >>> >> In theory, we have four combinations of hinting and row stateful >>> filters, >>> >> but currently we have no filters that are both hinting and row >>> stateful, >>> >> and I don't think that there is valid use case for those. The ones >>> that >>> >> are >>> >> neither hinting nor stateful could be handled as either, but treating >>> them >>> >> as non-hinting seems faster. >>> >> >>> >> Once we have that, we can improve the filterList behaviour a lot: >>> >> - in filterRowKey(), if any hinting filter returns false, then we >>> could >>> >> return false >>> >> - in filterCell(), rather than returning on the first non-include >>> result, >>> >> we could process the remaining hinting filters, while skipping the >>> >> non-hinting ones. >>> >> >>> >> The code changes are minimal, we just need to add a new method like >>> >> isHinting() to the Filter class, and change the above two methods. >>> >> >>> >> We could add this even in 2.5, by defaulting isHinting() to return >>> false >>> >> in >>> >> the Filter class, which would preserve the current API and behaviour >>> for >>> >> existing custom filters. >>> >> >>> >> I was looking at it from the AND filter perspective, but if needed, >>> >> similar >>> >> changes could be made to the OR filter. >>> >> >>> >> What do you think ? >>> >> Is this a good idea ? >>> >> >>> >> Istvan >>> >> >>> > >>> -- >>> Best regards, >>> Andrew >>> >>> Unrest, ignorance distilled, nihilistic imbeciles - >>> It's what we’ve earned >>> Welcome, apocalypse, what’s taken you so long? >>> Bring us the fitting end that we’ve been counting on >>> - A23, Welcome, Apocalypse >>> >> >> >> -- >> *István Tóth* | Sr. Staff Software Engineer >> *Email*: st...@cloudera.com >> cloudera.com <https://www.cloudera.com> >> [image: Cloudera] <https://www.cloudera.com/> >> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image: >> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: >> Cloudera on LinkedIn] <https://www.linkedin.com/company/cloudera> >> ------------------------------ >> ------------------------------ >> > > > -- > *István Tóth* | Sr. Staff Software Engineer > *Email*: st...@cloudera.com > cloudera.com <https://www.cloudera.com> > [image: Cloudera] <https://www.cloudera.com/> > [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image: > Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: > Cloudera on LinkedIn] <https://www.linkedin.com/company/cloudera> > ------------------------------ > ------------------------------ > -- *István Tóth* | Sr. Staff Software Engineer *Email*: st...@cloudera.com cloudera.com <https://www.cloudera.com> [image: Cloudera] <https://www.cloudera.com/> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image: Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera on LinkedIn] <https://www.linkedin.com/company/cloudera> ------------------------------ ------------------------------