I think you've clearly put a lot of time into the analysis and it is plausible.
Adding isHinting as a default method will preserve binary compatibility. Source compatibility for derived custom filters would be broken though and that probably prevents this going back into a releasing code line. Have you considered adding a marker interface instead? That would preserve both source and binary compatibility. It wouldn't require any changes to derived custom filters. A runtime instanceof test would determine if the filter is a hinting filter or not. No need for a new method, default or otherwise. On Tue, May 28, 2024 at 12:41 AM Istvan Toth <st...@apache.org> wrote: > I have recently opened HBASE-28622 > <https://issues.apache.org/jira/browse/HBASE-28622> , which has turned out > to be another aspect of the problem discussed in HBASE-20565 > <https://issues.apache.org/jira/browse/HBASE-20565> . > > The problem is discussed in detail in HBASE-20565 > <https://issues.apache.org/jira/browse/HBASE-20565> , but it boils down to > the API design decision that the filters returning SEEK_NEXT_USING_HINT > rely on filterCell() getting called. > > On the other hand, some filters maintain an internal row state that sets > counters for calls of filterCell(), which interacts with the results of > previous filters in a filterList. > > When filters return different results for filterRowkey(), then filters > returning SEEK_NEXT_USING_HINT that have returned false must have > filterCell() called, otherwise the scan will degenerate into a full scan. > > On the other hand, filters that maintain an internal row state must only be > called if all previous filters have INCLUDEed the Cell, otherwise their > internal state will be off. (This still has caveats, as described in > HBASE-20565 <https://issues.apache.org/jira/browse/HBASE-20565>) > > In my opinion, the current code from HBASE-20565 > <https://issues.apache.org/jira/browse/HBASE-20565> strikes a bad balance > between features, as while it fixes some use cases for row stateful > filters, it also often negates the performance benefits of the filters > providing hints, which in practice makes them unusable in many filter list > combinations. > > Without completely re-designing the filter system, I think that the best > solution would be adding a method to distinguish the filters that can > return hints from the rest of them. (This was also suggested in HBASE-20565 > <https://issues.apache.org/jira/browse/HBASE-20565> , but it was not > implemented) > > In theory, we have four combinations of hinting and row stateful filters, > but currently we have no filters that are both hinting and row stateful, > and I don't think that there is valid use case for those. The ones that are > neither hinting nor stateful could be handled as either, but treating them > as non-hinting seems faster. > > Once we have that, we can improve the filterList behaviour a lot: > - in filterRowKey(), if any hinting filter returns false, then we could > return false > - in filterCell(), rather than returning on the first non-include result, > we could process the remaining hinting filters, while skipping the > non-hinting ones. > > The code changes are minimal, we just need to add a new method like > isHinting() to the Filter class, and change the above two methods. > > We could add this even in 2.5, by defaulting isHinting() to return false in > the Filter class, which would preserve the current API and behaviour for > existing custom filters. > > I was looking at it from the AND filter perspective, but if needed, similar > changes could be made to the OR filter. > > What do you think ? > Is this a good idea ? > > Istvan > -- Best regards, Andrew Unrest, ignorance distilled, nihilistic imbeciles - It's what we’ve earned Welcome, apocalypse, what’s taken you so long? Bring us the fitting end that we’ve been counting on - A23, Welcome, Apocalypse