You've clearly put a lot of time into this analysis, and I think it is
plausible.

Adding isHinting() as a default method will preserve binary compatibility.
Source compatibility for derived custom filters would be broken, though, and
that probably prevents this from going back into a releasing code line.

Have you considered adding a marker interface instead? That would preserve
both source and binary compatibility. It wouldn't require any changes to
derived custom filters. A runtime instanceof test would determine if the
filter is a hinting filter or not. No need for a new method, default or
otherwise.
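A minimal sketch of the marker-interface idea, assuming hypothetical, simplified stand-ins: SketchFilter, HintingFilter, SeekToStartFilter, LegacyCustomFilter and MarkerCheck are illustrative names, not HBase classes, and cells are modelled as plain strings. The point is that the existing filter compiles unchanged and the list decides with instanceof:

```java
// Hypothetical, simplified stand-ins for illustration; not HBase's real
// Filter or Cell types. Cells are modelled as sortable strings.
abstract class SketchFilter {
    enum ReturnCode { INCLUDE, SKIP, NEXT_ROW, SEEK_NEXT_USING_HINT }

    abstract ReturnCode filterCell(String cell);
}

// Hypothetical marker interface: it declares no methods and only tags a
// filter as one that may return SEEK_NEXT_USING_HINT.
interface HintingFilter {}

// A hinting filter opts in just by implementing the marker.
class SeekToStartFilter extends SketchFilter implements HintingFilter {
    private final String start;

    SeekToStartFilter(String start) {
        this.start = start;
    }

    @Override
    ReturnCode filterCell(String cell) {
        // Cells sorting before `start` can be skipped with a seek hint.
        return cell.compareTo(start) < 0
                ? ReturnCode.SEEK_NEXT_USING_HINT
                : ReturnCode.INCLUDE;
    }
}

// An existing custom filter compiles unchanged: there is nothing new to implement.
class LegacyCustomFilter extends SketchFilter {
    @Override
    ReturnCode filterCell(String cell) {
        return ReturnCode.INCLUDE;
    }
}

class MarkerCheck {
    // The runtime check a filter list could use instead of a new method.
    static boolean isHinting(SketchFilter filter) {
        return filter instanceof HintingFilter;
    }
}
```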

On Tue, May 28, 2024 at 12:41 AM Istvan Toth <st...@apache.org> wrote:

> I have recently opened HBASE-28622
> <https://issues.apache.org/jira/browse/HBASE-28622> , which has turned out
> to be another aspect of the problem discussed in HBASE-20565
> <https://issues.apache.org/jira/browse/HBASE-20565> .
>
> The problem is discussed in detail in HBASE-20565
> <https://issues.apache.org/jira/browse/HBASE-20565>, but it boils down to
> the API design decision that filters returning SEEK_NEXT_USING_HINT
> rely on filterCell() being called.
>
> At the same time, some filters maintain internal row state, such as
> counters that are updated on each filterCell() call, and that state
> interacts with the results of previous filters in a filterList.
>
> When filters return different results for filterRowKey(), the filters
> returning SEEK_NEXT_USING_HINT that returned false from filterRowKey() must
> still have filterCell() called; otherwise the scan degenerates into a full
> scan.
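To make the "degenerates into a full scan" point concrete, here is a toy model, not HBase code: the store is a sorted list of cell keys, and a hypothetical hinting filter only wants keys at or after `target`. When its filterCell() runs, the simulated scanner jumps to the hint; when it is never consulted, the scanner can only step one cell at a time:

```java
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model (not HBase code): the store is a sorted
// list of cell keys, and a hinting filter only wants keys >= target.
class HintScanDemo {
    // Returns how many cells the simulated scanner examines.
    static int cellsExamined(List<String> sortedCells, String target,
                             boolean hintingFilterConsulted) {
        int examined = 0;
        int i = 0;
        while (i < sortedCells.size()) {
            String cell = sortedCells.get(i);
            examined++;
            if (hintingFilterConsulted && cell.compareTo(target) < 0) {
                // filterCell() ran and returned SEEK_NEXT_USING_HINT, so the
                // scanner can jump straight to the hinted position.
                int pos = Collections.binarySearch(sortedCells, target);
                i = pos >= 0 ? pos : -pos - 1;
            } else {
                // Without a hint, the scanner can only advance one cell.
                i++;
            }
        }
        return examined;
    }
}
```

With 100 keys `r00`..`r99` and target `r90`, the consulted case examines 11 cells (one before the seek, then `r90`..`r99`), while the unconsulted case examines all 100.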
>
> On the other hand, filters that maintain internal row state must only be
> called if all previous filters have returned INCLUDE for the Cell;
> otherwise their internal state will be off. (This still has caveats, as
> described in HBASE-20565 <https://issues.apache.org/jira/browse/HBASE-20565>.)
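A toy illustration of why the row-stateful side needs the opposite rule. FirstNPerRowFilter and RowStateDemo are hypothetical classes, not HBase filters: the stateful filter keeps the first two cells it is asked about per row, and an earlier filter in the AND list accepts only qualifiers ending in an even digit. Calling the stateful filter on cells the earlier filter already rejected uses up its per-row budget:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical row-stateful filter (not an HBase class): it accepts only
// the first `limit` cells it is asked about in the current row.
class FirstNPerRowFilter {
    private final int limit;
    private int seen = 0;

    FirstNPerRowFilter(int limit) {
        this.limit = limit;
    }

    boolean include(String cell) {
        // Every call advances the counter, whether or not the cell is returned.
        return seen++ < limit;
    }
}

class RowStateDemo {
    // Simulates an AND list over one row: an earlier filter accepts only
    // qualifiers ending in an even digit, then FirstNPerRowFilter(2) runs.
    static List<String> scanRow(List<String> cells, boolean callStatefulOnRejected) {
        FirstNPerRowFilter firstN = new FirstNPerRowFilter(2);
        List<String> out = new ArrayList<>();
        for (String cell : cells) {
            int lastDigit = cell.charAt(cell.length() - 1) - '0';
            boolean earlierIncludes = lastDigit % 2 == 0;
            if (!earlierIncludes) {
                if (callStatefulOnRejected) {
                    firstN.include(cell); // consumes part of the per-row budget
                }
                continue;
            }
            if (firstN.include(cell)) {
                out.add(cell);
            }
        }
        return out;
    }
}
```

For the row `q1, q2, q3, q4, q6`, only calling the stateful filter after an INCLUDE returns `[q2, q4]`; calling it on every cell returns just `[q2]`, because `q1` and `q3` consumed the counter.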
>
> In my opinion, the current code from HBASE-20565
> <https://issues.apache.org/jira/browse/HBASE-20565> strikes a bad balance:
> while it fixes some use cases for row-stateful filters, it often negates
> the performance benefits of the hint-providing filters, which in practice
> makes them unusable in many filter list combinations.
>
> Without completely redesigning the filter system, I think that the best
> solution would be adding a method to distinguish the filters that can
> return hints from the rest of them. (This was also suggested in HBASE-20565
> <https://issues.apache.org/jira/browse/HBASE-20565>, but it was not
> implemented.)
>
> In theory, we have four combinations of hinting and row-stateful filters,
> but currently we have no filters that are both hinting and row-stateful,
> and I don't think that there is a valid use case for those. The ones that
> are neither hinting nor stateful could be handled as either, but treating
> them as non-hinting seems faster.
>
> Once we have that, we can improve the filterList behaviour a lot:
> - in filterRowKey(), if any hinting filter returns false, then we could
> return false, so that the hinting filter still gets its filterCell() calls
> - in filterCell(), rather than returning on the first non-include result,
> we could process the remaining hinting filters, while skipping the
> non-hinting ones.
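The two bullets above can be sketched on a toy model. Everything here is hypothetical: Code, ModelFilter, AndListSketch and FixedFilter are illustrative names, this is not HBase's FilterListWithAND, and the rule for merging a later hint into an earlier non-INCLUDE result (keep the more restrictive code, by the enum order below) is an assumption, since the proposal does not specify one:

```java
// Hypothetical, simplified model of the proposed AND-list logic; not HBase's
// FilterListWithAND. Ordered least to most restrictive (an assumption).
enum Code { INCLUDE, SKIP, SEEK_NEXT_USING_HINT, NEXT_ROW }

interface ModelFilter {
    boolean isHinting();              // the proposed classification
    boolean filterRowKey(String row); // true = filter out the whole row
    Code filterCell(String cell);
}

class AndListSketch {
    private final ModelFilter[] filters;

    AndListSketch(ModelFilter... filters) {
        this.filters = filters;
    }

    // Bullet 1: if any hinting filter keeps the row (returns false), keep it
    // too, so that filter still gets filterCell() calls and can emit hints.
    boolean filterRowKey(String row) {
        boolean anyFilteredOut = false;
        boolean hintingKeepsRow = false;
        for (ModelFilter f : filters) {
            boolean out = f.filterRowKey(row);
            anyFilteredOut |= out;
            if (!out && f.isHinting()) {
                hintingKeepsRow = true;
            }
        }
        return !hintingKeepsRow && anyFilteredOut;
    }

    // Bullet 2: after the first non-INCLUDE result, keep consulting hinting
    // filters but skip non-hinting (possibly row-stateful) ones.
    Code filterCell(String cell) {
        Code result = Code.INCLUDE;
        for (ModelFilter f : filters) {
            if (result == Code.INCLUDE) {
                result = f.filterCell(cell);
            } else if (f.isHinting()) {
                Code c = f.filterCell(cell);
                // Assumed merge rule: keep the more restrictive code.
                if (c.ordinal() > result.ordinal()) {
                    result = c;
                }
            }
        }
        return result;
    }
}

// Helper with fixed answers that records how often filterCell() ran.
class FixedFilter implements ModelFilter {
    final boolean hinting;
    final boolean rowResult;
    final Code cellResult;
    int cellCalls = 0;

    FixedFilter(boolean hinting, boolean rowResult, Code cellResult) {
        this.hinting = hinting;
        this.rowResult = rowResult;
        this.cellResult = cellResult;
    }

    public boolean isHinting() { return hinting; }

    public boolean filterRowKey(String row) { return rowResult; }

    public Code filterCell(String cell) {
        cellCalls++;
        return cellResult;
    }
}
```

With a list of (SKIP-returning filter, non-hinting filter, hinting filter), filterCell() never calls the middle filter, but the hinting filter still runs and its SEEK_NEXT_USING_HINT wins over the earlier SKIP.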
>
> The code changes are minimal: we just need to add a new method, such as
> isHinting(), to the Filter class and change the two methods above.
>
> We could add this even in 2.5 by having isHinting() return false by default
> in the Filter class, which would preserve the current API and behaviour for
> existing custom filters.
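A minimal sketch of that default-method variant, using hypothetical, heavily simplified classes (BaseFilter, ExistingCustomFilter, HintProvidingFilter) in place of HBase's real Filter hierarchy. Existing subclasses inherit the conservative `false`, and only filters that actually provide hints override it:

```java
// Hypothetical, heavily simplified base class; HBase's real Filter has many
// more methods. Only the proposed isHinting() default is the point here.
abstract class BaseFilter {
    enum ReturnCode { INCLUDE, SKIP, NEXT_ROW, SEEK_NEXT_USING_HINT }

    abstract ReturnCode filterCell(String cell);

    // Proposed: a concrete method with a conservative default, so existing
    // custom filters keep today's (non-hinting) treatment without changes.
    public boolean isHinting() {
        return false;
    }
}

// An existing custom filter written before the change: left untouched.
class ExistingCustomFilter extends BaseFilter {
    @Override
    ReturnCode filterCell(String cell) {
        return ReturnCode.INCLUDE;
    }
}

// A filter that provides seek hints opts in by overriding the default.
class HintProvidingFilter extends BaseFilter {
    @Override
    ReturnCode filterCell(String cell) {
        return ReturnCode.SEEK_NEXT_USING_HINT;
    }

    @Override
    public boolean isHinting() {
        return true;
    }
}
```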
>
> I was looking at it from the AND filter perspective, but if needed, similar
> changes could be made to the OR filter.
>
> What do you think?
> Is this a good idea?
>
> Istvan
>


-- 
Best regards,
Andrew

Unrest, ignorance distilled, nihilistic imbeciles -
    It's what we’ve earned
Welcome, apocalypse, what’s taken you so long?
Bring us the fitting end that we’ve been counting on
   - A23, Welcome, Apocalypse
