Actually source compatibility with default methods would be fine too. I
forget this is the main reason default methods were invented. The code of
derived classes would not need to be changed, unless the returned value of
the new method should be changed, and this is no worse than having a marker
interface, which would also require code changes to implement non-default
behaviors.

A marker interface does remain as an option. It might make a difference in
chained use cases. Consider a chain of filter instances that mixes derived
code that is unaware of isHinting() and base code that is. The filter chain
can be examined for the presence or absence of the marker interface and
would not need to rely on every filter in the chain passing return values
of isHinting back.

Marker interfaces can also be added to denote stateful or stateless
filters, if distinguishing between them would be useful, perhaps down the
road.

On Tue, May 28, 2024 at 5:13 PM Andrew Purtell <apurt...@apache.org> wrote:

> I think you've clearly put a lot of time into the analysis and it is
> plausible.
>
> Adding isHinting as a default method will preserve binary compatibility.
> Source compatibility for derived custom filters would be broken though and
> that probably prevents this going back into a releasing code line.
>
> Have you considered adding a marker interface instead? That would preserve
> both source and binary compatibility. It wouldn't require any changes to
> derived custom filters. A runtime instanceof test would determine if the
> filter is a hinting filter or not. No need for a new method, default or
> otherwise.
>
> On Tue, May 28, 2024 at 12:41 AM Istvan Toth <st...@apache.org> wrote:
>
>> I have recently opened HBASE-28622
>> <https://issues.apache.org/jira/browse/HBASE-28622> , which has turned
>> out
>> to be another aspect of the problem discussed in HBASE-20565
>> <https://issues.apache.org/jira/browse/HBASE-20565> .
>>
>> The problem is discussed in detail in HBASE-20565
>> <https://issues.apache.org/jira/browse/HBASE-20565> , but it boils down
>> to
>> the API design decision that the filters returning SEEK_NEXT_USING_HINT
>> rely on filterCell() getting called.
>>
>> On the other hand, some filters maintain an internal row state that sets
>> counters for calls of filterCell(), which interacts with the results of
>> previous filters in a filterList.
>>
>> When filters return different results for filterRowkey(), then filters
>> returning  SEEK_NEXT_USING_HINT that have returned false must have
>> filterCell() called, otherwise the scan will degenerate into a full scan.
>>
>> On the other hand, filters that maintain an internal row state must only
>> be
>> called if all previous filters have INCLUDEed the Cell, otherwise their
>> internal state will be off. (This still has caveats, as described in
>> HBASE-20565 <https://issues.apache.org/jira/browse/HBASE-20565>)
>>
>> In my opinion, the current code from HBASE-20565
>> <https://issues.apache.org/jira/browse/HBASE-20565> strikes a bad balance
>> between features, as while it fixes some use cases for row stateful
>> filters, it also often negates the performance benefits of the filters
>> providing hints, which in practice makes them unusable in many filter list
>> combinations.
>>
>> Without completely re-designing the filter system, I think that the best
>> solution would be adding a method to distinguish the filters that can
>> return hints from the rest of them. (This was also suggested in
>> HBASE-20565
>> <https://issues.apache.org/jira/browse/HBASE-20565> , but it was not
>> implemented)
>>
>> In theory, we have four combinations of hinting and row stateful filters,
>> but currently we have no filters that are both hinting and row stateful,
>> and I don't think that there is valid use case for those. The ones that
>> are
>> neither hinting nor stateful could be handled as either, but treating them
>> as non-hinting seems faster.
>>
>> Once we have that, we can improve the filterList behaviour a lot:
>> - in filterRowKey(), if any hinting filter returns false, then we could
>> return false
>> - in filterCell(), rather than returning on the first non-include result,
>> we could process the remaining hinting filters, while skipping the
>> non-hinting ones.
>>
>> The code changes are minimal, we just need to add a new method like
>> isHinting() to the Filter class, and change the above two methods.
>>
>> We could add this even in 2.5, by defaulting isHinting() to return false
>> in
>> the Filter class, which would preserve the current API and behaviour for
>> existing custom filters.
>>
>> I was looking at it from the AND filter perspective, but if needed,
>> similar
>> changes could be made to the OR filter.
>>
>> What do you think ?
>> Is this a good idea ?
>>
>> Istvan
>>
>
-- 
Best regards,
Andrew

Unrest, ignorance distilled, nihilistic imbeciles -
    It's what we’ve earned
Welcome, apocalypse, what’s taken you so long?
Bring us the fitting end that we’ve been counting on
   - A23, Welcome, Apocalypse

Reply via email to