Committed the discussed fix as HBASE-28622 .
Thank you Andrew for discussing this, and Duo for the review.

Istvan

On Fri, May 31, 2024 at 2:36 PM Istvan Toth <st...@cloudera.com> wrote:

> It turns out that ColumnPaginationFilter is both row stateful and can
> return a seek hint.
> I have removed the HintingFilter marker from it to preserve the correct
> operation.
>
> With this change, ColumnPaginationFilter is no worse off than it was, but
> the rest of the hinting
> filters will work correctly.
>
> On Fri, May 31, 2024 at 9:32 AM Istvan Toth <st...@cloudera.com> wrote:
>
>> This is indeed quite a small change.
>> The PR is up at https://github.com/apache/hbase/pull/5955
>>
>> On Wed, May 29, 2024 at 10:07 AM Istvan Toth <st...@cloudera.com> wrote:
>>
>>> Thanks for the detailed reply, Andrew.
>>>
>>> I was also considering default methods, but it turns out that Filter is
>>> not an interface, but an abstract class, so it doesn't apply.
>>>
>>> Children not implementing a marker interface or marker method would
>>> inherit the marker method implementation from the closest parent the same
>>> way they would inherit the marker interface, so I think they are equivalent
>>> in this aspect, too.
>>>
>>> I think that marker interface(s) and overridable non-abstract getter(s)
>>> in Filter are mostly equivalent from both logical and source compatibility
>>> aspects.
>>> The only difference is that the marker interfaces cannot be removed in a
>>> subclass, while the getter can be overridden anywhere, but with well-chosen
>>> defaults it shouldn't be much of a limitation.
>>>
>>> Now that I think about it, we could cache the markers' values in an
>>> array when creating the filter lists, so even the cost of looking them up
>>> doesn't matter as it wouldn't happen in the hot code path.
>>>
>>> Using the marker interfaces is more elegant, and discourages problematic
>>> subclassing, so I am leaning towards that.
>>>
>>> Istvan
>>>
>>> On Wed, May 29, 2024 at 2:30 AM Andrew Purtell <apurt...@apache.org>
>>> wrote:
>>>
>>>> Actually source compatibility with default methods would be fine too. I
>>>> forget this is the main reason default methods were invented. The code
>>>> of
>>>> derived classes would not need to be changed, unless the returned value
>>>> of
>>>> the new method should be changed, and this is no worse than having a
>>>> marker
>>>> interface, which would also require code changes to implement
>>>> non-default
>>>> behaviors.
>>>>
>>>> A marker interface does remain as an option. It might make a difference
>>>> in
>>>> chained use cases. Consider a chain of filter instances that mixes
>>>> derived
>>>> code that is unaware of isHinting() and base code that is. The filter
>>>> chain
>>>> can be examined for the presence or absence of the marker interface and
>>>> would not need to rely on every filter in the chain passing return
>>>> values
>>>> of isHinting back.
>>>>
>>>> Marker interfaces can also be added to denote stateful or stateless
>>>> filters, if distinguishing between them would be useful, perhaps down
>>>> the
>>>> road.
>>>>
>>>> On Tue, May 28, 2024 at 5:13 PM Andrew Purtell <apurt...@apache.org>
>>>> wrote:
>>>>
>>>> > I think you've clearly put a lot of time into the analysis and it is
>>>> > plausible.
>>>> >
>>>> > Adding isHinting as a default method will preserve binary
>>>> compatibility.
>>>> > Source compatibility for derived custom filters would be broken
>>>> though and
>>>> > that probably prevents this going back into a releasing code line.
>>>> >
>>>> > Have you considered adding a marker interface instead? That would
>>>> preserve
>>>> > both source and binary compatibility. It wouldn't require any changes
>>>> to
>>>> > derived custom filters. A runtime instanceof test would determine if
>>>> the
>>>> > filter is a hinting filter or not. No need for a new method, default
>>>> or
>>>> > otherwise.
>>>> >
>>>> > On Tue, May 28, 2024 at 12:41 AM Istvan Toth <st...@apache.org>
>>>> wrote:
>>>> >
>>>> >> I have recently opened HBASE-28622
>>>> >> <https://issues.apache.org/jira/browse/HBASE-28622> , which has
>>>> turned
>>>> >> out
>>>> >> to be another aspect of the problem discussed in HBASE-20565
>>>> >> <https://issues.apache.org/jira/browse/HBASE-20565> .
>>>> >>
>>>> >> The problem is discussed in detail in HBASE-20565
>>>> >> <https://issues.apache.org/jira/browse/HBASE-20565> , but it boils
>>>> down
>>>> >> to
>>>> >> the API design decision that the filters returning
>>>> SEEK_NEXT_USING_HINT
>>>> >> rely on filterCell() getting called.
>>>> >>
>>>> >> On the other hand, some filters maintain an internal row state that
>>>> sets
>>>> >> counters for calls of filterCell(), which interacts with the results
>>>> of
>>>> >> previous filters in a filterList.
>>>> >>
>>>> >> When filters return different results for filterRowkey(), then
>>>> filters
>>>> >> returning  SEEK_NEXT_USING_HINT that have returned false must have
>>>> >> filterCell() called, otherwise the scan will degenerate into a full
>>>> scan.
>>>> >>
>>>> >> On the other hand, filters that maintain an internal row state must
>>>> only
>>>> >> be
>>>> >> called if all previous filters have INCLUDEed the Cell, otherwise
>>>> their
>>>> >> internal state will be off. (This still has caveats, as described in
>>>> >> HBASE-20565 <https://issues.apache.org/jira/browse/HBASE-20565>)
>>>> >>
>>>> >> In my opinion, the current code from HBASE-20565
>>>> >> <https://issues.apache.org/jira/browse/HBASE-20565> strikes a bad
>>>> balance
>>>> >> between features, as while it fixes some use cases for row stateful
>>>> >> filters, it also often negates the performance benefits of the
>>>> filters
>>>> >> providing hints, which in practice makes them unusable in many
>>>> filter list
>>>> >> combinations.
>>>> >>
>>>> >> Without completely re-designing the filter system, I think that the
>>>> best
>>>> >> solution would be adding a method to distinguish the filters that can
>>>> >> return hints from the rest of them. (This was also suggested in
>>>> >> HBASE-20565
>>>> >> <https://issues.apache.org/jira/browse/HBASE-20565> , but it was not
>>>> >> implemented)
>>>> >>
>>>> >> In theory, we have four combinations of hinting and row stateful
>>>> filters,
>>>> >> but currently we have no filters that are both hinting and row
>>>> stateful,
>>>> >> and I don't think that there is valid use case for those. The ones
>>>> that
>>>> >> are
>>>> >> neither hinting nor stateful could be handled as either, but
>>>> treating them
>>>> >> as non-hinting seems faster.
>>>> >>
>>>> >> Once we have that, we can improve the filterList behaviour a lot:
>>>> >> - in filterRowKey(), if any hinting filter returns false, then we
>>>> could
>>>> >> return false
>>>> >> - in filterCell(), rather than returning on the first non-include
>>>> result,
>>>> >> we could process the remaining hinting filters, while skipping the
>>>> >> non-hinting ones.
>>>> >>
>>>> >> The code changes are minimal, we just need to add a new method like
>>>> >> isHinting() to the Filter class, and change the above two methods.
>>>> >>
>>>> >> We could add this even in 2.5, by defaulting isHinting() to return
>>>> false
>>>> >> in
>>>> >> the Filter class, which would preserve the current API and behaviour
>>>> for
>>>> >> existing custom filters.
>>>> >>
>>>> >> I was looking at it from the AND filter perspective, but if needed,
>>>> >> similar
>>>> >> changes could be made to the OR filter.
>>>> >>
>>>> >> What do you think ?
>>>> >> Is this a good idea ?
>>>> >>
>>>> >> Istvan
>>>> >>
>>>> >
>>>> --
>>>> Best regards,
>>>> Andrew
>>>>
>>>> Unrest, ignorance distilled, nihilistic imbeciles -
>>>>     It's what we’ve earned
>>>> Welcome, apocalypse, what’s taken you so long?
>>>> Bring us the fitting end that we’ve been counting on
>>>>    - A23, Welcome, Apocalypse
>>>>
>>>
>>>
>>> --
>>> *István Tóth* | Sr. Staff Software Engineer
>>> *Email*: st...@cloudera.com
>>> cloudera.com <https://www.cloudera.com>
>>> [image: Cloudera] <https://www.cloudera.com/>
>>> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
>>> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
>>> Cloudera on LinkedIn] <https://www.linkedin.com/company/cloudera>
>>> ------------------------------
>>> ------------------------------
>>>
>>
>>
>> --
>> *István Tóth* | Sr. Staff Software Engineer
>> *Email*: st...@cloudera.com
>> cloudera.com <https://www.cloudera.com>
>> [image: Cloudera] <https://www.cloudera.com/>
>> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
>> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
>> Cloudera on LinkedIn] <https://www.linkedin.com/company/cloudera>
>> ------------------------------
>> ------------------------------
>>
>
>
> --
> *István Tóth* | Sr. Staff Software Engineer
> *Email*: st...@cloudera.com
> cloudera.com <https://www.cloudera.com>
> [image: Cloudera] <https://www.cloudera.com/>
> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
> Cloudera on LinkedIn] <https://www.linkedin.com/company/cloudera>
> ------------------------------
> ------------------------------
>


-- 
*István Tóth* | Sr. Staff Software Engineer
*Email*: st...@cloudera.com
cloudera.com <https://www.cloudera.com>
[image: Cloudera] <https://www.cloudera.com/>
[image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera
on LinkedIn] <https://www.linkedin.com/company/cloudera>
------------------------------
------------------------------

Reply via email to