I have recently opened HBASE-28622 <https://issues.apache.org/jira/browse/HBASE-28622> , which has turned out to be another aspect of the problem discussed in HBASE-20565 <https://issues.apache.org/jira/browse/HBASE-20565> .
The problem is discussed in detail in HBASE-20565 <https://issues.apache.org/jira/browse/HBASE-20565>, but it boils down to an API design decision: filters that return SEEK_NEXT_USING_HINT rely on filterCell() getting called. On the other hand, some filters maintain internal per-row state, such as a count of filterCell() calls, and that state interacts with the results of the preceding filters in a FilterList.

When the filters in a list return different results from filterRowKey(), a hinting filter that returned false must still have its filterCell() called; otherwise it never gets to return a hint, and the scan degenerates into a full scan. Filters that maintain internal row state, however, must only be called if all previous filters have INCLUDEd the Cell; otherwise their internal state will be off. (This still has caveats, as described in HBASE-20565 <https://issues.apache.org/jira/browse/HBASE-20565>.)

In my opinion, the current code from HBASE-20565 <https://issues.apache.org/jira/browse/HBASE-20565> strikes a bad balance between features: while it fixes some use cases for row-stateful filters, it also often negates the performance benefit of the filters that provide hints, which in practice makes them unusable in many filter list combinations.

Without completely redesigning the filter system, I think the best solution would be to add a method that distinguishes the filters that can return hints from the rest. (This was also suggested in HBASE-20565 <https://issues.apache.org/jira/browse/HBASE-20565>, but it was not implemented.)

In theory, we have four combinations of hinting and row-stateful filters, but currently we have no filters that are both hinting and row-stateful, and I don't think there is a valid use case for those. The ones that are neither hinting nor stateful could be handled as either, but treating them as non-hinting seems faster.
Once we have that, we can improve the FilterList behaviour a lot:

- in filterRowKey(), if any hinting filter returns false, then we could return false
- in filterCell(), rather than returning on the first non-include result, we could process the remaining hinting filters, while skipping the non-hinting ones.

The code changes are minimal: we just need to add a new method like isHinting() to the Filter class, and change the above two methods. We could add this even in 2.5, by defaulting isHinting() to return false in the Filter class, which would preserve the current API and behaviour for existing custom filters.

I was looking at it from the AND filter perspective, but if needed, similar changes could be made to the OR filter.

What do you think? Is this a good idea?

Istvan
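
PS: To make the two filterRowKey() / filterCell() changes a bit more concrete, here is a rough, self-contained sketch. It is only an illustration, not a patch: SketchFilter, FixedFilter and AndListSketch are hypothetical stand-ins for the real filter classes, row keys and cells are plain strings, and the rule for merging return codes (a hint wins over any other non-include result) is my simplifying assumption rather than what the AND filter does today.

```java
// Simplified model of the proposal; none of these types are real HBase classes.
enum ReturnCode { INCLUDE, NEXT_ROW, SEEK_NEXT_USING_HINT }

interface SketchFilter {
    // true means "filter out this row", as in the real filterRowKey()
    boolean filterRowKey(String rowKey);

    ReturnCode filterCell(String cell);

    // Proposed marker method; defaulting to false keeps existing filters unchanged.
    default boolean isHinting() { return false; }
}

// Hypothetical test filter with fixed answers that counts its filterCell() calls.
final class FixedFilter implements SketchFilter {
    private final boolean rejectRow;
    private final ReturnCode cellResult;
    private final boolean hinting;
    int cellCalls = 0;

    FixedFilter(boolean rejectRow, ReturnCode cellResult, boolean hinting) {
        this.rejectRow = rejectRow;
        this.cellResult = cellResult;
        this.hinting = hinting;
    }

    @Override public boolean filterRowKey(String rowKey) { return rejectRow; }

    @Override public ReturnCode filterCell(String cell) {
        cellCalls++;
        return cellResult;
    }

    @Override public boolean isHinting() { return hinting; }
}

// Sketch of an AND list that treats hinting filters specially.
final class AndListSketch {
    private final SketchFilter[] filters;

    AndListSketch(SketchFilter... filters) { this.filters = filters; }

    boolean filterRowKey(String rowKey) {
        boolean anyRejected = false;
        boolean hintingWantsCells = false;
        for (SketchFilter f : filters) {
            // Call every filter so that each one sees the row change.
            boolean rejected = f.filterRowKey(rowKey);
            anyRejected |= rejected;
            if (f.isHinting() && !rejected) {
                hintingWantsCells = true;
            }
        }
        // Do not skip the row if a hinting filter still needs filterCell().
        return anyRejected && !hintingWantsCells;
    }

    ReturnCode filterCell(String cell) {
        ReturnCode result = ReturnCode.INCLUDE;
        for (SketchFilter f : filters) {
            if (result != ReturnCode.INCLUDE && !f.isHinting()) {
                // The cell is already rejected: leave non-hinting
                // (possibly row-stateful) filters untouched.
                continue;
            }
            ReturnCode rc = f.filterCell(cell);
            if (rc == ReturnCode.SEEK_NEXT_USING_HINT || result == ReturnCode.INCLUDE) {
                // Assumption: a hint overrides any earlier non-include result.
                result = rc;
            }
        }
        return result;
    }
}
```

With a list of (row-rejecting filter, counting filter, hinting filter), filterRowKey() returns false because the hinting filter did not reject the row, and filterCell() skips the counting filter after the first rejection but still reaches the hinting filter, so the hint is returned instead of being lost.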