Bryan Beaudreault created HBASE-27666:
-----------------------------------------
Summary: Allow preCompact hooks to return scanners whose cells can
be shipped
Key: HBASE-27666
URL: https://issues.apache.org/jira/browse/HBASE-27666
Project: HBase
Issue Type: Improvement
Reporter: Bryan Beaudreault
In
[https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java#L492,]
when the compaction scanner extends KeyValueScanner, we call shipped()
periodically. This releases shared buffers which makes compactions less memory
hungry.
When a preCompact hook returns a custom scanner, that scanner is unlikely to
extend KeyValueScanner. So installing a preCompact hook can cause a regression
in terms of memory pressure.
KeyValueScanner is a large interface with many methods, but the method being
called (shipped()) is actually on the Shipper interface (which KeyValueScanner
extends). It would be nice to allow coprocessors to more easily integrate here.
I can think of two options:
# Simply change the cast on [line
436|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java#L436]
from KeyValueScanner to Shipper. A coprocessor can easily return an
InternalScanner impl which also extends Shipper since it's just one method. A
common pattern will likely be to delegate to the underlying real scanner.
# Add a new DelegateInternalScanner which coprocessors can return. In that
case, call scanner.getDelegate() and check if _that_ extends KeyValueScanner.
I prefer option 1 as it's very simple and makes it really easy to reason about
– a coprocessor may not _want_ to release cells, in which case they don't need
to extend Shipper. The downside of this is we'd have to promote Shipper
interface to IA.LimitedPrivate(COPROC)
Option 2 could open us to other possibilities in the future, and could be
configurable. Like maybe it has a scanner.supportsShipped() or something. If
true, the underlying scanner's shipped method will be called. If not, it won't.
Barring other opinions, I will go with option 1 as it's trivial.
cc [~anoopsamjohn] since you were involved in the linked change and the
creation of Shipper interface. In the linked code, we don't use any other
method on KeyValueScanner interface so it seems totally fine to use Shipper
instead.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)