[
https://issues.apache.org/jira/browse/HBASE-27666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bryan Beaudreault updated HBASE-27666:
--------------------------------------
Description:
In
[Compactor.java#L492|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java#L492],
when the compaction scanner extends KeyValueScanner, we call shipped()
periodically. This releases shared buffers which makes compactions less memory
hungry.
When a preCompact hook returns a custom scanner, that scanner is unlikely to
extend KeyValueScanner. So installing a preCompact hook can cause a regression
in terms of memory pressure.
KeyValueScanner is a large interface with many methods, but the method being
called (shipped()) is actually on the Shipper interface (which KeyValueScanner
extends). It would be nice to allow coprocessors to more easily integrate here.
I can think of two options:
# Simply change the cast on [line
436|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java#L436]
from KeyValueScanner to Shipper. A coprocessor can easily return an
InternalScanner impl which also extends Shipper since it's just one method. A
common pattern will likely be to delegate to the underlying real scanner.
# Add a new DelegateInternalScanner which coprocessors can return. In that
case, call scanner.getDelegate() and check if _that_ extends KeyValueScanner.
I prefer option 1 as it's very simple and makes it really easy to reason about
– a coprocessor may not _want_ to release cells, in which case they don't need
to extend Shipper. The downside of this is we'd have to promote Shipper
interface to IA.LimitedPrivate(COPROC)
Option 2 could open us to other possibilities in the future, and could be
configurable. Like maybe it has a scanner.supportsShipped() or something. If
true, the underlying scanner's shipped method will be called. If not, it won't.
Barring other opinions, I will go with option 1 as it's trivial.
cc [~anoopsamjohn] since you were involved in the linked change and the
creation of Shipper interface. In the linked code, we don't use any other
method on KeyValueScanner interface so it seems totally fine to use Shipper
instead.
was:
In
[https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java#L492,]
when the compaction scanner extends KeyValueScanner, we call shipped()
periodically. This releases shared buffers which makes compactions less memory
hungry.
When a preCompact hook returns a custom scanner, that scanner is unlikely to
extend KeyValueScanner. So installing a preCompact hook can cause a regression
in terms of memory pressure.
KeyValueScanner is a large interface with many methods, but the method being
called (shipped()) is actually on the Shipper interface (which KeyValueScanner
extends). It would be nice to allow coprocessors to more easily integrate here.
I can think of two options:
# Simply change the cast on [line
436|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java#L436]
from KeyValueScanner to Shipper. A coprocessor can easily return an
InternalScanner impl which also extends Shipper since it's just one method. A
common pattern will likely be to delegate to the underlying real scanner.
# Add a new DelegateInternalScanner which coprocessors can return. In that
case, call scanner.getDelegate() and check if _that_ extends KeyValueScanner.
I prefer option 1 as it's very simple and makes it really easy to reason about
– a coprocessor may not _want_ to release cells, in which case they don't need
to extend Shipper. The downside of this is we'd have to promote Shipper
interface to IA.LimitedPrivate(COPROC)
Option 2 could open us to other possibilities in the future, and could be
configurable. Like maybe it has a scanner.supportsShipped() or something. If
true, the underlying scanner's shipped method will be called. If not, it won't.
Barring other opinions, I will go with option 1 as it's trivial.
cc [~anoopsamjohn] since you were involved in the linked change and the
creation of Shipper interface. In the linked code, we don't use any other
method on KeyValueScanner interface so it seems totally fine to use Shipper
instead.
> Allow preCompact hooks to return scanners whose cells can be shipped
> --------------------------------------------------------------------
>
> Key: HBASE-27666
> URL: https://issues.apache.org/jira/browse/HBASE-27666
> Project: HBase
> Issue Type: Improvement
> Reporter: Bryan Beaudreault
> Priority: Major
>
> In
> [Compactor.java#L492|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java#L492],
> when the compaction scanner extends KeyValueScanner, we call shipped()
> periodically. This releases shared buffers which makes compactions less
> memory hungry.
> When a preCompact hook returns a custom scanner, that scanner is unlikely to
> extend KeyValueScanner. So installing a preCompact hook can cause a
> regression in terms of memory pressure.
> KeyValueScanner is a large interface with many methods, but the method being
> called (shipped()) is actually on the Shipper interface (which
> KeyValueScanner extends). It would be nice to allow coprocessors to more
> easily integrate here.
> I can think of two options:
> # Simply change the cast on [line
> 436|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java#L436]
> from KeyValueScanner to Shipper. A coprocessor can easily return an
> InternalScanner impl which also extends Shipper since it's just one method. A
> common pattern will likely be to delegate to the underlying real scanner.
> # Add a new DelegateInternalScanner which coprocessors can return. In that
> case, call scanner.getDelegate() and check if _that_ extends KeyValueScanner.
> I prefer option 1 as it's very simple and makes it really easy to reason
> about – a coprocessor may not _want_ to release cells, in which case they
> don't need to extend Shipper. The downside of this is we'd have to promote
> Shipper interface to IA.LimitedPrivate(COPROC)
> Option 2 could open us to other possibilities in the future, and could be
> configurable. Like maybe it has a scanner.supportsShipped() or something. If
> true, the underlying scanner's shipped method will be called. If not, it
> won't.
> Barring other opinions, I will go with option 1 as it's trivial.
> cc [~anoopsamjohn] since you were involved in the linked change and the
> creation of Shipper interface. In the linked code, we don't use any other
> method on KeyValueScanner interface so it seems totally fine to use Shipper
> instead.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)