[ 
https://issues.apache.org/jira/browse/HBASE-27666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault updated HBASE-27666:
--------------------------------------
    Description: 
In 
[Compactor.java#L492|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java#L492],
 when the compaction scanner extends KeyValueScanner, we call shipped() 
periodically. This releases shared buffers which makes compactions less memory 
hungry.

When a preCompact hook returns a custom scanner, that scanner is unlikely to 
extend KeyValueScanner. So installing a preCompact hook can cause a regression 
in terms of memory pressure.

KeyValueScanner is a large interface with many methods, but the method being 
called (shipped()) is actually on the Shipper interface (which KeyValueScanner 
extends). It would be nice to allow coprocessors to more easily integrate here.

I can think of two options:
 # Simply change the cast on [line 
436|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java#L436]
 from KeyValueScanner to Shipper. A coprocessor can easily return an 
InternalScanner impl which also extends Shipper since it's just one method. A 
common pattern will likely be to delegate to the underlying real scanner.
 # Add a new DelegateInternalScanner which coprocessors can return. In that 
case, call scanner.getDelegate() and check if _that_ extends KeyValueScanner.

I prefer option 1 as it's very simple and makes it really easy to reason about 
– a coprocessor may not _want_ to release cells, in which case they don't need 
to extend Shipper. The downside of this is we'd have to promote Shipper 
interface to IA.LimitedPrivate(COPROC)

Option 2 could open us to other possibilities in the future, and could be 
configurable. Like maybe it has a scanner.supportsShipped() or something. If 
true, the underlying scanner's shipped method will be called. If not, it won't.

Barring other opinions, I will go with option 1 as it's trivial.

cc [~anoopsamjohn] since you were involved in the linked change and the 
creation of Shipper interface. In the linked code, we don't use any other 
method on KeyValueScanner interface so it seems totally fine to use Shipper 
instead.

  was:
In 
[https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java#L492,]
 when the compaction scanner extends KeyValueScanner, we call shipped() 
periodically. This releases shared buffers which makes compactions less memory 
hungry.

When a preCompact hook returns a custom scanner, that scanner is unlikely to 
extend KeyValueScanner. So installing a preCompact hook can cause a regression 
in terms of memory pressure.

KeyValueScanner is a large interface with many methods, but the method being 
called (shipped()) is actually on the Shipper interface (which KeyValueScanner 
extends). It would be nice to allow coprocessors to more easily integrate here.

I can think of two options:
 # Simply change the cast on [line 
436|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java#L436]
 from KeyValueScanner to Shipper. A coprocessor can easily return an 
InternalScanner impl which also extends Shipper since it's just one method. A 
common pattern will likely be to delegate to the underlying real scanner.
 # Add a new DelegateInternalScanner which coprocessors can return. In that 
case, call scanner.getDelegate() and check if _that_ extends KeyValueScanner.

I prefer option 1 as it's very simple and makes it really easy to reason about 
– a coprocessor may not _want_ to release cells, in which case they don't need 
to extend Shipper. The downside of this is we'd have to promote Shipper 
interface to IA.LimitedPrivate(COPROC)

Option 2 could open us to other possibilities in the future, and could be 
configurable. Like maybe it has a scanner.supportsShipped() or something. If 
true, the underlying scanner's shipped method will be called. If not, it won't.

Barring other opinions, I will go with option 1 as it's trivial.

cc [~anoopsamjohn] since you were involved in the linked change and the 
creation of Shipper interface. In the linked code, we don't use any other 
method on KeyValueScanner interface so it seems totally fine to use Shipper 
instead.


> Allow preCompact hooks to return scanners whose cells can be shipped
> --------------------------------------------------------------------
>
>                 Key: HBASE-27666
>                 URL: https://issues.apache.org/jira/browse/HBASE-27666
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Bryan Beaudreault
>            Priority: Major
>
> In 
> [Compactor.java#L492|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java#L492],
>  when the compaction scanner extends KeyValueScanner, we call shipped() 
> periodically. This releases shared buffers which makes compactions less 
> memory hungry.
> When a preCompact hook returns a custom scanner, that scanner is unlikely to 
> extend KeyValueScanner. So installing a preCompact hook can cause a 
> regression in terms of memory pressure.
> KeyValueScanner is a large interface with many methods, but the method being 
> called (shipped()) is actually on the Shipper interface (which 
> KeyValueScanner extends). It would be nice to allow coprocessors to more 
> easily integrate here.
> I can think of two options:
>  # Simply change the cast on [line 
> 436|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java#L436]
>  from KeyValueScanner to Shipper. A coprocessor can easily return an 
> InternalScanner impl which also extends Shipper since it's just one method. A 
> common pattern will likely be to delegate to the underlying real scanner.
>  # Add a new DelegateInternalScanner which coprocessors can return. In that 
> case, call scanner.getDelegate() and check if _that_ extends KeyValueScanner.
> I prefer option 1 as it's very simple and makes it really easy to reason 
> about – a coprocessor may not _want_ to release cells, in which case they 
> don't need to extend Shipper. The downside of this is we'd have to promote 
> Shipper interface to IA.LimitedPrivate(COPROC)
> Option 2 could open us to other possibilities in the future, and could be 
> configurable. Like maybe it has a scanner.supportsShipped() or something. If 
> true, the underlying scanner's shipped method will be called. If not, it 
> won't.
> Barring other opinions, I will go with option 1 as it's trivial.
> cc [~anoopsamjohn] since you were involved in the linked change and the 
> creation of Shipper interface. In the linked code, we don't use any other 
> method on KeyValueScanner interface so it seems totally fine to use Shipper 
> instead.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to