[jira] [Comment Edited] (HBASE-11125) Introduce a higher level interface for registering interest in coprocessor upcalls

Andrew Purtell (JIRA) Sat, 27 Dec 2014 11:57:28 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-11125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259470#comment-14259470
 ]


Andrew Purtell edited comment on HBASE-11125 at 12/27/14 7:57 PM:
------------------------------------------------------------------

The core principle of the current coprocessor API is minimization of overhead. 
We have a “kernel hook” API: extension code executes in the current thread, 
which avoids a context switch and copying, and it uses low level types, which 
avoids translation costs and allocations. This is why the current API has been 
successful, and we want to retain it, but as a result of this choice:
# Misbehaving code can take down the server.
# Many low level types that do not and cannot have compatibility guarantees are 
exposed to coprocessor applications.
# Interfaces like RegionObserver carry a lot of internal details that might be 
unrelated to the task(s) at hand.

This issue focuses on the latter two problems. (The first can be addressed by 
HBASE-4047.)

A proposal.

Create a new API based around an interface called Extension. Extension can knit 
together coprocessors and plugins.

Extensions would have a method, called at load time, that returns a list of 
objects whose types express intentions. Intention types would be 
fine-grained, expressing:
- A request to listen for an event (read only), a _xxx_Listener, either 
globally or on a per-table basis
- A request to intercept an event (read with possible modification or drop), a 
_xxx_Transformer, either globally or on a per-table basis
- A request to implement an Endpoint interface (or part of one?)

As a rule of thumb we would define one intention type for each:
- Invocation of a method of an Observer: _xxx_Transformer for pre hooks, 
_xxx_Listener for post hooks, e.g. DeleteTransformer -> preDelete, 
DeleteListener -> postDelete
- Invocation of a method of a plugin: flush policy, compaction policy, split 
policy, etc. 
- Endpoint
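
To make the shape of this concrete, here is a minimal sketch of what an Extension and two intention types could look like. Everything in it is hypothetical: ExtensionSketch, Intention, DeleteTransformer, DeleteListener, AuditExtension and the stand-in Delete class are illustrative names, not existing HBase types, and per-table scoping is left out.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the proposed API; none of these types exist in HBase. */
final class ExtensionSketch {

  /** Stand-in for the client Delete type, reduced to a row key. */
  static final class Delete {
    final String row;
    Delete(String row) { this.row = row; }
  }

  /** Marker for anything an Extension can register. */
  interface Intention {}

  /** Would map to a pre hook: may modify the operation, or return null to drop it. */
  interface DeleteTransformer extends Intention {
    Delete transform(Delete delete);
  }

  /** Would map to a post hook: read-only notification. */
  interface DeleteListener extends Intention {
    void onDelete(Delete delete);
  }

  /** Called once at load time; the types of the returned objects express intent. */
  interface Extension {
    List<Intention> intentions();
  }

  /** Example extension: refuses deletes of "sys:" rows and counts completed deletes. */
  static final class AuditExtension implements Extension {
    int deletesSeen = 0;

    @Override
    public List<Intention> intentions() {
      DeleteTransformer guard = d -> d.row.startsWith("sys:") ? null : d;
      DeleteListener counter = d -> deletesSeen++;
      return Arrays.<Intention>asList(guard, counter);
    }
  }

  public static void main(String[] args) {
    AuditExtension ext = new AuditExtension();
    for (Intention i : ext.intentions()) {
      String kind = (i instanceof DeleteTransformer)
          ? "transformer (pre hook)" : "listener (post hook)";
      System.out.println("registered " + kind);
    }
  }
}
```

The point of the sketch is that the framework only needs to inspect the type of each returned object to decide which low level hook it belongs to.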

A naive implementation would maintain lists of intentions at various hook 
points. For each operation perhaps several lists would need to be walked and 
processed in turn. I think we can do better and maintain the performance of the 
current API.
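
For comparison, the naive approach might look roughly like the sketch below: one list per intention type at a hook point, walked on every operation. NaiveDispatchSketch, NaiveHookPoint and the stand-in Put class are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of list-walking dispatch; not HBase code. */
final class NaiveDispatchSketch {

  /** Stand-in for the client Put type, reduced to a single value. */
  static final class Put {
    final String value;
    Put(String value) { this.value = value; }
  }

  interface PutTransformer { Put transform(Put put); }
  interface PutListener { void onPut(Put put); }

  /** One hook point holding the intentions registered against it. */
  static final class NaiveHookPoint {
    private final List<PutTransformer> transformers = new ArrayList<>();
    private final List<PutListener> listeners = new ArrayList<>();

    void addTransformer(PutTransformer t) { transformers.add(t); }
    void addListener(PutListener l) { listeners.add(l); }

    /** Pre hook: walk every transformer in order; a null result drops the operation. */
    Put prePut(Put put) {
      Put current = put;
      for (PutTransformer t : transformers) {
        current = t.transform(current);
        if (current == null) {
          return null;
        }
      }
      return current;
    }

    /** Post hook: notify every listener in order. */
    void postPut(Put put) {
      for (PutListener l : listeners) {
        l.onPut(put);
      }
    }
  }

  public static void main(String[] args) {
    NaiveHookPoint hook = new NaiveHookPoint();
    hook.addTransformer(p -> new Put(p.value.trim()));
    hook.addListener(p -> System.out.println("stored: " + p.value));
    Put result = hook.prePut(new Put("  hello  "));
    if (result != null) {
      hook.postPut(result);
    }
  }
}
```

Each operation pays for list iteration and an interface call per intention, which is the overhead the code generation idea is meant to avoid.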

An Extension ClassLoader could generate code for wiring up intentions to low 
level hooks or plugin sites. For example, if we have several intentions that map 
to RegionObserver methods, we would codegen a BaseRegionObserver subclass, 
folding in the bytecode of the intentions, and install it. Or, if we find an 
intention to override the split policy, we would codegen a delegating split 
policy implementation, folding in the bytecode of the intention and delegating 
everything else to whatever plugin is already installed, then install the result.
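
As an illustration of the split policy case, the class below is a hand-written equivalent of what the generator might emit. It is only a sketch: SplitPolicy here is a simplified hypothetical interface (the real split policy plugin in HBase has a different shape), SplitDecisionIntention and the other names are made up, and a real generator would fold the intention's bytecode into the method body rather than call through an interface as this version does.

```java
/** Hypothetical sketch of a delegating split policy; not HBase code. */
final class DelegatingSplitPolicySketch {

  /** Simplified stand-in for a split policy plugin. */
  interface SplitPolicy {
    boolean shouldSplit(long regionSizeBytes);
    String describe();
  }

  /** Hypothetical intention: overrides only the split decision. */
  interface SplitDecisionIntention {
    boolean shouldSplit(long regionSizeBytes);
  }

  /** Stand-in for whatever policy is already installed. */
  static final class SizeSplitPolicy implements SplitPolicy {
    private final long maxBytes;
    SizeSplitPolicy(long maxBytes) { this.maxBytes = maxBytes; }

    @Override
    public boolean shouldSplit(long regionSizeBytes) { return regionSizeBytes > maxBytes; }

    @Override
    public String describe() { return "split above " + maxBytes + " bytes"; }
  }

  /** Roughly what generated code would be equivalent to. */
  static final class GeneratedSplitPolicy implements SplitPolicy {
    private final SplitPolicy installed;
    private final SplitDecisionIntention intention;

    GeneratedSplitPolicy(SplitPolicy installed, SplitDecisionIntention intention) {
      this.installed = installed;
      this.intention = intention;
    }

    /** Overridden by the intention. */
    @Override
    public boolean shouldSplit(long regionSizeBytes) {
      return intention.shouldSplit(regionSizeBytes);
    }

    /** Everything else is delegated to the previously installed policy. */
    @Override
    public String describe() { return installed.describe(); }
  }

  public static void main(String[] args) {
    SplitPolicy installed = new SizeSplitPolicy(1024L);
    SplitPolicy generated = new GeneratedSplitPolicy(installed, size -> false);
    System.out.println("installed says split: " + installed.shouldSplit(4096L));
    System.out.println("generated says split: " + generated.shouldSplit(4096L));
    System.out.println("description: " + generated.describe());
  }
}
```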

It will not be necessary to have complete coverage of all coprocessor hooks in 
the collection of intention types for the higher level API to be useful. We 
should start with straightforward cases and then extend it over time. Consider 
RegionObserver#preBatchMutate. We don't want to expose 
MiniBatchOperationInProgress because it is too tied to low level details of how 
the regionserver processes batch RPCs. Instead, we'd collect intentions scoped 
narrowly to mutation types (Append, Increment, Put) and synthesize a hook for 
preBatchMutate as needed. Or, consider RegionObserver#preCheckAndDelete. We 
might want to combine Get and Delete intentions into a synthetic hook for 
preCheckAndDelete, but not have an explicit CheckAndDelete intention, which 
would expose an RPC detail. Design for different cases can be done in subtasks.
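
A rough sketch of the preBatchMutate case follows. It assumes the batch can be presented as a plain list of mutations, which is a simplification made for this example; the real hook receives MiniBatchOperationInProgress. All types are hypothetical stand-ins, and only Put and Increment are shown.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of synthesizing a batch hook from per-type intentions. */
final class BatchSynthesisSketch {

  /** Stand-ins for the client mutation types, reduced to a row key. */
  static abstract class Mutation {
    final String row;
    Mutation(String row) { this.row = row; }
  }
  static final class Put extends Mutation { Put(String row) { super(row); } }
  static final class Increment extends Mutation { Increment(String row) { super(row); } }

  interface PutTransformer { Put transform(Put put); }
  interface IncrementTransformer { Increment transform(Increment increment); }

  /** Synthesized hook: routes each mutation in the batch to the matching intention. */
  static final class SyntheticBatchHook {
    private final PutTransformer onPut;             // may be null if not registered
    private final IncrementTransformer onIncrement; // may be null if not registered

    SyntheticBatchHook(PutTransformer onPut, IncrementTransformer onIncrement) {
      this.onPut = onPut;
      this.onIncrement = onIncrement;
    }

    /** Returns the surviving mutations; a null from a transformer drops that one. */
    List<Mutation> preBatchMutate(List<Mutation> batch) {
      List<Mutation> out = new ArrayList<>(batch.size());
      for (Mutation m : batch) {
        Mutation result = m;
        if (m instanceof Put && onPut != null) {
          result = onPut.transform((Put) m);
        } else if (m instanceof Increment && onIncrement != null) {
          result = onIncrement.transform((Increment) m);
        }
        if (result != null) {
          out.add(result);
        }
      }
      return out;
    }
  }

  public static void main(String[] args) {
    SyntheticBatchHook hook =
        new SyntheticBatchHook(p -> p.row.startsWith("tmp") ? null : p, null);
    List<Mutation> batch = new ArrayList<>();
    batch.add(new Put("tmp1"));
    batch.add(new Put("row1"));
    batch.add(new Increment("counter"));
    System.out.println("surviving mutations: " + hook.preBatchMutate(batch).size());
  }
}
```

The intentions never see the batch container, so its representation could change without touching extension code.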

Code generation allows us to decouple intention types from internals. For 
example, a PutTransformer would be installed as a RegionObserver with an 
implemented prePut method. This is what prePut hooks look like today:

{code}
void prePut(ObserverContext<RegionCoprocessorEnvironment> c, Put put,
    WALEdit edit, Durability durability)
{code}

Ideally the PutTransformer intention type should only know about the Put type 
and have a reference to a context if it needs to be stateful. We can carefully 
add state to the intention type for controlling durability. We should have a 
separate intention for modifying WALEdits. We can do this without leaking out 
the WALEdit type. Yet the "transformer" code would run in a prePut hook and get 
good performance. We could even change the signature of RegionObserver#prePut 
at any time, provided the code generator that maps intentions to low level 
implementation is updated likewise (setting aside other considerations for the 
moment).
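
To illustrate the decoupling, here is a sketch of a PutTransformer that sees only a Put and a small context, together with a hand-written version of the observer that a generator might emit for it. All types are hypothetical stand-ins defined in the example itself (including the reduced Put, WALEdit, Durability and ObserverContext), and the PutContext interface with its drop() method is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of mapping a PutTransformer onto a prePut hook. */
final class PrePutAdapterSketch {

  /** Stand-ins for the low level types in the prePut signature. */
  static final class Put {
    final String row;
    final List<String> columns = new ArrayList<>();
    Put(String row) { this.row = row; }
  }
  static final class WALEdit {}
  enum Durability { USE_DEFAULT, SKIP_WAL, SYNC_WAL }
  static final class ObserverContext {
    boolean bypassed;
    void bypass() { bypassed = true; }
  }

  /** What the intention is allowed to see: no WALEdit, no ObserverContext. */
  interface PutContext {
    void drop();
  }
  interface PutTransformer {
    void transform(PutContext ctx, Put put);
  }

  /** Roughly what generated code would be equivalent to. */
  static final class GeneratedPutObserver {
    private final PutTransformer intention;

    GeneratedPutObserver(PutTransformer intention) { this.intention = intention; }

    /** Low level hook: narrows the arguments before calling the intention. */
    void prePut(ObserverContext c, Put put, WALEdit edit, Durability durability) {
      // The intention can only request a drop, which is mapped to bypass() here.
      intention.transform(c::bypass, put);
    }
  }

  public static void main(String[] args) {
    GeneratedPutObserver observer = new GeneratedPutObserver((ctx, put) -> {
      if (put.row.isEmpty()) {
        ctx.drop();
      } else {
        put.columns.add("audit:ts");
      }
    });
    ObserverContext c = new ObserverContext();
    Put p = new Put("row1");
    observer.prePut(c, p, new WALEdit(), Durability.USE_DEFAULT);
    System.out.println("bypassed=" + c.bypassed + " columns=" + p.columns);
  }
}
```

If the prePut signature changed, only GeneratedPutObserver (that is, the generator) would need updating; the PutTransformer would stay the same.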

We would aim for code generation that can be maintained by committers who are 
not experts in JVM internals. That said, some complexity is unavoidable. I think 
the promise of composability of fine grained intentions, the API-level 
supportability gained by hiding internal types, and the implied performance of 
“inlining” intentions into straight line code for low level hooks could be well 
worth it. We can mitigate maintenance risks by placing the Extension API and 
code generator into their own Maven module. This module would provide a system 
level coprocessor that must be installed via site configuration for 
experimental “Extension” API support. It would be optional and decoupled from 
the client and server core modules. Should at some point the project feel the 
"Extension" API is not supportable, the module can simply be removed. It could 
even start life as a separate tree, hosted on GitHub.

Because we are keeping the low level "kernel-hook"-style API, the lack of access 
to internal types and any gaps in functional coverage in the higher level API 
wouldn't be a problem. An implementor could always resort to direct use of low 
level interfaces. Of course, we would want to figure out how to implement the 
desired extension in higher level terms.


> Introduce a higher level interface for registering interest in coprocessor 
> upcalls
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-11125
>                 URL: https://issues.apache.org/jira/browse/HBASE-11125
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Andrew Purtell
>            Priority: Critical
>
> We should introduce a higher level interface for managing the registration of 
> 'user' code for execution from the low level hooks. It should not be 
> necessary for coprocessor implementers to learn the universe of available low 
> level hooks and the subtleties of their placement within HBase core code. 
> Instead the higher level API should allow the implementer to describe their 
> intent and then this API should choose the appropriate low level hook 
> placement.
> A very desirable side effect is a layer of indirection between coprocessor 
> implementers and the actual hooks. This will address the perennial complaint 
> that the low level hooks change too much from release to release, as recently 
> discussed during the RM panel at HBaseCon. If we try to avoid changing the 
> particular placement and arguments of hook functions in response to those 
> complaints, this can be an onerous constraint on necessary internals 
> evolution. Instead we can direct coprocessor implementers to consider the new 
> API and provide the same interface stability guarantees there as we do for 
> client API, 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
