[
https://issues.apache.org/jira/browse/HBASE-11125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259470#comment-14259470
]
Andrew Purtell edited comment on HBASE-11125 at 12/27/14 7:57 PM:
------------------------------------------------------------------
The core principle of the current coprocessor API is minimization of overhead.
We have a “kernel hook” API where execution of extension code takes place in
the current thread to avoid a context switch and copying, using low level types
to avoid translation costs, allocations and copying. This is why the current
API has been successful, and we want to retain it, but as a result of this
choice:
# Misbehaving code can take down the server.
# Many low level types that do not and cannot have compatibility guarantees are
exposed to coprocessor applications.
# Interfaces like RegionObserver carry a lot of internal details that might be
unrelated to the task(s) at hand.
This issue focuses on the latter two problems. (The first can be addressed by
HBASE-4047.)
A proposal.
Create a new API based around an interface called Extension. Extension can knit
together coprocessors and plugins.
Extensions would have a method called at load time that returns a list of
objects whose types express intentions. Intention types would be
fine-grained, expressing:
- A request to listen for an event (read only), a _xxx_Listener, either
globally or on a per-table basis
- A request to intercept an event (read with possible modification or drop), a
_xxx_Transformer, either globally or on a per-table basis
- A request to implement an Endpoint interface (or part of one?)
As a rule of thumb we would define one intention type for each:
- Invocation of a method of an Observer: _xxx_Transformer for pre hooks,
_xxx_Listener for post hooks, e.g. DeleteTransformer -> preDelete,
DeleteListener -> postDelete
- Invocation of a method of a plugin: flush policy, compaction policy, split
policy, etc.
- Endpoint
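To make the shape of this concrete, here is a minimal sketch of what the intention types might look like. Every name in it (Intention, Listener, Transformer, DeleteOp, AuditExtension) is a hypothetical stand-in chosen for illustration, not an existing HBase class, and the real design could differ:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Marker for anything an Extension can declare. Hypothetical name. */
interface Intention {}

/** Read-only observation of an event; would map to a post hook. */
interface Listener<E> extends Intention {
    void onEvent(E event);
}

/** Interception with possible modification or drop; would map to a pre hook. */
interface Transformer<E> extends Intention {
    /** Returns the (possibly modified) event, or null to drop it. */
    E transform(E event);
}

/** Asked once at load time for the intentions it wants wired up. */
interface Extension {
    List<Intention> intentions();
}

/** Stand-in for a delete operation; NOT the real HBase Delete type. */
final class DeleteOp {
    final String row;
    DeleteOp(String row) { this.row = row; }
}

/** Example extension: refuse deletes of "sys:" rows, record all others. */
final class AuditExtension implements Extension {
    final List<String> audited = new ArrayList<>();

    // Plays the role of a DeleteTransformer (pre hook).
    final Transformer<DeleteOp> guard = d -> d.row.startsWith("sys:") ? null : d;

    // Plays the role of a DeleteListener (post hook).
    final Listener<DeleteOp> audit = d -> audited.add(d.row);

    @Override
    public List<Intention> intentions() {
        return Arrays.asList(guard, audit);
    }
}
```

The extension only declares what it wants; where each intention ends up being invoked is left to whatever loads it.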
A naive implementation would maintain lists of intentions at various hook
points. For each operation perhaps several lists would need to be walked and
processed in turn. I think we can do better and maintain the performance of the
current API.
An Extension ClassLoader could generate code for wiring up intentions to low
level hooks or plugin sites. For example if we have several intentions that map
to RegionObserver methods, we would codegen a BaseRegionObserver subclass,
folding in bytecode of the intentions, and install it. Or if we find an
intention to override the split policy, we would codegen a delegating split
policy implementation, folding in the bytecode of the intention and delegating
everything else to whatever plugin is already installed, then install the
result.
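As an illustration of the split policy case, the class below is a hand-written equivalent of what the generator might emit. It does no bytecode generation itself. SplitPolicy, InstalledSplitPolicy, ShouldSplitIntention and GeneratedSplitPolicy are hypothetical names with simplified signatures, not the real HBase split policy classes:

```java
/** Simplified stand-in for a split policy plugin interface (hypothetical). */
interface SplitPolicy {
    boolean shouldSplit(long regionSizeBytes);
    String splitPoint(String firstRow, String lastRow);
}

/** Stand-in for whatever policy is already installed. */
final class InstalledSplitPolicy implements SplitPolicy {
    @Override
    public boolean shouldSplit(long regionSizeBytes) {
        return regionSizeBytes > 10L * 1024 * 1024 * 1024; // placeholder threshold
    }

    @Override
    public String splitPoint(String firstRow, String lastRow) {
        return lastRow; // placeholder logic, only here so delegation is observable
    }
}

/** An intention that overrides just the "should we split?" decision. */
interface ShouldSplitIntention {
    boolean shouldSplit(long regionSizeBytes);
}

/**
 * What generated code could be equivalent to: the intention answers the one
 * question it cares about, everything else goes to the installed policy.
 */
final class GeneratedSplitPolicy implements SplitPolicy {
    private final SplitPolicy installed;
    private final ShouldSplitIntention intention;

    GeneratedSplitPolicy(SplitPolicy installed, ShouldSplitIntention intention) {
        this.installed = installed;
        this.intention = intention;
    }

    @Override
    public boolean shouldSplit(long regionSizeBytes) {
        return intention.shouldSplit(regionSizeBytes);
    }

    @Override
    public String splitPoint(String firstRow, String lastRow) {
        return installed.splitPoint(firstRow, lastRow);
    }
}
```

A real generator would presumably fold the intention's bytecode into shouldSplit rather than call through an interface, but the delegation structure would be the same.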
It will not be necessary to have complete coverage of all coprocessor hooks in
the collection of intent types for the higher level API to be useful. We should
start with straightforward cases and then extend it over time. Consider
RegionObserver#preBatchMutate. We don't want to expose
MiniBatchOperationInProgress, which is too tied to low level details of how the
regionserver processes batch RPCs. Instead, we'd collect intentions scoped
narrowly to mutation types (Append, Increment, Put) and synthesize a hook for
preBatchMutate as needed. Or, consider RegionObserver#preCheckAndDelete. We
might want to combine Get and Delete intentions into a synthetic hook for
preCheckAndDelete, but not have an explicit CheckAndDelete intention, which
would expose an RPC detail. Design for different cases can be done in subtasks.
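One way to picture the synthesized preBatchMutate hook is the sketch below. It dispatches each mutation in a batch to the transformer registered for its type, so extension code never sees a batch-level type. MutationOp, PutOp, IncrementOp and SyntheticBatchHook are hypothetical stand-ins, and the map lookup is only for readability; generated code would presumably inline the dispatch:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Stand-ins for mutation types; NOT the real HBase Mutation hierarchy. */
abstract class MutationOp {
    final String row;
    MutationOp(String row) { this.row = row; }
}

final class PutOp extends MutationOp {
    PutOp(String row) { super(row); }
}

final class IncrementOp extends MutationOp {
    IncrementOp(String row) { super(row); }
}

/** Batch-level hook assembled from intentions scoped to single mutation types. */
final class SyntheticBatchHook {
    private final Map<Class<? extends MutationOp>, UnaryOperator<MutationOp>> byType =
        new HashMap<>();

    /** Registers a transformer for exactly one mutation type. */
    <M extends MutationOp> void register(Class<M> type, UnaryOperator<M> transformer) {
        byType.put(type, m -> transformer.apply(type.cast(m)));
    }

    /** Applies per-type transformers; a null result drops that mutation. */
    List<MutationOp> preBatchMutate(List<MutationOp> batch) {
        List<MutationOp> out = new ArrayList<>();
        for (MutationOp m : batch) {
            UnaryOperator<MutationOp> t = byType.get(m.getClass());
            MutationOp result = (t == null) ? m : t.apply(m);
            if (result != null) {
                out.add(result);
            }
        }
        return out;
    }
}
```

Mutation types with no registered intention pass through untouched, which matches the idea that coverage can start small and grow over time.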
Code generation allows us to decouple intention types from internals. For
example, a PutTransformer would be installed as a RegionObserver with an
implemented prePut method. This is what prePut hooks look like today:
{code}
void prePut(ObserverContext<RegionCoprocessorEnvironment> c, Put put, WALEdit
edit, Durability durability)
{code}
Ideally the PutTransformer intention type should only know about the Put type
and have a reference to a context if it needs to be stateful. We can carefully
add state to the intention type for controlling durability. We should have a
separate intention for modifying WALEdits. We can do this without leaking out
the WALEdit type. Yet the "transformer" code would run in a prePut hook and get
good performance. We could even change the signature of RegionObserver#prePut
at any time, provided the code generator that maps intentions to low level
implementation is updated likewise (setting aside other considerations for the
moment).
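A rough sketch of that decoupling follows. The PutTransformer sees only a Put-like object, while an adapter with the low level prePut shape does the wiring. All types here (SimplePut, WalEditStub, DurabilityStub, HookContextStub, GeneratedPutObserver) are hypothetical stand-ins, and the boolean "keep or drop" return convention is an assumption made for this example:

```java
/** Stand-in for Put; NOT the real HBase client type. */
final class SimplePut {
    final String row;
    String value;
    SimplePut(String row, String value) { this.row = row; this.value = value; }
}

/** Stand-ins for the internal types the intention must never see. */
final class WalEditStub {}

enum DurabilityStub { USE_DEFAULT, SYNC_WAL }

/** Stand-in for the observer context, reduced to a bypass flag. */
final class HookContextStub {
    private boolean bypassed;
    void bypass() { bypassed = true; }
    boolean isBypassed() { return bypassed; }
}

/** The high level intention: knows about the put and nothing else. */
interface PutTransformer {
    /** May modify the put in place; returns false to drop the operation. */
    boolean transform(SimplePut put);
}

/**
 * Hand-written equivalent of a generated adapter. Only this class depends on
 * the low level signature, so that signature could change without touching
 * any PutTransformer.
 */
final class GeneratedPutObserver {
    private final PutTransformer transformer;

    GeneratedPutObserver(PutTransformer transformer) {
        this.transformer = transformer;
    }

    void prePut(HookContextStub c, SimplePut put, WalEditStub edit,
                DurabilityStub durability) {
        // edit and durability are deliberately not passed to the intention.
        if (!transformer.transform(put)) {
            c.bypass();
        }
    }
}
```

If prePut later gained or lost a parameter, only the adapter (that is, the code generator) would need to follow.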
We would aim for code generation that can be maintained by committers who are
not experts in JVM internals. That said, some complexity is unavoidable. I think
the promise of composability of fine grained intentions, API-level
supportability of hiding internal types, and the implied performance of
“inlining” intentions into straight line code for low level hooks could be well
worth it. We can mitigate maintenance risks by placing the Extension API and
code generator into its own Maven module. This module would provide a system
level coprocessor that must be installed via site configuration for
experimental “Extension” API support. It would be optional and decoupled from
the client and server core modules. Should at some point the project feel the
"Extension" API is not supportable, the module can simply be removed. It could
even start life as a separate tree, hosted on GitHub.
Because we are keeping the low level "kernel-hook"-style API, the lack of access
to internal types and any gaps in functional coverage in the higher level API
wouldn't be a problem. An implementor could always resort to direct use of low
level interfaces. Of course we would then want to figure out how to express the
desired extension in higher level terms.
> Introduce a higher level interface for registering interest in coprocessor
> upcalls
> ----------------------------------------------------------------------------------
>
> Key: HBASE-11125
> URL: https://issues.apache.org/jira/browse/HBASE-11125
> Project: HBase
> Issue Type: New Feature
> Reporter: Andrew Purtell
> Priority: Critical
>
> We should introduce a higher level interface for managing the registration of
> 'user' code for execution from the low level hooks. It should not be
> necessary for coprocessor implementers to learn the universe of available low
> level hooks and the subtleties of their placement within HBase core code.
> Instead the higher level API should allow the implementer to describe their
> intent and then this API should choose the appropriate low level hook
> placement.
> A very desirable side effect is a layer of indirection between coprocessor
> implementers and the actual hooks. This will address the perennial complaint
> that the low level hooks change too much from release to release, as recently
> discussed during the RM panel at HBaseCon. If we try to avoid changing the
> particular placement and arguments of hook functions in response to those
> complaints, this can be an onerous constraint on necessary internals
> evolution. Instead we can direct coprocessor implementers to consider the new
> API and provide the same interface stability guarantees there as we do for
> client API.