-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/816/
-----------------------------------------------------------

(Updated 2010-09-24 00:01:53.015127)


Review request for hbase, Andrew Purtell and Jonathan Gray.


Changes
-------

Changes since last:
- HTable.BatchCall and HTable.BatchCallback moved to Batch.Call and 
Batch.Callback
- added Batch.returning() factory method for generating common Batch.Call 
instances
- o.a.h.h.client.Exec now extends o.a.h.h.ipc.Invocation since there was quite 
a bit of overlap


To go into more detail about client usage, let me start with a slightly more 
applicable case for an exported CoprocessorProtocol (taken from the HBASE-2001 
patch).  Suppose a user creates a coprocessor that returns simple stats for the 
rows and key values in a region.  First the coprocessor would define an RPC 
interface:

  public interface CountProtocol extends CoprocessorProtocol {
    public long getRowCount();
    public long getRowCount(Filter filt);
    public long getKeyValueCount();
  }

The coprocessor code would implement this interface and either use the region 
coprocessor hooks to track state or to scan the data directly (we can happily 
ignore the details).

When the coprocessor is loaded with the region, the coprocessor framework 
identifies that the code implements a CoprocessorProtocol and automatically 
registers it with the region (see HRegion.registerProtocol()).

So we then need a way for clients to use a standard set of calls to invoke 
these registered protocols, when we know nothing about the details of the 
protocols themselves.  I looked at two options:


Option 1) Use a ioctl-type interface, passing the desired protocol class, 
method name (or code), and an optional list of arguments (in some cases we 
might also need the list of class types for the arguments).  Lookup the method 
in the protocol class based on the arguments or argument types if available, 
then batch the call to the server and return the result.  In this case, the 
client code might be:

  HTable table = new HTable("mytable");
  List<Row> rows = ...rows for regions to query...
  Invocation call = new Invocation(CountProtocol.class, "getRowCount");
  Map<byte[],Long> results = table.exec(rows, call);

This works alright for a basic method call, but there's no verification that 
the method exists or can be called with the given arguments until runtime.  In 
addition, if we want to make multiple method calls, we have to instantiate 
additional Invocation instances.  And if we want to tie those results together 
by region, we would need to do so by cross referencing the results in multiple 
maps.


Option 2) Expose a dynamic proxy of the protocol interface directly to the 
client code.  In this case, the client can directly call methods on the proxy, 
with compile time checking and readability as normal code, not deconstructed 
methods.  The Batch.Call interface allows the user to implement the client 
calls against a proxy instance we hand back.  As an optimization for the simple 
case from (1) we have a factory method to do the same thing with defining your 
own anonymous class:

  HTable table = new HTable("mytable");
  List<Row> rows = ...rows for regions to query...
  Batch.Call<CountProtocol,Long> call = Batch.returning(CountProtocol.class, 
"getRowCount");
  Map<byte[],Long> results = table.exec(CountProtocol.class, rows, call);

However, if you want to combine multiple calls to the same region or do some 
manipulation of the results, an anonymous class may be more useful:

  HTable table = new HTable("mytable");
  List<Row> rows = ...rows for regions to query...
  // combine row count and kv count for region
  Map<byte[],Pair<Long,Long>> results = table.exec(CountProtocol.class, rows,
      new Batch.Call<CountProtocol,Pair<Long,Long>>() {
        public Pair<Long,Long> call(CountProtocol counter) {
          return new Pair(counter.getRowCount(), counter.getKeyValueCount());
        }
      });

  // or return the average number of KVs per row
  Map<byte[],Double> results = table.exec(CountProtocol.class, rows,
      new Batch.Call<CountProtocol,Double>() {
        public Double call(CountProtocol counter) {
          return ((double)counter.getKeyValueCount()) / 
((double)counter.getRowCount());
        }
      });


The anonymous class does make for more annoying boilerplate in Java, but I 
think the usage of the CountProtocol interface is actually more 
straightforward.  And since the user code is executed together per-region, 
there's no need to manually stitch together region results from multiple method 
calls.  So despite the additional brace-noise, I think the proxied approach 
allow for a lot more client flexibility and cleaner code in terms of the calls 
and manipulations going on.

For some other examples see the 
org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol test case, or the 
TestCommandTarget test case in the HBASE-2001 patch up for review.


Summary
-------

This is really two separate patches in one, though with some overlapping 
changes.  If necessary I can split them apart for separate review.  Please let 
me know if that would make review easier.

Part 1:
==============
Port over of HADOOP-6422 to the HBase RPC code.  The goal of this change is to 
allow alternate RPC client/server implementations to be enabled through a 
simple configuration change.  Ultimately I would like to use this to allow 
secure RPC to be enabled through configuration, while not blocking normal 
(current) RPC operation on non-secure Hadoop versions.

This portion of the patch abstracts out two interfaces from the RPC code:

RpcEngine: HBaseRPC uses this to obtain proxy instances for client calls and 
server instances for HMaster and HRegionServer
RpcServer: this allows differing RPC server implementations, breaking the 
dependency on HBaseServer

The bulk of the current code from HBaseRPC is moved into WritableRpcEngine and 
is unchanged other than the interface requirements.  So the current call path 
remains the same, other than the HBaseRPC.getProtocolEngine() abstraction.


Part 2:
===============
The remaining changes provide server-side hooks for registering new RPC 
protocols/handlers (per-region to support coprocessors), and client side hooks 
to support dynamic execution of the registered protocols.

The new RPC protocol actions are constrained to 
org.apache.hadoop.hbase.ipc.CoprocessorProtocol implementations (which extends 
VersionedProtocol) to prevent arbitrary execution of methods against 
HMasterInterface, HRegionInterface, etc.

For protocol handler registration, HRegionServer provides a new method:

  public <T extends CoprocessorProtocol> boolean registerProtocol(
      byte[] region, Class<T> protocol, T handler)

which builds a Map of region name to protocol instances for dispatching client 
calls.


Client invocations are performed through HTable, which adds the following 
methods:


  public <T extends CoprocessorProtocol> T proxy(Class<T> protocol, Row row)

This directly returns a proxy instance to the CoprocessorProtocol 
implementation registered for the region serving row "row".  Any method calls 
will be proxied to the region's server and invoked using the map of registered 
region name -> handler instances.

Calls directed against multiple rows are a bit more complicated.  They are 
supported with the methods:

  public <T extends CoprocessorProtocol, R> void exec(
      Class<T> protocol, List<? extends Row> rows,
      BatchCall<T,R> callable, BatchCallback<R> callback)

  public <T extends CoprocessorProtocol, R> void exec(
      Class<T> protocol, RowRange range,
      BatchCall<T,R> callable, BatchCallback<R> callback)

where BatchCall and BatchCallback are simple interfaces defining the methods to 
be called and a callback instance to be invoked for each result.

For the sample CoprocessorProtocol interface:

  interface PingProtocol extends CoprocessorProtocol {
    public String ping();
    public String hello(String name);
  }

a client invocation might look like:

    final Map<byte[],R> results = new TreeMap<byte[],R>(...)
    List<Row> rows = ...
    table.exec(PingProtocol.class, rows,
        new HTable.BatchCall<PingProtocol,String>() {
          public String call(PingProtocol instance) {
            return instance.ping();
          }
        },
        new BatchCallback<R>(){
          public void update(byte[] region, byte[] row, R value) {
            results.put(region, value);
          }
        });

The BatchCall.call() method will be invoked for each row in the passed in list, 
and the BatchCallback.update() method will be invoked for each return value.  
However, currently the PingProtocol.ping() invocation will result in a separate 
RPC call per row, which is less that ideal.

Support is in place to make use of the HRegionServer.multi() invocations for 
batched RPC (see the org.apache.hadoop.hbase.client.Exec class), but this does 
not mesh well with the current client-side interface.

In addition to standard code review, I'd appreciate any thoughts on the client 
interactions in particular, and whether they would meet some of the anticipated 
uses of coprocessors.


This addresses bugs HBASE-2002 and HBASE-2321.
    http://issues.apache.org/jira/browse/HBASE-2002
    http://issues.apache.org/jira/browse/HBASE-2321


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/client/Action.java 556ea81 
  src/main/java/org/apache/hadoop/hbase/client/Batch.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/client/Exec.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/client/ExecResult.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/client/HConnection.java 65f7618 
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java fbdec0b 
  src/main/java/org/apache/hadoop/hbase/client/HTable.java 0dbf263 
  src/main/java/org/apache/hadoop/hbase/client/MultiAction.java c6ea838 
  src/main/java/org/apache/hadoop/hbase/client/MultiResponse.java 91bd04b 
  src/main/java/org/apache/hadoop/hbase/client/RowRange.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/client/Scan.java 29b3cb0 
  src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 83f623d 
  src/main/java/org/apache/hadoop/hbase/ipc/ConnectionHeader.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/ipc/CoprocessorProtocol.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/ipc/ExecRPCInvoker.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java 2b5eeb6 
  src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java e23a629 
  src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java e4c356d 
  src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java a4810a6 
  src/main/java/org/apache/hadoop/hbase/ipc/Invocation.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/ipc/RpcEngine.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/ipc/WritableRpcEngine.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/master/HMaster.java 36ba5c1 
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1be9cf5 
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java e9d7751 
  src/main/resources/hbase-default.xml 5452fd1 
  
src/test/java/org/apache/hadoop/hbase/regionserver/TestServerCustomProtocol.java
 PRE-CREATION 

Diff: http://review.cloudera.org/r/816/diff


Testing
-------


Thanks,

Gary

Reply via email to