> On 2010-09-25 16:18:25, Andrew Purtell wrote:
> > src/main/java/org/apache/hadoop/hbase/client/HTable.java, line 1400
> > <http://review.cloudera.org/r/816/diff/5/?file=12533#file12533line1400>
> >
> > Maybe for sake of clarity call this getStartKeysInRange?
>
> Gary Helmling wrote:
>     Makes sense as well, will rename.
>
> Himanshu Vashishtha wrote:
>     Gary: I meant this use case (might be irrelevant, considering scope of
>     cp):
>     If I want to know count of number of rows in a range (Row A, Row B): with
>     the help of this method, I can get the starting row of all regions that are
>     in this range but for the first (where you take the starting row of the
>     range: Row A in this case).
>     So, when the processing is done in the last region, one shd be aware of
>     the last row in the range, Row B. This processing will be done in the cp
>     impl, but that impl will be same on all region servers, so that check will be
>     there for all regions (no?).
>     It is entirely possible that this use case is not the one to be
>     supported by cp; or as I haven't really looked at mlai's code thoroughly yet,
>     might be missing something obvious.

A key point to understand in the HTable.exec() calls is that List<Row> and
RowRange arguments are _only_ used to locate the regions against which we'll
invoke the CoprocessorProtocol method. Since coprocessors run in place
per-region, we need to somehow indicate the region/coprocessor instances
where the method should be invoked.

Note that the List<Row> or RowRange that were passed to HTable.exec() are
_not_ made directly available to the CoprocessorProtocol method invoked, and
couldn't easily be passed without a different approach to the framework.

Of course, if the CoprocessorProtocol method needs a certain row restriction
to operate, you could just make it a parameter to the method -- sum(RowRange)
for example. But that is up to the CoprocessorProtocol implementor.

But I think this raises a good question: should the HTable interface use rows
to identify the regions (the current methods)

    exec(Class protocol, List<Row> rows, Call method)
    exec(Class protocol, RowRange range, Call method)

or would it be better to identify the regions directly using region name, or
HRegionInfo, etc

    exec(Class protocol, List<byte[]> regionNames, Call method)

Anyone have strong opinions here? My thought was that using rows was a bit
more consistent with other client calls, but maybe it raises the wrong
expectations.

For the moment I'll re-examine the javadoc to see if I can make this clearer,
but I'd appreciate other thoughts.


- Gary


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/816/#review1336
-----------------------------------------------------------


On 2010-09-30 17:13:36, Gary Helmling wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail.
> To reply, visit:
> http://review.cloudera.org/r/816/
> -----------------------------------------------------------
>
> (Updated 2010-09-30 17:13:36)
>
>
> Review request for hbase, Andrew Purtell and Jonathan Gray.
>
>
> Summary
> -------
>
> This is really two separate patches in one, though with some overlapping
> changes. If necessary I can split them apart for separate review. Please
> let me know if that would make review easier.
>
> Part 1:
> ==============
> Port over of HADOOP-6422 to the HBase RPC code. The goal of this change is
> to allow alternate RPC client/server implementations to be enabled through a
> simple configuration change. Ultimately I would like to use this to allow
> secure RPC to be enabled through configuration, while not blocking normal
> (current) RPC operation on non-secure Hadoop versions.
>
> This portion of the patch abstracts out two interfaces from the RPC code:
>
> RpcEngine: HBaseRPC uses this to obtain proxy instances for client calls and
>   server instances for HMaster and HRegionServer
> RpcServer: this allows differing RPC server implementations, breaking the
>   dependency on HBaseServer
>
> The bulk of the current code from HBaseRPC is moved into WritableRpcEngine
> and is unchanged other than the interface requirements. So the current call
> path remains the same, other than the HBaseRPC.getProtocolEngine()
> abstraction.
>
>
> Part 2:
> ===============
> The remaining changes provide server-side hooks for registering new RPC
> protocols/handlers (per-region to support coprocessors), and client side
> hooks to support dynamic execution of the registered protocols.
>
> The new RPC protocol actions are constrained to
> org.apache.hadoop.hbase.ipc.CoprocessorProtocol implementations (which
> extends VersionedProtocol) to prevent arbitrary execution of methods against
> HMasterInterface, HRegionInterface, etc.
>
> For protocol handler registration, HRegionServer provides a new method:
>
>   public <T extends CoprocessorProtocol> boolean registerProtocol(
>       byte[] region, Class<T> protocol, T handler)
>
> which builds a Map of region name to protocol instances for dispatching
> client calls.
>
>
> Client invocations are performed through HTable, which adds the following
> methods:
>
>
>   public <T extends CoprocessorProtocol> T proxy(Class<T> protocol, Row row)
>
> This directly returns a proxy instance to the CoprocessorProtocol
> implementation registered for the region serving row "row". Any method calls
> will be proxied to the region's server and invoked using the map of
> registered region name -> handler instances.
>
> Calls directed against multiple rows are a bit more complicated. They are
> supported with the methods:
>
>   public <T extends CoprocessorProtocol, R> void exec(
>       Class<T> protocol, List<? extends Row> rows,
>       BatchCall<T,R> callable, BatchCallback<R> callback)
>
>   public <T extends CoprocessorProtocol, R> void exec(
>       Class<T> protocol, RowRange range,
>       BatchCall<T,R> callable, BatchCallback<R> callback)
>
> where BatchCall and BatchCallback are simple interfaces defining the methods
> to be called and a callback instance to be invoked for each result.
>
> For the sample CoprocessorProtocol interface:
>
>   interface PingProtocol extends CoprocessorProtocol {
>     public String ping();
>     public String hello(String name);
>   }
>
> a client invocation might look like:
>
>   // R is String here, since PingProtocol.ping() returns a String
>   final Map<byte[],String> results = new TreeMap<byte[],String>(...);
>   List<Row> rows = ...
>   table.exec(PingProtocol.class, rows,
>       new HTable.BatchCall<PingProtocol,String>() {
>         public String call(PingProtocol instance) {
>           return instance.ping();
>         }
>       },
>       new BatchCallback<String>(){
>         public void update(byte[] region, byte[] row, String value) {
>           results.put(region, value);
>         }
>       });
>
> The BatchCall.call() method will be invoked for each row in the passed-in
> list, and the BatchCallback.update() method will be invoked for each return
> value. However, currently the PingProtocol.ping() invocation will result in
> a separate RPC call per row, which is less than ideal.
>
> Support is in place to make use of the HRegionServer.multi() invocations for
> batched RPC (see the org.apache.hadoop.hbase.client.Exec class), but this
> does not mesh well with the current client-side interface.
>
> In addition to standard code review, I'd appreciate any thoughts on the
> client interactions in particular, and whether they would meet some of the
> anticipated uses of coprocessors.
>
>
> This addresses bugs HBASE-2002 and HBASE-2321.
>     http://issues.apache.org/jira/browse/HBASE-2002
>     http://issues.apache.org/jira/browse/HBASE-2321
>
>
> Diffs
> -----
>
>   src/main/java/org/apache/hadoop/hbase/client/Action.java 556ea81
>   src/main/java/org/apache/hadoop/hbase/client/Batch.java PRE-CREATION
>   src/main/java/org/apache/hadoop/hbase/client/Exec.java PRE-CREATION
>   src/main/java/org/apache/hadoop/hbase/client/ExecResult.java PRE-CREATION
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 65f7618
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java fbdec0b
>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 0dbf263
>   src/main/java/org/apache/hadoop/hbase/client/MultiAction.java c6ea838
>   src/main/java/org/apache/hadoop/hbase/client/MultiResponse.java 91bd04b
>   src/main/java/org/apache/hadoop/hbase/client/RowRange.java PRE-CREATION
>   src/main/java/org/apache/hadoop/hbase/client/Scan.java 29b3cb0
>   src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 83f623d
>   src/main/java/org/apache/hadoop/hbase/ipc/ConnectionHeader.java PRE-CREATION
>   src/main/java/org/apache/hadoop/hbase/ipc/CoprocessorProtocol.java PRE-CREATION
>   src/main/java/org/apache/hadoop/hbase/ipc/ExecRPCInvoker.java PRE-CREATION
>   src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java 2b5eeb6
>   src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java e23a629
>   src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java e4c356d
>   src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 27f9cc0
>   src/main/java/org/apache/hadoop/hbase/ipc/Invocation.java PRE-CREATION
>   src/main/java/org/apache/hadoop/hbase/ipc/RpcEngine.java PRE-CREATION
>   src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java PRE-CREATION
>   src/main/java/org/apache/hadoop/hbase/ipc/WritableRpcEngine.java PRE-CREATION
>   src/main/java/org/apache/hadoop/hbase/master/HMaster.java fb1e834
>   src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 0a4fbce
>   src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 595cf2e
>   src/main/resources/hbase-default.xml 5fafe65
>   src/test/java/org/apache/hadoop/hbase/regionserver/TestServerCustomProtocol.java PRE-CREATION
>
> Diff: http://review.cloudera.org/r/816/diff
>
>
> Testing
> -------
>
>
> Thanks,
>
> Gary
>
>
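
To make the reply at the top of this message concrete, here is a minimal sketch of the row-count use case, assuming the range is passed to the protocol method as an explicit parameter (the sum(RowRange) idea). It does not use any HBase classes: regions are simulated with an in-memory sorted map, rows are plain strings, the end row is treated as exclusive, and RangeCountSketch, regionsInRange, countInRegion and countRows are hypothetical names chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical stand-in for the exec()/CoprocessorProtocol flow; no HBase classes are used. */
public class RangeCountSketch {

    /** Simulated table: region start key -> sorted row keys held by that region. */
    static final NavigableMap<String, TreeSet<String>> REGIONS = new TreeMap<>();
    static {
        REGIONS.put("", new TreeSet<>(List.of("a", "c", "e")));   // region ["", "g")
        REGIONS.put("g", new TreeSet<>(List.of("g", "k", "m")));  // region ["g", "n")
        REGIONS.put("n", new TreeSet<>(List.of("n", "r", "x")));  // region ["n", end)
    }

    /** Client side: the range is used only to pick regions, identified by start key. */
    public static List<String> regionsInRange(String startRow, String endRow) {
        // The region holding startRow has the greatest start key <= startRow.
        String first = REGIONS.floorKey(startRow);
        // Every region whose start key falls in [first, endRow) overlaps the range.
        return new ArrayList<>(REGIONS.subMap(first, true, endRow, false).keySet());
    }

    /** Server side: the range arrives as a method parameter and is clamped per region. */
    public static long countInRegion(String regionStartKey, String startRow, String endRow) {
        return REGIONS.get(regionStartKey).subSet(startRow, true, endRow, false).size();
    }

    /** Client side: invoke once per located region and merge the per-region results. */
    public static long countRows(String startRow, String endRow) {
        long total = 0;
        for (String region : regionsInRange(startRow, endRow)) {
            total += countInRegion(region, startRow, endRow);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(regionsInRange("c", "k"));
        System.out.println(countRows("c", "k"));
    }
}
```

The same check runs in every region, as Himanshu suggests, but it is harmless: regions in the middle of the range simply count all of their rows, and only the first and last regions are actually trimmed by Row A and Row B.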