I think it's tricky, because we dont really encourage people to think
of regions, but think of rows instead.  The fact that regions exist is
a bit of an implementation detail, although like indexes in databases
a critical and crucial one that we cant really ignore in the end.
Requiring people to specify regions by start/stop row but without
giving them a lot of support to discover and identify regions might
not be such a good idea.  Especially if we allow people to specify
arbitrary row ranges without regards to regions, and the rest is 'just
implementation'... it might be better to stick to something like a
pair of row keys or something.

On Thu, Sep 30, 2010 at 5:43 PM, Gary Helmling <ghelml...@gmail.com> wrote:
>
>
>> On 2010-09-25 16:18:25, Andrew Purtell wrote:
>> > src/main/java/org/apache/hadoop/hbase/client/HTable.java, line 1400
>> > <http://review.cloudera.org/r/816/diff/5/?file=12533#file12533line1400>
>> >
>> >     Maybe for sake of clarity call this getStartKeysInRange?
>>
>> Gary Helmling wrote:
>>     Makes sense as well, will rename.
>>
>> Himanshu Vashishtha wrote:
>>     Gary: I meant this use case (might be irrelevant, considering scope of 
>> cp):
>>     If I want to know count of number of rows in a range (Row A, Row B): 
>> with the help of this method, I can get the starting row of all regions that 
>> are in this range but for the first (where you take the starting row of the 
>> range: Row A in this case).
>>     So, when the processing is done in the last region, one shd be aware of 
>> the last row in the range, Row B. This processing will be done in the cp 
>> impl, but that impl will be same on all region servers, so that check will 
>> be there for all regions (no?).
>>     It is entirely possible  that this use case is not the one to be 
>> supported by cp; or as I haven't really looked at mlai's code thoroughly 
>> yet, might be missing something obvious.
>
> A key point to understand in the HTable.exec() calls is that List<Row> and 
> RowRange arguments are _only_ used to locate the regions against which we'll 
> invoke the CoprocessorProtocol method.  Since coprocessors run in place 
> per-region, we need to somehow indicate the region/coprocessor instances 
> where the method should be invoked.
>
> Note that the List<Row> or RowRange that were passed to HTable.exec() are 
> _not_ made directly available to the CoprocessorProtocol method invoked, and 
> couldn't easily be passed without a different approach to the framework.
>
> Of course, if the CoprocessorProtocol method needs a certain row restriction 
> to operate, you could just make it a parameter to the method -- sum(RowRange) 
> for example.  But that is up to the CoprocessorProtocol implementor.
>
> But I think this raises a good question: should the HTable interface use rows 
> to identify the regions (the current methods)
>   exec(Class protocol, List<Row> rows, Call method)
>   exec(Class protocol, RowRange range, Call method)
>
> or would it be better to identify the regions directly using region name, or 
> HRegionInfo, etc
>   exec(Class protocol, List<byte[]> regionNames, Call method)
>
>
> Anyone have strong opinions here?  My thought was that using rows was a bit 
> more consistent with other client calls, but maybe it raises the wrong 
> expectations.  For the moment I'll re-examine the javadoc to see if I can 
> make this clearer, but I'd appreciate other thoughts.
>
>
> - Gary
>
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/816/#review1336
> -----------------------------------------------------------
>
>
> On 2010-09-30 17:13:36, Gary Helmling wrote:
>>
>> -----------------------------------------------------------
>> This is an automatically generated e-mail. To reply, visit:
>> http://review.cloudera.org/r/816/
>> -----------------------------------------------------------
>>
>> (Updated 2010-09-30 17:13:36)
>>
>>
>> Review request for hbase, Andrew Purtell and Jonathan Gray.
>>
>>
>> Summary
>> -------
>>
>> This is really two separate patches in one, though with some overlapping 
>> changes.  If necessary I can split them apart for separate review.  Please 
>> let me know if that would make review easier.
>>
>> Part 1:
>> ==============
>> Port over of HADOOP-6422 to the HBase RPC code.  The goal of this change is 
>> to allow alternate RPC client/server implementations to be enabled through a 
>> simple configuration change.  Ultimately I would like to use this to allow 
>> secure RPC to be enabled through configuration, while not blocking normal 
>> (current) RPC operation on non-secure Hadoop versions.
>>
>> This portion of the patch abstracts out two interfaces from the RPC code:
>>
>> RpcEngine: HBaseRPC uses this to obtain proxy instances for client calls and 
>> server instances for HMaster and HRegionServer
>> RpcServer: this allows differing RPC server implementations, breaking the 
>> dependency on HBaseServer
>>
>> The bulk of the current code from HBaseRPC is moved into WritableRpcEngine 
>> and is unchanged other than the interface requirements.  So the current call 
>> path remains the same, other than the HBaseRPC.getProtocolEngine() 
>> abstraction.
>>
>>
>> Part 2:
>> ===============
>> The remaining changes provide server-side hooks for registering new RPC 
>> protocols/handlers (per-region to support coprocessors), and client side 
>> hooks to support dynamic execution of the registered protocols.
>>
>> The new RPC protocol actions are constrained to 
>> org.apache.hadoop.hbase.ipc.CoprocessorProtocol implementations (which 
>> extends VersionedProtocol) to prevent arbitrary execution of methods against 
>> HMasterInterface, HRegionInterface, etc.
>>
>> For protocol handler registration, HRegionServer provides a new method:
>>
>>   public <T extends CoprocessorProtocol> boolean registerProtocol(
>>       byte[] region, Class<T> protocol, T handler)
>>
>> which builds a Map of region name to protocol instances for dispatching 
>> client calls.
>>
>>
>> Client invocations are performed through HTable, which adds the following 
>> methods:
>>
>>
>>   public <T extends CoprocessorProtocol> T proxy(Class<T> protocol, Row row)
>>
>> This directly returns a proxy instance to the CoprocessorProtocol 
>> implementation registered for the region serving row "row".  Any method 
>> calls will be proxied to the region's server and invoked using the map of 
>> registered region name -> handler instances.
>>
>> Calls directed against multiple rows are a bit more complicated.  They are 
>> supported with the methods:
>>
>>   public <T extends CoprocessorProtocol, R> void exec(
>>       Class<T> protocol, List<? extends Row> rows,
>>       BatchCall<T,R> callable, BatchCallback<R> callback)
>>
>>   public <T extends CoprocessorProtocol, R> void exec(
>>       Class<T> protocol, RowRange range,
>>       BatchCall<T,R> callable, BatchCallback<R> callback)
>>
>> where BatchCall and BatchCallback are simple interfaces defining the methods 
>> to be called and a callback instance to be invoked for each result.
>>
>> For the sample CoprocessorProtocol interface:
>>
>>   interface PingProtocol extends CoprocessorProtocol {
>>     public String ping();
>>     public String hello(String name);
>>   }
>>
>> a client invocation might look like:
>>
>>     final Map<byte[],R> results = new TreeMap<byte[],R>(...)
>>     List<Row> rows = ...
>>     table.exec(PingProtocol.class, rows,
>>         new HTable.BatchCall<PingProtocol,String>() {
>>           public String call(PingProtocol instance) {
>>             return instance.ping();
>>           }
>>         },
>>         new BatchCallback<R>(){
>>           public void update(byte[] region, byte[] row, R value) {
>>             results.put(region, value);
>>           }
>>         });
>>
>> The BatchCall.call() method will be invoked for each row in the passed in 
>> list, and the BatchCallback.update() method will be invoked for each return 
>> value.  However, currently the PingProtocol.ping() invocation will result in 
>> a separate RPC call per row, which is less that ideal.
>>
>> Support is in place to make use of the HRegionServer.multi() invocations for 
>> batched RPC (see the org.apache.hadoop.hbase.client.Exec class), but this 
>> does not mesh well with the current client-side interface.
>>
>> In addition to standard code review, I'd appreciate any thoughts on the 
>> client interactions in particular, and whether they would meet some of the 
>> anticipated uses of coprocessors.
>>
>>
>> This addresses bugs HBASE-2002 and HBASE-2321.
>>     http://issues.apache.org/jira/browse/HBASE-2002
>>     http://issues.apache.org/jira/browse/HBASE-2321
>>
>>
>> Diffs
>> -----
>>
>>   src/main/java/org/apache/hadoop/hbase/client/Action.java 556ea81
>>   src/main/java/org/apache/hadoop/hbase/client/Batch.java PRE-CREATION
>>   src/main/java/org/apache/hadoop/hbase/client/Exec.java PRE-CREATION
>>   src/main/java/org/apache/hadoop/hbase/client/ExecResult.java PRE-CREATION
>>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 65f7618
>>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 
>> fbdec0b
>>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 0dbf263
>>   src/main/java/org/apache/hadoop/hbase/client/MultiAction.java c6ea838
>>   src/main/java/org/apache/hadoop/hbase/client/MultiResponse.java 91bd04b
>>   src/main/java/org/apache/hadoop/hbase/client/RowRange.java PRE-CREATION
>>   src/main/java/org/apache/hadoop/hbase/client/Scan.java 29b3cb0
>>   src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 83f623d
>>   src/main/java/org/apache/hadoop/hbase/ipc/ConnectionHeader.java 
>> PRE-CREATION
>>   src/main/java/org/apache/hadoop/hbase/ipc/CoprocessorProtocol.java 
>> PRE-CREATION
>>   src/main/java/org/apache/hadoop/hbase/ipc/ExecRPCInvoker.java PRE-CREATION
>>   src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java 2b5eeb6
>>   src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java e23a629
>>   src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java e4c356d
>>   src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 27f9cc0
>>   src/main/java/org/apache/hadoop/hbase/ipc/Invocation.java PRE-CREATION
>>   src/main/java/org/apache/hadoop/hbase/ipc/RpcEngine.java PRE-CREATION
>>   src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java PRE-CREATION
>>   src/main/java/org/apache/hadoop/hbase/ipc/WritableRpcEngine.java 
>> PRE-CREATION
>>   src/main/java/org/apache/hadoop/hbase/master/HMaster.java fb1e834
>>   src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 0a4fbce
>>   src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 
>> 595cf2e
>>   src/main/resources/hbase-default.xml 5fafe65
>>   
>> src/test/java/org/apache/hadoop/hbase/regionserver/TestServerCustomProtocol.java
>>  PRE-CREATION
>>
>> Diff: http://review.cloudera.org/r/816/diff
>>
>>
>> Testing
>> -------
>>
>>
>> Thanks,
>>
>> Gary
>>
>>
>
>

Reply via email to