[ 
https://issues.apache.org/jira/browse/HBASE-9291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860763#comment-13860763
 ] 

James Taylor commented on HBASE-9291:
-------------------------------------

When sending the cache for Put operations, there needs to be a guarantee that 
the region server has the cached data before any of the coprocessor hooks are 
called on the server side. If this data is attached to the first Put for each 
region server, is there any guarantee that one of the other regions isn't 
processed first (since these are sent in parallel from the client)?

I think the join/scan scenarios may be more complicated, as Phoenix does its 
own parallelization of the scan by breaking it up into row key ranges. From the 
POV of the HBase client, these look like separate scans. I think we're stuck 
establishing the region server cache ourselves for this case.

Given the flexibility of region observer coprocessors, I'm sure we can work out 
a way to send the cache through these rather than through an endpoint 
coprocessor. For example, we can just issue a Get with a single key per region 
server to get the data over. FWIW, if the data is not on the region server as 
expected, we'll end up throwing an exception and the client will retry.
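
To make the "throw and let the client retry" behavior concrete, here is a rough 
sketch of what a server-side cache could look like. It is only an illustration 
under assumptions: ServerCacheSketch, CacheMissingException, put, getOrThrow and 
the TTL-based age-out are all hypothetical names and choices, not HBase or 
Phoenix APIs. The clock value is passed in explicitly to keep the sketch easy to 
reason about.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; none of these names come from HBase or Phoenix. */
final class ServerCacheSketch {

    /** Thrown when a hook needs cached data that is absent or has aged out. */
    static final class CacheMissingException extends RuntimeException {
        CacheMissingException(String cacheId) {
            super("No cached data for id " + cacheId + "; client should resend and retry");
        }
    }

    /** Cached payload plus the time at which it stops being valid. */
    private static final class Entry {
        final byte[] data;
        final long expiresAtMillis;

        Entry(byte[] data, long expiresAtMillis) {
            this.data = data;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;

    ServerCacheSketch(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Called when the client ships the data over (assumed: once per region server). */
    void put(String cacheId, byte[] data, long nowMillis) {
        entries.put(cacheId, new Entry(data.clone(), nowMillis + ttlMillis));
    }

    /** Called from a Put or Scan hook; throws so that the client can retry. */
    byte[] getOrThrow(String cacheId, long nowMillis) {
        Entry e = entries.get(cacheId);
        if (e != null && nowMillis < e.expiresAtMillis) {
            return e.data;
        }
        if (e != null) {
            // Age out the stale entry, but only if it has not been replaced meanwhile.
            entries.remove(cacheId, e);
        }
        throw new CacheMissingException(cacheId);
    }

    /** Explicit cleanup once the client is done with the operation. */
    void remove(String cacheId) {
        entries.remove(cacheId);
    }
}
```

The TTL covers the case where the client dies before it can call remove, and 
the exception path covers hooks that run before the data has arrived.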

As for HBASE-6505, we couldn't take advantage of it since it only allows state 
to be shared between instances of the same coprocessor. In this case, the 
coprocessor that sends the data to cache is different from the ones that use 
the data (our Put and Scan region observer coprocessors).

> Enable client to setAttribute that is sent once to each region server
> ---------------------------------------------------------------------
>
>                 Key: HBASE-9291
>                 URL: https://issues.apache.org/jira/browse/HBASE-9291
>             Project: HBase
>          Issue Type: New Feature
>          Components: IPC/RPC
>            Reporter: James Taylor
>
> Currently a Scan and Mutation allow the client to set its own attributes that 
> get passed through the RPC layer and are accessible from a coprocessor. This 
> is very handy, but breaks down if the amount of information is large, since 
> this information ends up being sent again and again to every region. Clients 
> can work around this with an endpoint "pre" and "post" coprocessor invocation 
> that:
> 1) sends the information and caches it on the region server in the "pre" 
> invocation
> 2) invokes the Scan or sends the batch of Mutations, and then
> 3) removes it in the "post" invocation.
> In this case, the client is forced to identify all region servers (ideally, 
> all region servers that will be involved in the Scan/Mutation), make extra 
> RPC calls, manage the caching of the information on the region server, 
> age-out the information (in case the client dies before step (3) that clears 
> the cached information), and must deal with the possibility of a split 
> occurring while this operation is in-progress.
> Instead, it'd be much better if an attribute could be identified as a "region 
> server" attribute in OperationWithAttributes and the HBase RPC layer would 
> take care of doing the above.
> The use cases where the above is necessary in Phoenix include:
> 1) Hash joins, where the results of the smaller side of a join scan are 
> packaged up and sent to each region server, and
> 2) Secondary indexing, where the metadata describing a) which column 
> family/column qualifier pairs and b) which parts of the row key contribute to 
> which indexes is sent to each region server that will process a batched put.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
