[jira] Commented: (HBASE-2038) Coprocessors: Region level indexing

Alex Baranau (JIRA) Sun, 21 Nov 2010 10:08:23 -0800

    [ https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934308#action_12934308 ]


Alex Baranau commented on HBASE-2038:
-------------------------------------

Hello,

Now that the first cut of the Coprocessors (CP) implementation has been committed 
to trunk (HBASE-2001 and HBASE-2002), I think this is a good opportunity to get 
going with this issue. I believe it's a good time for this effort, and I hope that 
a CP-based implementation of region-level indexing will confirm that the CP API is 
complete and provides everything one might need (for now).

I have revisited the design/approach of the IHBase contrib and have several 
questions about porting the code to CPs. It would be great if someone could 
help me with them!

1) Are coprocessors meant to be stateless? If not, then I assume that one 
instance is created and "assigned" to a region, and that a CP implementation 
must be thread-safe (e.g. multiple scanners can be active on the same region 
at the same time). Otherwise, if coprocessors are meant to be stateless, I 
believe that CoprocessorEnvironment's get/put/remove methods are meant for 
storing intermediate data (aka attributes) between method calls (if we really 
need it). Is a CoprocessorEnvironment instance created per region? I know that, 
e.g., I can store some scan-related data keyed by the scanId passed to the 
scan-related callbacks (is that safe?), but what about region-related data? 
(That would be no problem if the CP environment is created once per region.)
In general, do I understand the CP API correctly (based on the assumptions I 
have described in this point)?
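To make the "stateful, one instance per region" reading of question 1 concrete, here is a minimal sketch assuming that answer is correct: a single object is shared by all writers and scanners of one region, so region-level state uses an atomic counter and per-scanner state is kept in a concurrent map keyed by scanner id. Everything here (RegionIndexState, onPut, onScannerOpen, onScannerNext, onScannerClose) is hypothetical and illustrative only; it is not the HBase coprocessor API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not HBase API: models a stateful observer with one
// instance per region, shared by every scanner and writer on that region.
class RegionIndexState {
    // Region-level state (assumed example: number of puts seen by the index).
    private final AtomicLong indexedPuts = new AtomicLong();

    // Scan-level state keyed by scanner id. ConcurrentHashMap because
    // several scanners on the same region may be open at once.
    private final Map<Long, String> scanCursors = new ConcurrentHashMap<>();

    void onPut() {
        indexedPuts.incrementAndGet();
    }

    void onScannerOpen(long scannerId, String startRow) {
        scanCursors.put(scannerId, startRow);
    }

    // Remember the last row returned so the next call can resume from it.
    void onScannerNext(long scannerId, String lastRowReturned) {
        scanCursors.replace(scannerId, lastRowReturned);
    }

    // Drop per-scanner state on close so the map does not grow without bound.
    void onScannerClose(long scannerId) {
        scanCursors.remove(scannerId);
    }

    String cursorOf(long scannerId) { return scanCursors.get(scannerId); }
    long indexedPuts() { return indexedPuts.get(); }
    int openScanners() { return scanCursors.size(); }

    public static void main(String[] args) throws InterruptedException {
        RegionIndexState state = new RegionIndexState();
        // Four concurrent writers: the AtomicLong keeps the count exact.
        Thread[] writers = new Thread[4];
        for (int t = 0; t < writers.length; t++) {
            writers[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) state.onPut();
            });
            writers[t].start();
        }
        for (Thread w : writers) w.join();

        state.onScannerOpen(1L, "row-a");
        state.onScannerOpen(2L, "row-m");
        state.onScannerNext(1L, "row-c");
        state.onScannerClose(2L);
        System.out.println(state.indexedPuts() + " " + state.openScanners()
                + " " + state.cursorOf(1L)); // prints "4000 1 row-c"
    }
}
```

If coprocessors instead turn out to be stateless, the same two maps would have to live in whatever per-region storage the environment offers, and the thread-safety requirement would move there with them.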

2) During a batch scan (something that was added in trunk but wasn't supported 
in previous HBase versions, so the current IHBase implementation doesn't take 
it into account) we need to return multiple rows from the scanner's next() 
method. It looks like if we apply the approach of the current IHBase 
implementation, "fast-forwarding" to the next matching value, we'll only 
fast-forward the scan to the *first* of the values to return. The others will 
be fetched using the "usual" scan logic without the index, which isn't 
efficient. There's not a lot we can do without changing the scan (and deeper) 
code. Am I right here? Perhaps it's OK for the first version of CP-based IHBase 
to lack support for batch reads? Or should we change the approach?
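The cost difference described in question 2 can be illustrated with a toy model, assuming a region is just a sorted set of row keys and the index is a sorted set of the keys that match the query. seekOnceThenScan uses the index only to jump to the first match and then filters rows sequentially; seekEveryRow consults the index before each row. The class, method names and the "rows examined" metric are hypothetical; this is not HBase scanner code, and doing the per-row variant for real would require the scanner changes mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Toy model, not HBase code: a region's sorted row keys plus an index that
// lists the rows matching a query. It counts how many data rows a batch read
// touches when the index is consulted once versus before every row.
class BatchSeekSketch {
    private final NavigableSet<String> regionRows;   // all rows, in key order
    private final NavigableSet<String> matchingRows; // rows the index says match
    int rowsExamined;                                // data rows read by the last call

    BatchSeekSketch(NavigableSet<String> regionRows, NavigableSet<String> matchingRows) {
        this.regionRows = regionRows;
        this.matchingRows = matchingRows;
    }

    // Fast-forward once to the first match, then scan sequentially and
    // filter each row (the "usual" scan logic for the rest of the batch).
    List<String> seekOnceThenScan(String from, int batch) {
        rowsExamined = 0;
        List<String> out = new ArrayList<>();
        String first = matchingRows.ceiling(from);
        if (first == null) return out;
        for (String row : regionRows.tailSet(first, true)) {
            rowsExamined++;
            if (matchingRows.contains(row)) {
                out.add(row);
                if (out.size() == batch) break;
            }
        }
        return out;
    }

    // Consult the index before every row, jumping straight to the next match.
    List<String> seekEveryRow(String from, int batch) {
        rowsExamined = 0;
        List<String> out = new ArrayList<>();
        String next = matchingRows.ceiling(from);
        while (next != null && out.size() < batch) {
            rowsExamined++;
            out.add(next);
            next = matchingRows.higher(next);
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableSet<String> rows = new TreeSet<>();
        NavigableSet<String> matches = new TreeSet<>();
        for (int i = 0; i < 100; i++) {
            String key = String.format("r%02d", i);
            rows.add(key);
            if (i % 10 == 0) matches.add(key); // every 10th row matches
        }
        BatchSeekSketch s = new BatchSeekSketch(rows, matches);
        // prints "[r10, r20, r30] examined=21"
        System.out.println(s.seekOnceThenScan("r05", 3) + " examined=" + s.rowsExamined);
        // prints "[r10, r20, r30] examined=3"
        System.out.println(s.seekEveryRow("r05", 3) + " examined=" + s.rowsExamined);
    }
}
```

Both strategies return the same rows; with one match in every ten rows, seeking only once reads 21 rows for a batch of 3, while seeking before every row reads just the 3 matches. The gap grows as the index gets more selective.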

3) In general, is it a good idea for me to take on this initiative (porting the 
IHBase implementation to CPs)? I fear that, due to the many changes in the 
HBase codebase (trunk versus e.g. 0.20.5), the design of the index 
implementation may have to change substantially from the current one (which I 
could use as a base). In that case, poking you guys (HBase devs) *a lot* to 
learn the details probably isn't a very efficient way to work on this issue :) 
Anyway, I'd be glad to work on the issue if someone can provide the needed 
guidance.

4) I haven't dug into the THBase contrib (as I have into IHBase). Will these 
contribs (IHBase and THBase) be ported to CP-based implementations as a single 
effort? I believe they won't be merged, given how differently they work now. 
Was it really intended to put the tasks for *both* into a single JIRA issue?

Thank you!

> Coprocessors: Region level indexing
> -----------------------------------
>
>                 Key: HBASE-2038
>                 URL: https://issues.apache.org/jira/browse/HBASE-2038
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>            Priority: Minor
>
> HBASE-2037 is a good candidate to be done as a coprocessor. It would also serve as a 
> good goalpost for coprocessor environment design -- there should be enough of 
> it so region level indexing can be reimplemented as a coprocessor without any 
> loss of functionality. 
