[
https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934308#action_12934308
]
Alex Baranau commented on HBASE-2038:
-------------------------------------
Hello,
Now that the first cut of the Coprocessors (CP) implementation has been
committed to trunk (HBASE-2001 and HBASE-2002), I think it is a good time to
get going with this issue. I hope that a CP-based implementation of
region-level indexing will confirm that the CP API is complete and has
everything one might need (for now).
I reviewed the design/approach of the IHBase contrib and have several questions
about porting the code to CPs. It would be great if someone could help me with
them!
1) Are coprocessors meant to be stateless? If not, then I assume that one
instance is created and "assigned" to a region, and that the CP implementation
should be thread-safe (e.g. multiple scanners can be handled at the same time
for the region). Otherwise, if coprocessors are meant to be stateless, I
believe that CoprocessorEnvironment's get/put/remove methods are meant to store
intermediate data (aka attributes) between method calls (if we really need it).
Is one CoprocessorEnvironment instance created per region? I know that, e.g., I
can store scan-related data keyed by the scanId passed to the scan-related
callbacks (is that safe?), but what about region-related data (no problem with
it if the CP env is one-per-region)?
In general, do I understand the CP API correctly (based on the assumptions I
share in this point)?
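To make the thread-safety concern in point 1 concrete, here is a minimal
sketch of per-region state shared by concurrent scanners. It assumes one
object per region and deliberately does not use the real coprocessor API:
PerRegionScanState and its onScanner* methods are hypothetical names that only
stand in for the scan-related callbacks, and the String row keys are a
simplification.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch, NOT the HBase coprocessor API: one instance per region,
// called concurrently by every scanner that is open on that region.
final class PerRegionScanState {
    // scanner id -> last row key this scanner was positioned at
    private final ConcurrentMap<Long, String> lastRowByScanner =
            new ConcurrentHashMap<>();

    // Stand-in for a "scanner opened" callback.
    void onScannerOpen(long scanId) {
        lastRowByScanner.put(scanId, "");
    }

    // Stand-in for a "scanner next" callback. replace() is atomic and only
    // updates scanners that are still registered, so a late call that races
    // with close cannot resurrect the entry.
    void onScannerNext(long scanId, String rowKey) {
        lastRowByScanner.replace(scanId, rowKey);
    }

    // Stand-in for a "scanner closed" callback; drops the per-scan state.
    void onScannerClose(long scanId) {
        lastRowByScanner.remove(scanId);
    }

    // Returns null when the scanner is unknown or already closed.
    String lastRow(long scanId) {
        return lastRowByScanner.get(scanId);
    }

    int openScanners() {
        return lastRowByScanner.size();
    }
}
```

If the environment turns out to be one-per-region, the same map could
presumably live behind its get/put/remove methods instead of in a field.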
2) During a batch scan (something that was added in trunk but wasn't supported
in previous HBase versions, and hence the current IHBase implementation doesn't
take it into account) we need to return multiple rows from the scan's next()
method. It looks like if we apply the current approach (from the current IHBase
implementation) of "fast forwarding" to the next value, we'll only fast-forward
the scan to the *first* value of those to return. The others will be fetched
using the "usual" scan logic without using the index, which isn't efficient.
There's not a lot we can do without changing the scan (and deeper) code. Am I
right here? Perhaps it's OK to lack support for batch reads in the first
version of CP-based IHBase? Or might it be that we should change the approach?
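The cost difference described in point 2 can be illustrated with a toy model.
This is only a sketch under simplifying assumptions: the region and the index
are plain sorted sets of String row keys, BatchSeekSketch and its methods are
hypothetical names, and none of this is the actual IHBase or HBase scan code.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical toy model, NOT HBase code: counts how many rows a batch read
// touches with and without consulting the index for every returned row.
final class BatchSeekSketch {
    static NavigableSet<String> setOf(String... rows) {
        return new TreeSet<>(Arrays.asList(rows));
    }

    // The index positions only the first row of the batch; the remaining rows
    // are found by walking the region sequentially and filtering each row.
    static int rowsTouchedSeekOnce(NavigableSet<String> regionRows,
                                   NavigableSet<String> matching,
                                   String from, int batch) {
        String first = matching.ceiling(from);
        if (first == null) {
            return 0;
        }
        int touched = 0;
        int returned = 0;
        for (String row : regionRows.tailSet(first, true)) {
            if (returned == batch) {
                break;
            }
            touched++;
            if (matching.contains(row)) { // stand-in for the scan's filter
                returned++;
            }
        }
        return touched;
    }

    // The index is consulted again before each row of the batch, so only
    // matching rows are ever touched.
    static int rowsTouchedSeekPerRow(NavigableSet<String> matching,
                                     String from, int batch) {
        int touched = 0;
        String row = matching.ceiling(from);
        while (row != null && touched < batch) {
            touched++;
            row = matching.higher(row);
        }
        return touched;
    }
}
```

In this model the gap between the two counts grows with the distance between
matching rows, which is the inefficiency in question for sparse matches.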
3) Is it in general a good idea for me to take this initiative (transforming
the IHBase implementation into a CP-based one)? I fear that, due to the many
changes in the HBase codebase (trunk versus e.g. 0.20.5), there are going to be
severe changes in the approach/design of the index implementation (from the
current one, which I could use as a base), so poking you guys (HBase devs)
*a lot* (if really needed) to learn about it wouldn't be a very efficient way
to work on this issue :) Anyway, I'd be glad to work on the issue if someone
can provide the needed guidance.
4) I haven't dug into the THBase contrib (as I have into IHBase). Will these
contribs (IHBase and THBase) be "transferred" to a CP-based implementation as a
single effort? I believe they won't be merged, given how differently they work
now. Was it really intended to put the tasks for *both* into a single JIRA
issue?
Thank you!
> Coprocessors: Region level indexing
> -----------------------------------
>
> Key: HBASE-2038
> URL: https://issues.apache.org/jira/browse/HBASE-2038
> Project: HBase
> Issue Type: Sub-task
> Reporter: Andrew Purtell
> Priority: Minor
>
> HBASE-2037 is a good candidate to be done as a coprocessor. It also serves as a
> good goalpost for coprocessor environment design -- there should be enough of
> it so region level indexing can be reimplemented as a coprocessor without any
> loss of functionality.