[ 
https://issues.apache.org/jira/browse/HBASE-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784016#action_12784016
 ] 

Andrew Purtell commented on HBASE-2001:
---------------------------------------

{quote}
"Regions contain references to the coprocessor implementation classes 
associated with them."
Q: On the above, it's indeed the classes, not objects?  Objects can cross the 
split?  Not easily anyways.
{quote}

When a region splits, each daughter would allocate new coprocessor object 
instances as it opens, one instance for each of the coprocessor classes listed 
in the region metadata, and each instance's onOpen method would be invoked to 
give it a chance to initialize. Before that, the parent would be informed of 
the impending split via an onSplit invocation, and when the parent closes, 
onClose would be called so the coprocessor can clean up. Managing the split 
beyond this would be the coprocessor's problem. 
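
A minimal sketch of that ordering, for illustration only. The method names 
onOpen, onSplit and onClose are the ones discussed above; the signatures and 
the LifecycleCoprocessor, LoggingCoprocessor, SketchRegion and 
SplitLifecycleDemo types are hypothetical and not part of the attached 
HCoprocessor proposal. The region holds Class references and instantiates 
fresh objects in each daughter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback interface; signatures are assumptions.
interface LifecycleCoprocessor {
    void onOpen(String region);
    void onSplit(String region);
    void onClose(String region);
}

// Records every callback so the ordering can be inspected.
class LoggingCoprocessor implements LifecycleCoprocessor {
    static final List<String> LOG = new ArrayList<>();
    public void onOpen(String region)  { LOG.add("onOpen:" + region); }
    public void onSplit(String region) { LOG.add("onSplit:" + region); }
    public void onClose(String region) { LOG.add("onClose:" + region); }
}

// Stand-in for a region: it keeps classes in its metadata, not objects.
class SketchRegion {
    final String name;
    final List<Class<? extends LifecycleCoprocessor>> coprocessorClasses;
    final List<LifecycleCoprocessor> instances = new ArrayList<>();

    SketchRegion(String name, List<Class<? extends LifecycleCoprocessor>> classes) {
        this.name = name;
        this.coprocessorClasses = classes;
    }

    // Opening allocates one new instance per listed class, then calls onOpen.
    void open() {
        for (Class<? extends LifecycleCoprocessor> c : coprocessorClasses) {
            try {
                LifecycleCoprocessor cp = c.getDeclaredConstructor().newInstance();
                instances.add(cp);
                cp.onOpen(name);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("cannot instantiate " + c.getName(), e);
            }
        }
    }

    // Parent sees onSplit, then onClose; daughters then open with new instances.
    List<SketchRegion> split() {
        for (LifecycleCoprocessor cp : instances) cp.onSplit(name);
        for (LifecycleCoprocessor cp : instances) cp.onClose(name);
        instances.clear();
        List<SketchRegion> daughters = new ArrayList<>();
        for (String suffix : new String[] {"a", "b"}) {
            SketchRegion d = new SketchRegion(name + "-" + suffix, coprocessorClasses);
            d.open();
            daughters.add(d);
        }
        return daughters;
    }
}

class SplitLifecycleDemo {
    static List<String> run() {
        LoggingCoprocessor.LOG.clear();
        List<Class<? extends LifecycleCoprocessor>> classes = new ArrayList<>();
        classes.add(LoggingCoprocessor.class);
        SketchRegion parent = new SketchRegion("r1", classes);
        parent.open();
        parent.split();
        return new ArrayList<>(LoggingCoprocessor.LOG);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

No object state crosses the split in this sketch; anything a coprocessor wants 
to hand to the daughters would have to be arranged by the coprocessor itself. 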

{quote}
Do we need both closing and pendingClose? [...]
{quote}

I found that state transition in the master code and copied it verbatim from a 
comment block. Actually coprocessors only go through three states: opening, 
open, closing. 
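
As a rough illustration, the three states could be modeled as an enum. The 
CoprocessorState name and the strictly linear transition rule below are 
assumptions for the sketch, not something taken from the master code.

```java
// Hypothetical enum for the three coprocessor states named above.
enum CoprocessorState {
    OPENING, OPEN, CLOSING;

    // Assumed rule: OPENING -> OPEN -> CLOSING, with CLOSING terminal.
    boolean canTransitionTo(CoprocessorState next) {
        return next.ordinal() == ordinal() + 1;
    }
}
```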

{quote}
Why no control over flush?  Maybe it would want to hold up a flush?  You think 
that too dangerous?
{quote}

I do think that is too dangerous. 

{quote}
Rather, should we do the java Events model where one method gets all event 
types, and the passed in object says what the event is?  In the method, first 
thing you check is if it's an event you are interested in.  Makes things easier 
to implement, especially if you are only implementing part of the 
functionality.  This model may not make sense though for this context or may be 
overkill (See java.util.EventObject and some of its implementations).
{quote}

I thought about that and went back and forth. An explicit interface is 
self-documenting, whereas arcane gotchas can hide in event-specific detail. 
There's also the notion of using ASM to weave in policy enforcement, which 
could be easier if each callback is its own well-defined method. On the other 
hand, there's a lot of foo() { super(); } crap for each callback that a 
coprocessor does not care about. My current thinking is that the latter does 
not outweigh the former. 
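
To make the tradeoff concrete, here is a small side-by-side sketch of the two 
styles for a coprocessor that only cares about gets. All type and method names 
(ExplicitCallbacks, EventCallback, RegionEvent, EventType and the two counter 
classes) are made up for illustration; only java.util.EventObject is a real 
JDK class.

```java
import java.util.EventObject;

// Style A: one explicit, self-documenting method per callback.
interface ExplicitCallbacks {
    void onOpen();
    void onClose();
    void onGet(String row);
}

// Only gets matter here, yet every callback still needs a body.
class GetCounterExplicit implements ExplicitCallbacks {
    int gets;
    public void onOpen()  { /* not interested */ }
    public void onClose() { /* not interested */ }
    public void onGet(String row) { gets++; }
}

// Style B: a single entry point; the event object says what happened.
enum EventType { OPEN, CLOSE, GET }

class RegionEvent extends EventObject {
    private static final long serialVersionUID = 1L;
    final EventType type;
    final String row; // only meaningful for GET; this is the hidden detail

    RegionEvent(Object source, EventType type, String row) {
        super(source);
        this.type = type;
        this.row = row;
    }
}

interface EventCallback {
    void onEvent(RegionEvent e);
}

class GetCounterEvent implements EventCallback {
    int gets;
    public void onEvent(RegionEvent e) {
        if (e.type != EventType.GET) return; // filter first, then handle
        gets++;
    }
}
```

Style B is shorter for partial implementations, but which fields of the event 
are valid for which type is no longer visible in the method signatures. 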

By the way, I am thinking about using ASM to weave in CPU and memory accounting 
and limit enforcement as a generic code safety policy regardless.

{quote}
Will Coprocessors make for lots of new object instantiations?  It's going to be 
invoked on each Get and Scan.
{quote}

Not unless the coprocessor does it. 

{quote}
The logging interface seems odd.  Why define a new one?  Why not just use 
apache logging?
{quote}

The idea is that no I/O is allowed outside of the interface. There will be an 
additional verification step at classload time, implemented with ASM, that 
checks against a whitelist. Keeping the whitelist, to the extent possible, to a 
single interface is a simplifying choice.
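
A sketch of just the whitelist comparison, under stated assumptions: the set 
of class names a coprocessor references is taken as already collected (in the 
real thing that would be the ASM pass, which is not shown here), names are in 
JVM internal form with slashes, and WhitelistVerifier plus the 
example/CoprocessorEnvironment entry are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical checker: compares referenced class names against a whitelist.
class WhitelistVerifier {
    private final Set<String> allowed;

    WhitelistVerifier(Collection<String> allowedInternalNames) {
        this.allowed = new HashSet<>(allowedInternalNames);
    }

    // Returns every referenced class that is not whitelisted, in input order.
    List<String> violations(Collection<String> referencedInternalNames) {
        List<String> bad = new ArrayList<>();
        for (String name : referencedInternalNames) {
            if (!allowed.contains(name)) {
                bad.add(name);
            }
        }
        return bad;
    }

    // Example whitelist: a few core classes plus one made-up environment interface.
    static WhitelistVerifier example() {
        return new WhitelistVerifier(Arrays.asList(
            "java/lang/Object",
            "java/lang/String",
            "java/util/ArrayList",
            "example/CoprocessorEnvironment"));
    }
}
```

With this whitelist, a class that references java/io/FileOutputStream or a 
third-party logging class would be reported as a violation and could be 
refused at load time, while one that only logs through the environment 
interface would pass. 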

{quote}
Should we be extracting an Interface from Region so we can have a Region 
implementation and so your Coprocessor can have an implementation too?  We sort 
of did something like that with the "Incommon" interface we have for testing, 
which allows for implementations that run the same tests first against the 
Region and then against the client-side.  Extracting an 'official' Region 
interface sounds grand to me... would help with testing?
{quote}

That's a good idea. Should be a separate issue? 

{quote}
How does the PrivateStore persist?  Where?  What are you thinking?
{quote}

One PrivateStore for each coprocessor would persist as an HFile+log in the 
region's store. Would be cloned into daughters on split. Would get periodic 
compaction whenever the store is compacted. The general idea is to do something 
less than manage a real table in a way that hooks in naturally with store 
management. I gave it a table interface but it could be just a bag of KVs if 
supporting multiple column families in a single HFile+log is too much trouble. 



> Coprocessors: Colocate arbitrary code with regions
> --------------------------------------------------
>
>                 Key: HBASE-2001
>                 URL: https://issues.apache.org/jira/browse/HBASE-2001
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>         Attachments: asm-3.2-bin.zip, asm-transformations.pdf, 
> org.apache.hadoop.hbase.HCoprocessor.java, 
> org.apache.hadoop.hbase.HCoprocessor.pdf
>
>
> "Support arbitrary code that runs next to each region in a table. As 
> regions split and move, coprocessor code should automatically move also."
> Use classloader which looks on HDFS.
> Associate a list of classes to load with each table. Put this in HRI so it 
> inherits from table but can be changed on a per region basis (so then those 
> region specific changes can be inherited by daughters). 
> Not completely arbitrary code, should require implementation of an interface 
> with callbacks for:
> * Open
> * Close
> * Split
> * Compact
> * (Multi)get and scanner next()
> * (Multi)put
> * (Multi)delete
> Add method to HRegionInterface for invoking coprocessor methods and 
> retrieving results.  
> Add methods in o.a.h.h.regionserver or subpackage which implement convenience 
> functions for coprocessor methods and consistent/controlled access to 
> internals: store access, threading, persistent and ephemeral state, scratch 
> storage, etc. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
