[
https://issues.apache.org/jira/browse/HBASE-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784016#action_12784016
]
Andrew Purtell commented on HBASE-2001:
---------------------------------------
{quote}
"Regions contain references to the coprocessor implementation classes
associated with them."
Q: On the above, it's indeed the classes, not objects? Objects can cross the split?
Not easily anyways.
{quote}
When a region splits, new coprocessor object instances would be allocated on
the daughters, one instance for each of the coprocessor classes listed in the
region metadata. This would happen as each daughter opens, when the
coprocessor's onOpen method is invoked to give it a chance to initialize.
Before that, the parent would be informed of the impending split via an
onSplit invocation, and when the parent closes, its onClose method would be
called so it can clean up. Managing the split beyond this would be the
coprocessor's problem.
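To make the ordering concrete, here is a minimal sketch of the split lifecycle
described above. Only the callback names onOpen, onSplit, and onClose come
from this comment. The Coprocessor interface shape, the Region class, the
region-name argument, and the daughter naming are assumptions made for
illustration, not the attached HCoprocessor design.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical callback interface; only the method names come from the comment.
interface Coprocessor {
    void onOpen(String regionName);
    void onSplit(String regionName);
    void onClose(String regionName);
}

// Records every callback so the ordering can be inspected.
class TracingCoprocessor implements Coprocessor {
    static final List<String> TRACE = new ArrayList<>();
    public void onOpen(String r)  { TRACE.add("open:" + r); }
    public void onSplit(String r) { TRACE.add("split:" + r); }
    public void onClose(String r) { TRACE.add("close:" + r); }
}

// Hypothetical stand-in for a region: it holds classes, not shared objects.
class Region {
    final String name;
    final List<Class<? extends Coprocessor>> coprocessorClasses;
    final List<Coprocessor> instances = new ArrayList<>();

    Region(String name, List<Class<? extends Coprocessor>> coprocessorClasses) {
        this.name = name;
        this.coprocessorClasses = coprocessorClasses;
    }

    // Opening allocates one fresh instance per listed class, then calls onOpen.
    void open() {
        for (Class<? extends Coprocessor> c : coprocessorClasses) {
            try {
                Coprocessor cp = c.getDeclaredConstructor().newInstance();
                instances.add(cp);
                cp.onOpen(name);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("cannot load " + c.getName(), e);
            }
        }
    }

    void close() {
        for (Coprocessor cp : instances) {
            cp.onClose(name);
        }
        instances.clear();
    }

    // The parent hears onSplit, then onClose; daughters get their own instances.
    List<Region> split() {
        for (Coprocessor cp : instances) {
            cp.onSplit(name);
        }
        close();
        Region a = new Region(name + "-a", coprocessorClasses);
        Region b = new Region(name + "-b", coprocessorClasses);
        a.open();
        b.open();
        return Arrays.asList(a, b);
    }
}
```

No object crosses the split in this sketch: the daughters share only the class
list, so any state the parent instance wants to hand over has to go through
some other channel.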
{quote}
Do we need both closing and pendingClose? [...]
{quote}
I found that state transition in the master code and copied it verbatim from a
comment block. Actually coprocessors only go through three states: opening,
open, closing.
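The three states could be written down as a small enum. This is only a sketch:
the state names come from the sentence above, while the strictly linear
opening -> open -> closing ordering and the canTransitionTo helper are
assumptions.

```java
// Coprocessor lifecycle states as named in the comment.
enum CoprocessorState {
    OPENING, OPEN, CLOSING;

    // Assumed rule: a state may only advance to the one declared right after it.
    boolean canTransitionTo(CoprocessorState next) {
        return next.ordinal() == ordinal() + 1;
    }
}
```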
{quote}
Why no control over flush? Maybe it would want to hold up a flush? You think
that too dangerous?
{quote}
I do think that is too dangerous.
{quote}
Rather, should we do the java Events model where one method gets all event
types, the passed in object says what the event is. In the method, first thing
you check if it's an event you are interested in? Makes things easier to
implement especially if you are only implementing part of the functionality.
This model may not make sense though for this context or may be overkill (See
java.util.EventObject and some of its implementations).
{quote}
I thought about that and have gone back and forth. An explicit interface is
self-documenting, while arcane gotchas can hide in event-specific detail.
There's also the notion of using ASM to weave in policy enforcement. That could
be easier if each callback is its own well-defined method. On the other hand
there's a lot of foo() { super(); } crap for each callback that a coprocessor
does not care about. My current thinking is the latter does not outweigh the
former.
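For comparison, here is a rough sketch of the two styles under discussion, with
a coprocessor that only cares about splits written both ways. All type and
method names (CallbackCoprocessor, CallbackAdapter, EventCoprocessor,
EventType) are hypothetical and chosen for illustration.

```java
// Style 1: explicit interface, one well-defined method per callback.
interface CallbackCoprocessor {
    void onOpen();
    void onClose();
    void onSplit();
    void onCompact();
}

// No-op adapter that carries the boilerplate once, so implementers
// override only the callbacks they care about.
abstract class CallbackAdapter implements CallbackCoprocessor {
    public void onOpen() {}
    public void onClose() {}
    public void onSplit() {}
    public void onCompact() {}
}

// Style 2: a single entry point; the argument says which event occurred.
enum EventType { OPEN, CLOSE, SPLIT, COMPACT }

interface EventCoprocessor {
    void onEvent(EventType type);
}

// Split counter written against the explicit interface.
class SplitCounter extends CallbackAdapter {
    int splits;
    @Override public void onSplit() { splits++; }
}

// The same counter written against the single-method event model.
class SplitCounterByEvent implements EventCoprocessor {
    int splits;
    public void onEvent(EventType type) {
        if (type != EventType.SPLIT) {
            return; // not an event this coprocessor is interested in
        }
        splits++;
    }
}
```

In the explicit style each callback has its own signature, which gives a
bytecode weaver a distinct method to target. In the event style the filtering
logic lives inside the handler body.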
By the way, I am thinking about using ASM to weave in CPU and memory accounting
and limit enforcement as a generic code safety policy regardless.
{quote}
Will Coprocessors make for lots of new object instantiations? It's going to be
invoked on each Get and Scan.
{quote}
Not unless the coprocessor does it.
{quote}
The logging interface seems odd. Why define a new one? Why not just use apache
logging?
{quote}
The idea is that no I/O outside of the interface is allowed. There will be an
additional verification step at classload time, implemented with ASM, that
checks against a whitelist. Making the whitelist, to the extent possible, a
single interface is a simplifying choice.
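The whitelist check itself could look roughly like the sketch below. It
assumes the names of the classes a coprocessor references have already been
collected (in the scheme above that would be done with ASM at classload time;
here they are simply passed in), and it uses package-prefix matching, which is
also an assumption. WhitelistVerifier and the example prefixes are
hypothetical, not a vetted policy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical verifier: flags referenced classes that fall outside the whitelist.
class WhitelistVerifier {
    private final List<String> allowedPrefixes;

    WhitelistVerifier(String... allowedPrefixes) {
        this.allowedPrefixes = Arrays.asList(allowedPrefixes);
    }

    // Returns the referenced class names that no whitelist prefix covers.
    List<String> violations(List<String> referencedClasses) {
        List<String> bad = new ArrayList<>();
        for (String name : referencedClasses) {
            if (!isAllowed(name)) {
                bad.add(name);
            }
        }
        return bad;
    }

    private boolean isAllowed(String className) {
        for (String prefix : allowedPrefixes) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

With a whitelist of "java.lang.", "java.util.", and a hypothetical
"org.example.coprocessor." package, a class that references
java.io.FileOutputStream would be rejected, while one that logs only through
the coprocessor interface would pass.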
{quote}
Should we be extracting an Interface from Region so we can have a Region
implementation and so your Coprocessor can have an implementation too? We sort
of did something like that with the "Incommon" interface we have for testing,
which allows for implementations that run the same tests first against the
Region and then against the client-side. Extracting an 'official' Region
interface sounds grand to me... would help with testing?
{quote}
That's a good idea. Should be a separate issue?
{quote}
How does the PrivateStore persist? Where? What are you thinking?
{quote}
One PrivateStore for each coprocessor would persist as an HFile+log in the
region's store. Would be cloned into daughters on split. Would get periodic
compaction whenever the store is compacted. The general idea is to do something
less than manage a real table in a way that hooks in naturally with store
management. I gave it a table interface but it could be just a bag of KVs if
supporting multiple column families in a single HFile+log is too much trouble.
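As a rough illustration of the "bag of KVs" fallback and of cloning on split,
here is an in-memory sketch. It does not model the HFile+log persistence or
compaction. The String key and value types and the cloneForDaughter method are
assumptions, and the class is only a stand-in for the PrivateStore in the
attached design.

```java
import java.util.TreeMap;

// In-memory stand-in for a per-coprocessor private store (a sorted bag of KVs).
class PrivateStore {
    private final TreeMap<String, String> kvs = new TreeMap<>();

    void put(String key, String value) {
        kvs.put(key, value);
    }

    String get(String key) {
        return kvs.get(key);
    }

    int size() {
        return kvs.size();
    }

    // On split each daughter gets its own full copy of the parent's contents.
    PrivateStore cloneForDaughter() {
        PrivateStore copy = new PrivateStore();
        copy.kvs.putAll(kvs);
        return copy;
    }
}
```

Each daughter's copy is independent after the split, so later writes on one
daughter do not show up in the parent or in the other daughter.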
> Coprocessors: Colocate arbitrary code with regions
> --------------------------------------------------
>
> Key: HBASE-2001
> URL: https://issues.apache.org/jira/browse/HBASE-2001
> Project: Hadoop HBase
> Issue Type: Sub-task
> Reporter: Andrew Purtell
> Assignee: Andrew Purtell
> Attachments: asm-3.2-bin.zip, asm-transformations.pdf,
> org.apache.hadoop.hbase.HCoprocessor.java,
> org.apache.hadoop.hbase.HCoprocessor.pdf
>
>
> "Support arbitrary code that runs next to each region in a table. As
> regions split and move, coprocessor code should automatically move also."
> Use classloader which looks on HDFS.
> Associate a list of classes to load with each table. Put this in HRI so it
> inherits from table but can be changed on a per region basis (so then those
> region specific changes can be inherited by daughters).
> Not completely arbitrary code, should require implementation of an interface
> with callbacks for:
> * Open
> * Close
> * Split
> * Compact
> * (Multi)get and scanner next()
> * (Multi)put
> * (Multi)delete
> Add method to HRegionInterface for invoking coprocessor methods and
> retrieving results.
> Add methods in o.a.h.h.regionserver or subpackage which implement convenience
> functions for coprocessor methods and consistent/controlled access to
> internals: store access, threading, persistent and ephemeral state, scratch
> storage, etc.