[jira] Updated: (HBASE-2001) Coprocessors: Colocate user code with regions

Andrew Purtell (JIRA) Sun, 07 Feb 2010 21:45:56 -0800

     [ https://issues.apache.org/jira/browse/HBASE-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Andrew Purtell updated HBASE-2001:
----------------------------------

    Attachment: HBASE-2001.patch.gz

Latest patch contains simple working unit tests for basic Coprocessor hooks and 
also RegionObserver interface hooks. 

The patch also includes the initial implementation of an in-process MapReduce 
framework. Coprocessors can optionally implement a 'MapReduce' interface, which 
clients will at some point be able to invoke concurrently on all regions of the 
table within the HRS processes. (The server side still needs unit tests and 
testing; there is no client side yet.) Note that this is not MapReduce on the 
table; it is MapReduce on each region, concurrently.

In-process MapReduce is multithreaded. Concurrency of mappers and reducers is 
specified separately. Map jobs are submitted with a Scan object which defines 
the scope and any filters for a scanner which feeds mappers. Mappers can emit 
intermediate KeyValues to a collector for reduction or can get references to 
objects in the coprocessor's environment and perform operations on them, e.g. 
increment an AtomicLong, etc. Reducers will get KeyValues from map phase output 
ordered and grouped by key. Reducers also have access to objects in the 
coprocessor environment. Therefore one can implement MapReduce in a manner very 
similar to Hadoop's MR framework, or e.g. aggregating functions can use shared 
variables to avoid the overhead of generating (and processing) a lot of 
intermediates.
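
As a rough illustration of the model described above, here is a minimal Java 
sketch of a per-region job with separately sized mapper and reducer pools. It 
is not the code in the patch: the class, the Env, Mapper and Reducer types, 
and the use of plain Strings in place of KeyValues and a Scan are all 
hypothetical stand-ins chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.SortedMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BiConsumer;

/** Hypothetical sketch; none of these names come from the HBASE-2001 patch. */
class InProcessMapReduceSketch {

    /** Stand-in for the coprocessor environment: shared, named counters. */
    static final class Env {
        private final ConcurrentMap<String, AtomicLong> counters = new ConcurrentHashMap<>();

        AtomicLong counter(String name) {
            return counters.computeIfAbsent(name, k -> new AtomicLong());
        }
    }

    interface Mapper {
        void map(String row, String value, BiConsumer<String, String> collector, Env env);
    }

    interface Reducer {
        void reduce(String key, List<String> values, BiConsumer<String, String> collector, Env env);
    }

    /** Runs one job over the rows of a single (simulated) region. */
    static SortedMap<String, String> run(SortedMap<String, String> regionRows,
                                         Mapper mapper, int mapThreads,
                                         Reducer reducer, int reduceThreads,
                                         Env env) {
        // Intermediate map output, kept ordered and grouped by key.
        ConcurrentSkipListMap<String, Queue<String>> intermediate = new ConcurrentSkipListMap<>();
        BiConsumer<String, String> mapCollector = (k, v) ->
                intermediate.computeIfAbsent(k, x -> new ConcurrentLinkedQueue<>()).add(v);

        ExecutorService mapPool = Executors.newFixedThreadPool(mapThreads);
        List<Future<?>> mapTasks = new ArrayList<>();
        for (Map.Entry<String, String> e : regionRows.entrySet()) {
            mapTasks.add(mapPool.submit(() -> mapper.map(e.getKey(), e.getValue(), mapCollector, env)));
        }
        awaitAll(mapPool, mapTasks);

        // Reducers start only after every mapper has finished.
        ConcurrentSkipListMap<String, String> output = new ConcurrentSkipListMap<>();
        ExecutorService reducePool = Executors.newFixedThreadPool(reduceThreads);
        List<Future<?>> reduceTasks = new ArrayList<>();
        for (Map.Entry<String, Queue<String>> e : intermediate.entrySet()) {
            List<String> values = new ArrayList<>(e.getValue());
            reduceTasks.add(reducePool.submit(() -> reducer.reduce(e.getKey(), values, output::put, env)));
        }
        awaitAll(reducePool, reduceTasks);
        return output;
    }

    private static void awaitAll(ExecutorService pool, List<Future<?>> tasks) {
        try {
            for (Future<?> f : tasks) {
                f.get();
            }
        } catch (InterruptedException | ExecutionException ex) {
            throw new IllegalStateException("job failed", ex);
        } finally {
            pool.shutdown();
        }
    }
}
```

A mapper in this sketch can either emit through the collector (the Hadoop-like 
style) or just bump a shared counter via env.counter(...), which is the 
"shared variables" shortcut for aggregations mentioned above.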

An in-process MapReduce job can be configured to auto commit. If so, KeyValues 
written to the reduce collector by reducers will be automatically committed 
back to the region after all reducers have completed execution. Until all 
mappers and reducers have completed successfully, no values are committed to 
the region. Then, we try really hard to commit them all. 

KeyValues emitted by reducers must have a row key that falls within the bounds 
of the region if the job is auto committing. Otherwise, the output can be 
arbitrary.
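
The auto commit rules above (buffer everything, commit only at the end, and 
require row keys inside the region) might be sketched as follows. This is an 
illustration under assumptions, not the patch's code: ReduceCollector and its 
methods are hypothetical, a Map stands in for the region, and the bounds check 
assumes the usual HBase convention of an inclusive start key, an exclusive end 
key, and an empty key meaning "unbounded".

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; these names are not taken from the HBASE-2001 patch. */
class AutoCommitSketch {

    /** Unsigned lexicographic comparison of two row keys. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    /** True if row lies in [startKey, endKey); an empty key is treated as unbounded. */
    static boolean inRegion(byte[] row, byte[] startKey, byte[] endKey) {
        boolean afterStart = startKey.length == 0 || compare(row, startKey) >= 0;
        boolean beforeEnd = endKey.length == 0 || compare(row, endKey) < 0;
        return afterStart && beforeEnd;
    }

    /** Buffers reducer output; nothing reaches the region until commitTo(). */
    static final class ReduceCollector {
        private final byte[] startKey;
        private final byte[] endKey;
        private final boolean autoCommit;
        private final List<byte[][]> buffered = new ArrayList<>();

        ReduceCollector(byte[] startKey, byte[] endKey, boolean autoCommit) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.autoCommit = autoCommit;
        }

        synchronized void collect(byte[] row, byte[] value) {
            // Only auto-committing jobs are restricted to the region's key range.
            if (autoCommit && !inRegion(row, startKey, endKey)) {
                throw new IllegalArgumentException("row key outside region bounds");
            }
            buffered.add(new byte[][] {row, value});
        }

        /** Intended to be called once, after all mappers and reducers succeed. */
        synchronized int commitTo(Map<String, String> region) {
            if (!autoCommit) {
                return 0;
            }
            for (byte[][] kv : buffered) {
                region.put(new String(kv[0], StandardCharsets.UTF_8),
                           new String(kv[1], StandardCharsets.UTF_8));
            }
            int committed = buffered.size();
            buffered.clear();
            return committed;
        }
    }
}
```

Rejecting an out-of-range key at collect() time is one possible design choice; 
the comment above does not say when or how the check is enforced.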

If a job is not auto committing, then once it completes clients have access to 
the KeyValues output by the reducers via a scanner-like interface. 

The in-process MapReduce framework uses leases. A job is only alive as long as 
it has a lease. Its output KeyValues are only available as long as it has a 
lease. So for long-running jobs the client must periodically poll status to 
keep the job alive, and afterwards retrieval by "scanner" also renews the 
lease. A lease cannot expire during auto commit. 
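
To make the lease behaviour above concrete, here is a small Java sketch. It is 
only an illustration: JobLeaseSketch, its method names, the status strings and 
the injected clock are assumptions made for the example, not the patch's lease 
implementation.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch of a job lease; names are not from the HBASE-2001 patch. */
class JobLeaseSketch {
    private final long leaseMillis;
    private final LongSupplier clock; // injected so the example is testable
    private long expiresAt;
    private boolean committing;

    JobLeaseSketch(long leaseMillis, LongSupplier clock) {
        this.leaseMillis = leaseMillis;
        this.clock = clock;
        this.expiresAt = clock.getAsLong() + leaseMillis;
    }

    /** The job and its output exist only while this returns true. */
    synchronized boolean isAlive() {
        // A lease cannot expire while an auto commit is in progress.
        return committing || clock.getAsLong() < expiresAt;
    }

    private synchronized void renew() {
        if (!isAlive()) {
            throw new IllegalStateException("lease expired");
        }
        expiresAt = clock.getAsLong() + leaseMillis;
    }

    /** Polling status renews the lease. */
    synchronized String pollStatus() {
        renew();
        return committing ? "COMMITTING" : "RUNNING";
    }

    /** Fetching output through the scanner-like interface also renews it. */
    synchronized void onOutputRead() {
        renew();
    }

    synchronized void beginAutoCommit() {
        committing = true;
    }

    synchronized void endAutoCommit() {
        committing = false;
        expiresAt = clock.getAsLong() + leaseMillis;
    }
}
```

With a 100 ms lease, a client that polls at t=90 keeps the job alive until 
t=190; a client that stops polling loses the job and any buffered output once 
the lease runs out.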


> Coprocessors: Colocate user code with regions
> ---------------------------------------------
>
>                 Key: HBASE-2001
>                 URL: https://issues.apache.org/jira/browse/HBASE-2001
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>         Attachments: asm-3.2-bin.zip, asm-transformations.pdf, 
> HBASE-2001.patch.gz
>
>
> Support user code that runs next to each region in a table. As regions 
> split and move, coprocessor code should automatically move also.
> Use a classloader which looks on HDFS.
> Associate a list of classes to load with each table. Put this in HRI so it 
> inherits from table but can be changed on a per region basis (so then those 
> region specific changes can be inherited by daughters). 
> Not completely arbitrary code, should require implementation of an interface 
> with callbacks for:
> * Open
> * Close
> * Split
> * Compact
> * (Multi)get and scanner next()
> * (Multi)put
> * (Multi)delete
> Add method to HRegionInterface for invoking coprocessor methods and 
> retrieving results.  
> Add methods in o.a.h.h.regionserver or subpackage which implement convenience 
> functions for coprocessor methods and consistent/controlled access to 
> internals: store access, threading, persistent and ephemeral state, scratch 
> storage, etc. 
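
The callback list in the quoted description could be expressed, as a rough 
sketch, along the following lines. The interface name, method names and 
parameters here are hypothetical and chosen only to mirror the bullet list; 
they are not the interface defined by the patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callback interface; names are illustrative, not the patch's API. */
interface RegionCallbacks {
    default void onOpen() {}
    default void onClose() {}
    default void onSplit(String leftDaughter, String rightDaughter) {}
    default void onCompact() {}
    default void onGet(byte[] row) {}
    default void onScannerNext(byte[] row) {}
    default void onPut(byte[] row, byte[] value) {}
    default void onDelete(byte[] row) {}
}

/** Example user code: records the callbacks it cares about, ignores the rest. */
class RecordingCallbacks implements RegionCallbacks {
    final List<String> events = new ArrayList<>();

    @Override
    public void onOpen() {
        events.add("open");
    }

    @Override
    public void onPut(byte[] row, byte[] value) {
        events.add("put:" + new String(row, StandardCharsets.UTF_8));
    }

    @Override
    public void onClose() {
        events.add("close");
    }
}
```

Default no-op methods let user code override only the hooks it needs; that is 
a choice made for this sketch rather than something the description requires.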

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
