[ 
https://issues.apache.org/jira/browse/HBASE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-3220:
----------------------------------

    Issue Type: Brainstorming  (was: Sub-task)
        Parent:     (was: HBASE-2000)

> Coprocessors: Streaming distributed computation framework
> ---------------------------------------------------------
>
>                 Key: HBASE-3220
>                 URL: https://issues.apache.org/jira/browse/HBASE-3220
>             Project: HBase
>          Issue Type: Brainstorming
>          Components: coprocessors
>            Reporter: Andrew Purtell
>
> Consider a computational framework based on a stream processing model.
> Logically: generators emit keys (row keys, or full keys with
> row+column:qualifier); fetch operators join keys to data fetched from the
> region; filters drop items according to (perhaps complex) matching on the
> keys and/or values; combiners perform aggregation; mutators change values;
> decorators add data; and sinks do something useful with items arriving from
> the stream, e.g. insert into a response buffer, commit to the region, or
> replicate to a peer. Pipelines execute in parallel. Partitioners can split
> streams for multithreading. A generator can be an observer on a region,
> anchoring a continuous process, or an iterator serving as the first stage of
> a pipeline constructed on demand with a terminating condition (like a Hadoop
> task). This is similar to running Cascading within regionserver processes: a
> nice model, even if the implementation is not literally Cascading. MapReduce
> can be supported with this model, since it is a subset of it. Data can be
> ordered or unordered, depending on the generator. Filters could be stateful
> or stateless: stateless filters could handle data arriving in any order;
> stateful filters could be used with an ordered generator.
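
The stage vocabulary in the description above can be illustrated with a small single-threaded sketch. Everything here is hypothetical: the class name, the `Item` type, the `run` signature, and the use of an in-memory map in place of a region are assumptions made for illustration, not coprocessor APIs and not a proposed design. Partitioners, decorators, and parallel pipelines are left out.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical sketch of one pipeline; none of these names are HBase APIs. */
public class StreamPipelineSketch {

    /** A key joined to the value fetched for it. */
    public static final class Item {
        public final String key;
        public final long value;

        public Item(String key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Runs generator -> fetch -> filter -> mutator -> combiner -> sink. */
    public static void run(Iterator<String> generator,
                           Function<String, Long> fetch,
                           Predicate<Item> filter,
                           Function<Item, Item> mutator,
                           BinaryOperator<Long> combiner,
                           Consumer<Map<String, Long>> sink) {
        Map<String, Long> combined = new TreeMap<>();
        while (generator.hasNext()) {             // generator emits keys
            String key = generator.next();
            Long fetched = fetch.apply(key);      // fetch joins key to data
            if (fetched == null) {
                continue;                         // nothing stored for this key
            }
            Item item = new Item(key, fetched);
            if (!filter.test(item)) {
                continue;                         // filter drops the item
            }
            item = mutator.apply(item);           // mutator changes the value
            combined.merge(item.key, item.value, combiner); // combiner aggregates per key
        }
        sink.accept(combined);                    // sink consumes the result
    }

    /** Builds a tiny pipeline over an in-memory stand-in for a region. */
    public static Map<String, Long> demo() {
        Map<String, Long> region = new TreeMap<>();
        region.put("row1", 5L);
        region.put("row2", 12L);
        region.put("row3", 30L);

        // Iterator-style generator with a natural terminating condition.
        List<String> keys = Arrays.asList("row1", "row2", "row3", "row2", "row9");

        Map<String, Long> responseBuffer = new TreeMap<>();
        run(keys.iterator(),
            region::get,                                // fetch operator
            item -> item.value >= 10,                   // stateless filter
            item -> new Item(item.key, item.value * 2), // mutator
            Long::sum,                                  // combiner
            responseBuffer::putAll);                    // sink: response buffer
        return responseBuffer;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints {row2=48, row3=60}
    }
}
```

The filter in this sketch is stateless, so it would behave the same regardless of the order in which the generator emits keys. A stateful filter (for example, one that drops repeats of the previous key) would only be correct with an ordered generator, as the description notes.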

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

