[
https://issues.apache.org/jira/browse/HBASE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Purtell updated HBASE-3220:
----------------------------------
Issue Type: Brainstorming (was: Sub-task)
Parent: (was: HBASE-2000)
> Coprocessors: Streaming distributed computation framework
> ---------------------------------------------------------
>
> Key: HBASE-3220
> URL: https://issues.apache.org/jira/browse/HBASE-3220
> Project: HBase
> Issue Type: Brainstorming
> Components: coprocessors
> Reporter: Andrew Purtell
>
> Consider a computational framework based on a stream processing model.
> Logically: generators emit keys (row keys, or full keys with
> row+column:qualifier); fetch operators join keys to data fetched from the
> region; filters drop items according to (perhaps complex) matching on the
> keys and/or values; combiners perform aggregation; mutators change values;
> decorators add data; and sinks do something useful with items arriving from
> the stream, e.g. insert into a response buffer, commit to the region, or
> replicate to a peer. Pipelines execute in parallel. Partitioners can split
> streams for multithreading. A generator can be an observer on a region,
> anchoring a continuous process, or an iterator serving as the first stage of
> a pipeline constructed on demand with a terminating condition (like a Hadoop
> task). This is kind of like Cascading within regionserver processes: a nice
> model, even if Cascading is not literally the implementation. MapReduce can
> be supported with this model, since it is a subset of it. Data can be ordered
> or unordered, depending on the generator. Filters could be stateful or
> stateless: stateless filters could handle data arriving in any order;
> stateful filters could be used with an ordered generator.
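
The stage model described above might be sketched roughly as follows. This is only an illustration built on java.util.stream, not an HBase coprocessor API: the class StreamPipelineSketch, the Item type, the IncreasingOnly filter, and the in-memory TreeMap standing in for a region are all hypothetical names invented for this example. It assumes an ordered generator (row keys in sorted order), which is what makes the stateful filter meaningful.

```java
import java.util.List;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch: generator -> fetch -> filters -> mutator -> combiner/sink. */
class StreamPipelineSketch {

    /** A key joined to its value by the fetch stage (illustrative type). */
    static final class Item {
        final String rowKey;
        final long value;
        Item(String rowKey, long value) { this.rowKey = rowKey; this.value = value; }
        @Override public String toString() { return rowKey + "=" + value; }
    }

    /** Stand-in for a region: a sorted map from row key to one numeric cell. */
    static final TreeMap<String, Long> REGION = new TreeMap<>();
    static {
        REGION.put("row1", 5L);
        REGION.put("row2", 12L);
        REGION.put("row3", 7L);
        REGION.put("row4", 20L);
    }

    /** Generator: emits row keys in sorted order (an ordered generator). */
    static Stream<String> generator() {
        return REGION.navigableKeySet().stream();
    }

    /** Fetch operator: joins a key to the data stored for it in the "region". */
    static Item fetch(String rowKey) {
        return new Item(rowKey, REGION.get(rowKey));
    }

    /** Stateful filter: keeps an item only if its value exceeds the last kept
     *  value. The result depends on arrival order, so it needs an ordered,
     *  sequential stream. */
    static final class IncreasingOnly implements Predicate<Item> {
        private long last = Long.MIN_VALUE;
        @Override public boolean test(Item item) {
            if (item.value > last) { last = item.value; return true; }
            return false;
        }
    }

    /** Assembles the stages; each call builds a fresh pipeline and filter state. */
    static Stream<Item> pipeline() {
        Predicate<Item> stateless = item -> item.value >= 7; // order-independent
        return generator()
            .map(StreamPipelineSketch::fetch)
            .filter(stateless)
            .filter(new IncreasingOnly())
            .map(item -> new Item(item.rowKey, item.value * 2)); // mutator
    }

    /** Sink: collects surviving items into a response buffer. */
    static List<Item> collectToBuffer() {
        return pipeline().collect(Collectors.toList());
    }

    /** Combiner: aggregates the surviving values into a single sum. */
    static long sumValues() {
        return pipeline().mapToLong(item -> item.value).sum();
    }

    public static void main(String[] args) {
        System.out.println(collectToBuffer());
        System.out.println(sumValues());
    }
}
```

In this sketch the stateless filter would give the same result for any arrival order, while IncreasingOnly would not, which mirrors the distinction drawn above between stateless and stateful filters. Partitioners, decorators, and observer-anchored generators are left out to keep the example short.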