Online aggregation and continuous query support
-----------------------------------------------
Key: MAPREDUCE-1211
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1211
Project: Hadoop Map/Reduce
Issue Type: New Feature
Components: task
Reporter: Tyson Condie
Priority: Minor
The purpose of this post is to propose a modified MapReduce architecture that
allows data to be pipelined between operators. Pipelining extends the MapReduce
programming model beyond batch processing, and it can also reduce completion
times and improve system utilization for batch jobs. We have built a modified
version of the Hadoop MapReduce framework that supports online aggregation,
which allows users to see "early returns" from a job as it is being computed.
Our Hadoop Online Prototype (HOP) also supports continuous queries, which
enable MapReduce programs to be written for applications such as event
monitoring and stream processing. HOP retains Hadoop's fault-tolerance
properties and can run unmodified user-defined MapReduce programs.
For more information on the HOP design, please see our technical report.
http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-136.html
Further details are discussed in the following blog posts.
http://databeta.wordpress.com/2009/10/18/mapreduce-online/
http://radar.oreilly.com/2009/10/pipelining-and-real-time-analytics-with-mapreduce-online.html
http://dbmsmusings.blogspot.com/2009/10/analysis-of-mapreduce-online-paper.html
The HOP code has been published at the following location.
http://code.google.com/p/hop/