Hivemall has a MixServer, an external key-value store, for exchanging messages between map tasks.
https://github.com/myui/hivemall/tree/master/src/main/java/hivemall/mix

FYI, Optimus tries to express iteration by rewriting DAGs at runtime.
http://research.microsoft.com/en-us/projects/optimus/
http://research.microsoft.com/pubs/185714/Optimus.pptx

On Wed, Mar 25, 2015 at 6:02 PM, Johannes Zillmann <[email protected]> wrote:
> Hey Gopal,
>
>> On 25 Mar 2015, at 05:26, Gopal Vijayaraghavan <[email protected]> wrote:
>>
>> Hi,
>>
>> Iterative algorithms are expressed as DAGs in a loop.
>>
>> The acyclic nature of DAGs, whether in Tez or Spark (since you mention the
>> paper), makes that the natural way to implement them: repeated application
>> of the same operation over the same data, with a decision condition
>> determining whether to stay in the loop or not.
>
> Can you point to a piece of code which implements this approach?
> If each loop operation is a single DAG, how would that avoid the HDFS
> barrier?
>
> Johannes
>
>>
>> You might want to look at last year's Hadoop Summit presentations for a
>> direct example of iterative algorithms with Tez.
>>
>> http://www.slideshare.net/Hadoop_Summit/pig-on-tez-low-latency-etl-with-big-data/25
>>
>> Logistic regression needs you to use a library which implements that
>> specific algorithm [1].
>>
>> On that note, something which needs incremental iteration can probably be
>> even more efficient in Tez than these approaches if you unroll the
>> iteration as 1-1 edges, with all of the final tasks ending up generating
>> outputs.
>>
>> Cheers,
>> Gopal
>> [1] - https://github.com/myui/hivemall#regression
>>
>>
>> On 3/24/15, 8:43 PM, "Chang Chen" <[email protected]> wrote:
>>
>>> Hi
>>>
>>> From the PhD dissertation
>>> <http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-12.pdf> of
>>> Matei Zaharia, there are four computation models in large-scale clusters:
>>>
>>> 1. *Iterative algorithms*, such as graph processing and machine learning
>>> algorithms
>>> 2. *Relational queries*
>>> 3. *MapReduce*, a general parallel computation model
>>> 4. *Stream processing*
>>>
>>> Obviously, Tez supports #2 and #3, but for #1 and #4, I don't see any
>>> examples.
>>>
>>> As for streaming, I guess that if we implement an appropriate input, there
>>> is no reason Tez can't support it in theory.
>>>
>>> But for machine learning, how do we use vertices and edges to express
>>> *Logistic Regression*?
>>>
>>> Thanks
>>> Chang
>>
>
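To make the "DAGs in a loop" description concrete for the logistic regression question, here is a minimal plain-Java sketch. It assumes batch gradient descent on the log-loss as the training algorithm (a swapped-in choice for illustration; it does not use Hivemall's trainers, the MixServer, or any Tez API). Each call to the hypothetical `runOnePass` stands in for one DAG run: a "map" step computes a per-partition gradient and a "reduce" step sums the partial gradients. The driver `train` repeats that same pass over the same data until the largest weight update falls below `tol`, which plays the role of the decision condition. The class name, method names, toy data, learning rate, and tolerance are all illustrative assumptions. It needs Java 16+ for `record`.

```java
import java.util.List;

// Hypothetical sketch: plain in-memory Java, not the Tez or Hivemall API.
public class IterativeLogReg {

    // One labelled example: features x (x[0] = 1.0 is the bias term) and label y in {0, 1}.
    record Example(double[] x, int y) {}

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    static double dot(double[] w, double[] x) {
        double s = 0.0;
        for (int i = 0; i < w.length; i++) {
            s += w[i] * x[i];
        }
        return s;
    }

    // "Map" side of one pass: this partition's contribution to the log-loss gradient.
    static double[] partialGradient(List<Example> partition, double[] w) {
        double[] g = new double[w.length];
        for (Example e : partition) {
            double err = sigmoid(dot(w, e.x())) - e.y();
            for (int i = 0; i < g.length; i++) {
                g[i] += err * e.x()[i];
            }
        }
        return g;
    }

    // One pass stands in for one DAG run: map over partitions, then reduce by summing.
    static double[] runOnePass(List<List<Example>> partitions, double[] w) {
        double[] total = new double[w.length];
        for (List<Example> p : partitions) {
            double[] g = partialGradient(p, w);
            for (int i = 0; i < total.length; i++) {
                total[i] += g[i];
            }
        }
        return total;
    }

    // Driver loop: re-run the same pass over the same data until the update is tiny.
    static double[] train(List<List<Example>> partitions, int dim,
                          double lr, double tol, int maxIter) {
        int n = 0;
        for (List<Example> p : partitions) {
            n += p.size();
        }
        double[] w = new double[dim];
        for (int iter = 0; iter < maxIter; iter++) {
            double[] grad = runOnePass(partitions, w);
            double maxStep = 0.0;
            for (int i = 0; i < dim; i++) {
                double delta = lr * grad[i] / n;
                w[i] -= delta;
                maxStep = Math.max(maxStep, Math.abs(delta));
            }
            if (maxStep < tol) {
                break; // decision condition: leave the loop
            }
        }
        return w;
    }

    // Toy, non-separable 1-D data split into two "partitions".
    static List<List<Example>> toyData() {
        return List.of(
            List.of(new Example(new double[]{1, -2.0}, 0),
                    new Example(new double[]{1, 1.5}, 1),
                    new Example(new double[]{1, 0.5}, 0)),
            List.of(new Example(new double[]{1, -1.0}, 0),
                    new Example(new double[]{1, 2.0}, 1),
                    new Example(new double[]{1, -0.5}, 1)));
    }

    public static void main(String[] args) {
        double[] w = train(toyData(), 2, 0.5, 1e-6, 10_000);
        System.out.printf("bias=%.3f slope=%.3f%n", w[0], w[1]);
    }
}
```

In an actual cluster, each pass would typically read its input and hand the updated weights to the next DAG run through HDFS, which is the barrier Johannes asks about; this in-memory sketch does not model that cost.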
