Hi,

Iterative algorithms are expressed as DAGs in a loop.
The acyclic nature of DAGs, whether in Tez or Spark (since you mention the paper), makes that the natural way to implement them: repeated application of the same operation over the same data, with a decision condition determining whether to stay in the loop or not.

You might want to look at last year's Hadoop Summit presentations for a direct example of iterative algorithms with Tez.

http://www.slideshare.net/Hadoop_Summit/pig-on-tez-low-latency-etl-with-big-data/25

Logistic regression needs you to use a library which implements that specific algorithm [1].

On that note, something which needs incremental iteration can probably be even more efficient in Tez than these approaches if you unroll the iteration as 1-1 edges, with all of the final tasks ending up generating outputs.

Cheers,
Gopal

[1] - https://github.com/myui/hivemall#regression

On 3/24/15, 8:43 PM, "Chang Chen" <[email protected]> wrote:

>Hi
>
>from the PhD Dissertation
><http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-12.pdf> of
>Matei Zaharia, there are four computation models in the large scale
>clusters:
>
>
>   1. *Iterative algorithm*, such as graph processing and machine learning
>   algorithm
>   2. *Relational query*
>   3. *MapReduce*, a general parallel computation model
>   4. *Stream processing*,
>
>Obviously, Tez supports #2 and #3, but for #1 and #4, I don't see any
>examples.
>
>As for streaming, I guess if we implement appropriate input, there is no
>reason that tez can't support in theory.
>
>But for Machine Learning, how do we use vertex and edge to express
>*Logistic Regression*?
>
>Thanks
>Chang
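
To make the "DAG in a loop" idea from the reply concrete, here is a minimal sketch in plain Python. It does not use the Tez, Spark, or Hivemall APIs. The names `partial_gradient`, `run_dag`, and `train` are hypothetical and exist only for illustration, as are the step size and tolerance. Each call to `run_dag` stands in for one acyclic pass (a map over partitions followed by a reduce), and the driver loop applies the decision condition that determines whether to submit another pass.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def partial_gradient(partition, weights):
    """Hypothetical map-side vertex: log-loss gradient over one partition."""
    grad = [0.0] * len(weights)
    for features, label in partition:
        z = sum(w * x for w, x in zip(weights, features))
        error = sigmoid(z) - label
        for i, x in enumerate(features):
            grad[i] += error * x
    return grad


def run_dag(partitions, weights):
    """One acyclic pass: map every partition, then reduce by summing."""
    partials = [partial_gradient(p, weights) for p in partitions]
    return [sum(column) for column in zip(*partials)]


def train(partitions, n_features, step=0.1, tol=1e-6, max_iters=1000):
    """Driver loop: rerun the same pass until the weights stop moving."""
    weights = [0.0] * n_features
    n_rows = sum(len(p) for p in partitions)
    iteration = 0
    for iteration in range(1, max_iters + 1):
        grad = run_dag(partitions, weights)
        new_weights = [w - step * g / n_rows for w, g in zip(weights, grad)]
        delta = max(abs(a - b) for a, b in zip(new_weights, weights))
        weights = new_weights
        if delta < tol:  # decision condition: leave the loop
            break
    return weights, iteration
```

Each partition is a list of `(features, label)` pairs with labels 0 or 1, and a constant 1.0 feature can serve as the bias term. The loop lives entirely in the driver, so every individual pass stays acyclic, which is the shape the reply describes.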
