Hey Gopal,

> On 25 Mar 2015, at 05:26, Gopal Vijayaraghavan <[email protected]> wrote:
>
> Hi,
>
> Iterative algorithms are expressed as DAGs in a loop.
>
> The acyclic nature of DAGs, whether in Tez or Spark (since you mention the
> paper), makes that the natural way to implement them - repeated application
> of the same operation over the same data, with a decision condition
> determining whether to stay in the loop or not.

Can you point to a piece of code which implements this approach? If each loop operation is a single DAG, how would that avoid the HDFS barrier?

Johannes

> You might want to look at last year's Hadoop Summit presentations for a
> direct example of iterative algorithms with Tez.
>
> http://www.slideshare.net/Hadoop_Summit/pig-on-tez-low-latency-etl-with-big-data/25
>
> Logistic regression needs you to use a library which implements that
> specific algorithm [1].
>
> On that note, something which needs incremental iteration can probably be
> even more efficient in Tez than these approaches if you unroll the
> iteration as 1-1 edges, with all of the final tasks ending up generating outputs.
>
> Cheers,
> Gopal
> [1] - https://github.com/myui/hivemall#regression
>
> On 3/24/15, 8:43 PM, "Chang Chen" <[email protected]> wrote:
>
>> Hi
>>
>> from the PhD Dissertation
>> <http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-12.pdf> of
>> Matei Zaharia, there are four computation models in large-scale clusters:
>>
>> 1. *Iterative algorithm*, such as graph processing and machine learning
>>    algorithms
>> 2. *Relational query*
>> 3. *MapReduce*, a general parallel computation model
>> 4. *Stream processing*
>>
>> Obviously, Tez supports #2 and #3, but for #1 and #4, I don't see any
>> examples.
>>
>> As for streaming, I guess if we implement an appropriate input, there is no
>> reason that Tez can't support it in theory.
>>
>> But for machine learning, how do we use vertices and edges to express
>> *Logistic Regression*?
>>
>> Thanks
>> Chang
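
For reference, the "DAG in a loop" pattern discussed above can be sketched roughly as follows. This is only an illustration in plain Python, not Tez or Spark code: run_iteration is a hypothetical stand-in for submitting one acyclic job to the cluster, and the names train, predict, lr, tol and max_iters are all assumptions made for the example. Logistic regression by gradient descent is used because it is the algorithm asked about in the thread. The sketch does not address the HDFS barrier question; it only shows the driver loop and the decision condition.

```python
import math


def run_iteration(weights, data, lr):
    """Hypothetical stand-in for one acyclic DAG submission.

    Runs a single gradient-descent step of logistic regression locally.
    In a real system this would be a job over the whole data set.
    """
    grad = [0.0] * len(weights)
    for features, label in data:
        z = sum(w * x for w, x in zip(weights, features))
        pred = 1.0 / (1.0 + math.exp(-z))  # sigmoid
        for i, x in enumerate(features):
            grad[i] += (pred - label) * x
    n = len(data)
    return [w - lr * g / n for w, g in zip(weights, grad)]


def train(data, lr=0.5, tol=1e-6, max_iters=1000):
    """Driver loop: re-run the same DAG until the decision condition holds."""
    weights = [0.0] * len(data[0][0])
    iteration = 0
    for iteration in range(1, max_iters + 1):
        new_weights = run_iteration(weights, data, lr)
        # Decision condition: leave the loop once the weights stop moving.
        delta = max(abs(a - b) for a, b in zip(new_weights, weights))
        weights = new_weights
        if delta < tol:
            break
    return weights, iteration


def predict(weights, features):
    """Probability of label 1 for one feature vector."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))
```

Each call to run_iteration is acyclic on its own; the cycle lives only in the driver, which is the point made in the quoted reply. Whether intermediate state is passed between iterations through HDFS or kept in a session is a separate design choice that this sketch leaves open.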
