On Tue, 2009-04-14 at 07:59 -0500, Pankil Doshi wrote:
> Hey,
>
> I am running complex queries on Hadoop that require more than one job
> to produce the final result. Job 1 computes a few of the query's joins,
> and I want to pass its results as input to job 2 for further processing
> to get the final result. The queries are such that I cannot do all the
> joins and filtering in job 1, so I need two jobs.
>
> Right now I write the results of job 1 to HDFS and read them back for
> job 2, but that takes unnecessary I/O time. So I was looking for a way
> to keep the results of job 1 in memory and use them as input for job 2.
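For concreteness, the two-job pattern described above looks roughly like the sketch below in the old JobConf API that was current at the time. The HDFS paths are hypothetical placeholders, and the stock identity mapper/reducer classes stand in for the real join and filter logic; this is a sketch of the chaining pattern, not anyone's actual code.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class TwoStageQuery {
    public static void main(String[] args) throws Exception {
        // Hypothetical HDFS paths; substitute your own.
        Path input        = new Path("/user/pankil/input");
        Path intermediate = new Path("/user/pankil/stage1-out");
        Path output       = new Path("/user/pankil/final-out");

        // Job 1: the first set of joins. Identity classes stand in
        // for the real join mapper/reducer.
        JobConf job1 = new JobConf(TwoStageQuery.class);
        job1.setJobName("stage1-joins");
        job1.setMapperClass(IdentityMapper.class);
        job1.setReducerClass(IdentityReducer.class);
        FileInputFormat.setInputPaths(job1, input);
        FileOutputFormat.setOutputPath(job1, intermediate);
        JobClient.runJob(job1);   // blocks until job 1 finishes

        // Job 2: reads job 1's output directory as its input and
        // applies the remaining joins/filters.
        JobConf job2 = new JobConf(TwoStageQuery.class);
        job2.setJobName("stage2-final");
        job2.setMapperClass(IdentityMapper.class);
        job2.setReducerClass(IdentityReducer.class);
        FileInputFormat.setInputPaths(job2, intermediate);
        FileOutputFormat.setOutputPath(job2, output);
        JobClient.runJob(job2);
    }
}

The intermediate directory on HDFS is the only hand-off between the two stages, which is exactly the I/O cost the question is about.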
Hi,

I am a programming language and compiler designer. We have a workflow engine that can take a description of a complex workflow, analyse it as a multi-stage map-reduce system, and generate an optimal resource allocation. I'm hunting around for people who have problems like this, since I'm considering porting the whole thing to Hadoop as a high-level language.

Do you, or any other users, have descriptions of workflows more complex than "one map, maybe one reduce" which you would like to be able to express easily?

S.
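As a point of reference for what Hadoop could already express at the time: the JobControl API (org.apache.hadoop.mapred.jobcontrol) lets a driver declare a small dependency graph of jobs, although every edge in that graph still passes through HDFS. A minimal sketch, assuming conf1 and conf2 are JobConf objects configured as in the earlier example:

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.jobcontrol.Job;
import org.apache.hadoop.mapred.jobcontrol.JobControl;

public class WorkflowDriver {
    public static void main(String[] args) throws Exception {
        // conf1 and conf2 would be configured as in the previous sketch.
        JobConf conf1 = new JobConf(WorkflowDriver.class);
        JobConf conf2 = new JobConf(WorkflowDriver.class);

        Job stage1 = new Job(conf1);
        Job stage2 = new Job(conf2);
        stage2.addDependingJob(stage1); // stage2 runs only after stage1 succeeds

        JobControl workflow = new JobControl("two-stage-query");
        workflow.addJob(stage1);
        workflow.addJob(stage2);

        // JobControl is a Runnable that submits jobs as their
        // dependencies complete.
        Thread runner = new Thread(workflow);
        runner.start();
        while (!workflow.allFinished()) {
            Thread.sleep(1000);
        }
        workflow.stop();
    }
}

Anything beyond this, such as cross-job optimisation or keeping intermediate results in memory, is where a higher-level workflow language would have to go past what JobControl offers.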