Re: Modeling WordCount in a different way

2009-04-15 Thread Pankil Doshi
On Wed, Apr 15, 2009 at 1:26 AM, Sharad Agarwal shara...@yahoo-inc.com wrote: I am trying complex queries on Hadoop, and I require more than one job to run to get the final result. The results of job one capture a few of the query's joins, and I want to pass those results as input to the second job

Re: Modeling WordCount in a different way

2009-04-14 Thread Pankil Doshi
Hey, I am trying complex queries on Hadoop, and I require more than one job to run to get the final result. The results of job one capture a few of the query's joins, and I want to pass those results as input to the second job and do further processing so that I can get the final results. The queries are such that I

Re: Modeling WordCount in a different way

2009-04-13 Thread Pankil Doshi
Hey, did you find any class or way to store the results of Job1's map/reduce in memory and use them as input to Job2's map/reduce? I am facing a situation where I need to do a similar thing. If anyone can help me out.. Pankil On Wed, Apr 8, 2009 at 12:51 AM, Sharad Agarwal

Re: Modeling WordCount in a different way

2009-04-13 Thread sharad agarwal
Pankil Doshi wrote: Hey, did you find any class or way to store the results of Job1's map/reduce in memory and use them as input to Job2's map/reduce? I am facing a situation where I need to do a similar thing. If anyone can help me out.. Normally you would write the job output to a file and use that as the input to the next job
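A minimal sketch of that file-based handoff using the old org.apache.hadoop.mapred API; the paths, job names, and the stock TokenCountMapper/LongSumReducer word-count wiring are illustrative assumptions, not code from this thread:

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileOutputFormat;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.KeyValueTextInputFormat;
  import org.apache.hadoop.mapred.lib.IdentityMapper;
  import org.apache.hadoop.mapred.lib.IdentityReducer;
  import org.apache.hadoop.mapred.lib.LongSumReducer;
  import org.apache.hadoop.mapred.lib.TokenCountMapper;

  public class TwoStepDriver {
    public static void main(String[] args) throws Exception {
      // Job 1: a plain word count whose results land in an HDFS directory.
      JobConf job1 = new JobConf(TwoStepDriver.class);
      job1.setJobName("wordcount-step1");
      job1.setMapperClass(TokenCountMapper.class);   // emits (token, 1)
      job1.setReducerClass(LongSumReducer.class);    // sums the 1s per token
      job1.setOutputKeyClass(Text.class);
      job1.setOutputValueClass(LongWritable.class);
      FileInputFormat.setInputPaths(job1, new Path("/data/input"));
      FileOutputFormat.setOutputPath(job1, new Path("/data/step1-counts"));
      JobClient.runJob(job1);                        // blocks until job 1 finishes

      // Job 2: point its input at job 1's output directory ("word<TAB>count" lines).
      JobConf job2 = new JobConf(TwoStepDriver.class);
      job2.setJobName("step2");
      job2.setInputFormat(KeyValueTextInputFormat.class);
      job2.setMapperClass(IdentityMapper.class);     // replace with the real second-step logic
      job2.setReducerClass(IdentityReducer.class);
      job2.setOutputKeyClass(Text.class);
      job2.setOutputValueClass(Text.class);
      FileInputFormat.setInputPaths(job2, new Path("/data/step1-counts"));
      FileOutputFormat.setOutputPath(job2, new Path("/data/final-out"));
      JobClient.runJob(job2);
    }
  }

The handoff happens through HDFS rather than memory: job 2 simply reads whatever directory job 1 wrote.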

Re: Modeling WordCount in a different way

2009-04-07 Thread Sharad Agarwal
Suppose a batch of input splits arrives at the beginning for every map, and the reduce gives the (word, frequency) pairs for this batch of input splits. Now, after this, another batch of input splits arrives, and the results from the subsequent reduce are aggregated with the previous results (if the word that has
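One hedged way to realize this batch-wise scheme: run a normal word count over each new batch, then run a small merge job whose inputs are the previous running totals plus the new batch's counts, with a summing reduce. The sketch below assumes the old mapred API and tab-separated "word<TAB>count" text output; the paths and class names are my own, not from the thread.

  import java.io.IOException;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileOutputFormat;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.KeyValueTextInputFormat;
  import org.apache.hadoop.mapred.MapReduceBase;
  import org.apache.hadoop.mapred.Mapper;
  import org.apache.hadoop.mapred.OutputCollector;
  import org.apache.hadoop.mapred.Reporter;
  import org.apache.hadoop.mapred.lib.LongSumReducer;

  /** Merges the previous running totals with the counts from the latest batch. */
  public class MergeCounts {

    /** Inputs are text lines "word<TAB>count"; parse the count into a LongWritable. */
    public static class ParseCountMapper extends MapReduceBase
        implements Mapper<Text, Text, Text, LongWritable> {
      public void map(Text word, Text count,
                      OutputCollector<Text, LongWritable> out, Reporter reporter)
          throws IOException {
        out.collect(word, new LongWritable(Long.parseLong(count.toString())));
      }
    }

    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(MergeCounts.class);
      conf.setJobName("merge-batch-counts");
      conf.setInputFormat(KeyValueTextInputFormat.class);  // key = word, value = count text
      conf.setMapperClass(ParseCountMapper.class);
      conf.setReducerClass(LongSumReducer.class);          // old total + new batch count
      conf.setOutputKeyClass(Text.class);
      conf.setOutputValueClass(LongWritable.class);
      // Both the running totals and the newest batch's counts are fed in as input.
      FileInputFormat.setInputPaths(conf,
          new Path("/wordcount/totals-so-far"), new Path("/wordcount/batch-counts"));
      FileOutputFormat.setOutputPath(conf, new Path("/wordcount/totals-updated"));
      JobClient.runJob(conf);
    }
  }

Each round, the "totals-updated" directory becomes the "totals-so-far" input for the next batch.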

Re: Modeling WordCount in a different way

2009-04-07 Thread Aayush Garg
I am confused about how I would start the next job after finishing the previous one; could you make it clear with a rough example? Also, do I need to use SequenceFileInputFormat to keep the results in memory and then access them? On Tue, Apr 7, 2009 at 10:43 AM, Sharad Agarwal

Re: Modeling WordCount in a different way

2009-04-07 Thread Norbert Burger
Aayush, out of curiosity, why do you want to model wordcount this way? What benefit do you see? Norbert On 4/6/09, Aayush Garg aayush.g...@gmail.com wrote: Hi, I want to experiment with the wordcount example in a different way. Suppose we have very large data. Instead of splitting all the

Re: Modeling WordCount in a different way

2009-04-07 Thread Aayush Garg
I want to investigate whether Hadoop can handle streams, i.e. data arriving as an infinite stream, with Hadoop used to perform online aggregation. Hadoop comes with fault tolerance and other nice features, so these would be used directly in such a scenario. On Tue, Apr 7, 2009 at 4:28 PM, Norbert

RE: Modeling WordCount in a different way

2009-04-07 Thread Sharad Agarwal
I am confused about how I would start the next job after finishing the previous one; could you make it clear with a rough example? See the JobControl class to chain the jobs. You can specify dependencies as well. You can check out the TestJobControl class for example code. Also, do I need to use
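For reference, a minimal sketch of the JobControl pattern mentioned above (old mapred API); the JobConf setup, variable names, and group name are placeholders of my own:

  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.jobcontrol.Job;
  import org.apache.hadoop.mapred.jobcontrol.JobControl;

  public class ChainWithJobControl {
    public static void main(String[] args) throws Exception {
      JobConf conf1 = new JobConf();   // configure the first job's mapper/reducer/paths here
      JobConf conf2 = new JobConf();   // configure the second job's mapper/reducer/paths here

      Job job1 = new Job(conf1);
      Job job2 = new Job(conf2);
      job2.addDependingJob(job1);      // job2 is held back until job1 completes successfully

      JobControl control = new JobControl("wordcount-chain");
      control.addJob(job1);
      control.addJob(job2);

      // JobControl is a Runnable; drive it in its own thread and poll for completion.
      Thread runner = new Thread(control);
      runner.start();
      while (!control.allFinished()) {
        Thread.sleep(1000);
      }
      control.stop();
    }
  }

The intermediate results do not need to stay in memory; writing job 1's output with SequenceFileOutputFormat and reading it back with SequenceFileInputFormat in job 2 is a common choice for the job-to-job handoff.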

Modeling WordCount in a different way

2009-04-06 Thread Aayush Garg
Hi, I want to experiment with the wordcount example in a different way. Suppose we have very large data. Instead of splitting all the data at one time, we want to feed some splits into the map-reduce job at a time. I want to model the Hadoop job like this: suppose a batch of input splits arrives in