> I am running complex queries on Hadoop that require more than one job to
> produce the final result. Job 1 computes a few of the query's joins, and I
> want to pass its results as input to job 2 for further processing to get
> the final result. The queries are such that I cannot do all of the joins
> and filtering in job 1, so I need two jobs.
>
> Right now I write the results of job 1 to HDFS and read them back for job
> 2, but that takes unnecessary IO time. So I was looking for a way to keep
> the results of job 1 in memory and use them as input for job 2.
>
> Do let me know if you need any more details.
How big are your input and output data? How many nodes are you using?
What is your job runtime?
I don't completely understand your use case, but my guess is that the amount
of IO time may not be significant compared to your overall job runtime,
assuming you have data-local maps.

In case you are joining two data sets and one of them is small, you can load
the smaller one in memory. With the JVM reuse feature you can load it once
into a static field in the mapper.
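A minimal sketch of that static-field pattern, stripped of the actual Hadoop
Mapper plumbing so it stands alone. The class name `SmallSideJoinMapper`, the
hard-coded rows, and the `map` signature are all hypothetical; in a real
mapper you would read the small data set from HDFS or the DistributedCache
instead, and the static field would survive across tasks only when JVM reuse
is enabled (one load per JVM rather than per task).

```java
import java.util.HashMap;
import java.util.Map;

// Illustrates loading the small side of a join once per JVM into a static
// field, then probing it for every record of the big side.
public class SmallSideJoinMapper {
    // Shared across all map tasks that run in this JVM when reuse is on.
    private static Map<String, String> smallTable;
    static int loadCount = 0; // for illustration: counts actual loads

    static synchronized Map<String, String> getSmallTable() {
        if (smallTable == null) {
            smallTable = new HashMap<>();
            // Stand-in for reading the small file from HDFS/DistributedCache.
            smallTable.put("1", "alice");
            smallTable.put("2", "bob");
            loadCount++;
        }
        return smallTable;
    }

    // Stand-in for Mapper.map(): join each big-side record against the
    // in-memory small table; returns null when the key has no match.
    public static String map(String key, String value) {
        String joined = getSmallTable().get(key);
        return joined == null ? null : value + "," + joined;
    }
}
```

However many records `map` processes, `loadCount` stays at 1 within one JVM,
which is the whole point of combining the static field with JVM reuse.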

- Sharad
