Re: Modeling WordCount in a different way

Pankil Doshi Wed, 15 Apr 2009 06:07:50 -0700

On Wed, Apr 15, 2009 at 1:26 AM, Sharad Agarwal <shara...@yahoo-inc.com> wrote:


>
>
> > I am trying complex queries on Hadoop that require more than one job
> > to get the final result. The results of job 1 capture a few joins of
> > the query, and I want to pass those results as input to job 2 and
> > process them again to get the final results. The queries are such that
> > I can't do all the joins and filtering in job 1, so I require two jobs.
> >
> > Right now I write the results of job 1 to HDFS and read them for job 2,
> > but that takes unnecessary I/O time. So I was looking for a way to
> > store the results of job 1 in memory and use them as input for job 2.
> >
> > Do let me know if you need any more details.
>
> How big is your input and output data ?

My total data is 7.8 GB, of which job 1 uses around 3 GB. The output of
job 1 is about 1 GB, and I use that output as the input to job 2.


> How many nodes are you using?

Right now, due to lack of resources, I have only 4 nodes, each with a
dual-core processor, 1 GB of RAM, and about 80 GB of hard disk.

>
> What is your job runtime?

My first job takes a long time after reaching 90% of the reduce phase, as it
does an in-memory merge sort, so that is also a big issue. I will have to
arrange more memory for my cluster, I suppose.

I will have a look at the JVM reuse feature. Thanks.
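
For reference, a minimal sketch of how JVM reuse could be switched on,
assuming a Hadoop release from 0.19 onward, where the
mapred.job.reuse.jvm.num.tasks property is available. The value -1 (no limit
on the number of tasks of one job that a JVM may run) is only an illustrative
choice, not a setting tuned for this cluster; the default of 1 means no reuse.

```xml
<!-- Assumed placement: the job configuration or the site configuration file. -->
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <!-- -1 = unlimited tasks per JVM within a job; 1 (default) = no reuse -->
  <value>-1</value>
</property>
```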



> Pankil
