On Wed, Apr 15, 2009 at 1:26 AM, Sharad Agarwal <shara...@yahoo-inc.com> wrote:
> > I am trying complex queries on Hadoop that require more than one job
> > to get the final result. The results of job 1 capture a few joins of
> > the query, and I want to pass those results as input to job 2 and
> > process them again to get the final results. The queries are such that
> > I can't do all types of joins and filtering in job 1, so I require two
> > jobs.
> >
> > Right now I write the results of job 1 to HDFS and read them for job 2,
> > but that takes unnecessary I/O time. So I was looking for a way to
> > store the results of job 1 in memory and use them as input for job 2.
> >
> > Do let me know if you need any more details.
>
> How big is your input and output data?

My total data is 7.8 GB, of which job 1 uses around 3 GB. The output of
job 1 is about 1 GB, and I use this output as input to job 2.

> How many nodes are you using?

Right now, due to lack of resources, I have only 4 nodes, each with a
dual-core processor, 1 GB of RAM, and about 80 GB of hard disk.

> What is your job runtime?

My first job takes a long time after reaching 90% of the reduce phase, as
it does an in-memory merge sort, so that is also a big issue. I suppose I
will have to arrange more memory for my cluster.

I will have a look at the JVM reuse feature.

Thanks,
Pankil