Try looking into the distributed cache; maybe it solves your problem?

Regards,
Praveenesh
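The idea behind the distributed-cache suggestion is a map-side join: ship the smaller file (File2) to every mapper, load it into memory once, and join each record of the big file (File1) against it during the map phase, instead of launching a second job from inside a mapper. A minimal sketch of that pattern outside Hadoop, in plain Python (the tab-separated record format and all data are made up for illustration):

```python
# Sketch of a map-side join, the pattern DistributedCache enables in Hadoop:
# the small file is loaded into a dict once (as a mapper would do in setup()),
# then each record of the big file is joined against it in the map step.

def load_small_file(lines):
    """Build an in-memory lookup table from the cached (small) file."""
    table = {}
    for line in lines:
        key, value = line.split("\t", 1)
        table[key] = value
    return table

def map_join(big_file_lines, lookup):
    """Stream the big file, emitting joined records; unmatched keys are skipped."""
    for line in big_file_lines:
        key, value = line.split("\t", 1)
        if key in lookup:
            yield (key, value, lookup[key])

file2 = ["u1\tAlice", "u2\tBob"]        # small side, distributed to every mapper
file1 = ["u1\t42", "u3\t7", "u2\t13"]   # big side, streamed as normal job input
joined = list(map_join(file1, load_small_file(file2)))
```

This only works when File2 (or the part of it the mappers need) fits in memory on each node; if both files are genuinely large, a reduce-side join over both inputs is the usual alternative.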
On Wed, Apr 4, 2012 at 6:01 PM, Ravi teja ch n v <raviteja.c...@huawei.com> wrote:

> Hi Stuti,
>
> In that case, you can run the job with the dependent file (File2) first, then
> run the job using File1.
> Your second mapper can then use the already-processed output.
>
> I guess this will solve the problem you have mentioned.
>
> Thanks,
> Ravi Teja
>
> ------------------------------
> *From:* Stuti Awasthi [stutiawas...@hcl.com]
> *Sent:* 04 April 2012 17:25:02
> *To:* mapreduce-user@hadoop.apache.org
> *Subject:* RE: Calling one MR job within another MR job
>
> Hi Ravi,
>
> There is no job dependency, so I cannot use chained MR or JobControl as
> you suggested.
> I have two relatively big files. I start processing with File1 as input to
> the MR1 job; this processing needs to look up data from File2. One way
> is to loop through File2 and fetch the data. The other is to pass File2 to an
> MR2 job for parallel processing.
>
> The second option is what hints me toward calling an MR2 job from inside the
> MR1 job. I am sure this is a common problem that people face. What is the
> best way to resolve this kind of issue?
>
> Thanks
>
> *From:* Ravi teja ch n v [mailto:raviteja.c...@huawei.com]
> *Sent:* Wednesday, April 04, 2012 4:35 PM
> *To:* mapreduce-user@hadoop.apache.org
> *Subject:* RE: Calling one MR job within another MR job
>
> Hi Stuti,
>
> If you want MRjob2 to run after MRjob1, i.e. a job dependency,
> you can use the JobControl API, where you can manage the dependencies.
>
> Calling another job from a Mapper is not a good idea.
>
> Thanks,
> Ravi Teja
>
> ------------------------------
> *From:* Stuti Awasthi [stutiawas...@hcl.com]
> *Sent:* 04 April 2012 16:04:19
> *To:* mapreduce-user@hadoop.apache.org
> *Subject:* Calling one MR job within another MR job
>
> Hi all,
>
> We have a use case in which I start with a first MR1 job with input file
> File1.txt, and from this job call another MR2 job with input File2.txt.
> So:
>
> MRjob1 {
>   Map() {
>     MRJob2(File2.txt)
>   }
> }
>
> MRJob2 {
>   Processing...
> }
>
> My queries are: is this kind of approach possible, and how significant are
> the implications from a performance perspective?
>
> Regards,
> *Stuti Awasthi*
> HCL Comnet Systems and Services Ltd
> F-8/9 Basement, Sec-3, Noida.
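The JobControl approach Ravi mentions amounts to declaring dependencies between jobs and letting a controller run each job only after its prerequisites finish (in Hadoop, via `JobControl` and `ControlledJob.addDependingJob`). The scheduling idea can be sketched in plain Python; the job names follow the thread's pseudocode and the runner is a simplified stand-in, not the Hadoop API:

```python
# Minimal sketch of JobControl-style dependency management: each job
# declares its prerequisites, and a job runs only after all of them
# have completed. Hadoop's JobControl does this (plus polling and
# failure handling) for real MapReduce jobs.

def run_jobs(jobs, deps):
    """jobs: name -> callable; deps: name -> list of prerequisite job names.
    Returns the order in which jobs were executed."""
    done, order = set(), []
    while len(done) < len(jobs):
        progressed = False
        for name in jobs:
            if name not in done and all(d in done for d in deps.get(name, [])):
                jobs[name]()              # "submit" the job and wait for it
                done.add(name)
                order.append(name)
                progressed = True
        if not progressed:
            raise RuntimeError("dependency cycle detected")
    return order

# MRjob1 (reading File1) waits until MRjob2 (processing File2) has finished,
# so MRjob1's mappers can consume MRjob2's output instead of launching it.
order = run_jobs(
    {"MRjob2": lambda: None, "MRjob1": lambda: None},
    {"MRjob1": ["MRjob2"]},
)
```

This is the sequential alternative to calling a job from inside a mapper: MRjob2 writes its output to HDFS first, and MRjob1 then reads that output (directly, or via the distributed cache if it is small enough).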