Hi Stuti,

If you want to deal with different types of files in the map phase, you can use
the org.apache.hadoop.mapred.lib.MultipleInputs API (a different input format
and mapper for each input path) and have those mappers emit the same output
types. After the map phase, the partitioner can send the map outputs from File1
and File2 that belong together (based on your business need, e.g. records with
the same key) to the same reducer, where you can compare them in the reduce
phase.
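
For illustration, here is a minimal plain-Java simulation of that flow. It uses
no Hadoop classes so it can run on its own. Two "mappers" parse differently
formatted inputs (the comma- and tab-separated layouts are hypothetical) but
emit the same (key, tagged value) pair. A partitioner routes each key to one
reducer using the formula Hadoop's default HashPartitioner applies to the key's
hashCode(). The "reducer" then compares the File1 and File2 values it receives
for that key. In a real job the per-file mappers would be registered with
MultipleInputs.addInputPath(conf, path, inputFormatClass, mapperClass). The
class and method names below are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation (no Hadoop classes) of a reduce-side comparison of two files.
// Record layouts ("id,value" for File1, "id<TAB>value" for File2) and all names are hypothetical.
class ReduceSideJoinSketch {

    // "Mapper" for File1: emits (id, "F1:" + value); the tag records which file it came from.
    static void mapFile1(String line, List<String[]> out) {
        String[] f = line.split(",", 2);
        if (f.length < 2) return;                       // skip malformed lines
        out.add(new String[] {f[0].trim(), "F1:" + f[1].trim()});
    }

    // "Mapper" for File2: a different input layout, but the same output type as mapFile1.
    static void mapFile2(String line, List<String[]> out) {
        String[] f = line.split("\t", 2);
        if (f.length < 2) return;                       // skip malformed lines
        out.add(new String[] {f[0].trim(), "F2:" + f[1].trim()});
    }

    // Same formula Hadoop's default HashPartitioner applies to the key's hashCode().
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // "Reducer": receives every tagged value for one key, from both files, and compares them.
    static String reduce(String key, List<String> values) {
        List<String> fromFile1 = new ArrayList<>();
        List<String> fromFile2 = new ArrayList<>();
        for (String v : values) {
            if (v.startsWith("F1:")) fromFile1.add(v.substring(3));
            else fromFile2.add(v.substring(3));
        }
        if (fromFile1.isEmpty()) return key + " only in File2";
        if (fromFile2.isEmpty()) return key + " only in File1";
        return fromFile1.equals(fromFile2)
                ? key + " matches"
                : key + " differs: " + fromFile1 + " vs " + fromFile2;
    }

    // Driver: map both inputs, "shuffle" into per-reducer groups sorted by key, then reduce.
    static List<String> run(List<String> file1, List<String> file2, int numReducers) {
        List<String[]> mapped = new ArrayList<>();
        for (String line : file1) mapFile1(line, mapped);
        for (String line : file2) mapFile2(line, mapped);

        List<TreeMap<String, List<String>>> reducers = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) reducers.add(new TreeMap<>());
        for (String[] kv : mapped) {
            reducers.get(partition(kv[0], numReducers))
                    .computeIfAbsent(kv[0], k -> new ArrayList<>())
                    .add(kv[1]);
        }

        List<String> results = new ArrayList<>();
        for (TreeMap<String, List<String>> groups : reducers) {
            for (Map.Entry<String, List<String>> e : groups.entrySet()) {
                results.add(reduce(e.getKey(), e.getValue()));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> file1 = Arrays.asList("a,1", "b,2", "c,3");
        List<String> file2 = Arrays.asList("a\t1", "b\t5", "d\t4");
        for (String line : run(file1, file2, 2)) System.out.println(line);
    }
}
```

Because the partition depends only on the key, every value for a given key,
from either file, reaches the same reduce call. That is what makes the
comparison possible without launching a second job from inside a mapper.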


If you describe the scenario in more detail, people may be able to help you
better.

Thanks
Devaraj
________________________________________
From: Stuti Awasthi [stutiawas...@hcl.com]
Sent: Wednesday, April 04, 2012 5:25 PM
To: mapreduce-user@hadoop.apache.org
Subject: RE: Calling one MR job within another MR job

Hi Ravi,

There is no job dependency, so I cannot use MR chaining or JobControl as you
suggested.
I have two relatively big files. I start processing with File1 as the input to
the MR1 job, and this processing needs to look up data from File2. One way to
do this is to loop through File2 and get the data. The other way is to pass
File2 to an MR2 job for parallel processing.

The second option seems to require calling an MR2 job from inside the MR1 job.
I am sure this is a common problem that people face. What is the best way to
resolve this kind of issue?

Thanks

From: Ravi teja ch n v [mailto:raviteja.c...@huawei.com]
Sent: Wednesday, April 04, 2012 4:35 PM
To: mapreduce-user@hadoop.apache.org
Subject: RE: Calling one MR job within another MR job


Hi Stuti,

If you are looking for MRjob2 to run after MRjob1, i.e. a job dependency,
you can use the JobControl API, which lets you manage the dependencies.
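
As a rough illustration of the rule JobControl enforces, here is a small
JDK-only simulation. A job is started only after every job it depends on has
succeeded, and it is marked DEPENDENT_FAILED (never run) if one of them fails.
This is not Hadoop code: the Node class, runAll method and the BooleanSupplier
standing in for "submit the job and wait for it" are hypothetical. In Hadoop
itself the dependency is declared with addDependingJob on the controlled job
before both jobs are added to a JobControl instance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BooleanSupplier;

// Plain-Java illustration (not Hadoop code) of dependency-ordered job execution.
// Node, runAll and the BooleanSupplier "work" are hypothetical stand-ins.
class DependencyRunnerSketch {
    enum State { WAITING, SUCCESS, FAILED, DEPENDENT_FAILED }

    static final class Node {
        final String name;
        final BooleanSupplier work;                 // stands in for "submit job, wait, report success"
        final List<Node> dependsOn = new ArrayList<>();
        State state = State.WAITING;

        Node(String name, BooleanSupplier work) {
            this.name = name;
            this.work = work;
        }
    }

    // Keeps sweeping the job list: a waiting job runs once all of its dependencies
    // have succeeded, and is marked DEPENDENT_FAILED (never run) if any of them failed.
    // Jobs that can never become ready (e.g. a dependency cycle) are left WAITING.
    // Returns the names of the jobs that actually ran, in the order they ran.
    static List<String> runAll(List<Node> jobs) {
        List<String> ran = new ArrayList<>();
        boolean progress = true;
        while (progress) {
            progress = false;
            for (Node n : jobs) {
                if (n.state != State.WAITING) continue;
                boolean ready = true;
                for (Node d : n.dependsOn) {
                    if (d.state == State.FAILED || d.state == State.DEPENDENT_FAILED) {
                        n.state = State.DEPENDENT_FAILED;
                        ready = false;
                        progress = true;
                        break;
                    }
                    if (d.state != State.SUCCESS) ready = false;
                }
                if (ready) {
                    n.state = n.work.getAsBoolean() ? State.SUCCESS : State.FAILED;
                    ran.add(n.name);
                    progress = true;
                }
            }
        }
        return ran;
    }

    public static void main(String[] args) {
        Node job1 = new Node("MRjob1", () -> true);
        Node job2 = new Node("MRjob2", () -> true);
        job2.dependsOn.add(job1);                   // MRjob2 may only start after MRjob1 succeeds
        // List order is deliberately reversed; the dependency still decides the run order.
        System.out.println(runAll(Arrays.asList(job2, job1)));  // prints [MRjob1, MRjob2]
    }
}
```

This only helps when MRjob2 can wait for MRjob1 to finish; it does not let
MRjob1's mappers start MRjob2 while they are running.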

Calling another job from a mapper is not a good idea.

Thanks,
Ravi Teja

________________________________
From: Stuti Awasthi [stutiawas...@hcl.com]
Sent: 04 April 2012 16:04:19
To: mapreduce-user@hadoop.apache.org
Subject: Calling one MR job within another MR job

Hi all,

We have a use case in which I start with a first job, MR1, with File1.txt as
its input, and from this job call another job, MR2, with File2.txt as its input.
So:
MRjob1 {
    Map() {
        MRJob2(File2.txt)
    }
}

MRJob2 {
    Processing…
}

My questions are: is this kind of approach possible, and what are its
implications from a performance perspective?


Regards,
Stuti Awasthi
HCL Comnet Systems and Services Ltd
F-8/9 Basement, Sec-3,Noida.


