Prioritize Hadoop parameter "mapred.reduce.tasks" above the estimated reducer
number
-----------------------------------------------------------------------------------
Key: PIG-1810
URL: https://issues.apache.org/jira/browse/PIG-1810
Project: Pig
Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Fix For: 0.9.0
Anup pointed out this problem in PIG-1249:
{quote}
Anup added a comment - 18/Jan/11 07:46 PM
One thing we didn't take care of is the Hadoop parameter
"mapred.reduce.tasks".
If I specify the Hadoop parameter -Dmapred.reduce.tasks=450 for all the MR
jobs, it is overwritten by estimateNumberOfReducers(conf, mro), which in my
case is 15.
I am not specifying any default_parallel and PARALLEL statements.
Ideally, the number of reducers should be 450.
I think we should prioritize this parameter above the estimated-reducer
calculation.
The priority list should be
1. PARALLEL statement
2. default_parallel statement
3. mapred.reduce.tasks Hadoop parameter
4. estimateNumberOfReducers()
{quote}
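The proposed priority order can be sketched as a simple fall-through check. This is an illustrative sketch only, not Pig's actual code: the class, method, and parameter names below are hypothetical, and a non-positive value is assumed to mean "not set" (Pig's real resolution logic lives in its MR compiler and differs in detail).

```java
// Hypothetical sketch of the priority order proposed in this issue.
// All names here are illustrative, not Pig's actual API.
public class ReducerParallelism {

    /**
     * Resolves the reducer count using the proposed priority:
     *   1. PARALLEL clause on the operator
     *   2. default_parallel set in the script
     *   3. mapred.reduce.tasks from the Hadoop configuration
     *   4. the estimated reducer count (fallback)
     * A value <= 0 is treated as "not set".
     */
    public static int resolve(int parallelClause, int defaultParallel,
                              int mapredReduceTasks, int estimated) {
        if (parallelClause > 0) {
            return parallelClause;      // 1. explicit PARALLEL wins
        }
        if (defaultParallel > 0) {
            return defaultParallel;     // 2. script-level default_parallel
        }
        if (mapredReduceTasks > 0) {
            return mapredReduceTasks;   // 3. Hadoop parameter
        }
        return estimated;               // 4. estimation as last resort
    }
}
```

Under this order, Anup's scenario (no PARALLEL, no default_parallel, -Dmapred.reduce.tasks=450, estimate of 15) resolves to 450 rather than 15.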
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.