[
https://issues.apache.org/jira/browse/PIG-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jie Li updated PIG-2779:
------------------------
Attachment: PIG-2779.2.patch
Thanks Bill for the review! Attached PIG-2779.2.patch with fixes.
{quote}
MRCompiler
1. You removed some logic which seems to now be in JCC. What about the removed
call to eng.getJobConf().getNumReduceTasks()? Is that still being picked up
elsewhere?
It appears like that check was dropped.
{quote}
Yeah I purposely leave out the parameter "mapred.reduce.tasks", as the runtime
adjustment used to ignore it.
I thought about including it as an option for setting #reducers, but the
problem is its default value is 1 instead of -1, so we can't distinguish
whether users set it to 1 or not. (this was one of bugs before). Hive supports
this parameter by making it default to -1 and let users override it. We can
also add this support if we want.
{quote}
2. HExecutionEngine inport no longer needed.
{quote}
Nice catch. Removed.
{quote}
JobControlCompiler
3. Should ParallelConstantVisitor be called with mro.reducePlan or
nextMro.reducePlan?
{quote}
Should be mro, the sample job.
{quote}
4. calculateRuntimeReducers doesn't need MROperPlan plan in it's signature.
{quote}
Sure. Removed.
{quote}
5. Line 804, if statements should have curly braces.
6. Line 819, should have space between else and trailing curly bracket.
{quote}
Nice catch. Btw do you use any syntax checking tool?
{quote}
test/smoke-tests
7. Did you mean to include this in the patch?
{quote}
Sorry for that. Removed.
> Refactoring the code for setting number of reducers
> ---------------------------------------------------
>
> Key: PIG-2779
> URL: https://issues.apache.org/jira/browse/PIG-2779
> Project: Pig
> Issue Type: Bug
> Reporter: Jie Li
> Assignee: Jie Li
> Fix For: 0.11
>
> Attachments: PIG-2779.0.patch, PIG-2779.1.patch, PIG-2779.2.patch,
> TestNumberOfReducers.java, TestNumberOfReducers.java
>
>
> As PIG-2652 observed, currently the code for setting number of reducers is a
> little messy. MapReduceOper.requestedParallelism seems being misused in some
> plases, and now we support runtime estimation of #reducer which further
> complicates the problem.
> For example, if we specify parallel 1 for the order-by, the estimated
> #reducer will be used. If we specify parallel 2 while it estimates 4,
> order-by will fail due to "Illegal partition for Null". If we specify
> parallel 4 while it estimates 2, then some reducers will have nothing to do.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira