[ https://issues.apache.org/jira/browse/MAHOUT-414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913434#action_12913434 ]
Sean Owen commented on MAHOUT-414:
----------------------------------
I tend to think this is, in fact, a Hadoop-level configuration. At times a job
may wish to force a particular level of concurrency -- only 1 job when it knows
there is no parallelism available, or twice as many reducers as mappers when
that's known to be good.
Users can already control this via Hadoop. Letting them control it via
duplicate command line parameters doesn't add anything. I agree it's sometimes
hard to know how to set parallelism, though Hadoop's guesses are good.
When I see that Hadoop's guesses are too low, it's because the input is too
small to create enough input shards. This is a different issue.
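To make the small-input point concrete, here is a rough sketch of how the
number of input splits (and so map tasks) for a single file falls out of its
size. It is a simplified model of file-based split sizing, not Hadoop's actual
code: the class and method names are hypothetical, the 64 MB block size and the
1.1 slack factor are assumptions for illustration, and empty files are simply
counted as zero splits.

```java
// Simplified, hypothetical model of file-based input splitting.
// Not Hadoop's implementation; the constants below are assumptions.
final class SplitEstimate {
    // Assumed slack: a trailing remainder up to 10% larger than the
    // split size is kept as one split instead of being cut again.
    static final double SPLIT_SLOP = 1.1;

    // Split size is the block size clamped into [minSize, maxSize].
    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits produced for one file of the given length.
    static int countSplits(long fileLength, long splitSize) {
        if (fileLength <= 0) {
            return 0; // simplification: empty input yields no splits
        }
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        return splits + 1; // the final (possibly short) split
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long size = splitSize(64 * mb, 1, Long.MAX_VALUE); // assumed 64 MB blocks
        System.out.println(countSplits(10 * mb, size));   // small input: 1 split
        System.out.println(countSplits(1024 * mb, size)); // 1 GB input: 16 splits
    }
}
```

Under these assumptions a 10 MB input gives a single split, so only one mapper
runs no matter how many map slots the cluster has. Getting more parallelism
there means changing how the input is split, not adding a new concurrency
option.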
So I guess I'm wondering what the concrete change here could be, for
discussion, since it's marked for 0.4?
> Usability: Mahout applications need a consistent API to allow users to
> specify desired map/reduce concurrency
> -------------------------------------------------------------------------------------------------------------
>
> Key: MAHOUT-414
> URL: https://issues.apache.org/jira/browse/MAHOUT-414
> Project: Mahout
> Issue Type: Bug
> Affects Versions: 0.3
> Reporter: Jeff Eastman
> Fix For: 0.4
>
>
> If specifying the number of mappers and reducers is a common activity which
> users need to perform in running Mahout applications on Hadoop clusters then
> we need to have a standard way of specifying them in our APIs without
> exposing the full set of Hadoop options, especially for our non-power-users.
> This is the case for some applications already but others require the use of
> Hadoop-level -D arguments to achieve reasonable out-of-the-box parallelism
> even when running our examples. The usability defect is that some of our
> algorithms won't scale without it and that we don't have a standard way to
> express this in our APIs.