As far as I understand, this option is not enforced. I suspect it actually
means 'minimum degree of parallelism', so if you expect to use it to
reduce the number of mappers, I don't think that will work.
The settings that do enforce something are the min split size and max split
size in the file input format, so I guess you can try those. I rely on them
(and expose them as a job-specific option) in stochastic SVD.

But forcing the split size to increase usually creates a 'supersplits'
problem, where a lot of data is moved around just to supply mappers with
their input. That is perhaps why this option is meant only to increase
parallelism, and probably not to decrease it.
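
To illustrate why the map-count hint can raise parallelism but not lower it,
here is a rough sketch of the split-size rule as I remember it from the old-API
FileInputFormat: the requested map count only produces a "goal" size, which is
then capped by the block size and floored by the min split size. This is a
simplification under assumptions (one file, no 10% slop on the last split); the
class and method names are made up for the example and are not Hadoop API.

```java
public class SplitSizeSketch {

    // Simplified split-size rule (assumed, from memory of the old mapred API):
    //   goalSize  = totalSize / requestedMaps        (driven by mapred.map.tasks)
    //   splitSize = max(minSize, min(goalSize, blockSize))
    static long splitSize(long totalSize, int requestedMaps, long minSize, long blockSize) {
        long goalSize = totalSize / Math.max(requestedMaps, 1);
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Number of splits for a single file, using ceiling division.
    // The real input format also tolerates a slightly oversized last split.
    static long numSplits(long totalSize, long splitSize) {
        return (totalSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        final long mb = 1024L * 1024L;
        long total = 1024L * mb;   // hypothetical 1 GB input file
        long block = 64L * mb;     // hypothetical 64 MB block size

        // Asking for fewer maps than there are blocks has no effect:
        // the goal size is capped by the block size.
        long few = splitSize(total, 2, 1L, block);
        System.out.println("requested 2 maps  -> " + numSplits(total, few) + " splits");

        // Asking for more maps shrinks the goal size below the block size.
        long many = splitSize(total, 64, 1L, block);
        System.out.println("requested 64 maps -> " + numSplits(total, many) + " splits");

        // Raising the min split size is what actually reduces the split count.
        long forced = splitSize(total, 2, 256L * mb, block);
        System.out.println("min split 256 MB  -> " + numSplits(total, forced) + " splits");
    }
}
```

With these made-up numbers, requesting 2 maps still gives 16 splits, requesting
64 maps gives 64, and only a 256 MB min split size brings it down to 4. Each of
those 4 splits spans several blocks, which is where the extra data movement
comes from.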

-d

On Tue, Dec 28, 2010 at 4:05 PM, Jeff Eastman <[email protected]> wrote:

> This is supposed to be a generic option. You should be able to specify
> Hadoop options such as this on the command line invocation of your favorite
> Mahout routine, but I'm having a similar problem setting
> -Dmapred.reduce.tasks=10 with Canopy and k-Means. This is both with and
> without a space after the -D.
>
> Can someone point me to a Mahout command where this does work? Both drivers
> extend AbstractJob and do the usual option processing pushups. I don't have
> Hadoop source locally so I can't debug the generic options parsing.
>
> -----Original Message-----
> From: beneo_7 [mailto:[email protected]]
> Sent: Monday, December 27, 2010 10:45 PM
> To: [email protected]
> Subject: where i can set -Dmapred.map.tasks=X
>
> i read on Mahout in Action that I should set -Dmapred.map.tasks=X
> but it did not work for hadoop
>
