Good point. I thought the logic was awkward, testing startsWith twice, so I went with the more direct solution.

On 12/29/10 6:29 PM, Lance Norskog wrote:
The Tree Map and Set classes preserve the order of addition to the Map/Set.

On Wed, Dec 29, 2010 at 11:50 AM, Jeff Eastman<[email protected]>  wrote:
The patch to MahoutDriver involves the code in the for loop at lines 203-216. If the 
arg.startsWith("-D") then the arg needs to be added to argsList at position 1, 
else at the end. I will commit a patch for this tonight as I have not got my Narus CLA 
signed yet.
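Roughly, the change looks like this; this is just a sketch from memory, not the exact MahoutDriver code, and the reorderArgs/progName/argsList names are my own assumptions:

  // Sketch only -- names are assumed, not copied from MahoutDriver.
  // Generic -D options must precede job-specific ones, so they are
  // inserted right behind the program name at index 0.
  static List<String> reorderArgs(String progName, String[] args) {
    List<String> argsList = new ArrayList<String>();   // java.util.*
    argsList.add(progName);
    for (String arg : args) {
      if (arg.startsWith("-D")) {
        argsList.add(1, arg);   // keep generic options up front
      } else {
        argsList.add(arg);      // job-specific options keep their order at the end
      }
    }
    return argsList;
  }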

-----Original Message-----
From: Dmitriy Lyubimov [mailto:[email protected]]
Sent: Wednesday, December 29, 2010 11:46 AM
To: [email protected]
Cc: [email protected]
Subject: Re: where i can set -Dmapred.map.tasks=X

ok, thank you, Jeff. Good to know. I actually expected to rely on this for a
wide range of issues (most common being task jvm parameters override).

On Wed, Dec 29, 2010 at 11:29 AM, Jeff Eastman<[email protected]>  wrote:

I've found the problem: the MahoutDriver uses a Map to organize the command-line
arguments, and this reorders them so that the -D arguments may not be first.
They then get treated as job-specific options, which produces the failures.
I'm working on a fix.
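A toy example of the reordering (not the actual MahoutDriver code; I'm assuming a sorted map here just to show the effect):

  import java.util.Map;
  import java.util.TreeMap;

  public class ArgOrderDemo {
    public static void main(String[] args) {
      // A sorted map hands keys back in key order, not insertion order,
      // so the -D argument is no longer guaranteed to come first.
      Map<String, String> cliArgs = new TreeMap<String, String>();
      cliArgs.put("-Dmapred.reduce.tasks", "10");    // added first...
      cliArgs.put("--clusters", "out/clusters-0");   // ...but sorts ahead of -D
      for (Map.Entry<String, String> e : cliArgs.entrySet()) {
        System.out.println(e.getKey() + " " + e.getValue());
      }
      // prints "--clusters out/clusters-0" before "-Dmapred.reduce.tasks 10"
    }
  }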

Jeff

-----Original Message-----
From: Jeff Eastman [mailto:[email protected]]
Sent: Tuesday, December 28, 2010 5:19 PM
To: [email protected]
Subject: RE: where i can set -Dmapred.map.tasks=X

That's where I'm beginning to look too. It seems the driver code is working
correctly (I thought I had tested that) but the CLI isn't.

The original post was for -Dmapred.map.tasks but I noticed the reduce.tasks
didn't work either.

-----Original Message-----
From: Dmitriy Lyubimov [mailto:[email protected]]
Sent: Tuesday, December 28, 2010 5:15 PM
To: [email protected]
Subject: Re: where i can set -Dmapred.map.tasks=X

Oh, so you are trying to set the number of reduce tasks. I missed that; the
original post was about the number of map tasks. Sorry.

No, no idea why that error pops up in the Mahout command line. I would need to
dig into Mahout's CLI code -- I don't think I dug that deep there before.

On Tue, Dec 28, 2010 at 5:06 PM, Jeff Eastman<[email protected]>  wrote:

It's very odd: when I run k-means from Eclipse and add
-Dmapred.reduce.tasks=10 as the first argument, the driver loves it and
job.getNumReduceTasks() is set correctly to 10. When I run the same command
line using bin/mahout, however, it fails with "Unexpected
-Dmapred.reduce.tasks=10 while processing Job-Specific Options".

The CLI invocation is: ./bin/mahout kmeans -Dmapred.reduce.tasks=10 -I
...
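For reference, the Eclipse-side call that works looks roughly like this; the job-specific option values are placeholders, not the exact ones from my run:

  // Sketch only: the point is that the generic -D option is the first
  // element of the args array handed to the driver.
  KMeansDriver.main(new String[] {
      "-Dmapred.reduce.tasks=10",   // generic option first
      "-i", "testdata/points",      // then job-specific options (placeholders)
      "-c", "testdata/clusters",
      "-o", "output"
  });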


-----Original Message-----
From: Dmitriy Lyubimov [mailto:[email protected]]
Sent: Tuesday, December 28, 2010 4:55 PM
To: [email protected]
Subject: Re: where i can set -Dmapred.map.tasks=X

PPS: it doesn't tell you what FileInputFormat actually uses for it as a
property, and I don't remember off the top of my head either, but I assume you
could use them with -D as well.

On Tue, Dec 28, 2010 at 4:54 PM, Dmitriy Lyubimov<[email protected]>
wrote:

In particular, QJob is one of the drivers that uses that, in the following
way:

if (minSplitSize > 0)
  SequenceFileInputFormat.setMinInputSplitSize(job, minSplitSize);

An interesting peculiarity about that parameter is that in the current Hadoop
release, for anything derived from FileInputFormat, it ensures that all splits
are at least that big and that the last split can be up to 1.1 times that big.
I am not quite sure why there is special treatment for the last split, but
that's how it goes there.
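If memory serves, the relevant logic in FileInputFormat.getSplits() goes roughly like this; treat the details as my recollection, not a quote, and they may differ between Hadoop releases:

  // Paraphrased from memory, not copied from the Hadoop source.
  long splitSize = Math.max(minSize, Math.min(maxSize, blockSize));
  long bytesRemaining = fileLength;
  final double SPLIT_SLOP = 1.1;
  while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
    // emit a split of exactly splitSize bytes
    bytesRemaining -= splitSize;
  }
  // whatever is left (up to 1.1 * splitSize) becomes the last split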

-Dmitriy


On Tue, Dec 28, 2010 at 4:48 PM, Dmitriy Lyubimov<[email protected]>  wrote:

Jeff,

It's the MAHOUT-376 patch; I don't think it is committed. The driver class
there is SSVDCli; for your convenience you can find it here:

https://github.com/dlyubimov/ssvd-lsi/tree/givens-ssvd/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd

But like I said, I did not try to use it with the -D option, since I wanted to
give an explicit option to increase the split size if needed (and a help entry
for it). Another reason is that the solver has a series of jobs, and only
those reading the source matrix have anything to do with the split size.


-d


On Tue, Dec 28, 2010 at 4:39 PM, Jeff Eastman<[email protected]>
wrote:
What's the driver class? If the -D parameters are working for you, I want
to compare to the clustering drivers.

-----Original Message-----
From: Dmitriy Lyubimov [mailto:[email protected]]
Sent: Tuesday, December 28, 2010 4:37 PM
To: [email protected]
Subject: Re: where i can set -Dmapred.map.tasks=X

As far as I understand, this option is not forced. I suspect it actually
means 'minimum degree of parallelism', so if you expect to use it to reduce
the number of mappers, I don't think that is expected to work. The ones that
do enforce anything are the min split size and max split size in the file
input, so I guess you can try those. I rely on them (and open them up as a
job-specific option) in stochastic SVD.

But usually forcing the split size to increase creates a 'supersplits'
problem, where a lot of data is moved around just to supply data to mappers,
which is perhaps why this option is meant to increase parallelism only, but
probably not to decrease it.
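If you want to try those from code, something along these lines should work; I'm assuming the new-API org.apache.hadoop.mapreduce.lib.input.FileInputFormat here, and the sizes are arbitrary examples:

  // Sketch only; check the equivalent -D property names against your Hadoop version.
  FileInputFormat.setMinInputSplitSize(job, 256L * 1024 * 1024);   // at least ~256 MB per split
  FileInputFormat.setMaxInputSplitSize(job, 1024L * 1024 * 1024);  // at most ~1 GB per split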

-d

On Tue, Dec 28, 2010 at 4:05 PM, Jeff Eastman<[email protected]>
wrote:

This is supposed to be a generic option. You should be able to specify Hadoop
options such as this on the command-line invocation of your favorite Mahout
routine, but I'm having a similar problem setting -Dmapred.reduce.tasks=10
with Canopy and k-Means. This is both with and without a space after the -D.

Can someone point me to a Mahout command where this does work? Both drivers
extend AbstractJob and do the usual option-processing pushups. I don't have
the Hadoop source locally so I can't debug the generic options parsing.
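For what it's worth, my mental model of the flow is the standard Tool pattern below; this is a schematic sketch, not any actual Mahout driver, and MyDriver is a made-up name:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.util.ToolRunner;
  import org.apache.mahout.common.AbstractJob;

  // When launched via ToolRunner, GenericOptionsParser consumes leading
  // -D key=value pairs and applies them to the Configuration before run()
  // ever sees the remaining, job-specific arguments.
  public class MyDriver extends AbstractJob {

    public static void main(String[] args) throws Exception {
      ToolRunner.run(new Configuration(), new MyDriver(), args);
    }

    @Override
    public int run(String[] args) throws Exception {
      // If -Dmapred.reduce.tasks=10 came before the other options, it has
      // already been applied to getConf() by this point.
      System.out.println(getConf().get("mapred.reduce.tasks"));
      return 0;
    }
  }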

-----Original Message-----
From: beneo_7 [mailto:[email protected]]
Sent: Monday, December 27, 2010 10:45 PM
To: [email protected]
Subject: where i can set -Dmapred.map.tasks=X

I read in Mahout in Action that I should set -Dmapred.map.tasks=X,
but it did not work on Hadoop.




