PPS it doesn't tell you which property FileInputFormat actually uses for it,
and I don't remember off the top of my head either, but I assume you could
set it with -D as well.
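
For what it's worth, here is a rough sketch of how I read the split logic described in the quoted message below. It is plain Java with no Hadoop dependency, so it only models the behaviour. The class name SplitSketch and the method splitLengths are made up for illustration, and the 1.1 factor and the max/min formula are my recollection of FileInputFormat, not a copy of its source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone model of file split sizing; not Hadoop code.
final class SplitSketch {
    // Assumed slop factor: the tail is folded into the last split
    // when it is at most 1.1 x the split size.
    static final double SPLIT_SLOP = 1.1;

    // Effective split size: the block size, clamped by min and max split size.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Lengths of the splits a single file of fileLength bytes would get.
    static List<Long> splitLengths(long fileLength, long splitSize) {
        List<Long> lengths = new ArrayList<>();
        long remaining = fileLength;
        // Cut full-size splits while more than 1.1 x splitSize is left.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            lengths.add(splitSize);
            remaining -= splitSize;
        }
        // Whatever is left becomes the last split.
        if (remaining > 0) {
            lengths.add(remaining);
        }
        return lengths;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 64 MB blocks, min split size forced to 128 MB, no max.
        long splitSize = computeSplitSize(64 * mb, 128 * mb, Long.MAX_VALUE);
        for (long len : splitLengths(260 * mb, splitSize)) {
            System.out.println(len / mb + " MB");
        }
    }
}
```

With a 260 MB file this prints 128 MB and then 132 MB, so the last split absorbs the small tail instead of leaving a tiny extra split. With a 300 MB file the same sketch gives 128, 128 and 44 MB, so the last split can also come out smaller than the minimum. If I remember right, the 0.20-era key behind setMinInputSplitSize is mapred.min.split.size, but please check that against your Hadoop version before relying on it with -D.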

On Tue, Dec 28, 2010 at 4:54 PM, Dmitriy Lyubimov <[email protected]> wrote:

> In particular, QJob is one of the drivers that uses that, in the following
> way:
>
> if (minSplitSize > 0)
>   SequenceFileInputFormat.setMinInputSplitSize(job, minSplitSize);
>
> An interesting peculiarity of that parameter is that in the current hadoop
> release, for anything derived from FileInputFormat, it ensures that all
> splits but the last are at least that big, and the last split may be up to
> 1.1 times that big. I am not quite sure why the special treatment for the
> last split but that's how it goes there.
>
> -Dmitriy
>
>
> On Tue, Dec 28, 2010 at 4:48 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
>> Jeff,
>>
>> it's the MAHOUT-376 patch; I don't think it is committed. The driver class
>> there is SSVDCli; for your convenience you can find it here:
>> https://github.com/dlyubimov/ssvd-lsi/tree/givens-ssvd/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd
>>
>> but like I said, I did not try to use it with the -D option since I wanted
>> to give an explicit option to increase split size if needed (and a help
>> text for it). Another reason is that the solver runs a series of jobs and
>> only those reading the source matrix have anything to do with the split
>> size.
>>
>>
>> -d
>>
>>
>> On Tue, Dec 28, 2010 at 4:39 PM, Jeff Eastman <[email protected]> wrote:
>>
>>> What's the driver class? If the -D parameters are working for you I want
>>> to compare to the clustering drivers
>>>
>>> -----Original Message-----
>>> From: Dmitriy Lyubimov [mailto:[email protected]]
>>> Sent: Tuesday, December 28, 2010 4:37 PM
>>> To: [email protected]
>>> Subject: Re: where i can set -Dmapred.map.tasks=X
>>>
>>> as far as i understand, this option is not forced. I suspect it actually
>>> means 'minimum degree of parallelism'. so if you expect to use that to
>>> reduce the number of mappers, i don't think this is expected to work so
>>> much. The ones that do enforce anything are min split size and max split
>>> size in file input so i guess you can try those. I rely on them (and open
>>> it up as a job-specific option) in stochastic svd.
>>>
>>> but usually forcing split size to increase creates a 'supersplits'
>>> problem, where a lot of data is moved around to just supply data to
>>> mappers. which is perhaps why this option is meant to increase
>>> parallelism only, but probably not to decrease it.
>>>
>>> -d
>>>
>>> On Tue, Dec 28, 2010 at 4:05 PM, Jeff Eastman <[email protected]> wrote:
>>>
>>> > This is supposed to be a generic option. You should be able to specify
>>> > Hadoop options such as this on the command line invocation of your
>>> > favorite Mahout routine, but I'm having a similar problem setting
>>> > -Dmapred.reduce.tasks=10 with Canopy and k-Means. This is both with and
>>> > without a space after the -D.
>>> >
>>> > Can someone point me to a Mahout command where this does work? Both
>>> > drivers extend AbstractJob and do the usual option processing pushups.
>>> > I don't have Hadoop source locally so I can't debug the generic options
>>> > parsing.
>>> >
>>> > -----Original Message-----
>>> > From: beneo_7 [mailto:[email protected]]
>>> > Sent: Monday, December 27, 2010 10:45 PM
>>> > To: [email protected]
>>> > Subject: where i can set -Dmapred.map.tasks=X
>>> >
>>> > i read in Mahout in Action that I should set -Dmapred.map.tasks=X
>>> > but it did not work for hadoop
>>> >
>>>
>>
>>
>
