I tried it first with the smaller number and got a lot of map tasks! But that's fine... at least I'm past the single-map-task problem. I'm tweaking the value now, but this did the trick. Thanks again.
On Tue, Jan 12, 2010 at 12:33 PM, deneche abdelhakim <[email protected]> wrote:
> Oops, sorry, the size should be specified in bytes, not kB. So 8.8 MB ~
> 9227468 bytes; to get 10 mappers, use mapred.max.split.size=922747.
>
> On Tue, 12 Jan 2010, deneche abdelhakim <[email protected]> wrote:
>
>> From: deneche abdelhakim <[email protected]>
>> Subject: Re: LDA only executes a single map task per iteration when running in
>> actual distributed mode?
>> To: [email protected]
>> Date: Tuesday, 12 January 2010, 17:43
>>
>> Try using a small value for the Hadoop
>> parameter "mapred.max.split.size". For a file size of 8.8 MB
>> (~9000 KB), if you want 10 mappers you should use a max split
>> size of 9000/10 = 900.
>>
>> I don't know if LDADriver implements the Hadoop Tool interface,
>> but if it does you can pass the desired value on the command
>> line as follows:
>>
>> hadoop jar /root/mahout-core-0.2.job
>> org.apache.mahout.clustering.lda.LDADriver
>> -Dmapred.max.split.size=900 -i
>> hdfs://master/lda/input/vectors -o hdfs://master/lda/output
>> -k 20 -v 10000
>> --maxIter 40
>>
>> Please note that it won't work if LDADriver is using a
>> fancy InputFormat other than FileInputFormat. The easiest
>> way to know is just to try it!
>>
>> On Tue, 12 Jan 2010, Chad Hinton <[email protected]> wrote:
>>
>> > From: Chad Hinton <[email protected]>
>> > Subject: Re: LDA only executes a single map task per
>> > iteration when running in actual distributed mode?
>> > To: "mahout-user" <[email protected]>
>> > Date: Tuesday, 12 January 2010, 17:13
>> >
>> > Ted, David - thanks for your replies.
>> > I thought Hadoop would automatically split the file, but
>> > it isn't doing so. The vectors file generated from
>> > build-reuters.sh (using
>> > org.apache.mahout.utils.vectors.lucene.Driver over the
>> > Lucene index) comes out to around 8.8 MB. Perhaps that is
>> > too small and won't be split because it's below the HDFS
>> > block size; I'm using the default of 64 MB. Perhaps a
>> > custom InputSplit/RecordReader is needed to split the
>> > sequence file. I'll investigate further. If anyone has
>> > further pointers or more info, please chime in.
>> >
>> > Thanks,
>> > Chad
>> >
>> > > It should just happen if the file is large enough and
>> > > the program is configured for more than one mapper task
>> > > and the file type is correct.
>> > >
>> > > If you are reading an uncompressed sequence file you
>> > > should be set.
>> > >
>> > > On Mon, Jan 11, 2010 at 9:53 PM, David Hall <[email protected]> wrote:
>> > >
>> > >> I can brush up on my hadoop foo to figure out how to
>> > >> have hadoop split up a single file, if you want.
>> > >
>> > > --
>> > > Ted Dunning, CTO
>> > > DeepDyve
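
For anyone following along: the -D trick above only works if the driver lets
ToolRunner parse generic options. Below is a minimal sketch of that pattern,
assuming the driver follows the standard ToolRunner setup; the class name
MyDriver is hypothetical and this is not LDADriver's actual code, just an
illustration of what the thread relies on.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    // Hypothetical driver class, for illustration only.
    public class MyDriver extends Configured implements Tool {

      @Override
      public int run(String[] args) throws Exception {
        // ToolRunner has already parsed the generic options, so any
        // -Dkey=value pairs from the command line are in getConf().
        Configuration conf = getConf();

        // e.g. -Dmapred.max.split.size=922747 (8.8 MB ~ 9227468 bytes,
        // divided by 10 desired mappers) shows up here:
        long maxSplit = conf.getLong("mapred.max.split.size", -1);
        System.out.println("mapred.max.split.size = " + maxSplit);

        // Alternatively, the value can be set programmatically before
        // submitting the job:
        // conf.setLong("mapred.max.split.size", 922747L);

        // ... configure and submit the actual job using conf ...
        return 0;
      }

      public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
      }
    }

A driver wired up this way can be invoked exactly as in the thread, with the
corrected byte value:

    hadoop jar /root/mahout-core-0.2.job
    org.apache.mahout.clustering.lda.LDADriver
    -Dmapred.max.split.size=922747 -i
    hdfs://master/lda/input/vectors -o hdfs://master/lda/output
    -k 20 -v 10000 --maxIter 40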
