Did you configure map output compression?
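
To make the arithmetic in the quoted messages concrete, here is a minimal sketch. It assumes Chen's reading of the README, namely that each "keep" ratio is output bytes divided by input bytes for that phase, and that each Compute stage reads the previous stage's reduce output. The thread does not settle whether the ratios are relative to the compressed (500GB) or the uncompressed (2TB) input, so both are computed. `pipeline_sizes` and `STAGES` are names made up for this illustration and are not part of gridmix2.

```python
# Sketch only: assumes "keep X% map, Y% reduce" means output/input byte ratios
# per phase. This interpretation comes from the thread, not from gridmix2 itself.

# (stage name, map keep ratio, reduce keep ratio), taken from the README excerpt.
STAGES = [
    ("Compute1", 0.10, 0.40),
    ("Compute2", 1.00, 0.77),
    ("Compute3", 1.16, 0.91),
]


def pipeline_sizes(input_gb, stages):
    """Return (name, map_output_gb, reduce_output_gb) for each chained stage."""
    sizes = []
    current = input_gb
    for name, keep_map, keep_reduce in stages:
        map_out = current * keep_map         # bytes kept after the map phase
        reduce_out = map_out * keep_reduce   # bytes kept after the reduce phase
        sizes.append((name, map_out, reduce_out))
        current = reduce_out                 # next stage reads this output
    return sizes


if __name__ == "__main__":
    # 500 GB = compressed input size, 2000 GB = uncompressed size (from the README).
    for label, base_gb in (("compressed", 500.0), ("uncompressed", 2000.0)):
        print(f"Input ({label}): {base_gb:.0f} GB")
        for name, map_out, reduce_out in pipeline_sizes(base_gb, STAGES):
            print(f"  {name}: map -> {map_out:.2f} GB, reduce -> {reduce_out:.2f} GB")
```

With the 500GB base, the first stage reproduces the 50GB and 20GB figures from Chen's example. If map output compression is off, the intermediate data on disk could be closer to the uncompressed figures, which is why the question above matters.
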
2012/6/15 Chen He <airb...@gmail.com>

> Let me know when you get the correct answer.
>
> Chen
>
> On Thu, Jun 14, 2012 at 11:42 AM, Nan Zhu <zhunans...@gmail.com> wrote:
>
> > Hi, Chen,
> >
> > Thank you for your reply,
> >
> > but in its README, there is no value which is larger than 100%, it means
> > that the size of intermediate results will never be larger than input size,
> >
> > it will not be the case, because the input data is compressed, the size of
> > the generated data will expand to be very large....
> >
> > it's just my guessing, can anyone correct me?
> >
> > Best,
> >
> > Nan
> >
> >
> > On Thu, Jun 14, 2012 at 11:50 PM, Chen He <airb...@gmail.com> wrote:
> >
> > > Hi Nan
> > >
> > > probably the map stage will output 10% of the total input, and the reduce
> > > stage will output 40% of intermediate results (10% of total input).
> > >
> > > For example, 500GB input, after the map stage, it will be 50GB and it will
> > > become 20GB after the reduce stage.
> > >
> > > It may be similar to the loadgen in hadoop test example.
> > >
> > > Anyone has suggestion?
> > >
> > > Chen
> > > System Architect Intern @ ZData
> > > PhD student@CSE Dept.
> > >
> > >
> > > On Thu, Jun 14, 2012 at 1:58 AM, Nan Zhu <zhunans...@gmail.com> wrote:
> > >
> > > > Hi, all
> > > >
> > > > I'm using gridmix2 to test my cluster, while in its README file, there are
> > > > statements like the following:
> > > >
> > > > +1) Three stage map/reduce job
> > > > +      Input: 500GB compressed (2TB uncompressed) SequenceFile
> > > > +          (k,v) = (5 words, 100 words)
> > > > +          hadoop-env: FIXCOMPSEQ
> > > > +      *Compute1: keep 10% map, 40% reduce
> > > > +      Compute2: keep 100% map, 77% reduce
> > > > +          Input from Compute1
> > > > +      Compute3: keep 116% map, 91% reduce
> > > > +          Input from Compute2
> > > > +      *Motivation: Many user workloads are implemented as pipelined map/reduce
> > > > +          jobs, including Pig workloads
> > > >
> > > >
> > > > Can anyone tell me what does "keep 10% map, 40% reduce" mean here?
> > > >
> > > > Best,
> > > >
> > > > --
> > > > Nan Zhu
> > > > School of Electronic, Information and Electrical Engineering,229
> > > > Shanghai Jiao Tong University
> > > > 800,Dongchuan Road,Shanghai,China
> > > > E-Mail: zhunans...@gmail.com
> > > >
> > >
> >
> >
> > --
> > Nan Zhu
> > School of Electronic, Information and Electrical Engineering,229
> > Shanghai Jiao Tong University
> > 800,Dongchuan Road,Shanghai,China
> > E-Mail: zhunans...@gmail.com
> >
>