Ah okay, I did not see the fs.inmemory.size.mb setting in any of the default config files located here:

http://hadoop.apache.org/common/docs/r0.20.2/mapred-default.html
http://hadoop.apache.org/common/docs/r0.20.2/core-default.html
http://hadoop.apache.org/common/docs/r0.20.2/hdfs-default.html

Should this be something that needs to be added?
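In case it helps, here is roughly what I was planning to add once the property name is confirmed. I'm guessing at which file each property belongs in, and the io.sort.factor value of 100 is just a starting point I picked, so please correct anything that looks wrong:

<!-- mapred-site.xml: per-node slots, task heap, and map-side sort buffer -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>12</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m</value>
</property>
<property>
  <name>io.sort.mb</name>
  <!-- ~70% of the 2048MB task heap -->
  <value>1400</value>
</property>
<property>
  <name>io.sort.factor</name>
  <!-- default is 10; 100 is just my guess at "large enough" -->
  <value>100</value>
</property>

<!-- core-site.xml (I think?), added explicitly since it is not in the default files -->
<property>
  <name>fs.inmemory.size.mb</name>
  <!-- ~70% of the reduce task heap, per the 70% thumb rule -->
  <value>1400</value>
</property>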
Thank you for the help!

~Ed

On Mon, Sep 27, 2010 at 11:18 AM, Ted Yu <[email protected]> wrote:

> The setting should be fs.inmemory.size.mb
>
> On Mon, Sep 27, 2010 at 7:15 AM, pig <[email protected]> wrote:
>
> > HI Sriguru,
> >
> > Thank you for the tips. Just to clarify a few things.
> >
> > Our machines have 32 GB of RAM.
> >
> > I'm planning on setting each machine to run 12 mappers and 2 reducers
> > with the heap size set to 2048MB so total memory usage for the heap at
> > 28GB.
> >
> > If this is the case should io.sort.mb be set to 70% of 2048MB (so ~1400
> > MB)?
> >
> > Also, I did not see a fs.inmemorysize.mb setting in any of the hadoop
> > configuration files. Is that the correct setting I should be looking
> > for? Should this also be set to 70% of the heap size or does it need to
> > share with the io.sort.mb setting.
> >
> > I assume if I'm bumping up io.sort.mb that much I also need to increase
> > io.sort.factor from the default of 10. Is there a recommended relation
> > between these two?
> >
> > Thank you for your help!
> >
> > ~Ed
> >
> > On Sun, Sep 26, 2010 at 3:05 AM, Srigurunath Chakravarthi
> > <[email protected]> wrote:
> >
> > > Ed,
> > > Tuning io.sort.mb will be certainly worthwhile if you have enough RAM
> > > to allow for a higher Java heap per map task without risking swapping.
> > >
> > > Similarly, you can decrease spills on the reduce side using
> > > fs.inmemorysize.mb.
> > >
> > > You can use the following thumb rules for tuning those two:
> > >
> > > - Set these to ~70% of Java heap size. Pick heap sizes to utilize ~80%
> > >   RAM across all processes (maps, reducers, TT, DN, other)
> > > - Set it small enough to avoid swap activity, but
> > > - Set it large enough to minimize disk spills.
> > > - Ensure that io.sort.factor is set large enough to allow full use of
> > >   buffer space.
> > > - Balance space for output records (default 95%) & record meta-data
> > >   (5%). Use io.sort.spill.percent and io.sort.record.percent
> > >
> > > Your mileage may vary. We've seen job exec time improvements worth
> > > 1-3% via spill-avoidance for miscellaneous applications.
> > >
> > > Your other option of running a map per 32MB or 64MB of input should
> > > give you better performance if your map task execution time is
> > > significant (i.e., much larger than a few seconds) compared to the
> > > overhead of launching map tasks and reading input.
> > >
> > > Regards,
> > > Sriguru
> > >
> > > >-----Original Message-----
> > > >From: pig [mailto:[email protected]]
> > > >Sent: Saturday, September 25, 2010 2:36 AM
> > > >To: [email protected]
> > > >Subject: Proper blocksize and io.sort.mb setting when using
> > > >compressed LZO files
> > > >
> > > >Hello,
> > > >
> > > >We just recently switched to using lzo compressed file input for our
> > > >hadoop cluster using Kevin Weil's lzo library. The files are pretty
> > > >uniform in size at around 200MB compressed. Our block size is 256MB.
> > > >Decompressed the average LZO input file is around 1.0GB. I noticed
> > > >lots of our jobs are now spilling lots of data to disk. We have
> > > >almost 3x more spilled records than map input records for example.
> > > >I'm guessing this is because each mapper is getting a 200 MB lzo
> > > >file which decompresses into 1GB of data per mapper.
> > > >
> > > >Would you recommend solving this by reducing the block size to 64MB,
> > > >or even 32MB and then using the LZO indexer so that a single 200MB
> > > >lzo file is actually split among 3 or 4 mappers? Would it be better
> > > >to play with the io.sort.mb value? Or, would it be best to play with
> > > >both? Right now the io.sort.mb value is the default 200MB. Have
> > > >other lzo users had to adjust their block size to compensate for the
> > > >"expansion" of the data after decompression?
> > > >
> > > >Thank you for any help!
> > > >
> > > >~Ed
> > >
> >
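P.S. For completeness, the block-size route from my original mail above would look something like this on our end. The 32MB value is just the smaller of the two sizes I mentioned, and the indexer class name is from memory (Kevin Weil's hadoop-lzo), so please double-check it:

<!-- hdfs-site.xml: applies to newly written files only -->
<property>
  <name>dfs.block.size</name>
  <!-- 32MB; 67108864 would give 64MB -->
  <value>33554432</value>
</property>

followed by running something like "hadoop jar /path/to/hadoop-lzo.jar com.hadoop.compression.lzo.LzoIndexer /our/lzo/input/dir" so the 200MB .lzo files actually get split across mappers.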
