[ https://issues.apache.org/jira/browse/HADOOP-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122616#comment-13122616 ]

Scott Carey commented on HADOOP-7714:
-------------------------------------

{quote}
I think the issue is that Linux's native readahead is not very aggressive,
{quote}

I have been tuning my systems for quite a while with aggressive OS readahead. 
The default is 128K, but it can be raised significantly, which helps quite a 
bit on sequential reads to SATA drives. Additionally, the 'deadline' scheduler 
is better at sequential throughput under contention. I wonder how much of your 
manual readahead is just compensating for the poor OS defaults? In other 
applications, I maximized read speeds (and reduced CPU use) by using small read 
buffers in Java (32KB) combined with large Linux readahead settings.
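
For concreteness, a minimal sketch of that kind of OS-level tuning (the device 
name and values are illustrative, not recommendations for every box):

{code}
# Raise per-device readahead from the 128K default to 2MB (4096 x 512-byte sectors)
blockdev --setra 4096 /dev/sda

# Switch the device to the 'deadline' I/O scheduler for better sequential
# throughput under contention (takes effect immediately; not persistent)
echo deadline > /sys/block/sda/queue/scheduler
{code}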

Additionally, I always set up a separate file system for M/R temp space, away 
from HDFS. The HDFS one is tuned for sequential reads and fast flush from OS 
buffers to disk, with the deadline scheduler. The temp space is tuned to delay 
flush to disk for up to 60 seconds (small jobs don't even make it to disk this 
way), and uses the CFQ scheduler.
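
A hedged sketch of how that split might look; the devices, mount points, and 
exact values here are assumptions for illustration, not my exact config:

{code}
# /etc/fstab -- separate volumes for HDFS and M/R temp
/dev/sdb1  /data/hdfs    ext4  noatime            1 2
/dev/sdc1  /data/mrtemp  ext4  noatime,commit=60  1 2

# Let dirty pages age up to 60s before writeback kicks in (system-wide knob)
sysctl -w vm.dirty_expire_centisecs=6000

# deadline on the HDFS device, CFQ on the temp device
echo deadline > /sys/block/sdb/queue/scheduler
echo cfq      > /sys/block/sdc/queue/scheduler
{code}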

This combination reduced the time of many of our jobs significantly (CDH2 and 
CDH3) -- especially job chains with many small tasks mixed in.

The Linux tuning parameters that have a big effect on disk performance and 
pagecache behavior are:
- vm.dirty_ratio
- vm.dirty_background_ratio
- vm.swappiness
- readahead (e.g. blockdev --setra 4096 /dev/sda)
- the ext4 mount options inode_readahead_blks=n and commit=nrsec
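
As an illustration, the sysctl knobs could go in /etc/sysctl.conf and the mount 
options in /etc/fstab; the values below are examples to tune per workload and 
RAM size, not universal recommendations:

{code}
# /etc/sysctl.conf -- illustrative values only
vm.dirty_ratio = 40
vm.dirty_background_ratio = 5
vm.swappiness = 0
# apply without reboot: sysctl -p

# ext4 mount options (in /etc/fstab), e.g.:
#   inode_readahead_blks=64   readahead on the inode table (default 32)
#   commit=60                 journal commit interval, up from the 5s default
{code}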


                
> Add support in native libs for OS buffer cache management
> ---------------------------------------------------------
>
>                 Key: HADOOP-7714
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7714
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: native
>    Affects Versions: 0.24.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: graphs.pdf, hadoop-7714-2.txt, hadoop-7714-20s-prelim.txt
>
>
> Especially in shared HBase/MR situations, management of the OS buffer cache 
> is important. Currently, running a big MR job will evict all of HBase's hot 
> data from cache, causing HBase performance to really suffer. However, caching 
> of the MR input/output is rarely useful, since the datasets tend to be larger 
> than cache and not re-read often enough that the cache is used. Having access 
> to the native calls {{posix_fadvise}} and {{sync_file_range}} on platforms 
> where they are supported would allow us to do a better job of managing this 
> cache.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
