[ https://issues.apache.org/jira/browse/HDFS-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660245#comment-13660245 ]

Colin Patrick McCabe commented on HDFS-4817:
--------------------------------------------

[~saint....@gmail.com]: The problem is that reads and writes issued by the 
client don't necessarily translate 1:1 into operations against the DataNode.  
We often serve a read from a buffer if the data is already available.  So 
adding the ability to change caching behavior on each individual read might 
result in more network traffic and load.

Another question is what size HBase random I/Os typically are.  Correct me if 
I'm wrong, but I thought they were usually fairly small-- I've heard the 
numbers 16 KB and 32 KB thrown around.  We don't currently use fadvise when 
doing small I/Os.  The code has been like this for a while-- I believe 
HDFS-2465 introduced this behavior along with {{fadvise}} itself.  See 
{{isLongRead}} in {{BlockSender}}.
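The gate in question amounts to a simple length check before any fadvise call is made.  The sketch below is illustrative only-- the class name and the 256 KB threshold are assumptions, not the actual {{BlockSender}} code:

```java
// Sketch of an isLongRead-style gate: reads below a threshold skip the
// fadvise call, on the theory that syscall overhead outweighs the benefit.
// Class name and threshold value are illustrative assumptions.
public class FadviseGate {
    // Assumed cutoff between "small" random reads and "long" sequential reads.
    static final long LONG_READ_THRESHOLD_BYTES = 256 * 1024;

    static boolean isLongRead(long lengthBytes) {
        return lengthBytes >= LONG_READ_THRESHOLD_BYTES;
    }

    public static void main(String[] args) {
        // A 16 KB HBase-style random read falls below the threshold...
        System.out.println(isLongRead(16 * 1024));       // false
        // ...while a multi-megabyte sequential scan clears it easily.
        System.out.println(isLongRead(4 * 1024 * 1024)); // true
    }
}
```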

The idea is that for small reads, the overhead of calling fadvise might degrade 
performance.  This is something we could potentially revisit (maybe Todd can 
comment here?), but we should keep it in mind.  We could also do this 
incrementally: add the ability to configure it per-file first, and later add 
an API to do it per-read if that makes sense and improves performance.
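The per-file step could be as simple as a small settings object carried by the client, where an unset field means "fall back to the DataNode-wide default."  This is a hypothetical sketch of that shape-- the class and method names are assumptions, not the attached patch:

```java
// Hypothetical per-file caching settings. A null field means the client
// expressed no preference, so the DataNode-wide default applies.
public class PerFileCachingSettings {
    private final Boolean dropBehind;   // null = use DataNode-wide default
    private final Long readaheadBytes;  // null = use DataNode-wide default

    public PerFileCachingSettings(Boolean dropBehind, Long readaheadBytes) {
        this.dropBehind = dropBehind;
        this.readaheadBytes = readaheadBytes;
    }

    // Resolve against the DataNode-wide default when unset.
    public boolean effectiveDropBehind(boolean datanodeDefault) {
        return dropBehind != null ? dropBehind : datanodeDefault;
    }

    public long effectiveReadahead(long datanodeDefault) {
        return readaheadBytes != null ? readaheadBytes : datanodeDefault;
    }

    public static void main(String[] args) {
        // Client opts into drop-behind for this file, keeps default readahead.
        PerFileCachingSettings s = new PerFileCachingSettings(true, null);
        System.out.println(s.effectiveDropBehind(false)); // true (overridden)
        System.out.println(s.effectiveReadahead(4194304)); // 4194304 (default)
    }
}
```

A per-read API could later reuse the same resolution logic without changing the DataNode-side defaults.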
                
> make HDFS advisory caching configurable on a per-file basis
> -----------------------------------------------------------
>
>                 Key: HDFS-4817
>                 URL: https://issues.apache.org/jira/browse/HDFS-4817
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Minor
>         Attachments: HDFS-4817.001.patch
>
>
> HADOOP-7753 and related JIRAs introduced some performance optimizations for 
> the DataNode.  One of them was readahead.  When readahead is enabled, the 
> DataNode starts reading the next bytes it thinks it will need in the block 
> file, before the client requests them.  This helps hide the latency of 
> rotational media and send larger reads down to the device.  Another 
> optimization was "drop-behind."  Using this optimization, we could remove 
> files from the Linux page cache after they were no longer needed.
> Using {{dfs.datanode.drop.cache.behind.writes}} and 
> {{dfs.datanode.drop.cache.behind.reads}} can improve performance 
> substantially on many MapReduce jobs.  In our internal benchmarks, we have 
> seen speedups of 40% on certain workloads.  The reason is that if we know 
> the block data will not be read again any time soon, keeping it out of the 
> page cache frees that memory for other processes on the system.  See 
> HADOOP-7714 for more benchmarks.
> We would like to turn on these configurations on a per-file or per-client 
> basis, rather than on the DataNode as a whole.  This will allow more users to 
> actually make use of them.  It would also be good to add unit tests for the 
> drop-cache code path, to ensure that it is functioning as we expect.
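As the description notes, these settings currently exist only as DataNode-wide properties; enabling them today means an hdfs-site.xml entry like the following (property names are from the description above, values illustrative):

```xml
<!-- DataNode-wide advisory caching settings. A per-file API, as proposed
     in this issue, would make setting these globally unnecessary. -->
<property>
  <name>dfs.datanode.drop.cache.behind.reads</name>
  <value>true</value>
</property>
<property>
  <name>dfs.datanode.drop.cache.behind.writes</name>
  <value>true</value>
</property>
```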

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
