[ https://issues.apache.org/jira/browse/HDFS-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Colin Patrick McCabe updated HDFS-4817:
---------------------------------------
Attachment: HDFS-4817.004.patch
* fix findbugs warning
* don't change {{CACHE_DROP_INTERVAL_BYTES}} or {{CACHE_DROP_LAG_BYTES}}
* if the user explicitly requests cache dropping or readahead on small reads,
honor that request; leave the default behavior unchanged if no cache hints
are given.
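
The last point can be sketched roughly as follows. This is a minimal,
self-contained illustration and not code from the patch: the class
{{CacheHints}}, its methods, and the {{SMALL_READ_BYTES}} cutoff are all
hypothetical names chosen for the example. It assumes the default behavior
skips drop-behind and readahead for small reads, and shows an explicit hint
overriding that.

```java
import java.util.Optional;

/**
 * Hypothetical model of per-stream cache hints. The names and the
 * threshold are illustrative assumptions, not taken from the patch.
 */
final class CacheHints {
    /** Assumed cutoff below which a read counts as "small". */
    static final long SMALL_READ_BYTES = 256 * 1024;

    private final Optional<Boolean> dropBehind; // empty = no hint given
    private final Optional<Long> readahead;     // empty = no hint given

    private CacheHints(Optional<Boolean> dropBehind, Optional<Long> readahead) {
        this.dropBehind = dropBehind;
        this.readahead = readahead;
    }

    /** No hints: every decision falls through to the default behavior. */
    static CacheHints none() {
        return new CacheHints(Optional.empty(), Optional.empty());
    }

    CacheHints withDropBehind(boolean value) {
        return new CacheHints(Optional.of(value), readahead);
    }

    CacheHints withReadahead(long bytes) {
        return new CacheHints(dropBehind, Optional.of(bytes));
    }

    /** An explicit hint always wins; otherwise small reads are skipped. */
    boolean shouldDropBehind(boolean serverDefault, long readLength) {
        if (dropBehind.isPresent()) {
            return dropBehind.get();
        }
        return serverDefault && readLength >= SMALL_READ_BYTES;
    }

    /** An explicit hint always wins; otherwise small reads get no readahead. */
    long effectiveReadahead(long serverDefault, long readLength) {
        if (readahead.isPresent()) {
            return readahead.get();
        }
        return readLength >= SMALL_READ_BYTES ? serverDefault : 0L;
    }
}
```

With no hints, a 4 KB read is left alone even when the server-wide default
enables drop-behind; calling {{withDropBehind(true)}} on the same stream makes
the small read eligible as well.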
> make HDFS advisory caching configurable on a per-file basis
> -----------------------------------------------------------
>
> Key: HDFS-4817
> URL: https://issues.apache.org/jira/browse/HDFS-4817
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 3.0.0
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Priority: Minor
> Attachments: HDFS-4817.001.patch, HDFS-4817.002.patch,
> HDFS-4817.004.patch
>
>
> HADOOP-7753 and related JIRAs introduced some performance optimizations for
> the DataNode. One of them was readahead. When readahead is enabled, the
> DataNode starts reading the next bytes it thinks it will need in the block
> file, before the client requests them. This helps hide the latency of
> rotational media and send larger reads down to the device. Another
> optimization was "drop-behind." With this optimization, the DataNode can
> drop a file's data from the Linux page cache once it is no longer needed.
> Using {{dfs.datanode.drop.cache.behind.writes}} and
> {{dfs.datanode.drop.cache.behind.reads}} can improve performance
> substantially on many MapReduce jobs. In our internal benchmarks, we have
> seen speedups of 40% on certain workloads. The reason is that if we know
> the block data will not be read again any time soon, keeping it out of the
> page cache leaves more memory for the other processes on the system. See
> HADOOP-7714 for more benchmarks.
> We would like to enable these settings on a per-file or per-client
> basis, rather than for the DataNode as a whole. This will allow more users
> to actually make use of them. It would also be good to add unit tests for
> the drop-cache code path, to ensure that it functions as we expect.