On Thu, Sep 13, 2012 at 10:28 AM, Stack <[email protected]> wrote:
> Write a short paragraph and I'll make an HDFS configuration sections
> like this HBase configurations section on manual and stick it in
> there: http://hbase.apache.org/book.html#perf.configurations

Here's a first stab:

Leveraging local data

Since Hadoop 1.0.0 (also 0.22.1, 0.23.1, CDH3u3 and HDP 1.0) via HDFS-2246[1], it is possible for the DFSClient to take a shortcut and read directly from disk instead of going through the DataNode when the data is local. What this means for HBase is that the RegionServers can read directly off their machine's disks instead of having to open a socket to talk to the DataNode. Reading directly from disk is generally much faster[2].

In order to enable it, first hdfs-site.xml needs to be amended with:

  dfs.block.local-path-access.user = the _only_ user that can use the shortcut. This has to be the user that started HBase.

And in hbase-site.xml:

  dfs.client.read.shortcircuit = true

The DataNodes need to be restarted in order to pick up the new configuration.

Be aware that if a process started under a different username than the one configured here also has the shortcircuit enabled, it will get an Exception regarding unauthorized access, but the data will still be read.

1. https://issues.apache.org/jira/browse/HDFS-2246
2. http://files.meetup.com/1350427/hug_ebay_jdcryans.pdf
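
For the manual, the two settings described above could be shown in the usual Hadoop property form. This is only a sketch: the user name "hbase" is a placeholder I picked for illustration, so substitute whichever user actually starts HBase on your cluster. Each property goes inside the existing <configuration> element of its file.

```xml
<!-- hdfs-site.xml, on the DataNodes -->
<property>
  <name>dfs.block.local-path-access.user</name>
  <!-- "hbase" is a placeholder: use the user that started HBase -->
  <value>hbase</value>
</property>

<!-- hbase-site.xml, on the RegionServers -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
```

As noted above, the DataNodes have to be restarted before the hdfs-site.xml change takes effect.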
