Gabriel Reid created CRUNCH-644:
-----------------------------------

             Summary: Set HDFS node affinity on created HFiles to improve locality
                 Key: CRUNCH-644
                 URL: https://issues.apache.org/jira/browse/CRUNCH-644
             Project: Crunch
          Issue Type: Improvement
            Reporter: Gabriel Reid


When HFiles are created via the {{HFileUtils.writeToHFilesForIncrementalLoad}}
method, their underlying HDFS blocks end up on a selection of HDFS data nodes,
and the choice of nodes is left to the HDFS namenode. This means that there is
only a relatively small chance (depending on cluster size and replication
factor) that a created HFile will end up on the same physical machine as the
region server that will serve it, which limits the ability to use
short-circuit reads from the local file system. Typically, this lack of
locality is fully resolved only after a major compaction.
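
To give a rough feel for how small that chance is, here is a back-of-the-envelope
sketch. It assumes, purely for illustration, that the namenode places the
replicas of a block on distinct data nodes chosen uniformly at random; the real
HDFS placement policy is rack-aware and favors the writer's node, so the numbers
are only indicative. The class and method names are hypothetical.

```java
class LocalityOdds {
    /**
     * Chance that one specific data node holds a replica of a given block,
     * ASSUMING replicas go to distinct nodes picked uniformly at random.
     * This is a simplification, not the actual HDFS placement policy.
     */
    static double chanceOfLocalReplica(int dataNodes, int replication) {
        if (dataNodes <= 0 || replication <= 0) {
            throw new IllegalArgumentException("both arguments must be positive");
        }
        // With more replicas than nodes, every node holds a copy.
        return Math.min(1.0, (double) replication / dataNodes);
    }

    public static void main(String[] args) {
        // Example values only: replication factor 3 on clusters of various sizes.
        for (int nodes : new int[] {10, 50, 100, 500}) {
            System.out.printf("%d data nodes: %.1f%% chance of a local replica%n",
                    nodes, 100.0 * chanceOfLocalReplica(nodes, 3));
        }
    }
}
```

Under this assumption the chance falls off linearly with cluster size, which is
why the problem is more visible on larger clusters.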

It's possible to set a node affinity on HDFS files at creation time, which
gives the namenode a hint about a preferred data node for the file's blocks.
The intention of this ticket is to use this functionality to set the node
affinity during HFile creation in
{{HFileUtils.writeToHFilesForIncrementalLoad}} so that at least one (HDFS)
block of each created HFile will be located on the same physical machine as the
region server that will be using the file (assuming HDFS data nodes are
running on the same machines as HBase region servers).
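
As a minimal sketch of what the hint might look like: as far as I know, HDFS
exposes this through a {{DistributedFileSystem.create}} overload that takes an
{{InetSocketAddress[]}} of favored nodes. The snippet below only builds that
array for a region server host and does not call HDFS. The helper name, the
example host, and the data node port are assumptions for illustration and not
part of any proposed patch.

```java
import java.net.InetSocketAddress;

class FavoredNodes {
    /**
     * Hypothetical helper: builds the favored-nodes hint for the data node
     * assumed to run on the same machine as the given region server.
     * Uses an unresolved address so the sketch runs without DNS; a real
     * caller may prefer resolved addresses.
     */
    static InetSocketAddress[] forRegionServer(String regionServerHost, int dataNodePort) {
        return new InetSocketAddress[] {
            InetSocketAddress.createUnresolved(regionServerHost, dataNodePort)
        };
    }

    public static void main(String[] args) {
        // Example values only: the host is made up, and the data node
        // transfer port depends on the cluster's configuration.
        InetSocketAddress[] favored = forRegionServer("rs1.example.com", 50010);
        System.out.println(favored[0].getHostString() + ":" + favored[0].getPort());
        // This array would then be passed as the favored-nodes argument when
        // the HFile's output stream is created on HDFS.
    }
}
```

The favored nodes are only a suggestion, so the namenode can still place blocks
elsewhere, for example when the preferred data node is full or unavailable.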



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
