[ 
https://issues.apache.org/jira/browse/CRUNCH-644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997951#comment-15997951
 ] 

Gabriel Reid commented on CRUNCH-644:
-------------------------------------

I take it nobody is wildly against this, so I'll commit it shortly unless I 
hear otherwise.

> Set HDFS node affinity on created HFiles to improve locality
> ------------------------------------------------------------
>
>                 Key: CRUNCH-644
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-644
>             Project: Crunch
>          Issue Type: Improvement
>            Reporter: Gabriel Reid
>         Attachments: CRUNCH-644.patch
>
>
> When creating HFiles via the {{HFileUtils.writeToHFilesForIncrementalLoad}}
> method, the underlying HDFS blocks of the created HFiles end up on a
> selection of HDFS data nodes, and the choice of nodes is left to the HDFS
> Namenode. As a result, there is only a relatively small chance (depending on
> cluster size and replication factor) that the created HFiles will end up on
> the same physical machine as the region server that will use them, which
> limits the ability to use short-circuit reads to the local file system.
> Typically, this lack of locality is only fully resolved after a major
> compaction.
>
> It's possible to set a node affinity on HDFS files at creation time, to
> give the namenode a suggestion about a preferred data node for the blocks.
> The intention of this ticket is to use this functionality to set the node
> affinity during HFile creation in
> {{HFileUtils.writeToHFilesForIncrementalLoad}}, so that at least one (HDFS)
> block of each created HFile will be located on the same physical machine as
> the region server that will use the file (assuming HDFS data nodes are
> running on the same machines as HBase region servers).
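
To put a rough number on the "relatively small chance" mentioned in the
description, here is a minimal back-of-the-envelope sketch. It is not part of
the patch: the class and method names are made up for illustration, and it
assumes the replicas of a block land on distinct data nodes picked uniformly
at random, which ignores rack awareness and the writer-local first replica
that real HDFS placement uses. Under that assumption, the chance that one
particular node (the region server's) holds a replica is simply replication
factor divided by cluster size.

```java
// Illustrative only: a simplified placement model, not actual HDFS behaviour.
final class LocalityEstimate {

    // Chance that one specific data node holds a replica of a given block,
    // assuming replicas go to distinct nodes chosen uniformly at random.
    static double chanceOfLocalReplica(int dataNodes, int replicationFactor) {
        if (dataNodes <= 0 || replicationFactor <= 0) {
            throw new IllegalArgumentException("both arguments must be positive");
        }
        // With at least as many replicas as nodes, every node holds a copy.
        return Math.min(1.0, (double) replicationFactor / dataNodes);
    }

    public static void main(String[] args) {
        int replicationFactor = 3; // assumed value for this example
        int[] clusterSizes = {10, 50, 200};
        for (int dataNodes : clusterSizes) {
            System.out.printf("%d data nodes, replication %d: %.1f%% chance%n",
                    dataNodes, replicationFactor,
                    100.0 * chanceOfLocalReplica(dataNodes, replicationFactor));
        }
    }
}
```

Even under this simplified model the chance shrinks quickly as the cluster
grows, which is the motivation for hinting a preferred data node at file
creation time rather than waiting for a major compaction.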



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
