[ 
https://issues.apache.org/jira/browse/HADOOP-18629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhuyaogai updated HADOOP-18629:
-------------------------------
    Description: 
When importing large-scale data into HBase, we typically generate the HFiles 
on a separate Hadoop cluster, use the DistCp tool to copy them to the HBase 
cluster, and then bulk load them into the HBase table. However, the data 
locality of the copied files is rather low, which may result in high query 
latency until a compaction restores it. Therefore, we could improve data 
locality by specifying favoredNodes in DistCp.

Could I submit a pull request to optimize it?
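
To make the idea concrete, here is a minimal sketch of how a favored-nodes 
setting might be parsed before being handed to HDFS. It is an illustration 
under stated assumptions, not the proposed patch: the comma-separated 
host:port format, the class name FavoredNodesSketch, and the method 
parseFavoredNodes are all hypothetical, and the host names and port are 
example values. The resulting InetSocketAddress[] is the type that the 
existing DistributedFileSystem.create(...) overload with a favoredNodes 
parameter accepts. HDFS treats favored nodes as a placement hint.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

public class FavoredNodesSketch {

    /**
     * Parses a comma-separated "host:port" list into unresolved socket addresses.
     * The option format is an assumption made for this sketch; DistCp does not
     * define such an option in the issue text.
     */
    static InetSocketAddress[] parseFavoredNodes(String spec) {
        List<InetSocketAddress> nodes = new ArrayList<>();
        for (String entry : spec.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas
            }
            int colon = trimmed.lastIndexOf(':');
            if (colon <= 0 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException("Expected host:port but got: " + trimmed);
            }
            String host = trimmed.substring(0, colon);
            int port = Integer.parseInt(trimmed.substring(colon + 1));
            // Unresolved addresses avoid a DNS lookup at parse time.
            nodes.add(InetSocketAddress.createUnresolved(host, port));
        }
        return nodes.toArray(new InetSocketAddress[0]);
    }

    public static void main(String[] args) {
        // Example values only: two hypothetical DataNodes co-located with region servers.
        InetSocketAddress[] nodes =
            parseFavoredNodes("rs1.example.com:9866, rs2.example.com:9866");
        for (InetSocketAddress node : nodes) {
            System.out.println(node.getHostString() + ":" + node.getPort());
        }
    }
}
```

In a copy task, an array like this could be passed along when the target file 
is created, so that block replicas are more likely to land on the DataNodes 
that serve the table's regions.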

  was:
When importing large scale data to HBase, we always generate the hfiles with 
other Hadoop clusters, use the Distcp tool to copy the data to the HBase 
cluster, and bulkload data to HBase table. However, the data locality is rather 
low which may result in high query latency. After taking a compaction it will 
recover. Therefore, we can increase the data locality by specifying the 
favoredNodes in Distcp.

Could I submit a pull request to optimize it?


> Hadoop DistCp supports specifying favoredNodes for data copying
> ---------------------------------------------------------------
>
>                 Key: HADOOP-18629
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18629
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: tools
>            Reporter: zhuyaogai
>            Priority: Major
>
> When importing large-scale data into HBase, we typically generate the HFiles 
> on a separate Hadoop cluster, use the DistCp tool to copy them to the HBase 
> cluster, and then bulk load them into the HBase table. However, the data 
> locality of the copied files is rather low, which may result in high query 
> latency until a compaction restores it. Therefore, we could improve data 
> locality by specifying favoredNodes in DistCp.
> Could I submit a pull request to optimize it?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
