[
https://issues.apache.org/jira/browse/HBASE-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983544#action_12983544
]
Daniel Einspanjer commented on HBASE-3451:
------------------------------------------
Has anyone ever tried to distcp HBase data using a file list?
We got this error:
11/01/18 21:21:50 INFO tools.DistCp: destPath=hdfs://hp-node70.phx1.mozilla.com:8020/hbase
org.apache.hadoop.tools.DistCp$DuplicationException: Invalid input, there are duplicated files in the sources:
  hftp://cm-hadoop-adm03.mozilla.org:50070/hbase/archive/683263177/.regioninfo,
  hftp://cm-hadoop-adm03.mozilla.org:50070/hbase/crash_counts/2038233953/.regioninfo
        at org.apache.hadoop.tools.DistCp.checkDuplication(DistCp.java:1383)
        at org.apache.hadoop.tools.DistCp.setup(DistCp.java:1186)
        at org.apache.hadoop.tools.DistCp.copy(DistCp.java:666)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
It seems that checkDuplication doesn't compare the full path, and hence it
chokes the first time it encounters two .regioninfo files.
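To make the suspected failure mode concrete, here is a minimal, self-contained
sketch (not the actual DistCp source) of a duplicate check keyed on the
destination-relative name rather than the full source URI. With a file list,
every listed file is effectively its own source root, so both paths above
reduce to ".regioninfo" and collide:

// Hypothetical sketch of a name-only duplicate check; the class name and
// logic are illustrative, not taken from DistCp.
import java.util.HashSet;
import java.util.Set;

public class DuplicationCheckSketch {
  public static void main(String[] args) {
    String[] sources = {
      "hftp://cm-hadoop-adm03.mozilla.org:50070/hbase/archive/683263177/.regioninfo",
      "hftp://cm-hadoop-adm03.mozilla.org:50070/hbase/crash_counts/2038233953/.regioninfo"
    };
    Set<String> seen = new HashSet<String>();
    for (String src : sources) {
      // Keep only the text after the last '/', i.e. ".regioninfo" for both entries.
      String name = src.substring(src.lastIndexOf('/') + 1);
      if (!seen.add(name)) {
        // Distinct full URIs, identical names -> reported as duplicates.
        throw new IllegalArgumentException(
            "Invalid input, there are duplicated files in the sources: " + src);
      }
    }
    System.out.println("No duplicates detected");
  }
}

If that is what is happening, the check would need to compare full source URIs
(or paths relative to a common source root) before a file-list-driven copy of
an HBase layout full of identically named .regioninfo files can succeed.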
> Cluster migration best practices
> --------------------------------
>
> Key: HBASE-3451
> URL: https://issues.apache.org/jira/browse/HBASE-3451
> Project: HBase
> Issue Type: Brainstorming
> Affects Versions: 0.20.6, 0.89.20100924
> Reporter: Daniel Einspanjer
> Priority: Critical
>
> Mozilla is currently in the process of trying to migrate our HBase cluster to
> a new datacenter.
> We have our existing 25-node cluster in our SJC datacenter. It is serving
> production traffic 24/7. While we can take downtime, it is very costly and
> difficult to take more than a few hours of it in the evening.
> We have two new 30-node clusters in our PHX datacenter. We want to cut
> production over to one of them this week.
> The old cluster is running 0.20.6. The new clusters are running CDH3b3 with
> HBase 0.89.
> We have tried running a pull distcp using hftp URLs. If HBase is running,
> this causes SAX XML parsing exceptions when a directory is removed during the
> scan.
> If HBase is stopped, it takes hours for the directory compare to finish
> before it even begins copying data.
> We have tried a custom backup MR job. This job uses the map phase to
> evaluate and copy changed files. It can run while HBase is live, but that
> results in a dirty copy of the data.
> We have tried running the custom backup job while HBase is shut down as well.
> When we do this, even on two back-to-back runs, it still copies over some
> data, and the result does not seem to be an entirely clean copy.
> When we have gotten what we thought was a complete copy onto the new cluster,
> we ran add_table on it, but the resulting HBase table had holes.
> Investigating the holes revealed there were directories that were not
> transferred.
> We had a meeting to brainstorm ideas, and the further suggestions that came
> up were:
> 1. Build a file list of files to transfer on the SJC side, transfer that file
> list to PHX and then run distcp on it.
> 2. Try a full copy instead of an incremental one, skipping the expensive file
> compare step.
> 3. Evaluate copying from SJC to S3 then from S3 to PHX.
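Regarding suggestion #1 in the description above, here is a minimal sketch of
driving the legacy DistCp from a pre-built file list (the same "-f" mode that
produced the DuplicationException in this comment); the list and destination
URIs are placeholders, not the real cluster paths:

// Sketch only: run the legacy DistCp against a pre-built source list.
// "-f <urilist_uri>" reads one source URI per line from the given file.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.tools.DistCp;
import org.apache.hadoop.util.ToolRunner;

public class FileListCopy {
  public static void main(String[] args) throws Exception {
    String[] distcpArgs = {
      "-f", "hdfs://dest-namenode:8020/tmp/hbase-file-list", // list built on the source side
      "hdfs://dest-namenode:8020/hbase"                      // destination root
    };
    int rc = ToolRunner.run(new DistCp(new Configuration()), distcpArgs);
    System.exit(rc);
  }
}

As shown above, this mode currently trips checkDuplication on the identically
named .regioninfo files, so the duplicate-name issue would have to be addressed
before it can work end to end.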