[ https://issues.apache.org/jira/browse/HBASE-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell resolved HBASE-3451.
-----------------------------------

    Resolution: Not a Problem

Stale brainstorming issue, closing

> Cluster migration best practices
> --------------------------------
>
>                 Key: HBASE-3451
>                 URL: https://issues.apache.org/jira/browse/HBASE-3451
>             Project: HBase
>          Issue Type: Brainstorming
>    Affects Versions: 0.20.6, 0.89.20100924
>            Reporter: Daniel Einspanjer
>            Priority: Critical
>
> Mozilla is currently in the process of migrating our HBase cluster to a new
> datacenter.
> We have our existing 25-node cluster in our SJC datacenter. It serves
> production traffic 24/7. While we can take downtime, it is very costly and
> difficult to take more than a few hours in the evening.
> We have two new 30-node clusters in our PHX datacenter. We want to cut
> production over to one of these this week.
> The old cluster is running 0.20.6.  The new clusters are running CDH3b3 with 
> HBase 0.89.
> We have tried running a pull distcp using hftp URLs. If HBase is running,
> this causes SAX XML parsing exceptions whenever a directory is removed during
> the scan.
> If HBase is stopped, it takes hours for the directory compare to finish 
> before it even begins copying data.
> We have tried a custom backup MR job. This job uses the map phase to
> evaluate and copy changed files. It can run while HBase is live, but that
> results in a dirty (inconsistent) copy of the data.
> We have also tried running the custom backup job while HBase is shut down.
> When we do this, even with two back-to-back runs, the second run still copies
> over some data, so the result does not seem to be an entirely clean copy.
> When we had what we thought was a complete copy on the new cluster, we ran
> add_table on it, but the resulting HBase table had holes. Investigating the
> holes revealed directories that had not been transferred.
> We held a meeting to brainstorm ideas, and the further suggestions that came
> up were:
> 1. Build a list of files to transfer on the SJC side, transfer that list to
> PHX, and then run distcp on it.
> 2. Try a full copy instead of an incremental one, skipping the expensive file
> compare step.
> 3. Evaluate copying from SJC to S3, then from S3 to PHX.
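Suggestion 1 and the holes found after add_table both come down to comparing what exists on each cluster. The sketch below is a minimal illustration under stated assumptions, not the approach taken in this issue: it assumes each cluster's HBase root has already been listed into a mapping of file path to size in bytes (for example by parsing recursive `hadoop fs -lsr` output, which is not shown), and it compares sizes only. The functions `plan_transfer` and `to_uri_list`, the example paths, and the `hftp://sjc-nn:50070` base URI are hypothetical. The resulting list could be handed to distcp's `-f` option, which reads its source list from a file; check how your distcp version places each listed source under the destination before relying on it.

```python
"""Hypothetical helpers: diff two cluster listings before a migration copy."""
import posixpath
from typing import Dict, List, Tuple


def plan_transfer(src: Dict[str, int],
                  dst: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Compare listings that map file path -> size in bytes.

    Returns (files_to_copy, missing_dirs):
      files_to_copy -- source files absent at the destination or whose size differs
      missing_dirs  -- directories that directly contain source files but directly
                       contain no destination files (candidate "holes")
    Size-only comparison is cheap but misses a file rewritten at the same size.
    Empty directories never appear in a file listing, so they are not reported.
    """
    files_to_copy = sorted(p for p, size in src.items() if dst.get(p) != size)
    src_dirs = {posixpath.dirname(p) for p in src}
    dst_dirs = {posixpath.dirname(p) for p in dst}
    missing_dirs = sorted(src_dirs - dst_dirs)
    return files_to_copy, missing_dirs


def to_uri_list(paths: List[str], base_uri: str) -> str:
    """Render one fully qualified URI per line, e.g. for a distcp -f list file."""
    base = base_uri.rstrip("/")
    return "".join(base + p + "\n" for p in paths)


if __name__ == "__main__":
    # Hypothetical layout: /hbase/<table>/<region>/<family>/<storefile>
    src = {"/hbase/t1/r1/f/a": 10, "/hbase/t1/r1/f/b": 20, "/hbase/t1/r2/f/c": 30}
    dst = {"/hbase/t1/r1/f/a": 10, "/hbase/t1/r1/f/b": 5}
    to_copy, missing = plan_transfer(src, dst)
    print(to_copy)   # ['/hbase/t1/r1/f/b', '/hbase/t1/r2/f/c']
    print(missing)   # ['/hbase/t1/r2/f']
    print(to_uri_list(to_copy, "hftp://sjc-nn:50070"), end="")
```

Running the module prints the files to copy, the directories with no counterpart on the destination, and the URI list for the files to copy. Sizes are compared rather than modification times because a copy does not necessarily preserve timestamps on the destination cluster.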



--
This message was sent by Atlassian JIRA
(v6.2#6252)
