[
https://issues.apache.org/jira/browse/HBASE-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Purtell resolved HBASE-3451.
-----------------------------------
Resolution: Not a Problem
Stale brainstorming issue, closing
> Cluster migration best practices
> --------------------------------
>
> Key: HBASE-3451
> URL: https://issues.apache.org/jira/browse/HBASE-3451
> Project: HBase
> Issue Type: Brainstorming
> Affects Versions: 0.20.6, 0.89.20100924
> Reporter: Daniel Einspanjer
> Priority: Critical
>
> Mozilla is in the process of migrating our HBase cluster to a new
> datacenter.
> Our existing 25-node cluster is in our SJC datacenter and serves production
> traffic 24/7. While we can take downtime, it is very costly, and it is
> difficult to take more than a few hours of it in the evening.
> We have two new 30-node clusters in our PHX datacenter, and we want to cut
> production over to one of them this week.
> The old cluster is running 0.20.6. The new clusters are running CDH3b3 with
> HBase 0.89.
> We have tried running a pull distcp using hftp URLs. If HBase is running,
> this causes SAX XML parsing exceptions when a directory is removed during the
> scan.
> If HBase is stopped, it takes hours for the directory compare to finish
> before it even begins copying data.
> We have tried a custom backup MR job. This job uses the map phase to
> evaluate and copy changed files. It can run while HBase is live, but that
> results in a dirty copy of the data.
> We have also tried running the custom backup job while HBase is shut down.
> When we do this, even across two back-to-back runs, the job still copies over
> some data, and the result does not seem to be an entirely clean copy.
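The "evaluate" step of an incremental copy like the one described above can be sketched as a comparison of two file listings, one per cluster. This is a minimal sketch, not the job itself: the listing format (path mapped to length in bytes and modification time) and the function name `files_to_copy` are hypothetical. Because it compares modification times, it only gives useful results if the copy step preserves them; otherwise every previously copied file looks changed on the next run.

```python
from typing import Dict, List, Tuple

# Hypothetical listing format: path -> (length_in_bytes, modification_time_ms),
# e.g. gathered from a recursive listing of the HBase root directory on each cluster.
Listing = Dict[str, Tuple[int, int]]


def files_to_copy(source: Listing, dest: Listing) -> List[str]:
    """Return source paths that are missing at the destination or whose
    (length, mtime) pair differs, i.e. what an incremental pass would copy."""
    changed = []
    for path, meta in sorted(source.items()):
        # dest.get() returns None for a missing path, which never equals a tuple
        if dest.get(path) != meta:
            changed.append(path)
    return changed
```

Files that exist only at the destination are ignored here; a real job would also have to decide whether to delete them.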
> When we had what we thought was an entire copy on the new cluster, we ran
> add_table on it, but the resulting HBase table had holes. Investigating the
> holes revealed directories that had not been transferred.
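In HBase, a hole is a row-key range that no region of the table covers. If add_table registers only the region directories it finds on HDFS, a region directory that was never transferred shows up as a gap in the region chain. A minimal sketch of detecting such gaps, assuming a hypothetical list of (start key, end key) pairs gathered from the table's regions and assuming no regions overlap, is:

```python
from typing import List, Tuple

# A region is (start_key, end_key). An empty start key means "beginning of the
# table" and an empty end key means "end of the table", as in HBase.
Region = Tuple[bytes, bytes]


def find_holes(regions: List[Region]) -> List[Tuple[bytes, bytes]]:
    """Return (gap_start, gap_end) key ranges not covered by any region.

    Assumes regions do not overlap; overlaps would need separate handling."""
    holes = []
    expected_start = b""  # the first region should start at the table start
    for start, end in sorted(regions):
        if start != expected_start:
            holes.append((expected_start, start))
        expected_start = end
        if end == b"":
            # this region runs to the end of the table; nothing can follow it
            break
    else:
        # no region reaches the end of the table (also covers an empty list)
        holes.append((expected_start, b""))
    return holes
```

For example, regions covering `["", "c")`, `["c", "f")` and `["h", end)` leave the single hole `["f", "h")`, which would correspond to a missing region directory.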
> We held a meeting to brainstorm ideas, and three further suggestions came
> up:
> 1. Build a file list of files to transfer on the SJC side, transfer that file
> list to PHX and then run distcp on it.
> 2. Try a full copy instead of an incremental one, skipping the expensive
> file compare step.
> 3. Evaluate copying from SJC to S3 then from S3 to PHX.
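Suggestion 1 could be sketched as follows: the SJC side produces a plain-text list with one fully qualified source URI per line, the kind of source list DistCp accepts through its `-f` option, the idea being that the expensive comparison runs locally at SJC rather than over hftp. The function name `build_manifest` and the example filesystem URI are hypothetical; the input paths would come from a listing taken on the SJC cluster.

```python
from typing import Iterable


def build_manifest(paths: Iterable[str], source_fs: str) -> str:
    """Return source-list contents: one fully qualified source URI per line.

    source_fs is the scheme and authority of the source cluster, for example
    a hypothetical hftp://sjc-nn:50070 for a pull-style copy from PHX."""
    base = source_fs.rstrip("/")
    # de-duplicate and sort so the list is stable between runs
    lines = [base + "/" + p.lstrip("/") for p in sorted(set(paths))]
    return "\n".join(lines) + ("\n" if lines else "")
```

Keeping the list sorted and free of duplicates also makes two lists from successive runs easy to diff before anything is copied.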