[
https://issues.apache.org/jira/browse/HBASE-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983544#action_12983544
]
Daniel Einspanjer commented on HBASE-3451:
------------------------------------------
Has anyone ever tried to distcp HBase data using a file list?
We got this error:
11/01/18 21:21:50 INFO tools.DistCp: destPath=hdfs://hp-node70.phx1.mozilla.com:8020/hbase
org.apache.hadoop.tools.DistCp$DuplicationException: Invalid input, there are duplicated files in the sources:
  hftp://cm-hadoop-adm03.mozilla.org:50070/hbase/archive/683263177/.regioninfo,
  hftp://cm-hadoop-adm03.mozilla.org:50070/hbase/crash_counts/2038233953/.regioninfo
        at org.apache.hadoop.tools.DistCp.checkDuplication(DistCp.java:1383)
        at org.apache.hadoop.tools.DistCp.setup(DistCp.java:1186)
        at org.apache.hadoop.tools.DistCp.copy(DistCp.java:666)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
It seems that checkDuplication doesn't compare the full path, and hence it
chokes the first time it encounters two .regioninfo files.
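To make the suspected failure mode concrete, here is a minimal, self-contained
sketch (not the actual DistCp source) of a duplicate check keyed on the
destination-relative name rather than the full source URI. With a file list,
every listed file is effectively its own source root, so both paths above
reduce to ".regioninfo" and collide:

// Hypothetical sketch of a name-only duplicate check; the class name and
// logic are illustrative, not taken from DistCp.
import java.util.HashSet;
import java.util.Set;

public class DuplicationCheckSketch {
  public static void main(String[] args) {
    String[] sources = {
      "hftp://cm-hadoop-adm03.mozilla.org:50070/hbase/archive/683263177/.regioninfo",
      "hftp://cm-hadoop-adm03.mozilla.org:50070/hbase/crash_counts/2038233953/.regioninfo"
    };
    Set<String> seen = new HashSet<String>();
    for (String src : sources) {
      // Keep only the text after the last '/', i.e. ".regioninfo" for both entries.
      String name = src.substring(src.lastIndexOf('/') + 1);
      if (!seen.add(name)) {
        // Distinct full URIs, identical names -> reported as duplicates.
        throw new IllegalArgumentException(
            "Invalid input, there are duplicated files in the sources: " + src);
      }
    }
    System.out.println("No duplicates detected");
  }
}

If that is what is happening, the check would need to compare full source URIs
(or paths relative to a common source root) before a file-list-driven copy of
an HBase layout full of identically named .regioninfo files can succeed.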
> Cluster migration best practices
> --------------------------------
>
> Key: HBASE-3451
> URL: https://issues.apache.org/jira/browse/HBASE-3451
> Project: HBase
> Issue Type: Brainstorming
> Affects Versions: 0.20.6, 0.89.20100924
> Reporter: Daniel Einspanjer
> Priority: Critical
>
> Mozilla is currently in the process of trying to migrate our HBase cluster to
> a new datacenter.
> We have our existing 25-node cluster in our SJC datacenter. It is serving
> production traffic 24/7. While we can take downtime, it is very costly and
> difficult to take more than a few hours of it in the evening.
> We have two new 30-node clusters in our PHX datacenter. We want to cut
> production over to one of them this week.
> The old cluster is running 0.20.6. The new clusters are running CDH3b3 with
> HBase 0.89.
> We have tried running a pull distcp using hftp URLs. If HBase is running,
> this causes SAX XML parsing exceptions when a directory is removed during the
> scan.
> If HBase is stopped, it takes hours for the directory compare to finish
> before it even begins copying data.
> We have tried a custom backup MR job. This job uses the map phase to
> evaluate and copy changed files. It can run while HBase is live, but that
> results in a dirty copy of the data.
> We have tried running the custom backup job while HBase is shut down as well.
> When we do this, even on two back-to-back runs, it still copies over some
> data, and the result does not seem to be an entirely clean copy.
> When we have gotten what we thought was a complete copy onto the new cluster,
> we ran add_table on it, but the resulting HBase table had holes.
> Investigating the holes revealed there were directories that were not
> transferred.
> We had a meeting to brainstorm ideas, and the further suggestions that came
> up were:
> 1. Build a file list of files to transfer on the SJC side, transfer that file
> list to PHX and then run distcp on it.
> 2. Try a full copy instead of an incremental one, skipping the expensive file
> compare step.
> 3. Evaluate copying from SJC to S3 then from S3 to PHX.
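Regarding suggestion #1 in the description above, here is a minimal sketch of
driving the legacy DistCp from a pre-built file list (the same "-f" mode that
produced the DuplicationException in this comment); the list and destination
URIs are placeholders, not the real cluster paths:

// Sketch only: run the legacy DistCp against a pre-built source list.
// "-f <urilist_uri>" reads one source URI per line from the given file.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.tools.DistCp;
import org.apache.hadoop.util.ToolRunner;

public class FileListCopy {
  public static void main(String[] args) throws Exception {
    String[] distcpArgs = {
      "-f", "hdfs://dest-namenode:8020/tmp/hbase-file-list", // list built on the source side
      "hdfs://dest-namenode:8020/hbase"                      // destination root
    };
    int rc = ToolRunner.run(new DistCp(new Configuration()), distcpArgs);
    System.exit(rc);
  }
}

As shown above, this mode currently trips checkDuplication on the identically
named .regioninfo files, so the duplicate-name issue would have to be addressed
before it can work end to end.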