1.8TB in a day is not terribly slow if that number comes from the CopyTable counters and you are moving data across data centers over public networks; that works out to roughly 20MB/sec. Also, CopyTable doesn't compress anything on the wire, so the network overhead can be significant. If you use something like Snappy for block compression and/or FAST_DIFF for block encoding on the HFiles, then taking snapshots and exporting them with the ExportSnapshot tool should be the way to go, since that ships the already-compressed HFiles directly instead of re-sending uncompressed KeyValues.
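For reference, a rough sketch of that flow (the table, snapshot, destination NameNode and mapper count below are just placeholders; point -copy-to at the destination cluster's hbase.rootdir and size -mappers for your clusters):

    # on the source cluster, from the hbase shell
    snapshot 'table_name', 'table_name_snap'

    # ship the snapshot's HFiles to the destination cluster with a MapReduce job
    bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
      -snapshot table_name_snap \
      -copy-to hdfs://dest-namenode:8020/hbase \
      -mappers 16

    # on the destination cluster, from the hbase shell
    clone_snapshot 'table_name_snap', 'table_name'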
cheers,
esteban.

--
Cloudera, Inc.

On Thu, Aug 14, 2014 at 11:24 PM, tobe <[email protected]> wrote:

> Thanks @lars.
>
> We're using HBase 0.94.11 and followed the instructions to run `./bin/hbase
> org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=hbase://cluster_name
> table_name`. We have a namespace service to find the ZooKeeper quorum with
> "hbase://cluster_name". And the job ran on a shared YARN cluster.
>
> The performance is affected by many factors, but we haven't found out the
> reason. It would be great to see your suggestions.
>
>
> On Fri, Aug 15, 2014 at 1:34 PM, lars hofhansl <[email protected]> wrote:
>
> > What version of HBase? How are you running CopyTable? A day for 1.8T is
> > not what we would expect.
> > You can definitely take a snapshot and then export the snapshot to another
> > cluster, which will move the actual files; but CopyTable should not be so
> > slow.
> >
> >
> > -- Lars
> >
> >
> > ________________________________
> > From: tobe <[email protected]>
> > To: "[email protected]" <[email protected]>
> > Cc: [email protected]
> > Sent: Thursday, August 14, 2014 8:18 PM
> > Subject: A better way to migrate the whole cluster?
> >
> >
> > Sometimes our users want to upgrade their servers or move to a new
> > datacenter, and then we have to migrate the data from HBase. Currently we
> > enable replication from the old cluster to the new cluster, and run
> > CopyTable to move the older data.
> >
> > It's a little inefficient. It takes more than one day to migrate 1.8T of data
> > and more time to verify. Can we have a better way to do that, like snapshots
> > or purely HDFS files?
> >
> > And what's the best practice, or your valuable experience?
> >
