Bryan: Thanks for sharing your valuable experience.

bq. the diff function should be changed to inspect the actual HFiles
Inspecting / comparing metadata for the actual HFiles should help.

On Fri, Aug 15, 2014 at 11:27 AM, Bryan Beaudreault <[email protected]> wrote:

> The reason it swallows exceptions is that the idea is that you first run
> a diff, then use the output of that diff as the input of the Backup job.
> So if an HFile fails to copy, the job will still finish, copying most of
> the HFiles. Then you run the diff again, and it will see that an HFile
> was missed. The next run will only copy that file and any other new
> files.
>
> You run the diff before each job run. The diff basically runs `hdfs dfs
> -ls -R /hbase` on each cluster and passes the output of those commands
> into
> https://github.com/mozilla-metrics/akela/blob/master/src/main/python/lsr_diff.py
>
> So the way we've used this job (modified internally, like I said, but
> very much the same concept) is:
>
> 1) Run the job; it usually takes a while.
> 2) Run the job again. This takes much less time because most of the
> files were already copied and we're just copying the new or failed files
> since the last job began.
> 3) Depending on how long the job is taking, i.e. how fast the data
> ingestion is, we run it as many times as we need to get the window
> small.
> 4) Stop the source cluster.
> 5) Run the job one more time.
> 6) If there was an exception, run it again. I've had to do this maybe
> once.
> 7) Start the target cluster, and bounce applications so they connect to
> the new cluster.
>
> The idea is that you can run the job over and over to get closer to
> being in sync. This is how we handled errors too. In the times I've used
> it (10+ runs on multiple production clusters), I've never seen an
> exception that couldn't be resolved by running it again.
>
> So we have previously worked with the leniency of having a short window
> (<10 mins) of downtime. There are some challenges with doing this in an
> online migration with replication, but it is maybe still doable with
> some thought around the diffing.
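[The diff step Bryan describes can be sketched with standard tools. This is not the actual lsr_diff.py, just a minimal illustration of comparing two recursive listings by path and length; the `hdfs dfs -ls -R /hbase` output is faked here with two local files, and the paths and sizes are invented.]

```shell
# Fake the (path, length) columns of a recursive listing from each cluster.
# On real clusters you would capture this from `hdfs dfs -ls -R /hbase`.
cat > source.lsr <<'EOF'
/hbase/t1/r1/cf/hfileA 1024
/hbase/t1/r1/cf/hfileB 2048
/hbase/t1/r2/cf/hfileC 4096
EOF
cat > target.lsr <<'EOF'
/hbase/t1/r1/cf/hfileA 1024
/hbase/t1/r1/cf/hfileB 1000
EOF
# Entries present on the source but missing (or with a different length)
# on the target are the files the next Backup run still needs to copy.
sort source.lsr > s.sorted
sort target.lsr > t.sorted
comm -23 s.sorted t.sorted | cut -d' ' -f1
```

Here hfileB (same name, different length) and hfileC (missing on the target) would be recopied, while hfileA is skipped — which is exactly why repeated runs converge.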
> For example, compactions will never stop in a running source cluster, so
> currently we might have to keep running the Backup over and over, never
> reaching parity. So maybe the diff function should be changed to inspect
> the actual HFiles instead of just comparing file name and length.
>
> On Fri, Aug 15, 2014 at 1:47 PM, Ted Yu <[email protected]> wrote:
>
> > Bryan:
> > From the javadoc of Backup.java:
> > bq. it favors swallowing exceptions and incrementing counters as
> > opposed to failing
> >
> > Can you share some experience of how you handled the errors reported
> > by Backup?
> >
> > Thanks
> >
> > On Fri, Aug 15, 2014 at 10:38 AM, Bryan Beaudreault
> > <[email protected]> wrote:
> >
> > > I agree it would be nice if this were provided by HBase, but it's
> > > already possible to work straight with the HFiles. All you need is a
> > > custom Hadoop job. A good starting point is
> > > https://github.com/mozilla-metrics/akela/blob/master/src/main/java/com/mozilla/hadoop/Backup.java
> > > which you can modify to your needs. We've used our own modification
> > > of this job many times when we do our own cluster migrations. The
> > > idea is that it is incremental, so as HFiles get compacted, deleted,
> > > etc., you can just run it again and move smaller and smaller amounts
> > > of data.
> > >
> > > Working at the HDFS level should be faster, as you can use more
> > > mappers. You will still be taxing the IO of the source cluster, but
> > > not adding load to the actual regionserver processes (IPC queue,
> > > memory, etc.).
> > >
> > > If you upgrade to CDH5 (or the equivalent HDFS version), you can use
> > > HDFS snapshots to minimize the need to re-run the above Backup job
> > > (since you are already using replication to keep data up-to-date).
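[Bryan's suggestion — comparing actual HFile contents rather than name and length, so a file rewritten by compaction is still detected — could look roughly like the sketch below. It uses local files and `cksum` purely for illustration; on a real cluster one might use `hdfs dfs -checksum` instead. The file names and contents are invented.]

```shell
# Two copies of the same layout: hfileA is identical on both sides,
# hfileB has the same name AND the same length but different bytes
# (as a compaction rewrite might produce), so a name+length diff
# would wrongly skip it.
mkdir -p src dst
printf 'old-block-data' > src/hfileA
printf 'old-block-data' > dst/hfileA
printf 'recompacted!!!' > src/hfileB
printf 'original-data!' > dst/hfileB
# Compare by content checksum instead of name + length.
for f in src/*; do
  name=$(basename "$f")
  s=$(cksum < "$f" | cut -d' ' -f1)
  d=$(cksum < "dst/$name" | cut -d' ' -f1)
  [ "$s" != "$d" ] && echo "recopy: $name"
done
```

Only hfileB is flagged for recopy, even though its name and length match — the failure mode a name+length diff cannot catch.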
> > >
> > > On Fri, Aug 15, 2014 at 1:11 PM, Esteban Gutierrez
> > > <[email protected]> wrote:
> > >
> > > > 1.8TB in a day is not terribly slow if that number comes from the
> > > > CopyTable counters and you are moving data across data centers
> > > > over public networks; that works out to about 20MB/sec. Also,
> > > > CopyTable won't compress anything on the wire, so the network
> > > > overhead will be significant. If you use anything like snappy for
> > > > block compression and/or fast_diff for block encoding on the
> > > > HFiles, then taking snapshots and exporting them using the
> > > > ExportSnapshot tool should be the way to go.
> > > >
> > > > cheers,
> > > > esteban.
> > > >
> > > > --
> > > > Cloudera, Inc.
> > > >
> > > > On Thu, Aug 14, 2014 at 11:24 PM, tobe <[email protected]>
> > > > wrote:
> > > >
> > > > > Thanks @lars.
> > > > >
> > > > > We're using HBase 0.94.11 and followed the instructions to run
> > > > > `./bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable
> > > > > --peer.adr=hbase://cluster_name table_name`. We have a namespace
> > > > > service to find the ZooKeeper quorum from
> > > > > "hbase://cluster_name", and the job ran on a shared YARN
> > > > > cluster.
> > > > >
> > > > > The performance is affected by many factors, but we haven't
> > > > > found out the reason. It would be great to hear your
> > > > > suggestions.
> > > > >
> > > > > On Fri, Aug 15, 2014 at 1:34 PM, lars hofhansl
> > > > > <[email protected]> wrote:
> > > > >
> > > > > > What version of HBase? How are you running CopyTable? A day
> > > > > > for 1.8T is not what we would expect. You can definitely take
> > > > > > a snapshot and then export the snapshot to another cluster,
> > > > > > which will move the actual files; but CopyTable should not be
> > > > > > so slow.
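[Esteban's 20MB/sec figure is simple arithmetic and checks out:]

```shell
# Back-of-the-envelope check of the throughput estimate above:
# 1.8 TB moved in 24 hours is roughly 20 MB/sec sustained.
awk 'BEGIN {
  bytes = 1.8 * 10^12        # 1.8 TB in decimal bytes
  secs  = 24 * 3600          # seconds in a day
  printf "%.1f MB/sec\n", bytes / secs / 10^6
}'
```

That prints about 20.8 MB/sec — plausible for a single uncompressed cross-datacenter stream, which supports the point that the job is network-bound rather than broken.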
> > > > > >
> > > > > > -- Lars
> > > > > >
> > > > > > ________________________________
> > > > > > From: tobe <[email protected]>
> > > > > > To: "[email protected]" <[email protected]>
> > > > > > Cc: [email protected]
> > > > > > Sent: Thursday, August 14, 2014 8:18 PM
> > > > > > Subject: A better way to migrate the whole cluster?
> > > > > >
> > > > > > Sometimes our users want to upgrade their servers or move to a
> > > > > > new datacenter, and then we have to migrate the data from
> > > > > > HBase. Currently we enable replication from the old cluster to
> > > > > > the new cluster, and run CopyTable to move the older data.
> > > > > >
> > > > > > It's a little inefficient: it takes more than one day to
> > > > > > migrate 1.8T of data and more time to verify. Can we do this a
> > > > > > better way, such as with snapshots or purely at the level of
> > > > > > HDFS files?
> > > > > >
> > > > > > And what is the best practice, or your valuable experience?
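[The snapshot route that Esteban and Lars recommend would look roughly like the following. This is a sketch, not a tested recipe: the table, snapshot, and cluster names are placeholders, snapshot support requires a recent-enough HBase (it landed in the 0.94 line), and the exact flags should be checked against your version's documentation.]

```shell
# 1) Take a snapshot of the table on the source cluster (HBase shell).
echo "snapshot 'table_name', 'table_name-migration'" | hbase shell

# 2) Export the snapshot's HFiles at the HDFS level to the target
#    cluster, bypassing the regionserver read path entirely.
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot table_name-migration \
  -copy-to hdfs://target-cluster:8020/hbase \
  -mappers 16

# 3) On the target cluster, materialize the snapshot as a live table.
echo "clone_snapshot 'table_name-migration', 'table_name'" | hbase shell
```

Because ExportSnapshot is a plain MapReduce copy of immutable files, its parallelism scales with `-mappers` rather than with regionserver capacity, which is the speed advantage over CopyTable discussed above.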
