Re: snapshotting key ranges

Ted Yu Thu, 12 Feb 2015 09:50:34 -0800

bq. allow keys to be munged to the nearest region boundary.

Interesting idea.


If there is a region merge, this may get a bit complicated.
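
To make the "munge to the nearest region boundary" idea concrete, here is a rough sketch of how a requested key range could be widened to whole regions. It is only an illustration, not HBase code: the function name, the list of split keys, and the convention that an empty key means "unbounded" are assumptions made for the example.

```python
from bisect import bisect_left, bisect_right


def munge_to_region_boundaries(split_keys, start_key, stop_key):
    """Widen [start_key, stop_key) so it covers whole regions.

    split_keys is a sorted list of region split points (hypothetical input).
    Region i covers [split_keys[i-1], split_keys[i]); b"" means unbounded
    at either end of the table.
    Returns (munged_start, munged_stop, region_indexes).
    """
    if stop_key != b"" and start_key >= stop_key:
        raise ValueError("start_key must sort before stop_key")

    # Region holding start_key: count the split points <= start_key.
    first = bisect_right(split_keys, start_key)

    # stop_key is exclusive, so find the region holding the last key
    # strictly below it; an empty stop_key means "to the end of the table".
    last = len(split_keys) if stop_key == b"" else bisect_left(split_keys, stop_key)

    munged_start = b"" if first == 0 else split_keys[first - 1]
    munged_stop = b"" if last == len(split_keys) else split_keys[last]
    return munged_start, munged_stop, list(range(first, last + 1))


# Four regions: [.., g), [g, n), [n, t), [t, ..)
splits = [b"g", b"n", b"t"]
start, stop, regions = munge_to_region_boundaries(splits, b"h", b"p")
print(start, stop, regions)
```

With a region merge (or split) between two batches the split keys change, so a sketch like this would have to re-read the boundaries before each batch rather than compute them once up front.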

On Thu, Feb 12, 2015 at 9:26 AM, Jesse Yates <jesse.k.ya...@gmail.com>
wrote:

> Not a crazy idea at all :)
>
> It becomes very tractable if you are willing to allow keys to be munged to
> the nearest region boundary. The snapshot only considers the HFiles in each
> region and creates links to those files for the snapshot. So just capturing
> a subset of regions (as dictated by the 'hint' key ranges) would be
> reasonable.
>
> We might need a way to differentiate them from normal snapshots, but maybe
> not - if you supply key ranges, then it's on you to know what you are doing
> with that snapshot.
>
> Would you ever want to restore only part of a table? I'm not sure that even
> makes sense... maybe restoring a chunk at a time? If the latter, then we
> will likely need to change the restore mechanics to make sure it works (but
> it may just work out of the box, IIRC).
>
> > we could do the process in batches
>
>
> Would you be willing to manage that yourself or would you see this as
> something HBase would manage for you?
>
> -------------------
> Jesse Yates
> @jesse_yates
> jyates.github.com
>
> On Thu, Feb 12, 2015 at 9:18 AM, rahul gidwani <rahul.gidw...@gmail.com>
> wrote:
>
> > Before proposing this idea, I would like to state I have recently had a
> > thorough psychiatric evaluation and I'm not crazy.
> >
> > We here at flurry land have some very large tables on the order of 1PB,
> > 3PB with dfs replication.  We wanted to ship this table to another cluster
> > using snapshots.  Problem is that the data will take weeks to ship and
> > during that time major compaction will happen and we will end up with
> > potentially double the data on our cluster.  (We really don't want to
> > turn off major compaction because we will really suffer with reads).
> >
> > Additionally there is one really large CF that dominates this table.  So
> > to mitigate this problem we were thinking that a user could pass in the key
> > ranges for a snapshot and we could do the process in batches.  This might
> > also be useful for sampling data, or keys which are based on something
> > like timestamps, where you could archive certain portions of data known to
> > be stale.
> >
> > If people are interested we could get into more details about
> > implementation.
> >
> > Cheers
> > rahul
> >
>
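
The "do the process in batches" part of the proposal could be sketched as grouping adjacent regions until a size budget is reached, so that each partial snapshot ships in a bounded amount of time. This is a minimal sketch under assumptions: the region names, the per-region sizes, and the `max_batch_bytes` budget are all hypothetical inputs, and nothing here calls into HBase.

```python
def batch_regions(region_sizes, max_batch_bytes):
    """Group adjacent regions into batches no larger than max_batch_bytes.

    region_sizes is a list of (region_name, size_in_bytes) pairs in key
    order (hypothetical input). A single region larger than the budget
    still gets a batch of its own, since regions are not split here.
    """
    batches, current, current_bytes = [], [], 0
    for name, size in region_sizes:
        # Close the running batch if adding this region would exceed the budget.
        if current and current_bytes + size > max_batch_bytes:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


regions = [("r0", 40), ("r1", 50), ("r2", 30), ("r3", 120), ("r4", 10)]
for batch in batch_regions(regions, max_batch_bytes=100):
    print(batch)
```

Keeping each batch contiguous in key order means every batch maps to a single key range, which fits the idea of passing key ranges to the snapshot.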
