Re: [hypertable-dev] BACKUP option to SELECT

Doug Judd Tue, 19 Jan 2010 17:49:27 -0800

DUMP sounds good.  I may add another option, COLUMNS=c1,c2..., so that
specific columns can be dumped.


- Doug

On Tue, Jan 19, 2010 at 2:43 AM, Luke <[email protected]> wrote:

> A different top level command would be a better approach (even though
> the implementation can share much of the scan spec parsing code). OTOH,
> DUMP would be a better name, as "BACKUP" without a file (just
> dump to stdout) would sound strange. Plus, it's shorter :)
>
> On Mon, Jan 18, 2010 at 10:42 PM, Doug Judd <[email protected]> wrote:
> > The BACKUP feature is really to allow for the generation of efficient backup
> > files.  Certain WHERE clauses and options such as ROW, CELL, and LIMIT would
> > be incompatible with the BACKUP option since BACKUP would be a completely
> > separate code path and those other options don't really jibe with the
> > concept of backing up a table.  The reason that I suggest folding it in with
> > SELECT is because some of the other options, such as TIMESTAMP, column
> > selection, and REVS, could be useful features of table backup.
> >
> > The other approach would be to add a toplevel BACKUP TABLE command that
> > would support a subset of SELECT options that would be appropriate for table
> > backups.
> >
> > BACKUP TABLE <table> [WHERE <where-clause>] [OPTIONS]
> >
> > Supported where-clause options:
> >   TIMESTAMP
> >
> > Other supported options:
> >   REVS revision_count
> >   INTO FILE filename[.gz]
> >
> > - Doug
> >
> > On Mon, Jan 18, 2010 at 10:04 PM, Sanjit Jhala <[email protected]> wrote:
> >>
> >> I assume it will also allow SELECT (list, of, cfs) FROM foo BACKUP INTO
> >> FILE "foo-backup.tgz".
> >> Also I'm wondering if the word BACKUP ought to be replaced by something
> >> like RANDOM or SHUFFLED to decouple this change from backups (although I
> >> agree that fast restores are the main use case for this feature). So,
> >> "SELECT * FROM foo SHUFFLED LIMIT=N;" returns N samples across all ranges
> >> and one can additionally choose to store the output of the SELECT into the
> >> tgz file for fast restores.
> >>
> >> -Sanjit
> >>
> >>
> >> On Mon, Jan 18, 2010 at 8:49 PM, Doug Judd <[email protected]> wrote:
> >>>
> >>> The current method of using SELECT to take table backups causes
> >>> efficiency problems during restore.  Because the cells are dumped in-order,
> >>> when it comes time to restore from backup, the data ends up getting loaded
> >>> into one range at a time.  I propose adding a BACKUP option to SELECT that
> >>> would cause the data to get dumped in random order (uniformly distributed
> >>> across key space).  This will cause restores to be parallelized, since
> >>> ranges distributed across the cluster will receive updates simultaneously.
> >>> Here's example syntax:
> >>>
> >>> SELECT * FROM foo BACKUP INTO FILE "foo-backup.gz";
> >>>
> >>> I also propose having the BACKUP option force timestamps to be dumped as
> >>> well, since this will preserve the table state exactly.  Thoughts?
> >>>
> >>> - Doug
> >>>
> >>>
> >>> --
> >>> You received this message because you are subscribed to the Google Groups
> >>> "Hypertable Development" group.
> >>> To post to this group, send email to [email protected].
> >>> To unsubscribe from this group, send email to
> >>> [email protected]<hypertable-dev%[email protected]>.
> >>> For more options, visit this group at
> >>> http://groups.google.com/group/hypertable-dev?hl=en.
> >>>

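A rough illustration of the restore argument from the original proposal
quoted above: when cells arrive in key order, each batch of updates lands
in a single range, whereas a shuffled dump spreads each batch across all
ranges. This is only a toy model, not Hypertable code. The function name,
the key format, the split points, and the batch size are all made up for
the example, and a "range" is modeled as nothing more than an interval
between sorted split keys.

```python
import bisect
import random


def ranges_touched_per_batch(keys, split_points, batch_size):
    """Count how many distinct ranges each consecutive batch of keys hits.

    split_points is a sorted list of boundary keys; a key belongs to the
    range whose index is bisect_right(split_points, key). This is a
    simplified stand-in for range lookup, assumed for illustration only.
    """
    counts = []
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        touched = {bisect.bisect_right(split_points, k) for k in batch}
        counts.append(len(touched))
    return counts


# 10,000 sorted row keys split into 10 hypothetical ranges of 1,000 keys.
keys = [f"row{i:05d}" for i in range(10_000)]
split_points = [f"row{i:05d}" for i in range(1000, 10_000, 1000)]

# In-order dump: every batch of 1,000 cells falls into exactly one range.
in_order = ranges_touched_per_batch(keys, split_points, 1000)

# Shuffled dump: the same cells in random order (fixed seed for repeatability).
shuffled_keys = keys[:]
random.Random(42).shuffle(shuffled_keys)
shuffled = ranges_touched_per_batch(shuffled_keys, split_points, 1000)

print("in-order:", in_order)
print("shuffled:", shuffled)
```

In this model the in-order restore keeps one range busy per batch while
the others sit idle, and the shuffled restore gives every range work in
every batch, which is the parallelism the proposal is after.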
