The BACKUP feature is really meant to allow efficient backup files to be generated. Certain WHERE clauses and options, such as ROW, CELL, and LIMIT, would be incompatible with the BACKUP option, since BACKUP would be a completely separate code path and those options don't really jibe with the concept of backing up a table. I suggest folding it in with SELECT because some of the other options, such as TIMESTAMP, column selection, and REVS, could be useful features of table backup.

The other approach would be to add a top-level BACKUP TABLE command that would support a subset of SELECT options that would be appropriate for table backups.

  BACKUP TABLE <table> [WHERE <where-clause>] [OPTIONS]

Supported where-clause options:

  TIMESTAMP

Other supported options:

  REVS revision_count
  INTO FILE filename[.gz]

- Doug

On Mon, Jan 18, 2010 at 10:04 PM, Sanjit Jhala <[email protected]> wrote:

> I assume it will also allow SELECT (list, of, cfs) FROM foo BACKUP INTO
> FILE "foo-backup.tgz".
> Also I'm wondering if the word BACKUP ought to be replaced by something
> like RANDOM or SHUFFLED to decouple this change from backups (although I
> agree that fast restores are the main use case for this feature). So,
> "SELECT * FROM foo SHUFFLED LIMIT=N;" returns N samples across all ranges
> and one can additionally choose to store the output of the SELECT into the
> tgz file for fast restores.
>
> -Sanjit
>
> On Mon, Jan 18, 2010 at 8:49 PM, Doug Judd <[email protected]> wrote:
>
>> The current method of using SELECT to take table backups causes efficiency
>> problems during restore. Because the cells are dumped in-order, when it
>> comes time to restore from backup, the data ends up getting loaded into one
>> range at a time. I propose adding a BACKUP option to SELECT that would
>> cause the data to get dumped in random order (uniformly distributed across
>> key space). This will cause restores to be parallelized, since ranges
>> distributed across the cluster will receive updates simultaneously. Here's
>> example syntax:
>>
>> SELECT * FROM foo BACKUP INTO FILE "foo-backup.gz";
>>
>> I also propose having the BACKUP option force timestamps to be dumped as
>> well, since this will preserve the table state exactly. Thoughts?
>>
>> - Doug

--
You received this message because you are subscribed to the Google Groups "Hypertable Development" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/hypertable-dev?hl=en.
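
The restore argument in the original proposal (an in-order dump loads one range at a time, a randomly ordered dump spreads the load) can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the integer row keys, the ten equal-sized ranges, the batch size, and the helper names `range_of` and `ranges_hit_per_batch` are all hypothetical and are not taken from Hypertable's code. It simply counts how many distinct ranges each consecutive batch of a dump file would touch during a restore.

```python
import random
from bisect import bisect_right


def range_of(key, split_points):
    """Index of the (hypothetical) range that owns `key`, given sorted split points."""
    return bisect_right(split_points, key)


def ranges_hit_per_batch(keys, split_points, batch_size):
    """For each consecutive batch of keys, count the distinct ranges it touches."""
    counts = []
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        counts.append(len({range_of(k, split_points) for k in batch}))
    return counts


# Assumed toy table: 10,000 integer row keys split into 10 equal ranges.
keys = list(range(10_000))
split_points = list(range(1_000, 10_000, 1_000))  # 9 split points -> 10 ranges

# In-order dump: keys appear in the file in sorted order.
in_order = ranges_hit_per_batch(keys, split_points, batch_size=500)

# Randomly ordered dump: same keys, shuffled uniformly (fixed seed for repeatability).
shuffled_keys = keys[:]
random.Random(42).shuffle(shuffled_keys)
shuffled = ranges_hit_per_batch(shuffled_keys, split_points, batch_size=500)

print("in-order dump, most ranges touched by any batch:", max(in_order))
print("shuffled dump, fewest ranges touched by any batch:", min(shuffled))
```

In this model every batch of the in-order dump lands in a single range, while batches of the shuffled dump are spread over many ranges, which is the property that would let range servers across the cluster receive updates simultaneously.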
