I assume it will also allow SELECT (list, of, cfs) FROM foo BACKUP INTO FILE "foo-backup.tgz". I'm also wondering whether the word BACKUP ought to be replaced by something like RANDOM or SHUFFLED, to decouple this change from backups (although I agree that fast restores are the main use case for this feature). Then "SELECT * FROM foo SHUFFLED LIMIT=N;" would return N samples drawn from across all ranges, and one could additionally choose to store the output of the SELECT in the tgz file for fast restores.
-Sanjit

On Mon, Jan 18, 2010 at 8:49 PM, Doug Judd <[email protected]> wrote:
> The current method of using SELECT to take table backups causes efficiency
> problems during restore. Because the cells are dumped in-order, when it
> comes time to restore from backup, the data ends up getting loaded into one
> range at a time. I propose adding a BACKUP option to SELECT that would
> cause the data to get dumped in random order (uniformly distributed across
> key space). This will cause restores to be parallelized, since ranges
> distributed across the cluster will receive updates simultaneously. Here's
> example syntax:
>
> SELECT * FROM foo BACKUP INTO FILE "foo-backup.gz";
>
> I also propose having the BACKUP option force timestamps to be dumped as
> well, since this will preserve the table state exactly. Thoughts?
>
> - Doug
>
> --
> You received this message because you are subscribed to the Google Groups
> "Hypertable Development" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to [email protected].
> For more options, visit this group at
> http://groups.google.com/group/hypertable-dev?hl=en.
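
To make the restore argument in the quoted message concrete, here is a rough Python sketch (not Hypertable code) that counts how many ranges receive updates in each batch of a restore, once for an in-order dump and once for a shuffled one. Everything in it is an assumption made for illustration: the row keys, the split points, the batch size, the helper names, and the in-memory shuffle, which only stands in for whatever mechanism a BACKUP or SHUFFLED option would actually use.

```python
import random
from bisect import bisect_right


def range_index(key, split_points):
    """Return the index of the range that owns `key`.

    `split_points` is a sorted list of hypothetical split keys; range i
    holds the keys that sort before split_points[i], and the last range
    holds everything from the final split point onwards.
    """
    return bisect_right(split_points, key)


def ranges_touched_per_batch(keys, split_points, batch_size):
    """For each consecutive batch of the dump, count the distinct ranges hit."""
    counts = []
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        counts.append(len({range_index(k, split_points) for k in batch}))
    return counts


# Hypothetical table: 10,000 row keys spread evenly over 10 ranges.
keys = [f"row{i:05d}" for i in range(10000)]
split_points = [f"row{i:05d}" for i in range(1000, 10000, 1000)]

# In-order dump, as a plain SELECT produces today.
in_order = sorted(keys)

# Shuffled dump; a fixed seed keeps the sketch reproducible.
shuffled = list(in_order)
random.Random(42).shuffle(shuffled)

ordered_counts = ranges_touched_per_batch(in_order, split_points, 500)
shuffled_counts = ranges_touched_per_batch(shuffled, split_points, 500)

print("in-order, max ranges per batch:", max(ordered_counts))
print("shuffled, min ranges per batch:", min(shuffled_counts))
```

With the in-order dump every batch lands in a single range, so only one range server is busy at a time. With the shuffled dump each batch is spread over many ranges, which is the parallelism the proposal is after.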
