The current method of using SELECT to take table backups causes efficiency problems during restore. Because the cells are dumped in key order, when it comes time to restore from a backup, the data gets loaded into one range at a time. I propose adding a BACKUP option to SELECT that would cause the cells to be dumped in random order (uniformly distributed across the key space). This would allow restores to be parallelized, since ranges distributed across the cluster would receive updates simultaneously. Here's example syntax:
SELECT * FROM foo BACKUP INTO FILE "foo-backup.gz";

I also propose having the BACKUP option force timestamps to be dumped as well, since this will preserve the table state exactly. Thoughts?

- Doug
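To make the parallelization argument concrete, here is a small standalone sketch (hypothetical, not Hypertable code; all names and constants are made up for illustration). It models a key space split evenly into ranges and counts how many distinct ranges each restore batch touches when the backup is replayed in key order versus in uniformly shuffled order:

```python
import random

NUM_KEYS = 10_000
NUM_RANGES = 10          # assume the key space is split into 10 ranges
BATCH_SIZE = 1_000       # keys loaded per restore batch

def range_of(key):
    # Each range owns a contiguous slice of the key space.
    return key * NUM_RANGES // NUM_KEYS

def ranges_touched_per_batch(keys):
    # For each batch of keys, count the distinct ranges it writes to.
    batches = [keys[i:i + BATCH_SIZE] for i in range(0, len(keys), BATCH_SIZE)]
    return [len({range_of(k) for k in batch}) for batch in batches]

sorted_keys = list(range(NUM_KEYS))
shuffled_keys = sorted_keys[:]
random.shuffle(shuffled_keys)

# In key order, each batch lands entirely in a single range, so only
# one range server does work at a time; shuffled, every batch spreads
# across essentially all ranges, so the whole cluster works in parallel.
print(ranges_touched_per_batch(sorted_keys))
print(ranges_touched_per_batch(shuffled_keys))
```

With these numbers, every in-order batch touches exactly one range, while every shuffled batch touches (with overwhelming probability) all ten, which is the effect the proposed BACKUP option is after.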
You received this message because you are subscribed to the Google Groups "Hypertable Development" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/hypertable-dev?hl=en.
