On Tue, Feb 15, 2011 at 1:48 PM, Jeffrey Hutzelman <[email protected]> wrote:
> --On Tuesday, February 15, 2011 01:07:50 AM -0500 Derrick Brashear
> <[email protected]> wrote:
>
>>> I'm not clear on how snapshotting interacts with GetFile/SendFile and
>>> active operations. I think in practice the mechanism you need is one
>>> that allows you to "freeze" the target's databases so that active
>>> transactions read from the frozen copy, while sendfile prepares a "new"
>>> copy; note that there can be no write transactions, since writes happen
>>> only on the sync site and these calls are made only by the sync site and
>>> never to itself. Having done a snapshot and sent some new files, it must
>>> be possible to either commit the new files or discard them; recovery
>>> should only do the commit operation if it is still sync site.
>>
>> the original intent of getfilediff was for some future use, not at this
>> time.
>>
>> sendfilediff is an optimization. just because you're recovering
>> doesn't mean the extant quorum can't continue taking writes. so i take
>> writes and when sendfile to you finishes, i stop taking writes, send
>> *only* a diff, and then commit and resume taking writes, not unlike a
>> volume release.
>
> First, properly, "recovering" is something that only the sync site does.
> Other sites don't "recover"; they simply do what they're told.

which, for the purpose of this discussion i refer to as "recovering";
the master site says "take this"

if the "this" you are taking is not recovering you, i'm not really
sure what to call it.

> Still, your
> point is taken -- the sync site can send the bulk of the database while
> still handling write transactions, and then do an incremental update of some
> sort at the end.

right, that's the goal here.

> However, I think you will discover you need an operation which throws away
> changes since the snapshot, because as soon as you allow not only for
> multiple files but also for the sync site to keep taking updates during
> sendfile, there is the possibility that the sync site will stop being sync
> site, and need to abort any sends it has in progress. Previously this was
> not an issue, because even though the SendFile took time to run, it was an
> atomic operation with respect to anything that might modify the database on
> either side.

at the end which is having its database updated, you mean? so e.g. an
RPC which at the end of sending files to a site, does either a commit
or abort of the data sent for the recovery process.

and assuming this is openafs-specific (which it seems like a
reasonable thing for it to be; it's certainly not client-facing for
any of these changes, and mixing ubik versions would be a mess) should
we move this discussion to openafs-devel?

--
Derrick
_______________________________________________
AFS3-standardization mailing list
[email protected]
http://lists.openafs.org/mailman/listinfo/afs3-standardization
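
For illustration only, the flow discussed in this thread could be sketched roughly as below. This is a toy in-memory model, not ubik code: the class and method names (`Replica`, `SyncSite`, `send_file`, `send_file_diff`, `commit`, `abort`) are hypothetical and do not correspond to existing RPCs. It assumes the receiving site keeps serving reads from its current copy while a staged copy is built, and that the sender commits only if it is still sync site. Deletions are not modeled.

```python
class Replica:
    """Hypothetical receiving site: reads use `live` until a staged copy is committed."""

    def __init__(self, data=None):
        self.live = dict(data or {})
        self.staged = None  # copy under construction, invisible to readers

    def send_file(self, contents):
        # Bulk transfer: stage a full copy without touching the live database.
        self.staged = dict(contents)

    def send_file_diff(self, changes):
        # Incremental update applied on top of the staged copy only.
        if self.staged is None:
            raise RuntimeError("no transfer in progress")
        self.staged.update(changes)

    def commit(self):
        if self.staged is None:
            raise RuntimeError("nothing to commit")
        self.live, self.staged = self.staged, None

    def abort(self):
        # Throw away everything received since the snapshot.
        self.staged = None


class SyncSite:
    """Hypothetical sending site that keeps taking writes during the bulk send."""

    def __init__(self, data):
        self.db = dict(data)
        self.is_sync_site = True
        self.accepting_writes = True
        self.pending = {}  # writes made since the bulk copy was taken

    def write(self, key, value):
        if not self.accepting_writes:
            raise RuntimeError("writes are paused")
        self.db[key] = value
        self.pending[key] = value

    def begin_update(self, replica):
        self.pending = {}
        replica.send_file(self.db)  # writes may continue after this point

    def finish_update(self, replica):
        self.accepting_writes = False  # pause writes for the final diff
        try:
            if not self.is_sync_site:
                replica.abort()  # lost sync-site status: discard the send
                return False
            replica.send_file_diff(self.pending)
            replica.commit()
            return True
        finally:
            self.pending = {}
            self.accepting_writes = self.is_sync_site
```

In this sketch a site that stops being sync site partway through calls `abort()` on the receiver, so the receiver's live database is left exactly as it was before the send began.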
