Are you saying that instead of adding a Sync/Flush method to the blobserver.Storage interface (which is what I believe Brad is proposing), you'd add an argument to the blobserver.ReceiveBlob method?
Or were you talking more specifically about how it would translate to higher-level tools like camput?

On 3 October 2016 at 20:40, Theodore Tso <[email protected]> wrote:
> Or maybe as an optional argument to the put operation that requests whether
> or not the blob write should be flushed, with a config option to
> define what happens in the default case when the caller doesn't specify one
> way or another?
>
> - Ted
>
> On Mon, Oct 3, 2016 at 12:26 PM, Brad Fitzpatrick <[email protected]> wrote:
>>
>> I've actually been thinking that sync should be an explicit part of the
>> protocol so higher levels can decide the atomicity that they require.
>>
>> Then we make everything async by default, but all blob storage
>> implementations must support a sync (or "Flush"?) operation. And then camput
>> and other tools should be sure to do a sync at the end before they return
>> success. Or maybe they even have a flag (defaulting to --sync=true?) to let
>> the caller control.
>>
>> Thoughts? And on naming?
>>
>>
>> On Sun, Oct 2, 2016 at 12:21 PM, Theodore Ts'o <[email protected]>
>> wrote:
>>>
>>> On Sunday, October 2, 2016 at 2:48:31 PM UTC-4, Theodore Ts'o wrote:
>>>>
>>>> (Especially since if you crash before the permanode is written, the
>>>> client is going to have to restart the whole backup from scratch anyway.)
>>>
>>> One thought --- as an automated heuristic, if the blobserver receives a
>>> stream of unsigned blobs, it doesn't need to fsync() them. After all, any
>>> objects which aren't referenced by a permanode are subject to GC treatment.
>>> So if you crash and then run a GC, any immutable, non-signed objects that
>>> were uploaded just before the crash would be GC'ed anyway. Hence, there's
>>> no point in treating them as precious objects that have to be fsync'ed
>>> before the client upload is acknowledged.
>>> So what could be done is: when the first signed object is received, the
>>> blob server could issue a sync(2) call, and then write all of the signed
>>> objects using fsync(2).
>>>
>>> If we did this, the next obvious optimization would be to tune the
>>> writeback interval for the disk in question to 2-3 minutes, instead of
>>> the usual 30 seconds. I noticed that objects were getting written as loose
>>> files, and then repacked into a pack file approximately every 2 minutes.
>>> All modern file systems do delayed allocation, which means that if we're
>>> not fsync'ing the loose files, they won't get flushed to disk until the
>>> writeback interval expires. So if they are written into the pack file and
>>> then deleted within that interval, the loose files will never get written
>>> to disk. This would double camlistore's effective write throughput to the
>>> disk, since we wouldn't be writing each byte being backed up twice ---
>>> once to the loose file, and a second time to the pack file.
>>>
>>> Cheers,
>>>
>>> - Ted
>>>
>>> P.S. I assume there are good reasons why we can't just stream the
>>> objects straight to the pack file, which is what git does? I noticed
>>> there were some comments about wanting to rearrange the objects so they
>>> would be in an optimal order for later access. Is that right?
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "Camlistore" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an
>>> email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
