Re: Reducing blobserver fsync() calls?

Theodore Ts'o Mon, 03 Oct 2016 21:10:33 -0700

On Tue, Oct 04, 2016 at 12:31:52AM +0200, Mathieu Lonjaret wrote:
> Are you saying that instead of adding a Sync/Flush method to the
> blobserver.Storage interface (which is what I believe Brad is
> proposing), you'd add an argument to the blobserver.ReceiveBlob
> method?


I'm suggesting that we do both things.  If the client has a way of
explicitly asking for the fsync() to be skipped while it is uploading
objects to the server, the client might also want to have a way of
asking the server, "please issue a sync(2) or otherwise make sure
everything I sent w/o the fsync being requested has been flushed to
disk".

This gives full control to the client, which I think is what Brad was
suggesting.  The server could also have a config option which
describes what should happen if the client doesn't say explicitly one
way or another.

> Or were you talking more specifically about how it would translate for
> higher level tools like camput?

I wasn't thinking about that at all.  My initial thoughts are that
camput should have command line options that would allow the user (or
shell script) to specify the behavior, again with perhaps some
defaults that could be controlled by a client config.

I have started thinking, however, that if we want Camlistore to have a
proper full backup functionality which is comparable with other
backup solutions (even simple backup solutions like "rsync" and "tar",
never mind more advanced tools like Areca and Bacula), we will need to
make a program separate from camput, and maybe it's better to start
sooner rather than later.

Maybe it would start as a fork of camput, but then I would want to add
different defaults, add exclude / exclude file support, maybe have the
ability to start camlistore servers on private ports to deal with
backups to externally mounted USB drives (which aren't always
connected to the laptop), etc.  I'd also want to have automated tags
so the user isn't having to manually specify a large number of useful
tags (backup, hostname:callcc, backup-config:homedir, etc.)

But that's a whole separate discussion; my main concern with this mail
thread was that the blobserver is writing every byte being backed up
twice, with fsync() calls to guarantee that we are wasting a huge amount of
disk write bandwidth (as well as flash write endurance on SSDs), and
in the backup use case it's really not adding any value.

Hence, it would be useful to update the protocol as Brad suggested,
since that's needed as a prerequisite towards non-embarrassing
performance numbers when compared to other backup solutions.  Whether
the initial implementation is done as a shell script on top of camput,
or as a clone and specialization of camput implemented in Go, is
really a separate issue.

Cheers,

                                        - Ted
