Thanks Paul! My only worry is that in delayed_commits the 1s guarantee goes to 2s, but I think in an error scenario, how much data you lose is less of an issue. Plus this is a feature on the way out. No action required, IMHO, except maybe document this.
Best
Jan
--

> On 22 May 2015, at 00:28, Paul Davis <[email protected]> wrote:
>
> Assuming that Erlang doesn't lie about the return status, then we'd
> throw an error on a broken fsync which would kill the
> couch_db_updater. In the case of delayed_commits we'd lose the last
> delayed commit interval of writes, just as with any other error.
>
> That's based on these two lines:
>
> https://github.com/apache/couchdb-couch/blob/master/src/couch_file.erl#L207
> https://github.com/apache/couchdb-couch/blob/master/src/couch_file.erl#L312
>
> Since we assert that the return value is ok.
>
> A quick skim of unix_efile.c shows that it's passing the return of
> fsync to check_error, which sets an errno if there was an error. So
> assuming that EINTR doesn't somehow crazily get mutated into an ok
> atom response, we're fine.
>
> https://github.com/erlang/otp/blob/master/erts/emulator/drivers/unix/unix_efile.c#L478-L482
> https://github.com/erlang/otp/blob/master/erts/emulator/drivers/unix/unix_efile.c#L94-L102
>
> On Thu, May 21, 2015 at 3:10 PM, Jan Lehnardt <[email protected]> wrote:
>>
>>> On 21 May 2015, at 21:40, Alexander Shorin <[email protected]> wrote:
>>>
>>> I think it's worth cross-posting to the erlang-questions@ ML. Would you?
>>
>> If we don’t get any further here, sure :) I just don’t want to make
>> a fool of myself, should this be a simple answer, and I feel more
>> comfortable in this particular crowd, with the CoC and all :)
>>
>> Best
>> Jan
>> --
>>
>>> --
>>> ,,,^..^,,,
>>>
>>>
>>> On Thu, May 21, 2015 at 10:23 PM, Jan Lehnardt <[email protected]> wrote:
>>>> Hi all,
>>>>
>>>> I stumbled across https://ldpreload.com/blog/signalfd-is-useless and
>>>> wondered how this squares against our use of fsync().
>>>>
>>>> A quick glance at
>>>> https://github.com/erlang/otp/blob/master/erts/emulator/drivers/unix/unix_efile.c
>>>> reveals that EINTR is handled in multiple places, but only in the
>>>> read/write/sendfile functions, not in fsync. I also tried to trace the
>>>> calling code of efile_fsync() (or efile_fdatasync()), but I got lost
>>>> pretty quickly in some dtrace macro indirections, so I don’t know if there
>>>> is any retry logic higher up.
>>>>
>>>> I’m not experienced enough here to make a call, but does that mean that we
>>>> have a possible scenario where EINTR interrupts an fsync call, after which
>>>> a crash (machine or CouchDB) leaves part of a database not fsynced? Or
>>>> would the failing fsync bubble up to the corresponding, say, PUT request
>>>> handler? How about with delayed_commits=true, is the possible data-loss
>>>> window then 2 seconds rather than the documented 1s?
>>>>
>>>> Can anyone shed any light on this?
>>>>
>>>> Best
>>>> Jan
>>>> --
>>>>
>>>>
>>>>
>>
>> --
>> Professional Support for Apache CouchDB:
>> http://www.neighbourhood.ie/couchdb-support/
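
For reference, the "retry logic" Jan was looking for higher up could look something like the C sketch below. It is an illustration only, not code from erts or CouchDB: `sync_retry_eintr`, `flaky_sync` and `interrupts_left` are made-up names, and whether retrying fsync after EINTR is the right policy on a given platform is an assumption, not something this thread establishes. Per Paul's reading, what actually happens today is that the error is reported and the `ok` match in couch_file.erl fails.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Signature shared by fsync(2) and the test stub below. */
typedef int (*sync_fn)(int fd);

/* Hypothetical helper (not from erts or CouchDB): call do_sync(fd) and
 * retry only while it fails with EINTR, allowing at most max_retries
 * retries. Returns 0 on success, -1 with errno left set otherwise. */
static int sync_retry_eintr(sync_fn do_sync, int fd, int max_retries)
{
    int retries = 0;

    assert(max_retries >= 0);
    for (;;) {
        if (do_sync(fd) == 0)
            return 0;
        /* Any error other than EINTR is reported to the caller at once. */
        if (errno != EINTR || retries >= max_retries)
            return -1;
        retries++;
    }
}

/* Stub that simulates a sync call interrupted by a signal a fixed
 * number of times before it succeeds. */
static int interrupts_left = 0;

static int flaky_sync(int fd)
{
    (void)fd;
    if (interrupts_left > 0) {
        interrupts_left--;
        errno = EINTR;
        return -1;
    }
    return 0;
}
```

Calling `sync_retry_eintr(fsync, fd, 3)` would retry a real fsync up to three times. The stub is only there so the EINTR path can be exercised without having to deliver a signal at exactly the right moment.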
