Committed to trunk and 1.0.x. On Aug 7, 2010, at 8:33 PM, Randall Leeds wrote:
> http://github.com/tilgovi/couchdb/tree/fixlostcommits > > Test and fix in separate commits at the end of that branch, based off > current trunk. > Would appreciate verification that the test is initially broken but > fixed by the patch. > > On Sat, Aug 7, 2010 at 17:16, Damien Katz <[email protected]> wrote: >> I reproduced this manually: >> >> Create document with id "x", ensure full commit (simply wait longer than 1 >> sec, say 2 secs). >> >> Attempt to create document "x" again, get conflict error. >> >> Wait at least 2 secs to ensure the delayed commit attempt happens. >> >> Now create document "y". >> >> Wait at least 2 secs because the delayed commit should happen >> >> Restart server. >> >> Document "y" is now missing. >> >> The last delayed commit isn't happening. From then on out, no docs updated >> with delayed commit with be available after a restart. >> >> -Damien >> >> On Aug 7, 2010, at 4:58 PM, Adam Kocoloski wrote: >> >>> I believe it's a single delayed conflict write attempt and no successes in >>> that same interval. >>> >>> On Aug 7, 2010, at 7:51 PM, Damien Katz wrote: >>> >>>> Looks like all that's necessary is a single delayed conflict write >>>> attempt, and all subsequent delayed commits won't be commit, the header >>>> never gets written. >>>> >>>> 1.0 loses data. This is ridiculously bad. >>>> >>>> We need a test to reproduce this and fix. >>>> >>>> -Damien >>>> >>>> On Aug 7, 2010, at 4:35 PM, Adam Kocoloski wrote: >>>> >>>>> Good sleuthing guys, and my apologies for letting this through. Randall, >>>>> your patch in COUCHDB-794 was actually fine, it was my reworking of it >>>>> that caused this serious bug. >>>>> >>>>> With respect to that gist 513282, I think it would be better to return >>>>> Db#db{waiting_delayed_commit=nil} when the headers match instead of >>>>> moving the cancel_timer() command as you did. After all, we did perform >>>>> the check here -- it was just that nothing needed to be committed. >>>>> >>>>> Adam >>>>> >>>>> On Aug 7, 2010, at 6:55 PM, Damien Katz wrote: >>>>> >>>>>> Yes, I think it requires 2 conflicting writes in row, because it needs >>>>>> to trigger the delayed_commit timer without actually having anything to >>>>>> commit, so the header never changes. >>>>>> >>>>>> Try to reproduce this and add a test case. >>>>>> >>>>>> -Damien >>>>>> >>>>>> >>>>>> On Aug 7, 2010, at 3:47 PM, Randall Leeds wrote: >>>>>> >>>>>>> I think you may be right, Damien. >>>>>>> If ever a write happens that only contains conflicts while waiting for >>>>>>> a delayed commit message we might still be cancelling the timer. Is >>>>>>> this what you're thinking? This would be the fix: >>>>>>> http://gist.github.com/513282 >>>>>>> >>>>>>> On Sat, Aug 7, 2010 at 15:42, Damien Katz <[email protected]> wrote: >>>>>>>> I think the problem might be that 2 conflicting write attempts in row >>>>>>>> can leave the #db.waiting_delayed_commit set but the timer has been >>>>>>>> cancelled. One that happens, the header may never be written, as it >>>>>>>> always thinks a delayed commit will fire soon. >>>>>>>> >>>>>>>> -Damien >>>>>>>> >>>>>>>> >>>>>>>> On Aug 7, 2010, at 12:08 PM, Randall Leeds wrote: >>>>>>>> >>>>>>>>> On Sat, Aug 7, 2010 at 11:56, Randall Leeds <[email protected]> >>>>>>>>> wrote: >>>>>>>>>> I agree completely! I immediately thought of this because I wrote >>>>>>>>>> that >>>>>>>>>> change. I spent a while staring at it last night but still can't >>>>>>>>>> imagine how it's a problem. >>>>>>>>>> >>>>>>>>>> On Sat, Aug 7, 2010 at 11:12, Damien Katz <[email protected]> wrote: >>>>>>>>>>> SVN commit r954043 looks suspicious. Digging further. >>>>>>>>>>> >>>>>>>>>>> -Damien >>>>>>>>>> >>>>>>>>> >>>>>>>>> I still want to stare at r954043, but it looks to me like there's at >>>>>>>>> least one situation where we do not commit data correctly during >>>>>>>>> compaction. This has to do with the way we now use the path to sync >>>>>>>>> outside the couch_file:process. Check this diff: >>>>>>>>> http://gist.github.com/513081 >>>>>>>> >>>>>>>> >>>>>> >>>>> >>>> >>> >> >>
