Re: [fossil-users] Unintentional fork/race condition

Matt Welland Sat, 12 Jan 2013 22:45:56 -0800

On Sat, Jan 12, 2013 at 5:31 PM, Richard Hipp <d...@sqlite.org> wrote:

>
>
> On Sat, Jan 12, 2013 at 6:41 PM, Matt Welland <estifo...@gmail.com> wrote:
>
>> This is with regards to the problem described here:
>>
>>
>> http://lists.fossil-scm.org:8080/pipermail/fossil-users/2008-February/000060.html
>>
>> We are seeing on the order of 3-5 of these a year in our heaviest hit
>> repos. While this may seem like no big deal the fact that it is so silent
>> is quite disruptive. The problem is that a developer working intently on a
>> problem may not notice for hours or even days that they are no longer
>> actually working on the main thread of development.
>>
>
> I contend that this points up issues with your development process, not
> with Fossil.  If your developers do not notice that a fork has occurred for
> days, then they are doing "heads down" programming.  They are not
> maintaining situational awareness.  (
> http://en.wikipedia.org/wiki/Situation_awareness)  They are fixating on
> their own (small) problems and missing the big picture.  This can lead to
> dissatisfied customers and/or quality problems.
>
> "Situational awareness" is usually studied in dynamic environments that
> are safety critical, such as aviation and surgery.  Loss of situational
> awareness is a leading cause of airplane crashes and medical errors.  Loss
> of situational awareness is sometimes referred to as "tunnel vision".  The
> person fixates on one tiny aspect of the problem and ignores the much larger
> crisis unfolding around him.  Eastern Airlines flight 401 (
> http://en.wikipedia.org/wiki/Eastern_Air_Lines_Flight_401) is a classic
> example of this: All three pilots of an L-1011 were "working intently" on
> a malfunctioning indicator light to the point that none of them noticed
> that the plane was losing altitude until seconds before it crashed in the
> Florida Everglades.
>
> Though usually studied in safety critical environments, situational
> awareness is applicable in any complex and dynamic problem environment,
> such as developing advanced software.  When you tell me that your
> developers are "intently working" on one small aspect of the problem, to
> the point of not noticing for several days that the trunk has forked - that
> tells me that there are likely other far more serious problems that they
> are also not noticing.  The fork is easily fixed with a merge.  The other
> more serious problems might not have such an easy fix.  And they might go
> undetected until your customer stumbles over them.
>
> So, I would use the observation that forks are going undetected as a
> symptom of more serious process problems in your organization, and
> encourage you to seek ways of getting your developers to spend more time
> "heads up" and looking at the big picture.
>
> (Did you notice - "situational awareness" is kind of a big issue with me.
> Fossil is my effort at building a DVCS that does a better job of promoting
> situational awareness than the other popular VCSes out there.  I'm
> constantly looking for ways to enhance Fossil to promote better situational
> awareness.  Suggestions are welcomed.)
>

Curious response. Did you intend to be insulting? I'm working with a bunch
of very smart people who are very reluctantly learning a new tool and a
different way of doing things and forks are very confusing when they happen
in a scenario where they seemingly should not. We are not operating in a
disconnected fashion here. Fossil falls somewhat short in the support of
people who like to get their job done at the command line (about 80% of
users on my team). Distilling from the fossil timeline command that there
is a fork and how to fix it is not easy. It is very tiresome to have to go
back to the ui to ensure that a fork hasn't magically appeared.

Anyhow, I misunderstood the exact nature of the cause. I assumed that the
race condition lay within the user's fossil process, between the db query
that checks for a leaf and the insertion of the new checkin data into the
db. That is of course incorrect. The actual cause is that the central
database is free to receive a commit via sync from another user right after
a sync that told the user's fossil process it was fine to commit. Something
like the following:

User1           User2        central
sync
leafcheck       sync
commit          leafcheck
sync            commit       receives delta from user1 just fine
                sync         receives delta from user2 and now a fork exists

As you point out below that is very difficult if not impossible to "fix".
What I think would alleviate this issue would be a check for fork creation
at the end of the final sync. If a fork is found notify the user so it can
be dealt with before confusion is created.
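
To make the interleaving above concrete, here is a minimal sketch in Python.
It is a toy model, not Fossil's data model or sync protocol: a repository is
just a dict from check-in id to parent id, there is a single branch, and the
names leaves() and sync() are made up for the illustration. The last step is
the check proposed above: count the open leaves after the final sync and
warn if there is more than one.

```python
# Toy model of the interleaving above; not Fossil's actual data model.
# A repository is a dict mapping check-in id -> parent id (None for the root).


def leaves(repo):
    """Return the check-ins that have no children, sorted by id."""
    parents = set(repo.values())
    return sorted(c for c in repo if c not in parents)


def sync(local, central):
    """Exchange check-ins in both directions, then report open leaves."""
    central.update(local)
    local.update(central)
    return leaves(local)


central = {"root": None}
user1, user2 = {}, {}

# Both users sync and pass the leaf check before either one commits.
assert sync(user1, central) == ["root"]
assert sync(user2, central) == ["root"]

user1["c1"] = "root"   # user1 commits locally on top of root
user2["c2"] = "root"   # user2 commits locally on top of the same parent

after_user1 = sync(user1, central)   # central accepts c1; still one leaf
after_user2 = sync(user2, central)   # central accepts c2; now two leaves

# Proposed post-sync check: more than one leaf means a fork was created.
if len(after_user2) > 1:
    print("WARNING: fork detected, leaves:", after_user2)
```

In this model user2 would see the warning right after the second sync, which
is the point where a merge is cheapest.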

Just to illustrate, I think monotone deals rather nicely with the natural
but annoying creation of forks. The user is informed as soon as the fork
occurs. Then the user only has to issue "mtn merge" and it does the easy
and obvious merge. With fossil I have to poll the ui to ensure I don't have
a fork. If I do have a fork, I have to browse the UI, figure out the hash
id of the fork, do the merge and finally do a commit, manually doing what
could probably be mostly automated.

Contrast with git where you know when you are causing a fork because you do
it all the time and dealing with forks is just day to day business. Fossil
will silently fork and only by starting up the ui and digging around will
it become apparent that there is a fork.

In the referred to message DRH writes:

DVCSs make it very easy to fork the tree.  To listen to
Linus Torvalds you would think this is a good thing.  But
experience suggests otherwise.

I still mostly agree with this, but requiring that every developer poll the
database for forks or risk confusion makes me think that the git approach
is perhaps not so crazy after all. If forks suck but only take seconds to
resolve, get people used to dealing with them; don't randomly create them
for no apparent reason. At least provide a heads up when they happen and
provide some help to resolve them.

In short, fossil does an imperfect job of hiding the pain of forking, and so
when it does occur it can be surprising and a hassle.

>
>
>
>>
>> We added the fork detection code to the fossil wrapper which helps (we
>> also see forks due to time lag on syncing between remote sites) but it is
>> still a rather annoying problem.
>>
>> My question is can this be solved by wrapping the code that determines
>> that we are at a leaf and the code that does the final commit with a "BEGIN
>> IMMEDIATE;" ... "END;"?
>>
>
> No.  Fossil already does that.  Has done so for years.
>

Ah, I saw the calls to db_begin_transaction in commit.c wrapping the check
for a fork, and db_begin_transaction does "BEGIN", not "BEGIN IMMEDIATE".
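
For reference, the difference between the two statements in SQLite itself
can be sketched with Python's sqlite3 module. This says nothing about what
Fossil's db_begin_transaction actually issues; the database file, the table
name and the helper function are made up for the illustration. A plain
BEGIN is deferred and takes no lock until the first read or write, while
BEGIN IMMEDIATE takes the write lock up front, so a second would-be writer
is refused straight away.

```python
import os
import sqlite3
import tempfile

# Throwaway database file; the table name is made up for this illustration.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None stops the sqlite3 module from issuing BEGIN itself.
# timeout=0 makes a blocked connection fail at once instead of retrying.
a = sqlite3.connect(path, isolation_level=None, timeout=0)
b = sqlite3.connect(path, isolation_level=None, timeout=0)
a.execute("CREATE TABLE checkin(id INTEGER PRIMARY KEY)")


def second_writer_can_begin(begin_stmt):
    """Open a transaction on `a`, then try to open the same kind on `b`."""
    a.execute(begin_stmt)
    try:
        b.execute(begin_stmt)
        b.execute("ROLLBACK")
        return True
    except sqlite3.OperationalError:
        return False          # SQLite reported that the database is locked
    finally:
        a.execute("ROLLBACK")


deferred_ok = second_writer_can_begin("BEGIN")
immediate_ok = second_writer_can_begin("BEGIN IMMEDIATE")
print("plain BEGIN lets a second transaction start:", deferred_ok)
print("BEGIN IMMEDIATE lets a second transaction start:", immediate_ok)
```

The timeout argument here is the SQLite busy timeout; a larger value makes
the second connection wait and retry instead of failing at once. Either way
the lock only covers one database file, so it cannot serialize commits made
against separate replicas.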

>
> The problem is that there are multiple disconnected replicas of the
> database.  You cannot (reasonably) lock them all.  See
> http://en.wikipedia.org/wiki/CAP_theorem - DVCSes like Fossil choose
> availability and partition tolerance at the expense of (immediate)
> consistency, since consistency is easily restored later by merging in the
> rare event where it doesn't work out straight away.
>
> To "fix" this problem (and again - I'm not yet convinced that it is a
> problem that needs fixing) I think what you would need to do is create some
> kind of "reservation" system for commits.  Suppose user A and user B both
> are about to commit.  Each local fossil sends a message to the central
> repository that tries to "reserve" the tip of trunk for some limited period
> of time, say 60 seconds.  (The reservation interval might need to be
> adjusted depending on network latencies).  The first reservation wins.  If
> user B is second, he gets back an error that says "User A is also trying to
> commit - wait 60 seconds and try again".  That gives user B an opportunity
> to go for coffee, then merge in user A's changes before he tries again
> later.  You can make a reasonable argument that this is a good approach to
> development.  In terms of the CAP theorem, you are selecting CP rather than
> the current AP.
>
> Of course, this fix doesn't really work if you try to do a commit while
> off network, since then you cannot make a reservation.  It also doesn't
> work if you don't have a single central repository that everybody commits
> to.  So it isn't for everybody.  But I can understand how some
> organizations would want this.
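
The reservation idea described above could be sketched roughly as follows.
This is purely hypothetical: Fossil has no such feature in this thread, and
the class name TipReservation, its methods and the injected clock are all
assumptions made for the illustration. It models only the server-side rule
that the first reservation wins for a limited interval.

```python
import time


class TipReservation:
    """Hypothetical server-side lease on the tip of a branch."""

    def __init__(self, lease_seconds=60.0, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock
        self.holder = None
        self.expires_at = 0.0

    def reserve(self, user):
        """Return (granted, message). The first live reservation wins."""
        now = self.clock()
        if self.holder not in (None, user) and now < self.expires_at:
            wait = self.expires_at - now
            msg = (f"{self.holder} is also trying to commit - "
                   f"wait {wait:.0f} seconds and try again")
            return False, msg
        self.holder = user
        self.expires_at = now + self.lease_seconds
        return True, "reserved"

    def release(self, user):
        """Drop the lease once the holder's commit has been pushed."""
        if self.holder == user:
            self.holder = None


# Fake clock so the example runs instantly and deterministically.
fake_now = [0.0]
tip = TipReservation(lease_seconds=60.0, clock=lambda: fake_now[0])

first = tip.reserve("userA")      # granted
second = tip.reserve("userB")     # refused while userA's lease is live
fake_now[0] = 61.0                # lease expires without a release
third = tip.reserve("userB")      # granted
print(first, second, third, sep="\n")
```

The expiry is what keeps a crashed or disconnected client from blocking
everyone else, at the cost of allowing a fork if a commit takes longer than
the lease.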
>
>
>
>
>
>>
>> This increases the risk of leaving the db in a locked state so having a
>> fossil command to unlock a database would be nice.
>>
>> In this same vein it would be very nice to be able to control the sqlite3
>> timeout. I'm fairly sure that a longer timeout would give us much better
>> behaviour in our usage model.
>>
>> I have some scripting that can generate the forks and I'm willing to take
>> a stab at making this change but wanted to hear from the list if this
>> solution was worth trying.
>>
>> _______________________________________________
>> fossil-users mailing list
>> fossil-users@lists.fossil-scm.org
>> http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
>>
>>
>
>
> --
> D. Richard Hipp
> d...@sqlite.org
>
>
