Re: Collective wisdom about repos on NFS accessed by concurrent clients (== corruption!?)

Kenneth Ölwing Fri, 05 Apr 2013 05:36:35 -0700

Hi

Basically, I'm at a place where I'm considering giving up getting this to work reliably. In general, my setup works really well, except for the itty-bitty detail that when I put pressure on things I tend to get into various kinds of trouble with the central repo being corrupted.


Can anyone authoritatively state anything either way?

TIA,

ken1

On 2013-03-28 11:22, Kenneth Ölwing wrote:

Hi,
I'm hoping to hear some wisdom on the subject so I can decide if I'm chasing a pipe dream or if it should be expected to work and I just need to work out the kinks.
Finding things like this makes it sound possible:
  http://permalink.gmane.org/gmane.comp.version-control.git/122670
but then again, in threads like this:
  http://kerneltrap.org/mailarchive/git/2010/11/14/44799
opinions are that it's just not reliable to trust.
My experience so far is that I eventually get repo corruption when I stress it with concurrent read/write access from multiple hosts (beyond any sort of likely levels, but still). Maybe I'm doing something wrong, missing a configuration setting somewhere, putting an unfair stress on the system, there's a bona fide bug - or, given the inherent difficulty in achieving perfect coherency between machines on what's visible on the mount, it's just impossible (?) to truly get it working under all situations.
My eventual use case is to set up a bunch of (gitolite) hosts that are all effectively identical and all see the same storage, so that clients can be directed to any of them to get served. For the purpose of this query I assume it can be boiled down to be the same as n users working on their own workstations and accessing a central repo on an NFS share they all have mounted, using regular file paths. Correct?
I have a number of load-generating test scripts (that from humble beginnings have grown to beasts), but basically, they fork a number of code pieces that proceed to hammer the repo with concurrent reading and writing. If necessary, the scripts can be started on different hosts, too. It's all about the central repo, so clients will retry in various modes whenever they get an error, up to re-cloning and starting over. All in all, they can yield quite a load.
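Purely as an illustration of the kind of load described above (this is not one of the scripts mentioned in the message), a minimal sketch could look like the following. The names `git`, `worker`, `hammer` and the `load-test` identity are hypothetical. It uses threads that each launch separate git processes as a stand-in for forked workers, and it only implements the simplest retry mode (repeat the push), not re-cloning.

```python
import os
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical identity so commits work without any global git config.
IDENT = ["-c", "user.name=load-test",
         "-c", "user.email=load-test@example.invalid"]


def git(*args, cwd=None):
    """Run one git command; raises CalledProcessError on a non-zero exit."""
    done = subprocess.run(["git", *args], cwd=cwd, check=True,
                          capture_output=True, text=True)
    return done.stdout


def worker(central, worker_id, rounds, max_retries=5):
    """Clone `central`, then commit and push `rounds` times to a private branch."""
    target = f"HEAD:refs/heads/worker-{worker_id}"
    pushed = 0
    with tempfile.TemporaryDirectory() as tmp:
        clone = os.path.join(tmp, "clone")
        git("clone", central, clone)
        for n in range(rounds):
            git(*IDENT, "commit", "--allow-empty",
                "-m", f"worker {worker_id} round {n}", cwd=clone)
            for _attempt in range(max_retries):
                try:
                    git("push", "origin", target, cwd=clone)
                    pushed += 1
                    break
                except subprocess.CalledProcessError:
                    continue  # simplest retry mode: try the same push again
    return pushed


def hammer(central, workers=4, rounds=10):
    """Run several workers concurrently; return the push count per worker."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(worker, central, i, rounds)
                   for i in range(workers)]
        return [f.result() for f in futures]
```

Calling `hammer("/path/to/central.git")` with a path on the NFS mount, from several hosts at once, would approximate the scenario in question. Each worker pushes to its own branch, so any rejected push or missing ref afterwards would point at the storage layer rather than at ordinary non-fast-forward conflicts.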
On a local filesystem everything seems to be holding up fine, which is expected. In the scenario with concurrency on top of shared NFS storage, however, the scripts eventually fail with various problems (when the timing finally finds a hole, I guess) - it's possible to clone but checkouts fail, there are errors about refs pointing wrong, and errors where the original repo is actually corrupted and can't be cloned from. Depending on test strategy, I've also had everything going fine (ops look fine in my logs), but afterwards the repo has lost a branch or two.
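One way to catch the "lost a branch or two" case after a run is to record which refs the clients believe they pushed and compare that against what the central repo actually holds, alongside a `git fsck`. The sketch below assumes such an expected-refs record exists; the helper names `list_refs`, `fsck_ok` and `compare_refs` are hypothetical.

```python
import subprocess


def list_refs(repo):
    """Return {refname: object id} for every ref in the repository at `repo`."""
    out = subprocess.run(
        ["git", "for-each-ref", "--format=%(objectname) %(refname)"],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout
    refs = {}
    for line in out.splitlines():
        sha, name = line.split(" ", 1)
        refs[name] = sha
    return refs


def fsck_ok(repo):
    """True if `git fsck --full` exits with status 0 for the repository."""
    done = subprocess.run(["git", "fsck", "--full"], cwd=repo,
                          capture_output=True, text=True)
    return done.returncode == 0


def compare_refs(expected, actual):
    """Return (lost, moved): refs that vanished, and refs at an unexpected id."""
    lost = sorted(set(expected) - set(actual))
    moved = sorted(name for name in expected
                   if name in actual and expected[name] != actual[name])
    return lost, moved
```

After a test run, `compare_refs(expected, list_refs(central))` returning two empty lists together with `fsck_ok(central)` being true would suggest the repo survived that run. A non-empty `lost` list corresponds to the missing-branch symptom described above.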
I've experimented with various NFS settings (e.g. turning off the attribute cache), but haven't reached a conclusion. I do suspect that it just is a fact of life with a remote filesystem to have coherency problems with high concurrency like this, but I'd be happily proven wrong - I'm not an expert in either NFS or git.
So, any opinions either way would be valuable, e.g. 'it should work' or 'it can never work 100%' is fine, as well as any suggestions. Obviously, the concurrency needed to make it probable to hit this seems so unlikely that maybe I just shouldn't worry...
I'd be happy to share scripts and whatever if someone would like to try it out themselves.
Thanks for your time,

ken1

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html



