Re: EDEADLK in svn_repos_fs_begin_txn_for_commit2

Blair Zajac Wed, 26 Jan 2011 10:31:05 -0800

On 01/26/2011 02:56 AM, Stefan Sperling wrote:

On Tue, Jan 25, 2011 at 11:00:31PM -0800, Blair Zajac wrote:

We're seeing deadlocks in our Subversion multithreaded server when
two distinct processes try to fcntl(F_SETLKW) on two fsfs
repositories' db/txn-current-lock, when the processes begin
transactions in reverse order.


Process 1                               Process 2
---------                               ---------
thread 1: begin txn in repos A          thread 1: begin txn in repos B
thread 2: begin txn in repos B          thread 2: begin txn in repos A

During normal working hours, we get over 1 commit per second,
peaking at 6, which is why we're seeing this.

Questions:

Should a fix for this be put in libsvn_fs_fs or should I do this
in my application?  I'm thinking putting this in libsvn_fs_fs is
an appropriate fix, even though other people probably won't see it.

I'm also thinking the code should retry a maximum of 100 times,
starting with a 1 ms sleep and doubling the sleep after each failure
up to a maximum of 128 ms, much like WIN32_RETRY_LOOP does.

Comments?


If possible it should be fixed in libsvn_fs_fs.

I'm now thinking of putting the retry in svn_io_file_lock2() instead of
handling the deadlock in libsvn_fs_fs itself. It shouldn't hurt any
other use cases and would be general, defensive code.
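
For illustration only, here is a minimal sketch of such a retry loop
around a raw fcntl(F_SETLKW) call. It is not a patch: as far as I know
svn_io_file_lock2() goes through APR rather than calling fcntl()
directly, and lock_file_with_retry(), next_sleep_ms() and the constants
are hypothetical names made up for this example. It assumes the EDEADLK
is transient, i.e. that the thread holding the other lock will release
it shortly.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for nanosleep() under strict -std= modes */
#endif

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical tuning values, taken from the numbers proposed above. */
#define MAX_RETRIES      100
#define INITIAL_SLEEP_MS 1
#define MAX_SLEEP_MS     128

/* Double the sleep interval, capped at MAX_SLEEP_MS. */
static int next_sleep_ms(int ms)
{
  return (ms * 2 > MAX_SLEEP_MS) ? MAX_SLEEP_MS : ms * 2;
}

/* Sleep for MS milliseconds. */
static void sleep_ms(int ms)
{
  struct timespec ts;
  ts.tv_sec = ms / 1000;
  ts.tv_nsec = (long)(ms % 1000) * 1000000L;
  nanosleep(&ts, NULL);
}

/* Hypothetical helper: take an exclusive, blocking, whole-file lock on
 * FD.  If the kernel reports EDEADLK, back off and try again, up to
 * MAX_RETRIES retries.  Returns 0 on success, otherwise the errno value
 * of the last failed fcntl() call. */
static int lock_file_with_retry(int fd)
{
  struct flock fl;
  int delay_ms = INITIAL_SLEEP_MS;
  int attempt;

  assert(fd >= 0);

  fl.l_type = F_WRLCK;     /* exclusive lock */
  fl.l_whence = SEEK_SET;
  fl.l_start = 0;
  fl.l_len = 0;            /* 0 means "to end of file", i.e. whole file */

  for (attempt = 0; ; attempt++)
    {
      if (fcntl(fd, F_SETLKW, &fl) == 0)
        return 0;

      /* Only EDEADLK is retried; any other error is returned as-is. */
      if (errno != EDEADLK || attempt >= MAX_RETRIES)
        return errno;

      sleep_ms(delay_ms);
      delay_ms = next_sleep_ms(delay_ms);
    }
}
```

A caller would open db/txn-current-lock, pass the descriptor to
lock_file_with_retry(), and treat a non-zero return as a hard failure.
With these values the sleeps run 1, 2, 4, ... 128 ms and then stay at
128 ms, so the worst case waits roughly 12 seconds before giving up.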


Blair
