On 01/26/2011 02:56 AM, Stefan Sperling wrote:
On Tue, Jan 25, 2011 at 11:00:31PM -0800, Blair Zajac wrote:
We're seeing deadlocks in our Subversion multithreaded server when
two distinct processes try to fcntl(F_SETLKW) on two fsfs
repositories' db/txn-current-lock, when the processes begin
transactions in reverse order.

Process 1                               Process 2
---------                               ---------
thread 1: begin txn in repos A          thread 1: begin txn in repos B
thread 2: begin txn in repos B          thread 2: begin txn in repos A

During normal working hours, we get over 1 commit per second,
peaking at 6, which is why we're seeing this.

Questions:

Should a fix for this be put in libsvn_fs_fs or should I do this
in my application?  I'm thinking putting this in libsvn_fs_fs is
an appropriate fix, even though other people probably won't see it.

I'm also thinking the code should retry a maximum of 100 times,
starting with a 1 ms sleep and doubling the sleep after each failure
up to a maximum of 128 ms, similar to WIN32_RETRY_LOOP.

Comments?

If possible it should be fixed in libsvn_fs_fs.

I'm now thinking of putting the retry in svn_io_file_lock2() instead
of handling the deadlock in libsvn_fs_fs itself. It shouldn't hurt any
other use cases, and it would be general, defensive code.
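
Roughly what I have in mind, as a standalone sketch rather than the
actual svn_io_file_lock2() change. It assumes the kernel reports the
deadlock by failing fcntl(F_SETLKW) with EDEADLK. The names
lock_with_retry(), next_sleep_ms() and sleep_ms() are made up for
illustration, and the constants are just the numbers proposed above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

#define MAX_RETRIES     100  /* retries after the first attempt (assumed) */
#define FIRST_SLEEP_MS    1  /* initial back-off (assumed) */
#define MAX_SLEEP_MS    128  /* back-off ceiling (assumed) */

/* Double the sleep interval, capped at MAX_SLEEP_MS. */
static int next_sleep_ms(int ms)
{
    ms *= 2;
    return ms > MAX_SLEEP_MS ? MAX_SLEEP_MS : ms;
}

/* Sleep for the given number of milliseconds. */
static void sleep_ms(int ms)
{
    struct timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (long)(ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/* Hypothetical helper: take a blocking exclusive lock on the whole
 * file behind fd.  If the kernel refuses with EDEADLK, back off and
 * try again, up to MAX_RETRIES times.  Returns 0 on success, or the
 * errno value of the last failed fcntl() call. */
static int lock_with_retry(int fd)
{
    struct flock fl = {0};
    int delay_ms = FIRST_SLEEP_MS;
    int attempt;

    fl.l_type = F_WRLCK;     /* exclusive lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            /* 0 means "to end of file" */

    for (attempt = 0; ; attempt++) {
        if (fcntl(fd, F_SETLKW, &fl) == 0)
            return 0;
        /* Only a reported deadlock is worth retrying. */
        if (errno != EDEADLK || attempt >= MAX_RETRIES)
            return errno;
        sleep_ms(delay_ms);
        delay_ms = next_sleep_ms(delay_ms);
    }
}
```

With these numbers the worst case is about 12 seconds of sleeping
(1+2+...+64 ms for the first seven retries, then 93 retries at 128 ms)
before the error is passed back to the caller.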

Blair
