When attempting to sustain high commit rates into an fsfs repository on NFS from two or more Linux boxes, one of the processes can get stuck in fcntl() for over 30 seconds:

open("repo/db/write-lock", O_RDWR)      = 4
fcntl(4, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}
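For reproducing the hang without svn, the blocking call in the strace can be approximated in Python. This is a minimal sketch, assuming Python 3.3+ on a POSIX system. The function name time_write_lock is hypothetical, and the lock path would be the repository's db/write-lock. In CPython, fcntl.lockf(fd, LOCK_EX) issues fcntl(F_SETLKW) with F_WRLCK, whence SEEK_SET, start 0 and len 0, which matches the traced call.

```python
import fcntl
import os
import time


def time_write_lock(lock_path):
    """Hypothetical helper: take a blocking whole-file write lock on
    lock_path and return how many seconds were spent waiting for it."""
    fd = os.open(lock_path, os.O_RDWR)
    try:
        start = time.monotonic()
        # Blocks in fcntl(F_SETLKW, F_WRLCK, start=0, len=0) until granted.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        waited = time.monotonic() - start
        fcntl.lockf(fd, fcntl.LOCK_UN)  # release before closing
        return waited
    finally:
        os.close(fd)
```

Running this in a loop from two NFS clients against the same file and printing the return value should show whether the long waits come from the fcntl() lock itself rather than from svn.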

A sample Python script that easily reproduces the issue is at the end of this message.

I've observed this with NetApp and Isilon NFS servers. I don't observe it with a single Linux box running multiple svn processes; I'm guessing the kernel decides locally which process gets the lock and only then goes to the NFS server?

Even after the svn processes on one box are stopped, if svn processes on the other box are blocked in fcntl(), it can take over 30 seconds before a waiting process acquires the lock and continues.

I have a patch that replaces the implementation of fsfs.c:get_lock_on_filesystem() with apr_file_open(APR_WRITE | APR_CREATE | APR_EXCL | APR_DELONCLOSE). If the open fails, it sleeps 1 ms and retries, doubling the sleep each time up to a maximum of 25 ms, until it succeeds. I haven't seen it hang to the degree that fcntl() does.
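The retry scheme can be sketched in Python for experimentation. This is a hypothetical analogue, not the actual C patch. os.open with O_CREAT | O_EXCL stands in for APR_CREATE | APR_EXCL, and unlinking the file on release stands in for APR_DELONCLOSE. The name exclusive_create_lock and its parameters are illustrative; the 1 ms start and 25 ms cap come from the description above.

```python
import errno
import os
import time
from contextlib import contextmanager


@contextmanager
def exclusive_create_lock(lock_path, initial_delay=0.001, max_delay=0.025):
    """Hypothetical Python analogue of the patched get_lock_on_filesystem():
    hold the lock by exclusively creating lock_path, retrying with a capped
    exponential sleep while another process holds it."""
    delay = initial_delay
    while True:
        try:
            # Fails with EEXIST if another process already created the file.
            fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
            break
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise  # a real error, not lock contention
            time.sleep(delay)
            delay = min(delay * 2, max_delay)  # 1, 2, 4, 8, 16, 25, 25 ms...
    try:
        yield
    finally:
        # Like APR_DELONCLOSE: close, then remove the file to release the lock.
        os.close(fd)
        os.unlink(lock_path)
```

One design difference from fcntl() is worth keeping in mind: a process that dies while holding an fcntl() lock has it released for it, whereas a process that dies while holding an O_EXCL lock file leaves the file behind, and that stale lock must be cleaned up by hand.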

Using APR_EXCL requires an NFSv3 server and, on Linux, a 2.6.6 or later kernel (see http://nfs.sourceforge.net/#faq_d10).

Questions:

1) Is there a better algorithm than exponential sleeps when you need to repeatedly try to acquire a resource explicitly? I've noticed that when a slow and a fast Linux client both try to do as many commits per second as they can, the fast one locks out the slow one, so the slow one ends up sleeping a lot more. I'm thinking of using a random sleep between 1 and 100 ms, where 100 ms is roughly the average commit time.

2) Would this be an appropriate patch to put into 1.7 if the locking strategy were configurable in the fsfs.conf file?

3) I understand that some of the large svn hosting providers host on NetApp; don't they see this issue? Or do they use a master/standby deployment, so it doesn't matter?
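For comparing the two sleep schedules from question 1, they can be written as interchangeable delay generators. This is a sketch under assumptions: the generator names and the acquire() retry loop are hypothetical. The capped exponential schedule mirrors the patch, and the uniform 1 to 100 ms schedule is the proposed alternative.

```python
import random
import time


def capped_exponential_delays(initial=0.001, cap=0.025):
    """Yield 1 ms, 2 ms, 4 ms, ... doubling until the cap (the patch's schedule)."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * 2, cap)


def uniform_random_delays(low=0.001, high=0.100, rng=random):
    """Yield independent uniform sleeps in [low, high] (the proposed schedule).

    The 100 ms upper bound stands in for an average commit time.
    """
    while True:
        yield rng.uniform(low, high)


def acquire(try_acquire, delays, sleep=time.sleep):
    """Hypothetical retry loop: call try_acquire() until it returns True,
    sleeping for the next value from `delays` after each failure."""
    for delay in delays:
        if try_acquire():
            return
        sleep(delay)
```

Either generator can be passed as `delays`. Neither schedule by itself guarantees fairness: a waiter only notices that the lock is free when its sleep ends, so a client that re-requests the lock immediately after releasing it can still win repeatedly.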

Thanks,
Blair




#!/usr/bin/python -u

import os
import time

import svn.core
import svn.fs
import svn.repos

repo_name = 'repo'

# Reuse the repository if it already exists; otherwise create an fsfs one.
if os.path.isdir(repo_name):
    repo = svn.repos.open(repo_name)
else:
    repo = svn.repos.create(repo_name,
                            None,
                            None,
                            None,
                            {svn.fs.CONFIG_FS_TYPE: svn.fs.TYPE_FSFS})

fs = svn.repos.fs(repo)
youngest = svn.fs.youngest_rev(fs)

# Timestamp-based directory name, so each concurrently running copy of
# this script modifies its own path and the commits do not conflict.
path = '/%s' % (1000*1000*time.time())

# Commit as fast as possible, printing how long each commit takes.
while True:
    t1 = time.time()
    txn = svn.repos.fs_begin_txn_for_commit2(repo, youngest, {})
    fs_root = svn.fs.txn_root(txn)
    if svn.core.svn_node_none == svn.fs.check_path(fs_root, path):
        svn.fs.make_dir(fs_root, path)
    svn.fs.change_node_prop(fs_root,
                            path,
                            'foo',
                            '%s' % (1000*1000*time.time()))
    youngest = svn.repos.fs_commit_txn(repo, txn)
    t2 = time.time()
    print(t2 - t1)

