On Thu, Sep 13, 2007 at 04:46:37PM -0500, Bob Peterson wrote:
> The following is a patch for bugzilla bug 276631.
> 
> The problem boiled down to a race between the gdlm_init_threads()
> function initializing thread1 and its setting of blist = 1.
> Essentially, "if (current == ls->thread1)" was checked by the thread
> before the thread creator set ls->thread1.
> 
> Since thread1 is the only thread who is allowed to work on the
> blocking queue, and since neither thread thought it was thread1, no one
> was working on the queue.  So everything just sat.
> 
> This patch reuses the ls->async_lock spin_lock to fix the race,
> and it fixes the problem.  I've done more than 2000 iterations of the
> loop that was recreating the failure and it seems to work.
> 
> Dave Teigland brought up the question of whether we should do this
> another way.  For example, by checking for the task name "lock_dlm1"
> instead.  I'm open to opinions.
> --
> Signed-off-by: Bob Peterson <[EMAIL PROTECTED]> 
> --
> diff -pur a/fs/gfs2/locking/dlm/thread.c b/fs/gfs2/locking/dlm/thread.c
> --- a/fs/gfs2/locking/dlm/thread.c    2007-09-13 15:51:08.000000000 -0500
> +++ b/fs/gfs2/locking/dlm/thread.c    2007-09-13 15:21:07.000000000 -0500
> @@ -279,8 +279,10 @@ static int gdlm_thread(void *data)
>       /* Only thread1 is allowed to do blocking callbacks since gfs
>          may wait for a completion callback within a blocking cb. */
>  
> +     spin_lock(&ls->async_lock);
>       if (current == ls->thread1)
>               blist = 1;
> +     spin_unlock(&ls->async_lock);
>  
>       while (!kthread_should_stop()) {
>               set_current_state(TASK_INTERRUPTIBLE);
> @@ -338,6 +340,7 @@ int gdlm_init_threads(struct gdlm_ls *ls
>       struct task_struct *p;
>       int error;
>  
> +     spin_lock(&ls->async_lock);
>       p = kthread_run(gdlm_thread, ls, "lock_dlm1");
>       error = IS_ERR(p);
>       if (error) {
> @@ -354,6 +357,7 @@ int gdlm_init_threads(struct gdlm_ls *ls
>               return error;
>       }
>       ls->thread2 = p;
> +     spin_unlock(&ls->async_lock);
>  
>       return 0;
>  }
>

If there's an error we'll return holding the spin lock.
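
A minimal userspace sketch of one way to avoid that: route every return
through a single unlock. This is not the gfs2 code. struct ls_sketch,
worker(), init_threads() and the simulate_failure flag are all hypothetical
names, and a pthread mutex stands in for ls->async_lock. A pthread mutex may
sleep, so the sketch does not carry over the kernel rule that a spinlock must
not be held across a call that can sleep.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct gdlm_ls. */
struct ls_sketch {
    pthread_mutex_t lock;   /* plays the role of ls->async_lock */
    pthread_t thread1;      /* valid only once thread1_set is true */
    bool thread1_set;
    bool blist;             /* set by the worker that finds it is thread1 */
};

void ls_sketch_init(struct ls_sketch *ls)
{
    pthread_mutex_init(&ls->lock, NULL);
    ls->thread1_set = false;
    ls->blist = false;
}

void *worker(void *data)
{
    struct ls_sketch *ls = data;

    /* Waits here until the creator has stored thread1 and unlocked. */
    pthread_mutex_lock(&ls->lock);
    if (ls->thread1_set && pthread_equal(pthread_self(), ls->thread1))
        ls->blist = true;
    pthread_mutex_unlock(&ls->lock);
    return NULL;
}

/* Returns 0 on success, or a nonzero error code.
 * simulate_failure is an assumption of this sketch, used only to
 * exercise the error path without a real creation failure. */
int init_threads(struct ls_sketch *ls, bool simulate_failure)
{
    pthread_t p;
    int error;

    assert(ls != NULL);

    pthread_mutex_lock(&ls->lock);
    error = simulate_failure ? EAGAIN
                             : pthread_create(&p, NULL, worker, ls);
    if (error)
        goto out;           /* the error path still reaches the unlock */

    ls->thread1 = p;
    ls->thread1_set = true;
out:
    pthread_mutex_unlock(&ls->lock);
    return error;
}
```

With this shape, a caller that sees a nonzero return can still take the lock
afterwards, and on success the worker only compares itself against thread1
after the creator has stored it.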

Josef 
