On Wed, 2008-05-21 at 21:05 +0200, Jakob Goldbach wrote:
> > 
> > kernel is 2.6.23.17 with patchless lustre 1.6.4.3, 
> 
> I'm running patchless 1.6.4.3 as well, against a vanilla 2.6.18 kernel.
> Or at least that is what I thought. The OpenVz patch effectively makes the
> kernel a 2.6.18++ kernel, because they backport features from newer
> kernels into their maintained 2.6.18-based kernel.
> 
> So the lockup in __d_lookup may simply be related to newer patchless clients.
> 
> I got a debug patch from the OpenVz community which indicates dcache
> chain corruption in a Lustre code path.
> 
> The patch snippet is:
> 
> --- ./fs/dcache.c.ddebug2     2008-05-21 14:52:15.000000000 +0400
> +++ ./fs/dcache.c     2008-05-21 15:10:06.000000000 +0400
> @@ -1350,6 +1350,18 @@ static void __d_rehash(struct dentry * e
>  {
>  
>       entry->d_flags &= ~DCACHE_UNHASHED;
> +     if (!spin_is_locked(&dcache_lock)) {
> +             printk(KERN_ERR "Dcache lock is not taken on add\n");
> +             dump_stack();
> +     } else if (list->first != NULL &&
> +                     list->first->pprev != &list->first) {
> +             printk(KERN_ERR "Dcache chain corruption:\n");
> +             printk(KERN_ERR "Chain %p --next-> %p\n",
> +                             list, list->first);
> +             printk(KERN_ERR "First %p <-pprev- %p\n",
> +                             list->first, list->first->pprev);
> +             dump_stack();
> +     }
>       hlist_add_head_rcu(&entry->d_hash, list);
>  }
> 
> and the stack trace:
> 
> [ 6447.548789] Dcache chain corruption:
> [ 6447.549529] Chain ffff8100010de880 --next-> ffff8100b4ce00b0
> [ 6447.550699] First ffff8100b4ce00b0 <-pprev- 0000000000200200
> [ 6447.551711] 
> [ 6447.551713] Call Trace:
> [ 6447.552809]  [<ffffffff8020ae20>] show_trace+0xae/0x360
> [ 6447.553784]  [<ffffffff8020b0e7>] dump_stack+0x15/0x17
> [ 6447.554727]  [<ffffffff8029ee94>] __d_rehash+0x75/0x97
> [ 6447.555797]  [<ffffffff8029ef2a>] d_rehash+0x74/0x91
> [ 6447.556846]  [<ffffffff883b4c6a>] :lustre:ll_revalidate_it+0xa1a/0xd90
> [ 6447.557966]  [<ffffffff883b529c>] :lustre:ll_revalidate_nd+0x2bc/0x360
> [ 6447.559082]  [<ffffffff80295741>] do_lookup+0x15d/0x193
> [ 6447.560142]  [<ffffffff80296fd9>] __link_path_walk+0x409/0x10ac
> [snip]
> 

This patch and backtrace show that the dcache chain was damaged _before_
entering Lustre: Lustre starts adding the entry at a new position in the
dentry cache and finds an already damaged entry in the list.


-- 
Alex Lyashkov <[EMAIL PROTECTED]>
Lustre Group, Sun Microsystems

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
