Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the 
following link:
https://bugzilla.lustre.org/show_bug.cgi?id=11562

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED


The problem is reproducible locally in a SL10.1 environment.
The following hot fix (with debugging code) fixes the problem.
However, it has not been tried on buffalo.

diff -u -p -r1.27.40.3.14.3.26.4 symlink.c
--- lustre/llite/symlink.c      16 Nov 2006 19:21:39 -0000      1.27.40.3.14.3.26.4
+++ lustre/llite/symlink.c      18 Jan 2007 10:41:09 -0000
@@ -130,7 +130,7 @@ static LL_FOLLOW_LINK_RETURN_TYPE ll_fol
         struct inode *inode = dentry->d_inode;
         struct ll_inode_info *lli = ll_i2info(inode);
         struct lookup_intent *it = ll_nd2it(nd);
-        struct ptlrpc_request *request;
+        struct ptlrpc_request *request = NULL;
         int rc;
         char *symname;
         ENTRY;
@@ -145,6 +145,17 @@ static LL_FOLLOW_LINK_RETURN_TYPE ll_fol
         }

         CDEBUG(D_VFSTRACE, "VFS Op\n");
+
+        {
+                int dummy = 1;
+                printk("SP 0x%p lelel = %d\n", &dummy, current->link_count);
+        }
+
+        if (current->link_count > 5) {
+                path_release(nd);
+                GOTO(out, rc = -ELOOP);
+        }
+

A simpler test was found that causes a stack overflow in a Lustre client
without the hot fix:

       $ ln -sf foo foo
       $ ls foo
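
For reference, the same two steps can be driven from C. This is only a
sketch under stated assumptions: it runs in a temporary directory under
/tmp (an assumption, not a Lustre mount), and the helper name
self_symlink_errno is made up for illustration. On a filesystem where the
kernel's symlink nesting limit is enforced, resolving the link is expected
to fail with ELOOP instead of recursing until the stack is exhausted.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: creates foo -> foo in a fresh temporary directory,
 * tries to resolve it with stat(), and returns the resulting errno
 * (0 if stat() unexpectedly succeeds, -1 if the setup itself failed).
 */
static int self_symlink_errno(void)
{
        char dir[] = "/tmp/symloopXXXXXX";   /* assumed scratch location */
        char path[64];
        struct stat st;
        int err;

        if (mkdtemp(dir) == NULL)
                return -1;

        snprintf(path, sizeof(path), "%s/foo", dir);

        /* Equivalent of "ln -sf foo foo": the link target is the link itself. */
        if (symlink("foo", path) != 0) {
                rmdir(dir);
                return -1;
        }

        /* Equivalent of "ls foo": stat() follows the link while resolving it. */
        err = (stat(path, &st) == 0) ? 0 : errno;

        /* Clean up the link and the directory again. */
        unlink(path);
        rmdir(dir);
        return err;
}
```

Calling self_symlink_errno() from a small main() and comparing the result
with ELOOP is enough to see the behaviour on a local filesystem. Pointing
the directory at a Lustre mount without the hot fix would presumably
reproduce the overflow described here, so that should only be done on a
test node.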

The debugging code above gives information about how stack usage
grows with each ll_follow_link call:

SP 0xffff810001defb14 lelel = 1
SP 0xffff810001def8c4 lelel = 2
SP 0xffff810001def674 lelel = 3
SP 0xffff810001def424 lelel = 4
SP 0xffff810001def1d4 lelel = 5
SP 0xffff810001deef84 lelel = 6

This means the following functions together consume 592 bytes of stack per
level of symlink recursion (the difference between consecutive SP values):

link_path_walk
__link_path_walk
do_follow_link
__do_follow_link
__vfs_follow_link
(link_path_walk again)  

In particular, link_path_walk takes 200 bytes and __link_path_walk takes 280
(from the checkstack.pl report).
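
The 592-byte figure can be rechecked from the six SP values printed by the
debugging code. The following is a minimal sketch, not part of the patch:
the names sp_samples, stack_delta and uniform_frame_cost are made up for
illustration, and it assumes the stack grows downward, as the decreasing
addresses in the output suggest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Addresses of the local "dummy" variable, copied from the debug output. */
static const uint64_t sp_samples[] = {
        0xffff810001defb14ULL,  /* level 1 */
        0xffff810001def8c4ULL,  /* level 2 */
        0xffff810001def674ULL,  /* level 3 */
        0xffff810001def424ULL,  /* level 4 */
        0xffff810001def1d4ULL,  /* level 5 */
        0xffff810001deef84ULL,  /* level 6 */
};
#define NUM_SAMPLES (sizeof(sp_samples) / sizeof(sp_samples[0]))

/* Bytes of stack consumed between an outer call and a nested (inner) call. */
static uint64_t stack_delta(uint64_t outer, uint64_t inner)
{
        assert(outer >= inner);  /* downward-growing stack assumed */
        return outer - inner;
}

/*
 * Returns the per-level stack cost if every pair of consecutive samples
 * shows the same difference, or 0 if the differences vary or there are
 * fewer than two samples.
 */
static uint64_t uniform_frame_cost(const uint64_t *sp, size_t n)
{
        uint64_t first;
        size_t i;

        if (n < 2)
                return 0;

        first = stack_delta(sp[0], sp[1]);
        for (i = 2; i < n; i++) {
                if (stack_delta(sp[i - 1], sp[i]) != first)
                        return 0;
        }
        return first;
}
```

Every consecutive pair differs by 0x250, which is 592 in decimal, so
uniform_frame_cost(sp_samples, NUM_SAMPLES) yields 592 and the five nested
levels between the first and the last sample account for 5 * 592 = 2960
bytes in total.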

For comparison, the same functions in the same kernel on the i386
architecture take:
__link_path_walk:                    280
link_path_walk:                      200

The stack usage report for the newer kernel 2.6.20-rc5 on x86_64:
link_path_walk [vmlinux]:            152
__link_path_walk [vmlinux]:          104

The same report for 2.6.9-rhel4 on x86_64:
0xffffffff80184f6f link_path_walk:                      192
0xffffffff80183f80 __link_path_walk:                    136

_______________________________________________
Lustre-devel mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-devel
