Vlad,

This is exactly what I'd been doing, though I was stumped about what to look for in the fbt trace of lstat64(). However, once I ran sdiff on that trace and one from a system that doesn't have the problem, it pointed toward an SNFS issue: the first difference between the syscall traces was in what appeared to be an SNFS name cache search function. That has been helpful in dealing with Quantum (who owns SNFS), and based on other testing as well, we now think it may be a known SNFS bug that on rare occasions causes directory corruption. So at the very least, you reassured me that I was on the right track in analyzing the problem.
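For anyone else who hits something similar, here is a minimal sketch of the comparison step. It assumes both traces were saved to files (for example with dtrace's -o option) and that each line starts with the CPU number, as in flowindent output. first_divergence is a hypothetical helper name, not something DTrace provides. Instead of a side-by-side sdiff, it just reports the first line where the two traces differ once the CPU column is removed.

```shell
#!/bin/sh
# Hypothetical helper (not part of DTrace): report where two saved
# flowindent traces first diverge. Assumes each trace line starts with
# the CPU number, which is stripped because it varies between runs.
#
# Usage: first_divergence good_trace.txt bad_trace.txt
# Exit status: 0 if the traces match, 1 at the first mismatch.
# Pass two different files; awk shares one stream per file name.
first_divergence() {
    awk -v good="$1" -v bad="$2" '
        # Drop the leading CPU column but keep the flowindent indentation.
        function norm(s) { sub(/^ *[0-9]+ /, "", s); return s }
        BEGIN {
            n = 0
            while (1) {
                rg = (getline lg < good)
                rb = (getline lb < bad)
                if (rg <= 0 && rb <= 0) exit 0    # both ended: no difference
                n++
                if (rg <= 0) lg = "<EOF>"         # good trace is shorter
                if (rb <= 0) lb = "<EOF>"         # bad trace is shorter
                if (norm(lg) != norm(lb)) {
                    printf "line %d\n  good: %s\n  bad:  %s\n", n, norm(lg), norm(lb)
                    exit 1
                }
            }
        }'
}
```

For example, first_divergence good.txt bad.txt. Keep in mind that return values which are kernel addresses will legitimately differ between machines, so the first mismatch reported can be noise; in that case sdiff is still the better tool for eyeballing the rest of the trace.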
Thanks,
Justin

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Vladimir Marek
Sent: Wednesday, June 25, 2008 12:16 AM
To: [email protected]
Subject: Re: [dtrace-discuss] Shared filesystem weirdness

[...]
> On system X, doing a "/bin/ls /foo/bar/duh" (or "cd /foo/bar/duh;
> /bin/ls") lists file f but any command that tries to access the file
> (e.g. via a stat, open, etc. system call) fails saying file not found.
[...]
> While I'm not looking for a script from anyone, I would appreciate any
> advice on how to figure out why the kernel (snfs/cvfs driver?) is not
> able to access the file from system X. Remember that I can use system
> Y as a control system.

Well, without any filesystem knowledge, I would start by looking at one specific syscall, say 'stat'. Then I would find out exactly which syscall it is:

$ dtrace -n 'syscall::stat*:entry{trace(copyinstr(arg0))}'

Then I would record every function executed while the syscall is being processed, along with the function return values. (Let's say it's stat64, and you are doing 'ls -l /foo/bar/duh/f'.)

$ dtrace -x flowindent -n 'syscall::stat64:entry/copyinstr(arg0)=="/foo/bar/duh/f"/{self->go=1} fbt:::entry/self->go/{} fbt:::return/self->go/{trace(arg1)} syscall::stat64:return/self->go/{self->go=0; exit(0)}'

Then compare one run where the syscall succeeded with one where it failed.

--
Vlad
_______________________________________________
dtrace-discuss mailing list
[email protected]
