Vlad,

This is exactly what I'd been doing, though I was stumped about what to look 
for in the fbt trace of lstat64(). However, once I ran sdiff on that trace and 
one from a system that doesn't have the problem, it pointed to an SNFS issue. 
Specifically, the first difference between the syscall traces was in what 
seemed to be an SNFS name cache search function. This has been helpful in 
dealing with Quantum (who owns SNFS), and we're now thinking, based on other 
testing as well, that it may be a known SNFS bug that on rare occasions causes 
directory corruption. So at the very least, you reassured me that I was on the 
right track in analyzing the problem.

Thanks,
Justin

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Vladimir Marek
Sent: Wednesday, June 25, 2008 12:16 AM
To: [email protected]
Subject: Re: [dtrace-discuss] Shared filesystem weirdness

[...]
> On system X, doing a "/bin/ls /foo/bar/duh" (or "cd /foo/bar/duh;
> /bin/ls") lists file f but any command that tries to access the file 
> (e.g. via a stat, open, etc. system call) fails saying file not found.

[...]
> While I'm not looking for a script from anyone, I would appreciate any 
> advice on how to figure out why the kernel (snfs/cvfs driver?) is not 
> able to access the file from system X. Remember that I can use system 
> Y as a control system.

Well, without any filesystem knowledge, I would start by looking at one 
specific syscall, say 'stat'. First I would find out exactly which syscall it 
is:

$ dtrace -n 'syscall::stat*:entry{trace(copyinstr(arg0))}'

Then I would record every function executed while the syscall is processed, 
along with the function return values (let's say it's stat64, and you are 
doing 'ls -l /foo/bar/duh/f'):

$ dtrace -x flowindent -n '
syscall::stat64:entry/copyinstr(arg0)=="/foo/bar/duh/f"/{self->go=1}
fbt:::entry/self->go/{}
fbt:::return/self->go/{trace(arg1)}
syscall::stat64:return/self->go/{self->go=0; exit(0)}'

Then compare a run where the syscall succeeded with one where it failed.
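
One way to do that comparison, as a rough sketch only: save each run to a file 
with dtrace's -o option, strip the leading CPU column (the thread may land on 
a different CPU in each run), and diff what is left. The helper names 
normalize_trace and compare_traces and the file names good.trace and bad.trace 
are made up for illustration; they are not part of DTrace. Return values that 
happen to be pointers will usually differ between runs and show up as noise.

```shell
# Hypothetical helper: drop the "CPU FUNCTION" header and the leading CPU
# number from flowindent output, keeping the call-depth indentation.
normalize_trace() {
    sed -e '/^CPU /d' -e 's/^ *[0-9][0-9]* //' "$1"
}

# Hypothetical helper: diff two normalized traces.
# Exit status is diff's: 0 if identical, 1 if they differ.
compare_traces() {
    ct_good=$(mktemp) && ct_bad=$(mktemp) || return 2
    normalize_trace "$1" > "$ct_good"
    normalize_trace "$2" > "$ct_bad"
    diff "$ct_good" "$ct_bad"
    ct_status=$?
    rm -f "$ct_good" "$ct_bad"
    return $ct_status
}
```

For example, 'compare_traces good.trace bad.trace | head' would show where the 
two runs first diverge.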

-- 
        Vlad
_______________________________________________
dtrace-discuss mailing list
[email protected]
