Re: NFS Locking Issue

User Freebsd Wed, 05 Jul 2006 07:52:56 -0700

On Wed, 5 Jul 2006, Robert Watson wrote:

On Wed, 5 Jul 2006, Danny Braniss wrote:
In my case our main servers are NetApp, and the problems are more related to am-utils running into some race condition (need more time to debug this :-). The other problem is throughput: FreeBSD is slower than Linux, and while NFS over TCP is faster than NFS over UDP on FreeBSD, on Linux the two perform the same. So it seems some tuning is needed.
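One rough way to put numbers on the FreeBSD/Linux and TCP/UDP comparison is to time the same sequential write on each mount. The sketch below uses a hypothetical helper (`write_throughput` is not part of any existing tool). It writes a fixed amount of data and calls fsync() so that client-side caching does not hide the cost of getting the data to the server, then reports MiB/s. It measures only one sequential-write pattern, so the results are a rough comparison, not a benchmark.

```python
import os
import sys
import tempfile
import time


def write_throughput(path: str, total_bytes: int = 16 * 1024 * 1024,
                     block: int = 64 * 1024) -> float:
    """Write total_bytes to path in block-sized chunks, fsync, return MiB/s.

    Hypothetical helper for comparing mounts; not from any existing tool.
    """
    buf = b"\0" * block
    start = time.monotonic()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        while written < total_bytes:
            # os.write may write less than requested, so count what it returns.
            written += os.write(fd, buf[: min(block, total_bytes - written)])
        os.fsync(fd)  # force the data out of the client cache
    finally:
        os.close(fd)
    elapsed = max(time.monotonic() - start, 1e-9)  # avoid division by zero
    return (total_bytes / (1024 * 1024)) / elapsed


if __name__ == "__main__":
    # Pass a path on the mount under test; defaults to a local temp file.
    target = sys.argv[1] if len(sys.argv) > 1 else os.path.join(
        tempfile.gettempdir(), "throughput.dat")
    print(f"{write_throughput(target):.1f} MiB/s")
```

Running it against the same export mounted once over TCP and once over UDP, on both a FreeBSD and a Linux client, gives four figures that can be compared directly.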
Our main problem now is samba/rpc.lockd: we are stuck with a server running FreeBSD 5.4 which crashes, and we can't upgrade to 6.1 because lockd doesn't work.
So, if someone is willing to look into the lockd issue, we would like to help.
The most significant problem in working with rpc.lockd is creating easy-to-reproduce test cases, not least because they can potentially involve multiple clients. If you can help produce simple test cases that reproduce the bugs you're seeing, that would be invaluable.
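As an illustration of the kind of small, single-client test case being asked for, here is a minimal sketch in Python (the function names are made up for illustration). It assumes the target file lives on an NFS mount whose fcntl() byte-range locks are normally handed to rpc.lockd. One process holds an exclusive lock, a forked second process must be refused, and after the release a second process must be granted the lock. POSIX record locks are per-process, so the competing attempt has to come from a different process.

```python
import errno
import fcntl
import os
import sys
import tempfile


def child_can_lock(path: str) -> bool:
    """Fork a child that tries a non-blocking exclusive fcntl() lock on path.

    Returns True if the child was granted the lock, False if it was refused.
    """
    pid = os.fork()
    if pid == 0:
        code = 2  # 2 = unexpected error in the child
        try:
            fd = os.open(path, os.O_RDWR)
            try:
                fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                code = 0  # lock granted
            except OSError as e:
                if e.errno in (errno.EACCES, errno.EAGAIN):
                    code = 1  # refused: another process holds the lock
        finally:
            os._exit(code)  # never fall back into the parent's code path
    _, status = os.waitpid(pid, 0)
    code = os.WEXITSTATUS(status)
    if code == 2:
        raise RuntimeError("child hit an unexpected error")
    return code == 0


def single_client_contention(path: str) -> tuple[bool, bool]:
    """Return (refused_while_held, granted_after_release) for path."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)
        refused_while_held = not child_can_lock(path)
        fcntl.lockf(fd, fcntl.LOCK_UN)
        granted_after_release = child_can_lock(path)
    finally:
        os.close(fd)
    return refused_while_held, granted_after_release


if __name__ == "__main__":
    # Pass a file on the NFS mount under test; defaults to a local temp file.
    target = sys.argv[1] if len(sys.argv) > 1 else os.path.join(
        tempfile.gettempdir(), "locktest.dat")
    print(single_client_contention(target))
```

A working lock manager should yield `(True, True)`. Any other result, or a hang, is a concrete single-client reproduction that can be reported along with a tcpdump of the NLM traffic.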
I'm aware of two general classes of problems with rpc.lockd. First, architectural issues, some derived from architectural problems in the NLM protocol: for example, the assumption that there is a clean mapping from process lock owners to locks, which falls down because locks are properties of file descriptors, and file descriptors can be inherited. Second, implementation bugs and misfeatures, such as the kernel not knowing how to cancel lock requests, and therefore being unable to implement interruptible waits on locks in the distributed case.
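The descriptor-inheritance problem can be seen locally with flock(), whose locks belong to the open file description rather than to a process. In the sketch below (function names hypothetical), a parent takes an exclusive flock() and forks, and the child releases the parent's lock through the descriptor it inherited. A protocol that identifies each lock holder by a single process has no clean way to describe this sharing. The sketch runs against any local file and does not exercise rpc.lockd; whether flock() on an NFS mount is sent to the server at all depends on the OS and version.

```python
import errno
import fcntl
import os
import sys
import tempfile


def can_flock_nb(path: str) -> bool:
    """Open a fresh descriptor and try a non-blocking exclusive flock()."""
    fd = os.open(path, os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.flock(fd, fcntl.LOCK_UN)
        return True
    except OSError as e:
        if e.errno not in (errno.EWOULDBLOCK, errno.EAGAIN):
            raise
        return False
    finally:
        os.close(fd)


def inherited_unlock_demo(path: str) -> tuple[bool, bool]:
    """Return (blocked_before_fork, free_after_child_unlocked)."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
        # A separate open() is a separate file description, so it conflicts.
        blocked_before = not can_flock_nb(path)
        pid = os.fork()
        if pid == 0:
            try:
                # Child: release the lock via the descriptor it inherited.
                fcntl.flock(fd, fcntl.LOCK_UN)
            finally:
                os._exit(0)
        os.waitpid(pid, 0)
        # The parent never unlocked, yet the lock is gone.
        free_after = can_flock_nb(path)
    finally:
        os.close(fd)
    return blocked_before, free_after


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else os.path.join(
        tempfile.gettempdir(), "flocktest.dat")
    print(inherited_unlock_demo(target))
```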
Reducing complex failure modes to easily reproduced test cases is also tricky, though. It requires careful analysis, often with ktrace and tcpdump/ethereal to work out what's going on, and not a little luck to reduce a large trace to a simple test scenario. The first step is to figure out what specific workload, if any, results in a problem. For example, can you trigger it using work on just one client against a server, without client<->client interactions? This makes tracking and reproduction a lot easier, as multi-client test cases are really tricky! Once you've established whether it can be reproduced with a single client, you have to track down the behavior that triggers it. Normally this is done by narrowing down the specific program or sequence of events that causes the bug to trigger, removing things one at a time to see what makes the problem disappear. This is made more difficult because lock managers are sensitive to timing, so removing a high-load item from the list, even if it isn't the source of the problem, might cause the bug to trigger less frequently.
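The remove-one-item-at-a-time narrowing described above can be automated. Here is a minimal sketch in Python, assuming you can write a hypothetical `reproduces(workload)` function that runs a chosen subset of jobs against the server and reports whether the failure appeared. Because of the timing sensitivity, each candidate subset is retried several times and counts as failing if any run reproduces the bug.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def reduce_workload(
    items: Sequence[T],
    reproduces: Callable[[List[T]], bool],
    trials: int = 5,
) -> List[T]:
    """Greedily drop workload items one at a time while the bug still shows up.

    Each candidate is run up to `trials` times and counts as failing if any
    run reproduces the bug, to compensate for timing-dependent failures.
    """
    def still_fails(candidate: List[T]) -> bool:
        return any(reproduces(candidate) for _ in range(trials))

    current = list(items)
    if not still_fails(current):
        raise ValueError("the full workload does not reproduce the bug")

    shrunk = True
    while shrunk:
        shrunk = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if still_fails(candidate):
                current = candidate  # item i was not needed; drop it
                shrunk = True
                break  # restart the scan over the smaller workload
    return current
```

A candidate that survives `trials` runs without failing may simply have been lucky with timing, which is exactly the caveat above. Raising `trials` buys confidence at the cost of wall-clock time.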

I'm not sure if this is an option for anyone, either developer or user, but in the past, on particularly tricky bugs where I seemed to be the only one able to reproduce the problem, I've given a 'trusted developer' access to the machine itself, to minimize the time lag that email creates ... but also to let the developer at a machine that has the load required to easily reproduce it ...

Not sure if there is anyone out there, on either side of the proverbial fence, who feels comfortable doing this, but figured I'd throw the idea out ...

I believe, in Francisco's case, they are willing to pay someone to fix the NFS issues they are having, which, I'd assume, means easy access to the problematic server(s) to do proper testing in a "real life scenario" ...


----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]                              MSN . [EMAIL PROTECTED]
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
_______________________________________________
[email protected] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"