On Wed, 5 Jul 2006, Kostik Belousov wrote:
Also, both lockd processes now put identification information in the
proctitle (srv and kern). SIGUSR1 should be sent to the srv process.
Hmm, after looking at the dump there and some code reading, I have noted the
following:
1. The NLM lock request contains the field caller_name. It is filled in by
(let's call it) the kernel rpc.lockd from the result of gethostname(3).
2. This caller_name is used by the server rpc.lockd to send a host
monitoring request to rpc.statd (see send_granted). The request is made by
clnt_call, which is a blocking RPC call.
3. rpc.statd calls getaddrinfo on caller_name to determine the address of
the host to monitor.
If the getaddrinfo in step 3 waits for the resolver, then your client machine
will end up with the locking process in the "lockd" state.
Could people experiencing the rpc.lockd mystery at least report whether the
_server_ machine successfully resolves the hostname of the clients, as
reported by hostname? And, if so, to which IP protocol family?
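For anyone who wants to run that check on the server, here is a minimal
sketch in Python. It is not part of rpc.statd; the function name
resolve_families and its injectable resolver argument are made up for
illustration. It asks getaddrinfo for the client's name, reports which
address families come back, and measures how long the lookup took, since a
slow answer is as interesting here as a failed one.

```python
import socket
import time


def resolve_families(hostname, resolver=socket.getaddrinfo):
    """Resolve hostname; return (sorted address-family names, seconds taken).

    An empty list means the lookup failed. `resolver` is a hypothetical
    hook so the function can be exercised without touching the network.
    """
    start = time.monotonic()
    try:
        infos = resolver(hostname, None)
    except socket.gaierror:
        infos = []
    elapsed = time.monotonic() - start
    # Each getaddrinfo entry is (family, type, proto, canonname, sockaddr).
    families = sorted(
        {getattr(info[0], "name", str(info[0])) for info in infos}
    )
    return families, elapsed
```

On the server, call resolve_families() with the exact string the client
prints when you run hostname on it. An empty list, or a lookup that takes
many seconds, would be consistent with the stall described in step 3.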
It's not impossible. It would be interesting to see if ps axl reports that
rpc.lockd is in the kqread state, which would suggest it was blocked in the
resolver. We probably ought to review rpc.statd and make sure it's generally
sensible. I've noticed that its notification process on start is a bit poorly
structured in terms of how it notifies hosts of its state change -- if one
host is down, it may take a very long time to notify other hosts.
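To illustrate the kind of structure that would avoid this, here is a rough
Python sketch. It is neither what rpc.statd currently does nor a proposed
patch: each host is notified from its own worker, so a host that times out
does not hold up the others. notify_all and the notify callback are
hypothetical names; a real implementation would send the SM_NOTIFY RPC at
that point.

```python
from concurrent.futures import ThreadPoolExecutor


def notify_all(hosts, notify, max_workers=8):
    """Call notify(host) for every host concurrently.

    Returns {host: True} for hosts that were notified and {host: False}
    for hosts where notify raised OSError (which includes TimeoutError).
    """
    hosts = list(hosts)

    def attempt(host):
        try:
            notify(host)
            return True
        except OSError:
            return False

    # pool.map yields results in the same order as the input hosts.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(hosts, pool.map(attempt, hosts)))
```

With this shape the total time is bounded by the slowest host rather than
the sum over all hosts, at the cost of a few extra threads.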
There are a number of other dubious things about the NLM protocol design (at
least, from my reading last night). I've also noticed that our rpc.lockd is
particularly sensitive, on the client side, to locks being released by a
different process than the process that acquired the lock, which is triggered
excessively by our new libpidfile in RELENG_6.
Robert N M Watson
Computer Laboratory
University of Cambridge
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable