Control: retitle -1 nfs4_reclaim_open_state: Lock reclaim failed

For what it's worth, I've seen the same symptoms in jessie (kernel 3.16.36
at the time) and Ubuntu trusty (3.13.0-93). In my experience, NFSv4 in
stretch is no worse than in jessie.

Rate-limiting those "Lock reclaim failed!" messages would be useful. I've
had to add a filter for them in rsyslog to prevent a DoS on my central
logging infrastructure. I don't see them often, but when a client gets stuck
it can emit this message *many* times.

There is definitely more than one trigger for these. I'm under the impression
that network partitioning events generate short bursts of such messages, but
this is usually benign and does not require a reboot for recovery. Not sure
what causes the more severe incidents (I haven't had one in a while, and
my NFS environment is intentionally v4-only).

My troubleshooting checklist for the next incident includes
  echo 1 > /sys/kernel/debug/tracing/events/nfs4_lock_reclaim/enable
but I haven't had a chance to put this into practice yet.

Reply via email to