I've been doing a bunch of recovery testing with DLM and discovered some
issues. This collection of 6 patches addresses those issues. Some of them
are of my own making, introduced by the recent patches that made DLM
print socket connection errors and recover from them.

The first patch changes the TCP "connect to sock" function to more closely
match the SCTP version of the function. The idea is to not create a kernel
socket until we have a valid node address, as the SCTP path already does.
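
As a rough illustration of that ordering, here is a small userspace model.
It is not the lowcomms.c code: every name in it (model_conn,
model_lookup_addr, model_create_sock, model_connect) is made up, and the
"known node IDs" rule is an assumption chosen only to make the failure
case easy to trigger.

```c
/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_conn {
    int nodeid;
    int sock;                 /* -1 means "no socket allocated" */
};

static int sockets_created;   /* counts allocations so a leak would show */

/* Stand-in for the address lookup: only node IDs 1..3 are "known". */
static int model_lookup_addr(int nodeid)
{
    return (nodeid >= 1 && nodeid <= 3) ? 0 : -1;
}

/* Stand-in for creating a kernel socket. */
static int model_create_sock(void)
{
    return 100 + sockets_created++;
}

/* Resolve the address first; allocate the socket only if that succeeded. */
static int model_connect(struct model_conn *con)
{
    if (model_lookup_addr(con->nodeid) < 0)
        return -1;            /* nothing was allocated, nothing to release */

    con->sock = model_create_sock();
    return 0;
}
```

With this ordering, a failed lookup leaves con->sock untouched, so the
failure path has no socket to release.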

The second patch removes a "return" from lowcomms_error_report that should
not be there. The return was causing it to bypass the call to the original
error report code, thus skipping an important part of the reporting.
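
The chaining pattern can be sketched in userspace roughly as follows. This
is only a model under assumptions: model_sk, model_orig_report and
model_error_report are invented names, and the counters exist just to make
the behavior observable. The point is that the "no connection" case jumps
to the common exit instead of returning early.

```c
#include <stddef.h>

/* Hypothetical userspace model of a chained error-report callback. */
struct model_sk {
    void *user_data;                          /* owning connection, may be NULL */
    void (*orig_report)(struct model_sk *sk); /* previously installed handler */
};

static int orig_calls;   /* how often the original handler ran */
static int logged;       /* how often connection-specific logging ran */

static void model_orig_report(struct model_sk *sk)
{
    (void)sk;
    orig_calls++;
}

static void model_error_report(struct model_sk *sk)
{
    if (sk->user_data == NULL)
        goto out;        /* skip our own logging, but do not return here */

    logged++;            /* stand-in for printing the socket error */
out:
    if (sk->orig_report)
        sk->orig_report(sk);   /* always chain to the original handler */
}
```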

The third patch changes function tcp_create_listen_sock so that its
error paths are consistent. Only one of them was setting con->sock to
NULL, but both should.
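
One common way to keep such paths consistent is to route every failure
through a single label. The sketch below is a userspace model, not the
patched function: listen_conn, model_create_listen and the fail_step
parameter are hypothetical, and which two setup steps can fail is left
abstract on purpose.

```c
#include <stddef.h>

/* Hypothetical userspace model; names are not from fs/dlm/lowcomms.c. */
struct model_sock { int id; };
struct listen_conn { struct model_sock *sock; };

static struct model_sock the_sock = { 1 };
static int released;     /* counts how often the socket was released */

/*
 * fail_step selects which (abstract) setup step fails:
 * 0 = none, 1 = first step, 2 = second step.
 */
static int model_create_listen(struct listen_conn *con, int fail_step)
{
    int result;

    con->sock = &the_sock;

    if (fail_step == 1) {
        result = -1;
        goto fail;
    }
    if (fail_step == 2) {
        result = -2;
        goto fail;
    }
    return 0;

fail:
    /* Both failures end up here, so both release and clear the pointer. */
    released++;
    con->sock = NULL;
    return result;
}
```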

The fourth patch eliminates a useless goto, to make the code clearer.

The fifth patch adds a layer of locking by way of the sk->sk_callback_lock,
which is needed to prevent multiple send/receive sockets from
interfering with one another when reporting socket errors and during
subsequent recovery. This makes it similar to how sunrpc handles errors.

The sixth and final patch makes the socket error code save and restore
all four callbacks, whereas before we were only saving and restoring the
error report callback.
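
The last two patches together amount to "save and restore the full set of
callbacks while holding the callback lock". A userspace sketch of that
idea follows. Assumptions: the four callbacks are taken to be the usual
struct sock ones (sk_data_ready, sk_state_change, sk_write_space,
sk_error_report), an atomic_flag spinlock stands in for
sk->sk_callback_lock, and all model_* / default_* names are invented for
this example.

```c
#include <stdatomic.h>

/* Hypothetical userspace model; not the kernel's struct sock. */
struct model_sk;
typedef void (*model_cb)(struct model_sk *sk);

struct model_sk {
    atomic_flag callback_lock;   /* stand-in for sk->sk_callback_lock */
    model_cb data_ready;
    model_cb state_change;
    model_cb write_space;
    model_cb error_report;
};

struct saved_callbacks {
    model_cb data_ready, state_change, write_space, error_report;
};

/* Four distinct placeholder handlers plus one replacement handler. */
static void default_data_ready(struct model_sk *sk)   { (void)sk; }
static void default_state_change(struct model_sk *sk) { (void)sk; }
static void default_write_space(struct model_sk *sk)  { (void)sk; }
static void default_error_report(struct model_sk *sk) { (void)sk; }
static void custom_handler(struct model_sk *sk)       { (void)sk; }

static void model_lock(struct model_sk *sk)
{
    while (atomic_flag_test_and_set(&sk->callback_lock))
        ;                        /* spin until the flag is released */
}

static void model_unlock(struct model_sk *sk)
{
    atomic_flag_clear(&sk->callback_lock);
}

/* Save all four callbacks, not just error_report, under the lock. */
static void model_save_callbacks(struct model_sk *sk, struct saved_callbacks *s)
{
    model_lock(sk);
    s->data_ready = sk->data_ready;
    s->state_change = sk->state_change;
    s->write_space = sk->write_space;
    s->error_report = sk->error_report;
    model_unlock(sk);
}

/* Replace all four callbacks under the lock. */
static void model_install_custom(struct model_sk *sk)
{
    model_lock(sk);
    sk->data_ready = custom_handler;
    sk->state_change = custom_handler;
    sk->write_space = custom_handler;
    sk->error_report = custom_handler;
    model_unlock(sk);
}

/* Restore all four callbacks under the same lock. */
static void model_restore_callbacks(struct model_sk *sk,
                                    const struct saved_callbacks *s)
{
    model_lock(sk);
    sk->data_ready = s->data_ready;
    sk->state_change = s->state_change;
    sk->write_space = s->write_space;
    sk->error_report = s->error_report;
    model_unlock(sk);
}
```

Taking the lock around every assignment means no reader of the callbacks
can observe a half-updated set, which is the property the locking patch is
after.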

Bob Peterson (6):
  DLM: Don't create kernel socket until we have valid node address
  DLM: Call original error report when socket is NULL
  DLM: Make consistent error path through tcp_create_listen_sock
  DLM: Eliminate useless goto
  DLM: Add locking to protect save callback assignments
  DLM: save / restore all socket callbacks

 fs/dlm/lowcomms.c | 103 ++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 77 insertions(+), 26 deletions(-)

-- 
2.5.0
