I've been doing a bunch of recovery testing with DLM and discovered some issues. This collection of 6 patches addresses those issues. Some of them are of my own making, introduced by the recent patches that made DLM print socket connection errors and recover from those errors.

The first patch changes the TCP "connect to sock" function to more closely match the SCTP version of the function. The idea is to not create a kernel socket until we have a valid node address, as the SCTP path already does.

The second patch removes a "return" from lowcomms_error_report that should not be there. The return was causing it to bypass the call to the original error report code, thus skipping an important part of the reporting.

The third patch changes function tcp_create_listen_sock so that its error path is consistent. Only one of its error paths was setting con->sock to NULL, but it should be done in both cases.

The fourth patch eliminates a useless goto, to make the code clearer.

The fifth patch adds a layer of locking by way of sk->sk_callback_lock, which is needed to prevent multiple send/receive sockets from interfering with one another during socket error reporting and the subsequent recovery. This makes it similar to how sunrpc handles errors.

The sixth and final patch makes the socket error code save and restore all four callbacks, whereas before we were only saving and restoring the error report callback.

Bob Peterson (6):
  DLM: Don't create kernel socket until we have valid node address
  DLM: Call original error report when socket is NULL
  DLM: Make consistent error path through tcp_create_listen_sock
  DLM: Eliminate useless goto
  DLM: Add locking to protect save callback assignments
  DLM: save / restore all socket callbacks

 fs/dlm/lowcomms.c | 103 ++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 77 insertions(+), 26 deletions(-)

-- 
2.5.0