Hi, I would like to ask you for code review for this bug:
6900751 Corrupt call_table / callist structure leads to networking hang The webrev is available here: http://cr.opensolaris.org/~aragorn/6900751-rpcmod-calltable/ Here is root cause of the problem: ================================== clnt_clts_kcallit_addr() called AUTH_REFRESH() while the "call" was still in the call table. The AUTH_REFRESH() (in this case rpc_gss_refresh() function) just reused the "call" adding it to the call table again, but with different xid this time. This caused the call table corruption. 454 enum clnt_stat 455 clnt_clts_kcallit_addr(CLIENT *h, rpcproc_t procnum, xdrproc_t xdr_args, 456 caddr_t argsp, xdrproc_t xdr_results, caddr_t resultsp, 457 struct timeval wait, struct netbuf *sin) ... 496 call_again: ... 584 error = clnt_clts_dispatch_send(p->cku_endpnt->e_wq, mp, 585 &p->cku_addr, call, p->cku_xid, p->cku_cred); <--- HERE the call is added to the call table ... 877 if (refreshes > 0 && 878 AUTH_REFRESH(h->cl_auth, &reply_msg, p->cku_cred)) { <--- HERE we reused the call (we added it to the call table, then removed it) ... 891 call_table_remove(call); <--- HERE we tried to remove the already removed call ... 901 goto call_again; 902 } 903 /* <------ HERE we are aware that we reused it (client handle contains the call in the structures) 904 * We have used the client handle to do an AUTH_REFRESH 905 * and the RPC status may be set to RPC_SUCCESS; 906 * Let's make sure to set it to RPC_AUTHERROR. 907 */ ... 925 call_table_remove(call); <--- different path where we tried to remove the already removed call In addition, the webrev contains some cleanup and minor fixes in other parts of the clnt_clts_kcallit_addr() function. Should you have any question, please ask. Thank you. -- Marcel Telka Solaris RPE