Hi,

I would like to ask for a code review of the fix for this bug:

6900751 Corrupt call_table / callist structure leads to networking hang

The webrev is available here:
http://cr.opensolaris.org/~aragorn/6900751-rpcmod-calltable/


Here is the root cause of the problem:
======================================

clnt_clts_kcallit_addr() called AUTH_REFRESH() while the "call" was still in
the call table. AUTH_REFRESH() (in this case the rpc_gss_refresh() function)
reused the same "call" and added it to the call table again, this time with a
different xid. This corrupted the call table.
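
To illustrate the failure mode, here is a minimal user-space sketch. It is
not the rpcmod code: it assumes a toy hash table whose buckets are circular
doubly linked lists, and every name in it (toy_call_t, toy_insert(),
toy_remove(), toy_demo(), ...) is hypothetical. It only shows why linking an
entry a second time under a different xid, without unlinking it first, leaves
the old bucket pointing at an entry that no longer belongs to it.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_BUCKETS  4   /* hypothetical table size */
#define TOY_MAX_WALK 16  /* bound on list walks so a corrupt list cannot hang us */

/* Toy stand-in for a call table entry; not the kernel structure. */
typedef struct toy_call {
    struct toy_call *next;
    struct toy_call *prev;
    uint32_t xid;
} toy_call_t;

/* Each bucket is a circular doubly linked list with a sentinel head. */
static toy_call_t toy_table[TOY_BUCKETS];

static void
toy_table_init(void)
{
    for (int i = 0; i < TOY_BUCKETS; i++)
        toy_table[i].next = toy_table[i].prev = &toy_table[i];
}

/* Link c at the head of the bucket selected by xid. No "already linked" check. */
static void
toy_insert(toy_call_t *c, uint32_t xid)
{
    toy_call_t *head = &toy_table[xid % TOY_BUCKETS];

    c->xid = xid;
    c->next = head->next;
    c->prev = head;
    head->next->prev = c;
    head->next = c;
}

/* Unlink c from whatever list its own next/prev pointers describe. */
static void
toy_remove(toy_call_t *c)
{
    c->prev->next = c->next;
    c->next->prev = c->prev;
}

/* Return 1 if bucket b is a well-formed circular list, 0 otherwise. */
static int
toy_bucket_is_sane(int b)
{
    const toy_call_t *head = &toy_table[b];
    const toy_call_t *p = head;

    for (int steps = 0; steps <= TOY_MAX_WALK; steps++) {
        if (p->next->prev != p)
            return (0);     /* forward and backward links disagree */
        p = p->next;
        if (p == head)
            return (1);     /* walked the whole ring */
    }
    return (0);             /* never got back to the head */
}

/*
 * Send with xid 1, then reuse the same entry with xid 2.
 * If remove_first is zero, the entry is re-inserted while still linked,
 * which is the pattern described above. Returns 1 if both buckets are
 * still consistent afterwards, 0 otherwise.
 */
static int
toy_demo(int remove_first)
{
    static toy_call_t call;

    toy_table_init();
    toy_insert(&call, 1);       /* original send */
    if (remove_first)
        toy_remove(&call);      /* unlink before the entry is reused */
    toy_insert(&call, 2);       /* reuse with a different xid */

    return (toy_bucket_is_sane(1) && toy_bucket_is_sane(2));
}
```

In this toy model toy_demo(0) returns 0: bucket 1 still points at the entry,
but the entry's own links now lead into bucket 2, so an unbounded walk of
bucket 1 would never get back to its head. toy_demo(1) returns 1 because the
entry is unlinked before it is reused.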


    454 enum clnt_stat
    455 clnt_clts_kcallit_addr(CLIENT *h, rpcproc_t procnum, xdrproc_t xdr_args,
    456         caddr_t argsp, xdrproc_t xdr_results, caddr_t resultsp,
    457         struct timeval wait, struct netbuf *sin)

...

    496 call_again:

...

    584         error = clnt_clts_dispatch_send(p->cku_endpnt->e_wq, mp,
    585             &p->cku_addr, call, p->cku_xid, p->cku_cred);   <--- HERE the call is added to the call table

...

    877                 if (refreshes > 0 &&
    878                     AUTH_REFRESH(h->cl_auth, &reply_msg, p->cku_cred)) {   <--- HERE we reused the call (we added it to the call table, then removed it)

...

    891                         call_table_remove(call);   <--- HERE we tried to remove the already removed call

...

    901                         goto call_again;
    902                 }
    903                 /*   <--- HERE we are aware that we reused it (client handle contains the call in the structures)
    904                  * We have used the client handle to do an AUTH_REFRESH
    905                  * and the RPC status may be set to RPC_SUCCESS;
    906                  * Let's make sure to set it to RPC_AUTHERROR.
    907                  */

...

    925                                 call_table_remove(call);   <--- different path where we tried to remove the already removed call



In addition, the webrev contains some cleanup and minor fixes in other parts of
the clnt_clts_kcallit_addr() function.

Should you have any questions, please ask.


Thank you.

-- 
Marcel Telka
Solaris RPE
