On Mar 12, 2009, at 12:28 PM, Dai Ngo wrote:

> Hi Jim,
>
> james wahlig wrote:
>>
>> I suppose I could run through the server code and see if there is a
>> precedent already set, but thought maybe someone on the alias might
>> know. This is something that Spencer probably would have known off
>> the top of his head, but I don't.
>>
>> Maybe there is a retry variable defined somewhere that we could
>> reuse instead of creating a new one.
> I consulted with Jeff, and we did not find any existing configurable
> variable to use so I created a new one.
Hi Jim,

I also thought that we'd have a "num retries" variable defined somewhere. I found nfs4_max_recov_error_retry, which is used by client recovery and set to 3. The client has another retry global for retrying a mount (set to 2). My personal favorite is recov_state.rs_num_retry_despite_err (client). :-)

I didn't think it made sense to limit max retries for this bug using one of the other retry-related vars I found. So I thought it would be okay for Dai to define a new one for this case. I did mention that 10 felt "too big" to me (probably because I'm a little polluted with client defaults of 2-3), but I didn't insist on making it smaller because the fix is about not retrying forever, and for that, 10 is as good as 3.

The more I think about it, the more I like no retries. Maybe I'm missing something, but why would the client drop/ignore the first cb_null but process a subsequent cb_null? I'm thinking that if it doesn't reply to the first, then it will probably not reply to subsequent cb_null calls.

It would be interesting to know how many times our client fails to reply to cb_null. I suspect that we'd see the server either issue no retries or the max number of retries and nothing in between.

Jeff
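
For illustration only, the bounded-retry idea and the "how many retries do we actually see" measurement discussed above could be sketched in C roughly as follows. This is not the actual fix: the tunable name cb_null_max_retry, the function probe_callback_path(), the histogram, and the stub clients are all hypothetical names invented for this sketch, and the real RPC call is replaced by a function pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tunable; the thread does not name the real variable. */
static int cb_null_max_retry = 10;

/* attempts_hist[n] counts probes that finished after n retries. */
#define HIST_SLOTS 16
static unsigned attempts_hist[HIST_SLOTS];

/* Stand-in for the real cb_null RPC: returns true if the client replied. */
typedef bool (*cb_null_fn)(void *arg);

/*
 * Probe the callback path: one initial call plus at most
 * cb_null_max_retry retries. Returns true as soon as a call succeeds.
 * Setting cb_null_max_retry to 0 gives the "no retries" behavior.
 */
static bool
probe_callback_path(cb_null_fn send_cb_null, void *arg)
{
	int retries = 0;
	bool ok;

	assert(send_cb_null != NULL);
	for (;;) {
		ok = send_cb_null(arg);
		if (ok || retries >= cb_null_max_retry)
			break;
		retries++;
	}
	/* Record how many retries this probe needed. */
	if (retries < HIST_SLOTS)
		attempts_hist[retries]++;
	return ok;
}

/* Demo stub: a client that never replies. */
static bool
never_replies(void *arg)
{
	(void)arg;
	return false;
}

/* Demo stub: replies once *arg calls have been made. */
static bool
replies_after(void *arg)
{
	int *calls_left = arg;

	return --(*calls_left) <= 0;
}
```

With the stubs above, probe_callback_path(never_replies, NULL) makes 11 calls in total and lands in attempts_hist[10], while a client that replies on the third call lands in attempts_hist[2]. If the suspicion in the message holds, real data would pile up in slot 0 and in the last slot, with little in between.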