Hi Tom,

Tom Haynes wrote:
> Dai Ngo wrote:
>> Hi,
>>
>> I'd like to have a code review for the change to fix CR 6831781.
>>
>> The problem was caused by the current handling of QFULL condition in
>> rpcmod by delaying 10 secs before returning an error to caller, see
>> related CR 6762222. <http://monaco.sfbay/detail.jsf?cr=6762222>
>>
>> The fix is to retry dispatching the RPC call, when write queue is
>> full, in 1 second interval until the RPC timeout expires and returns
>> error to caller.
>> Also replacing the "nfs server not responding..." message with "send
>> queue full.." message to help user to identify the problem better.
>>
>
> Dai,
>
> How often will this message spam the console?
>
> I understand we had an existing message going to the console, but if
> you are going from 1 delay of 10s to 10 delays of 1s, I have to wonder
> if that means 10 more messages?

Prior to the fix, whenever a QFULL condition occurs, rpcmod delays 10 secs
(to allow the queue to clear) and then returns an error to the caller. The
caller (NFS) writes an error message to the system log and then retries
the call.

This fix modifies rpcmod to retry at 1-sec intervals until the timeout
specified in the RPC call expires. For TCP, the default timeout is 60 secs.
If the QFULL condition has not cleared by the end of this retry period
(60 secs), rpcmod returns an error to the caller, which then writes an
error message to the system log.

With this fix, the error message is almost never displayed, since the
QFULL condition usually clears in less than 5 secs (as seen with vdbench,
with a few worst cases peaking at 20 secs).

>
> Also, I think you need to do a 'hg reci' - the comment section on the
> webrev is showing up more than the bug and description.

Could you be more specific on this?

rasta.dainx[516] pwd
/export/home/dain/NFS_BUGS/6831781/onnv-clone
rasta.dainx[517] hg reci
abort: workspace has uncommitted changes
rasta.dainx[518]

What does 'hg reci' do?

Thanks,
-Dai
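
For illustration only, the retry behaviour described above can be modelled
roughly as follows. This is a user-space sketch, not the rpcmod change
itself: the names retry_result and retry_until_timeout are hypothetical,
the queue is assumed to stay full for a fixed number of seconds, and the
1-sec delay is simulated by a counter rather than a real sleep.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcome of one simulated RPC dispatch (hypothetical type). */
struct retry_result {
    bool sent;         /* true if dispatched before the timeout expired */
    int  waited_secs;  /* total simulated time spent waiting on QFULL */
    int  attempts;     /* number of dispatch attempts made */
};

/*
 * Model assumption: the write queue stays full until clear_after_secs
 * have elapsed. Retry every interval_secs until timeout_secs is reached.
 */
static struct retry_result
retry_until_timeout(int clear_after_secs, int interval_secs, int timeout_secs)
{
    struct retry_result r = { false, 0, 0 };

    assert(interval_secs > 0);

    for (;;) {
        r.attempts++;
        if (r.waited_secs >= clear_after_secs) {
            r.sent = true;      /* queue has drained, call goes out */
            return r;
        }
        if (r.waited_secs + interval_secs > timeout_secs)
            return r;           /* timeout expired, caller logs the error */
        r.waited_secs += interval_secs;  /* stands in for the delay */
    }
}
```

In this model, retry_until_timeout(5, 1, 60) succeeds without any error
being returned, so nothing is logged; only a queue that stays full past
the 60-sec timeout, e.g. retry_until_timeout(100, 1, 60), produces a
single error for the caller to log.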