Mike,

On Mon, Jun 7, 2010 at 12:00 PM, Mike Heinz <[email protected]> wrote:
> Hal said:
> Should a busy be retried at all at the mad layer ? Is a "special" (longer)
> timeout policy for busy needed ?
>
> Also, should this be done for all MADs classified by ib_response_mad (e.g.
> trap represses) ?
>
> Hal,
>
> The idea of processing BUSY responses in the MAD layer is to treat BUSY
> responses like timeouts - which are currently handled by the MAD layer. Right
> now there is an issue where various apps and ULPs either treat BUSY as a cause
> to immediately retry or as a permanent error. This doesn't seem to affect users
> of the OpenSM so much because (as I understand it) the OpenSM seems to
> discard requests when it gets too busy - but for other SA/SMs, it can cause a
> major packet storm or, worse, a simple loss of connectivity where MPI jobs or
> kernel ULPs simply assume the SA is broken because they got a BUSY reply.
>
> By treating the BUSY reply as a timeout, we're actually simplifying matters
> by fitting into existing practice.
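The policy described above, where a BUSY reply is handled like a timeout, might look roughly like the following. This is a minimal standalone sketch, not the ib_mad code or the patch under discussion: struct mad_send_state, handle_response(), BUSY_TIMEOUT_SCALE and BUSY_TIMEOUT_MAX_MS are hypothetical names, the status word is assumed to already be in host byte order, and the 4x scale factor and 8 s cap are placeholders, since the thread gives no actual timeout values. It relies only on bit 0 of the MAD status field, which the InfiniBand specification defines as BUSY.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit 0 of the MAD status field signals BUSY.
 * Assumption: the status has already been converted to host byte order. */
#define MAD_STATUS_BUSY 0x0001u

/* Hypothetical placeholders: the thread gives no concrete values. */
#define BUSY_TIMEOUT_SCALE  4u     /* wait this many times longer after BUSY */
#define BUSY_TIMEOUT_MAX_MS 8000u  /* never stretch the timeout past this */

/* Hypothetical per-request retry state (not a kernel structure). */
struct mad_send_state {
    int retries_left;     /* one budget shared by timeouts and BUSY replies */
    unsigned timeout_ms;  /* timeout to use for the next (re)send */
};

enum mad_action {
    MAD_DELIVER,  /* real reply: hand it to the client */
    MAD_RETRY,    /* resend using s->timeout_ms */
    MAD_FAIL      /* retries exhausted: report the failure to the client */
};

/* Called with status == NULL on a timeout, or with the reply's status. */
static enum mad_action handle_response(struct mad_send_state *s,
                                       const uint16_t *status)
{
    int busy = status != NULL && (*status & MAD_STATUS_BUSY);

    if (status != NULL && !busy)
        return MAD_DELIVER;

    /* A BUSY reply consumes a retry exactly like a timeout... */
    if (s->retries_left <= 0)
        return MAD_FAIL;
    s->retries_left--;

    /* ...except that the next wait is optionally stretched, up to a cap. */
    if (busy) {
        unsigned next = s->timeout_ms * BUSY_TIMEOUT_SCALE;
        s->timeout_ms = next > BUSY_TIMEOUT_MAX_MS ? BUSY_TIMEOUT_MAX_MS : next;
    }
    return MAD_RETRY;
}
```

Charging BUSY against the same retry budget as a timeout is what keeps this within existing practice. The scaling step is one possible answer to the question of a longer timeout on subsequent retries (because it compounds, repeated BUSY replies back off exponentially until the cap), and returning MAD_FAIL once retries run out still leaves the client free to handle the error itself.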
Understood. Timing these out makes sense to me but still does not preclude the
client from potentially handling this if the retries fail.

> As for needing a longer timeout - in our old proprietary stack, QLogic did
> have a longer timeout for retrying busy replies than for normal timeouts

How much longer ? What are the two timeouts used ?

> - but we should try to get this in now so we can get some relief before we
> begin the long term discussion of the best way to handle this issue overall.

All I was getting at here was: does retrying when busy work ? If not, why retry
at all at the MAD layer (regardless of retries requested) and perhaps use a
longer timeout for this. If it does work, maybe the timeout on the subsequent
retries should be extended.

I think my two other comments on details are relevant to an updated patch.

-- Hal
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
