On Thu, Feb 27, 2014 at 9:17 PM, Pat Donlin <p...@sgi.com> wrote: > I had submitted a patch back on Nov 19, 2013 regarding a fix to lanplus > retry. This had resolved a problem whereby a retry of a payload type of > IPMI_PAYLOAD_TYPE_IPMI first removed the request from the queue before going > back for a retry of the message. I have been able to determine why this fix > works correctly. More importantly I have been able to resolve other retry > problems in lanplus where assertion panics were hitting on certain retry > operations. A new, replacement patch for resolving both of these types of > retry bugs follows. > > The first bug,where the ipmi_lanplus_send_payload() is sending a payload > type of IPMI_PAYLOAD_TYPE_IPMI is retryable, however I found in testing that > it did not remove the previous request entry from the list of requests > chain. If the original message had timed out, a second message sent, the > second reply would not match up to the right entry on the list as the req > command and sequence numbers are the same. By first removing the first > request from the chain this resolves it. The consequence of not removing > the stale entry was random errors. > > The second bug is when waiting for a message response times out during the > ipmi_lanplus_send_payload types IPMI_PAYLOAD_TYPE(s) RCMP_OPEN_REQUEST, > RAKP1, RAKP_3. In various testing where the message timed out on either of > these three payload types, ipmitool would assertion panic upon retry as the > session_state was wrong. The timeout could be due to the message never > getting to the BMC, the BMC never acting/responding to the message, or the > reply message packet dropped (it is UDP after all). If the BMC had acted on > the message but the reply was not received, the BMC state would had > advanced, and a retry of any of these three commands would error. It is not > knowable at retry time if the BMC had acted on the message or not. The > solution is upon message timeout failure, retry all three commands in the > sequence. This has shown to be reliable and does not result in assertions or > any unexpected BMC behaviors. Should the original message response > eventually arrive very late, it is just discarded. > > The testing for these problems was elusive until we found a moderately slow > BMC and had separate sessions direct a fusillade of nmap operations on the > BMC, then run simple ipmitool commands. This caused sufficient loading of > the network and BMC to cause lengthy delays and outright packet drops. The > general approach on the second fix is to return a timeout error code back > through ipmi_lanplus_open where the sequence can be retried. >
Hello Pat, please, fix the following and I'll give it a commit, hm? ~~~ + if ((payload->payload_type == IPMI_PAYLOAD_TYPE_IPMI) && entry) + ipmi_req_remove_entry( entry->rq_seq, entry->req.msg.cmd); ~~~ Missing '{ ... }' ~~~ + int rc, retry; ~~~ Please, split it. ~~~ + //session->v2_data.session_state = LANPLUS_STATE_PRESESSION; ~~~ I see no reason why to keep this around. Thanks, Z. ------------------------------------------------------------------------------ Subversion Kills Productivity. Get off Subversion & Make the Move to Perforce. With Perforce, you get hassle-free workflows. Merge that actually works. Faster operations. Version large binaries. Built-in WAN optimization and the freedom to use Git, Perforce or both. Make the move to Perforce. http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk _______________________________________________ Ipmitool-devel mailing list Ipmitool-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/ipmitool-devel