On Thu, Feb 27, 2014 at 9:17 PM, Pat Donlin <p...@sgi.com> wrote:
> I had submitted a patch back on Nov 19, 2013 regarding a fix to lanplus
> retry. This had resolved a problem whereby a retry of a payload type of
> IPMI_PAYLOAD_TYPE_IPMI first removed the request from the queue before going
> back for a retry of the message. I have been able to determine why this fix
> works correctly. More importantly I have been able to resolve other retry
> problems in lanplus where assertion panics were hitting on certain retry
> operations.  A new, replacement patch for resolving both of these types of
> retry bugs follows.
>
> The first bug,where the ipmi_lanplus_send_payload() is sending a payload
> type of IPMI_PAYLOAD_TYPE_IPMI is retryable, however I found in testing that
> it did not remove the previous request entry from the list of requests
> chain. If the original message had timed out, a second message sent, the
> second reply would not match up to the right entry on the list as the req
> command and sequence numbers are the same. By first removing the first
> request from the chain this resolves it.  The consequence of not removing
> the stale entry was random errors.
>
> The second bug is when waiting for a message response times out during the
> ipmi_lanplus_send_payload types IPMI_PAYLOAD_TYPE(s) RCMP_OPEN_REQUEST,
> RAKP1, RAKP_3. In various testing where the message timed out on either of
> these three payload types, ipmitool would assertion panic upon retry as the
> session_state was wrong. The timeout could be due to the message never
> getting to the BMC, the BMC never acting/responding to the message, or the
> reply message packet dropped (it is UDP after all). If the BMC had acted on
> the message but the reply was not received, the BMC state would had
> advanced, and a retry of any of these three commands would error. It is not
> knowable at retry time if the BMC had acted on the message or not. The
> solution is upon message timeout failure, retry all three commands in the
> sequence. This has shown to be reliable and does not result in assertions or
> any unexpected BMC behaviors. Should the original message response
> eventually arrive very late, it is just discarded.
>
> The testing for these problems was elusive until we found a moderately slow
> BMC and had separate sessions direct a fusillade of nmap operations on the
> BMC, then run simple ipmitool commands. This caused sufficient loading of
> the network and BMC to cause lengthy delays and outright packet drops. The
> general approach on the second fix is to return a timeout error code back
> through ipmi_lanplus_open where the sequence can be retried.
>

Hello Pat,

please, fix the following and I'll give it a commit, hm?


~~~
+            if ((payload->payload_type == IPMI_PAYLOAD_TYPE_IPMI) && entry)
+                ipmi_req_remove_entry( entry->rq_seq, entry->req.msg.cmd);
~~~

Missing '{ ... }'

~~~
+    int rc, retry;
~~~

Please, split it.

~~~
+    //session->v2_data.session_state    = LANPLUS_STATE_PRESESSION;
~~~

I see no reason why to keep this around.

Thanks,
Z.

------------------------------------------------------------------------------
Subversion Kills Productivity. Get off Subversion & Make the Move to Perforce.
With Perforce, you get hassle-free workflows. Merge that actually works. 
Faster operations. Version large binaries.  Built-in WAN optimization and the
freedom to use Git, Perforce or both. Make the move to Perforce.
http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk
_______________________________________________
Ipmitool-devel mailing list
Ipmitool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ipmitool-devel

Reply via email to