Funada,

I have seen similar retry failures in the lanplus driver, and have 
submitted fixes which are presently part of the 1.8.14-rc1 build. The 
specific problem you describe is likely the lack of cleanup on the prior 
request before starting the retry request. The last 3 lines added to the 
diff below highlights the call to clean up the old request off the list.

In general I am an advocate of increasing the default timeout for 
lanplus from 1 second to at least 4 seconds. My experience with Intel 
Romley and Grantley platforms has shown a steady increase in load on 
BMCs and more frequent occasions where the BMC simply cannot respond 
within 1 second.

Regards,

Pat Donlin
Principal Engineer
SGI



  diff -c lanplus.c.orig lanplus.c.leakfix
*** lanplus.c.orig      2014-01-15 07:48:36.000000000 -0600
--- lanplus.c.leakfix   2014-01-15 07:48:43.000000000 -0600
***************
*** 2099,2104 ****
--- 2099,2105 ----
         uint8_t             * msg_data;
         int                   msg_length;
         struct ipmi_session * session = intf->session;
+       struct ipmi_rq_entry * entry = NULL;
         int                   try = 0;
         int                   xmit = 1;
         time_t                ltime;
***************
*** 2123,2129 ****
                                 /*
                                  * Build an IPMI v1.5 or v2 command
                                  */
-                               struct ipmi_rq_entry * entry;
                                 struct ipmi_rq * ipmi_request = 
payload->payload.ipmi_request.request;

                                 lprintf(LOG_DEBUG, "");
--- 2124,2129 ----
***************
*** 2304,2309 ****
--- 2304,2312 ----

                         if (rsp)
                                 break;
+                       // req timed out, remove entry
+                       if ((payload->payload_type == 
IPMI_PAYLOAD_TYPE_IPMI) && entry)
+                               ipmi_req_remove_entry( entry->rq_seq, 
entry->req.msg.cmd);
                 }

                 /* only timeout if time exceeds the timeout value */

On 4/17/2014 7:22 AM, ipmitool-devel-requ...@lists.sourceforge.net wrote:
> Today's Topics:
>
>     1. regarding libipmitool library (sarath azad)
>     2. Implementation of lanplus for retry (Kazuyuki Funada)
>     3. [BMR #81324] PigeonPoint Systems various patches (Dmitry Bazhenov)
>
>
> ----------------------------------------------------------------------
>
>
> ------------------------------
>
> Message: 2
> Date: Fri, 11 Apr 2014 09:25:58 +0000
> From: Kazuyuki Funada <funada-...@necst.nec.co.jp>
> Subject: [Ipmitool-devel] Implementation of lanplus for retry
> To: "ipmitool-devel@lists.sourceforge.net"
>       <ipmitool-devel@lists.sourceforge.net>
> Message-ID:
>       <c91f56d9b1a7be42b84e6a85bcb0aaf301004...@bpxm22gp.gisp.nec.co.jp>
> Content-Type: text/plain; charset="iso-2022-jp"
>
>
> Hello. I'm a newbie in this list and I have a question about current 
> implementation of lanplus for retry.
> Our firmware developers including me have faced a issue that target 
> controller occasionally returns 0xc1 response to "Get Chassis Status" command.
> We found the issue happened when target controller could not respond within 1 
> second and ipmitool retried sending packet. We checked debugging output of 
> ipmitool and understood the mechanism was below.
>
> - user issued command(netfn=0x00 command=0x01) by ipmitool using lanplus I/F
> - ipmitool sent command(netfn=0x06 command=0x01) and added it to list with 
> seq#2
>    (target controller did not respond within 1 second)
> - ipmitool sent command again and added it to list with seq#2
> - target controller sent response for 1st command
> - ipmitool received it and removed 1st entry(seq#2)
> - ipmitool sent command(netfn=0x2c command=0x00) and added it to list with 
> seq#3
> - target controller sent response for 2nd command(retry)
> - ipmitool received it and removed 2nd entry(seq#2)
> - ipmitool sent user command(netfn=0x00 command=0x01) and added it to list 
> with seq#4
> - target controller sent response for 3rd command(seq#3) and it had 0xc1 
> response
> - ipmitool received it and removed 3rd entry(seq#3)
> - ipmitool returned this 0xc1 response to user even though it was not for 
> user command(seq#4).
>
> I guess ipmitool should remove 1st entry by itself before adding retried 
> command to list.
> I also checked source code of version 1.8.13 and found "lanplus.c" does not 
> have a code for the purpose but "lan.c" has it.
> My question is whether current implementation of "lanplus.c" is correct or 
> not.
> I also know we can avoid this problem by using "-N" option and we will use it 
> for a while.
>
> Best Regards,
> Kazuyuki Funada
>
>
> ------------------------------
>


------------------------------------------------------------------------------
Learn Graph Databases - Download FREE O'Reilly Book
"Graph Databases" is the definitive new guide to graph databases and their
applications. Written by three acclaimed leaders in the field,
this first edition is now available. Download your free book today!
http://p.sf.net/sfu/NeoTech
_______________________________________________
Ipmitool-devel mailing list
Ipmitool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ipmitool-devel

Reply via email to