On Fri, Aug 12, 2022 at 04:33:18PM -0700, Jay Vosburgh wrote:
> We have observed issues wherein the IPMI driver will sleep forever in
> uninterruptible wait with mutexes held (specifically, dyn_mutex and
> bmc_reg_mutex, acquired by __bmc_get_device_id).  This occurs ultimately
> due to a BMC firmware bug; the BMC will fail to respond to requests,
> apparently related to the request rate, and the current logic will wait
> forever.

This really isn't the right fix.  The state machines running the
interfaces are required to time out after a period of time, usually
5 seconds, though that depends on how the hardware is behaving (or
misbehaving, in this case).  So although these are not timed mutexes,
the code running below them should already time out, and a timeout
shouldn't be necessary here.

What is the particular hardware involved?  The buggy hardware may be
exercising a software bug.

-corey

> 
> When the problem occurs, as each successive process queries the BMC,
> they will block in D state waiting to acquire the mutex, and with time
> the machine hangs. The BMC vendor has agreed it may be a hardware fault.
> 
> This patch addresses the problem by replacing wait_event() with
> wait_event_timeout(). If the event times out (meaning the problem has
> occurred) the bmc->dyn_id_set and bmc->dyn_guid_set are set to 0 and the
> process eventually returns.
> 
> Fixes: aa9c9ab2443e ("ipmi: allow dynamic BMC version information")
> Signed-off-by: Jay Vosburgh <[email protected]>
> Signed-off-by: Ioanna Alifieraki <[email protected]>
> 
> ---
>  drivers/char/ipmi/ipmi_msghandler.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
> index 703433493c85..a510839853b5 100644
> --- a/drivers/char/ipmi/ipmi_msghandler.c
> +++ b/drivers/char/ipmi/ipmi_msghandler.c
> @@ -2572,7 +2572,9 @@ static int __get_device_id(struct ipmi_smi *intf, struct bmc_device *bmc)
>       if (rv)
>               goto out_reset_handler;
>  
> -     wait_event(intf->waitq, bmc->dyn_id_set != 2);
> +     rv = wait_event_timeout(intf->waitq, bmc->dyn_id_set != 2, HZ * 5);
> +     if (!rv)
> +             bmc->dyn_id_set = 0;
>  
>       if (!bmc->dyn_id_set) {
>               if (bmc->cc != IPMI_CC_NO_ERROR &&
> @@ -3337,11 +3339,15 @@ static void __get_guid(struct ipmi_smi *intf)
>       bmc->dyn_guid_set = 2;
>       intf->null_user_handler = guid_handler;
>       rv = send_guid_cmd(intf, 0);
> -     if (rv)
> +     if (rv) {
>               /* Send failed, no GUID available. */
>               bmc->dyn_guid_set = 0;
> -     else
> -             wait_event(intf->waitq, bmc->dyn_guid_set != 2);
> +     } else {
> +             rv = wait_event_timeout(intf->waitq, bmc->dyn_guid_set != 2,
> +                                     HZ * 5);
> +             if (!rv)
> +                     bmc->dyn_guid_set = 0;
> +     }
>  
>       /* dyn_guid_set makes the guid data available. */
>       smp_rmb();
> -- 
> 2.34.1
> 

