On Tue, 2020-07-21 at 17:13 +0530, Vaibhav Jain wrote:
> We recently discovered intermittent failures while reading label-area
> of PAPR-NVDimms and the command 'read-labels' would in such a case
> generated empty output like below:
> 
> $ sudo ndctl read-labels -j nmem0
> [
> ]
> read 0 nmem
> 
> Upon investigation we found that this was caused by a spurious error
> code returned from ndctl_cmd_submit_xlat() when its called from
> ndctl_dimm_read_label_extent() while trying to read the label-area
> contents of the NVDIMM.
> 
> Digging further it was relieved that ndctl_cmd_submit_xlat() would
> always call papr_xlat_firmware_status() via pointer
> 'papr_dimm_ops->xlat_firmware_status' to retrieve translated firmware
> status for all ndctl_cmds even though they arent really PAPR PDSM
> commands.
> 
> In this case ndctl_cmd->type == ND_CMD_GET_CONFIG_DATA and was
> represented by type 'struct nd_cmd_get_config_data_hdr' and
> papr_xlat_firmware_status() incorrectly assumed it to be of type
> 'struct nd_pkg_pdsm' and wrongly dereferenced it returning an invalid
> value.
> 
> A proper fix for this would probably need introducing a new ndctl_cmd
> callback like 'ndctl_cmd.get_xlat_firmware_status' similar to one
> introduced in [1]. However such a change could be disruptive, hence
> the patch introduces a small workaround in papr_xlat_firmware_status()
> that checks if the 'struct ndctl_cmd *' provided if of correct type
> CMD_CALL and if not then it ignores it and return '0'
> 
> References:
> [1]: commit fa754dd8acdb ("ndctl/dimm: Rework dimm command status
> reporting")
> 
> Fixes: 151d2876c49e ("papr: Add scaffolding to issue and handle PDSM 
> requests")
> Reported-by: "Aneesh Kumar K.V" <[email protected]>
> Signed-off-by: Vaibhav Jain <[email protected]>
> ---
>  ndctl/lib/papr.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/ndctl/lib/papr.c b/ndctl/lib/papr.c
> index d9ce253369b3..63561f8f9797 100644
> --- a/ndctl/lib/papr.c
> +++ b/ndctl/lib/papr.c
> @@ -56,9 +56,7 @@ static u32 papr_get_firmware_status(struct ndctl_cmd *cmd)
>  
>  static int papr_xlat_firmware_status(struct ndctl_cmd *cmd)
>  {
> -     const struct nd_pkg_pdsm *pcmd = to_pdsm(cmd);
> -
> -     return pcmd->cmd_status;
> +     return (cmd->type == ND_CMD_CALL) ? to_pdsm(cmd)->cmd_status : 0;

Is this correct? -- for non ND_CMD_CALL commands this will always return a 0,
and it seems like you will lose any error state for commands
like ND_CMD_SET_CONFIG_DATA.

>  }
>  
>  /* Verify if the given command is supported and valid */

_______________________________________________
Linux-nvdimm mailing list -- [email protected]
To unsubscribe send an email to [email protected]

Reply via email to