On Tue, 2020-07-21 at 17:13 +0530, Vaibhav Jain wrote:
> We recently discovered intermittent failures while reading label-area
> of PAPR-NVDimms and the command 'read-labels' would in such a case
> generated empty output like below:
>
> $ sudo ndctl read-labels -j nmem0
> [
> ]
> read 0 nmem
>
> Upon investigation we found that this was caused by a spurious error
> code returned from ndctl_cmd_submit_xlat() when its called from
> ndctl_dimm_read_label_extent() while trying to read the label-area
> contents of the NVDIMM.
>
> Digging further it was relieved that ndctl_cmd_submit_xlat() would
> always call papr_xlat_firmware_status() via pointer
> 'papr_dimm_ops->xlat_firmware_status' to retrieve translated firmware
> status for all ndctl_cmds even though they arent really PAPR PDSM
> commands.
>
> In this case ndctl_cmd->type == ND_CMD_GET_CONFIG_DATA and was
> represented by type 'struct nd_cmd_get_config_data_hdr' and
> papr_xlat_firmware_status() incorrectly assumed it to be of type
> 'struct nd_pkg_pdsm' and wrongly dereferenced it returning an invalid
> value.
>
> A proper fix for this would probably need introducing a new ndctl_cmd
> callback like 'ndctl_cmd.get_xlat_firmware_status' similar to one
> introduced in [1]. However such a change could be disruptive, hence
> the patch introduces a small workaround in papr_xlat_firmware_status()
> that checks if the 'struct ndctl_cmd *' provided if of correct type
> CMD_CALL and if not then it ignores it and return '0'
>
> References:
> [1]: commit fa754dd8acdb ("ndctl/dimm: Rework dimm command status
> reporting")
>
> Fixes: 151d2876c49e ("papr: Add scaffolding to issue and handle PDSM
> requests")
> Reported-by: "Aneesh Kumar K.V" <[email protected]>
> Signed-off-by: Vaibhav Jain <[email protected]>
> ---
> ndctl/lib/papr.c | 4 +---
> 1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/ndctl/lib/papr.c b/ndctl/lib/papr.c
> index d9ce253369b3..63561f8f9797 100644
> --- a/ndctl/lib/papr.c
> +++ b/ndctl/lib/papr.c
> @@ -56,9 +56,7 @@ static u32 papr_get_firmware_status(struct ndctl_cmd *cmd)
>
> static int papr_xlat_firmware_status(struct ndctl_cmd *cmd)
> {
> - const struct nd_pkg_pdsm *pcmd = to_pdsm(cmd);
> -
> - return pcmd->cmd_status;
> + return (cmd->type == ND_CMD_CALL) ? to_pdsm(cmd)->cmd_status : 0;
Is this correct? -- for non ND_CMD_CALL commands this will always return a 0,
and it seems like you will lose any error state for commands
like ND_CMD_SET_CONFIG_DATA.
> }
>
> /* Verify if the given command is supported and valid */
_______________________________________________
Linux-nvdimm mailing list -- [email protected]
To unsubscribe send an email to [email protected]