On Mon, Mar 4, 2019 at 10:23 AM Javier González <[email protected]> wrote:
>
> > On 4 Mar 2019, at 10.02, Hans Holmberg <[email protected]> 
> > wrote:
> >
> > Igor: Have you seen this happening in real life?
> >
> > I think it would be better to count all expected errors and put them
> > in the right bucket (without spamming dmesg). If we need a new bucket
> > for i.e. vendor-specific-errors, let's do that instead.
> >
> > Someone wiser than me told me that every error print in the log is a
> > potential customer call.
> >
> > Javier: Yeah, I think S.M.A.R.T is the way to deliver this
> > information. Why can't we let the drives expose this info and remove
> > this from pblk? What's blocking that?
>
> Until now the spec. We added some new log information in Denali exactly
> for this. But since pblk supports OCSSD 1.2 and 2.0 I think it is needed to
> have it here, at least for debugging.

Why add it to the spec? Why not use whatever everyone else is using?

https://en.wikipedia.org/wiki/S.M.A.R.T. :
"S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology; often
written as SMART) is a monitoring system included in computer hard
disk drives (HDDs), solid-state drives (SSDs),[1] and eMMC drives. Its
primary function is to detect and report various indicators of drive
reliability with the intent of anticipating imminent hardware
failures."
Sounds like what we want here.

For debugging, a trace point or something(i.e. BPF) would be a better
solution that would not impact hot-path performance.

>
> >
> > Thanks,
> > Hans
> >
> > On Mon, Mar 4, 2019 at 8:42 AM Javier González <[email protected]> wrote:
> >>> On 27 Feb 2019, at 18.14, Igor Konopko <[email protected]> wrote:
> >>>
> >>> Currently when unknown error occurs on read path
> >>> there is only dmesg information about it, but it
> >>> is not counted in sysfs statistics. Since this is
> >>> still an error we should also count it there.
> >>>
> >>> Signed-off-by: Igor Konopko <[email protected]>
> >>> ---
> >>> drivers/lightnvm/pblk-core.c | 1 +
> >>> 1 file changed, 1 insertion(+)
> >>>
> >>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> >>> index eabcbc119681..a98b2255f963 100644
> >>> --- a/drivers/lightnvm/pblk-core.c
> >>> +++ b/drivers/lightnvm/pblk-core.c
> >>> @@ -493,6 +493,7 @@ void pblk_log_read_err(struct pblk *pblk, struct 
> >>> nvm_rq *rqd)
> >>>              atomic_long_inc(&pblk->read_failed);
> >>>              break;
> >>>      default:
> >>> +             atomic_long_inc(&pblk->read_failed);
> >>>              pblk_err(pblk, "unknown read error:%d\n", rqd->error);
> >>>      }
> >>> #ifdef CONFIG_NVM_PBLK_DEBUG
> >>> --
> >>> 2.17.1
> >>
> >> I left this out intentionally  so that we could correlate the logs from
> >> the controller and the errors in the read path. Since we do not have an
> >> standard way to correlate this on SMART yet, let’s add this now (I
> >> assume that you are using it for something?) and we can separate the
> >> error stats in the future.
> >>
> >> Reviewed-by: Javier González <[email protected]>

Reply via email to