Thanks for reviewing this patch,
Dan Williams <[email protected]> writes: > On Fri, Feb 21, 2020 at 9:55 AM Vaibhav Jain <[email protected]> wrote: >> >> This patch adds a new command argument to the 'monitor' command namely >> '--check-interval' that triggers a call to notify_dimm_event() at >> regular intervals forcing a periodic check of dimm smart events. >> >> This behavior is useful for dimms that do not support event notifications >> in case the health status of an nvdimm changes. This is especially >> true in case of PAPR-SCM dimms as the PHYP hyper-visor doesn't provide >> any notifications to the guest kernel on a change in nvdimm health >> status. In such case periodic polling of the is the only way to track >> the health of a nvdimm. >> >> The patch updates monitor_event() adding a timeout value to >> epoll_wait() call. Also to prevent the possibility of a single dimm >> generating enough events thereby preventing check of health status of >> other nvdimms, a 'fullpoll_ts' time-stamp is added to keep track of >> when full health check of all dimms happened. If after epoll_wait() >> returns 'fullpoll_ts' time-stamp indicates last full dimm health check >> happened beyond 'check-interval' seconds then a full dimm health check >> is enforced. >> >> Signed-off-by: Vaibhav Jain <[email protected]> >> --- >> Documentation/ndctl/ndctl-monitor.txt | 4 ++++ >> ndctl/monitor.c | 31 ++++++++++++++++++++++++--- >> 2 files changed, 32 insertions(+), 3 deletions(-) >> >> diff --git a/Documentation/ndctl/ndctl-monitor.txt >> b/Documentation/ndctl/ndctl-monitor.txt >> index 2239f047266d..14cc59d57157 100644 >> --- a/Documentation/ndctl/ndctl-monitor.txt >> +++ b/Documentation/ndctl/ndctl-monitor.txt >> @@ -108,6 +108,10 @@ will not work if "--daemon" is specified. >> The monitor will attempt to enable the alarm control bits for all >> specified events. >> >> +-i:: >> +--check-interval=:: >> + Force a recheck of dimm health every <n> seconds. > > I'd just call this "-p --poll=" to put the monitor into polled mode. Agree to the suggestion. Will update the flag name in V2. > >> + >> -u:: >> --human:: >> Output monitor notification as human friendly json format instead >> diff --git a/ndctl/monitor.c b/ndctl/monitor.c >> index 1755b87a5eeb..b72c5852524e 100644 >> --- a/ndctl/monitor.c >> +++ b/ndctl/monitor.c >> @@ -4,6 +4,7 @@ >> #include <stdio.h> >> #include <json-c/json.h> >> #include <libgen.h> >> +#include <time.h> >> #include <dirent.h> >> #include <util/json.h> >> #include <util/filter.h> >> @@ -33,6 +34,7 @@ static struct monitor { >> bool daemon; >> bool human; >> bool verbose; >> + unsigned int poll_timeout; >> unsigned int event_flags; >> struct log_ctx ctx; >> } monitor; >> @@ -322,9 +324,14 @@ static int monitor_event(struct ndctl_ctx *ctx, >> struct monitor_filter_arg *mfa) >> { >> struct epoll_event ev, *events; >> - int nfds, epollfd, i, rc = 0; >> + int nfds, epollfd, i, rc = 0, polltimeout = -1; >> struct monitor_dimm *mdimm; >> char buf; >> + /* last time a full poll happened */ >> + struct timespec fullpoll_ts, ts; >> + >> + if (monitor.poll_timeout) >> + polltimeout = monitor.poll_timeout * 1000; >> >> events = calloc(mfa->num_dimm, sizeof(struct epoll_event)); >> if (!events) { >> @@ -354,14 +361,30 @@ static int monitor_event(struct ndctl_ctx *ctx, >> } >> } >> >> + clock_gettime(CLOCK_BOOTTIME, &fullpoll_ts); >> while (1) { >> did_fail = 0; >> - nfds = epoll_wait(epollfd, events, mfa->num_dimm, -1); >> - if (nfds <= 0 && errno != EINTR) { >> + nfds = epoll_wait(epollfd, events, mfa->num_dimm, >> polltimeout); >> + if (nfds < 0 && errno != EINTR) { >> err(&monitor, "epoll_wait error: (%s)\n", >> strerror(errno)); >> rc = -errno; >> goto out; >> } >> + >> + /* If needed force a full poll of dimm health */ >> + clock_gettime(CLOCK_BOOTTIME, &ts); >> + if ((fullpoll_ts.tv_sec - ts.tv_sec) > monitor.poll_timeout) >> { >> + nfds = 0; >> + dbg(&monitor, "forcing a full poll\n"); >> + } > > Why is this hunk above needed? If the DIMMs are firing events then all > of them that fired will get serviced and on any timeout all DIMMs will > get serviced. Why does fullpoll_ts need to be tracked? > This hunk was added to prevent any dimm thats firing too many events from starving the poll of events from other dimms. >> + >> + /* If we timed out then fill events array with all dimms */ >> + if (nfds == 0) { >> + list_for_each(&mfa->dimms, mdimm, list) >> + events[nfds++].data.ptr = mdimm; >> + fullpoll_ts = ts; >> + } >> + >> for (i = 0; i < nfds; i++) { >> mdimm = events[i].data.ptr; >> if (util_dimm_event_filter(mdimm, >> monitor.event_flags)) { >> @@ -570,6 +593,8 @@ int cmd_monitor(int argc, const char **argv, struct >> ndctl_ctx *ctx) >> "use human friendly output formats"), >> OPT_BOOLEAN('v', "verbose", &monitor.verbose, >> "emit extra debug messages to log"), >> + OPT_UINTEGER('i', "check-interval", &monitor.poll_timeout, >> + "force a dimm health recheck every <n> >> seconds"), > > I'd replace "force a dimm health check" with "poll and report status". Fair point. Will update the this usage text in v2. > > The monitor can theoretically also register for region and bus events, > so --poll option just forces the monitor to report the all recorded > events not necessarily only dimm events. Agree -- Vaibhav Jain <[email protected]> Linux Technology Center, IBM India Pvt. Ltd. _______________________________________________ Linux-nvdimm mailing list -- [email protected] To unsubscribe send an email to [email protected]
