> >>>>
> >>>>>>   Another approach could be to integrate NVDIMM event
> >>>>>> monitoring into some other utility, like the rasdaemon.  I'm
> >>>>>> interested in your thoughts.
> >>>>>
> >>>>> Though I'm not sure which (existing or new) utility is appropriate yet.
> >>>>> I prefer this way. So, I'll think about it.
> >>>>
> >>>> I investigated the notification/monitoring feature for over-threshold
> >>>> events with my co-worker. Here is our current understanding.
> >>>>
> >>>>
> >>>> a) rasdaemon
> >>>>   It is a good tool for machine check errors, and if a machine check
> >>>>   occurs on an NVDIMM, I suppose it will work for NVDIMM as well as
> >>>>   for traditional RAM.
> >>>>   But it may not fit the purpose of notifying/monitoring threshold
> >>>>   events.
> >>>>
> >>>>
> >>>> b) smartmontools (https://www.smartmontools.org/)
> >>>>   This tool may fit the purpose of notifying/monitoring the health of
> >>>>   NVDIMMs.
> >>>>   However, it may be a bit troublesome for the following reasons.
> >>>>
> >>>>     - smartd seems to check the SMART values of each device
> >>>>       periodically with ioctl() (in other words, "polling").
> >>>>       Probably, other devices do not have a notification
> >>>>       interface like "ndctl_dimm_get_health_eventfd()
> >>>>       and poll()/select()".
> >>>>
> >>>>     - smartmontools supports many OSs (Windows, darwin, xxxBSDs, os2(!)).
> >>>>       I'm not sure other OSs have a notification interface similar
> >>>>       to the one on Linux.
> >>>>       So, it may need to use "polling" as it does for other devices.
> >>>>
> >>>> c) udev
> >>>>    Udev can kick any program if a udev rule is created.
> >>>>    However, there is currently no uevent for the over-threshold event.
> >>>>    In addition, I'm not sure that udev fits this type of event
> >>>>    notification.
> >>>>
> >>>>
> >>>> d) make a new tiny daemon in ndctl tree
> >>>>    This may be the simpler way.
> >>>>    It can use ndctl_dimm_get_health_eventfd() and poll()/select().
> >>>>
> >>>>    But ndctl may be included in the kernel source,
> >>>>    and I don't know whether the kernel includes other daemon tools or not.
> >>>
> >>> e) acpid
> >>
> >> Except acpid is ACPI specific, and the event sources that libnvdimm
> >> generates are generic. For example, we may be getting an Open Firmware
> >> libnvdimm bus in the next merge window.
> >
> > Can you say more about that?  It seems that the notifications we're worried
> > about here and the interface for getting information about the notification
> > are both ACPI-specific.
> 
> Capturing the raw acpi events is not that interesting because we'll
> immediately want to turn around and ask what those mean to Linux
> kernel objects, so might as well monitor those objects directly.
> 
> > We haven't talked much about what a daemon would do once it gets a
> > notification from whatever the source is.  That might help us determine
> > the right tool.  Is it just logging?
> 
> Yes, logging, and maybe a simple framework to call external helper
> applications when a given event fires, or fires too many times within
> a certain threshold.

I agree.
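
As a rough sketch of the "fires too many times within a certain
threshold" part, a small sliding-window counter might be enough. This is
only an illustration under my own assumptions: the names (event_window,
event_window_init, event_window_record) and the fixed capacity are
hypothetical and do not come from ndctl.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define WINDOW_MAX_EVENTS 16 /* hypothetical capacity of the ring buffer */

struct event_window {
	time_t stamps[WINDOW_MAX_EVENTS]; /* times of the most recent events */
	size_t head;                      /* index of the oldest stored event */
	size_t count;                     /* number of stored events */
	size_t limit;                     /* events tolerated per window */
	time_t span;                      /* window length in seconds */
};

static void event_window_init(struct event_window *w, size_t limit, time_t span)
{
	/* limit must stay below the capacity so "count > limit" is reachable */
	assert(w != NULL && limit > 0 && limit < WINDOW_MAX_EVENTS);
	w->head = 0;
	w->count = 0;
	w->limit = limit;
	w->span = span;
}

/*
 * Record one event at time `now`. Returns true when more than `limit`
 * events have been seen within the last `span` seconds, i.e. when the
 * daemon might want to call an external helper instead of only logging.
 */
static bool event_window_record(struct event_window *w, time_t now)
{
	/* forget events that have fallen out of the window */
	while (w->count > 0 && now - w->stamps[w->head] >= w->span) {
		w->head = (w->head + 1) % WINDOW_MAX_EVENTS;
		w->count--;
	}

	/* if the buffer is full, overwrite the oldest entry */
	if (w->count == WINDOW_MAX_EVENTS) {
		w->head = (w->head + 1) % WINDOW_MAX_EVENTS;
		w->count--;
	}

	w->stamps[(w->head + w->count) % WINDOW_MAX_EVENTS] = now;
	w->count++;

	return w->count > w->limit;
}
```

The daemon would call event_window_record() for each notification, log
every time, and kick the helper only when it returns true.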

I guess some users would like to use Logstash, Fluentd, or another
log monitor/collector/analyzer tool. But other users want to kick
applications to avoid serious trouble like data corruption.
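
Either way, the daemon first has to wait for the event. For option (d)
quoted above, the waiting side could look roughly like the sketch below.
It assumes the descriptor would come from
ndctl_dimm_get_health_eventfd(); the helper name wait_for_health_event
is hypothetical, and which poll flags the real descriptor raises is an
assumption on my side, so both POLLIN and POLLPRI are requested.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms milliseconds for the descriptor to signal.
 * Returns 1 if an event is pending, 0 on timeout, -1 on error.
 * The fd is assumed to be the one a daemon would obtain from
 * ndctl_dimm_get_health_eventfd(); any pollable fd works here.
 */
static int wait_for_health_event(int fd, int timeout_ms)
{
	struct pollfd pfd = {
		.fd = fd,
		.events = POLLIN | POLLPRI, /* assumption: either may be raised */
		.revents = 0,
	};
	int rc = poll(&pfd, 1, timeout_ms);

	if (rc < 0)
		return -1; /* poll() itself failed */
	if (rc == 0)
		return 0;  /* nothing happened within the timeout */

	/* treat anything other than readable/priority data as an error */
	return (pfd.revents & (POLLIN | POLLPRI)) ? 1 : -1;
}
```

A real daemon would loop on this, then query the DIMM health and hand
the result to the logging or helper logic.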

Thanks,
---
Yasunori Goto


