2010/2/23 Harald Stürzebecher <hara...@cs.tu-berlin.de>

> 2010/2/22 Samuel Hassine <samuel.hass...@gmail.com>:
> > I'm also looking for a way to monitor gluster nodes.
> >
> > Any solutions ?
> >
> > On Monday, 22 February 2010 at 10:12 +0500, Anton wrote:
> >> Hello!
> >>
> >>
> >>
> >> I'm looking for a way to determine the health of a Gluster
> >> cluster. Is there any way to determine whether any of the nodes has
> >> failed? In the log files it is possible to grep for "remotexx:
> >> disconnected", but that is not suitable for monitoring. There should be
> >> a simple way to just query the cluster against the .vol file and
> >> see if any node/brick failed to attach, and so trigger the alarm. Is
> >> there anything like "gluster --reporthealth"?
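As a rough illustration of "querying the cluster against the .vol file", the Python sketch below pulls the remote-host and remote-port options out of every protocol/client volume, so each brick could then be probed individually. It assumes the usual volfile layout (volume / type / option / end-volume blocks, `#` comments) and falls back to port 6996, the value recalled later in this thread, when remote-port is absent. The function name client_endpoints is hypothetical; this is not an existing GlusterFS tool.

```python
from typing import List, Tuple

# Assumed fallback when a client volume has no "option remote-port";
# 6996 is the port recalled in this thread, not a verified default.
DEFAULT_PORT = 6996


def client_endpoints(volfile_text: str) -> List[Tuple[str, str, int]]:
    """Return (volume_name, remote_host, remote_port) for each protocol/client volume."""
    endpoints = []
    name, vtype, options = None, None, {}
    for raw in volfile_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and indentation
        if not line:
            continue
        words = line.split()
        if words[0] == "volume" and len(words) >= 2:
            name, vtype, options = words[1], None, {}
        elif words[0] == "type" and len(words) >= 2:
            vtype = words[1]
        elif words[0] == "option" and len(words) >= 3:
            options[words[1]] = words[2]
        elif words[0] == "end-volume":
            # Only client volumes point at remote bricks worth monitoring.
            if vtype == "protocol/client" and "remote-host" in options:
                port = int(options.get("remote-port", DEFAULT_PORT))
                endpoints.append((name, options["remote-host"], port))
            name, vtype, options = None, None, {}
    return endpoints
```

Translator volumes such as cluster/replicate are skipped; only the leaf client volumes that actually connect to a server end up in the list.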
> Checking whether a connection to the GlusterFS TCP server port (6996 IIRC)
> is possible might be an indicator of whether a node is working or failing,
> at least for setups that use TCP. I don't know if anything like that is
> possible for InfiniBand-only setups.
IPoIB (IP over InfiniBand)?
> IIRC, Nagios can check if a port is open on a remote machine. That
> won't find something like disk/filesystem problems on the server, but
> it could report crashed GlusterFS server processes and machines that
> are not working at all.
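A minimal sketch of that port check, written as a Nagios-style check in Python. It only tests whether a TCP connection to host:port succeeds within a timeout; check_port and port_open are hypothetical names, 6996 is the port recalled above rather than a verified default, and the return codes follow the standard Nagios plugin convention (0 = OK, 2 = CRITICAL).

```python
import socket
from typing import Tuple

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def check_port(host: str, port: int = 6996, timeout: float = 5.0) -> Tuple[int, str]:
    """Return a (Nagios exit code, status message) pair for one server port."""
    if port_open(host, port, timeout):
        return OK, f"OK - {host}:{port} accepts TCP connections"
    return CRITICAL, f"CRITICAL - cannot connect to {host}:{port}"
```

A wrapper script would print the message and pass the code to sys.exit(). As noted above, this only gives a negative signal: an open port says nothing about disk or filesystem health.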
Nagios can run checks remotely, so it can check the real status of
glusterfsd (or whatever else we want) on the remote host.
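One way to check the real status of glusterfsd on the remote host is a small plugin that Nagios executes there (for example via NRPE), looking for a live glusterfsd process rather than an open port. The sketch below assumes Linux (/proc/<pid>/comm) and that the daemon's process name is exactly glusterfsd; pids_named and check_process are hypothetical names, and proc_root is a parameter only so the scan can be pointed at another directory.

```python
import os
from typing import List, Tuple

OK, CRITICAL = 0, 2  # Nagios plugin exit codes


def pids_named(name: str, proc_root: str = "/proc") -> List[int]:
    """Return PIDs whose /proc/<pid>/comm equals `name` (Linux-only)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "net"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited while we were scanning
    return sorted(pids)


def check_process(name: str = "glusterfsd", proc_root: str = "/proc") -> Tuple[int, str]:
    """Return a (Nagios exit code, status message) pair for the named daemon."""
    pids = pids_named(name, proc_root)
    if pids:
        return OK, f"OK - {name} running (pid {', '.join(map(str, pids))})"
    return CRITICAL, f"CRITICAL - no {name} process found"
```

Because the check runs on the server itself, a failure on the Nagios side of the network does not masquerade as a dead daemon in the same way a client-side check can.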

> I know that this simple method won't provide a positive status (=it
> works) which would be preferable, but at least it can provide a
> negative status (=_something_ failed on _that_ machine) in some cases.

The glusterfsd port can be taken by another process, so checking for an open
port is an indirect and unreliable way to check the status.
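Since another process could be holding the port, a stricter check could confirm that the socket listening on it actually belongs to glusterfsd. This is a Linux-only Python sketch: it parses /proc/net/tcp for LISTEN sockets (state 0A) on the port, then maps their socket inodes to processes via the /proc/<pid>/fd symlinks. IPv6 listeners would appear in /proc/net/tcp6 instead, reading other users' fd directories generally requires root, and listening_inodes and listener_owners are hypothetical names.

```python
import os
from typing import Set

TCP_LISTEN = "0A"  # state code for LISTEN in /proc/net/tcp


def listening_inodes(proc_net_tcp: str, port: int) -> Set[str]:
    """Socket inodes listening on `port`, parsed from the text of /proc/net/tcp."""
    inodes = set()
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 10:
            continue
        local_port = int(fields[1].split(":")[1], 16)  # port is hex after the colon
        if fields[3] == TCP_LISTEN and local_port == port:
            inodes.add(fields[9])  # the 10th column is the socket inode
    return inodes


def listener_owners(inodes: Set[str], proc_root: str = "/proc") -> Set[str]:
    """Names (comm) of processes holding any of these socket inodes."""
    targets = {f"socket:[{i}]" for i in inodes}
    owners = set()
    for pid in filter(str.isdigit, os.listdir(proc_root)):
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            for fd in os.listdir(fd_dir):
                if os.readlink(os.path.join(fd_dir, fd)) in targets:
                    with open(os.path.join(proc_root, pid, "comm")) as f:
                        owners.add(f.read().strip())
                    break
        except OSError:
            continue  # no permission, or the process exited
    return owners
```

For example, owners = listener_owners(listening_inodes(open("/proc/net/tcp").read(), 6996)) could be treated as CRITICAL unless it equals {"glusterfsd"}.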


> IIRC, some time ago someone requested a syslog feature to debug
> problems with GlusterFS as the root filesystem for a diskless cluster.
> Is there any news on that?
> Having the clients report problems to a central logging server might
> be useful for monitoring.
Monitoring glusterfs daemons from the client side is unreliable, as monitoring
errors can be caused by faults on the client side (I assume the Nagios server
host(s) to be reliable).

I insist on remote checks because:
  1) glusterfsd should abort if a non-recoverable error happens; in that case
a remote check of its real status is the most reliable check.
  2) if glusterfsd or any other FS-related service continues to work in an
unhealthy state after a non-recoverable error, it can lead to damage and
irreversible loss of data. Non-recoverable errors should be investigated and
fixed only by a system administrator with a complete set of system tools at
hand.



> Regards,
> Harald
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel