2010/2/23 Harald Stürzebecher <hara...@cs.tu-berlin.de>

> 2010/2/22 Samuel Hassine <samuel.hass...@gmail.com>:
> > I'm also looking for a way to monitor gluster nodes.
> >
> > Any solutions?
> >
> > On Monday, 22 February 2010 at 10:12 +0500, Anton wrote:
> >> Hello!
> >>
> >> I'm looking for the way to determine the health of the GLUSTER
> >> cluster. Is there any way to determine if any of the nodes failed? In
> >> the log files it is possible to grep that there is "remotexx:
> >> disconnected" - but it is not suitable for monitoring. There should be
> >> the simple way to just query the cluster against the .vol file and
> >> see, if any node/brick failed to attach and so trigger the alarm. Is
> >> there anything like "gluster --reporthealth"?
>
> Checking if a connection to the GlusterFS TCP server port (6996 IIRC)
> is possible might be an indicator for working/failing - at least for
> setups that use TCP. I don't know if anything like that is possible
> for Infiniband-only setups.

IPoIB (IP over Infiniband)?

> IIRC, Nagios can check if a port is open on a remote machine. That
> won't find something like disk/filesystem problems on the server, but
> it could report crashed GlusterFS server processes and machines that
> are not working at all.

Nagios can run checks remotely:
http://www.logix.cz/michal/devel/nagios/
http://blogs.techrepublic.com.com/opensource/?p=321

so it can check the real status of glusterfsd, or whatever else we want, on the remote host.

> I know that this simple method won't provide a positive status (=it
> works) which would be preferable, but at least it can provide a
> negative status (=_something_ failed on _that_ machine) in some cases.

The glusterfsd port can be taken by another process, so checking for an open port is an indirect and unreliable way to check status.

> @gluster.org:
> IIRC, some time ago someone requested a syslog feature to debug
> problems with GlusterFS as root filesystem for a diskless cluster -
> are there any news on that?
> Having the clients report problems to a central logging server might
> be useful for monitoring.

Monitoring glusterfs daemons from the client side is unreliable, because monitoring errors can be caused by faults on the client side (I assume the Nagios server host(s) to be reliable).

I insist on remote checks because:

1) glusterfsd should abort if a non-recoverable error happens; in that case a remote check of the real status is the most reliable check;

2) if glusterfsd or any FS-related service continues to work in an unhealthy state after a non-recoverable error, it can lead to damage and irreversible loss of data.

Non-recoverable errors should be investigated and fixed only by a system administrator with a complete set of system tools at hand.

Regards,
Alexey.

> Regards,
>
> Harald
_______________________________________________
Gluster-devel mailing list
Gluster-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/gluster-devel