On Mon, Aug 3, 2015 at 5:10 PM, Quentin Hartman
<[email protected]> wrote:
> The problem with this kind of monitoring is that there are so many possible
> metrics to watch and so many possible ways to watch them. For myself, I'm
> working on implementing a couple of things:
> - Watching error counters on servers
> - Watching error counters on switches
> - Watching performance

I would also check:

- link speed (on both servers and switches)
- link usage (issue a warning when it goes above 80%)
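
As a rough sketch of those two checks on a Linux server (not taken from
any existing tool): it assumes the negotiated speed and byte counters are
read from sysfs under /sys/class/net/<iface>/, and that you know which
speed each link is supposed to negotiate. The function names and the
expected-speed parameter are made up for illustration; switches would
need a different data source, such as SNMP.

```python
from pathlib import Path

WARN_THRESHOLD_PCT = 80.0  # the 80% warning level suggested above


def read_link_speed_mbps(iface, sysfs_root="/sys/class/net"):
    """Return the negotiated link speed in Mbit/s, or None if unknown.

    Assumption: Linux sysfs. Reading 'speed' can fail or report -1 when
    the link is down or the driver does not expose it.
    """
    try:
        speed = int(Path(sysfs_root, iface, "speed").read_text().strip())
    except (OSError, ValueError):
        return None
    return speed if speed > 0 else None


def read_byte_counters(iface, sysfs_root="/sys/class/net"):
    """Return (rx_bytes, tx_bytes) cumulative counters for the interface."""
    stats = Path(sysfs_root, iface, "statistics")
    rx = int((stats / "rx_bytes").read_text().strip())
    tx = int((stats / "tx_bytes").read_text().strip())
    return rx, tx


def link_usage_pct(bytes_before, bytes_after, interval_s, speed_mbps):
    """Usage of one direction as a percentage of the link speed.

    Takes two samples of a byte counter taken interval_s seconds apart.
    Counter wrap-around is not handled in this sketch.
    """
    bits = (bytes_after - bytes_before) * 8
    return 100.0 * bits / (interval_s * speed_mbps * 1_000_000)


def check_link(usage_pct, speed_mbps, expected_speed_mbps):
    """Return a list of warning strings; empty means nothing to report."""
    warnings = []
    if speed_mbps is None:
        warnings.append("link speed unknown (link down?)")
    elif speed_mbps != expected_speed_mbps:
        warnings.append(
            f"link speed {speed_mbps} Mbit/s, expected {expected_speed_mbps}"
        )
    if usage_pct > WARN_THRESHOLD_PCT:
        warnings.append(f"link usage {usage_pct:.1f}% is over the threshold")
    return warnings
```

To use it, sample read_byte_counters() twice a few seconds apart, pass
each direction through link_usage_pct(), and feed the higher value to
check_link() together with the speed from read_link_speed_mbps().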

.a.

-- 
[email protected]
S3IT: Services and Support for Science IT        http://www.s3it.uzh.ch/
University of Zurich                             Y12 F 84
Winterthurerstrasse 190
CH-8057 Zurich Switzerland
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com