I recently spent some time looking at this. I believe the 'summary' and
'overall_status' sections are now deprecated, and the 'status' and 'checks'
fields are the ones to use instead.

The 'status' field gives you the overall OK/WARN/ERR state, but returning the
most severe error condition from the 'checks' section is less trivial. AFAIK
all HEALTH_WARN states are treated as equally severe, and the same goes for
HEALTH_ERR. We ended up formatting our single-line, human-readable output as
something like:

"HEALTH_ERR: 1 inconsistent pg, HEALTH_ERR: 1 scrub error, HEALTH_WARN: 20 
large omap objects"

This makes it obvious which check is causing which state. We needed to suppress
specific checks for callouts, so we had to look at each check and its resulting
state. If you're not trying to do something similar, there may be a more
lightweight way to go about it.
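
For what it's worth, here is a rough Python sketch of the approach described
above. It is not our actual script. It assumes the parsed output of
'ceph status --format=json' carries a 'health' object whose 'checks' entries
each have a 'severity' and a 'summary' with a 'message'; please verify that
against your own cluster's output. The function name, the suppression set and
the sample data are made up for illustration, and ranking HEALTH_ERR above
HEALTH_WARN is my own assumption.

```python
import json

# Assumed ordering: HEALTH_ERR is more severe than HEALTH_WARN.
SEVERITY_RANK = {"HEALTH_ERR": 0, "HEALTH_WARN": 1}


def summarize_health(status_json, suppressed=()):
    """Return (state, one_line_text) from parsed `ceph status` JSON.

    `suppressed` holds check names to ignore. The state is derived from the
    remaining checks rather than read from 'status', because 'status' also
    reflects the checks we want to ignore.
    """
    checks = status_json.get("health", {}).get("checks", {})
    active = [
        (check["severity"], check["summary"]["message"])
        for name, check in checks.items()
        if name not in suppressed
    ]
    if not active:
        return "HEALTH_OK", "HEALTH_OK"
    # Stable sort: errors first, original order kept within a severity.
    active.sort(key=lambda item: SEVERITY_RANK.get(item[0], len(SEVERITY_RANK)))
    state = active[0][0]
    line = ", ".join(f"{severity}: {message}" for severity, message in active)
    return state, line


# Illustrative sample only; the messages are not copied from a real cluster.
SAMPLE = json.loads("""
{"health": {"status": "HEALTH_ERR", "checks": {
  "LARGE_OMAP_OBJECTS": {"severity": "HEALTH_WARN",
                         "summary": {"message": "20 large omap objects"}},
  "OSD_SCRUB_ERRORS": {"severity": "HEALTH_ERR",
                       "summary": {"message": "1 scrub errors"}},
  "PG_DAMAGED": {"severity": "HEALTH_ERR",
                 "summary": {"message": "Possible data damage: 1 pg inconsistent"}}
}}}
""")

if __name__ == "__main__":
    print(summarize_health(SAMPLE))
    print(summarize_health(SAMPLE, suppressed={"OSD_SCRUB_ERRORS", "PG_DAMAGED"}))
```

If you don't need to suppress anything, reading the 'status' field directly
is probably enough for the machine-readable value.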

Cheers,
Tom

> -----Original Message-----
> From: ceph-users <ceph-users-boun...@lists.ceph.com> On Behalf Of Jan
> Kasprzak
> Sent: 02 January 2019 09:29
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] ceph health JSON format has changed sync?
> 
>       Hello, Ceph users,
> 
> I am afraid the following question is a FAQ, but I still was not able to find
> the answer:
> 
> I use ceph --status --format=json-pretty as a source of CEPH status for my
> Nagios monitoring. After upgrading to Luminous, I see the following in the
> JSON output when the cluster is not healthy:
> 
>         "summary": [
>             {
>                 "severity": "HEALTH_WARN",
>                 "summary": "'ceph health' JSON format has changed in luminous. If you see this your monitoring system is scraping the wrong fields. Disable this with 'mon health preluminous compat warning = false'"
>             }
>         ],
> 
> Apart from that, the JSON data seems reasonable. My question is which part
> of JSON structure are the "wrong fields" I have to avoid. Is it just the
> "summary" section, or some other parts as well? Or should I avoid the whole
> ceph --status and use something different instead?
> 
> What I want is a single machine-readable value with OK/WARNING/ERROR
> meaning, and a single human-readable text line, describing the most severe
> error condition which is currently present. What is the preferred way to get
> this data in Luminous?
> 
>       Thanks,
> 
> -Yenya
> 
> --
> | Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
> | http://www.fi.muni.cz/~kas/                         GPG: 4096R/A45477D5 |
>  This is the world we live in: the way to deal with computers is to google
> the symptoms, and hope that you don't have to watch a video. --P. Zaitcev
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
