Hey,
On 2014-03-31 18:49, Eron Nicholson wrote:
Hey all,
Thanks for the responses and the info. I appreciate that you guys
are responsive to these issues. I also posted this to the check_mk
users list and haven't gotten any response yet (see
http://lists.mathias-kettner.de/pipermail/checkmk-en/2014-March/011881.html).
Since we are looking to use both Naemon and Check_mk in our new
monitoring system, I would certainly prefer it if there was a single
supported livestatus version shared between the two projects. We do
see some issues when trying to use the Check_MK UI with
naemon-livestatus, as they have added new columns:
Primary - Livestatus error
Unhandled exception: 400: Table 'hosts' has no column
'host_comments_with_extra_info'
We have built our own UI and Thruk is also perfectly fine, so this
isn't really a big concern. As long as the backends are compatible,
we should be fine with either version.
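
For anyone comparing the two livestatus builds, the following is a minimal sketch (not from the original message) of asking a running livestatus which columns its hosts table exposes, using the standard `GET columns` query. The socket path is the one from the broker_module line in this thread. The helper names are hypothetical, and checking the name both with and without the "host_" prefix is an assumption about how the column is listed.

```python
import socket


def build_columns_query(table):
    """Build an LQL query that lists the column names of one table."""
    return (
        "GET columns\n"
        "Columns: name\n"
        f"Filter: table = {table}\n"
        "\n"  # a blank line terminates the query
    )


def has_column(response_text, column, prefix="host_"):
    """Check a column name against the response, with and without the prefix.

    Assumption: the columns table may list the name without the "host_"
    prefix that appears in the error message, so both forms are tried.
    """
    names = {line.strip() for line in response_text.splitlines() if line.strip()}
    bare = column[len(prefix):] if column.startswith(prefix) else column
    return column in names or bare in names


def query_livestatus(socket_path, query):
    """Send one query over the livestatus UNIX socket and return the reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(query.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # signal that the request is complete
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8")
```

Usage would be along the lines of `has_column(query_livestatus("/var/cache/naemon/live", build_columns_query("hosts")), "host_comments_with_extra_info")`, run once against each backend.
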
The major issue with the current version of naemon-livestatus is that
it crashes after ~10 seconds in our environment. As I mentioned
earlier, we have tons of passive services being sent in via livestatus
- both from the check_mk agent checks and our own custom checks. If
I disable our custom checks, naemon-livestatus will not crash, so it
has something to do with the additional passive checks we are sending.
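
To make the passive-check path concrete for anyone trying to reproduce the crash, here is a rough sketch (not the poster's actual code) of how a single passive service result is usually formatted for submission through livestatus, using the standard PROCESS_SERVICE_CHECK_RESULT external command. The function name and the host and service names in the usage note are hypothetical.

```python
import time


def build_passive_result(host, service, return_code, output, timestamp=None):
    """Format one passive service check result as a livestatus COMMAND line."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN)")
    if timestamp is None:
        timestamp = int(time.time())
    # A literal newline would end the command early, so flatten the output.
    clean_output = output.replace("\r", " ").replace("\n", " ")
    return (
        f"COMMAND [{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{clean_output}\n"
    )
```

For example, `build_passive_result("web01", "disk", 0, "OK - 42% used")` yields one line that can be written to the livestatus socket. Sending a burst of such lines from a small script might be one way to narrow down which checks trigger the segfault.
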
I have enabled livestatus logging and debugging via:
broker_module=/usr/lib/naemon/livestatus.o /var/cache/naemon/live log_file=/var/log/naemon/livestatus.log debug=1
And do not see any errors in the livestatus.log when the process dies.
I do sometimes see segfault errors in the naemon.log :
[1396281951] Caught SIGSEGV, shutting down...
I know there were a lot of those when we originally forked due to
livestatus not even trying to synchronize with nagios/naemon before
submitting commands in the wrong thread, but I think Sven fixed that
before we released 0.8. I'm assuming your livestatus is built from our
latest git version?
We are very, very reliant on livestatus for both pushing in passive
service checks and pulling data for our UI. So our (new) monitoring
system is basically unusable until we can get a livestatus that works
with naemon and doesn't crash. Fortunately, we still have our nagios3
system up and working, so we have some time to try to figure out these
kinds of issues.
I would love to help out in troubleshooting this problem. Let me know
if there's a newer version of naemon-livestatus that I can try or if
you would like me to gather some more data on the crashes.
Thanks,
Eron Nicholson
Systems Administrator | Basecamp
Not to go all fanboi, but if there's one company name that makes me go
all, well, fanboi, it's basecamp. Love the company, would love to help
out if you give me a coredump.