Thanks for the feedback.
When bischeck "stops working" it would be interesting to know whether
anything gets logged after it "stops", and also what is logged when you
do a restart. I suggest you do a stop first and see what is logged
before starting it again.
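One way to see exactly what the stop adds to the log is to record the
length of the log file before stopping and print only what was appended
afterwards. This is just a sketch: the helper name lines_added is made
up for illustration, and the log path is the one from the standard
logback.xml, so adjust it to your installation.

```shell
# Print the lines appended to a log file after a recorded line count.
# Hypothetical helper, not part of bischeck.
lines_added() {
    logfile=$1
    before=$2
    # tail -n +K starts printing at line K, so skip the first $before lines
    tail -n "+$((before + 1))" "$logfile"
}

# Intended use (paths are assumptions, adapt them to your setup):
#   before=$(wc -l < /var/tmp/bischeck.log)
#   /etc/init.d/bischeckd stop
#   lines_added /var/tmp/bischeck.log "$before"
```

If the log file rolls over between the two steps the recorded line count
no longer applies, so in that case look at the rolled file as well.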
I would also suggest that you set the log level for all packages
through the root logger in logback.xml:
<root level="INFO">
<appender-ref ref="bischeck"/>
</root>
To avoid duplicate entries you should also add additivity="false" to
the other loggers. The example below is based on the standard
logback.xml. I have not tested it myself, so try it in your test
environment first and, if it looks good, deploy it in production
adapted to your specific paths, etc.
logback.xml:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <jmxConfigurator />
  <appender name="bischeck"
            class="ch.qos.logback.core.rolling.RollingFileAppender">
    <!-- See also http://logback.qos.ch/manual/appenders.html#RollingFileAppender -->
    <File>/var/tmp/bischeck.log</File>
    <encoder>
      <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS,Europe/Stockholm} ; %p ; %t ; %c ; %m%ex%n</pattern>
    </encoder>
    <rollingPolicy
        class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
      <maxIndex>3</maxIndex>
      <FileNamePattern>/var/tmp/bischeck.log.%i</FileNamePattern>
    </rollingPolicy>
    <triggeringPolicy
        class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
      <MaxFileSize>1000KB</MaxFileSize>
    </triggeringPolicy>
  </appender>
  <logger name="com.ingby" level="INFO" additivity="false">
    <appender-ref ref="bischeck"/>
  </logger>
  <logger name="com.ingby.socbox.bischeck.configuration.CachePurgeJob"
          level="DEBUG" additivity="false">
    <appender-ref ref="bischeck"/>
  </logger>
  <logger name="com.ingby.socbox.bischeck.cache.provider.redis"
          level="DEBUG" additivity="false">
    <appender-ref ref="bischeck"/>
  </logger>
  <logger name="org.quartz" level="INFO" additivity="false">
    <appender-ref ref="bischeck"/>
  </logger>
  <root level="WARN">
    <appender-ref ref="bischeck"/>
  </root>
</configuration>
The root section ensures that everything logged at WARN or ERROR by any
other Java package is written to the bischeck appender.
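Once this is in place, the WARN and ERROR entries can be picked out of
the log by looking at the level field. The sketch below assumes the
" ; "-separated layout produced by the pattern in the configuration
above (timestamp ; level ; thread ; logger ; message); the function name
warn_error_entries is made up for illustration.

```shell
# Print only WARN and ERROR entries from a bischeck log file.
# Assumes the layout: timestamp ; level ; thread ; logger ; message
# Hypothetical helper, not part of bischeck.
warn_error_entries() {
    # Split on " ; " and compare the second field (the level) exactly,
    # so a message that merely mentions WARN or ERROR does not match.
    awk -F ' ; ' '$2 == "WARN" || $2 == "ERROR"' "$1"
}

# Intended use (path taken from the standard logback.xml, adapt as needed):
#   warn_error_entries /var/tmp/bischeck.log
```

Stack trace lines that follow an entry do not carry a level field, so
they are not printed; open the log at the reported timestamp to read
the full trace.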
Regards
Anders
On 07/25/2017 09:55 AM, Francesco Giuseppe Toffoli wrote:
Hi Anders,
thanks for your reply. Here are my answers to the various questions:
(1) the java version is:
openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
and has not been updated recently. In our test environment (where the
problem does not occur) the version is nearly the same (1.8.0_121).
The OS (CentOS release 6.6) has not been updated.
(2) Redis has not been updated recently (redis 2.8.23). At the moment
we have roughly 13,000 keys in use.
(3) We usually add checks, maybe weekly. The issue started to occur
some months ago, but it can happen that everything is OK for 2 or 3
weeks and then we have several crashes in a week. I'm not inclined to
blame new checks, also because the test server is aligned with the
production one.
(5) Yes, the restart is done via '/etc/init.d/bischeckd restart' and
it solves the issue. Physical memory on the server is always OK; I
don't think the JVM is running out of memory.
In the Bischeck logs I didn't notice any errors. However, at the next
crash I'll try to have a deeper look at them.
Are there any other logs I could look at?
Thanks,
Francesco
On 24/07/2017 21:57, Anders Håål wrote:
Hi Giuseppe,
It sounds strange that it just stopped working after a long time of
stability unless something has changed:
- Has anything changed on the server you run bischeck on: OS, JDK
version, ...?
- Was the Redis version updated? Any change in its configuration?
- Have you added any new bischeck check or changed something in the
configuration?
- Anything else you can think of that may have changed?
When you say restarting, is it the normal /etc/init.d/bischeckd
restart that fixes the problem? The reason I ask is that the script
just does a kill with the TERM signal. If the JVM were in an
out-of-memory situation that may not be enough, but I guess you would
have seen that in the log. Are you sure you do not have any ERROR or
WARN entries in the log?
/Anders
On 07/24/2017 02:14 PM, Francesco Giuseppe Toffoli wrote:
Hi,
we are experiencing a critical problem with Bischeck. For a couple of
months it has sometimes suddenly stopped working: the daemon
/etc/init.d/bischeckd is running but no check results are sent to
Nagios. Restarting the bischeck daemon fixes the issue.
Unfortunately we can't find any clue about the root cause in the
bischeck logs, not even with the DEBUG logging level enabled. The Redis
database seems to be working properly and no increase in memory/CPU
usage is reported on the server hosting bischeck while the issue
occurs.
Do you have any suggestions on how to investigate this in more depth?
Regards,
Francesco
--
Francesco Giuseppe Toffoli
Monitoring Engineer
GSE Department
Tel: +39 01127387488
Mobile: +39 349.800.60.35
Email: ftoff...@skylogic.it <mailto:ftoff...@skylogic.it>
Skylogic S. p. A.
Strada Pianezza, 289
10151 Torino, Italy
This message contains confidential information and is intended only
for the individual named. If you are not the named addressee you
should not disseminate, distribute or copy this e-mail. Please
notify the sender immediately by e-mail if you have received this
e-mail by mistake and delete this e-mail from your system. E-mail
transmission cannot be guaranteed to be secure or error-free as
information could be intercepted, corrupted, lost, destroyed, arrive
late or incomplete, or contain viruses. The sender therefore does
not accept liability for any errors or omissions in the contents of
this message, which arise as a result of e-mail transmission. If
verification is required please request a hard-copy version. Please
note that any views or opinions presented in this email are solely
those of the author and do not necessarily represent those of the
Company.
No employee or agent is authorized to conclude any binding agreement
on behalf of this Company nor, through this latter, any of the
Eutelsat Communication group with another party by email without
express written confirmation by a duly authorized officer of the
Company. The list of duly authorized officers and the scope of their
powers is published on the Trade Register according to the national
law of each affiliate.
--
Ingby<http://www.ingby.com>
bischeck - dynamic and adaptive monitoring for Nagios<http://www.bischeck.org>
anders.h...@ingby.com<mailto:anders.h...@ingby.com>
Mjukvara genom ingenjörsmässig kreativitet och kompetens
Ingenjörsbyn
Box 531
101 30 Stockholm
Sweden
www.ingby.com <http://www.ingby.com/>
Mobil: +46 70 575 35 46
Tele: +46 75 75 75 090
Fax: +46 75 75 75 091