Francesco - any progress on the issue?

On 07/26/2017 05:52 PM, Anders Håål wrote:

Thanks for the feedback.

When bischeck "stops working", it would be interesting to know whether anything gets logged after it stops, and also what is logged when you restart it. I suggest you do a stop first and check what is logged before you start it again.

I would suggest that you change the log level in logback.xml for all packages:

 <root level="INFO">
    <appender-ref ref="bischeck"/>
  </root>

To avoid duplicates you should also add additivity="false" to the other loggers. The logback.xml below is based on the standard one. I have not tested it myself, so try it in your test environment first and, if it looks good, deploy it in production, adjusted for your own paths, etc.
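
To make the additivity point concrete, here is a small sketch of the two cases. It is an illustration of general logback behaviour, not something taken from the bischeck distribution, and it assumes a root logger that references the same bischeck appender. Use only one of the two variants in a real file.

```xml
<!-- Variant A (sketch): additivity defaults to true, so an event from
     com.ingby is written by this logger's appender and is then handed
     on to root, which writes it to the same appender a second time. -->
<logger name="com.ingby" level="INFO">
  <appender-ref ref="bischeck"/>
</logger>

<!-- Variant B (sketch): additivity="false" stops the event here,
     so it reaches the bischeck appender only once. -->
<logger name="com.ingby" level="INFO" additivity="false">
  <appender-ref ref="bischeck"/>
</logger>

<!-- Assumed in both variants: root also points at the bischeck appender. -->
<root level="INFO">
  <appender-ref ref="bischeck"/>
</root>
```

An alternative that should give the same result is to leave additivity at its default and remove the appender-ref from the named loggers, so that only root writes to the file.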


logback.xml:

<?xml version="1.0" encoding="UTF-8"?>

<configuration>
  <jmxConfigurator />

  <!-- See also http://logback.qos.ch/manual/appenders.html#RollingFileAppender -->
  <appender name="bischeck" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <File>/var/tmp/bischeck.log</File>
    <encoder>
      <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS,Europe/Stockholm} ; %p ; %t ; %c ; %m%ex%n</pattern>
    </encoder>

    <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
      <maxIndex>3</maxIndex>
      <FileNamePattern>/var/tmp/bischeck.log.%i</FileNamePattern>
    </rollingPolicy>

    <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
      <MaxFileSize>1000KB</MaxFileSize>
    </triggeringPolicy>
  </appender>

  <logger name="com.ingby" level="INFO" additivity="false">
    <appender-ref ref="bischeck"/>
  </logger>

  <logger name="com.ingby.socbox.bischeck.configuration.CachePurgeJob" level="DEBUG" additivity="false">
    <appender-ref ref="bischeck"/>
  </logger>

  <logger name="com.ingby.socbox.bischeck.cache.provider.redis" level="DEBUG" additivity="false">
    <appender-ref ref="bischeck"/>
  </logger>

  <logger name="org.quartz" level="INFO" additivity="false">
    <appender-ref ref="bischeck"/>
  </logger>

  <root level="WARN">
    <appender-ref ref="bischeck"/>
  </root>

</configuration>


The root section ensures that anything logged at WARN or ERROR by any other Java package also goes to the bischeck appender.
Regards
Anders

On 07/25/2017 09:55 AM, Francesco Giuseppe Toffoli wrote:

Hi Anders,
thanks for your reply. I'll answer your various questions:

(1) the java version is:

openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)

and it has not been updated recently. In our test environment (where the problem does not occur) the version is nearly the same (1.8.0_121).
The OS (CentOS release 6.6) has not been updated either.

(2) Redis has not been updated recently (redis 2.8.23). At the moment we have roughly 13,000 keys in use.

(3) We add checks regularly, roughly once a week. The issue started to occur some months ago, but it can happen that everything is OK for 2 or 3 weeks and then we have several crashes in one week. I'm not inclined to blame any new checks, also because the test server is kept aligned with the production one.


(5) Yes, the restart is done via '/etc/init.d/bischeckd restart' and it solves the issue. Physical memory on the server is always OK, so I don't think the JVM is running out of memory.

In the Bischeck logs I didn't notice any errors. However, at the next crash I'll try to have a deeper look at them.
Are there any other logs I could look at?

Thanks,
Francesco





On 24/07/2017 21:57, Anders Håål wrote:

Hi Giuseppe,

It sounds strange that it just stopped working after a long period of stability unless something has changed:

- Has anything changed on the server you run bischeck on - OS, JDK version, ...?

- Has the Redis version been updated, or its configuration changed?

- Have you added any new bischeck checks or changed something in the configuration?

- Anything else you can think of that may have changed?

When you say restarting, is it the normal /etc/init.d/bischeckd restart that fixes the problem? The reason I ask is that the script just does a kill with the TERM signal. If the JVM were in an out-of-memory situation that might not be enough, but I guess you would have seen that in the log. Are you sure you do not have any ERROR or WARN entries in the log?

/Anders



On 07/24/2017 02:14 PM, Francesco Giuseppe Toffoli wrote:

Hi,
we are experiencing a critical problem with Bischeck. For a couple of months now it has sometimes suddenly stopped working: the daemon (/etc/init.d/bischeckd) is running but no check results are sent to Nagios. Restarting the bischeck daemon fixes the issue. Unfortunately we can't find any clue about the root cause in the bischeck logs, not even with the DEBUG logging level enabled. The Redis database seems to be working properly, and no increase in memory or CPU usage is reported on the server hosting bischeck while the issue occurs.

Do you have any suggestions on how to investigate this further?

Regards,
Francesco

--

Francesco Giuseppe Toffoli
Monitoring Engineer

GSE Department

Tel: +39 01127387488

Mobile: +39 349.800.60.35
Email: ftoff...@skylogic.it

Skylogic S. p. A.
Strada Pianezza, 289
10151 Torino, Italy



This message contains confidential information and is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message, which arise as a result of e-mail transmission. If verification is required please request a hard-copy version. Please note that any views or opinions presented in this email are solely those of the author and do not necessarily represent those of the Company. No employee or agent is authorized to conclude any binding agreement on behalf of this Company nor, through this latter, any of the Eutelsat Communication group with another party by email without express written confirmation by a duly authorized officer of the Company. The list of duly authorized officers and the scope of their powers is published on the Trade Register according to the national law of each affiliate.

--


Ingby<http://www.ingby.com>

bischeck - dynamic and adaptive monitoring for Nagios<http://www.bischeck.org>

anders.h...@ingby.com<mailto:anders.h...@ingby.com>

Software through engineering creativity and competence

Ingenjörsbyn
Box 531
101 30 Stockholm
Sweden
www.ingby.com  <http://www.ingby.com/>
Mobile: +46 70 575 35 46
Tel: +46 75 75 75 090
Fax:  +46 75 75 75 091
