Hi, everybody,

I have monit restarting a web app when it runs into an error condition.

check file tomcat_log with path /var/log/tomcat5/catalina.out
  alert [email protected]
  start program = "/etc/init.d/tomcat5 start"
  stop  program = "/etc/init.d/tomcat5 stop"
  if match "^java.net.SocketException: Too many open files" then restart

When this happens, hundreds of matching lines land in the log file and we get 
an alert for each of them until the restart completes.  monit does not seem to 
run the restart cycle hundreds of times, so the de-duplication logic must 
already exist, but I can't see how to configure alerting to behave the same 
way the restart handler does.

[EST Dec 11 14:18:25] info     : Monit has not changed
[EST Dec 15 12:43:20] error    : 'tomcat_log' content match [java.net.SocketException: Too many open files]
[EST Dec 15 12:43:21] info     : 'tomcat_log' trying to restart
[EST Dec 15 12:43:21] info     : 'tomcat_log' stop: /etc/init.d/tomcat5
[EST Dec 15 12:43:21] info     : 'tomcat_log' start: /etc/init.d/tomcat5
[EST Dec 15 12:43:21] error    : 'tomcat_log' content match [java.net.SocketException: Too many open files]
[EST Dec 15 12:43:21] info     : 'tomcat_log' trying to restart
[EST Dec 15 12:43:21] error    : 'tomcat_log' content match [java.net.SocketException: Too many open files]
[EST Dec 15 12:43:22] info     : 'tomcat_log' trying to restart
[EST Dec 15 12:43:22] error    : 'tomcat_log' content match [java.net.SocketException: Too many open files]
[EST Dec 15 12:43:22] info     : 'tomcat_log' trying to restart
[EST Dec 15 12:43:22] error    : 'tomcat_log' content match [java.net.SocketException: Too many open files]
[EST Dec 15 12:43:22] info     : 'tomcat_log' trying to restart
---snip 288 lines---
[EST Dec 15 12:44:28] error    : 'tomcat_log' content match [java.net.SocketException: Too many open files]
[EST Dec 15 12:44:29] info     : 'tomcat_log' trying to restart

I think I could dampen the noise using <X> [TIMES WITHIN] <Y> CYCLES, but that 
would also delay my restart, and I want the app to come back as soon as 
possible.  Alternatively, I could write my own state machine, called with exec 
on the error lines or on the startup message ("^INFO: Server startup in"), 
that does its own stop/start and monitor/unmonitor, but that seems like a bad 
idea since I'd be re-implementing monit functionality.
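
For reference, the damped variant of the rule would look roughly like the 
sketch below.  The "2 times within 3 cycles" thresholds are placeholders I 
made up, and I'm assuming the content-match test accepts the same cycle 
clause as monit's other tests, which I have not verified.

```
check file tomcat_log with path /var/log/tomcat5/catalina.out
  # alert, start program and stop program lines as in the check above
  # Placeholder thresholds: act only if the match fires in 2 of the
  # last 3 poll cycles (assumes the cycle clause is valid on "if match")
  if match "^java.net.SocketException: Too many open files"
     for 2 times within 3 cycles then restart
```

The trade-off is the one already mentioned: anything that waits for more than 
one cycle pushes the restart out by at least one poll interval.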

I see a few open questions on StackExchange from others hitting this, but a 
search of the mailing list archive didn't turn up a solution.  I'm hoping 
there's something obvious I'm missing.

Thanks,
-Bill


-- 
Bill McGonigle, Owner   
BFC Computing, LLC       
http://bfccomputing.com/ 
Telephone: +1.855.SW.LIBRE
Email, IM, VOIP: [email protected]           
VCard: http://bfccomputing.com/vcard/bill.vcf
Social networks: bill_mcgonigle/bill.mcgonigle
