This is also being discussed here:
https://communities.bmc.com/message/463704#463704

We created workflow that monitors the AR System Email Messages form for
emails that have not been sent in 10 minutes and restarts the Email Engine.
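
For anyone who wants to do the same check outside of workflow, the rule can
be sketched in Python. This is only an illustration of the 10-minute test,
not our actual workflow: the function names and the (message ID, create time)
pairs are hypothetical, and reading the unsent records from the AR System
Email Messages form is left out entirely.

```python
from datetime import datetime, timedelta

# Threshold from the description above: unsent for more than 10 minutes.
STALE_AFTER = timedelta(minutes=10)


def find_stale(pending, now, stale_after=STALE_AFTER):
    """Return the IDs of unsent messages older than stale_after.

    pending is an iterable of (message_id, created_at) pairs for messages
    that have not been sent yet. How they are fetched is an assumption
    left to the caller.
    """
    return [msg_id for msg_id, created_at in pending
            if now - created_at > stale_after]


def should_restart(pending, now, stale_after=STALE_AFTER):
    """True when at least one unsent message has waited too long."""
    return bool(find_stale(pending, now, stale_after))
```

A scheduler would call should_restart() every few minutes and restart the
Email Engine when it returns True.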

Jason


On Thu, Aug 28, 2014 at 9:32 AM, William Rentfrow <
[email protected]> wrote:

>
> I've seen this on Linux from time to time as well.  It's not really
> frequent but it does happen.  We're on SUSE Linux running 7.6.04 SP5.
> Another environment is on SUSE with 8.1, and it's happened on both.
>
>
>
> There's not a great way to test it honestly, since when it dies this way
> it doesn't appear to do anything bad.  There's nothing in the log files for
> the monitoring tools to grab.  In fact, a couple of weeks ago this died on
> a Saturday and for some reason no one noticed until Tuesday morning.  Then
> I fixed it....and it sent 200,000+ emails out.  I was **very** popular
> that day....
>
>
>
> We've kicked around a couple of ideas, like writing workflow to notify us
> of this, but the problem there is that everyone wants to get notified by
> email...so....that's not going to work.  It turns out a broken email
> process won't send email either :)
>
>
>
> I think long term the best solution would be for BMC to separate the email
> process completely from the AR server and do a check-in like it does for
> the server group.  Right now in a server group, if email dies but the AR
> server itself stays up, the email process won't hop to another machine.
> It's annoying and completely fixable, but BMC has not yet chosen to do that.
>
>
>
> If it did have a check-in, then armonitor could kill it when it wasn't
> responding, regardless of whether you were in a server group or not.
>
>
>
> Right now we just check it intermittently and hope for the best.
> Fortunately our email volume is high enough that our customers usually
> notice within an hour or two.
>
>
>
> *From:* Action Request System discussion list(ARSList) [mailto:
> [email protected]] *On Behalf Of *Rick Westbrock
> *Sent:* Thursday, August 28, 2014 10:25 AM
> *To:* [email protected]
> *Subject:* E-mail engine not getting POP3 messages (Linux) but not
> logging errors
>
>
>
> Hi all-
>
>
>
> I had an interesting issue today and wondered if someone else had run into
> it before. I am running my e-mail engine (7.1) on a Linux server (RHEL
> 5.10) and using POP3 to get messages from a remote mail server. Normally, if
> there’s a problem, the Email Error form fills up with connection errors, but
> this time it failed to pull down messages for over 24 hours and never
> logged an error.
>
>
>
> I used the emaild.sh script with the stop parameter to kill the process
> and normally it stops it immediately, then a monitoring script sees that it
> isn’t running and starts it up again. However today the stop script
> appeared to hang and after five minutes I finally did a kill -9 on the PID
> to kill the process. The monitoring script started it back up immediately
> with a new PID and it processed the 124 waiting messages via POP3 within 30
> seconds.
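
The manual sequence described above (run the stop script, wait, then kill -9
the PID) could be automated along these lines. This is a rough Python sketch
under assumptions: the stop command and the PID are passed in by the caller,
since the install path of emaild.sh and the way the PID is looked up are not
given here, and the function name is made up for illustration.

```python
import os
import signal
import subprocess


def stop_or_kill(stop_cmd, pid, timeout_s=300, kill=os.kill):
    """Run the stop command; if it hangs past timeout_s, SIGKILL the pid.

    stop_cmd is an argument list such as ["/path/to/emaild.sh", "stop"]
    (hypothetical path). Returns "stopped" when the stop command finished
    in time, "killed" when the process had to be killed, or "already gone"
    when the pid no longer existed.
    """
    try:
        # subprocess.run kills the hung stop command itself on timeout.
        subprocess.run(stop_cmd, timeout=timeout_s, check=False)
        return "stopped"
    except subprocess.TimeoutExpired:
        try:
            # Same effect as `kill -9 <pid>` on the engine process.
            kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            return "already gone"
        return "killed"
```

The default of 300 seconds mirrors the five minutes waited above; the existing
monitoring script would still be the piece that starts the engine again.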
>
>
>
> Any ideas what would cause the engine to hang without logging an error?
> Any suggestions on how to monitor and alert on this situation? To date I
> have just been visually looking at the Inbox via Outlook on my local
> machine to make sure there are no messages waiting (the e-mail engine polls
> every two minutes) but that is obviously not an optimal solution.
> Apparently I forgot to check it yesterday, hence the 24 hour backup of
> messages.
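
One way to replace the visual Inbox check is a small script that logs in to
the same POP3 mailbox and flags messages that stay there across several
checks. The Python sketch below uses the standard poplib module. It assumes
the server accepts POP3 over SSL on port 995 (poplib.POP3 is the plain
variant), that it supports the optional UIDL command, and that the e-mail
engine removes messages once it has fetched them. The names mailbox_uids and
BacklogWatch are made up for this example.

```python
import poplib


def mailbox_uids(host, user, password, port=995):
    """Return the set of unique IDs of messages waiting in the mailbox."""
    conn = poplib.POP3_SSL(host, port, timeout=30)
    try:
        conn.user(user)
        conn.pass_(password)
        _resp, listing, _octets = conn.uidl()
        # Each entry looks like b"1 <unique-id>"; keep only the unique ID.
        return {line.split()[1] for line in listing}
    finally:
        conn.quit()


class BacklogWatch:
    """Flag messages that are still waiting after max_checks checks in a row."""

    def __init__(self, max_checks=3):
        self.max_checks = max_checks
        self.seen = {}  # uid -> consecutive checks in which it was present

    def observe(self, uids):
        """Record one check and return the sorted list of stuck UIDs."""
        self.seen = {uid: self.seen.get(uid, 0) + 1 for uid in uids}
        return sorted(uid for uid, n in self.seen.items()
                      if n >= self.max_checks)
```

With a two-minute engine poll, running observe(mailbox_uids(...)) every two
or three minutes with max_checks=3 would flag a message that has survived
several polls. Some POP3 servers lock the mailbox to one session at a time,
so the check may need to be timed so that it does not collide with the
engine's own poll.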
>
>
>
> Thanks in advance,
>
> Rick
>
>
>
> *_________________________*
>
>
> *Rick Westbrock *AppOps Engineer | IT Department
> 24 Hour Fitness USA, Inc.
>
>
> _ARSlist: "Where the Answers Are" and have been for 20 years_
>

_______________________________________________________________________________
UNSUBSCRIBE or access ARSlist Archives at www.arslist.org
"Where the Answers Are, and have been for 20 years"
