I would suggest looking at what types of notifications they're looking for.  
When I've been working on AI jobs, I tend to put failure notifications within 
the AI jobs and transformations themselves.  

For instance, within transformations I use 'Filter rows' to define when 
something is not as intended, dump that information to a file, and then send 
that file through the 'send email' step.  I also use the 'error handling' 
connection (mostly on the AR steps) to write errors to a file and then send 
that file by email via the 'send email' step.  This is probably more "error 
handling" than monitoring, but for me it gives me the ability to be notified of 
errors while letting the AI job continue through without having to error out.

In AI jobs, I use the failure connection to send an email when a specific 
transformation returns an error, so that I am notified when the transformation 
didn't complete correctly.

The only thing the above doesn't cover is the case where an AI job doesn't get 
kicked off via the scheduler.  The only times we have seen this within our 
environment are when the diserver runs out of memory.  We are currently working 
on putting in monitoring and automation to restart the diserver when it reaches 
a certain memory threshold.

The way I identified our memory issue was by seeing the following errors in 
the '\BMC Software\ARSystem\Arserver\Db\arcarte.log' file on the server:

java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: GC overhead limit exceeded

If needed, you can also tie other monitoring into the arcarte.log file to look 
for specific events.
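
As a rough illustration of that kind of log check, here is a minimal Python 
sketch that scans a log for the two errors quoted above.  The function names 
(find_oom_lines, scan_log) are made up for this example, and the log path is 
whatever applies on your server.  Scheduling the script and sending the 
notification are left out; this is only one way it could be done, not 
something we have in place.

```python
import re
import sys

# Matches only the two error messages quoted in this email.
OOM_PATTERN = re.compile(
    r"java\.lang\.OutOfMemoryError: "
    r"(Java heap space|GC overhead limit exceeded)"
)


def find_oom_lines(lines):
    """Return (line_number, text) pairs for lines containing either error."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if OOM_PATTERN.search(line):
            hits.append((number, line.rstrip("\r\n")))
    return hits


def scan_log(path):
    """Scan the log file at `path` (hypothetical helper for this sketch)."""
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with open(path, "r", errors="replace") as handle:
        return find_oom_lines(handle)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_arcarte.py <path to arcarte.log>
    found = scan_log(sys.argv[1])
    for number, text in found:
        print(f"{number}: {text}")
    # A non-zero exit code lets a scheduler or wrapper react to a hit.
    sys.exit(1 if found else 0)
```

A scheduled task could run this against arcarte.log and use the exit code to 
decide whether to send an email or restart the diserver.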
