Hi

We have a small shell script (runs on Solaris/Linux) which uses wget (or curl) to log into the mid-tier, display the home page and then log out. Both output pages are stored in separate files. Wget is set to time out after 15 seconds, and the script runs every couple of minutes.

The home page displays the quick link list, which exercises all of the system components including mid-tier, arserver and database.

If the second file isn't there, something's wrong and the admins get mailed, paged, etc.

This is Solaris, so the script runs as a cron job.
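For anyone who would rather not script wget directly, here is a minimal Python sketch of the same idea. It is not the script described above: it skips the login and logout steps and only fetches one page with a 15-second timeout, alerting if nothing comes back. The names `fetch` and `probe` and the `alert` callback are hypothetical, chosen for illustration.

```python
import urllib.request
import urllib.error

TIMEOUT_SECONDS = 15  # same timeout as the wget setting mentioned above


def fetch(url, timeout=TIMEOUT_SECONDS, opener=urllib.request.urlopen):
    """Return the response body, or None if the request fails or times out."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.read()
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, and timeouts all end up here.
        return None


def probe(home_url, alert, fetcher=fetch):
    """Fetch the home page; call alert(message) and return False if it is missing."""
    body = fetcher(home_url)
    if not body:
        alert("no home page returned from " + home_url)
        return False
    return True
```

From cron, something like `probe(url, send_mail)` every couple of minutes would do, where `url` is your mid-tier home page and `send_mail` is whatever mailer or pager hook you already have (both are placeholders here).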

I'm not sure it's a good idea to use Remedy to monitor itself. You'd need something to run the test and something else to check for missing results, and that would be both messy and obscure. Just for fun, try passing the same DSO record back and forth every minute. If your copy is older than a couple of minutes, the other server needs attention :-)
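The staleness test behind that DSO heartbeat idea is just a timestamp comparison. A rough sketch, assuming you can read the last-modified time of your copy of the record (how you get it is up to you; `peer_needs_attention` and the two-minute threshold are illustrative choices):

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(minutes=2)  # "a couple of minutes"; tune to taste


def peer_needs_attention(last_received, now, max_age=MAX_AGE):
    """True if our copy of the heartbeat record is older than max_age."""
    return now - last_received > max_age


# Example: a record last updated five minutes ago counts as stale.
now = datetime(2010, 1, 7, 8, 0, 0)
stale = peer_needs_attention(now - timedelta(minutes=5), now)
```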

Doug

--
Doug Blair
Sent from my iPhone, typographic errors likely
+1-224-558-5462

On Jan 7, 2010, at 7:50 AM, "Shellman, David" <[email protected] > wrote:

There is a set length of time that the app server will attempt to reconnect to the DB instance. If it has not re-established the connection after that length of time, the system is left in a state of limbo. You can add a variable (Db-Connection-Retries) in the conf file that will increase the length of time that the app server will attempt the database connection.

Dave
From: Action Request System discussion list(ARSList) [mailto:[email protected]] On Behalf Of Jason Miller
Sent: Wednesday, January 06, 2010 6:09 PM
To: [email protected]
Subject: Re: ARServer Status monitor

Ah, I used to have a 7.1 server that would do that. The app and db servers were plugged into the same darn switch but they would lose connection. Fortunately it was short enough that ARS usually recovered with no issue. I am surprised that a restart of Portmapper would resolve the issue.

Do you have DSO? You could check availability by transferring records. If not, you could create a web service that does a simple query against a small form on prod and then have an Escalation on dev that would call the web service. If there is an error you could have dev perform whatever action you wish. This would include the Mid-Tier as a point of failure, but maybe you want to make sure that MT is responding at the same time?

Another idea would be to call runmacro from an Escalation on the dev server to try to export a small set of records from prod. You can then either parse the return for error messages or, if no file was created, assume that the server is down.
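The check after the export could look roughly like the sketch below. It does not call runmacro itself; it only looks at whatever output text and file the export left behind. The `ARERR` marker is an assumption based on the error text quoted elsewhere in this thread, and treating an empty file as a failure is also an assumption.

```python
import os


def export_succeeded(export_path, command_output, error_marker="ARERR"):
    """Decide whether the export worked, from its output text and its file."""
    # Any error marker in the returned text counts as a failure.
    if error_marker in command_output:
        return False
    # No file (or an empty one) means we assume the server is down.
    return os.path.isfile(export_path) and os.path.getsize(export_path) > 0
```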

Look at the sc (Service Control) command to start/stop remote services.

HTH,
Jason

On Wed, Jan 6, 2010 at 1:42 PM, Reiser, John J <[email protected]> wrote:
Jason,



The Service looked like it was running but there were entries in the error.log file that pointed to a dropped network connection or DB unavailable.

General network error. Check your network documentation. (SQL Server 11)

SQL Server does not exist or access denied. (SQL Server 17)

Cannot open database "ARSystem" requested by the login. The login failed. (SQL Server 4060)



Then all night long the error.log was filled with

Dispatch : Timeout during database query -- consider using more specific search criteria to narrow the results, and retry the operation (<servername removed>) ARERR - 94

Distrib : Timeout during database query -- consider using more specific search criteria to narrow the results, and retry the operation (ARERR 94)



This happened every 6 minutes until they called me at 6 AM and I kick started the Portmapper.
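A watcher could also count these entries in error.log. A small sketch, assuming the two spellings quoted above ("ARERR - 94" and "ARERR 94") are the only variants; `count_arerr` is a hypothetical helper, not part of any Remedy tooling:

```python
import re

# Matches "ARERR 94" as well as "ARERR - 94" and captures the error number.
ARERR_PATTERN = re.compile(r"ARERR\s*-?\s*(\d+)")


def count_arerr(lines, code=94):
    """Count log lines that mention the given ARERR number."""
    total = 0
    for line in lines:
        match = ARERR_PATTERN.search(line)
        if match and int(match.group(1)) == code:
            total += 1
    return total
```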

Thanks,



---
John J. Reiser
Senior Software Development Analyst
Remedy Administrator/Developer
Lockheed Martin - MS2
The star that burns twice as bright burns half as long.
Pay close attention and be illuminated by its brilliance. - paraphrased by me

From: Action Request System discussion list(ARSList) [mailto:[email protected]] On Behalf Of Jason Miller
Sent: Wednesday, January 06, 2010 3:54 PM
To: [email protected]
Subject: Re: ARServer Status monitor



Hi John,



Is ARS actually down on production, or is it just not allowing connections because Portmapper is unhappy?

Jason
On Wed, Jan 6, 2010 at 10:10 AM, Reiser, John J <[email protected] > wrote:


Hello Listers,



ARServer 7.1 Patch 4

MS SQL Server 2005



Has anyone used server A to watch for outages on server B?



Since we do have slow periods overnight, the production server has occasionally been knocked offline, and panic sets in at 5 AM when the day shift comes on.

It always just takes an ARSystem Portmapper restart to get things going again.

I’d like to set up something with the development server, which seems to be unaffected, to periodically ping the production server and perform X if it gets no response.

For example, query a form on production every 30 minutes between midnight and 5 AM.

If no known value comes back, it would do one of the following (in order of preference):

1. Restart the services on Production and send an email to let everyone know that a restart happened.
2. Email the Admin (me).
3. Email the operations team with instructions on restarting the services. (Operations people don’t like to restart things because they get blamed for unrelated breakage.)

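As a rough sketch of that logic, the midnight-to-5-AM window and the order of preference could be expressed as below. Treating the three options as a runtime fallback chain (try the first, fall through to the next if it fails) is one interpretation, not something stated above, and the names `in_watch_window` and `respond` are hypothetical.

```python
from datetime import time


def in_watch_window(t, start=time(0, 0), end=time(5, 0)):
    """True if time-of-day t falls between midnight and 5 AM inclusive."""
    return start <= t <= end


def respond(actions):
    """Try (name, callable) pairs in order; return the name of the first
    action that reports success, or None if every one fails."""
    for name, action in actions:
        try:
            if action():
                return name
        except Exception:
            # A failed action just moves us on to the next preference.
            continue
    return None
```

Each callable here would wrap one of the options (restart, email admin, email operations) and return True when it worked.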
Thanks,





---
John J. Reiser
Senior Software Development Analyst
Remedy Administrator/Developer
Lockheed Martin - MS2
The star that burns twice as bright burns half as long.
Pay close attention and be illuminated by its brilliance. - paraphrased by me



_Platinum Sponsor: [email protected] ARSlist: "Where the Answers Are"_


_______________________________________________________________________________
UNSUBSCRIBE or access ARSlist Archives at www.arslist.org
Platinum Sponsor:[email protected] ARSlist: "Where the Answers Are"