Hi Chuck and discuss-list:
I guess no one noticed the OpenSRS outage early this morning.
I'm not sure how long it was down, but we confirmed it was definitely
down (RWI web and client script access to rr-n1-tor.opensrs.net) between
6:30am and 7:00am EST. We notified emergency support during this time and
they replied with "We are investigating this issue please await our
response." We still haven't heard back from them and we did not monitor
when the system returned to normal.
Can we get a status report on this? What happened and what is being done
to prevent it in the future? A similar problem occurred in the recent
past. Please do your best to see that it doesn't happen again. I would
suggest a simple script that connects to the system (not just a ping as
the machine was not down--just the RWI and client services were not
responding) and tries logging in or performing a domain lookup--if not
successful, send a page to "emergency support" so that you can jump on
this a bit quicker instead of waiting for an RSP to complain. :)
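
To make the suggestion above concrete, here is a rough sketch of the kind
of check I have in mind. It is only an illustration, not a description of
how OpenSRS monitoring works: the names tcp_probe and run_check are made
up, the alert function stands in for whatever paging system is actually
used, and the port number in the usage comment is an assumption. A real
probe should go further than opening a connection and actually log in or
do a domain lookup, since that is what was failing this morning.

```python
import socket
from typing import Callable


def tcp_probe(host: str, port: int, timeout: float = 10.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout.

    This only proves the port accepts connections. A stronger probe would
    log in or perform a domain lookup and check the reply.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_check(probe: Callable[[], bool],
              alert: Callable[[str], None],
              name: str) -> bool:
    """Run one probe and call alert(message) if it fails or raises."""
    try:
        ok = probe()
        detail = "probe reported failure"
    except Exception as exc:  # a crashing probe counts as a failed check
        ok = False
        detail = f"probe raised {exc!r}"
    if not ok:
        alert(f"{name} is not responding: {detail}")
    return ok


# Example use from cron, every few minutes (port 55000 is an assumption,
# and print stands in for a real pager hook):
#
#   run_check(lambda: tcp_probe("rr-n1-tor.opensrs.net", 55000),
#             print, "OpenSRS client service")
```

Keeping the probe and the alert as separate callables means the same
check can be reused for the RWI (an HTTP request) and the client service
(a login or lookup) by swapping in a different probe function.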