Austin wrote:
We need an update. It has been 6 hours.

Here's everything I know about the situation so far...

PROBLEM SUMMARY
===============
Two power circuits under the floor went offline, and all power supplies
connected to them lost power. Normally a single circuit going offline
would not cause a problem because of our Power Layout (detailed below for
those who are interested). In this case, the two circuits that went
offline serve adjoining racks, so multiple systems in those two racks
were affected.


OVERALL IMPACT
==============
Due to the wide array of services affected, I will keep it brief for
each system. All of the outages listed began at the same time (4:14PM EDT).

OpenSRS : Offline, including AWI, RWI, API, Batch, Whois, Mailer, due
to Oracle Database being offline

OpenHRS : Offline, including AWI, RWI, API, Whois, Mailer, due to
Oracle Database being offline

Tulips : Offline, including AWI, RWI, API, EMail Admin., due to Oracle
Database being offline

EMail : Offline, including POP, IMAP, Webmail, API, due to all 3
frontends losing power

EMail Defense : Degraded (Web Portal unavailable), due to Postgres
Database being offline

Also lost power to 6 of 16 frontend servers - no problems with
mail delays reported

Tucows Main Website (www.tucows.com) : Heavily degraded due to 2 out
of 3 servers losing power

Managed DNS : Core DNS was functional but URL redirects (essential
component) offline, due to Oracle Database being offline

Digital Certificates : Digital Certificates operational, Provisioning
& management offline, due to Oracle Database being offline

Website Builder Service : Builder available, Provisioning &
management offline, due to Oracle Database being offline


CURRENT STATUS
==============
The Oracle database which services OpenSRS, OpenHRS, Tulips & Managed
DNS is still offline due to hardware/systemic issues when trying to
start it back up. We are working to restore it asap. Everything else is
back online and was restored at about 7:45PM EDT.


SYSTEM STATUS PAGE
==================
There are some deficiencies with the System Status page which have
resulted in incorrect reporting of the status of Blogware (showing as
Offline). This was fixed earlier tonight via the database, but a
scheduled process appears to have changed it back so that it displays
incorrectly again. To be clear, Blogware has remained online for the
duration of these events.


POWER LAYOUT
============
Each rack has a single power circuit under the floor. Each rack houses
(2) RPCs (Power Units with 8 ports for 8 power supplies). The first
RPC in each rack is connected to the power circuit underneath the
rack. The second RPC is connected to a second circuit underneath an
adjoining rack. Each server has dual power supplies and is connected
to RPC#1 & RPC#2. This effectively results in fully redundant power to
each individual server.
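
For those who want to see why two adjoining circuits failing together
defeats this layout, here is a rough Python sketch. It is only an
illustration and not anything we actually run: the rack names, the
pairing of racks (A with B, C with D) and the function name are all
made up for the example. It assumes each rack's RPC#1 is on its own
circuit and its RPC#2 is on the circuit of the rack it is paired with.

```python
def racks_without_power(pairs, failed_circuits):
    """Return the racks where both the RPC#1 and RPC#2 feeds are down.

    pairs maps a rack name to the adjoining rack whose circuit feeds
    its RPC#2. Each circuit is named after the rack it sits under.
    """
    failed = set(failed_circuits)
    dark = []
    for rack, neighbour in pairs.items():
        # RPC#1 uses the rack's own circuit, RPC#2 the adjoining rack's.
        if rack in failed and neighbour in failed:
            dark.append(rack)
    return sorted(dark)


# Hypothetical layout: racks A/B and C/D cross-feed each other.
feeds = {"A": "B", "B": "A", "C": "D", "D": "C"}

print(racks_without_power(feeds, ["A"]))       # one circuit down
print(racks_without_power(feeds, ["A", "B"]))  # two adjoining circuits down
```

With one circuit down no rack goes dark, since every server still has
its other feed. With the circuits under two paired racks down, both
racks lose both feeds, which matches what is described in the Problem
Summary.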

A further update will be provided asap regarding the Oracle database
when there is something to report. Please be patient as we are nearing
the end of this nightmare.

--
_________________________________________________
Joey deVilla - Tucows, Inc. - [EMAIL PROTECTED]
TC/DC (Technical Community Development Coordinator)
"Nerdy Deeds Done Dirt Cheap"
_______________________________________________
domains-gen mailing list
[email protected]
http://discuss.tucows.com/mailman/listinfo/domains-gen
