My limited experience has been that when you first set up, you yourself (or whoever Ops is) have to fly out there and spend quality time getting it all actually installed at the remote site no matter what. Hiring new guys to do the initial setup and troubleshooting will just cost more, take longer, and end up being even more of a headache than flying out there yourself. Plus, if you do it yourself, you don't have to wonder whether the job was done right from the get-go -- you'll know. It's probably obvious to say this, but if you have the time and the resources -- this is what I did -- stage all the new core equipment in your QA (or what-have-you) environment and test DC failover locally before even going out there. That saves the most time. One more thing: especially if your colo is not a telco, get someone from the colo to check your T1s and confirm they're all provisioned and actually live before you arrive -- I've been stalled a day or more by dumb things like that.
CMS (console management systems), which remotely cycle power but, more importantly, give you serial access for LOM (lights-out management) and BIOS manipulation, will let you do pretty much anything short of pushing in a fresh drive. I work a lot with Netfinitys, and I use an APM card IBM makes that does the same thing and daisy-chains to twelve x330s at a time. There are also ethernet/IP-based KVM appliances that will let you do the same thing, but they're really expensive so I've never used them; if you have the money, though, I'd look into that.
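For a rough idea of what that kind of remote power/console access looks like in practice, here's a hedged sketch expressed as modern IPMI (ipmitool) invocations rather than the 2003-era APM/Netfinity vendor tools; the BMC hostname and account are made up, and the functions only build the command lines (nothing is executed) so you can review them first.

```python
# Hypothetical sketch: remote power-cycling and serial console access
# via IPMI (ipmitool). Hostnames and the "admin" account are assumptions;
# the APM cards mentioned above used IBM's own tooling instead.

def ipmi_cmd(bmc_host, user, *action):
    """Build (but do not run) an ipmitool command line for the given BMC."""
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, *action]

def power_cycle(bmc_host, user="admin"):
    # remote power cycle, no hands needed at the colo
    return ipmi_cmd(bmc_host, user, "chassis", "power", "cycle")

def serial_console(bmc_host, user="admin"):
    # serial-over-LAN: the lights-out / BIOS-level access described above
    return ipmi_cmd(bmc_host, user, "sol", "activate")

# usage: hand the list to subprocess.run() once you've reviewed it
print(" ".join(power_cycle("web01-bmc")))
```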
Lastly, it's always cheaper to have an independent local tech on call outside the colo to fix the problems you can't get to, though as you might expect, response time is not as good as in-house. Colo support can be weird; for instance, your basic contract may allow an in-house tech to power-cycle the machine but not replace a drive ('one finger' = $, 'five fingers' = $$$$$, etc.) -- unless you pay biiig bucks. If hardware failure is the main concern: when we first got our machines, I had support contracts with IBM, NetApp, and Cisco such that as soon as a hardware failure was detected, an email went out from the box (or the monitoring box) to said company, and a company tech was there in 4 hours to install the replacement, without me having to lift a finger. That ran out, but it was pretty nice while it lasted. As long as everything is nice and redundant (multi-proc, RAIDs, teamed adapters, redundant switches/T1s, et al.), a 4-hour turnaround or longer is usually tolerable. For non-RAIDed systems I just keep several spare drives on-site, updated regularly and ready for some human to push in, plus a couple of spare boxed CPUs, RAM, blah blah.

So I would fly Ops out there for the first week or as long as it takes, test failover, then contract a cheap local tech, and keep the main Ops guys centralized on one coast. I think you get more done overall in a face-to-face environment anyway. Even if things get really bad, the local tech can almost always do enough to triage the situation, and then it's still cheaper to fly one of your more experienced crew out than to bankroll a pricey annual colo support contract. This also makes it easier to move your racks elsewhere in the future should the need arise (and it does, with everyone filing for bankruptcy -- that's why I'm getting ready to move our DC to San Antonio right now), without uprooting anyone at the remote site. Then again, if price is no object, maybe that's not a concern either.
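The "box emails the vendor on failure" setup above can be sketched roughly like this; the log format, failure markers, and addresses are all made up for illustration (a real setup used the vendor's own agent, e.g. IBM Director or NetApp autosupport), and this only builds the alert message rather than handing it to sendmail/SMTP.

```python
# Hypothetical sketch: scan log lines for hardware-failure events and
# build the vendor-alert email. Markers, hostnames, and addresses are
# assumptions, not any vendor's actual format.
from email.message import EmailMessage

FAILURE_MARKERS = ("raid: disk failed", "scsi error", "ecc uncorrectable")

def build_alert(hostname, log_lines, to_addr="support@vendor.example"):
    hits = [l for l in log_lines if any(m in l.lower() for m in FAILURE_MARKERS)]
    if not hits:
        return None  # nothing failed, no email
    msg = EmailMessage()
    msg["Subject"] = f"HW FAILURE on {hostname}: {len(hits)} event(s)"
    msg["To"] = to_addr
    msg.set_content("\n".join(hits))
    return msg

# usage: in a real setup you'd feed this from syslog and pass the
# result to smtplib or the local sendmail
log = ["Jun 24 13:45:01 web01 kernel: RAID: disk failed on channel 0 id 3"]
alert = build_alert("web01", log)
```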
-d
At 01:45 PM 6/24/2003 -0700, you wrote:
Greetings.
I'm wrestling with a decision and I'm hoping you guys can provide me with some relevant experience. I'm assuming some of you have worked at places with lots of servers in colos. If you haven't, you don't need to read the rest of this.
We are going to be moving to a bicoastal datacenter setup. The decision we need to make is whether to staff bicoastally as well.
We currently have a DC (colocated) in Santa Clara with about 100 machines in it. We'll be building a similar setup in an east coast datacenter.
There are currently 3 Network Operations folks who watch over the system, but the security guy and the director of netops are also in the oncall rotation. Plans are to hire 3 more netops folks by the time the second DC is live. The question is: do we keep this new staff here on the west coast? Or do we hire on the east coast?
I'm interested in hearing any experiences of small ops organizations (less than 15 people) supporting bicoastal datacenters. How were they organized? Did it work?
I'm also interested in experiences with larger organizations, and in your opinions on the matter even if you have no relevant experience.
Thanks.
___________________________________________________________________
P a u l [EMAIL PROTECTED]
_______________________________________________ Bits mailing list [EMAIL PROTECTED] http://www.sugoi.org/mailman/listinfo/bits
