>>>>> "Luke" == Luke Crawford <[EMAIL PROTECTED]> writes:

  Luke> On Mon, 30 Oct 2006, Luke Kanies wrote:
  >> Luke Crawford wrote:
  >>> 
  >>> Most of the places I have worked, most of the 'fires' were caused
  >>> by an admin mistake.
  >>> 
  >>> Even now that my job is to put out other people's fires,
  >>> configuration errors are close to half the problems I encounter.
  >> 
  >> I have one more comment about this:
  >> 
  >> Even for cases like yours where this is true, these errors are
  >> essentially bugs in the system configuration.  Do you attempt to
  >> choose languages that won't allow developers to make bugs, or do
  >> you hire competent programmers who use modern tools and
  >> methodologies (like unit tests and version control) to increase
  >> their quality?

  Luke> Yes; testing is very important.  At the moment I am brute
  Luke> forcing it with virtual servers for the really important
  Luke> stuff; but that is a pain in the ass, and gets skipped for
  Luke> important stuff.

  Luke> what I am currently looking for is a way to keep the configs
  Luke> in cvs or similar, then automatically deploy first to test,
  Luke> then to production.

We actually have a paper in at LISA this year about using this sort of
technique with bcfg2. There are a bunch of corner cases that make
this a little tricky, but there are a ton of nifty things that you can
do (including configuration transactions and workflows) if you do it
right. 
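
As a rough illustration of the test-then-production flow described
above, here is a minimal sketch in Python. It is not bcfg2's interface;
every name in it (staged_deploy, daemon_conf_is_valid, the "port="
check) is hypothetical, and a checked-out working copy is assumed to be
a plain directory of config files.

```python
import shutil
import tempfile
from pathlib import Path


def deploy(source: Path, target: Path) -> None:
    """Copy a checked-out config tree into target (Python 3.8+)."""
    shutil.copytree(source, target, dirs_exist_ok=True)


def staged_deploy(source: Path, test_root: Path, prod_root: Path, validate) -> bool:
    """Deploy to test first; touch production only if validation passes."""
    deploy(source, test_root)
    if not validate(test_root):
        return False  # production is left exactly as it was
    deploy(source, prod_root)
    return True


def daemon_conf_is_valid(root: Path) -> bool:
    """Hypothetical check: daemon.conf must exist and set a port."""
    conf = root / "daemon.conf"
    return conf.is_file() and "port=" in conf.read_text()


if __name__ == "__main__":
    # Stand-in for a version-control checkout, built in a scratch directory.
    base = Path(tempfile.mkdtemp())
    checkout = base / "checkout"
    checkout.mkdir()
    (checkout / "daemon.conf").write_text("port=8080\n")
    ok = staged_deploy(checkout, base / "test", base / "prod", daemon_conf_is_valid)
    print("promoted to production:", ok)
```

The point of the sketch is only the ordering: the production tree is
never written unless the same files have already passed a check on
test. A real validation step would exercise the daemon itself rather
than grep the file.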

  Luke> even with modern languages; you don't edit code on the
  Luke> production server. (well, I've seen it done, but everyone
  Luke> knows it's bad; this is not so much the case with System
  Luke> Administration)

I would argue that there are some cases where this is needed, and that
tools need to cope with it gracefully. Everyone has run into the case
where a daemon stops behaving properly, due to some latent problem in
how things were configured. It wasn't a problem before, but is now,
and random debugging on the system exhibiting the problem is the only
way to nail it down and fix it. If there are any repercussions to
downtime, it is imperative that this process be as short as
possible. Regardless of how fast a config tool is, it _will_ be slower
than changing /etc/daemon.conf and kicking the server. 

I think that we focus too much on configuration goals and ignore the
administrative process more than we should...
 -nld

_______________________________________________
lssconf-discuss mailing list
[email protected]
http://lists.inf.ed.ac.uk/mailman/listinfo/lssconf-discuss
