[Warning for the faint of heart; some plugging of bcfg2 occurs in this mail.]

>>>>> "Luke" == Luke Crawford <[EMAIL PROTECTED]> writes:

  Luke> First off, I'm a "practical" type- a "computer janitor" as it
  Luke> were.  I am also a consultant and an Entrepreneur, so I see my
  Luke> job as eliminating my job- I'm quite interested in
  Luke> configuration management systems, but on a conceptual level, I
  Luke> simply don't understand how they would usefully work.

Do you mean tools at all, or more researchy systems like autonomics
and the like?

  Luke> Now, I see you are speaking of validation tools; This is
  Luke> something I understand quite well, and have implemented
  Luke> (perhaps in a more 'bottom up' than 'top down' fashion than
  Luke> the theory types would have) using Nagios; The idea being that
  Luke> those who consume my services should not need to know my phone
  Luke> number, so my policy is to run an external nagios server to
  Luke> monitor every service I provide to other people.  As I am
  Luke> checking from an external perspective (sending mail through a
  Luke> mail system, or retrieving a html page and comparing
  Luke> known-good bits) the nagios-with-plugin system can catch just
  Luke> about any configuration error the customer can.
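The external-check approach Luke describes fits the standard Nagios plugin model: a small script, run from a separate monitoring host, that exercises the service the way a customer would and reports through the conventional exit codes. A minimal sketch (the URL and expected string are placeholders, not anyone's real config):

```python
import urllib.request

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def evaluate(body, expected):
    """Compare a fetched page against known-good bits."""
    if expected in body:
        return OK, "OK - content matches known-good bits"
    return CRITICAL, "CRITICAL - known-good content missing"

def check_http_content(url, expected, timeout=10):
    """Fetch a page from an external vantage point and evaluate it,
    treating any fetch failure as a critical state."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except Exception as exc:
        return CRITICAL, f"CRITICAL - fetch failed: {exc}"
    return evaluate(body, expected)

# Usage from the monitoring host: print the message and exit with the
# status code so Nagios can interpret the result.
```

Because the check runs from the customer's perspective, it catches the same class of failures a customer would see, regardless of which configuration change caused them.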

  Luke> Now, the problem with this is that it catches the error after,
  Luke> not before the error hits production; as I'm doing quite a lot
  Luke> of work with Xen-based paravirtualized servers, I'm thinking
  Luke> about simply running a full test environment with a duplicate
  Luke> of every real server within my virtual environment; then use
  Luke> some tool (perhaps systemimager? maybe systemimager with
  Luke> service-specific scripts to gracefully reload modified
  Luke> servers?)  to copy configs from test to production after the
  Luke> test passes validation.

We've done a bit of work using staging to catch errors before
installation. Basically, we could tag new configuration bits as being
in testing; the testing systems would then get them before the rest of
the systems, so that errors could be freely encountered and
repaired. 
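The resolution logic behind that staging scheme is roughly: each configuration entry can exist in several revisions, revisions marked as testing are served only to hosts in the testing group, and everyone else stays on the last stable revision. A toy sketch of the idea (the field names are invented for illustration, not bcfg2's actual syntax):

```python
# Each config entry has one or more revisions; revisions tagged
# "testing" are visible only to hosts in the testing group.
ENTRIES = {
    "/etc/motd": [
        {"rev": 1, "stage": "stable", "data": "welcome v1"},
        {"rev": 2, "stage": "testing", "data": "welcome v2"},
    ],
}

def resolve(path, host_in_testing):
    """Pick the revision of an entry that a given host should receive."""
    revisions = ENTRIES[path]
    if not host_in_testing:
        # Production hosts only ever see stable revisions.
        revisions = [r for r in revisions if r["stage"] == "stable"]
    return max(revisions, key=lambda r: r["rev"])
```

Once a testing revision has survived on the testing hosts, promoting it is just a matter of flipping its stage to stable.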

Basically, what you are describing here is a set of software
engineering/testing methodologies. Our paper at LISA this year
describes a set of slick ways to integrate timeline and versioning
data into a configuration management specification, and the things you
can do with that information once you have it (all implemented with
bcfg2, of course). I would suggest taking a look at it once it comes
out. 

We have done a lot of this sort of server replication now that our
specification is complete. We have found that using the configuration
management system to rebuild a system (after a system disk failure or
the like) is frequently faster and easier than going to backups.
Producing multiple instances of the same service in less tense
situations is a breeze. I would strongly suggest you look at bcfg2 for
this sort of thing. 

(We actually use bcfg2 on top of SystemImager; SI is used for basic
system builds, but all differentiation and updates are done through
bcfg2.)

<snip>

  Luke> Me, I'm on the list because I'm interested in configuration
  Luke> management; but frankly, my brain is too small to comprehend
  Luke> how you might go about replacing my configuration management
  Luke> duties with a program.  I like the idea, I just don't know how
  Luke> you would do any better than a systemimager style "base image
  Luke> for each class of machine, then per-box lists of diffs to
  Luke> apply"

This is a good illustration of the communication gap between research
and practitioners. Bcfg2 does a great job of managing a symbolic
configuration for a system in a similar fashion, while allowing much
more fine-grained control over things. Ping me off-list for more
details, if you are interested. 
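The difference between the two models can be pictured simply: instead of a binary base image plus per-box diffs, a symbolic configuration describes each system as structured data (packages, services, config entries) composed from a base plus per-class additions. A toy illustration, with class and package names invented for the example:

```python
# Base state every machine shares.
BASE = {"packages": {"openssh", "ntp"}, "services": {"sshd"}}

# Per-class additions, analogous to SystemImager's "class of machine".
CLASSES = {
    "webserver": {"packages": {"apache"}, "services": {"httpd"}},
    "mailserver": {"packages": {"postfix"}, "services": {"postfix"}},
}

def build_spec(host_classes):
    """Compose a host's desired state from the base plus its classes."""
    spec = {
        "packages": set(BASE["packages"]),
        "services": set(BASE["services"]),
    }
    for cls in host_classes:
        spec["packages"] |= CLASSES[cls]["packages"]
        spec["services"] |= CLASSES[cls]["services"]
    return spec
```

Because the specification stays symbolic, individual entries can be inspected, versioned, and overridden per host, which is where the fine-grained control comes from.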

  Luke> The basic problem that *I* would like the theory people to
  Luke> solve is how to break down the "configure the system"
  Luke> high-level problem into an easy-to-understand set of tools like
  Luke> nagios that people like me can come in and configure on a low
  Luke> level for each one of our services.  Heck, you don't even need
  Luke> to write the actual tools, just describe what the tools need
  Luke> to do (of course, that's what writing the tools would do,
  Luke> right?  computer languages are designed to precisely specify
  Luke> what a program ought to do.)

I think this also illustrates an interesting point: the theory folks
aren't all that interested in problems at that level; they are a
little past it. Keep in mind that most are working in fields like
constraint solvers, autonomics, etc., and need to publish papers to
maintain good academic standing. Until the tools provide a
conduit between researchers and their ready users, we will have this
disconnect. 
 -nld
_______________________________________________
lssconf-discuss mailing list
lssconf-discuss@inf.ed.ac.uk
http://lists.inf.ed.ac.uk/mailman/listinfo/lssconf-discuss
