Hi all
I apologise in advance if I'm missing the point here; I didn't have a
huge amount of time to read it through and I missed the discussions at
LISA.
The difficulties I see with this approach are:
1) scalability in complexity
Presumably some kind of breadth-first search operates between the
current state and the target state, trying some proportion of the
available operations at each step forward. The critical parameters, I
would think, are the number of operations available at each step and
the number of steps needed to solve the problem, and time to solve will
increase dramatically with both. Even if the plans are likely to be
quite short, merely having more operations to try will massively
increase run time: assuming we hit no failing states and every option
can be attempted at each step, we will explore n^m nodes, where n is
the number of options and m the length of the minimal plan. That seems
big and worrisome for a configuration tool of any degree of complexity.
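To make the blow-up concrete, here's a back-of-the-envelope sketch (the function and the operation counts are mine, purely illustrative):

```python
# Toy illustration of the n^m blow-up: count the nodes explored by a
# breadth-first planner that tries every one of n operations at each of
# up to m steps, with no pruning and no failing states.
def nodes_explored(n_ops, plan_len):
    # 1 root + n + n^2 + ... + n^m nodes in the search tree
    return sum(n_ops ** depth for depth in range(plan_len + 1))

# Even modest numbers grow quickly:
for n, m in [(5, 3), (20, 3), (20, 6)]:
    print(f"{n} ops, plan length {m}: {nodes_explored(n, m)} nodes")
```

Going from 5 operations to 20 at plan length 3 already costs a factor of ~50; doubling the plan length on top of that costs a further factor of several thousand.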
2) scalability in network size
Ignoring the question of the complexity of the operators, the hardest
thing about a scheme like this isn't posing the question in PROLOG (or
things that look suspiciously like they might be PROLOG in disguise),
it's building a bridge between the real world and the model world in
terms of:
(a) gathering enough information about the real state of the network in
a central place to be able to make meaningful decisions.
(b) effecting a lengthy sequence of changes without underlying changes
in the hardware invalidating the plan before it can complete.
Equivalently, generating and executing a plan quickly enough that the
data it was generated from is still current, and knowing which data can
change without invalidating the plan. This is obvious by hand, of
course, but an automated system needs to distinguish between changes
that alter the solution (a better solution now exists) and changes that
invalidate it (the solution you have is no longer feasible). I suspect
this adds a lot of complexity somewhere.
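One way to see the distinction: a change invalidates the current plan only if it breaks the precondition of a step not yet executed. A hypothetical sketch (the fact encoding is mine, not from any real tool):

```python
# Hypothetical sketch: replay the remaining plan symbolically against
# the current state. A world change that merely makes a better plan
# possible leaves this check passing; a change that breaks a pending
# precondition fails it.

def plan_still_feasible(plan, state):
    """plan: list of (preconditions, (added_facts, deleted_facts)),
    each a set of fact strings; state: the facts currently true."""
    facts = set(state)
    for precond, (adds, dels) in plan:
        if not precond <= facts:
            return False  # a remaining step's precondition has broken
        facts = (facts | adds) - dels
    return True

# one remaining step: idle tuner B1, which needs C59 recorded elsewhere
plan = [({"recording C59 D4"}, ({"idle B1"}, {"recording C59 B1"}))]
print(plan_still_feasible(plan, {"recording C59 D4", "recording C59 B1"}))  # True
print(plan_still_feasible(plan, {"recording C59 B1"}))                      # False
```

Simple enough in a toy, but doing this continuously against live monitoring data, for every in-flight plan, is where I suspect the complexity lands.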
(c) monitoring and overseeing the execution of such a plan, given all
of the previous caveats, plus the great difficulty of knowing when any
step of the plan has completed, and knowing which stages of the plan
are interdependent. For example, presumably idling two tuners on the
same machine can be done in parallel. In this example it probably
happens almost instantly, so you don't care; but if tuners took 30
minutes each to idle and power down, then finding the parallelism would
become an issue. In general, this kind of serial plan puts you at a
great disadvantage in the race to implement the plan before it is
invalidated. Of course, generating a parallel plan is clearly possible
using the same approach, but again I suspect this adds yet more
complexity somewhere.
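For what it's worth, recovering the parallelism from a serial plan amounts to levelling the step-dependency graph. A sketch (the dependency map is my hand-built reading of Andrew's example plan, not planner output):

```python
# Sketch of recovering parallelism from step dependencies: steps in the
# same "wave" can run concurrently, since each wave depends only on
# steps in earlier waves.
from collections import defaultdict

def waves(deps):
    """deps: step -> list of steps it must wait for.
    Groups steps into waves by longest dependency chain."""
    level = {}
    def depth(step):
        if step not in level:
            level[step] = 1 + max((depth(d) for d in deps[step]), default=0)
        return level[step]
    groups = defaultdict(list)
    for step in deps:
        groups[depth(step)].append(step)
    return [sorted(groups[k]) for k in sorted(groups)]

deps = {
    "record C12 on A1": [],
    "record C7 on A4": [],
    "idle B4": ["record C12 on A1"],   # must wait for the replacement recording
    "idle B2": ["record C7 on A4"],
    "boot nodeB": ["idle B4", "idle B2"],
}
```

On this reading the seven-step serial plan collapses to three waves, which is exactly the win you'd want if each step took 30 minutes.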
So I suppose what I think is that there aren't really any traps in this
application: you have a small number of tuners and nodes, requiring a
small plan to shift between desired states, and the time to generate a
plan versus the incidence of failure in the system looks attractive. I
don't think this approach can be made workable in the general space of
configuration tools though (sorry if this seems excessively negative!),
because the scheme is inherently centralised and I don't believe large
and complex plans can be made and executed fast enough from a central
location to still be relevant when the plan completes, even if the huge
gathering and dispersal effort involved can be borne.
When I have approached this problem I have usually favoured:
1) weak solvers that explore a limited range of options, but can do so
very quickly, and also provide a high degree of parallelism in the
resulting plans (trying to win the race involved in centralisation -
you can make the odds more attractive still by creating tree structures
of nodes rather than centralising across everything... I read some
Microsoft document along these lines at some point).
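By a weak solver I mean something like the following sketch for Andrew's tuner example (all names and the rule itself are mine, purely illustrative):

```python
# Hypothetical "weak solver": one fixed rule per tuner on the target
# node - start the channel on a spare tuner elsewhere, then idle the
# original. No search, no backtracking: linear time, and each channel's
# pair of steps is independent of the others, so they parallelise
# trivially. The price is that it gives up in cases a real planner
# could solve (e.g. by shuffling recordings between other nodes first).

def drain_node(node, recording, idle_tuners):
    """recording: channel -> tuner; idle_tuners: currently idle tuners.
    Tuner names start with their node (e.g. "B2" is on node "B").
    Returns a step list ending in a reboot, or None if out of spares."""
    steps = []
    spares = [t for t in idle_tuners if not t.startswith(node)]
    for ch, tuner in sorted(recording.items()):
        if tuner.startswith(node):
            if not spares:
                return None  # weak: give up rather than search harder
            steps.append(("record", ch, spares.pop()))
            steps.append(("idle", ch, tuner))
    steps.append(("boot", node))
    return steps

plan = drain_node("B", {"C7": "B2", "C12": "B4", "C5": "A1"}, ["A2", "D3"])
```

The point isn't that this is clever - it isn't - but that it answers instantly and its failure mode (return None) is explicit, so a supervising layer can fall back to something heavier only when needed.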
2) heavy distribution and peer-to-peer like behaviour, with no formal
plan, only a specification of a target state, and the ability for nodes
to delegate requirements between each other (possibly like Mark's
promise theory, although I have a limited understanding of it; I think
I want to say what the objective is rather than providing promises that
can be exchanged, but I'm still unclear about how objectives are
encoded in that framework...)
I should stress I haven't provided working demonstrations of either of
these anywhere, this is just the kind of form I believe solutions to
the general problem will take.
Ed
Quoting Andrew Hume <[EMAIL PROTECTED]>:
following the hours of turgid discussion at LISA, i have been dwelling
on the issues of deployment/rollout of service changes. in particular,
how we should think of this as a separate and orthogonal issue
to that of what needs to be done. so i decided to test the separateness
by working on a local (real) problem, which i present in simplified form.
we have 4 nodes, each with 4 tuners. we have a set of 11 channels
we wish to record without loss. in this scenario, how do you reboot a node?
obviously, for each non-idle tuner on that node, you start recording that
channel on another tuner, and then idle the first one. after a while,
you should have all tuners on the desired node idle, at which point,
you can reboot.
now, we could all write simple programs to do this, i suspect.
but i wanted to see if it could be done mechanically, that is, by some
sort of "planning" software. of course, this is part of AI, and therefore
suspect, but i gave it a go. i'll forgo the war stories but eventually
wound up with SGPlan5 (ver 2.1). the problem description, basically
75 lines of static model description and 5 lines of configuration, was
straightforward to generate, and given a random assignment of
channels to tuners and being asked to reboot nodeB, it generated
a correct plan:
0.001: (RECORD C12 TUNERA1) [1]
1.002: (RECORD C7 TUNERA4) [1]
2.003: (IDLE C12 TUNERB4) [1]
3.004: (IDLE C7 TUNERB2) [1]
4.005: (RECORD C59 TUNERD4) [1]
5.006: (IDLE C59 TUNERB1) [1]
6.007: (BOOT NODEB) [1]
apparently, channels 7,12,59 were being recorded on B,
and it went off and started them elsewhere before the reboot.
despite issues around the fragility of the software, i am encouraged.
and the requirement of the channel being recorded elsewhere was done
in a clean way:
(:action IDLE
  :parameters
    (?ch - Channel ?tuner - Tuner)
  :precondition
    (and
      (recording ?ch ?tuner)
      (exists (?t - Tuner)
        (and (not (= ?t ?tuner)) (recording ?ch ?t))
      )
    )
  :effect
    (and (idle ?tuner) (not (recording ?ch ?tuner)))
)
forgive the syntax, but it's fairly straightforward, and i love the way
i can state as a precondition exactly what i want: that the
channel is recorded elsewhere. note that the requirement of
the tuner being on another node is not needed here, but is
included in the definition of BOOT:
(:action BOOT
  :parameters
    (?n - Node)
  :precondition
    (not (exists (?t - Tuner)
      (and (on ?t ?n) (not (idle ?t)))
    ))
any comments? has anyone else tried this approach?
any traps ahead for me?
----
Andrew Hume (best -> Telework) +1 732-886-1886
[EMAIL PROTECTED] (Work) +1 973-360-8651
AT&T Labs - Research; member of USENIX and LOPSA
_______________________________________________
lssconf-discuss mailing list
lssconf-discuss@inf.ed.ac.uk
http://lists.inf.ed.ac.uk/mailman/listinfo/lssconf-discuss