Hi all

I apologise in advance if I'm missing the point here; I didn't have a huge amount of time to read it through and I missed the discussions at LISA.

The difficulties I see with this approach are:

1) scalability in complexity

Presumably some kind of breadth-first search is in operation between the current state and the target state, trying some proportion of the available operations at each step forward. The critical parameters, I would think, are the number of operations available at each step and the number of steps needed to solve the problem; time to solve increases dramatically with both. Even if the plans are likely to be quite short, merely having more operations to try will massively increase run time. Assuming we don't hit any failing states and every option can be attempted at each step, we explore on the order of n^m nodes, where n is the number of options and m the length of the minimal plan. That seems big and worrisome for a configuration tool of any degree of complexity.
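As a rough back-of-the-envelope sketch (Python; the branching factors and plan lengths below are invented for illustration, not measured from any planner):

# Naive forward search: if every one of n applicable operations can be
# tried at each step and nothing fails early, the number of states touched
# while looking for a plan of length m grows like n^m.
def nodes_explored(n_ops: int, plan_length: int) -> int:
    return sum(n_ops ** depth for depth in range(1, plan_length + 1))

for n_ops in (4, 16, 64):          # operations applicable per step (made up)
    for plan_length in (3, 6, 9):  # length of the minimal plan (made up)
        print(f"n={n_ops:2d} m={plan_length} "
              f"nodes={nodes_explored(n_ops, plan_length):,}")

Real planners prune heavily, of course, but that worst case is what worries me.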

2) scalability in network size

Ignoring the question of the complexity of the operators, the hardest thing about a scheme like this isn't posing the question in PROLOG (or things that look suspiciously like they might be PROLOG in disguise); it's building a bridge between the real world and the model world in terms of:

(a) gathering enough information about the real state of the network in a central place to be able to make meaningful decisions.

(b) effecting a lengthy sequence of changes without underlying changes in the hardware invalidating the plan before it can complete. Equivalently: generating and executing a plan quickly enough that the data it was generated from is still current, and knowing which data can change without invalidating the plan. This is obvious by hand, of course, but an automated system needs to distinguish between changes that alter the solution (a better solution now exists) and changes that invalidate it (the solution you have is no longer feasible); see the plan-validity sketch after this list. I suspect this adds a lot of complexity somewhere.

(c) monitoring and overseeing the execution of such a plan, given all of the previous caveats, plus the great difficulty of knowing when any step of the plan has completed and which stages of the plan are interdependent. For example, presumably idling two tuners on the same machine can be done in parallel. In this example it probably happens almost instantly, so you don't care; but what if tuners took 30 minutes each to idle and power down? Then finding parallelism would become an issue, and a purely serial plan puts you at a great disadvantage in the race to implement the plan before it is invalid. Generating a parallel plan is clearly possible using the same approach (the staging sketch after this list shows the kind of dependency analysis involved), but again I suspect this adds yet more complexity somewhere.
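To make the distinction in (b) concrete, here is a minimal plan-validity sketch (Python; the step and precondition shapes are hypothetical and don't come from SGPlan or the PDDL model):

from dataclasses import dataclass, field

@dataclass
class Step:
    name: str
    # predicates over the freshly gathered state, each a callable
    preconditions: list = field(default_factory=list)

def still_feasible(remaining_plan, state):
    # False means the plan is invalidated (a precondition no longer holds)
    # and must be regenerated; True says nothing about whether a *better*
    # plan now exists, which is the other, cheaper-to-ignore case.
    return all(pre(state)
               for step in remaining_plan
               for pre in step.preconditions)

# e.g. (RECORD C12 TUNERA1) relies on tunerA1 still being idle:
plan = [Step("RECORD C12 TUNERA1", [lambda s: s["tunerA1"] == "idle"])]
print(still_feasible(plan, {"tunerA1": "idle"}))       # True  - still valid
print(still_feasible(plan, {"tunerA1": "recording"}))  # False - invalidated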
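And for (c), a staging sketch of the dependency analysis needed to recover parallelism from a serial plan like the one quoted below (Python; the read/write sets per action are invented for illustration rather than derived from the PDDL preconditions and effects):

def parallel_stages(plan):
    # plan: ordered list of (name, reads, writes), reads/writes as sets.
    # Each action lands one stage after the last earlier action it
    # conflicts with (write/write, read-after-write, write-after-read).
    stage_of = []
    for i, (_, reads, writes) in enumerate(plan):
        earlier = [
            stage_of[j]
            for j, (_, r, w) in enumerate(plan[:i])
            if (writes & (r | w)) or (reads & w)
        ]
        stage_of.append(max(earlier, default=-1) + 1)
    stages = [[] for _ in range(max(stage_of, default=-1) + 1)]
    for (name, _, _), s in zip(plan, stage_of):
        stages[s].append(name)
    return stages

# A trimmed version of the quoted plan: each IDLE depends on the RECORD
# that covers its channel, and the BOOT depends on nodeB's tuners idling.
plan = [
    ("RECORD C12 TUNERA1", set(),                  {"tunerA1", "c12@A1"}),
    ("RECORD C7 TUNERA4",  set(),                  {"tunerA4", "c7@A4"}),
    ("IDLE C12 TUNERB4",   {"c12@A1"},             {"tunerB4"}),
    ("IDLE C7 TUNERB2",    {"c7@A4"},              {"tunerB2"}),
    ("BOOT NODEB",         {"tunerB2", "tunerB4"}, {"nodeB"}),
]
print(parallel_stages(plan))
# [['RECORD C12 TUNERA1', 'RECORD C7 TUNERA4'],
#  ['IDLE C12 TUNERB4', 'IDLE C7 TUNERB2'],
#  ['BOOT NODEB']]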

So I suppose what I think is that there aren't really any traps in this particular application. You have a small number of tuners and nodes, requiring a small plan to shift between desired states, and time to generate a plan versus incidence of failure in the system looks attractive. I don't think this approach can be made workable in the general space of configuration tools, though (sorry if this seems excessively negative!), because the scheme is inherently centralised, and I don't believe large and complex plans can be made and executed fast enough from a central location to still be relevant when they complete, even if the huge gathering and dispersal effort involved can be borne.

When I have approached this problem I have usually favoured:

1) weak solvers that explore a limited range of options, but can do so very quickly, and also provide a high degree of parallelism in the resulting plans (trying to win the race involved in centralisation; you can make the odds more attractive still by creating tree structures of nodes rather than centralising across everything - I read some m$ document along these lines at some point). A sketch of what this might look like for Andrew's example follows this list.

2) heavy distribution and peer-to-peer-like behaviour, with no formal plan, only a specification of a target state, and the ability for nodes to delegate requirements to each other (possibly like Mark's promise theory, although I have a limited understanding of it; I think I want to state what the objective is rather than provide promises that can be exchanged, but I'm still unclear about how objectives are encoded in that framework...)
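For flavour, a weak solver for Andrew's reboot example might be as little as the following (Python; the tuner table and function names are entirely hypothetical): a single greedy pass that only ever considers "move this recording to the first idle tuner off the node", so it explores almost nothing, and every (record, idle) pair it emits for different channels can run in parallel.

def drain_node(node, tuners):
    # tuners: dict tuner -> {"node": ..., "channel": ... or None}
    # Returns (moves, blockers): moves are (channel, old_tuner, new_tuner)
    # triples meaning "start recording channel on new_tuner, then idle
    # old_tuner"; blockers are tuners the solver couldn't relocate.
    moves, blockers = [], []
    for tuner, info in tuners.items():
        if info["node"] != node or info["channel"] is None:
            continue
        spare = next((t for t, i in tuners.items()
                      if i["node"] != node and i["channel"] is None), None)
        if spare is None:
            blockers.append(tuner)        # the weak solver just gives up
            continue
        tuners[spare]["channel"] = info["channel"]  # claim it straight away
        moves.append((info["channel"], tuner, spare))
    return moves, blockers

tuners = {
    "tunerA1": {"node": "nodeA", "channel": None},
    "tunerA2": {"node": "nodeA", "channel": None},
    "tunerB1": {"node": "nodeB", "channel": "C59"},
    "tunerB2": {"node": "nodeB", "channel": "C7"},
}
moves, blockers = drain_node("nodeB", tuners)
print(moves)     # [('C59', 'tunerB1', 'tunerA1'), ('C7', 'tunerB2', 'tunerA2')]
print(blockers)  # []  -> nodeB can be rebooted once the moves complete

It will of course miss solutions a real planner would find (ones that need recordings shuffled among the surviving nodes first, say), which is exactly the weak/strong trade I mean.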

I should stress that I haven't provided working demonstrations of either of these anywhere; this is just the kind of form I believe solutions to the general problem will take.

Ed

Quoting Andrew Hume <[EMAIL PROTECTED]>:

        following the hours of turgid discussion at LISA, i have been dwelling
on the issues of deployment/rollout of service changes. in particular,
how we should think of this as a separate and orthogonal issue
to that of what needs to be done. so i decided to test the separateness
by working on a local (real) problem, which i present in simplified form.

        we have 4 nodes, each with 4 tuners. we have a set of 11 channels
we wish to record without loss. in this scenario, how do you reboot a node?
obviously, for each non-idle tuner on that node, you start recording that
channel on another tuner, and then idle the first one. after a while,
you should have all tuners on the desired node idle, at which point,
you can reboot.

        now, we could all write simple programs to do this, i suspect.
but i wanted to see if it could be done mechanically, that is, by some
sort of "planning" software. of course, this is part of AI, and therefore
suspect, but i gave it a go. i'll forgo the war stories but eventually
wound up with SGPlan5 (ver 2.1). the problem description, basically
75 lines of static model description and 5 lines of configuration, was
straightforward to generate, and given a random assignment of
channels to tuners and being asked to reboot nodeB, it generated
a correct plan:

0.001: (RECORD C12 TUNERA1) [1]
1.002: (RECORD C7 TUNERA4) [1]
2.003: (IDLE C12 TUNERB4) [1]
3.004: (IDLE C7 TUNERB2) [1]
4.005: (RECORD C59 TUNERD4) [1]
5.006: (IDLE C59 TUNERB1) [1]
6.007: (BOOT NODEB) [1]

apparently, channels 7,12,59 were being recorded on B,
and it went off and started them elsewhere before the reboot.
despite issues around the fragility of the software, i am encouraged.
and the requirement of the channel being recorded elsewhere was done
in a clean way:

(:action IDLE
        :parameters
                (?ch - Channel ?tuner - Tuner)
        :precondition
                (and
                        (recording ?ch ?tuner)
                        (exists (?t - Tuner)
                                (and (not (= ?t ?tuner)) (recording ?ch ?t))
                        )
                )
        :effect
                (and (idle ?tuner) (not (recording ?ch ?tuner)))
)

forgive the syntax, but it's fairly straightforward, and i love the way
i can state as a precondition exactly what i want: that the
channel is recorded elsewhere. note that the requirement of
the tuner being on another node is not needed here, but is
included in the definition of BOOT:
(:action BOOT
        :parameters
                (?n - Node)
        :precondition
                (not (exists (?t - Tuner)
                        (and (on ?t ?n) (not (idle ?t)))
                ))

any comments? has anyone else tried this approach?
any traps ahead for me?

----
Andrew Hume  (best -> Telework) +1 732-886-1886
[EMAIL PROTECTED]  (Work) +1 973-360-8651
AT&T Labs - Research; member of USENIX and LOPSA

_______________________________________________
lssconf-discuss mailing list
lssconf-discuss@inf.ed.ac.uk
http://lists.inf.ed.ac.uk/mailman/listinfo/lssconf-discuss


