Hi all
I apologise in advance if I'm missing the point here; I didn't have a
huge amount of time to read it through and I missed the discussions at
LISA.
The difficulties I see with this approach are:
1) scalability in complexity
Presumably some kind of breadth-first search operates between the
current state and the target state, trying some proportion of the
available operations at each step forward. The critical parameters, I
would think, are the number of operations available at each step and
the number of steps needed to solve the problem, and time to solve will
increase dramatically with both. Even if the plans are likely to be
quite short, merely having more operations to try will massively
increase run time: assuming we hit no failing states and every option
can be attempted at each step, we will explore n^m nodes, where n is
the number of options and m the length of the minimal plan. That seems
big and worrisome for a configuration tool of any degree of complexity.
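To make the blow-up concrete, here's a back-of-the-envelope sketch (the function and the operation counts are mine, purely illustrative):

```python
# Toy illustration of the n^m blow-up: count the nodes explored by a
# breadth-first planner that tries every one of n operations at each of
# up to m steps, with no pruning and no failing states.
def nodes_explored(n_ops, plan_len):
    # 1 root + n + n^2 + ... + n^m nodes in the search tree
    return sum(n_ops ** depth for depth in range(plan_len + 1))

# Even modest numbers grow quickly:
for n, m in [(5, 3), (20, 3), (20, 6)]:
    print(f"{n} ops, plan length {m}: {nodes_explored(n, m)} nodes")
```

Going from 5 operations to 20 at plan length 3 already costs a factor of ~50; doubling the plan length on top of that costs a further factor of several thousand.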
2) scalability in network size
Ignoring the question of the complexity of the operators, the hardest
thing about a scheme like this isn't posing the question in PROLOG (or
things that look suspiciously like they might be PROLOG in disguise),
it's building a bridge between the real world and the model world in
terms of:
(a) gathering enough information about the real state of the network in
a central place to be able to make meaningful decisions.
(b) effecting a lengthy sequence of changes without underlying changes
in the hardware invalidating the plan before it can complete.
Equivalently, generating and executing a plan quickly enough that the
data it was generated from is still current, and knowing which data can
change without invalidating the plan. This is obvious by hand, of
course, but an automated system needs to distinguish between changes
that alter the solution (a better solution now exists) and changes that
invalidate it (the solution you have is no longer feasible). I suspect
this adds a lot of complexity somewhere.
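One way to see the distinction: a change invalidates the current plan only if it breaks the precondition of a step not yet executed. A hypothetical sketch (the fact encoding is mine, not from any real tool):

```python
# Hypothetical sketch: replay the remaining plan symbolically against
# the current state. A world change that merely makes a better plan
# possible leaves this check passing; a change that breaks a pending
# precondition fails it.

def plan_still_feasible(plan, state):
    """plan: list of (preconditions, (added_facts, deleted_facts)),
    each a set of fact strings; state: the facts currently true."""
    facts = set(state)
    for precond, (adds, dels) in plan:
        if not precond <= facts:
            return False  # a remaining step's precondition has broken
        facts = (facts | adds) - dels
    return True

# one remaining step: idle tuner B1, which needs C59 recorded elsewhere
plan = [({"recording C59 D4"}, ({"idle B1"}, {"recording C59 B1"}))]
print(plan_still_feasible(plan, {"recording C59 D4", "recording C59 B1"}))  # True
print(plan_still_feasible(plan, {"recording C59 B1"}))                      # False
```

Simple enough in a toy, but doing this continuously against live monitoring data, for every in-flight plan, is where I suspect the complexity lands.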
(c) monitoring and overseeing the execution of such a plan, given all
of the previous caveats, plus the great difficulty of knowing when any
step of the plan has completed, and knowing which stages of the plan
are interdependent. For example, presumably idling two tuners on the
same machine can be done in parallel. In this example it probably
happens almost instantly, so you don't care; but if tuners took 30
minutes each to idle and power down, then finding the parallelism would
become an issue. In general, this kind of serial plan puts you at a
great disadvantage in the race to implement the plan before it is
invalidated. Of course, generating a parallel plan is clearly possible
using the same approach, but again I suspect this adds yet more
complexity somewhere.
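For what it's worth, recovering the parallelism from a serial plan amounts to levelling the step-dependency graph. A sketch (the dependency map is my hand-built reading of Andrew's example plan, not planner output):

```python
# Sketch of recovering parallelism from step dependencies: steps in the
# same "wave" can run concurrently, since each wave depends only on
# steps in earlier waves.
from collections import defaultdict

def waves(deps):
    """deps: step -> list of steps it must wait for.
    Groups steps into waves by longest dependency chain."""
    level = {}
    def depth(step):
        if step not in level:
            level[step] = 1 + max((depth(d) for d in deps[step]), default=0)
        return level[step]
    groups = defaultdict(list)
    for step in deps:
        groups[depth(step)].append(step)
    return [sorted(groups[k]) for k in sorted(groups)]

deps = {
    "record C12 on A1": [],
    "record C7 on A4": [],
    "idle B4": ["record C12 on A1"],   # must wait for the replacement recording
    "idle B2": ["record C7 on A4"],
    "boot nodeB": ["idle B4", "idle B2"],
}
```

On this reading the seven-step serial plan collapses to three waves, which is exactly the win you'd want if each step took 30 minutes.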
So I suppose what I think is that there aren't really any traps in this
application: you have a small number of tuners and nodes, requiring a
small plan to shift between desired states, and the time to generate a
plan versus the incidence of failure in the system looks attractive. I
don't think this approach can be made workable in the general space of
configuration tools though (sorry if this seems excessively negative!),
because the scheme is inherently centralised and I don't believe large
and complex plans can be made and executed fast enough from a central
location to still be relevant when the plan completes, even if the huge
gathering and dispersal effort involved can be borne.
When I have approached this problem I have usually favoured:
1) weak solvers that explore a limited range of options, but can do so
very quickly, and also provide a high degree of parallelism in the
resulting plans (trying to win the race involved in centralisation -
you can make the odds more attractive still by creating tree structures
of nodes rather than centralising across everything... I read some
Microsoft document along these lines at some point).
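By a weak solver I mean something like the following sketch for Andrew's tuner example (all names and the rule itself are mine, purely illustrative):

```python
# Hypothetical "weak solver": one fixed rule per tuner on the target
# node - start the channel on a spare tuner elsewhere, then idle the
# original. No search, no backtracking: linear time, and each channel's
# pair of steps is independent of the others, so they parallelise
# trivially. The price is that it gives up in cases a real planner
# could solve (e.g. by shuffling recordings between other nodes first).

def drain_node(node, recording, idle_tuners):
    """recording: channel -> tuner; idle_tuners: currently idle tuners.
    Tuner names start with their node (e.g. "B2" is on node "B").
    Returns a step list ending in a reboot, or None if out of spares."""
    steps = []
    spares = [t for t in idle_tuners if not t.startswith(node)]
    for ch, tuner in sorted(recording.items()):
        if tuner.startswith(node):
            if not spares:
                return None  # weak: give up rather than search harder
            steps.append(("record", ch, spares.pop()))
            steps.append(("idle", ch, tuner))
    steps.append(("boot", node))
    return steps

plan = drain_node("B", {"C7": "B2", "C12": "B4", "C5": "A1"}, ["A2", "D3"])
```

The point isn't that this is clever - it isn't - but that it answers instantly and its failure mode (return None) is explicit, so a supervising layer can fall back to something heavier only when needed.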
2) heavy distribution and peer-to-peer like behaviour, with no formal
plan, only a specification of a target state, and the ability for nodes
to delegate requirements between each other (possibly like Mark's
promise theory, although I have a limited understanding of it; I think
I want to say what the objective is rather than providing promises that
can be exchanged, but I'm still unclear about how objectives are
encoded in that framework...)
I should stress I haven't provided working demonstrations of either of
these anywhere, this is just the kind of form I believe solutions to
the general problem will take.
Ed
Quoting Andrew Hume <[EMAIL PROTECTED]>:
following the hours of turgid discussion at LISA, i have been dwelling
on the issues of deployment/rollout of service changes. in particular,
how we should think of this as a separate and orthogonal issue
to that of what needs to be done. so i decided to test the separateness
by working on a local (real) problem, which i present in simplified form.
we have 4 nodes, each with 4 tuners. we have a set of 11 channels
we wish to record without loss. in this scenario, how do you reboot a node?
obviously, for each non-idle tuner on that node, you start recording that
channel on another tuner, and then idle the first one. after a while,
you should have all tuners on the desired node idle, at which point,
you can reboot.
now, we could all write simple programs to do this, i suspect.
but i wanted to see if it could be done mechanically, that is, by some
sort of "planning" software. of course, this is part of AI, and therefore
suspect, but i gave it a go. i'll forgo the war stories but eventually
wound up with SGPlan5 (ver 2.1). the problem description, basically
75 lines of static model description and 5 lines of configuration, was
straightforward to generate, and given a random assignment of
channels to tuners and being asked to reboot nodeB, it generated
a correct plan:
0.001: (RECORD C12 TUNERA1) [1]
1.002: (RECORD C7 TUNERA4) [1]
2.003: (IDLE C12 TUNERB4) [1]
3.004: (IDLE C7 TUNERB2) [1]
4.005: (RECORD C59 TUNERD4) [1]
5.006: (IDLE C59 TUNERB1) [1]
6.007: (BOOT NODEB) [1]
apparently, channels 7,12,59 were being recorded on B,
and it went off and started them elsewhere before the reboot.
despite issues around the fragility of the software, i am encouraged.
and the requirement of the channel being recorded elsewhere was done
in a clean way:
(:action IDLE
  :parameters
    (?ch - Channel ?tuner - Tuner)
  :precondition
    (and
      (recording ?ch ?tuner)
      (exists (?t - Tuner)
        (and (not (= ?t ?tuner)) (recording ?ch ?t))
      )
    )
  :effect
    (and (idle ?tuner) (not (recording ?ch ?tuner)))
)
forgive the syntax, but it's fairly straightforward, and i love the way
i can state as a precondition exactly what i want: that the
channel is recorded elsewhere. note that the requirement of
the tuner being on another node is not needed here, but is
included in the definition of BOOT:
(:action BOOT
  :parameters
    (?n - Node)
  :precondition
    (not (exists (?t - Tuner)
      (and (on ?t ?n) (not (idle ?t)))
    ))
any comments? has anyone else tried this approach?
any traps ahead for me?
----
Andrew Hume (best -> Telework) +1 732-886-1886
[EMAIL PROTECTED] (Work) +1 973-360-8651
AT&T Labs - Research; member of USENIX and LOPSA
_______________________________________________
lssconf-discuss mailing list
lssconf-discuss@inf.ed.ac.uk
http://lists.inf.ed.ac.uk/mailman/listinfo/lssconf-discuss