Antonio Terceiro <[email protected]> writes:

> On Mon, Feb 25, 2013 at 12:37:24PM +1300, Michael Hudson-Doyle wrote:
>> Antonio Terceiro <[email protected]> writes:
>> > My thoughts, from a LAVA standpoint.
>> >
>> > This parallelism style is indeed very elegant, but I couldn't think of
>> > how we could take advantage of that in the existing LAVA infrastructure.
>> 
>> Yeah, I guess the LAVA trend has been towards being more
>> device-controlled (lava-test-shell and all that) and that doesn't really
>> fit with the explicit parallelism style.  Oh well.  I'll get over it :-)
>> 
>> > Maybe we could make the dispatcher spawn child dispatchers (one for each
>> > node involved in the test) and wait for all of them to finish.
>> 
>> I think on some level this model makes sense (whether it's subprocesses
>> or threads or the dispatcher does some async stuff doesn't really matter
>> for the mental model IMHO).
>> 
>> > Inside each child dispatcher invocation, there should be a primitive
>> > that says "wait until all my test buddies are ready" so that after
>> > flashing and booting each one can perform its setup steps (i.e. the
>> > stuff we do before actually running tests), and wait for the others
>> > before executing its part in the distributed job. This communication
>> > might be coordinated by the "parent" dispatcher through signals.  I'm
>> > not sure whether this primitive would be a new dispatcher action (and
>> > thus declared in the job description file), or a binary inside the
>> > target (and thus able to be invoked from inside lava-test-shell
>> > test runs), or both.
>> 
>> I think ... perhaps both?  It seems to me that the difference is around
>> rebooting: (currently, anyway) a lava_test_shell action implies a
>> reboot, and one thing a lava_test_shell-invoked script _cannot_ do
>> (well, easily, there are probably hacks) is reboot.  And I can just
>> about imagine tests that might want to do some configuration that requires
>> a reboot to take effect.
>
> A job that requires a reboot could declare the following:
>
>   - deploy image
>   - boot image
>   - lava-test-shell <- setup.yaml
>   - boot image
>   - lava-test-shell <- run.yaml

Yeah, that would work.  It's kind of crummy in that there is a
dependence between the structure of the job file and the repository with
the yaml files in it, but as it's a bit of a special case...
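
To make that coupling concrete, here is a rough sketch of the five-step
job as data.  The action and parameter names ("deploy_image",
"boot_image", "testdef") are made up for illustration and are not the
real job schema; the point is only that the job file has to know that
setup.yaml comes before the second boot and run.yaml after it.

```python
import json

# Hypothetical action and parameter names, for illustration only;
# they are not taken from the real LAVA job schema.
def make_reboot_job(image, setup_def, run_def):
    """Build a job whose action list reboots between setup and run."""
    actions = [
        {"command": "deploy_image", "parameters": {"image": image}},
        {"command": "boot_image"},
        {"command": "lava_test_shell", "parameters": {"testdef": setup_def}},
        # Second boot, so that configuration done by the setup step
        # takes effect before the real test runs.
        {"command": "boot_image"},
        {"command": "lava_test_shell", "parameters": {"testdef": run_def}},
    ]
    return {"job_name": "setup-reboot-run", "actions": actions}


job = make_reboot_job("example.img", "setup.yaml", "run.yaml")
print(json.dumps(job, indent=2))
```

Swapping the two yaml files in the repository without touching the job
would silently break the test, which is the dependence I mean.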

> the run.yaml lava-test-shell definition could then just call a binary
> that implements "wait for buddies".
>
> (this way we don't need a "wait for buddies" dispatcher action, just a
> binary that can be called by the test suite).
>
>> I think we should probably try to write some tests like my simple iperf
>> test and see what API we would like.
>
> Yep.
>
>> Here's a fun problem: devices will need to know the IP addresses of the
>> other devices in the test.  I suppose we could delay starting the
>> lava-test-shell processes on any device until they have all booted and
>> acquired an IP address?  Or we could run some service on the host
>> running the dispatcher that can be queried and informed of IP addresses
>> or something.
>
> the "wait for buddies" action could report the node's IP to the
> dispatcher via a signal. When the dispatcher receives those from all
> nodes, then it sends a list of all IPs to each node. After receiving
> that list, the node can then write a /etc/hosts-like file with the IPs
> of the group that can be read by the test scripts being run.

Ah yeah.  Putting it in /etc/hosts would be a neat trick -- I'd been
thinking anyway that the job file should give names to the nodes it
requests (origin-server-1, origin-server-2, proxy-node, load-gen-1,
load-gen-2...).
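
A minimal sketch of that exchange, assuming named nodes: each node
reports its IP, blocks until every expected node has reported, and then
gets back the full mapping to render as hosts-style lines.  Everything
here is hypothetical (BuddyCoordinator, wait_for_buddies, format_hosts
are invented names), and threads stand in for the real
target-to-dispatcher signalling, which would look quite different.

```python
import threading


class BuddyCoordinator:
    """Stand-in for the parent dispatcher: collects one IP per named node."""

    def __init__(self, node_names):
        self._expected = set(node_names)
        self._ips = {}
        self._cond = threading.Condition()

    def wait_for_buddies(self, name, ip, timeout=None):
        """Report this node's IP, block until every expected node has
        reported, then return the full name -> IP mapping."""
        if name not in self._expected:
            raise KeyError("unknown node: %s" % name)
        with self._cond:
            self._ips[name] = ip
            self._cond.notify_all()
            ready = self._cond.wait_for(
                lambda: set(self._ips) == self._expected, timeout)
            if not ready:
                raise TimeoutError("not all buddies reported in time")
            return dict(self._ips)


def format_hosts(ips):
    """Render the mapping as /etc/hosts-style lines, sorted by node name."""
    return "".join("%s\t%s\n" % (ips[name], name) for name in sorted(ips))


def run_demo():
    """Simulate three nodes, each in its own thread, meeting at the barrier."""
    nodes = {
        "origin-server-1": "10.0.0.11",
        "proxy-node": "10.0.0.12",
        "load-gen-1": "10.0.0.13",
    }
    coordinator = BuddyCoordinator(nodes)
    results = {}

    def node(name, ip):
        mapping = coordinator.wait_for_buddies(name, ip, timeout=5)
        results[name] = format_hosts(mapping)

    threads = [threading.Thread(target=node, args=item)
               for item in nodes.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_demo()["proxy-node"], end="")
```

Every node ends up with the same file contents, so a test script can
refer to "proxy-node" by name without caring which board it landed on.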

Cheers,
mwh

_______________________________________________
linaro-validation mailing list
[email protected]
http://lists.linaro.org/mailman/listinfo/linaro-validation
