Antonio Terceiro writes:

> On Mon, Feb 25, 2013 at 12:37:24PM +1300, Michael Hudson-Doyle wrote:
>> Antonio Terceiro writes:
>> > My thoughts, from a LAVA standpoint.
>> >
>> > This parallelism style is indeed very elegant, but I couldn't think of
>> > how we could take advantage of that in the existing LAVA infrastructure.
>>
>> Yeah, I guess the LAVA trend has been towards being more
>> device-controlled (lava-test-shell and all that) and that doesn't really
>> fit with the explicit parallelism style.  Oh well.  I'll get over it :-)
>>
>> > Maybe we could make the dispatcher spawn child dispatchers (one for each
>> > node involved in the test) and wait for all of them to finish.
>>
>> I think on some level this model makes sense (whether it's subprocesses
>> or threads or the dispatcher does some async stuff doesn't really matter
>> for the mental model IMHO).
>>
>> > Inside each child dispatcher invocation, there should be a primitive
>> > that says "wait until all my test buddies are ready" so that after
>> > flashing and booting each one can perform its setup steps (i.e. the
>> > stuff we do before actually running tests), and wait for the others
>> > before executing its part in the distributed job. This communication
>> > might be coordinated by the "parent" dispatcher through signals. I'm
>> > not sure whether this primitive would be a new dispatcher action (and
>> > thus declared in the job description file), or a binary inside the
>> > target (and thus able to be invoked from inside lava-test-shell
>> > test runs), or both.
>>
>> I think ... perhaps both?  It seems to me that the difference is around
>> rebooting: (currently, anyway) a lava_test_shell action implies a
>> reboot, and one thing a lava_test_shell-invoked script _cannot_ do
>> (well, easily, there are probably hacks) is reboot.  And I can just
>> about imagine tests that might want to do some configuration that
>> requires a reboot to take effect.
>
> A job that requires a reboot could declare the following:
>
>   - deploy image
>   - boot image
>   - lava-test-shell <- setup.yaml
>   - boot image
>   - lava-test-shell <- run.yaml

Yeah, that would work.  It's kind of crummy in that there is a
dependence between the structure of the job file and the repository with
the yaml files in it, but it's a bit of a special case..

> the run.yaml lava-test-shell definition could then just call a binary
> that implements "wait for buddies".
>
> (this way we don't need a "wait for buddies" dispatcher action, just a
> binary that can be called by the test suite).
>
>> I think we should probably try to write some tests like my simple iperf
>> test and see what API we would like.
>
> Yep.
>
>> Here's a fun problem: devices will need to know the IP addresses of the
>> other devices in the test.  I suppose we could delay starting the
>> lava-test-shell processes on any device until they have all booted and
>> acquired an IP address?  Or we could run some service on the host
>> running the dispatcher that can be queried and informed of IP addresses
>> or something.
>
> the "wait for buddies" action could report the node's IP to the
> dispatcher via a signal. When the dispatcher receives those from all
> nodes, then it sends a list of all IPs to each node. After receiving
> that list, the node can then write a /etc/hosts-like file with the IPs
> of the group that can be read by the test scripts being run.

Ah yeah.  Putting it in /etc/hosts would be a neat trick -- I'd been
thinking anyway that the job file should give names to the nodes it
requests (origin-server-1, origin-server-2, proxy-node, load-gen-1,
load-gen-2...).

Cheers,
mwh

_______________________________________________
linaro-validation mailing list
http://lists.linaro.org/mailman/listinfo/linaro-validation
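
To make the "wait for buddies" idea from the thread a bit more concrete, here is a rough in-process sketch in Python. It is only an illustration under assumptions: `BuddyCoordinator`, `wait_for_buddies` and `render_hosts` are hypothetical names, not part of LAVA, threads stand in for the per-node child dispatchers, a method call stands in for the signal, and the IP addresses are made up. Each node reports its name and IP, blocks until every expected node has reported, and then gets the full mapping back so it can write an /etc/hosts-like file.

```python
import threading


class BuddyCoordinator:
    """Hypothetical stand-in for the "parent" dispatcher; not a LAVA API."""

    def __init__(self, expected_names):
        self._expected = set(expected_names)
        self._addresses = {}
        self._cond = threading.Condition()

    def wait_for_buddies(self, name, ip, timeout=None):
        """Report this node's IP, block until all nodes have reported,
        then return a copy of the full name -> IP mapping."""
        with self._cond:
            if name not in self._expected:
                raise ValueError("unknown node: %s" % name)
            self._addresses[name] = ip
            # Wake any nodes already waiting so they re-check the condition.
            self._cond.notify_all()
            done = self._cond.wait_for(
                lambda: set(self._addresses) == self._expected, timeout)
            if not done:
                raise TimeoutError("not all buddies reported in time")
            return dict(self._addresses)


def render_hosts(addresses):
    """Format the mapping as /etc/hosts-style lines, sorted by node name."""
    return "".join("%s\t%s\n" % (ip, name)
                   for name, ip in sorted(addresses.items()))


def run_demo():
    """Simulate three nodes (names from the thread, IPs invented)."""
    nodes = {
        "origin-server-1": "10.0.0.11",
        "proxy-node": "10.0.0.12",
        "load-gen-1": "10.0.0.13",
    }
    coordinator = BuddyCoordinator(nodes)
    results = {}

    def node_main(name, ip):
        addresses = coordinator.wait_for_buddies(name, ip, timeout=5)
        results[name] = render_hosts(addresses)

    threads = [threading.Thread(target=node_main, args=item)
               for item in nodes.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for node, hosts in sorted(run_demo().items()):
        print("hosts file as seen by %s:" % node)
        print(hosts)
```

In a real setup the coordinator would sit on the host running the dispatcher and the nodes would reach it over whatever signalling channel is chosen; the sketch only shows the barrier-plus-exchange logic, and the timeout is there so one node failing to boot does not hang the others forever.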
