Re: [Linaro-validation] hackish test automation & thoughts about writing multi-node jobs

Antonio Terceiro Mon, 25 Feb 2013 09:29:18 -0800

On Mon, Feb 25, 2013 at 12:37:24PM +1300, Michael Hudson-Doyle wrote:
> Antonio Terceiro <[email protected]> writes:
> > My thoughts, from a LAVA standpoint.
> >
> > This parallelism style is indeed very elegant, but I couldn't think of
> > how we could take advantage of that in the existing LAVA infrastructure.
> 
> Yeah, I guess the LAVA trend has been towards being more
> device-controlled (lava-test-shell and all that) and that doesn't really
> fit with the explicit parallelism style.  Oh well.  I'll get over it :-)
> 
> > Maybe we could make the dispatcher spawn child dispatchers (one for each
> > node involved in the test) and wait for all of them to finish.
> 
> I think on some level this model makes sense (whether it's subprocesses
> or threads or the dispatcher does some async stuff doesn't really matter
> for the mental model IMHO).
> 
> > Inside each child dispatcher invocation, there should be a primitive
> > that says "wait until all my test buddies are ready" so that after
> > flashing and booting each one can perform its setup steps (i.e. the
> > stuff we do before actually running tests), and wait for the others
> > before executing its part in the distributed job. This communication
> > might be coordinated by the "parent" dispatcher through signals.  I'm
> > not sure whether this primitive would be a new dispatcher action (and
> > thus declared in the job description file), or a binary inside the
> > target (and thus able to be invoked from inside lava-test-shell
> > test runs), or both.
> 
> I think ... perhaps both?  It seems to me that the difference is around
> rebooting: (currently, anyway) a lava_test_shell action implies a
> reboot, and one thing a lava_test_shell-invoked script _cannot_ do
> (well, easily, there are probably hacks) is reboot.  And I can just
> about imagine tests that might want to do some configuration that requires
> a reboot to take effect.


A job that requires a reboot could declare the following:

  - deploy image
  - boot image
  - lava-test-shell <- setup.yaml
  - boot image
  - lava-test-shell <- run.yaml

The run.yaml lava-test-shell definition could then just call a binary
that implements "wait for buddies".

(This way we don't need a "wait for buddies" dispatcher action, just a
binary that can be called by the test suite.)
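
To make the "wait for buddies" idea concrete, here is a minimal sketch
in Python. It is only a simulation under stated assumptions: threads
stand in for the child dispatchers (or nodes), threading.Barrier stands
in for the "wait for buddies" primitive, and the names run_nodes, node
and the node names passed in are all hypothetical. Nothing here is
existing LAVA code.

```python
import threading


def run_nodes(node_names):
    """Simulate nodes that set up, wait for their buddies, then run.

    Hypothetical sketch: each thread plays the role of one node.
    """
    barrier = threading.Barrier(len(node_names))
    events = []  # ordered log of (node, phase) tuples
    lock = threading.Lock()

    def node(name):
        # Per-node setup would happen here (after flashing and booting).
        with lock:
            events.append((name, "setup-done"))
        # Stand-in for "wait for buddies": blocks until every node arrives.
        barrier.wait(timeout=30)
        # Only now does the node execute its part of the distributed job.
        with lock:
            events.append((name, "run"))

    threads = [threading.Thread(target=node, args=(n,)) for n in node_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # the "parent" waits for all children to finish
    return events
```

Calling run_nodes(["server", "client"]) returns a log in which every
"setup-done" entry comes before any "run" entry, which is the ordering
the primitive is meant to guarantee. A real implementation would need
some transport between the target and the dispatcher instead of shared
memory.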

> I think we should probably try to write some tests like my simple iperf
> test and see what API we would like.

Yep.

> Here's a fun problem: devices will need to know the IP addresses of the
> other devices in the test.  I suppose we could delay starting the
> lava-test-shell processes on any device until they have all booted and
> acquired an IP address?  Or we could run some service on the host
> running the dispatcher that can be queried and informed of IP addresses
> or something.

The "wait for buddies" action could report the node's IP to the
dispatcher via a signal. Once the dispatcher has received one from every
node, it sends the list of all IPs to each node. After receiving that
list, the node can write an /etc/hosts-like file with the IPs of the
group, which the test scripts being run can read.
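
A rough sketch of that exchange, again in Python and again only an
illustration: the function name render_group_hosts, the (node, ip)
report format and the node names are assumptions, and how the reports
actually travel between node and dispatcher (signals or otherwise) is
left out.

```python
import ipaddress


def render_group_hosts(reports, expected_nodes):
    """Collect (node, ip) reports and render an /etc/hosts-like text.

    Hypothetical sketch: `reports` is any iterable of (node_name, ip)
    pairs, in the order the dispatcher would receive them.
    """
    expected = set(expected_nodes)
    ips = {}
    for name, ip in reports:
        if name not in expected:
            continue  # ignore reports from nodes outside this job
        # Validate the address; raises ValueError on malformed input.
        ips[name] = str(ipaddress.ip_address(ip))
        if set(ips) == expected:
            break  # every buddy has reported, stop waiting
    missing = expected - set(ips)
    if missing:
        raise RuntimeError("no IP reported by: " + ", ".join(sorted(missing)))
    # One "address<TAB>name" line per node, sorted for a stable result.
    return "".join(f"{ips[name]}\t{name}\n" for name in sorted(ips))
```

The dispatcher would send the returned text to each node, which could
save it somewhere the test scripts know to look. Whether the node names
should be made resolvable system-wide or only kept in a private file is
a design choice this sketch does not settle.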

-- 
Antonio Terceiro
Software Engineer - Linaro
http://www.linaro.org


