On 20 May 2014 13:25, Antonio Terceiro <antonio.terce...@linaro.org> wrote:
> On Mon, May 19, 2014 at 07:47:09PM +0100, Milosz Wasilewski wrote:
>> Hi,
>>
>> I'm trying to submit a job for TC2 now and I'm stuck in a long queue. There
>> seem to be a few multinode Android jobs that run on dummy-ssh and
>> vexpress-tc2 (workload automation). We only have one dummy-ssh device
>> so there is no way that more than one TC2 is going to be used with
>> dummy-ssh at the same time. On top of that we have
>> vexpress-tc2-benchmark which also can run multinode jobs with
>> dummy-ssh. For some reason, if there are a couple of multinode jobs
>> requested for dummy-ssh + vexpress-tc2, the TC2 boards get reserved
>> and there is no way to submit any other jobs there. While I understand
>> that 1 board might be in reserved state, there is no point in reserving
>> all 3 (there is only one dummy-ssh). IMHO this is a bug in multinode.
>
> This is a known issue. The only way we found of not letting multinode
> jobs starve waiting for devices forever is to reserve their devices as
> they become available instead of waiting for a moment when all of their
> requested devices would be available simultaneously.
>
> We have not found a way to keep multinode jobs from deadlocking that
> wouldn't involve a far more complicated mechanism.
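
To make the trade-off above concrete, here is a minimal sketch of the
"reserve devices as they become available" policy. It is a toy model,
not the LAVA scheduler: the function name, the dict-based data layout
and the one-pass loop are all assumptions made for illustration.

```python
def reserve_greedily(free_devices, jobs):
    """Toy model (hypothetical, not LAVA code) of greedy reservation.

    free_devices: dict mapping device type -> number of idle devices
    jobs: list of dicts mapping device type -> number required,
          in queue order
    Returns (reserved, runnable, free).
    """
    free = dict(free_devices)
    reserved = [dict.fromkeys(job, 0) for job in jobs]
    for i, job in enumerate(jobs):
        for dev_type, needed in job.items():
            # Take whatever is idle right now, even if the job's
            # other devices are not available yet.
            take = min(needed, free.get(dev_type, 0))
            reserved[i][dev_type] = take
            free[dev_type] = free.get(dev_type, 0) - take
    # A job can start only once its full device set is reserved.
    runnable = [i for i, job in enumerate(jobs) if reserved[i] == job]
    return reserved, runnable, free


pool = {"dummy-ssh": 1, "vexpress-tc2": 3}
queue = [{"dummy-ssh": 1, "vexpress-tc2": 1}] * 3
reserved, runnable, free = reserve_greedily(pool, queue)
print(runnable, free)
```

With one dummy-ssh and three TC2 boards, only the first job gets a
complete set, yet the other two jobs each hold a TC2 board, so no board
is left for single-node submissions.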
>
>> Current status is:
>>
>> dummy-ssh: 7 jobs in the queue
>> vexpress-tc2: 3 reserved + 3 jobs in the queue
>>
>> I know that the proper solution would be to move WA to dynamically
>> allocated VMs, but unfortunately licensing is in the way.
>
> Actually I am working right now on a patch to allow multiple dummy-ssh
> devices on the same host, which might solve this specific problem
> (assuming WA licensing allows multiple simultaneous uses within the same
> host).

There is no restriction on the number of connections in the license.
There were some problems with lava-test-shell as the dummy-ssh is
persistent and doesn't reboot. I guess we might run into some problems
with multinode as there are parameters passed between target and host.
They are written to a shared file. If several jobs run on dummy-ssh
simultaneously, there may be race conditions or the file might be
overwritten.
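
One possible way to guard such a shared file is an advisory lock around
each read-modify-write. The sketch below is only an illustration: the
JSON layout, the function name and the per-job key are assumptions, not
how lava-test-shell or multinode actually stores the parameters, and
fcntl.flock is Unix-only and helps only if every writer takes the lock.

```python
import fcntl
import json
import os


def update_shared_params(path, job_id, params):
    """Hypothetical helper: merge one job's parameters into a shared
    JSON file while holding an exclusive advisory lock."""
    # Open read/write, creating the file if it does not exist yet.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as f:
        # Blocks until no other cooperating process holds the lock.
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            raw = f.read()
            data = json.loads(raw) if raw else {}
            # Key by job id so jobs do not clobber each other's entries.
            data[str(job_id)] = params
            f.seek(0)
            f.truncate()
            json.dump(data, f)
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
    return data
```

Giving every job its own file would avoid the lock altogether, at the
cost of changing where the host and target look for the parameters.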

milosz

>
> --
> Antonio Terceiro
> Software Engineer - Linaro
> http://www.linaro.org
>
> _______________________________________________
> linaro-validation mailing list
> linaro-validation@lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/linaro-validation
>