On 3 June 2013 19:18, Paul Sokolovsky <paul.sokolov...@linaro.org> wrote:
> On Mon, 3 Jun 2013 12:57:43 +0100
> James Tunnicliffe <james.tunnicli...@linaro.org> wrote:
>
>>
>> I know some of our ARM slaves are a bit CPU light, but they also tend
>> to have slow network connections. I am sure a bit of experimentation
>> will tell us whether we should always move files off some slaves to an
>> intermediary to do the hash+upload work.
>
> Well, I'd personally aim to make client-side publishing support
> clean and lean, so it is easy to set up and run on any client,
> including not-too-powerful ones. Of course, some cases may need an
> intermediary (like when we need to publish [big] files from a
> non-networked board (hmm)), but those are niche cases.

From a MultiNode perspective, clients will set up their own services
if they need heavy lifting; e.g. AArch64 MultiNode could easily need
to set up saturation-bandwidth big.LITTLE connections over TCP/IP, but
it is up to the ARMv8 engineering team to prepare suitable images
with that support already implemented. All LAVA would need to do is
provide a connection from each child job back to the parent, so that
each child can tell the parent the IP address it obtains after boot
and the parent can tell the other child jobs the IP addresses of the
other clients for that parent. This only needs a basic level of
communication to be supported by LAVA itself, so it sounds like only
a small part of what the publishing protocol would need.
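That minimal communication layer could be sketched as a parent-side
registry along these lines (the class and method names are purely
illustrative, not an existing LAVA API):

```python
# Illustrative sketch only: each child job reports the IP address it
# obtained after boot, and any child can ask the parent for the
# addresses of its siblings. Not an existing LAVA interface.

class GroupRegistry:
    def __init__(self, expected_children):
        self.expected = set(expected_children)
        self.addresses = {}  # child job id -> IP address

    def declare(self, child_id, ip_address):
        """Called by a child job once it knows its own IP."""
        if child_id not in self.expected:
            raise KeyError("unknown child job: %s" % child_id)
        self.addresses[child_id] = ip_address

    def query(self, child_id):
        """Return the IP of a sibling, or None if not yet declared."""
        return self.addresses.get(child_id)

    def complete(self):
        """True once every expected child has declared an address."""
        return set(self.addresses) == self.expected
```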

>> >> Do we want to authenticate this sort of call? The lookup itself
>> >> should just be a dictionary or DB query, so authenticating the
>> >> call would probably cost more CPU time than serving it. That
>> >> said, you can use it to fish for existing files that you don't
>> >> have access to, so perhaps we need to filter the results based on
>> >> each user...
>> >
>> > My idea is that all publishing API calls are authed by that
>> > "security token". It is by definition limited in use: it can have
>> > source-IP constraints, timing constraints (use not before 30min
>> > after issuance and not after 60min), and other constraints, like
>> > being usable for no more than 50 API calls, publishing no more
>> > than 10 files, etc.
>>
>> OK, it occurs to me that I may not have broadcast my use case, which
>> is: a server gets a token from the publishing service, then passes
>> it to a slave. The slave uses it, and once the job has finished the
>> server should be able to inform the publishing service that the
>> token is no longer required.
>
> Per security practices, that's a worse solution than specifying
> constraints upfront. What if the server "forgets" to terminate the
> token's life?

We are aware of intermittent problems where a job sits in "Canceling"
interminably. The risks of a token not being revoked need to be
discussed, but it is a token-based service that I am considering for
MultiNode.

> Actually, I specifically brought this question up to avoid situation
> that other engineers go for "spur of the moment" adhoc implementations,
> and we end up with bunch of crippled, insecure, hard-to-maintain
> publishing implementations (current Jenkins one already has enough
> holes and pain to setup/debug).

The two mechanisms have a lot in common; we clearly need to work
together on both sets of use cases.

>> If we are only issuing and using the tokens over HTTPS, I think the
>> best practice is not to restrict the use of the service beyond how
>> long the token is issued for.
>
> Well, the constraints above were just an example of what we can
> easily implement with an HTTP-based system (and not so easily with a
> PAM-based one). Of course, the idea is that token constraints are
> flexible: the scheduling server decides how many constraints to
> request on a token for a particular publishing client. I agree that
> the basic constraints to start with would be: source IP (important
> for EC2, maybe less important for LAVA) and max lifetime.
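A server-side check for such a multi-constraint token might look
something like this sketch (all names, caps and defaults are
illustrative, not a proposed implementation):

```python
import time

# Illustrative token with the constraints discussed above: source IP,
# a not-before/not-after validity window, and usage caps on API calls
# and published files. Every name here is hypothetical.

class PublishingToken:
    def __init__(self, source_ip, not_before, not_after,
                 max_calls=50, max_files=10):
        self.source_ip = source_ip
        self.not_before = not_before  # Unix timestamps
        self.not_after = not_after
        self.max_calls = max_calls
        self.max_files = max_files
        self.calls = 0
        self.files = 0

    def check_call(self, client_ip, now=None):
        """Consume one API call if all constraints hold."""
        now = time.time() if now is None else now
        if client_ip != self.source_ip:
            return False
        if not (self.not_before <= now <= self.not_after):
            return False
        if self.calls >= self.max_calls:
            return False
        self.calls += 1
        return True

    def check_publish(self, client_ip, now=None):
        """Consume one file-publish slot (plus the API call it uses)."""
        if self.files >= self.max_files:
            return False
        if not self.check_call(client_ip, now):
            return False
        self.files += 1
        return True
```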

The lifetime being specified in the job JSON?
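If so, something along these lines is what I would picture in the job
JSON; the field names here are entirely hypothetical:

```json
{
    "job_name": "publish-example",
    "timeout": 18000,
    "publishing_token": {
        "max_lifetime": 3600,
        "source_ip": "auto"
    }
}
```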

>> > On the other hand, Neil sent an email saying there are similar
>> > challenges for multi-node LAVA setups. I haven't read through it
>> > yet, but my guess is that for (arbitrary) LAVA tests we'd rather
>> > use (and let our users use) standard tech like ssh/scp/rsync for
>> > inter-node communication; then we'd need "PAM"-level auth anyway,
>> > and it makes little sense to have a separate auth scheme just for
>> > publishing.

I'm not sure how much of that LAVA would need to set up for MultiNode.
It's more likely that the setup of a secure connection between two
clients under test would need to be part of the test itself: an image
with openssh-server, known users, possibly even pre-configured keys.
MultiNode cannot prescribe how clients under test arrange their
in-test connections. We just need to allow a child job to declare
its allocated IP details to the parent, and the parent job to collate
that data and serve it back to child jobs of the same parent, upon
request via a token set up by the parent on the child filesystem prior
to boot. Child jobs interested in a particular node will simply need
to loop until the parent has the data from that client, or fail the
test on a timeout. LAVA can provide helpers to do the queries to the
parent and install them onto the child as part of lava_test_shell.
Those helpers could well be the same ones that establish the
connections used for publishing, too?
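The loop-until-the-parent-has-the-data behaviour could be as simple as
this sketch (the `query_parent` callable stands in for whatever
transport the helper would actually use; names are illustrative):

```python
import time

# Illustrative child-side polling: keep asking the parent for a
# sibling's IP until it has been declared, failing the test on timeout.

def wait_for_sibling(query_parent, sibling_id, timeout=300, interval=5):
    """Return the sibling's IP, or raise RuntimeError on timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        ip = query_parent(sibling_id)
        if ip is not None:
            return ip
        time.sleep(interval)
    raise RuntimeError("timed out waiting for %s" % sibling_id)
```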

That would just mean exposing the helper to lava_test_shell so that a
test can obtain the data and start using the IP addresses as it sees
fit. There would be no need for, and no support for, exposing the
actual token outside the helpers. I'm working on the basis that
MultiNode exposes only the IP addresses and hostnames of the jobs
being managed by the parent, along with the "role" description
specified in the job JSON. If a particular client image doesn't manage
to set up networking within the timeout specified by the original
JSON, that client will simply have a blank IP and hostname section. So
as far as authentication goes, I expect MultiNode to only need a
minimal amount of work: read a token put onto the child before boot,
contact the parent with the IP address of that child, and then query
the parent for the IP addresses of the other child jobs of that
parent.
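That minimal child-side sequence amounts to something like the sketch
below; the token path, the `parent` interface and the IP-discovery
trick are all hypothetical, not an existing LAVA API:

```python
import socket

def get_own_ip():
    # Best-effort discovery of the address the client booted with:
    # connect() on a UDP socket sends no packets, but it binds a local
    # address we can read back. Falls back to loopback if unroutable.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect(("10.255.255.255", 1))
        return s.getsockname()[0]
    except OSError:
        return "127.0.0.1"
    finally:
        s.close()

def declare_self(parent, token_path="/etc/lava-multinode/token"):
    # Read the token the dispatcher wrote to the filesystem before
    # boot, then declare our own IP to the parent. Path is hypothetical.
    with open(token_path) as f:
        token = f.read().strip()
    parent.declare(token, get_own_ip())
    return token
```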

("parent" in this context would have to be the lava-dispatcher of the
parent job as the contact details of the parent need to be written to
the child filesystem prior to boot.)

Neil.

_______________________________________________
linaro-validation mailing list
linaro-validation@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-validation
