Hello,

On 01/03/2015 10:51 AM, Robert Jördens wrote:
>>> I would just point the managers to the master where they can get their
>>> (current, versioned) subset of the devicedb. They can cache that if
>>> really needed.
>>
>> Then the master depends on the controllers to be able to run
>> experiments, and the controllers depend on the master to get their
> 
> How does the master depend on the controllers?

Upon startup, the master immediately attempts to run an experiment,
which typically requires controllers.

>> configuration data. Plus handling changes in the manager configuration
>> becomes complicated.
> 
> How?

Having the manager configuration in the master requires a
publish/subscribe mechanism for changes in the device database. In any
case, it turns out that the GUI also needs such a mechanism, so we can
simply use the same one for both.
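
To give an idea of what I mean by that mechanism, here is a rough sketch
(the names below are only illustrative, not an actual API): the master
holds the device database and pushes every change to whoever subscribed
(device managers, GUI).

import asyncio


class DeviceDBPublisher:
    def __init__(self, ddb):
        self.ddb = ddb            # e.g. {"camera": {"host": ..., "port": ...}}
        self.subscribers = set()  # one asyncio.Queue per connected subscriber

    def subscribe(self):
        queue = asyncio.Queue()
        self.subscribers.add(queue)
        return queue

    def set_entry(self, name, value):
        self.ddb[name] = value
        # push the change to every subscriber (manager, GUI, ...)
        for queue in self.subscribers:
            queue.put_nowait((name, value))


async def manager_task(publisher):
    # a device manager reacting to device database changes
    queue = publisher.subscribe()
    while True:
        name, value = await queue.get()
        # here the manager would stop/start the affected controller
        print("devicedb change:", name, value)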

>>> What should happen if you check out an older version of
>>> experiments/devicedb/paramdb?
>>
>> If you modify the device database, you also need to change the
>> connection of devices and/or machines in the laboratory anyway. I'd
>> think it's just as good to let the user reconnect the hardware, and
>> accordingly update themselves the managers' configuration and possibly
>> the hardware database containing the controller URLs.
> 
> All that information is part of the provenance information. All,
> except the physical wiring, can and should be versioned with ARTIQ.

A simple solution is to store a copy of the relevant parameters and
device settings alongside each result. If we use HDF5 as the result
format, we can embed them directly in the file. Then the GUI and/or the
CLI can offer "load devices/parameters from HDF5 file" features.

Is there more device/parameter versioning needed?

To use time-series databases and tools like Grafana, it is possible to
connect to the master, subscribe to parameter changes, and push them
into the database (duplicating the data already in the result files).
How much integration is needed here?
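
For the Grafana case, I imagine a small bridge along these lines
(everything here is hypothetical glue, not an existing API):

import time


async def parameter_bridge(updates, write_point):
    # `updates` is assumed to be an asyncio.Queue fed with
    # (parameter_name, new_value) tuples by whatever subscription
    # mechanism the master exposes; `write_point` is the client call
    # of the chosen time-series database (InfluxDB, Graphite, ...).
    while True:
        name, value = await updates.get()
        write_point(measurement=name, value=value, timestamp=time.time())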

>> The main reason for using asyncio anyway is that it easily supports
>> multiple connections to the same TCP server. This way, if one client did
>> not properly close its connection to the controller, a new client can
>> still connect.
> 
> The real reason is that asyncio allows you to turn an api that gives
> you access to a fileno into an asynchronous api that plays nicely with
> your event loop. That then allows you to handle competing/concurrent
> requests correctly.
> Once you start allowing multiple clients to compete for the same
> blocking api through asyncio, all advantage of asyncio is lost.

Consider the following scenario with a single-threaded, non-asyncio
controller using only blocking calls:
1) a client successfully connects to the controller.
2) the client suddenly crashes, without being able to send any further
network packets, let alone cleanly shut down its connection.
3) the controller blocks on reading data from the client until the OS
detects a problem with the connection, which can take a long time (if
it happens at all).
4) the client reboots and attempts to reconnect to the controller, which
is delayed until the stale connection times out (possibly forever).

TCP keepalive with a short timeout would solve this problem, but I would
rather avoid it if possible, as I am unsure of its portability (and it
is not mandated by RFC 1122).
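
For reference, enabling it would look roughly like this; the
SO_KEEPALIVE flag itself is portable, but the options that shorten the
probe timing are platform-specific, which is exactly my concern:

import socket


def enable_keepalive(sock, idle=10, interval=5, count=3):
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Linux-specific knobs; not present in the socket module everywhere
    # (Windows needs a different mechanism entirely)
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)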

With asyncio, the new connection can proceed immediately, even in the
presence of the previous stale connection. Asyncio does provide an
advantage here, even if it is not used to its full potential. And the
code for the asyncio RPC server is already written and used in the
master (which uses asyncio "by the book" without any blocking calls), so
we might as well just reuse it to solve the problem of stale connections.
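
To make this concrete, here is a stripped-down sketch of the pattern
(not the actual ARTIQ RPC code; the port and line-based protocol are
arbitrary). Each client gets its own handler coroutine, so a connection
whose peer silently died just sits awaiting data and does not prevent a
new client from being served:

import asyncio


async def handle_client(reader, writer):
    while True:
        line = await reader.readline()  # a stale peer simply never sends
        if not line:
            break                       # clean EOF
        writer.write(b"ack: " + line)
        await writer.drain()
    writer.close()


async def main():
    server = await asyncio.start_server(handle_client, "127.0.0.1", 4000)
    async with server:
        await server.serve_forever()


if __name__ == "__main__":
    asyncio.run(main())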

It is true that a simple controller cannot do any socket I/O while it is
executing a blocking call in the driver. But it does not need to: the
client is waiting for the result from the blocking call anyway, and no
data is supposed to be exchanged in the meantime (new connection
attempts will be put into the TCP listen backlog by the OS and
eventually succeed after a delay). I believe that this simple solution
is acceptable when the blocking calls normally take less than a dozen
seconds.
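
And this is roughly what the simple controller case looks like (again
only a sketch, with a sleep standing in for the driver's blocking call):
while long_operation() runs, the event loop is stalled, so new
connection attempts simply wait in the OS listen backlog and are
accepted once the call returns.

import asyncio
import time


class Driver:
    def long_operation(self):
        time.sleep(5)  # stands in for a slow, blocking hardware call
        return b"done\n"


driver = Driver()


async def handle_client(reader, writer):
    while await reader.readline():
        writer.write(driver.long_operation())  # blocks the whole event loop
        await writer.drain()
    writer.close()


async def main():
    server = await asyncio.start_server(handle_client, "127.0.0.1", 4001)
    async with server:
        await server.serve_forever()


if __name__ == "__main__":
    asyncio.run(main())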

> Sure. I was more imagining developing a controller+feature in a
> development branch in the office against simulated hardware while the
> production branch is running in the lab, then testing the development
> branch in the lab against real hardware, noticing that it fails,
> rolling back to the production branch in the lab. That might still
> take a few iterations between development and production.

What about the following workflow:
1) check out the new controller into a separate folder on the lab
machine that has the device connected.
2) modify the device database to change the path of the controller (see
the sketch below). The master publishes the change, and the manager
stops the old controller and starts the new one.
3) check out the experiment into some folder on the master machine.
4) run the experiment by filename, bypassing the main experiment repository.
5) revert step 2 manually to go back to the production controller.
6) repeat until everything works fine, then commit the experiment and
the new controller to the main repositories and upgrade/restart.
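
For step 2, the change could look something like this (the entry format,
host and paths below are purely illustrative, not the actual devicedb
syntax):

# production entry: the manager starts the controller from the
# production checkout
ddb_production = {
    "camera": {
        "type": "controller",
        "host": "lab-pc-3",
        "port": 4010,
        "command": "python3 ~/artiq-production/camera_controller.py",
    }
}

# during testing, only the command/path changes so that the manager
# starts the development checkout instead
ddb_testing = {
    "camera": {
        "type": "controller",
        "host": "lab-pc-3",
        "port": 4010,
        "command": "python3 ~/controller-dev/camera_controller.py",
    }
}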

>> machine - and we'd rather use an existing one. Is sshfs also difficult?
> 
> With Windows as an endpoint? I have never tried but I bet it is.

It seems there are programs for this, e.g.
http://linhost.info/2012/09/sshfs-in-windows/

> I meant the ability to have
> multiple (actual production) experiments run concurrently on the
> master. Like a temperature servo that does not interfere with the core
> device but does RPCs, uses the parameter and device db, and wants to
> reschedule itself.

So that would mean multiple queues and periodic schedules in the master
(and GUI). Is the extra performance, compared to running experiments
sequentially, worth the complexity of multiple concurrent queues, with
publish/subscribe of queue creation/destruction and additional GUI
widgets wherever a queue is involved?

Sébastien

_______________________________________________
ARTIQ mailing list
https://ssl.serverraum.org/lists/listinfo/artiq
