On 18.03.2015 10:42, Didier Kryn wrote:
Long-lived daemons should have both startup methods, selectable by a
parameter, so you make nobody's work more difficult than required.

     OK, I think you are right, because it is a little more than a fork:
you want to detach from the controlling terminal and start a new
session. I agree that it is a pain to do it by hand and it is OK if
there is a command-line switch to avoid all of it.

But there must be this switch.

Ack!
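
To illustrate the switch discussed above, here is a minimal sketch in C. The "-f" option name and both function names are assumptions made for this example, they are not taken from any existing daemon. By default the process forks and starts a new session; with the switch it stays in the foreground so a supervisor can watch it.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the (hypothetical) "-f" foreground switch is present. */
int want_foreground(int argc, char **argv)
{
    for (int i = 1; i < argc; i++)
        if (strcmp(argv[i], "-f") == 0)
            return 1;
    return 0;
}

/* Detach from the controlling terminal: fork, let the parent exit,
 * and start a new session in the child.
 * Returns 0 in the child, -1 on error. The parent never returns. */
int detach(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid > 0)
        _exit(EXIT_SUCCESS);   /* parent: the daemon continues in the child */
    if (setsid() < 0)          /* child: new session, no controlling tty */
        return -1;
    return 0;
}
```

A main() would call it as `if (!want_foreground(argc, argv) && detach() < 0) return 1;` before entering its event loop.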


No, a restart is not required: netlink dies when fifosvd dies (or
later on, when the handler dies), and the supervisor watching netlink may
then fire up a new netlink reader (possibly after failure management).
This startup is always done through a central startup command
(e.g. xdev).

The supervisor never starts the netlink reader directly, but
watches the process it starts for xdev. xdev does its initial
action (startup code), then chains (exec) to the netlink reader. This
may look ugly and unnecessarily complicated at first glance, but it is
a known practical trick to drop some memory resources that are needed by
the startup code but not by the long-lived daemon. To the
supervisor instance this looks like a single process that it has started
and may watch until it exits. So from that view it looks as if
netlink has created the pipe and started fifosvd, but in fact this
is done by the startup code (the difference between the flow of
operation and where the code is technically placed).

     I didn't notice this trick in your description. It is making more
and more sense :-).

I left it out so as not to make things unnecessarily complicated; I wanted to focus on the netlink / pipe operation.


     Now look, since nldev (let's call it by its name) is execed by
xdev, it remains the parent of fifosvd, and therefore it shall receive
the SIGCLD if fifosvd dies. This is the best way for nldev to watch
fifosvd. Otherwise it would have to wait until it receives an event from
the netlink and tries to write it to the pipe, hence losing the event and
the possible burst following it. nldev must die on SIGCLD (after piping
available events, though); this is the only "supervision" logic it must
implement, but I think it is critical. And it is the same if nldev is
launched with a long-lived mdev-i without a fifosvd.

The netlink reader (nldev) does not need to explicitly watch fifosvd via SIGCHLD.

Either that piece of code does its job, or it fails and dies. When fifosvd dies, the read end of the pipe is closed (by the kernel), unless there is still a handler process (which shall process the remaining events from the pipe). As soon as there is neither a fifosvd nor a handler process, the pipe is shut down by the kernel and nldev gets an error when writing to the pipe, so it knows the other end died.

You won't gain much from watching SIGCHLD and reading the process status. It will tell you either that the fifosvd process is still running or that it died (failed). You get the same information from the write to the pipe: when the read end has died, you get EPIPE.
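
A minimal sketch of detecting the dead read end through write(), assuming SIGPIPE is ignored so that the failure shows up as EPIPE instead of killing the process. The function names are hypothetical.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Ignore SIGPIPE so a write to a pipe without readers fails with EPIPE. */
void ignore_sigpipe(void)
{
    signal(SIGPIPE, SIG_IGN);
}

/* Write one event to the pipe.
 * Returns 0 on success, 1 if the read side is gone (EPIPE),
 * -1 on any other error. */
int forward_event(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;              /* interrupted, just retry */
            return errno == EPIPE ? 1 : -1;
        }
        buf += n;                      /* handle short writes */
        len -= (size_t)n;
    }
    return 0;
}
```

A return value of 1 is the point where the writer knows that neither fifosvd nor a handler holds the read end any more.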

Limiting the time nldev tries to write to the pipe would, however, allow detecting stuck operation of fifosvd / the handler (which watching SIGCHLD won't give you) ... but (I discussed that with Laurent in parallel) the question is how to react when the write to the pipe is stuck (but has not failed). We can't do much here, and are in trouble either way, but Laurent gave this argument: the netlink socket also has a buffer, which may hold additional events, so we do not lose them if processing continues normally. When the kernel buffer fills up to its limit, let the kernel react to the problem.

... otherwise you are right: nldev's job is to detect failure of the rest of the chain (that is, to supervise it), and it has to react to this. The details of the actions taken in this case need to and can be discussed (and may be adapted later) without much impact on other operation.

This clearly means I'm open to suggestions about which kind of failure handling shall be done. Every action that improves the reaction and benefits the major purpose of the netlink reader, without blowing it up needlessly, is of interest (keep in mind: long-lived daemon, trying to keep it simple and small).

My suggestion is: let the netlink reader detect relevant errors and exec (not spawn) a script of a given name when there are failures. This is small and gives the invoked script full control over the failure management (no fixed functionality in a binary). When done, it can either die, letting a higher instance do the job of restarting, or exec back and restart the hotplug system (maybe with a different mechanism). When the script does not exist, the default action is for the netlink reader process to exit unsuccessfully, giving a higher instance a failure indication and the possibility to react to it.


Not detect? Are you sure you closed all open file descriptors for the
write end (a common caveat)? I have never been hit by such a case, except
when someone forgot to close all file descriptors of the write end.

     You notice that something happened on input (AFAIR) but I'm sure
you don't know what. It may be data as well. You must read() to know.

That information is all you need. Either the writer process is still there (good), or it has gone (bad). This is all that is required to decide what to do. More information may only be of interest for some kind of logging or error message, but that should have been done before the writer process dies, not afterwards from the reading end (which always has less information than the writer itself).
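
As a sketch of what the reading side can learn from poll() alone, assuming Linux pipe semantics: POLLIN means data is pending, while POLLHUP without POLLIN means every write end is closed and the pipe is drained. The enum and function name are hypothetical.

```c
#include <poll.h>

enum pipe_state { PIPE_IDLE, PIPE_DATA, PIPE_WRITER_GONE, PIPE_ERROR };

/* Classify the state of the read end of a pipe without reading from it. */
enum pipe_state check_pipe(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    int r = poll(&p, 1, timeout_ms);

    if (r < 0)
        return PIPE_ERROR;
    if (r == 0)
        return PIPE_IDLE;              /* writer alive, nothing to read */
    if (p.revents & POLLIN)
        return PIPE_DATA;              /* data pending, drain it first */
    if (p.revents & POLLHUP)
        return PIPE_WRITER_GONE;       /* all write ends closed, pipe empty */
    return PIPE_ERROR;
}
```

If any process still holds a copy of the write end, PIPE_WRITER_GONE is never reported, which is the forgotten-descriptor case mentioned above.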


     Anyway you don't want to poll() the pipe unless mdev-i is dead
because you don't want to awake fifosvd for every event.

Therefore fifosvd polls the pipe only when there is no running handler process. As soon as a handler is started (handing over the read end of the pipe), fifosvd waits not for events on the pipe but for the exit of the handler process (supervising it). When the handler exits, fifosvd goes back to watching for more data arriving in the pipe. With a few simple counter checks, fifosvd shall detect ping-pong play and avoid endless respawning of a failing handler process. If that happens, it spawns a failure script, waits until it exits, then retries the pipe / handler operation.
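
The two building blocks of that loop might be sketched as below. All names, the restart limit, and the time window are assumptions for illustration, not the actual fifosvd design: one function hands the read end to a handler and waits for its exit, the other is a simple counter check against endless respawning.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Start the handler with the pipe's read end on stdin and wait for it.
 * Returns the wait status, or -1 if fork/waitpid failed. */
int spawn_and_wait(int pipe_rd, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (dup2(pipe_rd, STDIN_FILENO) < 0)
            _exit(127);
        execvp(argv[0], argv);
        _exit(127);                    /* exec failed */
    }
    int status;
    while (waitpid(pid, &status, 0) < 0)
        if (errno != EINTR)
            return -1;
    return status;
}

/* Counter check: allow at most max_starts handler starts per window. */
struct respawn_guard { time_t window_start; int count; };

/* Returns 1 if the handler may be started now, 0 if it is respawning
 * too fast and the failure script should run instead. */
int respawn_allowed(struct respawn_guard *g, time_t now,
                    int max_starts, int window_sec)
{
    if (g->count == 0 || now - g->window_start >= window_sec) {
        g->window_start = now;         /* open a new counting window */
        g->count = 0;
    }
    if (g->count >= max_starts)
        return 0;
    g->count++;
    return 1;
}
```

The main loop would poll the pipe for input, call respawn_allowed(), and then either run the handler through spawn_and_wait() or run the failure script the same way before retrying.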

--
Harald

