Re: RFD: Rework/extending functionality of mdev

Harald Becker Wed, 18 Mar 2015 05:35:58 -0700

On 18.03.2015 10:42, Didier Kryn wrote:

Long lived daemons should have both startup methods, selectable by a
parameter, so you make nobody's work more difficult than required.


     OK, I think you are right, because it is a little more than a fork:
you want to detach from the controlling terminal and start a new
session. I agree that it is a pain to do it by hand and it is OK if
there is a command-line switch to avoid all of it.

But there must be this switch.


Ack!

No, restart is not required, as netlink dies, when fifosvd dies (or
later on when the handler dies), the supervisor watching netlink may
then fire up a new netlink reader (possibly after failure management),
where this startup is always done through a central startup command
(e.g. xdev).

The supervisor never starts up the netlink reader directly, but
watches the process it starts up for xdev. xdev does its initial
action (startup code) then chains (exec) to the netlink reader. This
may look ugly and unnecessarily complicated at first glance, but is
a known practical trick to drop some memory resources not needed by
the long lived daemon, but required by the start up code. For the
supervisor instance this looks like a single process, it has started
and it may watch until it exits. So from that view it looks, as if
netlink has created the pipe and started the fifosvd, but in fact this
is done by the startup code (difference between flow of operation and
technical placement of the code).


     I didn't notice this trick in your description. It is making more
and more sense :-).

I left it out so as not to make it unnecessarily complicated, and I
wanted to focus on the netlink / pipe operation.

     Now look, since nldev (let's call it by its name) is execed by
xdev, it remains the parent of fifosvd, and therefore it shall receive
the SIGCLD if fifosvd dies. This is the best way for nldev to watch
fifosvd. Otherwise it should wait until it receives an event from the
netlink and tries to write it to the pipe, hence losing the event and
the possible burst following it. nldev must die on SIGCLD (after piping
available events, though); this is the only "supervision" logic it must
implement, but I think it is critical. And it is the same if nldev is
launched with a long-lived mdev-i without a fifosvd.

The netlink reader (nldev) does not need to explicitly watch the fifosvd
via SIGCHLD.

Either that piece of code does its job, or it fails and dies. When
fifosvd dies, the read end of the pipe is closed (by the kernel), unless
there is still a handler process (which shall process the remaining
events from the pipe). As soon as there is neither a fifosvd nor a
handler process, the pipe is shut down by the kernel, and nldev gets an
error when writing to the pipe, so it knows the other end died.

You won't gain much benefit from watching SIGCHLD and reading the
process status. It will only tell you whether the fifosvd process is
still running or has died (failed). You get the same information from
the write to the pipe: when the read end has died, you get EPIPE.
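
A minimal sketch of detecting the dead read end through the write
itself. The helper name write_event and its return codes are made up
for illustration and are not taken from any existing code. Note that
SIGPIPE has to be ignored, otherwise the writer is killed by the signal
instead of seeing EPIPE:

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper: write one event to the pipe.
 * Returns 0 when everything was written, 1 when the read end of the
 * pipe is gone (EPIPE), and -1 on any other error. */
static int write_event(int fd, const char *buf, size_t len)
{
    /* Without this the failing write() raises SIGPIPE, whose default
     * action terminates the process. A real daemon would do this once
     * at startup rather than on every call. */
    signal(SIGPIPE, SIG_IGN);

    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, just retry */
            return errno == EPIPE ? 1 : -1;
        }
        buf += n;                       /* handle short writes */
        len -= (size_t)n;
    }
    return 0;
}
```

A return value of 1 is the point where the failure handling would
start.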

Limiting the time nldev tries to write to the pipe would, however,
allow detecting stuck operation of fifosvd / handler (which SIGCHLD
watching won't give you) ... but (I discussed this in parallel with
Laurent) the question is how to react when the write to the pipe is
stuck (but there is no failure). We can't do much here, and are in
trouble either way, but Laurent gave the argument: the netlink socket
also has a buffer, which may hold additional events, so we do not lose
them in case processing continues normally. When the kernel buffer
fills up to its limit, let the kernel react to the problem.

... otherwise you are right, nldev's job is to detect failure of the
rest of the chain (that is, to supervise it), and it has to react to
this. The details of the actions taken in this case need to, and can,
be discussed (and may be adapted later) without much impact on other
operation.

This clearly means I'm open to suggestions on which kind of failure
handling shall be done. Every action that improves the reaction and
benefits the major purpose of the netlink reader, without blowing it up
needlessly, is of interest (keep in mind: long lived daemon, trying to
keep it simple and small).

My suggestion is: let the netlink reader detect relevant errors and
exec (not spawn) a script of a given name when there are failures. This
is small, and gives the invoked script full control over the failure
management (no fixed functionality in a binary). When done, it can
either die, letting a higher instance do the job of restarting, or exec
back and restart the hotplug system (maybe with a different
mechanism). When the script does not exist, the default action is to
exit the netlink reader process unsuccessfully, giving a higher
instance a failure indication and the possibility to react to it.

Not detect? Sure you closed all open file descriptors for the write
end (a common caveat)? I have never been hit by such a case, except
when someone forgot to close all file descriptors of the write end.

     You notice that something happened on input (AFAIR) but I'm sure
you don't know what. It may be data as well. You must read() to know.

The information is all you need. Either the writer process is still
there (good), or it has gone (bad). This is all that is required to
decide what to do. More information may only be of interest for some
kind of logging or error message, but this should have been done before
the writer process dies, not afterwards from the back end (which always
has less information than the writer itself).
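
A sketch of how the reading side can tell the two cases apart from the
poll() result alone. The enum and the function name check_pipe are made
up for illustration; the POLLIN / POLLHUP behaviour shown is what Linux
reports for the read end of a pipe:

```c
#include <poll.h>

enum pipe_state { PIPE_IDLE, PIPE_DATA, PIPE_WRITER_GONE, PIPE_ERROR };

/* Hypothetical helper: look at the read end of the pipe without
 * reading from it. */
static enum pipe_state check_pipe(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int r = poll(&pfd, 1, timeout_ms);

    if (r < 0)
        return PIPE_ERROR;
    if (r == 0)
        return PIPE_IDLE;           /* writer alive, nothing to do */
    if (pfd.revents & POLLIN)
        return PIPE_DATA;           /* events pending, handle them first */
    if (pfd.revents & POLLHUP)
        return PIPE_WRITER_GONE;    /* pipe empty and no writer left */
    return PIPE_ERROR;
}
```

POLLIN is checked before POLLHUP so that events still sitting in the
pipe are processed even when the writer has already died.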

     Anyway you don't want to poll() the pipe unless mdev-i is dead
because you don't want to awake fifosvd for every event.

Therefore fifosvd polls the pipe only when there is no running handler
process. As soon as a handler is started (handing over the read end of
the pipe), fifosvd waits not for events on the pipe, but for the exit
of the handler process (supervising it). When the handler exits,
fifosvd goes back to watching for more data arriving in the pipe. With
a few simple counter checks, fifosvd shall detect ping-pong play and
avoid endless respawning of a failing handler process. If that happens,
spawn a failure script, wait until it exits, then retry pipe / handler
operation.
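
One possible shape for such a counter check, only as a sketch: the
limits (5 starts within 10 seconds) and all names are assumptions, the
real values would need tuning. fifosvd would ask respawn_allowed()
before each handler start and run the failure script when it says no:

```c
#include <time.h>

#define MAX_RESPAWNS   5    /* assumed: handler starts allowed per window */
#define WINDOW_SECONDS 10   /* assumed: length of the window in seconds */

struct respawn_guard {
    time_t window_start;    /* when the current window began */
    int count;              /* handler starts inside this window */
};

/* Returns 1 if the handler may be started now, 0 if it has been
 * started too often within the window (ping-pong detected). */
static int respawn_allowed(struct respawn_guard *g, time_t now)
{
    /* Open a fresh window on first use or when the old one expired. */
    if (g->count == 0 || now - g->window_start >= WINDOW_SECONDS) {
        g->window_start = now;
        g->count = 0;
    }
    if (g->count >= MAX_RESPAWNS)
        return 0;
    g->count++;
    return 1;
}
```

This counts every handler start, including normal ones after an idle
handler exited, so the limit has to be generous enough not to trip on
ordinary event bursts.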


--
Harald

_______________________________________________
busybox mailing list
[email protected]
http://lists.busybox.net/mailman/listinfo/busybox
