Re: RFD: Rework/extending functionality of mdev

Didier Kryn Wed, 18 Mar 2015 10:10:29 -0700


On 18/03/2015 13:34, Harald Becker wrote:

On 18.03.2015 10:42, Didier Kryn wrote:

Long-lived daemons should have both startup methods, selectable by a
parameter, so you make nobody's work more difficult than required.


     OK, I think you are right, because it is a little more than a fork:
you want to detach from the controlling terminal and start a new
session. I agree that it is a pain to do it by hand and it is OK if
there is a command-line switch to avoid all of it.
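A minimal sketch of such a switch, written in Python only for brevity (the calls map directly to fork(2) and setsid(2)). The function name `start` and the `foreground` flag are hypothetical and not taken from any existing tool:

```python
import os


def start(run, foreground):
    """Run `run()` in the foreground, or detach into a new session first."""
    if not foreground:
        # Classic detach: the original process exits so the caller regains
        # control, and the child carries on as the daemon.
        if os.fork() > 0:
            os._exit(0)
        # The child is not a process group leader, so setsid() succeeds
        # and leaves the controlling terminal behind.
        os.setsid()
    return run()
```

A supervisor would pass `foreground=True`, so that the process it watches is the very process it started.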

But there must be this switch.


Ack!

No, restart is not required. netlink dies when fifosvd dies (or
later on, when the handler dies); the supervisor watching netlink may
then fire up a new netlink reader (possibly after failure management),
and this startup is always done through a central startup command
(e.g. xdev).

The supervisor never starts up the netlink reader directly, but
watches the process it starts up for xdev. xdev does its initial
action (startup code), then chains (exec) to the netlink reader. This
may look ugly and unnecessarily complicated at first glance, but it is
a known practical trick to drop some memory resources not needed by
the long-lived daemon but required by the startup code. To the
supervisor instance this looks like a single process which it has
started and may watch until it exits. So from that view it looks as if
netlink has created the pipe and started the fifosvd, but in fact this
is done by the startup code (difference between flow of operation and
technical placement of the code).
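The chaining trick can be sketched as follows. This is an illustration under stated assumptions, not the proposed xdev code: `startup_then_exec` is a hypothetical name, and the two argument vectors stand in for the netlink reader and fifosvd. Python is used for brevity; the calls correspond to pipe(2), fork(2), dup2(2) and execvp(3).

```python
import os


def startup_then_exec(reader_argv, consumer_argv):
    """Startup code: create the pipe, fork the consumer, exec the reader.

    Does not return: the calling process is replaced by the reader.
    """
    read_fd, write_fd = os.pipe()
    if os.fork() == 0:
        # Child: stand-in for fifosvd, reads events from the pipe on stdin.
        os.dup2(read_fd, 0)
        os.close(read_fd)
        os.close(write_fd)  # keep no stray write end, or EOF never arrives
        os.execvp(consumer_argv[0], consumer_argv)
    # Parent: still the pid the supervisor started, now becoming the reader
    # with the write end of the pipe on stdout.
    os.dup2(write_fd, 1)
    os.close(read_fd)
    os.close(write_fd)
    os.execvp(reader_argv[0], reader_argv)
```

Because the parent execs instead of forking again, the supervisor keeps watching one pid, and the consumer remains a child of the process that ends up being the reader.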


     I didn't notice this trick in your description. It is making more
and more sense :-).

I left it out so as not to make it unnecessarily complicated, and I
wanted to focus on the netlink / pipe operation.

     Now look, since nldev (let's call it by its name) is execed by
xdev, it remains the parent of fifosvd, and therefore it shall receive
the SIGCLD if fifosvd dies. This is the best way for nldev to watch
fifosvd. Otherwise it would have to wait until it receives an event from
the netlink and tries to write it to the pipe, hence losing the event and
the possible burst following it. nldev must die on SIGCLD (after piping
available events, though); this is the only "supervision" logic it must
implement, but I think it is critical. And it is the same if nldev is
launched with a long-lived mdev-i without a fifosvd.
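A minimal sketch of this kind of watching, assuming a single-threaded parent. The helper name `wait_child_via_sigchld` is hypothetical; Python is used for brevity and the calls correspond to sigprocmask(2), sigwait(3) and waitpid(2). The point is only that the parent learns of the child's death without touching the pipe:

```python
import os
import signal


def wait_child_via_sigchld(child_main):
    """Fork `child_main`, sleep until SIGCHLD arrives, return its exit code."""
    # A handler is installed so the signal is not discarded, and the signal
    # is blocked so that sigwait() can collect it synchronously.
    previous = signal.signal(signal.SIGCHLD, lambda signum, frame: None)
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGCHLD})
    try:
        pid = os.fork()
        if pid == 0:
            try:
                code = int(child_main())
            except BaseException:
                code = 1
            os._exit(code)
        signal.sigwait({signal.SIGCHLD})   # returns as soon as a child dies
        _, status = os.waitpid(pid, 0)     # reap it and fetch the status
        return os.WEXITSTATUS(status)
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
        signal.signal(signal.SIGCHLD,
                      previous if previous is not None else signal.SIG_DFL)
```

A real reader would more likely fold the signal into its poll loop (for example through a self-pipe) so that pending events can still be flushed before it exits.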

netlink reader (nldev) does not need to explicitly watch the fifosvd
by SIGCHLD.

Either that piece of code does its job, or it fails and dies. When
fifosvd dies, the read end of the pipe is closed (by the kernel), unless
there is still a handler process (which shall process the remaining
events from the pipe). As soon as there is neither a fifosvd nor a
handler process, the pipe is shut down by the kernel, and nldev gets an
error when writing to the pipe, so it knows the other end died.

No, you must write to the pipe to detect it is broken. And you
won't try to write before you've got an event from the netlink. This
event will be lost.
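The write-side detection being discussed can be reproduced in a few lines. This is a sketch only: `write_after_reader_gone` is a hypothetical helper and the payload is a placeholder, not a real uevent. Python ignores SIGPIPE by default, so the failed write(2) surfaces as an EPIPE error instead of killing the process:

```python
import errno
import os


def write_after_reader_gone(event=b"event\n"):
    """Write to a pipe whose read end is closed; return the errno seen."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # every reader has gone away
    try:
        os.write(write_fd, event)
    except BrokenPipeError as exc:
        # Only at this point does the writer find out the reader is gone.
        return exc.errno
    finally:
        os.close(write_fd)
    return 0
```

Nothing is reported to the writer between the moment the read end closes and the next attempt to write.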

You won't gain much benefit from watching SIGCHLD and reading the
process status. It will tell you either that the fifosvd process is
still running, or that it died (failed). You get the same information
from the write to the pipe: when the read end has died, you get EPIPE.

You get the information immediately from SIGCLD. You get it too
late from the pipe, and you lose at least one event for sure, a whole
burst if there is one.

Limiting the time nldev tries to write to the pipe would, however,
allow detecting stuck operation of fifosvd / handler (which SIGCHLD
watching won't reveal) ... but (I discussed that in parallel with
Laurent) the question is how to react when the write to the pipe is
stuck (but has not failed). We can't do much here, and are in trouble
either way, but Laurent gave the argument: the netlink socket also
contains a buffer, which may hold additional events, so we do not lose
them in case processing continues normally. When the kernel buffer fills
up to its limit, let the kernel react to the problem.

    Sure, the limit here is pipe size (adjustable) + netlink buffer size.
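For the adjustable part, a sketch assuming Linux, where the pipe capacity is changed with fcntl(2) and F_SETPIPE_SZ. `grow_pipe` is a hypothetical helper, and the numeric fallbacks are the Linux values of the two constants, used only when the Python `fcntl` module does not export them:

```python
import fcntl
import os

# Linux-specific fcntl commands; fall back to the raw Linux values on
# Python versions whose fcntl module lacks the names.
F_SETPIPE_SZ = getattr(fcntl, "F_SETPIPE_SZ", 1031)
F_GETPIPE_SZ = getattr(fcntl, "F_GETPIPE_SZ", 1032)


def grow_pipe(fd, size):
    """Ask the kernel for at least `size` bytes of pipe buffer."""
    fcntl.fcntl(fd, F_SETPIPE_SZ, size)
    # The kernel may round the request up; report what was granted.
    return fcntl.fcntl(fd, F_GETPIPE_SZ)
```

Unprivileged processes are limited by /proc/sys/fs/pipe-max-size, so a large request can fail.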

... otherwise you are right, nldev's job is to detect failure of the
rest of the chain (that is, supervise it), and it has to react to this.
The details of the actions taken in this case need to and can be
discussed (and may be adapted later), without much impact on other
operation.
This clearly means I'm open to suggestions as to which kind of failure
handling shall be done. Every action that improves the reaction and
benefits the major purpose of the netlink reader, without blowing it up
needlessly, is of interest (bear in mind: long-lived daemon, trying to
keep it simple and small).
My suggestion is: let the netlink reader detect relevant errors, and
exec (not spawn) a script of a given name when there are failures. This
is small, and gives the invoked script full control of the failure
management (no fixed functionality in a binary). When done, it can
either die, letting a higher instance do the job of restarting, or
exec back and restart the hotplug system (maybe with a different
mechanism). When the script does not exist, the default action is to
exit the netlink reader process unsuccessfully, giving a higher instance
a failure indication and the possibility to react to it.
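A sketch of that suggestion, with hypothetical names throughout: `exec_failure_script`, the script path and the `reason` argument are illustrations, not an agreed interface. The process is replaced by the script if it can be executed, and otherwise exits unsuccessfully:

```python
import os

EXIT_FAILURE = 1


def exec_failure_script(script_path, reason):
    """Chain to the failure script, or exit unsuccessfully if it is missing."""
    try:
        # exec, not spawn: on success this call never returns and the
        # script takes over the pid the supervisor is watching.
        os.execv(script_path, [script_path, reason])
    except OSError:
        # No usable script: leave the decision to the higher instance.
        os._exit(EXIT_FAILURE)
```

Since the pid does not change, the supervisor sees a single process whose exit status is the script's status or the default failure code.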

This is fine as long as the netlink reader keeps control of its
exit, not if it's killed.

This netlink reader you describe is not the general tool we were
considering up to now, the simple data funnel. If the idea is to
integrate such peculiarities as execing a script, then it is not the
general tool, so why not integrate the supervision of mdev-i as well
instead of needing fifosvd? The reason for fifosvd was, AFAIU, to
associate general tools, nldev and mdev-i.

On the other hand, exiting on SIGCLD (after wait()ing the child) is
neither a major change to nldev, nor one which would preclude its use in
any other case.

Not detect? Are you sure you closed all open file descriptors for the
write end (a common caveat)? I have never been hit by such a case,
except when someone forgot to close all file descriptors of the write end.

     You notice that something happened on input (AFAIR) but I'm sure
you don't know what. It may be data as well. You must read() to know.

The information is all you need. Either the writer process is still
there (good), or has gone (bad).

OK, let's assume fifosvd polls the pipe. As long as poll() blocks,
it means nldev is alive and is waiting for an event. When poll() returns,
it means either nldev has piped an event or it has died, you don't know
which; you don't get the information you need because the only way to
get it is to read from the pipe.
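What poll() reports on the read end in each case can be sampled directly. A small sketch, assuming Linux pipe semantics (other systems may set the flags differently); `poll_read_end` is a hypothetical helper and the payload is a placeholder:

```python
import os
import select


def poll_read_end(write_event, close_writer):
    """Return the revents poll() reports for a pipe's read end."""
    read_fd, write_fd = os.pipe()
    try:
        if write_event:
            os.write(write_fd, b"event\n")
        if close_writer:
            os.close(write_fd)
            write_fd = -1
        poller = select.poll()
        poller.register(read_fd, select.POLLIN)
        ready = poller.poll(0)  # zero timeout: just sample the current state
        return ready[0][1] if ready else 0
    finally:
        os.close(read_fd)
        if write_fd != -1:
            os.close(write_fd)
```

On Linux, POLLIN is set while data is buffered and POLLHUP once every write end is closed, so both can appear together when the writer died leaving unread events behind.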

Now suppose nldev is dead but fifosvd doesn't read. It assumes
there is data and launches mdev-i. mdev-i dies immediately and fifosvd
polls again; poll returns immediately. This is endless.

However there is an indirect way to get the information that nldev
died; it is from the return code of mdev-i.


    Didier
