On Thursday, September 15, 2011 12:16:24 PM Michael Mol wrote:
> On Thu, Sep 15, 2011 at 11:43 AM, Joost Roeleveld <[email protected]> wrote:
> > On Thursday, September 15, 2011 11:03:09 AM Michael Mol wrote:
> >> On Thu, Sep 15, 2011 at 10:48 AM, Joost Roeleveld <[email protected]> wrote:
> >> The problem with this is that you now need to manage synchronization
> >> between the kernel event processor and the action processor, which is
> >> actually more complicated than keeping them together in a
> >> single-threaded, single-process scenario.
> >
> > I don't agree. Why does this need to be synchronized?
> >
> > The kernel puts events in the new-dev-event-queue that process 1 picks
> > up. Process 1 creates the /dev entries and, if there is an action to
> > be taken, puts the action in the new-action-event-queue.
> >
> > Process 2 will then pick up this action from the new-action-event-queue
> > and will process this.
> >
> > If, as a side effect of the action processed by process 2, a new device
> > appears for the kernel, the kernel will simply put a corresponding event
> > in the new-dev-event-queue.
> >
> > At which state does this need to be synchronized?
> > We can simply use a pipe-device as, for instance, used for syslog?
>
> Let's assume that you have a single-reader, single-writer scenario,
> and that either the protocol includes a 'record end' marker, or that
> protocol messages are all of a fixed length and are written
> atomically. With that out of the way, I don't know. Perhaps no
> additional synchronization is necessary.
>
> You still have a problem with race conditions. Virtually all scripts
> I've read and written assume a single-threaded environment, but you've
> defined a two-threaded environment.
>
> Here's a potential scenario:
>
> 1) A kernel hotplug event comes in when a device is inserted.
> 2) keventler catches the hotplug event, creates the device node,
> queues an action event.
> 3) actionhandler catches the action event, launches the script.
> 4) The action handler script is still running for the plug-in event,
> when a kernel hotplug event comes in indicating the device was
> removed. keventler catches the new hotplug event, removes the device
> node--
> 5) --the scheduler comes around and resumes working on the action
> handler script. Or perhaps the action handler script was on a
> different CPU core, and never needed to be unscheduled. The device
> node it was expecting to be there just disappeared out from under it,
> violating one of its assumptions, and putting it in an inconsistent
> state. The inconsistency might occur in a place the script author
> expected it, or the inconsistency might have occurred in an unexpected
> place. One presumes the script author didn't sign up to deal with
> concurrency issues in a bash or python script.
> 6) keventler registers a new action event, for actioning on the disconnect.
> 7) actionhandler picks up this new action event, runs the script.
> Kudos to the script author for thinking ahead to have a shutdown
> script properly clean up an inconsistent system state left by the
> partially failed setup script.
>
> Steps 3-5 are a classic example of a race condition, and stem from two
> active threads operating concurrently. Entire programming languages
> are developed with the core intent of reducing the programmer's need
> to worry about such things.
>
> You _must not_ change the operating environment of a script out from under
> it.
>
> In bash scripts, this is an extraordinarily common pattern:
>
> if [ -d "$SOME_PATH" ]; then
>     # do something
> fi
>
> That's common and accepted; nobody expects a shell script to fail in a
> scenario like that, because it is a single-threaded language, and
> that's been true since its inception. When something keventler does
> causes the result of "[ -d $SOME_PATH ]" to change after the test had
> already been done, then the script is only broken because
> keventler/actionhandler broke it, not because the script was badly
> written.
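
A minimal sketch of the race in steps 3-5, to make the failure concrete.
The assumptions are mine: a temporary directory stands in for the device
node, and an inline rmdir stands in for keventler handling the removal
event between the test and the use. Neither is taken from a real udev
setup.

```shell
#!/bin/sh
# Hypothetical stand-in for the device node; not a real /dev path.
SOME_PATH=$(mktemp -d)
result="test failed up front"

if [ -d "$SOME_PATH" ]; then
    # Simulated removal event: in the scenario above this would happen in
    # another process (keventler), at an arbitrary point after the test.
    rmdir "$SOME_PATH"

    # The script still acts on the result of the earlier test.
    if touch "$SOME_PATH/state" 2>/dev/null; then
        result="ok"
    else
        result="path vanished after the test"
    fi
fi

echo "$result"
```

The test passed and the action still failed, which is exactly the case
most scripts are not written to handle.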

Ok, I didn't think of this scenario. Thank you for pointing it out.

Your pseudo-code would be better then, except there should be some way
of delaying action tasks based on whether or not the required files
(including dependencies) are available. Or a retry queue that retries an
action a few times at certain intervals. This, however, will be more
difficult to implement, especially with the race condition you
mentioned.

> I've really got to get back to working on stuff I'm being paid for.
> I'll chat with you guys this weekend. I'm very interested in helping
> with a reasoned critical perspective, so if this wanders over to a new
> mailing list or discussion environment, drop me an invite.

We will, but for now, why not keep it on here? :)

I was wondering, does udev actually support actions for when a device is
removed? Ok, just checked on my server and it does. All nicely pointing
to scripts in /etc/....

Also, does anyone know how udev handles the scenario where a device is
removed while the script is still running? Wouldn't it fail
mid-execution because the kernel no longer allows actions with that
device?

--
Joost
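
A rough sketch of the retry idea mentioned above, under stated
assumptions: retry_action is a hypothetical helper name, the retry count
and delay are arbitrary, and a temporary file stands in for a required
file. It only illustrates "retry a few times with an interval"; it is
not how udev or any existing tool does it.

```shell
#!/bin/sh
# retry_action MAX_TRIES DELAY COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or MAX_TRIES is hit.
retry_action() {
    max_tries=$1
    delay=$2
    shift 2
    try=1
    while [ "$try" -le "$max_tries" ]; do
        if "$@"; then
            return 0
        fi
        # Wait only if another attempt is still coming.
        if [ "$try" -lt "$max_tries" ]; then
            sleep "$delay"
        fi
        try=$((try + 1))
    done
    return 1
}

# Usage: delay the action until a (hypothetical) required file exists.
REQUIRED_FILE=$(mktemp)
if retry_action 3 0 test -e "$REQUIRED_FILE"; then
    status="ran"
else
    status="gave up"
fi
rm -f "$REQUIRED_FILE"
echo "$status"
```

Note that this does not remove the race: the file can still vanish
between the successful check and the action that depends on it.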

