On Thursday, September 15, 2011 12:16:24 PM Michael Mol wrote:
> On Thu, Sep 15, 2011 at 11:43 AM, Joost Roeleveld <[email protected]> wrote:
> > On Thursday, September 15, 2011 11:03:09 AM Michael Mol wrote:
> >> On Thu, Sep 15, 2011 at 10:48 AM, Joost Roeleveld <[email protected]> wrote:
> >> The problem with this is that you now need to manage synchronization
> >> between the kernel event processor and the action processor, which is
> >> actually more complicated than keeping them together in a
> >> single-threaded, single-process scenario.
> > 
> > I don't agree. Why does this need to be synchronized?
> > 
> > The kernel puts events in the new-dev-event-queue that process 1 picks
> > up. Process 1 creates the /dev entry (or entries) and, if there is an
> > action to be taken, puts the action in the new-action-event-queue.
> > 
> > Process 2 will then pick up this action from the new-action-event-queue
> > and will process this.
> > 
> > If, as a side-effect of the action processed by process 2, a new device
> > appears to the kernel, the kernel will simply put a corresponding event
> > in the new-dev-event-queue.
> > 
> > At which stage does this need to be synchronized?
> > We could simply use a pipe device, as is done for syslog, for instance?
> 
> Let's assume that you have a single-reader, single-writer scenario,
> and that either the protocol includes a 'record end' marker, or that
> protocol messages are all of a fixed length and are written
> atomically. With that out of the way, I don't know. Perhaps no
> additional synchronization is necessary.
> 
> You still have a problem with race conditions. Virtually all scripts
> I've read and written assume a single-threaded environment, but you've
> defined a two-threaded environment.
> 
> Here's a potential scenario:
> 
> 1) A kernel hotplug event comes in when a device is inserted.
> 2) keventler catches the hotplug event, creates the device node,
> queues an action event.
> 3) actionhandler catches the action event, launches the script.
> 4) The action handler script is still running for the plug-in event
> when a kernel hotplug event comes in indicating the device was
> removed. keventler catches the new hotplug event, removes the device
> node--
> 5) --the scheduler comes around and resumes working on the action
> handler script. Or perhaps the action handler script was on a
> different CPU core, and never needed to be unscheduled. The device
> node it was expecting to be there just disappeared out from under it,
> violating one of its assumptions, and putting it in an inconsistent
> state. The inconsistency might occur in a place the script author
> anticipated, or it might occur somewhere entirely unexpected. One
> presumes the script author didn't sign up to deal with concurrency
> issues in a bash or Python script.
> 6) keventler registers a new action event for handling the disconnect.
> 7) actionhandler picks up this new action event, runs the script.
> Kudos to the script author for thinking ahead to have a shutdown
> script properly clean up an inconsistent system state left by the
> partially failed setup script.
> 
> Steps 3-5 are a classic example of a race condition, and stem from two
> active threads operating concurrently. Entire programming languages
> are developed with the core intent of reducing the programmer's need
> to worry about such things.
>
> You _must not_ change the operating environment of a script out from under
> it.
> 
> In bash scripts, this is an extraordinarily common pattern:
> 
> if [ -d "$SOME_PATH" ]; then
>   # do something
> fi
> 
> That's common and accepted; nobody expects a shell script to fail in a
> scenario like that, because bash is a single-threaded language, and
> that's been true since its inception. When something keventler does
> causes the result of "[ -d $SOME_PATH ]" to change after the test had
> already been done, then the script is only broken because
> keventler/actionhandler broke it, not because the script was badly
> written.
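
The check-then-use race described above is easy to reproduce outside of udev. Here is a minimal bash sketch; the temp directory is a made-up stand-in for a /dev node, and the explicit rmdir stands in for keventler removing it mid-script:

```shell
# Stand-in for a /dev node the action script expects to keep existing.
SOME_PATH=$(mktemp -d)

if [ -d "$SOME_PATH" ]; then
    # Simulate keventler removing the node between the test and the use.
    rmdir "$SOME_PATH"
    # The script now acts on a node that no longer exists.
    ls "$SOME_PATH" 2>/dev/null || echo "assumption violated: $SOME_PATH is gone"
fi
```

The test at the top succeeds, yet every later step sees a different world: exactly the two-threaded behaviour no shell script author plans for.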

Ok, I didn't think of this scenario. Thank you for pointing this out to me.
Your pseudo-code would be better then, except there should be some way of 
delaying action tasks based on whether or not the required files (including 
dependencies) are available. Or a retry queue that retries an action a few 
times at certain intervals. This, however, will be more difficult to 
implement, especially with the race condition you mentioned.
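
A retry queue along those lines could be sketched roughly like this. The function, its arguments and the file-existence check are purely illustrative, not anything udev actually provides:

```shell
# Hypothetical retry helper for the action handler: re-attempt an action
# a few times at a fixed interval before giving up.
# $1 = file the action depends on, $2 = max attempts, $3 = interval (seconds)
retry_action() {
    i=1
    while [ "$i" -le "$2" ]; do
        if [ -e "$1" ]; then
            echo "running action for $1"
            return 0
        fi
        sleep "$3"
        i=$((i + 1))
    done
    echo "giving up on $1 after $2 attempts" >&2
    return 1
}

# With the required file missing, the helper retries and then gives up.
retry_action /nonexistent/device 3 0 || echo "action failed"
```

Of course this only papers over ordering problems; it does not remove the race itself, since the required file can still disappear right after the check succeeds.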

> I've really got to get back to working on stuff I'm being paid for.
> I'll chat with you guys this weekend. I'm very interested in helping
> with a reasoned critical perspective, so if this wanders over to a new
> mailing list or discussion environment, drop me an invite.

We will, but for now, why not keep it on here? :)

I was wondering: does udev actually support actions for when a device is 
removed?
Ok, just checked on my server and it does. All nicely pointing to scripts in 
/etc/....

Also, does anyone know how udev handles the scenario where a device is removed 
while the script is still running? Wouldn't it fail mid-execution because the 
kernel no longer allows actions on that device?
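
No idea what udev guarantees there, but regardless, an action script can at least fail cleanly by re-checking before each step that touches the device instead of assuming an earlier test still holds. A rough sketch, with a made-up node path and an rm simulating the kernel yanking the device:

```shell
# Made-up stand-in for a /dev node; in reality udev would hand us the path.
DEV=$(mktemp)

setup_device() {
    # Re-verify before every step that touches the node; bail out cleanly
    # if it has vanished instead of failing mid-execution.
    [ -e "$DEV" ] || { echo "device gone before setup"; return 1; }
    rm -f "$DEV"   # simulate the device being removed mid-script
    [ -e "$DEV" ] || { echo "device gone mid-setup"; return 1; }
    echo "setup complete"
}

setup_device || echo "setup aborted cleanly"
```

That still leaves the remove-event script to clean up whatever the aborted setup left behind, as you described above.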

--
Joost
