Neale Ferguson wrote:

>Looking at the relevant routine I note that the queue element is declared in
>the routine's stack area such that when it exits this area of storage should
>be up for grabs.

Well, yes.  As far as I understand that stuff (which is not very well), the
msg_receiver is supposed to be added to the list by sys_msgrcv, and
removed from the list before sys_msgrcv exits.  The removal can happen
in either of two places: in pipelined_send if something is sent, and
in sys_msgrcv itself if nothing was sent.

However, there would appear to be a race condition in the implementation.
Say, CPU 0 is executing pipelined_send:

                                msr->r_msg = msg;
                                msq->q_lrpid = msr->r_tsk->pid;
                                msq->q_rtime = CURRENT_TIME;
                                wake_up_process(msr->r_tsk);

just after the assignment to r_msg, but before the wake_up_process.

At the same time, CPU 1 starts executing sys_msgrcv just after the
schedule () (because the process was woken for some other reason).

                schedule();
                current->state = TASK_RUNNING;

                msg = (struct msg_msg*) msr_d.r_msg;
                if(!IS_ERR(msg))
                        goto out_success;

                t = msg_lock(msqid);
                if(t==NULL)
                        msqid=-1;

Now, IS_ERR (msg) is false (because CPU 0 has already stored a valid
message), and so CPU 1 proceeds to out_success and subsequently exits,
causing the msg_receiver on its stack to be clobbered.

At this point, CPU 0 continues executing (it was a bit slow, maybe
it had been processing an interrupt) and accesses the msg_receiver
struct, which no longer exists, through msr->r_tsk ...

As far as I can see, this sort of race is supposed to be prevented
by the msg_lock.  pipelined_send executes under protection of this
lock, and if every other user of the msg_receiver were as well,
the race would be fixed.  Unfortunately, this first check for
IS_ERR (msg) happens *outside* the msg_lock.

I'd suggest trying to remove this first check.  A second check for
IS_ERR (msg) follows anyway, but this time under the proper lock.
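
To illustrate the locking discipline I have in mind, here is a minimal
userspace sketch using pthreads.  It is not the kernel code: struct
receiver, struct queue, queue_register, queue_send and queue_collect are
hypothetical stand-ins for msg_receiver, the message queue, and the
relevant parts of sys_msgrcv and pipelined_send, and the mutex plays the
role of msg_lock.  The point is only that the receiver looks at r_msg
under the lock and nowhere else, so it cannot return (and release its
stack frame) while the sender is still touching the struct.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for msg_receiver; lives on the receiver's stack. */
struct receiver {
    void *r_msg;                /* NULL until a sender delivers a message */
};

/* Hypothetical stand-in for the queue; the mutex plays the role of msg_lock. */
struct queue {
    pthread_mutex_t lock;
    struct receiver *waiting;   /* at most one waiter, to keep the sketch short */
};

/* Receiver, step 1: publish the stack-allocated struct under the lock. */
static void queue_register(struct queue *q, struct receiver *r)
{
    pthread_mutex_lock(&q->lock);
    assert(q->waiting == NULL); /* assumption: single waiter in this sketch */
    r->r_msg = NULL;
    q->waiting = r;
    pthread_mutex_unlock(&q->lock);
}

/* Sender: every access to the receiver struct happens under the lock. */
static int queue_send(struct queue *q, void *msg)
{
    int delivered = 0;

    pthread_mutex_lock(&q->lock);
    if (q->waiting != NULL) {
        struct receiver *r = q->waiting;
        q->waiting = NULL;      /* unlink before handing over the message */
        r->r_msg = msg;
        /* Later accesses to *r in here are still safe: the receiver
         * cannot leave queue_collect until we drop the lock. */
        delivered = 1;
    }
    pthread_mutex_unlock(&q->lock);
    return delivered;
}

/* Receiver, step 2 (after wakeup): check r_msg only under the lock. */
static void *queue_collect(struct queue *q, struct receiver *r)
{
    void *msg;

    pthread_mutex_lock(&q->lock);
    msg = r->r_msg;
    if (msg == NULL && q->waiting == r)
        q->waiting = NULL;      /* nothing was sent: unlink ourselves */
    pthread_mutex_unlock(&q->lock);
    return msg;
}
```

The unlocked early check in sys_msgrcv corresponds to reading r->r_msg
before taking the mutex in queue_collect, which is exactly what this
sketch avoids.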

You should also report this to the linux-kernel mailing list,
as this is an architecture-independent bug.

Bye,
Ulrich

--
  Dr. Ulrich Weigand
  [EMAIL PROTECTED]
