Neale Ferguson wrote:
>Looking at the relevant routine I note that the queue element is declared in
>the routine's stack area such that when it exits this area of storage should
>be up for grabs.
Well, yes. As far as I understand that stuff (which is not very), the
msg_receiver is supposed to be added to the list by sys_msgrcv, and
removed from the list before sys_msgrcv exits. The removal can happen
in either of two places: in pipelined_send if something is sent, and
in sys_msgrcv itself if nothing was sent.
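To make that lifecycle concrete, here is a single-threaded sketch in C. It is a simplified model, not the kernel code. The struct layout, the LIFO list and the helper names (receiver_enqueue, pipelined_send_model, receiver_finish) are all hypothetical, and message-type matching is left out. The only point it makes is that the node lives in the receiver's frame, so it must be off the list by the time that frame goes away.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of a queue's receiver list. The names
 * echo the mail, but none of this is the actual kernel code. */
struct msg_receiver {
    struct msg_receiver *next;
    void *r_msg;                 /* NULL means "nothing sent yet" */
};

struct queue {
    struct msg_receiver *receivers;
};

/* sys_msgrcv, before sleeping: put its stack-allocated node on the list. */
static void receiver_enqueue(struct queue *q, struct msg_receiver *msr)
{
    msr->r_msg = NULL;
    msr->next = q->receivers;
    q->receivers = msr;
}

/* Removal site 1 (pipelined_send): unlink the first waiter, if any, and
 * hand it the message. Returns 1 if the message was handed over. */
static int pipelined_send_model(struct queue *q, void *msg)
{
    struct msg_receiver *msr = q->receivers;
    if (msr == NULL)
        return 0;
    assert(msr->r_msg == NULL);  /* a queued receiver has no message yet */
    q->receivers = msr->next;
    msr->r_msg = msg;
    return 1;
}

/* Removal site 2 (sys_msgrcv after waking): if nothing was sent, the
 * receiver unlinks its own node before its stack frame goes away. */
static void *receiver_finish(struct queue *q, struct msg_receiver *msr)
{
    if (msr->r_msg == NULL) {
        struct msg_receiver **pp = &q->receivers;
        while (*pp != NULL && *pp != msr)
            pp = &(*pp)->next;
        if (*pp == msr)
            *pp = msr->next;
    }
    return msr->r_msg;
}
```

Either way the node is off the list when receiver_finish returns. The race below is about the sender still *using* the node after that point.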
However, there would appear to be a race condition in the implementation.
Say CPU 0 is executing this part of pipelined_send:
    msr->r_msg = msg;
    msq->q_lrpid = msr->r_tsk->pid;
    msq->q_rtime = CURRENT_TIME;
    wake_up_process(msr->r_tsk);
and has just completed the assignment to r_msg, but has not yet
reached the wake_up_process call.
At the same time, CPU 1 resumes sys_msgrcv just after the schedule()
call below (because the process was woken for some other reason):
    schedule();
    current->state = TASK_RUNNING;
    msg = (struct msg_msg*) msr_d.r_msg;
    if(!IS_ERR(msg))
        goto out_success;
    t = msg_lock(msqid);
    if(t==NULL)
        msqid=-1;
Now IS_ERR (msg) is false (because CPU 0 has already stored a valid
message), so CPU 1 proceeds to out_success and returns from sys_msgrcv.
Its stack frame, which holds the msg_receiver msr_d, is released and
may be overwritten at any time.
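At this point CPU 0 continues executing (it was a bit slow, maybe
because it had been processing an interrupt) and accesses the
msg_receiver struct that no longer exists: both msr->r_tsk->pid and
wake_up_process(msr->r_tsk) read r_tsk from the dead stack frame.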
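The interleaving can be replayed deterministically in a small userspace model. This is only a sketch under assumptions: frame_live is a hypothetical flag standing in for "sys_msgrcv's stack frame still exists", NULL plays the role of the error pointer that IS_ERR() detects, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model, not kernel code. */
struct model {
    void *r_msg;          /* msr->r_msg; NULL stands in for an error pointer */
    bool frame_live;      /* is the receiver still inside sys_msgrcv?        */
    bool use_after_exit;  /* did the sender touch msr after the exit?        */
};

/* CPU 0, first statement: msr->r_msg = msg; */
static void sender_store(struct model *m, void *msg)
{
    assert(m->frame_live);
    m->r_msg = msg;
}

/* CPU 1: the lockless "if (!IS_ERR(msg)) goto out_success;" path. */
static bool receiver_lockless_check(struct model *m)
{
    if (m->r_msg != NULL) {
        m->frame_live = false;  /* sys_msgrcv returns; msr_d is gone */
        return true;
    }
    return false;
}

/* CPU 0, remaining statements: msr->r_tsk->pid, wake_up_process(msr->r_tsk). */
static void sender_finish(struct model *m)
{
    if (!m->frame_live)
        m->use_after_exit = true;  /* would read freed stack memory */
}

/* Replays the interleaving from the mail; returns true if the sender
 * touched the msg_receiver after the receiver's frame went away. */
static bool replay_race(void)
{
    static int payload = 42;
    struct model m = { NULL, true, false };

    sender_store(&m, &payload);   /* CPU 0 stores the message         */
    receiver_lockless_check(&m);  /* CPU 1 sees it and returns        */
    sender_finish(&m);            /* CPU 0 resumes and dereferences   */
    return m.use_after_exit;
}
```

With this ordering replay_race() reports the use-after-exit. Swapping the last two calls (sender finishes first) would not.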
As far as I can see, this sort of race is supposed to be prevented
by the msg_lock. pipelined_send executes under protection of this
lock, and if every other user of the msg_receiver were as well,
the race would be fixed. Unfortunately, this first check for
IS_ERR (msg) happens *outside* the msg_lock.
I'd suggest trying to remove this first check. A second check for
IS_ERR (msg) follows anyway, but this time under the proper lock.
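A userspace analogue of the suggested fix might look like the sketch below. It assumes a pthread mutex (q_lock, hypothetical) in the role of msg_lock. The sender touches the receiver struct only while holding the lock, and the receiver reads r_msg only while holding the same lock. So by the time the receiver sees a message, the sender has finished with the struct. All names are illustrative, and the busy loop with sched_yield() merely replaces the kernel's sleep.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace analogue of msg_receiver. */
struct receiver {
    void *r_msg;       /* NULL until a message is handed over            */
    bool sender_done;  /* stands in for the sender's later r_tsk accesses */
};

/* Plays the role of msg_lock(msqid). */
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;

/* Sender: every access to *msr happens with q_lock held, like
 * pipelined_send running under msg_lock. */
static void *sender(void *arg)
{
    struct receiver *msr = arg;
    static int payload = 42;

    pthread_mutex_lock(&q_lock);
    msr->r_msg = &payload;    /* msr->r_msg = msg;          */
    msr->sender_done = true;  /* ...msr->r_tsk accesses...  */
    pthread_mutex_unlock(&q_lock);
    return NULL;
}

/* Receiver: checks r_msg only under q_lock (no lockless early check).
 * Returns true if the sender was already done with msr when the
 * message became visible. */
static bool receive_checked(void)
{
    struct receiver msr = { NULL, false };  /* lives on this stack frame */
    pthread_t tid;
    bool ok = false;

    if (pthread_create(&tid, NULL, sender, &msr) != 0)
        return false;

    for (;;) {
        pthread_mutex_lock(&q_lock);
        if (msr.r_msg != NULL) {  /* the check, now under the lock */
            ok = msr.sender_done;
            pthread_mutex_unlock(&q_lock);
            break;
        }
        pthread_mutex_unlock(&q_lock);
        sched_yield();            /* the kernel would sleep instead */
    }
    pthread_join(tid, NULL);      /* userspace cleanup only */
    return ok;
}
```

Because both fields are written inside one critical section, a receiver that looks only under the lock can never observe r_msg set while sender_done is still false. That is the guarantee the lockless IS_ERR (msg) check gives up.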
You should also report this to the linux-kernel mailing list;
this is an architecture-independent bug.
Bye,
Ulrich
--
Dr. Ulrich Weigand
[EMAIL PROTECTED]