On Fri, 2013-03-29 at 12:26 -0700, Linus Torvalds wrote:
> On Fri, Mar 29, 2013 at 12:06 PM, Dave Jones <da...@redhat.com> wrote:
> >
> > Here's an oops I just hit..
> >
> > BUG: unable to handle kernel NULL pointer dereference at 000000000000000f
> > IP: [<ffffffff812c24ca>] testmsg.isra.5+0x1a/0x60
> 
> Btw, looking at the code leading up to this, what the f*ck is wrong
> with the IPC stuff?
> 
> It's using the generic list stuff for most of the lists, but then it
> open-codes the accesses.
> 
> So instead of using
> 
>    for_each_entry(walk_msg, &msq->q_messages, m_list) {
>       ..
>    }
> 
> the ipc/msg.c code does all that by hand, with
> 
>    tmp = msq->q_messages.next;
>    while (tmp != &msq->q_messages) {
>       struct msg_msg *walk_msg;
> 
>       walk_msg = list_entry(tmp, struct msg_msg, m_list);
>       ...
>       tmp = tmp->next;
>    }
> 
> Ugh. The code is near unreadable. And then it has magic memory
> barriers etc, implying that it doesn't lock the data structures, but
> no comments about them. See expunge_all() and pipelined_send().
> 
> The code seems entirely random, and it's badly set up (annoyance of
> the day: crazy helper functions in ipc/msgutil.c to make sure that (a)
> you have to spend more effort looking for them, and (b) they won't get
> inlined).
> 
> Clearly nobody has cared for the crazy IPC message code in a long time.

Exactly that's what my patch series does; clean this mess up.

This is what I wrote to Andrew a couple of days ago.

On Tue, 2013-03-26 at 22:33 -0400, Peter Hurley wrote:
I just figured out how the queue is being corrupted and why my series
> fixes it.
> 
> 
> With MSG_COPY set, the queue scan can exit with the local variable
'msg'
> pointing to a real msg if the msg_counter never reaches the
copy_number.
> 
> The remaining execution looks like this:
> 
>       if (!IS_ERR(msg)) {
>               ....
>               if (msgflg & MSG_COPY)
>                       goto out_unlock;
>               ....
> 
> out_unlock:
>                       msg_unlock(msq);
>                       break;
>               }
>       }
>       if (IS_ERR(msg))
>               ....
> 
>       bufsz = msg_handler();
>       free_msg(msg);                  <<---- msg never unlinked
> 
> 
> Since the msg should not have been found (because it failed the match
> criteria), the if (!IS_ERR(msg)) clause should never have executed.
> 
> That's why my refactor fixes resolve this; because msg is not
> inadvertently treated as a found msg.
> 
> But let's be honest; the real bug here is the poor structure of this
> function that even made this possible. The deepest nesting executes a
> goto to a label in the middle of an if clause. Yuck! No wonder this
> thing's fragile.
> 
> So my recommendation still stands. The series that fixes this has been
> getting tested in linux-next for a month. Fixing this some other way
is
> just asking for more trouble.
> 
> But why not just revert MSG_COPY altogether for 3.9?

Regards,
Peter Hurley

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to