On Fri, 2013-03-29 at 12:26 -0700, Linus Torvalds wrote: > On Fri, Mar 29, 2013 at 12:06 PM, Dave Jones <da...@redhat.com> wrote: > > > > Here's an oops I just hit.. > > > > BUG: unable to handle kernel NULL pointer dereference at 000000000000000f > > IP: [<ffffffff812c24ca>] testmsg.isra.5+0x1a/0x60 > > Btw, looking at the code leading up to this, what the f*ck is wrong > with the IPC stuff? > > It's using the generic list stuff for most of the lists, but then it > open-codes the accesses. > > So instead of using > > for_each_entry(walk_msg, &msq->q_messages, m_list) { > .. > } > > the ipc/msg.c code does all that by hand, with > > tmp = msq->q_messages.next; > while (tmp != &msq->q_messages) { > struct msg_msg *walk_msg; > > walk_msg = list_entry(tmp, struct msg_msg, m_list); > ... > tmp = tmp->next; > } > > Ugh. The code is near unreadable. And then it has magic memory > barriers etc, implying that it doesn't lock the data structures, but > no comments about them. See expunge_all() and pipelined_send(). > > The code seems entirely random, and it's badly set up (annoyance of > the day: crazy helper functions in ipc/msgutil.c to make sure that (a) > you have to spend more effort looking for them, and (b) they won't get > inlined). > > Clearly nobody has cared for the crazy IPC message code in a long time.
Exactly that's what my patch series does; clean this mess up. This is what I wrote to Andrew a couple of days ago. On Tue, 2013-03-26 at 22:33 -0400, Peter Hurley wrote: I just figured out how the queue is being corrupted and why my series > fixes it. > > > With MSG_COPY set, the queue scan can exit with the local variable 'msg' > pointing to a real msg if the msg_counter never reaches the copy_number. > > The remaining execution looks like this: > > if (!IS_ERR(msg)) { > .... > if (msgflg & MSG_COPY) > goto out_unlock; > .... > > out_unlock: > msg_unlock(msq); > break; > } > } > if (IS_ERR(msg)) > .... > > bufsz = msg_handler(); > free_msg(msg); <<---- msg never unlinked > > > Since the msg should not have been found (because it failed the match > criteria), the if (!IS_ERR(msg)) clause should never have executed. > > That's why my refactor fixes resolve this; because msg is not > inadvertently treated as a found msg. > > But let's be honest; the real bug here is the poor structure of this > function that even made this possible. The deepest nesting executes a > goto to a label in the middle of an if clause. Yuck! No wonder this > thing's fragile. > > So my recommendation still stands. The series that fixes this has been > getting tested in linux-next for a month. Fixing this some other way is > just asking for more trouble. > > But why not just revert MSG_COPY altogether for 3.9? Regards, Peter Hurley -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/