On Wed, Jan 10, 2007 at 11:40:35PM -0800, David Miller wrote:
From: Jarek Poplawski [EMAIL PROTECTED]
Date: Thu, 11 Jan 2007 08:24:28 +0100
Yesterday I did what I should do earlier - checked
this simple way, with printk, and now I have no doubts
it's a bug: if you add or remove vlan
On Thu, Jan 11, 2007 at 09:29:58AM +0100, Jarek Poplawski wrote:
On Wed, Jan 10, 2007 at 11:40:35PM -0800, David Miller wrote:
From: Jarek Poplawski [EMAIL PROTECTED]
Date: Thu, 11 Jan 2007 08:24:28 +0100
Yesterday I did what I should do earlier - checked
this simple way, with
On Thu, Jan 11, 2007 at 09:35:26AM +0100, Jarek Poplawski wrote:
On Thu, Jan 11, 2007 at 09:29:58AM +0100, Jarek Poplawski wrote:
On Wed, Jan 10, 2007 at 11:40:35PM -0800, David Miller wrote:
From: Jarek Poplawski [EMAIL PROTECTED]
Date: Thu, 11 Jan 2007 08:24:28 +0100
Yesterday I
On Thu, Jan 11, 2007 at 01:27:55AM -0800, David Miller wrote:
From: Jarek Poplawski [EMAIL PROTECTED]
Date: Thu, 11 Jan 2007 09:39:34 +0100
Sure, but is this even legal to be preempted during
reading or modifying rcu list or be blocked while
holding rcu protected pointer? Doesn't this
On Tue, Jan 09, 2007 at 09:10:45AM +0100, Jarek Poplawski wrote:
On Mon, Jan 08, 2007 at 10:03:50AM -0800, Stephen Hemminger wrote:
...
* Must be invoked with RCU read lock (no preempt)
*/
struct net_device *__find_vlan_dev(struct net_device *real_dev,
...
But later in
On Wed, Jan 10, 2007 at 10:04:11AM +0100, Jarek Poplawski wrote:
...
It looks like you're talking about the right thing
and I'm a fool again! Now I try to find why I even
had to pay for this. I read again and again adequate
chapters from R. Love and C. Benvenuti's books, see
a lot about
On Wed, 10 Jan 2007 13:50:48 +0100
Jarek Poplawski [EMAIL PROTECTED] wrote:
On Wed, Jan 10, 2007 at 10:04:11AM +0100, Jarek Poplawski wrote:
...
It looks like you're talking about the right thing
and I'm a fool again! Now I try to find why I even
had to pay for this. I read again and
On Wed, Jan 10, 2007 at 12:01:23PM -0800, Stephen Hemminger wrote:
...
Don't rely on books too heavily, they can get out of date
with a simple code change.
I've tried to find this in the code at the beginning
and got misled by the path with PREEMPT_BKL.
I think the books are necessary to get
From: Jarek Poplawski [EMAIL PROTECTED]
Date: Thu, 11 Jan 2007 08:24:28 +0100
Yesterday I did what I should do earlier - checked
this simple way, with printk, and now I have no doubts
it's a bug: if you add or remove vlan devices with
vconfig, register_vlan_device and unregister_vlan_dev
are
On Mon, Jan 08, 2007 at 10:03:50AM -0800, Stephen Hemminger wrote:
On Mon, 08 Jan 2007 08:57:10 -0800
Ben Greear [EMAIL PROTECTED] wrote:
Jarek Poplawski wrote:
On Fri, Jan 05, 2007 at 12:33:43PM -0800, Ben Greear wrote:
...
So, I do believe this was the problem we were
Jarek Poplawski wrote:
On Fri, Jan 05, 2007 at 12:33:43PM -0800, Ben Greear wrote:
...
So, I do believe this was the problem we were hitting, and it seems fixed.
Congratulations!
But I can see one strange thing in vlan.c:
/* Must be invoked with RCU read lock (no preempt) */
static
On Mon, 08 Jan 2007 08:57:10 -0800
Ben Greear [EMAIL PROTECTED] wrote:
Jarek Poplawski wrote:
On Fri, Jan 05, 2007 at 12:33:43PM -0800, Ben Greear wrote:
...
So, I do believe this was the problem we were hitting, and it seems fixed.
Congratulations!
But I can see one
On Fri, Jan 05, 2007 at 12:33:43PM -0800, Ben Greear wrote:
...
So, I do believe this was the problem we were hitting, and it seems fixed.
Congratulations!
But I can see one strange thing in vlan.c:
/* Must be invoked with RCU read lock (no preempt) */
static struct vlan_group
On Fri, Jan 05, 2007 at 07:38:44AM +0100, Jarek Poplawski wrote:
I'd only suggest to change goto out; to
return NULL; at the end of inetdev_init because
now RCU is engaged unnecessarily.
I agree. The RCU assignment should come before the out label.
Can you send a patch?
Thanks,
--
Visit
On Thu, Jan 04, 2007 at 09:04:29AM -0800, Ben Greear wrote:
Jarek Poplawski wrote:
On Thu, Jan 04, 2007 at 09:27:07PM +1100, Herbert Xu wrote:
On Thu, Jan 04, 2007 at 09:50:14AM +0100, Jarek Poplawski wrote:
Could you explain? I can see some inet_rtm_newaddr
interrupted. For me it
On Thu, Jan 04, 2007 at 07:29:30PM +1100, Herbert Xu wrote:
On Thu, Jan 04, 2007 at 09:03:51AM +0100, Jarek Poplawski wrote:
I doubt this is the right solution. It certainly
could fix this particular situation but my main
point was packets shouldn't get into kernel
receive queues with
On Thu, Jan 04, 2007 at 09:50:14AM +0100, Jarek Poplawski wrote:
Could you explain? I can see some inet_rtm_newaddr
interrupted. For me it could be e.g.:
after
vconfig add eth0 9
ip addr add dev eth0.9 ...
Whether eth0.9 is up or not does not affect this at all. The spin
locks are
On Thu, Jan 04, 2007 at 09:27:07PM +1100, Herbert Xu wrote:
On Thu, Jan 04, 2007 at 09:50:14AM +0100, Jarek Poplawski wrote:
Could you explain? I can see some inet_rtm_newaddr
interrupted. For me it could be e.g.:
after
vconfig add eth0 9
ip addr add dev eth0.9 ...
Whether
Jarek Poplawski wrote:
On Thu, Jan 04, 2007 at 09:27:07PM +1100, Herbert Xu wrote:
On Thu, Jan 04, 2007 at 09:50:14AM +0100, Jarek Poplawski wrote:
Could you explain? I can see some inet_rtm_newaddr
interrupted. For me it could be e.g.:
after
vconfig add eth0 9
ip addr add dev eth0.9
From: Herbert Xu [EMAIL PROTECTED]
Date: Thu, 04 Jan 2007 17:26:27 +1100
David Stevens [EMAIL PROTECTED] wrote:
You're right, I don't know whether it'll fix the problem Ben saw
or not, but it looks like the original code can do a receive before the
in_device is fully initialized,
On Thu, Jan 04, 2007 at 12:33:33PM -0800, David Miller wrote:
From: Herbert Xu [EMAIL PROTECTED]
Date: Thu, 04 Jan 2007 17:26:27 +1100
David Stevens [EMAIL PROTECTED] wrote:
You're right, I don't know whether it'll fix the problem Ben saw
or not, but it looks like the original
On Tue, Jan 02, 2007 at 03:35:39PM -0800, David Stevens wrote:
I've looked at this a little too -- it'd be nice to know who holds
the write lock.
If you mean mc_list_lock - probably nobody - it's
not initialized (so the timers) for this in_device
and rtnl mutex is preempted by irq.
Actually I
On Wed, Jan 03, 2007 at 09:07:11AM +0100, Jarek Poplawski wrote:
On Tue, Jan 02, 2007 at 03:35:39PM -0800, David Stevens wrote:
I've looked at this a little too -- it'd be nice to know who holds
the write lock.
If you mean mc_list_lock - probably nobody - it's
not initialized (so the
Jarek Poplawski wrote:
On Wed, Jan 03, 2007 at 09:07:11AM +0100, Jarek Poplawski wrote:
On Tue, Jan 02, 2007 at 03:35:39PM -0800, David Stevens wrote:
I've looked at this a little too -- it'd be nice to know who holds
the write lock.
If you mean mc_list_lock - probably nobody -
Ben Jarek,
Your analysis looks correct to me. It seems to me the problem is that
we don't want the in_device to be searchable until after the
initialization is done.
What about moving the initialization of dev->ip_ptr in inetdev_init() to
after the out label?
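The ordering suggested in this message (finish initializing the in_device, then make it reachable) can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel code or the patch discussed in the thread: fake_in_device, fake_net_device, lookup_in_dev and fake_inetdev_init are hypothetical names, and a C11 release store stands in for what rcu_assign_pointer() does in the kernel.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's in_device and net_device. */
struct fake_in_device {
    int lock_ready;   /* stands in for an initialized mc_list_lock */
    int mc_count;     /* stands in for the multicast list state */
};

struct fake_net_device {
    /* The pointer a receive path would look up (dev->ip_ptr in the thread). */
    _Atomic(struct fake_in_device *) ip_ptr;
};

/* Reader side: an acquire load, playing the role of rcu_dereference(). */
static struct fake_in_device *lookup_in_dev(struct fake_net_device *dev)
{
    return atomic_load_explicit(&dev->ip_ptr, memory_order_acquire);
}

/* Writer side: complete every initialization step, then publish last. */
static struct fake_in_device *fake_inetdev_init(struct fake_net_device *dev)
{
    struct fake_in_device *in_dev = calloc(1, sizeof(*in_dev));

    if (!in_dev)
        return NULL;        /* nothing was published, so nothing to undo */

    in_dev->mc_count = 0;
    in_dev->lock_ready = 1; /* lock and timer setup would happen here */

    /* Release store, playing the role of rcu_assign_pointer(). */
    atomic_store_explicit(&dev->ip_ptr, in_dev, memory_order_release);
    return in_dev;
}
```

With this ordering, a reader that calls lookup_in_dev() either sees NULL and can bail out, or sees a structure whose fields were all set before the pointer became visible.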
Ben,
Here's a patch that I think will fix it, assuming the receive is on the
same device as the initialization. Can you try this out?
+-DLS
[inline for viewing, attached for applying]
Signed-off-by: David L Stevens [EMAIL PROTECTED]
diff
David Stevens wrote:
Ben,
Here's a patch that I think will fix it, assuming the receive is on the
same device as the initialization. Can you try this out?
We are attempting to reproduce this now...as soon as we can reproduce,
I'll apply this and see if that fixes the problem. This
OK, sounds good.
By the way, I think you can probably hit it more often if you have
something on the virtual network sending lots of multicast traffic while
you're creating the interface. That'll increase the odds that you'll
get into ip_check_mc() with a partially initialized in_dev.
You can
David Stevens [EMAIL PROTECTED] wrote:
Ben,
Here's a patch that I think will fix it, assuming the receive is on the
same device as the initialization. Can you try this out?
Hi David:
Your patch makes sense on its own but I don't see the direct connection
to the soft lock-up. Sure
Herbert Xu wrote:
David Stevens [EMAIL PROTECTED] wrote:
Ben,
Here's a patch that I think will fix it, assuming the receive is on the
same device as the initialization. Can you try this out?
Hi David:
Your patch makes sense on its own but I don't see the direct connection
to the
Herbert,
You're right, I don't know whether it'll fix the problem Ben saw
or not, but it looks like the original code can do a receive before the
in_device is fully initialized, and that, of course, is bad.
If the device for ip_rcv() is not the same one we were
initializing when
Ben,
If the ip_rcv() and the inetdev_init() are on the same
interface in your stack backtrace, it's a certainty at that point
that the lock value is still 0ed, because none of the initialization
occurs until after it has returned from the function it interrupted
to do the receive.
David Stevens [EMAIL PROTECTED] wrote:
You're right, I don't know whether it'll fix the problem Ben saw
or not, but it looks like the original code can do a receive before the
in_device is fully initialized, and that, of course, is bad.
If the device for ip_rcv() is not the same
On Tue, Jan 02, 2007 at 08:39:09AM +0100, Jarek Poplawski wrote:
...
It is hard to say what kind of bug to expect
because at the same time other net_rx_action
with the same vlan dev could take place on
other processor and this inetdev_init could
do more.
Sorry! inetdev_init couldn't do more
On Tue, Jan 02, 2007 at 09:23:02AM +0100, Jarek Poplawski wrote:
On Tue, Jan 02, 2007 at 08:39:09AM +0100, Jarek Poplawski wrote:
...
The main thing is the possibility of processing
skb with not entirely open source dev which isn't
expected (and checked) by receive functions.
I think the
I've looked at this a little too -- it'd be nice to know who holds
the write lock.
I see ip_mc_destroy_dev() is bouncing through the lock for
each multicast address, though it starts at the beginning of
the list each time. I don't see a problem with it, but it'd be
simpler if it acquired the
David Stevens wrote:
I've looked at this a little too -- it'd be nice to know who holds
the write lock.
I see ip_mc_destroy_dev() is bouncing through the lock for
each multicast address, though it starts at the beginning of
the list each time. I don't see a problem with it, but it'd be
simpler
I finally had time to look through the code in this backtrace in detail.
I think it *could* be a race between ip_rcv and inetdev_init, but I am not
certain. Other than that, I'm real low on ideas. I found a few more stack
trace debugging options to enable..perhaps that will give a better
On Mon, Jan 01, 2007 at 09:00:05PM -0800, Ben Greear wrote:
I finally had time to look through the code in this backtrace in detail.
I think it *could* be a race between ip_rcv and inetdev_init, but I am not
certain. Other than that, I'm real low on ideas. I found a few more stack trace
39 matches