On Tuesday 17 July 2007 10:57, Ingo Molnar wrote:
> i've got a new observation: changing CONFIG_HZ from 250 to 1000 makes 
> the problem go away. So it's somehow also related to jiffies.

There are several "Tx Hang detected" messages in the log, which looks
a lot as if net_rx_action never runs, or at least never calls
dev->poll on the e1000 nic.

Can you check whether/how often it bails out of net_rx_action taking
the softnet_break path?

My suspicion right now is that dev->quota goes way negative when
pushing out netconsole output. Normally, we bump the quota in
__net_rx_schedule. But the whole point of the patch is that
netpoll has no business removing the device from the poll_list,
so it stays there, and we don't end up calling __net_rx_schedule
as often as we would otherwise.

Can you try what happens if you change netif_rx_complete to
something like this:

        if (test_bit(__LINK_STATE_POLL_LIST_FROZEN, &dev->state)) {
                dev->quota = dev->weight;
                return;
        }

This is just a hack to make sure that we don't go to insanely
negative quotas while sending packets through netpoll.

Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |    / | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to