From: Stephen Irons
Sent: Tuesday, March 24, 2009 4:18 AM
>
> As reported a few days ago, my DVEVM was producing a kernel panic with kernel
> version 2.6.25-davinci1, 2.6.26-rc5 and 2.6.27-davinci1. It was consistently
> in
> net_rx_action. It would happen after anything from a few seconds up to about 4
> hours of operation. A copy of the console log is attached at the end of this
> email.
>
> This happened only when the DVEVM was attached to a corporate network with
> thousands of packets per second. On a quiet, development network, this
> problem did
> not happen.
>
> I have discovered a few issues in drivers/net/davinci_emac.c. These problems
> have
> not been fixed in kernel 2.6.28 or 2.6.29. In the descriptions below, line
> numbers
> refer to the head of Davinci GIT.
>
> This email is a description of the problem and some solutions. Patches will
> come
> shortly.
>
>
> Stephen Irons
> Senior Engineer
> Tait Electronics Ltd
>
> Function emac_poll(), around line 2189:
> Original code:
> if (txpending || rxpending) {
> if (likely(netif_rx_schedule_prep(&priv->napi))) {
> emac_int_disable(priv);
> __netif_rx_schedule(&priv->napi);
> }
> } else {
> netif_rx_complete(napi);
> emac_int_enable(priv);
> }
> This code tries to decide whether to perform another NAPI poll (if there is
> still
> transmit or receive work to do), or if it should stop NAPI polling and
> re-enable
> the EMAC interrupt. If there is no more work to do (both txpending and
> rxpending
> are false), it removes the NAPI poll from the poll list, and re-enables the
> EMAC
> interrupt.
>
> Now, in the rare situation that the driver performs EXACTLY the budgetted
> amount of
> work and there is now no more to do, then it violates the comment in
> net/core/dev.c, function net_rx_action(),
> /* Drivers must not modify the NAPI state if they
> * consume the entire weight. In such cases this code
> * still "owns" the NAPI instance and therefore can
> * move the instance around on the list at-will.
> */
> if (unlikely(work == weight)) {
> if (unlikely(napi_disable_pending(n)))
> __napi_complete(n);
> else
> list_move_tail(&n->poll_list, list);
> }
> This code executes list_move_tail() to move the NAPI poll to the end of the
> list.
> However, in the rare situation described above, the NAPI poll has already been
> moved, and list_move_tail() tries to access a list-item that has been
> deleted. The
> ->next and ->prev pointers are set to the LIST_POISON value (0x00100100) and
> when
> these are followed, the corresponding addresses are not mapped into virtual
> memory
> and the kernel panic described above occurs.
Hmm, good catch!
> Solution
> In function emac_poll(), around line 2189, only disable the NAPI poll if
> there is
> no more work to do AND you have not consumed exactly your budget:
> if (txpending || rxpending) {
> if (likely(netif_rx_schedule_prep(&priv->napi))) {
> emac_int_disable(priv);
> __netif_rx_schedule(&priv->napi);
> }
> } else {
> if (num_pkts < budget) {
> netif_rx_complete(napi);
> emac_int_enable(priv);
> }
> }
Right, seems like this is what is being done in other drivers.
Drivers/net/e100.c seems a good example to follow. Looking at that driver, the
call to __netif_rx_schedule above could be superfluous as well. The poll
function will anyway keep getting called until it is knocked off the poll list
by driver poll function or by net_rx_action.
> This has the effect that there will be one more poll even though there is no
> more
> work to do. However, the next time around it will process no packets (because
> there
> are no more) and will disable the NAPI poll.
> Testing
> This problem usually occurs on a busy network. However, it is possible to
> cause the
> problem to happen on a quiet network by reducing the Davinci EMAC driver poll
> weight from 64 down to 4 or 8. If set to 4, my DVEVM would not even get to the
> login: prompt before crashing (booting from NFS-mounted root FS). If set to
> 8, it
> would often get to the login prompt, but would crash when flood-pinging the
> board.
>
> In drivers/net/davinci_emac.c:
> #define EMAC_POLL_WEIGHT (4) /* Default NAPI poll weight */
>
> /* Buffer descriptor parameters */
> #define EMAC_DEF_TX_MAX_SERVICE (4) /* TX max service BD's */
> #define EMAC_DEF_RX_MAX_SERVICE (4) /* should = netdev->weight */
> This works by making the emac_poll() function consumes its entire budget more
> often. When set to the default of 64, it very seldom consumed its budget,
> except on
> a busy network.
>
> It does also load up the processor significantly: it spends 50--60% of its
> time in
> doing soft irq activity.
> Function emac_poll(), around line 2137:
> The change above highlighted another issue: the transmit function can, and
> often
> will, do MORE than the budgetted amount of work in one poll.
> if (status & mask) {
> num_pkts = emac_tx_bdproc(priv, EMAC_DEF_TX_CH,
> EMAC_DEF_TX_MAX_SERVICE,
> &txpending);
> } /* TX processing */
> EMAC_DEF_TX_MAX_SERVICE is set to 32. If there is lots of transmit work to
> do, it
> will do all of it, irrespective of the budget. If there is no receive work to
> do,
> emac_poll() will return the number of transmit packets processed. This could
> be a
> value larger than its budget. This is a bad thing to do:
> * it will earn the driver a warning in dev.c, function net_rx_action(), when
> it
> executes WARN_ON_ONCE(work > weight);
> * it will cause untold chaos in netpoll.c, function poll_one_napi() and
> poll_napi(), because
> o poll_one_napi() will return a negative number (budget - work)
> o poll_napi() will keep on looping when it should have stopped
> if (!budget)
> break;
> Solution
> In drivers/net/davinci_emac.c, function emac_poll(), ensure that transmit
> processing also honours the budget.
> if (status & mask) {
> num_pkts = emac_tx_bdproc(priv, EMAC_DEF_TX_CH,
> budget,
> &txpending);
> } /* TX processing */
I think NAPI was meant for RX side only. Tx is supposed to throttle itself
since that traffic originates from the system itself.
Again e100 looks a good example to follow. It allows Tx to piggyback the poll
function, but Tx does not influence the work done. So in our case, not setting
num_pkts for Tx case should be good?
Thanks,
Sekhar
> In net/core/netpoll.c, function poll_napi(), change the test for end of loop
> to if
> (budget <= 0). This is defensive coding to ensure that buggy drivers do not
> make
> poll_napi() loop too many times.
> if (budget <= 0)
> break;
> Additional problem
> Consider the situation where there is both transmit and receive work to do.
> It will
> process up to the budgetted number of transmit AND receive packets. However,
> num_pkts is set to the number of receive packets processed, so it lies about
> the
> amount of work it has done. This is not a serious problem. No-one else in the
> kernel actually knows or cares how much time we spend doing stuff.
>
> Ideally, the emac_poll() function would:
> * Find out how much transmit and receive work is queued.
> * Portion this out between transmit and receive.
> * Tell the caller how much work it actually did.
> In practice, it might be acceptable to:
> * Do up to the budgetted amount of receive work
> * Do the remainder of the budget in transmit work
> * Tell the caller how much work it actually did.
> mask = EMAC_DM644X_MAC_IN_VECTOR_RX_INT_VEC;
>
> if (priv->version == EMAC_VERSION_2)
> mask = EMAC_DM646X_MAC_IN_VECTOR_RX_INT_VEC;
>
> if (status & mask) {
> num_rx_pkts = emac_rx_bdproc(priv, EMAC_DEF_RX_CH,
> budget, &rxpending);
> } /* RX processing */
>
> tx_budget = budget - num_rx_pkts;
> mask = EMAC_DM644X_MAC_IN_VECTOR_TX_INT_VEC;
>
> if (priv->version == EMAC_VERSION_2)
> mask = EMAC_DM646X_MAC_IN_VECTOR_TX_INT_VEC;
>
> if (status & mask) {
> num_tx_pkts = emac_tx_bdproc(priv, EMAC_DEF_TX_CH,
> tx_budget,
> &txpending);
> } /* TX processing */
>
> I have not tested the above. It might just make the Davinci EMAC driver a bit
> more
> friendly.
> Console log
> 20090306_153918 < r...@davinci-dcb:~# uname -r. .a
> 20090306_153918 < Linux davinci-dcb 2.6.27-davinci1-svn17242 #3 PREEMPT Fri
> Mar 6
> 09:34:36 NZDT 2009 armv5tejl unknown
> 20090306_153918 < r...@davinci-dcb:~#
> 20090306_161241 < r...@davinci-dcb:~# Unable to handle kernel paging request
> at
> virtual address 00100104
> 20090306_161241 < pgd = c0004000
> 20090306_161241 < [00100104] *pgd=00000000
> 20090306_161241 < Internal error: Oops: 817 [#1] PREEMPT
> 20090306_161241 < Modules linked in:
> 20090306_161241 < CPU: 0 Not tainted (2.6.27-davinci1-svn17242 #3)
> 20090306_161241 < PC is at net_rx_action+0x158/0x260
> 20090306_161241 < LR is at emac_poll+0x5bc/0x868
> 20090306_161241 < pc : [<c01845ec>] lr : [<c015f3ac>] psr: 60000093
> 20090306_161241 < sp : c0289e98 ip : c0289e38 fp : c0289ecc
> 20090306_161241 < r10: 0000012c r9 : c02c2464 r8 : 00000040
> 20090306_161241 < r7 : c0288000 r6 : 00000000 r5 : 00000040 r4 : c30a238c
> 20090306_161241 < r3 : 00200200 r2 : 00100100 r1 : 00000100 r0 : 00000040
> 20090306_161241 < Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM
> Segment
> kernel
> 20090306_161241 < Control: 0005317f Table: 8318c000 DAC: 00000017
> 20090306_161241 < Process swapper (pid: 0, stack limit = 0xc0288268)
> 20090306_161241 < Stack: (0xc0289e98 to 0xc028a000)
> 20090306_161241 < 9e80:
> c0289ebc 000320ce
> 20090306_161241 < 9ea0: c0180e48 c02a57c0 00000001 0000000a c02a5780 00000001
> c02b47ec 00000000
> 20090306_161241 < 9ec0: c0289efc c0289ed0 c003df44 c01844a4 c3144e20 0000000d
> c028f5ec 00000000
> 20090306_161241 < 9ee0: 00000002 00000001 c0288000 8001fd9c c0289f14 c0289f00
> c003e338 c003deec
> 20090306_161241 < 9f00: 00000002 0000000d c0289f34 c0289f18 c002404c c003e2fc
> c0063140 ffffffff
> 20090306_161241 < 9f20: fec48000 c028bdb8 c0289f8c c0289f38 c0024868 c0024010
> 00000000 0005317f
> 20090306_161241 < 9f40: 0005217f 60000013 c0025d8c c0288000 c028bdb8 c0025d8c
> c029efc8 41069265
> 20090306_161241 < 9f60: 8001fd9c c0289f8c 600000d3 c0289f80 c0025dd0 c0025ddc
> 60000013 ffffffff
> 20090306_161241 < 9f80: c0289fb4 c0289f90 c0025c58 c0025d9c c029eb68 c02b4a8c
> c029eb68 c0021f14
> 20090306_161241 < 9fa0: c028bc50 8001fdd0 c0289fc4 c0289fb8 c01fa980 c0025c10
> c0289ff4 c0289fc8
> 20090306_161241 < 9fc0: c0008a28 c01fa928 c0008394 00000000 00000000 c0021f14
> 00000000 00053175
> 20090306_161241 < 9fe0: c029f02c c0022318 00000000 c0289ff8 80008034 c00087cc
> 00000000 00000000
> 20090306_161241 < Backtrace:
> 20090306_161241 < [<c0184494>] (net_rx_action+0x0/0x260) from [<c003df44>]
> (__do_softirq+0x68/0xd4)
> 20090306_161241 < [<c003dedc>] (__do_softirq+0x0/0xd4) from [<c003e338>]
> (irq_exit+0x4c/0x9c)
> 20090306_161241 < [<c003e2ec>] (irq_exit+0x0/0x9c) from [<c002404c>]
> (__exception_text_start+0x4c/0x64)
> 20090306_161241 < r4:0000000d
> 20090306_161241 < [<c0024000>] (__exception_text_start+0x0/0x64) from
> [<c0024868>]
> (__irq_svc+0x48/0x88)
> 20090306_161241 < Exception stack(0xc0289f38 to 0xc0289f80)
> 20090306_161241 < 9f20:
> 00000000 0005317f
> 20090306_161241 < 9f40: 0005217f 60000013 c0025d8c c0288000 c028bdb8 c0025d8c
> c029efc8 41069265
> 20090306_161241 < 9f60: 8001fd9c c0289f8c 600000d3 c0289f80 c0025dd0 c0025ddc
> 60000013 ffffffff
> 20090306_161241 < r6:c028bdb8 r5:fec48000 r4:ffffffff
> 20090306_161241 < [<c0025d8c>] (default_idle+0x0/0x58) from [<c0025c58>]
> (cpu_idle+0x58/0x98)
> 20090306_161241 < [<c0025c00>] (cpu_idle+0x0/0x98) from [<c01fa980>]
> (rest_init+0x68/0x7c)
> 20090306_161241 < r8:8001fdd0 r7:c028bc50 r6:c0021f14 r5:c029eb68 r4:c02b4a8c
> 20090306_161241 < [<c01fa918>] (rest_init+0x0/0x7c) from [<c0008a28>]
> (start_kernel+0x26c/0x2dc)
> 20090306_161241 < [<c00087bc>] (start_kernel+0x0/0x2dc) from [<80008034>]
> (0x80008034)
> 20090306_161241 < r6:c0022318 r5:c029f02c r4:00053175
> 20090306_161241 < Code: e5843008 e121f002 ea000008 e894000c (e5823004)
> 20090306_161241 < Kernel panic - not syncing: Fatal exception in interrupt
> ________________________________________
>
> This email, including any attachments, is only for the intended addressee. It
> is
> subject to copyright, confidential and may be the subject of legal or other
> privilege, none of which is waived or lost by reason of this transmission. If
> the
> receiver is not the intended addressee, please accept our apologies, notify
> us by
> return, delete all copies and perform no other act on the email.
> Unfortunately, we
> cannot warrant that the email has not been altered or corrupted during
> transmission.
> ________________________________________
_______________________________________________
Davinci-linux-open-source mailing list
[email protected]
http://linux.davincidsp.com/mailman/listinfo/davinci-linux-open-source