Christian Borntraeger wrote:
> Rusty, Anthony, Dor,
>
> I need your brain power :-)
>
> On smp guests I have seen a problem with virtio (the version in curent Avi's
> git) which do not occur on single processor guests:
>
> kernel BUG at /space/kvm/drivers/virtio/virtio_ring.c:228!
> illegal operation: 0001 [#1]
> Modules linked in: ipv6
> CPU:    2    Not tainted
> Process swapper (pid: 0, task: 000000000f83e038, ksp: 000000000f877d70)
> Krnl PSW : 0704000180000000 000000000045df2a (vring_restart+0x5a/0x70)
>            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 EA:3
> Krnl GPRS: 00000000c0a80101 0000000000000000 000000000eb35000 0000000010005800
>            000000000045ded0 000000000000192f 000000000eb21000 000000000eb21000
>            000000000000000e 000000000eb21900 000000000eb21920 000000000f867cb8
>            0700000000d9b058 0000000000000010 000000000045c06a 000000000f867cb8
> Krnl Code: 000000000045df1e: e3b0b0700004       lg      %r11,112(%r11)
>            000000000045df24: 07fe               bcr     15,%r14
>            000000000045df26: a7f40001           brc     15,45df28
>           >000000000045df2a: a7f4ffe1           brc     15,45deec
>            000000000045df2e: e31020300004       lg      %r1,48(%r2)
>            000000000045df34: a7480000           lhi     %r4,0
>            000000000045df38: 96011001           oi      1(%r1),1
>            000000000045df3c: a7f4ffef           brc     15,45df1a
> Call Trace:
> ([<000000000045c016>] virtnet_poll+0x96/0x42c)
>  [<000000000048cda2>] net_rx_action+0xca/0x150
>  [<0000000000137f7a>] __do_softirq+0x9e/0x130
>  [<00000000001105d6>] do_softirq+0xae/0xb4
>  [<0000000000138182>] irq_exit+0x96/0x9c
>  [<000000000010d710>] do_extint+0xcc/0xf8
>  [<00000000001135d0>] ext_no_vtime+0x16/0x1a
>  [<000000000010a57e>] cpu_idle+0x216/0x238
>
>
> I think there is a valid code path, triggering this bug:
>
>       CPU1                                            CPU2
> -----------------------                               -----------------------
> - virtnet_poll found no
>   more packets on queue
> - netif_rx_complete allow
>   poll to be called
> - vq_ops->restart is called
> - vq Interrupts are enabled   
>       .                    <new packets arrive>
> <vcpu is scheduled away>
>       .                                       - interrupt is delivered
>       .                                       - poll is called
>       .                                       - poll work is done
>       .                                       - netif_rx_complete
>       .                                       - vq_ops->restart is called
>       .                                       - check if vq interrupts are
>       .                                         enable --> BUG
>
>   
I didn't understand how its possible:

<vcpu is scheduled away>
        .                                       - interrupt is delivered
                                                        -vring_interrupt is 
called -> 
                                                        - skb_recv_done 
callback return false ->
                                                        vq->vring.avail->flags 
|= VRING_AVAIL_F_NO_INTERRUPT;

So when the restart callback will be called the 
BUG_ON(!(vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT));
will not be issued.

        .                                       - poll is called
        .                                       - poll work is done
        .                                       - netif_rx_complete
        .                                       - vq_ops->restart is called
        .                                       - check if vq interrupts are
        .                                         enable --> BUG


> The first idea was to remove this check? (See patch below). I am not sure
> if the proper fix also requires to change vring.avail->flags to be only
> changed by atomic bitops. Any ideas, comments?
>   
As for now no harm can be done since it is only used in two place:
1. vring_restart inside a napi poll calback which is protected by napi 
poll lock
2. vring_interrupt in the interrupt handler. While only the 
VRING_AVAIL_F_NO_INTERRUPT bit is touched
    there is no possible harm, once we'll use more bits it might be an 
issue.
So Maybe we should use atomics.
> Signed-off-by: Christian Borntraeger <[EMAIL PROTECTED]>
> CC: Anthony Liguori <[EMAIL PROTECTED]>
> CC: Dor Laor <[EMAIL PROTECTED]>
> CC: Rusty Russell <[EMAIL PROTECTED]>
>
> ---
>  drivers/virtio/virtio_ring.c |    2 --
>  1 file changed, 2 deletions(-)
>
> Index: kvm/drivers/virtio/virtio_ring.c
> ===================================================================
> --- kvm.orig/drivers/virtio/virtio_ring.c
> +++ kvm/drivers/virtio/virtio_ring.c
> @@ -225,8 +225,6 @@ static bool vring_restart(struct virtque
>       struct vring_virtqueue *vq = to_vvq(_vq);
>  
>       START_USE(vq);
> -     BUG_ON(!(vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT));
> -
>       /* We optimistically turn back on interrupts, then check if there was
>        * more to do. */
>       vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
>
>   


-------------------------------------------------------------------------
SF.Net email is sponsored by:
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services
for just about anything Open Source.
http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace
_______________________________________________
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel

Reply via email to