Re: [PULL 04/17] virtio-net: Add support for USO features

2024-05-16 Thread Jason Wang
On Thu, May 16, 2024 at 9:51 PM Fiona Ebner  wrote:
>
> Hi,
>
> Am 08.09.23 um 08:44 schrieb Jason Wang:
> > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > index da699cf..230aab8 100644
> > --- a/hw/core/machine.c
> > +++ b/hw/core/machine.c
> > @@ -38,6 +38,7 @@
> >  #include "exec/confidential-guest-support.h"
> >  #include "hw/virtio/virtio.h"
> >  #include "hw/virtio/virtio-pci.h"
> > +#include "hw/virtio/virtio-net.h"
> >
> >  GlobalProperty hw_compat_8_1[] = {};
> >  const size_t hw_compat_8_1_len = G_N_ELEMENTS(hw_compat_8_1);
> > @@ -45,6 +46,9 @@ const size_t hw_compat_8_1_len = 
> > G_N_ELEMENTS(hw_compat_8_1);
> >  GlobalProperty hw_compat_8_0[] = {
> >  { "migration", "multifd-flush-after-each-section", "on"},
> >  { TYPE_PCI_DEVICE, "x-pcie-ari-nextfn-1", "on" },
> > +{ TYPE_VIRTIO_NET, "host_uso", "off"},
> > +{ TYPE_VIRTIO_NET, "guest_uso4", "off"},
> > +{ TYPE_VIRTIO_NET, "guest_uso6", "off"},
> >  };
> >  const size_t hw_compat_8_0_len = G_N_ELEMENTS(hw_compat_8_0);
> >
>
> unfortunately, this broke backwards migration with machine version 8.1
> from 8.2 and 9.0 binaries to an 8.1 binary:
>
> > kvm: Features 0x1c0010130afffa7 unsupported. Allowed features: 0x10179bfffe7
> > kvm: Failed to load virtio-net:virtio
> > kvm: error while loading state for instance 0x0 of device ':00:12.0/virtio-net'
> > kvm: load of migration failed: Operation not permitted
>
> Since the series here only landed in 8.2, shouldn't these flags have
> been added to hw_compat_8_1[] instead?

You are right. We need to put them into hw_compat_8_1[].
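I.e. roughly this (an untested sketch; the flags are the ones from the diff
above, moved from hw_compat_8_0[] to hw_compat_8_1[]):

    GlobalProperty hw_compat_8_1[] = {
        { TYPE_VIRTIO_NET, "host_uso", "off"},
        { TYPE_VIRTIO_NET, "guest_uso4", "off"},
        { TYPE_VIRTIO_NET, "guest_uso6", "off"},
    };
    const size_t hw_compat_8_1_len = G_N_ELEMENTS(hw_compat_8_1);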

>
> Attempting to fix it by moving the flags will break migration with
> machine version 8.1 between patched 9.0 and unpatched 9.0 however :(

I'm sorry, but I can't think of a better way.

>
> Is there anything that can be done or will it need to stay broken now?

Would you mind posting a patch to fix this and cc stable?

>
> CC-ing the migration maintainers.
>
> Best Regards,
> Fiona
>

Thanks




Re: [RFC 0/2] Identify aliased maps in vdpa SVQ iova_tree

2024-05-09 Thread Jason Wang
On Thu, May 9, 2024 at 1:16 AM Eugenio Perez Martin  wrote:
>
> On Wed, May 8, 2024 at 4:29 AM Jason Wang  wrote:
> >
> > On Tue, May 7, 2024 at 6:57 PM Eugenio Perez Martin  
> > wrote:
> > >
> > > On Tue, May 7, 2024 at 9:29 AM Jason Wang  wrote:
> > > >
> > > > On Fri, Apr 12, 2024 at 3:56 PM Eugenio Perez Martin
> > > >  wrote:
> > > > >
> > > > > On Fri, Apr 12, 2024 at 8:47 AM Jason Wang  
> > > > > wrote:
> > > > > >
> > > > > > On Wed, Apr 10, 2024 at 6:03 PM Eugenio Pérez  
> > > > > > wrote:
> > > > > > >
> > > > > > > The guest may have overlapped memory regions, where different GPA leads
> > > > > > > to the same HVA.  This causes a problem when overlapped regions
> > > > > > > (different GPA but same translated HVA) exist in the tree, as looking
> > > > > > > them up by HVA will return them twice.
> > > > > >
> > > > > > I think I don't understand if there's any side effect for shadow 
> > > > > > virtqueue?
> > > > > >
> > > > >
> > > > > My bad, I totally forgot to put a reference to where this comes from.
> > > > >
> > > > > Si-Wei found that during initialization this sequence of maps /
> > > > > unmaps happens [1]:
> > > > > HVA                           GPA                   IOVA
> > > > > ------------------------------------------------------------------------
> > > > > Map
> > > > > [0x7f7903e0, 0x7f7983e0)      [0x0, 0x8000)         [0x1000, 0x8000)
> > > > > [0x7f7983e0, 0x7f9903e0)      [0x1, 0x208000)       [0x80001000, 0x201000)
> > > > > [0x7f7903ea, 0x7f7903ec)      [0xfeda, 0xfedc)      [0x201000, 0x221000)
> > > > >
> > > > > Unmap
> > > > > [0x7f7903ea, 0x7f7903ec)      [0xfeda, 0xfedc)      [0x1000, 0x2) ???
> > > > >
> > > > > The third HVA range is contained in the first one, but exposed under a
> > > > > different GVA (aliased). This is not "flattened" by QEMU, as GPA does
> > > > > not overlap, only HVA.
> > > > >
> > > > > At the third chunk unmap, the current algorithm finds the first chunk,
> > > > > not the second one. This series is the way to tell the difference at
> > > > > unmap time.
> > > > >
> > > > > [1] 
> > > > > https://lists.nongnu.org/archive/html/qemu-devel/2024-04/msg00079.html
> > > > >
> > > > > Thanks!
> > > >
> > > > Ok, I was wondering if we need to store GPA(GIOVA) to HVA mappings in
> > > > the iova tree to solve this issue completely. Then there won't be
> > > > aliasing issues.
> > > >
> > >
> > > I'm ok to explore that route but this has another problem. Both SVQ
> > > vrings and CVQ buffers also need to be addressable by VhostIOVATree,
> > > and they do not have GPA.
> > >
> > > At this moment vhost_svq_translate_addr is able to handle this
> > > transparently as we translate vaddr to SVQ IOVA. How can we store
> > > these new entries? Maybe a (hwaddr)-1 GPA to signal it has no GPA and
> > > then a list to go through other entries (SVQ vaddr and CVQ buffers).
> >
> > This seems to be tricky.
> >
> > As discussed, it could be another iova tree.
> >
>
> Yes but there are many ways to add another IOVATree. Let me expand & recap.
>
> Option 1 is to simply add another iova tree to VhostShadowVirtqueue.
> Let's call it gpa_iova_tree, as opposed to the current iova_tree that
> translates from vaddr to SVQ IOVA. Knowing which one to use is easy when
> adding or removing, like in the memory listener, but how do we know at
> vhost_svq_translate_addr?

Then we won't use virtqueue_pop() at all; we'd need an SVQ version of
virtqueue_pop() that translates GPA to SVQ IOVA directly?

>
> The easiest way for me is to rely on memory_region_from_host(). When
> vaddr is from the guest, it returns a valid MemoryRegion. When it is
> not, it returns NULL. I'm not sure if this is a valid use case; it
> just worked in my test
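For reference, the memory_region_from_host() check described above would look
roughly like this (a sketch under the two-tree assumption from Option 1, not
tested code; gpa_iova_tree is the hypothetical second tree):

    ram_addr_t offset;
    MemoryRegion *mr = memory_region_from_host(vaddr, &offset);

    if (mr) {
        /* vaddr belongs to guest memory: translate via gpa_iova_tree */
    } else {
        /* SVQ vring or CVQ buffer: translate via the vaddr iova_tree */
    }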

Re: [PATCH] hw/virtio: Fix obtain the buffer id from the last descriptor

2024-05-07 Thread Jason Wang
On Mon, Apr 22, 2024 at 9:41 AM Wafer  wrote:
>
> The virtio 1.3 specification writes:
> 2.8.6 Next Flag: Descriptor Chaining
>   Buffer ID is included in the last descriptor in the list.
>
> If the feature (_F_INDIRECT_DESC) has been negotiated, only one
> descriptor is installed in the virtqueue.
> Therefore the buffer id should be obtained from the first descriptor.
>
> In descriptor chaining scenarios, the buffer id should be obtained
> from the last descriptor.
>
> Fixes: 86044b24e8 ("virtio: basic packed virtqueue support")
>
> Signed-off-by: Wafer 
> ---
>  hw/virtio/virtio.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 871674f9be..f65d4b4161 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -1739,6 +1739,11 @@ static void *virtqueue_packed_pop(VirtQueue *vq, 
> size_t sz)
>  goto err_undo_map;
>  }
>
> +if (desc_cache != &indirect_desc_cache) {
> +/* Buffer ID is included in the last descriptor in the list. */
> +id = desc.id;
> +}

It looks to me like we can move this out of the loop.
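I.e. something like this (an untested sketch of what I mean, using the names
from the patch):

    } while (rc == VIRTQUEUE_READ_DESC_MORE);

    if (desc_cache != &indirect_desc_cache) {
        /* Buffer ID is included in the last descriptor in the list. */
        id = desc.id;
    }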

Others look good.

Thanks

> +
>  rc = virtqueue_packed_read_next_desc(vq, &desc, desc_cache, max, &i,
>   desc_cache ==
>   &indirect_desc_cache);
> --
> 2.27.0
>




Re: [RFC 0/2] Identify aliased maps in vdpa SVQ iova_tree

2024-05-07 Thread Jason Wang
On Tue, May 7, 2024 at 6:57 PM Eugenio Perez Martin  wrote:
>
> On Tue, May 7, 2024 at 9:29 AM Jason Wang  wrote:
> >
> > On Fri, Apr 12, 2024 at 3:56 PM Eugenio Perez Martin
> >  wrote:
> > >
> > > On Fri, Apr 12, 2024 at 8:47 AM Jason Wang  wrote:
> > > >
> > > > On Wed, Apr 10, 2024 at 6:03 PM Eugenio Pérez  
> > > > wrote:
> > > > >
> > > > > The guest may have overlapped memory regions, where different GPA leads
> > > > > to the same HVA.  This causes a problem when overlapped regions
> > > > > (different GPA but same translated HVA) exist in the tree, as looking
> > > > > them up by HVA will return them twice.
> > > >
> > > > I think I don't understand if there's any side effect for shadow 
> > > > virtqueue?
> > > >
> > >
> > > My bad, I totally forgot to put a reference to where this comes from.
> > >
> > > Si-Wei found that during initialization this sequence of maps /
> > > unmaps happens [1]:
> > > HVA                           GPA                   IOVA
> > > ------------------------------------------------------------------------
> > > Map
> > > [0x7f7903e0, 0x7f7983e0)      [0x0, 0x8000)         [0x1000, 0x8000)
> > > [0x7f7983e0, 0x7f9903e0)      [0x1, 0x208000)       [0x80001000, 0x201000)
> > > [0x7f7903ea, 0x7f7903ec)      [0xfeda, 0xfedc)      [0x201000, 0x221000)
> > >
> > > Unmap
> > > [0x7f7903ea, 0x7f7903ec)      [0xfeda, 0xfedc)      [0x1000, 0x2) ???
> > >
> > > The third HVA range is contained in the first one, but exposed under a
> > > different GVA (aliased). This is not "flattened" by QEMU, as GPA does
> > > not overlap, only HVA.
> > >
> > > At the third chunk unmap, the current algorithm finds the first chunk,
> > > not the second one. This series is the way to tell the difference at
> > > unmap time.
> > >
> > > [1] https://lists.nongnu.org/archive/html/qemu-devel/2024-04/msg00079.html
> > >
> > > Thanks!
> >
> > Ok, I was wondering if we need to store GPA(GIOVA) to HVA mappings in
> > the iova tree to solve this issue completely. Then there won't be
> > aliasing issues.
> >
>
> I'm ok to explore that route but this has another problem. Both SVQ
> vrings and CVQ buffers also need to be addressable by VhostIOVATree,
> and they do not have GPA.
>
> At this moment vhost_svq_translate_addr is able to handle this
> transparently as we translate vaddr to SVQ IOVA. How can we store
> these new entries? Maybe a (hwaddr)-1 GPA to signal it has no GPA and
> then a list to go through other entries (SVQ vaddr and CVQ buffers).

This seems to be tricky.

As discussed, it could be another iova tree.
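For reference, the (hwaddr)-1 sentinel idea above would look roughly like this
(a sketch only, assuming DMAMap grows an id member; the constant name is made
up for illustration):

    #define SVQ_MAP_NO_GPA ((hwaddr)-1)

    /* An SVQ vring or CVQ buffer: not backed by guest memory. */
    DMAMap map = {
        .translated_addr = (hwaddr)(uintptr_t)vaddr,
        .size = size - 1,              /* DMAMap sizes are inclusive */
        .perm = IOMMU_RW,
        .id = SVQ_MAP_NO_GPA,
    };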

Thanks

>
> Thanks!
>
> > Thanks
> >
> > >
> > > > Thanks
> > > >
> > > > >
> > > > > To solve this, track the GPA in the DMA entry, which acts as a unique
> > > > > identifier for the maps.  When a map needs to be removed, the iova tree
> > > > > is able to find the right one.
> > > > >
> > > > > Users that do not go through this extra layer of indirection can use the
> > > > > iova tree as usual, with id = 0.
> > > > >
> > > > > This was found by Si-Wei Liu, but I'm having a hard time reproducing the
> > > > > issue.  This has been tested only without overlapping maps.  If it works
> > > > > with overlapping maps, it will be integrated in the main series.
> > > > >
> > > > > Comments are welcome.  Thanks!
> > > > >
> > > > > Eugenio Pérez (2):
> > > > >   iova_tree: add an id member to DMAMap
> > > > >   vdpa: identify aliased maps in iova_tree
> > > > >
> > > > >  hw/virtio/vhost-vdpa.c   | 2 ++
> > > > >  include/qemu/iova-tree.h | 5 +++--
> > > > >  util/iova-tree.c | 3 ++-
> > > > >  3 files changed, 7 insertions(+), 3 deletions(-)
> > > > >
> > > > > --
> > > > > 2.44.0
> > > > >
> > > >
> > >
> >
>




Re: [RFC 0/2] Identify aliased maps in vdpa SVQ iova_tree

2024-05-07 Thread Jason Wang
On Fri, Apr 12, 2024 at 3:56 PM Eugenio Perez Martin
 wrote:
>
> On Fri, Apr 12, 2024 at 8:47 AM Jason Wang  wrote:
> >
> > On Wed, Apr 10, 2024 at 6:03 PM Eugenio Pérez  wrote:
> > >
> > > The guest may have overlapped memory regions, where different GPA leads
> > > to the same HVA.  This causes a problem when overlapped regions
> > > (different GPA but same translated HVA) exist in the tree, as looking
> > > them up by HVA will return them twice.
> >
> > I think I don't understand if there's any side effect for shadow virtqueue?
> >
>
> My bad, I totally forgot to put a reference to where this comes from.
>
> Si-Wei found that during initialization this sequence of maps /
> unmaps happens [1]:
>
> HVA                           GPA                   IOVA
> ------------------------------------------------------------------------
> Map
> [0x7f7903e0, 0x7f7983e0)      [0x0, 0x8000)         [0x1000, 0x8000)
> [0x7f7983e0, 0x7f9903e0)      [0x1, 0x208000)       [0x80001000, 0x201000)
> [0x7f7903ea, 0x7f7903ec)      [0xfeda, 0xfedc)      [0x201000, 0x221000)
>
> Unmap
> [0x7f7903ea, 0x7f7903ec)      [0xfeda, 0xfedc)      [0x1000, 0x2) ???
>
> The third HVA range is contained in the first one, but exposed under a
> different GVA (aliased). This is not "flattened" by QEMU, as GPA does
> not overlap, only HVA.
>
> At the third chunk unmap, the current algorithm finds the first chunk,
> not the second one. This series is the way to tell the difference at
> unmap time.
>
> [1] https://lists.nongnu.org/archive/html/qemu-devel/2024-04/msg00079.html
>
> Thanks!

Ok, I was wondering if we need to store GPA(GIOVA) to HVA mappings in
the iova tree to solve this issue completely. Then there won't be
aliasing issues.

Thanks

>
> > Thanks
> >
> > >
> > > To solve this, track the GPA in the DMA entry, which acts as a unique
> > > identifier for the maps.  When a map needs to be removed, the iova tree is
> > > able to find the right one.
> > >
> > > Users that do not go through this extra layer of indirection can use the
> > > iova tree as usual, with id = 0.
> > >
> > > This was found by Si-Wei Liu, but I'm having a hard time reproducing the
> > > issue.  This has been tested only without overlapping maps.  If it works
> > > with overlapping maps, it will be integrated in the main series.
> > >
> > > Comments are welcome.  Thanks!
> > >
> > > Eugenio Pérez (2):
> > >   iova_tree: add an id member to DMAMap
> > >   vdpa: identify aliased maps in iova_tree
> > >
> > >  hw/virtio/vhost-vdpa.c   | 2 ++
> > >  include/qemu/iova-tree.h | 5 +++--
> > >  util/iova-tree.c | 3 ++-
> > >  3 files changed, 7 insertions(+), 3 deletions(-)
> > >
> > > --
> > > 2.44.0
> > >
> >
>




Re: [PATCH RESEND] virtio-net: fix bug 1451 aka "assert(!virtio_net_get_subqueue(nc)->async_tx.elem); "

2024-05-06 Thread Jason Wang
On Tue, Apr 30, 2024 at 6:54 PM Alexey Dobriyan
 wrote:
>
> Reproducer from https://gitlab.com/qemu-project/qemu/-/issues/1451
> creates a small packet (1 segment, len = 10 == n->guest_hdr_len),
> then destroys the queue.
>
> "if (n->host_hdr_len != n->guest_hdr_len)" is triggered, and its body creates
> a zero-length/zero-segment packet, as there is nothing after the guest header.
>
> qemu_sendv_packet_async() tries to send it.
>
> slirp discards it because it is smaller than the Ethernet header,
> but returns 0 because tx hooks are supposed to return the total length of data.
>
> 0 is propagated upwards and is interpreted as "packet has been sent",
> which is terrible because the queue is being destroyed, nobody is waiting for
> TX to complete, and the assert is triggered.
>
> The fix is to discard such empty packets instead of sending them.
>
> Length 1 packets will go via a different codepath:
>
> virtqueue_push(q->tx_vq, elem, 0);
> virtio_notify(vdev, q->tx_vq);
> g_free(elem);
>
> and aren't problematic.
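The idea, roughly (a sketch of the approach, not the exact hunk from the
patch, which isn't quoted above): route the empty case through the same
completion path instead of qemu_sendv_packet_async():

    if (out_num < 1) {
        /* Nothing left after the guest header: complete without sending. */
        virtqueue_push(q->tx_vq, elem, 0);
        virtio_notify(vdev, q->tx_vq);
        g_free(elem);
        continue;
    }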
>
> Signed-off-by: Alexey Dobriyan 
> ---
>
> Hopefully a better changelog.
> Use "if (out_num < 1)" so that the discard path doesn't calculate the iov length.
>
>  hw/net/virtio-net.c | 18 --
>  1 file changed, 12 insertions(+), 6 deletions(-)
>

I tweaked the title to "drop too short packets early".

And queued.

Thanks




Re: [PATCH 0/3] virtio-net: Convert feature properties to OnOffAuto

2024-05-05 Thread Jason Wang
On Wed, May 1, 2024 at 3:20 PM Akihiko Odaki  wrote:
>
> On 2024/04/29 16:05, Michael S. Tsirkin wrote:
> > On Sun, Apr 28, 2024 at 04:21:06PM +0900, Akihiko Odaki wrote:
> >> Based-on: <20240428-rss-v10-0-73cbaa91a...@daynix.com>
> >> ("[PATCH v10 00/18] virtio-net RSS/hash report fixes and improvements")
> >>
> >> Some features are not always available, and virtio-net used to disable
> >> them when not available even if the corresponding properties were
> >> explicitly set to "on".

I think we'd better fail the initialization in this case, otherwise it
might confuse libvirt.

Adding Jonathon for more comments.
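Something like this, roughly (a sketch only; the property and field names here
are assumptions, not the actual series):

    DEFINE_PROP_ON_OFF_AUTO("rss", VirtIONet, rss_prop, ON_OFF_AUTO_AUTO),

and in realize():

    if (n->rss_prop == ON_OFF_AUTO_ON && !ebpf_rss_is_loaded(&n->ebpf_rss)) {
        error_setg(errp, "rss is on but eBPF RSS is not available");
        return;
    }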

> >>
> >> Convert feature properties to OnOffAuto so that the user can explicitly
> >> tell QEMU to automatically select the value by setting them "auto".
> >> QEMU will give an error if they are set "on".
> >>
> >> Signed-off-by: Akihiko Odaki 
> >
> > Should we maybe bite the bullet and allow "auto" for all binary/boolean
> > properties? Just ignore "auto" if no one cares ATM.
>
> It is not always obvious whether "auto" should be considered as "on" or
> "off" for existing boolean properties. The properties this patch deals
> with are to enable features so "auto" should be considered as "on" if
> possible. However, other properties may mean to disable features, and in
> such a case, "auto" should be considered as "off".
>
> It may still make sense to accept "auto" for all virtio-net feature bits
> for consistency. In particular, I left the guest_rsc_ext property boolean
> since nobody cares about "auto" for that feature, but it can be converted to
> OnOffAuto.
>

Thanks




Re: [PATCH v9 13/20] virtio-net: Return an error when vhost cannot enable RSS

2024-04-15 Thread Jason Wang
On Mon, Apr 15, 2024 at 10:05 PM Yuri Benditovich
 wrote:
>
> On Wed, Apr 3, 2024 at 2:11 PM Akihiko Odaki  wrote:
> >
> > vhost requires eBPF for RSS. When eBPF is not available, virtio-net
> > implicitly disables RSS even if the user explicitly requests it. Return
> > an error instead of implicitly disabling RSS if RSS is requested but not
> > available.
> >
> > Signed-off-by: Akihiko Odaki 
> > ---
> >  hw/net/virtio-net.c | 97 
> > ++---
> >  1 file changed, 48 insertions(+), 49 deletions(-)
> >
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index 61b49e335dea..3d53eba88cfc 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -793,9 +793,6 @@ static uint64_t virtio_net_get_features(VirtIODevice 
> > *vdev, uint64_t features,
> >  return features;
> >  }
> >
> > -if (!ebpf_rss_is_loaded(&n->ebpf_rss)) {
> > -virtio_clear_feature(&features, VIRTIO_NET_F_RSS);
> > -}
> >  features = vhost_net_get_features(get_vhost_net(nc->peer), features);
> >  vdev->backend_features = features;
> >
> > @@ -3591,6 +3588,50 @@ static bool 
> > failover_hide_primary_device(DeviceListener *listener,
> >  return qatomic_read(&n->failover_primary_hidden);
> >  }
> >
> > +static void virtio_net_device_unrealize(DeviceState *dev)
> > +{
> > +VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > +VirtIONet *n = VIRTIO_NET(dev);
> > +int i, max_queue_pairs;
> > +
> > +if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
> > +virtio_net_unload_ebpf(n);
> > +}
> > +
> > +/* This will stop vhost backend if appropriate. */
> > +virtio_net_set_status(vdev, 0);
> > +
> > +g_free(n->netclient_name);
> > +n->netclient_name = NULL;
> > +g_free(n->netclient_type);
> > +n->netclient_type = NULL;
> > +
> > +g_free(n->mac_table.macs);
> > +g_free(n->vlans);
> > +
> > +if (n->failover) {
> > +qobject_unref(n->primary_opts);
> > +device_listener_unregister(&n->primary_listener);
> > +migration_remove_notifier(&n->migration_state);
> > +} else {
> > +assert(n->primary_opts == NULL);
> > +}
> > +
> > +max_queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
> > +for (i = 0; i < max_queue_pairs; i++) {
> > +virtio_net_del_queue(n, i);
> > +}
> > +/* delete also control vq */
> > +virtio_del_queue(vdev, max_queue_pairs * 2);
> > +qemu_announce_timer_del(&n->announce_timer, false);
> > +g_free(n->vqs);
> > +qemu_del_nic(n->nic);
> > +virtio_net_rsc_cleanup(n);
> > +g_free(n->rss_data.indirections_table);
> > +net_rx_pkt_uninit(n->rx_pkt);
> > +virtio_cleanup(vdev);
> > +}
> > +
> >  static void virtio_net_device_realize(DeviceState *dev, Error **errp)
> >  {
> >  VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > @@ -3760,53 +3801,11 @@ static void virtio_net_device_realize(DeviceState 
> > *dev, Error **errp)
> >
> >  net_rx_pkt_init(&n->rx_pkt);
> >
> > -if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
> > -virtio_net_load_ebpf(n);
> > -}
> > -}
> > -
> > -static void virtio_net_device_unrealize(DeviceState *dev)
> > -{
> > -VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > -VirtIONet *n = VIRTIO_NET(dev);
> > -int i, max_queue_pairs;
> > -
> > -if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
> > -virtio_net_unload_ebpf(n);
> > +if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS) &&
> > +!virtio_net_load_ebpf(n) && get_vhost_net(nc->peer)) {
> > +virtio_net_device_unrealize(dev);
> > +error_setg(errp, "Can't load eBPF RSS for vhost");
> >  }
>
> As I already mentioned, I think it is an extremely bad idea to refuse
> to run qemu for a reason as minor as the absence of one feature.
> What I suggest is:
> 1. Redefine rss as tri-state (off|auto|on)
> 2. Fail to run only if rss is on and not available via ebpf
> 3. On auto - silently drop it

"Auto" might be promatic for migration compatibility which is hard to
be used by management layers like libvirt. The reason is that there's
no way for libvirt to know if it is supported by device or not.

Thanks

> 4. The same with 'hash' option - it is not compatible with vhost (at
> least at the moment)
> 5. Reformat the patch, as it is hard to review when entire procedures are
> replaced, i.e. one patch that only moves code without changes, and
> another one with the real changes.
> If this is hard to review only for me - please ignore that.
>
> > -
> > -/* This will stop vhost backend if appropriate. */
> > -virtio_net_set_status(vdev, 0);
> > -
> > -g_free(n->netclient_name);
> > -n->netclient_name = NULL;
> > -g_free(n->netclient_type);
> > -n->netclient_type = NULL;
> > -
> > -g_free(n->mac_table.macs);
> > -g_free(n->vlans);
> > -
> > -if (n->failover) {
> > -qobject_unref(n->primary_opts);
> > -

Re: [PATCH v8] virtio-pci: fix use of a released vector

2024-04-15 Thread Jason Wang
On Mon, Apr 15, 2024 at 6:41 PM Cindy Lu  wrote:
>
> On Mon, Apr 15, 2024 at 5:34 PM Michael S. Tsirkin  wrote:
> >
> > From: Cindy Lu 
> >
> > During the booting process of a non-standard image, the behavior of the
> > functions called in qemu is as follows:
> >
> > 1. vhost_net_stop() was triggered by the guest image. This calls
> > virtio_pci_set_guest_notifiers() with assign=false, which releases
> > the irqfd for vector 0.
> >
> > 2. virtio_reset() was triggered; this sets the configure vector to
> > VIRTIO_NO_VECTOR.
> >
> > 3. vhost_net_start() was called (at this time, the configure vector is
> > still VIRTIO_NO_VECTOR) and then calls virtio_pci_set_guest_notifiers() with
> > assign=true, so the irqfd for vector 0 is still not initialized during this
> > process.
> >
> > 4. The system continues to boot and sets the vector back to 0. After that,
> > msix_fire_vector_notifier() was triggered to unmask vector 0 and hit the
> > crash.
> >
> > To fix the issue, we need to support changing the vector after
> > VIRTIO_CONFIG_S_DRIVER_OK is set.
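Condensed, the idea is to release and re-set-up the irqfd when a vector
changes while DRIVER_OK is set; roughly (a sketch of the approach, helper
names may differ from the final patch):

    static void virtio_pci_set_vector(VirtIODevice *vdev, VirtIOPCIProxy *proxy,
                                      int queue_no, uint16_t old_vector,
                                      uint16_t new_vector)
    {
        bool kvm_irqfd = (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) &&
            msix_enabled(&proxy->pci_dev) && kvm_msi_via_irqfd_enabled();

        if (new_vector == old_vector || !kvm_irqfd) {
            return;
        }
        /* Release the old irqfd (if any), then set up the new vector. */
        if (old_vector != VIRTIO_NO_VECTOR) {
            kvm_virtio_pci_vector_release_one(proxy, queue_no);
        }
        if (new_vector != VIRTIO_NO_VECTOR) {
            kvm_virtio_pci_vector_use_one(proxy, queue_no);
        }
    }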
> >
> >
> > MST: coding style and typo fixups
> >
> > Fixes: f9a09ca3ea ("vhost: add support for configure interrupt")
> > Cc: qemu-sta...@nongnu.org
> > Signed-off-by: Cindy Lu 
> > Message-Id: <20240412062750.475180-1-l...@redhat.com>
> > 

Re: Discrepancy between mmap call on DPDK/libvduse and rust vm-memory crate

2024-04-15 Thread Jason Wang
On Mon, Apr 15, 2024 at 3:28 PM Yongji Xie  wrote:
>
> On Sun, Apr 14, 2024 at 5:02 PM Michael S. Tsirkin  wrote:
> >
> > On Fri, Apr 12, 2024 at 12:15:40PM +0200, Eugenio Perez Martin wrote:
> > > Hi!
> > >
> > > I'm building a bridge to expose vhost-user devices through VDUSE. The
> > > code is still immature, but I'm able to forward packets using
> > > dpdk-l2fwd through VDUSE to a VM. I'm now working on exposing virtiofsd,
> > > but I've hit an error I'd like to discuss.
> > >
> > > VDUSE devices can get all the memory regions the driver is using by
> > > VDUSE_IOTLB_GET_FD ioctl. It returns a file descriptor with a memory
> > > region associated that can be mapped with mmap, and an information
> > > entry about the map it contains:
> > > * Start and end addresses from the driver POV
> > > * Offset within the mmaped region of these start and end
> > > * Device permissions over that region.
> > >
> > > [start=0xc3000][last=0xe7fff][offset=0xc3000][perm=1]
> > >
> > > Now when I try to map it, it is impossible for the userspace device to
> > > call mmap with any offset different from 0.
> >
> > How exactly did you allocate memory? hugetlbfs?
> >
> > > So the "straightforward"
> > > mmap with size = entry.last-entry.start and offset = entry.offset does
> > > not work. I don't know if this is a limitation of Linux or VDUSE.
> > >
> > > Checking QEMU's
> > > subprojects/libvduse/libvduse.c:vduse_iova_add_region() I see it
> > > handles the offset by adding it up to the size, instead of using it
> > > directly as a parameter in the mmap:
> > >
> > > void *mmap_addr = mmap(0, size + offset, prot, MAP_SHARED, fd, 0);
> >
> >
> > CC Xie Yongji who wrote this code, too.
> >
>
> The mmap() with hugetlb would fail if the offset into the file is not
> aligned to the huge page size. So libvhost-user did something like
> this. But I think VDUSE doesn't have this problem.

I think what you meant is that VDUSE IOTLB doesn't have this problem.

Btw, I think we need to understand the setup, e.g. is this used for
containers (bounce pages) or a VM (hugetlb or other)?

Thanks

> So it's fine to
> directly use the offset as a parameter in the mmap(2) here.
>
> Thanks,
> Yongji
>
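For reference, directly mapping a VDUSE_IOTLB_GET_FD entry at its file offset
would then be roughly (a sketch using the entry fields described above; the
PROT_* flags should be derived from entry.perm, and error handling is
omitted):

    size_t len = entry.last - entry.start + 1;   /* 'last' is inclusive */
    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                      fd, entry.offset);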




Re: [PATCH v6] virtio-pci: Fix the crash that the vector was used after released.

2024-04-15 Thread Jason Wang
On Fri, Apr 12, 2024 at 2:28 PM Cindy Lu  wrote:
>
> During the booting process of a non-standard image, the behavior of the
> functions called in qemu is as follows:
>
> 1. vhost_net_stop() was triggered by the guest image. This calls
> virtio_pci_set_guest_notifiers() with assign=false, which releases
> the irqfd for vector 0.
>
> 2. virtio_reset() was triggered; this sets the configure vector to
> VIRTIO_NO_VECTOR.
>
> 3. vhost_net_start() was called (at this time, the configure vector is
> still VIRTIO_NO_VECTOR) and then calls virtio_pci_set_guest_notifiers() with
> assign=true, so the irqfd for vector 0 is still not initialized during this
> process.
>
> 4. The system continues to boot and sets the vector back to 0. After that,
> msix_fire_vector_notifier() was triggered to unmask vector 0 and hit the
> crash.
>
> To fix the issue, we need to support changing the vector after
> VIRTIO_CONFIG_S_DRIVER_OK is set.
>
> (gdb) bt
> 0  __pthread_kill_implementation (threadid=, 
> signo=signo@entry=6, no_tid=no_tid@entry=0)
> at pthread_kill.c:44
> 1  0x7fc87148ec53 in __pthread_kill_internal (signo=6, 
> threadid=) at pthread_kill.c:78
> 2  0x7fc87143e956 in __GI_raise (sig=sig@entry=6) at 
> ../sysdeps/posix/raise.c:26
> 3  0x7fc8714287f4 in __GI_abort () at abort.c:79
> 4  0x7fc87142871b in __assert_fail_base
> (fmt=0x7fc8715bbde0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
> assertion=0x5606413efd53 "ret == 0", file=0x5606413ef87d 
> "../accel/kvm/kvm-all.c", line=1837, function=) at assert.c:92
> 5  0x7fc871437536 in __GI___assert_fail
> (assertion=0x5606413efd53 "ret == 0", file=0x5606413ef87d 
> "../accel/kvm/kvm-all.c", line=1837, function=0x5606413f06f0 
> <__PRETTY_FUNCTION__.19> "kvm_irqchip_commit_routes") at assert.c:101
> 6  0x560640f884b5 in kvm_irqchip_commit_routes (s=0x560642cae1f0) at 
> ../accel/kvm/kvm-all.c:1837
> 7  0x560640c98f8e in virtio_pci_one_vector_unmask
> (proxy=0x560643c65f00, queue_no=4294967295, vector=0, msg=..., 
> n=0x560643c6e4c8)
> at ../hw/virtio/virtio-pci.c:1005
> 8  0x560640c99201 in virtio_pci_vector_unmask (dev=0x560643c65f00, 
> vector=0, msg=...)
> at ../hw/virtio/virtio-pci.c:1070
> 9  0x560640bc402e in msix_fire_vector_notifier (dev=0x560643c65f00, 
> vector=0, is_masked=false)
> at ../hw/pci/msix.c:120
> 10 0x560640bc40f1 in msix_handle_mask_update (dev=0x560643c65f00, 
> vector=0, was_masked=true)
> at ../hw/pci/msix.c:140
> 11 0x560640bc4503 in msix_table_mmio_write (opaque=0x560643c65f00, 
> addr=12, val=0, size=4)
> at ../hw/pci/msix.c:231
> 12 0x560640f26d83 in memory_region_write_accessor
> (mr=0x560643c66540, addr=12, value=0x7fc86b7bc628, size=4, shift=0, 
> mask=4294967295, attrs=...)
> at ../system/memory.c:497
> 13 0x560640f270a6 in access_with_adjusted_size
>
>  (addr=12, value=0x7fc86b7bc628, size=4, access_size_min=1, 
> access_size_max=4, access_fn=0x560640f26c8d , 
> mr=0x560643c66540, attrs=...) at ../system/memory.c:573
> 14 0x560640f2a2b5 in memory_region_dispatch_write (mr=0x560643c66540, 
> addr=12, data=0, op=MO_32, attrs=...)
> at ../system/memory.c:1521
> 15 0x560640f37bac in flatview_write_continue
> (fv=0x7fc65805e0b0, addr=4273803276, attrs=..., ptr=0x7fc871e9c028, 
> len=4, addr1=12, l=4, mr=0x560643c66540)
> at ../system/physmem.c:2714
> 16 0x560640f37d0f in flatview_write
> (fv=0x7fc65805e0b0, addr=4273803276, attrs=..., buf=0x7fc871e9c028, 
> len=4) at ../system/physmem.c:2756
> 17 0x560640f380bf in address_space_write
> (as=0x560642161ae0 , addr=4273803276, attrs=..., 
> buf=0x7fc871e9c028, len=4)
> at ../system/physmem.c:2863
> 18 0x560640f3812c in address_space_rw
> (as=0x560642161ae0 , addr=4273803276, attrs=..., 
> buf=0x7fc871e9c028, len=4, is_write=true) at ../system/physmem.c:2873
> --Type  for more, q to quit, c to continue without paging--
> 19 0x560640f8aa55 in kvm_cpu_exec (cpu=0x560642f205e0) at 
> ../accel/kvm/kvm-all.c:2915
> 20 0x560640f8d731 in kvm_vcpu_thread_fn (arg=0x560642f205e0) at 
> ../accel/kvm/kvm-accel-ops.c:51
> 21 0x5606411949f4 in qemu_thread_start (args=0x560642f292b0) at 
> ../util/qemu-thread-posix.c:541
> 22 0x7fc87148cdcd in start_thread (arg=) at 
> pthread_create.c:442
> 23 0x7fc871512630 in clone3 () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
> (gdb)
>
> Fixes: f9a09ca3ea ("vhost: add support for configure interrupt")
> Cc: qemu-sta...@nongnu.org
>
> Signed-off-by: Cindy Lu 

Acked-by: Jason Wang 

Thanks




Re: [RFC 0/2] Identify aliased maps in vdpa SVQ iova_tree

2024-04-12 Thread Jason Wang
On Wed, Apr 10, 2024 at 6:03 PM Eugenio Pérez  wrote:
>
> The guest may have overlapped memory regions, where different GPA leads
> to the same HVA.  This causes a problem when overlapped regions
> (different GPA but same translated HVA) exist in the tree, as looking
> them up by HVA will return them twice.

I think I don't understand if there's any side effect for shadow virtqueue?

Thanks

>
> To solve this, track the GPA in the DMA entry, which acts as a unique
> identifier for the maps.  When a map needs to be removed, the iova tree is
> able to find the right one.
>
> Users that do not go through this extra layer of indirection can use the
> iova tree as usual, with id = 0.
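For reference, the proposed change would look roughly like this (a sketch
based on the description above, not the actual patch; the existing fields are
from include/qemu/iova-tree.h):

    typedef struct DMAMap {
        hwaddr iova;
        hwaddr translated_addr;
        hwaddr size;                /* Inclusive */
        IOMMUAccessFlags perm;
        uint64_t id;                /* e.g. the GPA; 0 for users that don't care */
    } DMAMap;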
>
> This was found by Si-Wei Liu, but I'm having a hard time reproducing the
> issue.  This has been tested only without overlapping maps.  If it works
> with overlapping maps, it will be integrated in the main series.
>
> Comments are welcome.  Thanks!
>
> Eugenio Pérez (2):
>   iova_tree: add an id member to DMAMap
>   vdpa: identify aliased maps in iova_tree
>
>  hw/virtio/vhost-vdpa.c   | 2 ++
>  include/qemu/iova-tree.h | 5 +++--
>  util/iova-tree.c | 3 ++-
>  3 files changed, 7 insertions(+), 3 deletions(-)
>
> --
> 2.44.0
>




Re: [RFC QEMU PATCH v8 2/2] virtio-pci: implement No_Soft_Reset bit

2024-04-12 Thread Jason Wang
On Fri, Apr 12, 2024 at 1:59 PM Chen, Jiqian  wrote:
>
> On 2024/4/7 11:20, Jason Wang wrote:
> > On Tue, Apr 2, 2024 at 11:03 AM Chen, Jiqian  wrote:
> >>
> >> On 2024/3/29 18:44, Michael S. Tsirkin wrote:
> >>> On Fri, Mar 29, 2024 at 03:20:59PM +0800, Jason Wang wrote:
> >>>> On Fri, Mar 29, 2024 at 3:07 PM Chen, Jiqian  wrote:
> >>>>>
> >>>>> On 2024/3/28 20:36, Michael S. Tsirkin wrote:
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>>  static void virtio_pci_bus_reset_hold(Object *obj)
> >>>>>>>>>  {
> >>>>>>>>>  PCIDevice *dev = PCI_DEVICE(obj);
> >>>>>>>>>  DeviceState *qdev = DEVICE(obj);
> >>>>>>>>>
> >>>>>>>>> +if (virtio_pci_no_soft_reset(dev)) {
> >>>>>>>>> +return;
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>>  virtio_pci_reset(qdev);
> >>>>>>>>>
> >>>>>>>>>  if (pci_is_express(dev)) {
> >>>>>>>>> @@ -2484,6 +2511,8 @@ static Property virtio_pci_properties[] = {
> >>>>>>>>>  VIRTIO_PCI_FLAG_INIT_LNKCTL_BIT, true),
> >>>>>>>>>  DEFINE_PROP_BIT("x-pcie-pm-init", VirtIOPCIProxy, flags,
> >>>>>>>>>  VIRTIO_PCI_FLAG_INIT_PM_BIT, true),
> >>>>>>>>> +DEFINE_PROP_BIT("x-pcie-pm-no-soft-reset", VirtIOPCIProxy, 
> >>>>>>>>> flags,
> >>>>>>>>> +VIRTIO_PCI_FLAG_PM_NO_SOFT_RESET_BIT, false),
> >>>>
> >>>> Why does it come with an x prefix?
> >>>>
> >>>>>>>>>  DEFINE_PROP_BIT("x-pcie-flr-init", VirtIOPCIProxy, flags,
> >>>>>>>>>  VIRTIO_PCI_FLAG_INIT_FLR_BIT, true),
> >>>>>>>>>  DEFINE_PROP_BIT("aer", VirtIOPCIProxy, flags,
> >>>>>>>>
> >>>>>>>> I am a bit confused about this part.
> >>>>>>>> Do you want to make this software controllable?
> >>>>>>> Yes, because even on real hardware, this bit is not always set.
> >>>>
> >>>> We are talking about emulated devices here.
> >>>>
> >>>>>>
> >>>>>> So which virtio devices should and which should not set this bit?
> >>>>> This depends on the scenario in which the virtio device is used: if we
> >>>>> want to trigger an internal soft reset for the virtio device during S3,
> >>>>> this bit shouldn't be set.
> >>>>
> >>>> If the device doesn't need reset, why bother the driver for this?
> >>>>
> >>>> Btw, no_soft_reset is insufficient for some cases, there's a proposal
> >>>> for the virtio-spec. I think we need to wait until it is done.
> >>>
> >>> That seems orthogonal or did I miss something?
> >> Yes, I looked the detail of the proposal, I also think they are unrelated.
> >
> > The point is the proposal said
> >
> > """
> > Without a mechanism to
> > suspend/resume virtio devices when the driver is suspended/resumed in
> > the early phase of suspend/late phase of resume, there is a window where
> > interrupts can be lost.
> > """
> >
> > It looks safe to enable it with the suspend bit. Or if you think it's
> > wrong, please comment on the virtio spec patch.
> If I understand the proposal correctly, we only need to check the SUSPEND bit
> when virtio_pci_bus_reset_hold is called.
> It seems the proposal won't block this patch from going upstream.
> In the next version, I will add a comment noting that the SUSPEND bit needs
> to be considered once the proposal is accepted.
>
> >
> >> I will set the default value of the No_Soft_Reset bit to true in the next
> >> version, according to your opinion.
> >> About the compatibility of old machine types, which types should I
> >> consider? The same as x-pcie-pm-init (hw_compat_2_8)?
> >> Forgive me for not knowing much about compatibility.
> >
> > "x" means no compatibility at all, please drop the "x" prefix. And it
> Thanks to explain.
> So it seems the prefix "x" of "x-pcie-pm-init" is also wrong? Because it is
> handled by hw_compat_2_8. The same for "x-pcie-flr-init".

Probably but too late to fix.

> Back to No_Soft_Reset, do you know which old machine types I should keep 
> compatibility with?

Replied in another mail.

Thanks

>
> > looks more safe to start as "false" by default.
> >
> > Thanks
> >
> >>>
> >>>>> In my use case on my environment, I don't want to reset virtio-gpu 
> >>>>> during S3,
> >>>>> because once the display resources are destroyed, there is not enough 
> >>>>> information to re-create them, so this bit should be set.
> >>>>> Making this bit software controllable is convenient for users to take 
> >>>>> their own choices.
> >>>>
> >>>> Thanks
> >>>>
> >>>>>
> >>>>>>
> >>>>>>>> Or should this be set to true by default and then
> >>>>>>>> changed to false for old machine types?
> >>>>>>> How can I do so?
> >>>>>>> Do you mean set this to true by default, and if old machine types 
> >>>>>>> don't need this bit, they can pass false config to qemu when running 
> >>>>>>> qemu?
> >>>>>>
> >>>>>> No, you would use compat machinery. See how is x-pcie-flr-init handled.
> >>>>>>
> >>>>>>
> >>>>>
> >>>>> --
> >>>>> Best regards,
> >>>>> Jiqian Chen.
> >>>
> >>
> >> --
> >> Best regards,
> >> Jiqian Chen.
> >
>
> --
> Best regards,
> Jiqian Chen.




Re: [RFC QEMU PATCH v8 2/2] virtio-pci: implement No_Soft_Reset bit

2024-04-12 Thread Jason Wang
On Fri, Apr 12, 2024 at 2:05 PM Chen, Jiqian  wrote:
>
> On 2024/4/7 19:49, Michael S. Tsirkin wrote:
> >>> I will set the default value of the No_Soft_Reset bit to true in the 
> >>> next version, per your suggestion.
> >>> About compatibility with old machine types, which types should I 
> >>> consider? The same as x-pcie-pm-init (hw_compat_2_8)?
> >>> Forgive me for not knowing much about compatibility.
> >>
> >> "x" means no compatibility at all, please drop the "x" prefix. And it
> >> looks more safe to start as "false" by default.
> >>
> >> Thanks
> >
> >
> > Not sure I agree. External flags are for when users want to tweak them.
> > When would users want it to be off?
> > What is done here feels sane to me; just add machine compat machinery
> > to change it to off for old machine types.
> Do you know which old machine types I should keep compatibility with?
> Or whom should I CC to get an answer?
> I don't know much about compatibility.

If you make it off by default, you don't need compat machinery; otherwise,
it's the machine types of the release before this one.

Thanks
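
For illustration, the compat machinery in question would look roughly like
the following. This is a hypothetical sketch: it assumes the feature first
ships in 9.1, that the property is renamed to "pcie-pm-no-soft-reset" with
the "x" prefix dropped, and that it defaults to on, so old machine types
force it off:

GlobalProperty hw_compat_9_0[] = {
    { "virtio-pci", "pcie-pm-no-soft-reset", "off" },
};
const size_t hw_compat_9_0_len = G_N_ELEMENTS(hw_compat_9_0);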

>
> >
>
> --
> Best regards,
> Jiqian Chen.




Re: [PATCH v5] virtio-pci: Fix the crash that the vector was used after released.

2024-04-11 Thread Jason Wang
On Thu, Apr 11, 2024 at 4:03 PM Cindy Lu  wrote:
>
> During boot of a non-standard guest image, the sequence of calls in QEMU is
> as follows:
>
> 1. vhost_net_stop() was triggered by the guest. This calls
> virtio_pci_set_guest_notifiers() with assign=false, which releases the
> irqfd for vector 0.
>
> 2. virtio_reset() was triggered; this sets the config vector to
> VIRTIO_NO_VECTOR.
>
> 3. vhost_net_start() was called (at this point the config vector is still
> VIRTIO_NO_VECTOR) and then calls virtio_pci_set_guest_notifiers() with
> assign=true, so the irqfd for vector 0 is still not initialized during this
> process.
>
> 4. The system continues to boot and sets the vector back to 0. After that,
> msix_fire_vector_notifier() was triggered to unmask vector 0, hitting the
> crash.
>
> To fix the issue, we need to support changing the vector after 
> VIRTIO_CONFIG_S_DRIVER_OK is set.
>
> (gdb) bt
> 0  __pthread_kill_implementation (threadid=, 
> signo=signo@entry=6, no_tid=no_tid@entry=0)
> at pthread_kill.c:44
> 1  0x7fc87148ec53 in __pthread_kill_internal (signo=6, 
> threadid=) at pthread_kill.c:78
> 2  0x7fc87143e956 in __GI_raise (sig=sig@entry=6) at 
> ../sysdeps/posix/raise.c:26
> 3  0x7fc8714287f4 in __GI_abort () at abort.c:79
> 4  0x7fc87142871b in __assert_fail_base
> (fmt=0x7fc8715bbde0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
> assertion=0x5606413efd53 "ret == 0", file=0x5606413ef87d 
> "../accel/kvm/kvm-all.c", line=1837, function=) at assert.c:92
> 5  0x7fc871437536 in __GI___assert_fail
> (assertion=0x5606413efd53 "ret == 0", file=0x5606413ef87d 
> "../accel/kvm/kvm-all.c", line=1837, function=0x5606413f06f0 
> <__PRETTY_FUNCTION__.19> "kvm_irqchip_commit_routes") at assert.c:101
> 6  0x560640f884b5 in kvm_irqchip_commit_routes (s=0x560642cae1f0) at 
> ../accel/kvm/kvm-all.c:1837
> 7  0x560640c98f8e in virtio_pci_one_vector_unmask
> (proxy=0x560643c65f00, queue_no=4294967295, vector=0, msg=..., 
> n=0x560643c6e4c8)
> at ../hw/virtio/virtio-pci.c:1005
> 8  0x560640c99201 in virtio_pci_vector_unmask (dev=0x560643c65f00, 
> vector=0, msg=...)
> at ../hw/virtio/virtio-pci.c:1070
> 9  0x560640bc402e in msix_fire_vector_notifier (dev=0x560643c65f00, 
> vector=0, is_masked=false)
> at ../hw/pci/msix.c:120
> 10 0x560640bc40f1 in msix_handle_mask_update (dev=0x560643c65f00, 
> vector=0, was_masked=true)
> at ../hw/pci/msix.c:140
> 11 0x560640bc4503 in msix_table_mmio_write (opaque=0x560643c65f00, 
> addr=12, val=0, size=4)
> at ../hw/pci/msix.c:231
> 12 0x560640f26d83 in memory_region_write_accessor
> (mr=0x560643c66540, addr=12, value=0x7fc86b7bc628, size=4, shift=0, 
> mask=4294967295, attrs=...)
> at ../system/memory.c:497
> 13 0x560640f270a6 in access_with_adjusted_size
>
>  (addr=12, value=0x7fc86b7bc628, size=4, access_size_min=1, 
> access_size_max=4, access_fn=0x560640f26c8d , 
> mr=0x560643c66540, attrs=...) at ../system/memory.c:573
> 14 0x560640f2a2b5 in memory_region_dispatch_write (mr=0x560643c66540, 
> addr=12, data=0, op=MO_32, attrs=...)
> at ../system/memory.c:1521
> 15 0x560640f37bac in flatview_write_continue
> (fv=0x7fc65805e0b0, addr=4273803276, attrs=..., ptr=0x7fc871e9c028, 
> len=4, addr1=12, l=4, mr=0x560643c66540)
> at ../system/physmem.c:2714
> 16 0x560640f37d0f in flatview_write
> (fv=0x7fc65805e0b0, addr=4273803276, attrs=..., buf=0x7fc871e9c028, 
> len=4) at ../system/physmem.c:2756
> 17 0x560640f380bf in address_space_write
> (as=0x560642161ae0 , addr=4273803276, attrs=..., 
> buf=0x7fc871e9c028, len=4)
> at ../system/physmem.c:2863
> 18 0x560640f3812c in address_space_rw
> (as=0x560642161ae0 , addr=4273803276, attrs=..., 
> buf=0x7fc871e9c028, len=4, is_write=true) at ../system/physmem.c:2873
> --Type  for more, q to quit, c to continue without paging--
> 19 0x560640f8aa55 in kvm_cpu_exec (cpu=0x560642f205e0) at 
> ../accel/kvm/kvm-all.c:2915
> 20 0x560640f8d731 in kvm_vcpu_thread_fn (arg=0x560642f205e0) at 
> ../accel/kvm/kvm-accel-ops.c:51
> 21 0x5606411949f4 in qemu_thread_start (args=0x560642f292b0) at 
> ../util/qemu-thread-posix.c:541
> 22 0x7fc87148cdcd in start_thread (arg=) at 
> pthread_create.c:442
> 23 0x7fc871512630 in clone3 () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
> (gdb)
>
> Fixes: f9a09ca3ea ("vhost: add support for configure interrupt")
> Cc: qemu-sta...@nongnu.org
>
> Signed-off-by: Cindy Lu 

Acked-by: Jason Wang 

Re: [PATCH v4] virtio-pci: Fix the crash that the vector was used after released.

2024-04-10 Thread Jason Wang
On Thu, Apr 11, 2024 at 12:11 PM Cindy Lu  wrote:
>
> During boot of a non-standard guest image, the sequence of calls in QEMU is
> as follows:
>
> 1. vhost_net_stop() was triggered by the guest. This calls
> virtio_pci_set_guest_notifiers() with assign=false, which releases the
> irqfd for vector 0.
>
> 2. virtio_reset() was triggered; this sets the config vector to
> VIRTIO_NO_VECTOR.
>
> 3. vhost_net_start() was called (at this point the config vector is still
> VIRTIO_NO_VECTOR) and then calls virtio_pci_set_guest_notifiers() with
> assign=true, so the irqfd for vector 0 is still not initialized during this
> process.
>
> 4. The system continues to boot and sets the vector back to 0. After that,
> msix_fire_vector_notifier() was triggered to unmask vector 0, hitting the
> crash.
>
> To fix the issue, we need to support changing the vector after 
> VIRTIO_CONFIG_S_DRIVER_OK is set.
>
> (gdb) bt
> 0  __pthread_kill_implementation (threadid=, 
> signo=signo@entry=6, no_tid=no_tid@entry=0)
> at pthread_kill.c:44
> 1  0x7fc87148ec53 in __pthread_kill_internal (signo=6, 
> threadid=) at pthread_kill.c:78
> 2  0x7fc87143e956 in __GI_raise (sig=sig@entry=6) at 
> ../sysdeps/posix/raise.c:26
> 3  0x7fc8714287f4 in __GI_abort () at abort.c:79
> 4  0x7fc87142871b in __assert_fail_base
> (fmt=0x7fc8715bbde0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
> assertion=0x5606413efd53 "ret == 0", file=0x5606413ef87d 
> "../accel/kvm/kvm-all.c", line=1837, function=) at assert.c:92
> 5  0x7fc871437536 in __GI___assert_fail
> (assertion=0x5606413efd53 "ret == 0", file=0x5606413ef87d 
> "../accel/kvm/kvm-all.c", line=1837, function=0x5606413f06f0 
> <__PRETTY_FUNCTION__.19> "kvm_irqchip_commit_routes") at assert.c:101
> 6  0x560640f884b5 in kvm_irqchip_commit_routes (s=0x560642cae1f0) at 
> ../accel/kvm/kvm-all.c:1837
> 7  0x560640c98f8e in virtio_pci_one_vector_unmask
> (proxy=0x560643c65f00, queue_no=4294967295, vector=0, msg=..., 
> n=0x560643c6e4c8)
> at ../hw/virtio/virtio-pci.c:1005
> 8  0x560640c99201 in virtio_pci_vector_unmask (dev=0x560643c65f00, 
> vector=0, msg=...)
> at ../hw/virtio/virtio-pci.c:1070
> 9  0x560640bc402e in msix_fire_vector_notifier (dev=0x560643c65f00, 
> vector=0, is_masked=false)
> at ../hw/pci/msix.c:120
> 10 0x560640bc40f1 in msix_handle_mask_update (dev=0x560643c65f00, 
> vector=0, was_masked=true)
> at ../hw/pci/msix.c:140
> 11 0x560640bc4503 in msix_table_mmio_write (opaque=0x560643c65f00, 
> addr=12, val=0, size=4)
> at ../hw/pci/msix.c:231
> 12 0x560640f26d83 in memory_region_write_accessor
> (mr=0x560643c66540, addr=12, value=0x7fc86b7bc628, size=4, shift=0, 
> mask=4294967295, attrs=...)
> at ../system/memory.c:497
> 13 0x560640f270a6 in access_with_adjusted_size
>
>  (addr=12, value=0x7fc86b7bc628, size=4, access_size_min=1, 
> access_size_max=4, access_fn=0x560640f26c8d , 
> mr=0x560643c66540, attrs=...) at ../system/memory.c:573
> 14 0x560640f2a2b5 in memory_region_dispatch_write (mr=0x560643c66540, 
> addr=12, data=0, op=MO_32, attrs=...)
> at ../system/memory.c:1521
> 15 0x560640f37bac in flatview_write_continue
> (fv=0x7fc65805e0b0, addr=4273803276, attrs=..., ptr=0x7fc871e9c028, 
> len=4, addr1=12, l=4, mr=0x560643c66540)
> at ../system/physmem.c:2714
> 16 0x560640f37d0f in flatview_write
> (fv=0x7fc65805e0b0, addr=4273803276, attrs=..., buf=0x7fc871e9c028, 
> len=4) at ../system/physmem.c:2756
> 17 0x560640f380bf in address_space_write
> (as=0x560642161ae0 , addr=4273803276, attrs=..., 
> buf=0x7fc871e9c028, len=4)
> at ../system/physmem.c:2863
> 18 0x560640f3812c in address_space_rw
> (as=0x560642161ae0 , addr=4273803276, attrs=..., 
> buf=0x7fc871e9c028, len=4, is_write=true) at ../system/physmem.c:2873
> --Type  for more, q to quit, c to continue without paging--
> 19 0x560640f8aa55 in kvm_cpu_exec (cpu=0x560642f205e0) at 
> ../accel/kvm/kvm-all.c:2915
> 20 0x560640f8d731 in kvm_vcpu_thread_fn (arg=0x560642f205e0) at 
> ../accel/kvm/kvm-accel-ops.c:51
> 21 0x5606411949f4 in qemu_thread_start (args=0x560642f292b0) at 
> ../util/qemu-thread-posix.c:541
> 22 0x7fc87148cdcd in start_thread (arg=) at 
> pthread_create.c:442
> 23 0x7fc871512630 in clone3 () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
> (gdb)
> Signed-off-by: Cindy Lu 

Fixes: f9a09ca3ea ("vhost: add support for configure interrupt")
Cc: qemu-sta...@nongnu.org

Acked-by: Jason Wang 

Thanks

>

Re: [PATCH v2 1/1] virtio-pci: Fix the crash that the vector was used after released.

2024-04-10 Thread Jason Wang
Offline:

On Wed, Apr 10, 2024 at 2:28 PM Cindy Lu  wrote:
>
> On Wed, Apr 10, 2024 at 1:36 PM Jason Wang  wrote:
> >
> > On Wed, Apr 10, 2024 at 1:29 PM Cindy Lu  wrote:
> > >
> > > When the guest triggers vhost_stop and then virtio_reset, the vector will
> > > change to VIRTIO_NO_VECTOR and the IRQFD for this vector will be released.
> > > After that, the guest called vhost_net_start (at this time, the config
> > > vector is still VIRTIO_NO_VECTOR), so vector 0 was still not initialized.
> > > The guest system continued to boot, set the vector back to 0, and then
> > > hit the crash.
> > >
> > > To fix this, we need to call the function 
> > > "kvm_virtio_pci_vector_use_one()"
> > > when the vector changes back from VIRTIO_NO_VECTOR
> > >
> > > (gdb) bt
> > > 0  __pthread_kill_implementation (threadid=, 
> > > signo=signo@entry=6, no_tid=no_tid@entry=0)
> > > at pthread_kill.c:44
> > > 1  0x7fc87148ec53 in __pthread_kill_internal (signo=6, 
> > > threadid=) at pthread_kill.c:78
> > > 2  0x7fc87143e956 in __GI_raise (sig=sig@entry=6) at 
> > > ../sysdeps/posix/raise.c:26
> > > 3  0x7fc8714287f4 in __GI_abort () at abort.c:79
> > > 4  0x7fc87142871b in __assert_fail_base
> > > (fmt=0x7fc8715bbde0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
> > > assertion=0x5606413efd53 "ret == 0", file=0x5606413ef87d 
> > > "../accel/kvm/kvm-all.c", line=1837, function=) at 
> > > assert.c:92
> > > 5  0x7fc871437536 in __GI___assert_fail
> > > (assertion=0x5606413efd53 "ret == 0", file=0x5606413ef87d 
> > > "../accel/kvm/kvm-all.c", line=1837, function=0x5606413f06f0 
> > > <__PRETTY_FUNCTION__.19> "kvm_irqchip_commit_routes") at assert.c:101
> > > 6  0x560640f884b5 in kvm_irqchip_commit_routes (s=0x560642cae1f0) at 
> > > ../accel/kvm/kvm-all.c:1837
> > > 7  0x560640c98f8e in virtio_pci_one_vector_unmask
> > > (proxy=0x560643c65f00, queue_no=4294967295, vector=0, msg=..., 
> > > n=0x560643c6e4c8)
> > > at ../hw/virtio/virtio-pci.c:1005
> > > 8  0x560640c99201 in virtio_pci_vector_unmask (dev=0x560643c65f00, 
> > > vector=0, msg=...)
> > > at ../hw/virtio/virtio-pci.c:1070
> > > 9  0x560640bc402e in msix_fire_vector_notifier (dev=0x560643c65f00, 
> > > vector=0, is_masked=false)
> > > at ../hw/pci/msix.c:120
> > > 10 0x560640bc40f1 in msix_handle_mask_update (dev=0x560643c65f00, 
> > > vector=0, was_masked=true)
> > > at ../hw/pci/msix.c:140
> > > 11 0x560640bc4503 in msix_table_mmio_write (opaque=0x560643c65f00, 
> > > addr=12, val=0, size=4)
> > > at ../hw/pci/msix.c:231
> > > 12 0x560640f26d83 in memory_region_write_accessor
> > > (mr=0x560643c66540, addr=12, value=0x7fc86b7bc628, size=4, shift=0, 
> > > mask=4294967295, attrs=...)
> > > at ../system/memory.c:497
> > > 13 0x560640f270a6 in access_with_adjusted_size
> > >
> > >  (addr=12, value=0x7fc86b7bc628, size=4, access_size_min=1, 
> > > access_size_max=4, access_fn=0x560640f26c8d 
> > > , mr=0x560643c66540, attrs=...) at 
> > > ../system/memory.c:573
> > > 14 0x560640f2a2b5 in memory_region_dispatch_write (mr=0x560643c66540, 
> > > addr=12, data=0, op=MO_32, attrs=...)
> > > at ../system/memory.c:1521
> > > 15 0x560640f37bac in flatview_write_continue
> > > (fv=0x7fc65805e0b0, addr=4273803276, attrs=..., ptr=0x7fc871e9c028, 
> > > len=4, addr1=12, l=4, mr=0x560643c66540)
> > > at ../system/physmem.c:2714
> > > 16 0x560640f37d0f in flatview_write
> > > (fv=0x7fc65805e0b0, addr=4273803276, attrs=..., buf=0x7fc871e9c028, 
> > > len=4) at ../system/physmem.c:2756
> > > 17 0x560640f380bf in address_space_write
> > > (as=0x560642161ae0 , addr=4273803276, 
> > > attrs=..., buf=0x7fc871e9c028, len=4)
> > > at ../system/physmem.c:2863
> > > 18 0x560640f3812c in address_space_rw
> > > (as=0x560642161ae0 , addr=4273803276, 
> > > attrs=..., buf=0x7fc871e9c028, len=4, is_write=true) at 
> > > ../system/physmem.c:2873
> > > --Type  for more, q to quit, c to continue without paging--
> > > 19 0x560640f8aa55 in kvm_cpu_exec (cpu=0x560642f205e0) at 
> > > ../accel/kvm/kvm-all.c:2915
> >

Re: [PATCH-for-9.0? v2] hw/net/net_tx_pkt: Fix overrun in update_sctp_checksum()

2024-04-10 Thread Jason Wang
On Wed, Apr 10, 2024 at 3:06 PM Akihiko Odaki  wrote:
>
> On 2024/04/10 16:04, Philippe Mathieu-Daudé wrote:
> > If a fragmented packet size is too short, do not try to
> > calculate its checksum.
> >
> > Reproduced using:
> >
> >$ cat << EOF | qemu-system-i386 -display none -nodefaults \
> >-machine q35,accel=qtest -m 32M \
> >-device igb,netdev=net0 \
> >-netdev user,id=net0 \
> >-qtest stdio
> >outl 0xcf8 0x8810
> >outl 0xcfc 0xe000
> >outl 0xcf8 0x8804
> >outw 0xcfc 0x06
> >write 0xe403 0x1 0x02
> >writel 0xe0003808 0x
> >write 0xe000381a 0x1 0x5b
> >write 0xe000381b 0x1 0x00
> >EOF
> >Assertion failed: (offset == 0), function iov_from_buf_full, file 
> > util/iov.c, line 39.
> >#1 0x5575e81e952a in iov_from_buf_full qemu/util/iov.c:39:5
> >#2 0x5575e6500768 in net_tx_pkt_update_sctp_checksum 
> > qemu/hw/net/net_tx_pkt.c:144:9
> >#3 0x5575e659f3e1 in igb_setup_tx_offloads qemu/hw/net/igb_core.c:478:11
> >#4 0x5575e659f3e1 in igb_tx_pkt_send qemu/hw/net/igb_core.c:552:10
> >#5 0x5575e659f3e1 in igb_process_tx_desc qemu/hw/net/igb_core.c:671:17
> >#6 0x5575e659f3e1 in igb_start_xmit qemu/hw/net/igb_core.c:903:9
> >#7 0x5575e659f3e1 in igb_set_tdt qemu/hw/net/igb_core.c:2812:5
> >#8 0x5575e657d6a4 in igb_core_write qemu/hw/net/igb_core.c:4248:9
> >
> > Cc: qemu-sta...@nongnu.org
> > Reported-by: Zheyu Ma 
> > Fixes: f199b13bc1 ("igb: Implement Tx SCTP CSO")
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2273
> > Signed-off-by: Philippe Mathieu-Daudé 
>
> Reviewed-by: Akihiko Odaki 

Fixes: CVE-2024-3567
Acked-by: Jason Wang 

Peter, would you want to pick this for 9.0?

Thanks

>
> > ---
> > Since v1: check at offset 8 (Akihiko)
> > ---
> >   hw/net/net_tx_pkt.c | 4 
> >   1 file changed, 4 insertions(+)
> >
> > diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
> > index 2134a18c4c..b7b1de816d 100644
> > --- a/hw/net/net_tx_pkt.c
> > +++ b/hw/net/net_tx_pkt.c
> > @@ -141,6 +141,10 @@ bool net_tx_pkt_update_sctp_checksum(struct NetTxPkt 
> > *pkt)
> >   uint32_t csum = 0;
> >   struct iovec *pl_start_frag = pkt->vec + NET_TX_PKT_PL_START_FRAG;
> >
> > +if (iov_size(pl_start_frag, pkt->payload_frags) < 8 + sizeof(csum)) {
> > +return false;
> > +}
> > +
> >   if (iov_from_buf(pl_start_frag, pkt->payload_frags, 8, , 
> > sizeof(csum)) < sizeof(csum)) {
> >   return false;
> >   }
>
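
For readers wondering where the "8 + sizeof(csum)" bound comes from: the
32-bit CRC32c checksum sits at byte offset 8 of the SCTP common header
(RFC 9260). A layout sketch, for illustration only:

/* SCTP common header -- the checksum occupies bytes 8..11. */
struct sctp_common_hdr {
    uint16_t src_port;   /* offset 0 */
    uint16_t dst_port;   /* offset 2 */
    uint32_t vtag;       /* offset 4 */
    uint32_t checksum;   /* offset 8 */
};

So a payload shorter than 12 bytes cannot even contain the checksum field,
and the new guard makes net_tx_pkt_update_sctp_checksum() bail out early
instead of tripping the assertion in iov_from_buf_full().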




Re: [PATCH v2 1/1] virtio-pci: Fix the crash that the vector was used after released.

2024-04-09 Thread Jason Wang
On Wed, Apr 10, 2024 at 1:29 PM Cindy Lu  wrote:
>
> When the guest triggers vhost_stop and then virtio_reset, the vector will
> change to VIRTIO_NO_VECTOR and the IRQFD for this vector will be released.
> After that, the guest called vhost_net_start (at this time, the config
> vector is still VIRTIO_NO_VECTOR), so vector 0 was still not initialized.
> The guest system continued to boot, set the vector back to 0, and then
> hit the crash.

Btw, the description in the cover letter seems better; how about
just using that (so there won't be a cover letter, since this series
has just one patch)?

Thanks




Re: [PATCH v2 1/1] virtio-pci: Fix the crash that the vector was used after released.

2024-04-09 Thread Jason Wang
On Wed, Apr 10, 2024 at 1:29 PM Cindy Lu  wrote:
>
> When the guest triggers vhost_stop and then virtio_reset, the vector will
> change to VIRTIO_NO_VECTOR and the IRQFD for this vector will be released.
> After that, the guest called vhost_net_start (at this time, the config
> vector is still VIRTIO_NO_VECTOR), so vector 0 was still not initialized.
> The guest system continued to boot, set the vector back to 0, and then
> hit the crash.
>
> To fix this, we need to call the function "kvm_virtio_pci_vector_use_one()"
> when the vector changes back from VIRTIO_NO_VECTOR
>
> (gdb) bt
> 0  __pthread_kill_implementation (threadid=, 
> signo=signo@entry=6, no_tid=no_tid@entry=0)
> at pthread_kill.c:44
> 1  0x7fc87148ec53 in __pthread_kill_internal (signo=6, 
> threadid=) at pthread_kill.c:78
> 2  0x7fc87143e956 in __GI_raise (sig=sig@entry=6) at 
> ../sysdeps/posix/raise.c:26
> 3  0x7fc8714287f4 in __GI_abort () at abort.c:79
> 4  0x7fc87142871b in __assert_fail_base
> (fmt=0x7fc8715bbde0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
> assertion=0x5606413efd53 "ret == 0", file=0x5606413ef87d 
> "../accel/kvm/kvm-all.c", line=1837, function=) at assert.c:92
> 5  0x7fc871437536 in __GI___assert_fail
> (assertion=0x5606413efd53 "ret == 0", file=0x5606413ef87d 
> "../accel/kvm/kvm-all.c", line=1837, function=0x5606413f06f0 
> <__PRETTY_FUNCTION__.19> "kvm_irqchip_commit_routes") at assert.c:101
> 6  0x560640f884b5 in kvm_irqchip_commit_routes (s=0x560642cae1f0) at 
> ../accel/kvm/kvm-all.c:1837
> 7  0x560640c98f8e in virtio_pci_one_vector_unmask
> (proxy=0x560643c65f00, queue_no=4294967295, vector=0, msg=..., 
> n=0x560643c6e4c8)
> at ../hw/virtio/virtio-pci.c:1005
> 8  0x560640c99201 in virtio_pci_vector_unmask (dev=0x560643c65f00, 
> vector=0, msg=...)
> at ../hw/virtio/virtio-pci.c:1070
> 9  0x560640bc402e in msix_fire_vector_notifier (dev=0x560643c65f00, 
> vector=0, is_masked=false)
> at ../hw/pci/msix.c:120
> 10 0x560640bc40f1 in msix_handle_mask_update (dev=0x560643c65f00, 
> vector=0, was_masked=true)
> at ../hw/pci/msix.c:140
> 11 0x560640bc4503 in msix_table_mmio_write (opaque=0x560643c65f00, 
> addr=12, val=0, size=4)
> at ../hw/pci/msix.c:231
> 12 0x560640f26d83 in memory_region_write_accessor
> (mr=0x560643c66540, addr=12, value=0x7fc86b7bc628, size=4, shift=0, 
> mask=4294967295, attrs=...)
> at ../system/memory.c:497
> 13 0x560640f270a6 in access_with_adjusted_size
>
>  (addr=12, value=0x7fc86b7bc628, size=4, access_size_min=1, 
> access_size_max=4, access_fn=0x560640f26c8d , 
> mr=0x560643c66540, attrs=...) at ../system/memory.c:573
> 14 0x560640f2a2b5 in memory_region_dispatch_write (mr=0x560643c66540, 
> addr=12, data=0, op=MO_32, attrs=...)
> at ../system/memory.c:1521
> 15 0x560640f37bac in flatview_write_continue
> (fv=0x7fc65805e0b0, addr=4273803276, attrs=..., ptr=0x7fc871e9c028, 
> len=4, addr1=12, l=4, mr=0x560643c66540)
> at ../system/physmem.c:2714
> 16 0x560640f37d0f in flatview_write
> (fv=0x7fc65805e0b0, addr=4273803276, attrs=..., buf=0x7fc871e9c028, 
> len=4) at ../system/physmem.c:2756
> 17 0x560640f380bf in address_space_write
> (as=0x560642161ae0 , addr=4273803276, attrs=..., 
> buf=0x7fc871e9c028, len=4)
> at ../system/physmem.c:2863
> 18 0x560640f3812c in address_space_rw
> (as=0x560642161ae0 , addr=4273803276, attrs=..., 
> buf=0x7fc871e9c028, len=4, is_write=true) at ../system/physmem.c:2873
> --Type  for more, q to quit, c to continue without paging--
> 19 0x560640f8aa55 in kvm_cpu_exec (cpu=0x560642f205e0) at 
> ../accel/kvm/kvm-all.c:2915
> 20 0x560640f8d731 in kvm_vcpu_thread_fn (arg=0x560642f205e0) at 
> ../accel/kvm/kvm-accel-ops.c:51
> 21 0x5606411949f4 in qemu_thread_start (args=0x560642f292b0) at 
> ../util/qemu-thread-posix.c:541
> 22 0x7fc87148cdcd in start_thread (arg=) at 
> pthread_create.c:442
> 23 0x7fc871512630 in clone3 () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
> (gdb)
> Signed-off-by: Cindy Lu 
> ---
>  hw/virtio/virtio-pci.c | 35 +++
>  1 file changed, 35 insertions(+)
>
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index 1a7039fb0c..344f4fb844 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -880,6 +880,7 @@ static int kvm_virtio_pci_vector_use_one(VirtIOPCIProxy 
> *proxy, int queue_no)
>  int ret;
>  EventNotifier *n;
>  PCIDevice *dev = >pci_dev;
> +VirtIOIRQFD *irqfd;
>  VirtIODevice *vdev = virtio_bus_get_device(>bus);
>  VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
>
> @@ -890,10 +891,19 @@ static int kvm_virtio_pci_vector_use_one(VirtIOPCIProxy 
> *proxy, int queue_no)
>  if (vector >= msix_nr_vectors_allocated(dev)) {
>  return 0;
>  }
> +/*
> + * if the irqfd still in use, means the irqfd was not
> + * release before and don't need to set up again
> + */

Re: [PATCH v2] vhost: don't set vring call if guest notifiers is not enabled

2024-04-08 Thread Jason Wang
On Mon, Apr 8, 2024 at 3:33 PM lyx634449800  wrote:
>
> When conducting performance testing using testpmd in the guest OS,
> it was observed that the performance was lower compared to the
> scenario of direct vfio-pci usage.
>
> In commit 96a3d98d2cdbd897ff5ab33427aa4cfb94077665, the author
> provided a good solution. However, because the guest OS's
> driver (e.g., the virtio-net PMD) may not enable the MSI-X capability,
> the function k->query_guest_notifiers(qbus->parent) may return false,
> resulting in the expected effect not being achieved. To address this
> issue, modify the conditional statement.
>
> Signed-off-by: Yuxue Liu 

Acked-by: Jason Wang 

Thanks




Re: [PATCH 1/2] virtio-net: Fix vhost virtqueue notifiers for RSS

2024-04-08 Thread Jason Wang
On Mon, Apr 8, 2024 at 6:13 PM Michael S. Tsirkin  wrote:
>
> On Tue, Mar 26, 2024 at 07:06:29PM +0900, Akihiko Odaki wrote:
> > virtio_net_guest_notifier_pending() and virtio_net_guest_notifier_mask()
> > checked VIRTIO_NET_F_MQ to know there are multiple queues, but
> > VIRTIO_NET_F_RSS also enables multiple queues. Refer to n->multiqueue,
> > which is set to true when either of VIRTIO_NET_F_MQ or VIRTIO_NET_F_RSS is
> > enabled.
> >
> > Fixes: 68b0a6395f36 ("virtio-net: align ctrl_vq index for non-mq guest for 
> > vhost_vdpa")
> > Signed-off-by: Akihiko Odaki 
>
> Reviewed-by: Michael S. Tsirkin 
>
> Jason, are you merging this?

It has been merged:

https://gitlab.com/qemu-project/qemu/-/commit/ba6bb2ec953f10751f174b6f7da8fe7e5f008c08

Thanks




Re: [PATCH] Revert "hw/virtio: Add support for VDPA network simulation devices"

2024-04-08 Thread Jason Wang
On Mon, Apr 8, 2024 at 5:47 PM Michael S. Tsirkin  wrote:
>
> This reverts commit cd341fd1ffded978b2aa0b5309b00be7c42e347c.
>
> The patch adds non-upstream code in
> include/standard-headers/linux/virtio_pci.h
> which would make maintenance harder.
>
> Revert for now.
>
> Suggested-by: Jason Wang 
> Signed-off-by: Michael S. Tsirkin 

Acked-by: Jason Wang 

Thanks




Re: [PATCH] hw/virtio: Add support for VDPA network simulation devices

2024-04-08 Thread Jason Wang
On Mon, Mar 18, 2024 at 8:41 PM Michael S. Tsirkin  wrote:
>
> On Thu, Mar 14, 2024 at 11:24:33AM +0800, Jason Wang wrote:
> > On Thu, Mar 14, 2024 at 3:52 AM Michael S. Tsirkin  wrote:
> > >
> > > On Wed, Mar 13, 2024 at 07:51:08PM +0100, Thomas Weißschuh wrote:
> > > > On 2024-02-21 15:38:02+0800, Hao Chen wrote:
> > > > > This patch adds support for VDPA network simulation devices.
> > > > > The device is developed based on virtio-net with a tap backend,
> > > > > and supports the hardware live-migration function.
> > > > >
> > > > > For more details, please refer to "docs/system/devices/vdpa-net.rst"
> > > > >
> > > > > Signed-off-by: Hao Chen 
> > > > > ---
> > > > >  MAINTAINERS |   5 +
> > > > >  docs/system/device-emulation.rst|   1 +
> > > > >  docs/system/devices/vdpa-net.rst| 121 +
> > > > >  hw/net/virtio-net.c |  16 ++
> > > > >  hw/virtio/virtio-pci.c  | 189 
> > > > > +++-
> >
> > I think those modifications should go in a separate file, as they
> > might conflict with virtio features in the future.
> >
> > > > >  hw/virtio/virtio.c  |  39 
> > > > >  include/hw/virtio/virtio-pci.h  |   5 +
> > > > >  include/hw/virtio/virtio.h  |  19 ++
> > > > >  include/standard-headers/linux/virtio_pci.h |   7 +
> > > > >  9 files changed, 399 insertions(+), 3 deletions(-)
> > > > >  create mode 100644 docs/system/devices/vdpa-net.rst
> > > >
> > > > [..]
> > > >
> > > > > diff --git a/include/standard-headers/linux/virtio_pci.h 
> > > > > b/include/standard-headers/linux/virtio_pci.h
> > > > > index b7fdfd0668..fb5391cef6 100644
> > > > > --- a/include/standard-headers/linux/virtio_pci.h
> > > > > +++ b/include/standard-headers/linux/virtio_pci.h
> > > > > @@ -216,6 +216,13 @@ struct virtio_pci_cfg_cap {
> > > > >  #define VIRTIO_PCI_COMMON_Q_NDATA  56
> > > > >  #define VIRTIO_PCI_COMMON_Q_RESET  58
> > > > >
> > > > > +#define LM_LOGGING_CTRL 0
> > > > > +#define LM_BASE_ADDR_LOW4
> > > > > +#define LM_BASE_ADDR_HIGH   8
> > > > > +#define LM_END_ADDR_LOW 12
> > > > > +#define LM_END_ADDR_HIGH16
> > > > > +#define LM_VRING_STATE_OFFSET   0x20
> > > >
> > > > These changes are not in upstream Linux and will be undone by
> > > > ./scripts/update-linux-headers.sh.
> > > >
> > > > Are they intentionally in this header?
> > >
> > >
> > > Good point. Pls move.
> >
> > Right, and this part is not part of standard virtio.
> >
> > Thanks
>
> I'm thinking of reverting this patch unless there's a resolution
> soon, and reapplying later after the release.

I think we need to revert this and revisit it in the next release.

Thanks

>
>
> > >
> > > > > +
> > > > >  #endif /* VIRTIO_PCI_NO_MODERN */
> > > > >
> > > > >  #endif
> > >
>




Re: [PATCH 1/1] virtio-net: fix bug 1451 aka "assert(!virtio_net_get_subqueue(nc)->async_tx.elem); "

2024-04-08 Thread Jason Wang
On Fri, Apr 5, 2024 at 7:22 PM Alexey Dobriyan  wrote:
>
> Don't send zero length packets in virtio_net_flush_tx().
>
> Reproducer from https://gitlab.com/qemu-project/qemu/-/issues/1451
> creates a small packet (1 segment, len = 10 == n->guest_hdr_len), then
> destroys the queue.
>
> "if (n->host_hdr_len != n->guest_hdr_len)" is triggered, if body creates
> zero length/zero segment packet, because there is nothing after guest
> header.

And in this case host_hdr_len is 0.

>
> qemu_sendv_packet_async() tries to send it.
>
> slirp discards it because it is smaller than the Ethernet header,
> but returns 0.
>
> The 0 length is propagated upwards and interpreted as "packet has been
> sent", which is terrible because the queue is being destroyed, nothing has
> been sent, nobody is waiting for TX to complete, and the assert is
> triggered.
>
> Signed-off-by: Alexey Dobriyan 
> ---
>  hw/net/virtio-net.c | 18 --
>  1 file changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 58014a92ad..258633f885 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -2765,18 +2765,14 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
>  out_sg = elem->out_sg;
>  if (out_num < 1) {
>  virtio_error(vdev, "virtio-net header not in first element");
> -virtqueue_detach_element(q->tx_vq, elem, 0);
> -g_free(elem);
> -return -EINVAL;
> +goto detach;
>  }
>
>  if (n->has_vnet_hdr) {
>  if (iov_to_buf(out_sg, out_num, 0, , n->guest_hdr_len) <
>  n->guest_hdr_len) {
>  virtio_error(vdev, "virtio-net header incorrect");
> -virtqueue_detach_element(q->tx_vq, elem, 0);
> -g_free(elem);
> -return -EINVAL;
> +goto detach;
>  }
>  if (n->needs_vnet_hdr_swap) {
>  virtio_net_hdr_swap(vdev, (void *) );
> @@ -2807,6 +2803,11 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
>   n->guest_hdr_len, -1);
>  out_num = sg_num;
>  out_sg = sg;
> +
> +if (iov_size(out_sg, out_num) == 0) {
> +virtio_error(vdev, "virtio-net nothing to send");
> +goto detach;
> +}

Nit, I think we can do this check before the iov_copy()?

Thanks
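
Concretely, the suggested reordering might look like this -- a sketch, not
the final patch. It assumes the check moves before the iov_copy() and is
done on the original out_sg/out_num vector, where an empty payload means
nothing follows the guest header:

if (iov_size(out_sg, out_num) <= n->guest_hdr_len) {
    virtio_error(vdev, "virtio-net nothing to send");
    goto detach;
}
/* ... only then build sg/sg_num via iov_copy() as in the hunk above ... */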

>  }
>
>  ret = qemu_sendv_packet_async(qemu_get_subqueue(n->nic, queue_index),
> @@ -2827,6 +2828,11 @@ drop:
>  }
>  }
>  return num_packets;
> +
> +detach:
> +virtqueue_detach_element(q->tx_vq, elem, 0);
> +g_free(elem);
> +return -EINVAL;
>  }
>
>  static void virtio_net_tx_timer(void *opaque);
> --
> 2.34.1
>




Re: [PULL 0/5] Net patches

2024-04-08 Thread Jason Wang
On Mon, Apr 1, 2024 at 3:21 AM Michael Tokarev  wrote:
>
> 29.03.2024 10:10, Jason Wang:
>
> > Akihiko Odaki (5):
> >virtio-net: Fix vhost virtqueue notifiers for RSS
> >ebpf: Fix indirections table setting
> >hw/net/net_tx_pkt: Fix virtio header without checksum offloading
> >tap-win32: Remove unnecessary stubs
> >Revert "tap: setting error appropriately when calling 
> > net_init_tap_one()"
>
From the above, I'm picking up
>
>virtio-net: Fix vhost virtqueue notifiers for RSS
>hw/net/net_tx_pkt: Fix virtio header without checksum offloading

Yes.

>
> for stable.  Not yet sure about
>
>Revert "tap: setting error appropriately when calling net_init_tap_one()"
>
> as it's been with us for a long time.

It probably isn't worth bothering.

>
> Please Cc: qemu-stable@ for changes which should be picked for stable
> series.
>

Right.

Thanks

> Thanks,
>
> /mjt
>




Re: [PATCH 1/1] ebpf: Added traces back. Changed source set for eBPF to 'system'.

2024-04-08 Thread Jason Wang
On Fri, Mar 29, 2024 at 7:30 PM Andrew Melnychenko  wrote:
>
> There was an issue with the QEMU build when using "--disable-system":
> the traces would still be generated and the build would fail.
> The traces were 'cut out' in the previous patches; overall,
> the 'system' source set should be used, as in the pre-'eBPF blob' patches.
>
> Signed-off-by: Andrew Melnychenko 
> ---

Queued for 9.1

Thanks




Re: [PATCH-for-9.1 1/7] ebpf: Restrict to system emulation

2024-04-08 Thread Jason Wang
On Fri, Apr 5, 2024 at 3:48 AM Philippe Mathieu-Daudé  wrote:
>
> eBPF is not used in user emulation.
>
> Signed-off-by: Philippe Mathieu-Daudé 

Queued for 9.1.

Thanks

> ---
>  ebpf/meson.build | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/ebpf/meson.build b/ebpf/meson.build
> index c5bf9295a2..bff6156f51 100644
> --- a/ebpf/meson.build
> +++ b/ebpf/meson.build
> @@ -1 +1 @@
> -common_ss.add(when: libbpf, if_true: files('ebpf.c', 'ebpf_rss.c'), 
> if_false: files('ebpf_rss-stub.c'))
> +system_ss.add(when: libbpf, if_true: files('ebpf.c', 'ebpf_rss.c'), 
> if_false: files('ebpf_rss-stub.c'))
> --
> 2.41.0
>




Re: [PATCH] vhost: don't set vring call if no enabled msix

2024-04-08 Thread Jason Wang
On Mon, Apr 8, 2024 at 2:09 PM lyx634449800  wrote:
>
> When conducting performance testing using testpmd in the guest os,
> it was observed that the performance was lower compared to the
> scenario of direct vfio-pci usage.
>
> In the virtual machine operating system, even if the virtio device
> does not use msix interrupts, vhost still sets vring call fd. This
> leads to unnecessary performance overhead. If the guest driver does
> not enable msix capability (e.g virtio-net pmd), we should also
> check and clear the vring call fd.
>
> Signed-off-by: Yuxue Liu 

Patch looks good; I would like to make the following tweaks:

1) explain why the check added in the following commit is not enough:

commit 96a3d98d2cdbd897ff5ab33427aa4cfb94077665
Author: Jason Wang 
Date:   Mon Aug 1 16:07:58 2016 +0800

vhost: don't set vring call if no vector

2) tweak the title to "vhost: don't set vring call if guest notifiers
is not enabled", as it's not necessarily PCI but can also be CCW.

Thanks

> ---
>  hw/virtio/vhost.c | 16 +---
>  1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index f50180e60e..b972c84e67 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -1266,13 +1266,15 @@ int vhost_virtqueue_start(struct vhost_dev *dev,
>  vhost_virtqueue_mask(dev, vdev, idx, false);
>  }
>
> -if (k->query_guest_notifiers &&
> -k->query_guest_notifiers(qbus->parent) &&
> -virtio_queue_vector(vdev, idx) == VIRTIO_NO_VECTOR) {
> -file.fd = -1;
> -r = dev->vhost_ops->vhost_set_vring_call(dev, );
> -if (r) {
> -goto fail_vector;
> +if (k->query_guest_notifiers) {
> +if (!k->query_guest_notifiers(qbus->parent) ||
> +(k->query_guest_notifiers(qbus->parent) &&
> +virtio_queue_vector(vdev, idx) == VIRTIO_NO_VECTOR)) {
> +file.fd = -1;
> +r = dev->vhost_ops->vhost_set_vring_call(dev, );
> +if (r) {
> +goto fail_vector;
> +}
>  }
>  }
>
> --
> 2.43.0
>
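
As a side note on the hunk above: the two k->query_guest_notifiers() calls
are logically redundant, since !A || (A && B) reduces to !A || B. A minimal
equivalent form would be (a sketch with the same intended semantics, not
the merged code):

if (k->query_guest_notifiers &&
    (!k->query_guest_notifiers(qbus->parent) ||
     virtio_queue_vector(vdev, idx) == VIRTIO_NO_VECTOR)) {
    file.fd = -1;
    r = dev->vhost_ops->vhost_set_vring_call(dev, &file);
    if (r) {
        goto fail_vector;
    }
}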




Re: [PATCH 1/1] virtio-pci: Fix the crash when the vector changes back from VIRTIO_NO_VECTOR

2024-04-07 Thread Jason Wang
On Sun, Apr 7, 2024 at 3:00 PM Cindy Lu  wrote:
>
> On Sun, Apr 7, 2024 at 12:20 PM Jason Wang  wrote:
> >
> > On Tue, Apr 2, 2024 at 11:02 PM Cindy Lu  wrote:
> > >
> > > When the guest calls virtio_stop and then virtio_reset,
> >
> > Guests cannot call those functions directly; this is triggered by, for
> > example, writing to some of the registers, like the reset register.
> >
> Sure, will fix this.
> > > the vector will change
> > > to VIRTIO_NO_VECTOR and the IRQFD for this vector will be released.
> > > After that, if you want to change the vector back,
> >
> > What do you mean by "change the vector back"? Something like
> >
> > assign VIRTIO_MSI_NO_VECTOR to vector 0
> > assign X to vector 0
> >
> Yes, the process is something like:
> 
> set config_vector = VIRTIO_MSI_NO_VECTOR
> ...
> set config_vector = 0
> > And I guess what you meant is to configure the vector after DRIVER_OK.
>
> >
> >
> > > it will cause a crash.
> > >
> > > To fix this, we need to call the function 
> > > "kvm_virtio_pci_vector_use_one()"
> > > when the vector changes back from VIRTIO_NO_VECTOR
> > >
> > > Signed-off-by: Cindy Lu 
> > > ---
> > >  hw/virtio/virtio-pci.c | 31 ---
> > >  1 file changed, 28 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> > > index e433879542..45f3ab38c3 100644
> > > --- a/hw/virtio/virtio-pci.c
> > > +++ b/hw/virtio/virtio-pci.c
> > > @@ -874,12 +874,14 @@ static int virtio_pci_get_notifier(VirtIOPCIProxy 
> > > *proxy, int queue_no,
> > >  return 0;
> > >  }
> > >
> > > -static int kvm_virtio_pci_vector_use_one(VirtIOPCIProxy *proxy, int 
> > > queue_no)
> > > +static int kvm_virtio_pci_vector_use_one(VirtIOPCIProxy *proxy, int 
> > > queue_no,
> > > + bool recovery)
> > >  {
> > >  unsigned int vector;
> > >  int ret;
> > >  EventNotifier *n;
> > >  PCIDevice *dev = >pci_dev;
> > > +VirtIOIRQFD *irqfd;
> > >  VirtIODevice *vdev = virtio_bus_get_device(>bus);
> > >  VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
> > >
> > > @@ -890,10 +892,21 @@ static int 
> > > kvm_virtio_pci_vector_use_one(VirtIOPCIProxy *proxy, int queue_no)
> > >  if (vector >= msix_nr_vectors_allocated(dev)) {
> > >  return 0;
> > >  }
> > > +/*
> > > + * if this is recovery and irqfd still in use, means the irqfd was 
> > > not
> > > + * release before and don't need to set up again
> > > + */
> > > +if (recovery) {
> > > +irqfd = >vector_irqfd[vector];
> > > +if (irqfd->users != 0) {
> > > +return 0;
> > > +}
> > > +}
> > >  ret = kvm_virtio_pci_vq_vector_use(proxy, vector);
> > >  if (ret < 0) {
> > >  goto undo;
> > >  }
> > > +
> > >  /*
> > >   * If guest supports masking, set up irqfd now.
> > >   * Otherwise, delay until unmasked in the frontend.
> > > @@ -932,14 +945,14 @@ static int 
> > > kvm_virtio_pci_vector_vq_use(VirtIOPCIProxy *proxy, int nvqs)
> > >  if (!virtio_queue_get_num(vdev, queue_no)) {
> > >  return -1;
> > >  }
> > > -ret = kvm_virtio_pci_vector_use_one(proxy, queue_no);
> > > +ret = kvm_virtio_pci_vector_use_one(proxy, queue_no, false);
> > >  }
> > >  return ret;
> > >  }
> > >
> > >  static int kvm_virtio_pci_vector_config_use(VirtIOPCIProxy *proxy)
> > >  {
> > > -return kvm_virtio_pci_vector_use_one(proxy, VIRTIO_CONFIG_IRQ_IDX);
> > > +return kvm_virtio_pci_vector_use_one(proxy, VIRTIO_CONFIG_IRQ_IDX, 
> > > false);
> > >  }
> > >
> > >  static void kvm_virtio_pci_vector_release_one(VirtIOPCIProxy *proxy,
> > > @@ -1570,7 +1583,13 @@ static void virtio_pci_common_write(void *opaque, 
> > > hwaddr addr,
> > >  } else {
> > >  val = VIRTIO_NO_VECTOR;
> > >  }
> > > +vector = vdev->config_vector;
> > >  vdev->config_vector = val;
> > > +/*check if the vector need to recovery*/

Re: [RFC QEMU PATCH v8 2/2] virtio-pci: implement No_Soft_Reset bit

2024-04-07 Thread Jason Wang
On Sun, Apr 7, 2024 at 7:50 PM Michael S. Tsirkin  wrote:
>
> On Sun, Apr 07, 2024 at 11:20:57AM +0800, Jason Wang wrote:
> > On Tue, Apr 2, 2024 at 11:03 AM Chen, Jiqian  wrote:
> > >
> > > On 2024/3/29 18:44, Michael S. Tsirkin wrote:
> > > > On Fri, Mar 29, 2024 at 03:20:59PM +0800, Jason Wang wrote:
> > > >> On Fri, Mar 29, 2024 at 3:07 PM Chen, Jiqian  
> > > >> wrote:
> > > >>>
> > > >>> On 2024/3/28 20:36, Michael S. Tsirkin wrote:
> > > >>>>>>> +}
> > > >>>>>>> +
> > > >>>>>>>  static void virtio_pci_bus_reset_hold(Object *obj)
> > > >>>>>>>  {
> > > >>>>>>>  PCIDevice *dev = PCI_DEVICE(obj);
> > > >>>>>>>  DeviceState *qdev = DEVICE(obj);
> > > >>>>>>>
> > > >>>>>>> +if (virtio_pci_no_soft_reset(dev)) {
> > > >>>>>>> +return;
> > > >>>>>>> +}
> > > >>>>>>> +
> > > >>>>>>>  virtio_pci_reset(qdev);
> > > >>>>>>>
> > > >>>>>>>  if (pci_is_express(dev)) {
> > > >>>>>>> @@ -2484,6 +2511,8 @@ static Property virtio_pci_properties[] = {
> > > >>>>>>>  VIRTIO_PCI_FLAG_INIT_LNKCTL_BIT, true),
> > > >>>>>>>  DEFINE_PROP_BIT("x-pcie-pm-init", VirtIOPCIProxy, flags,
> > > >>>>>>>  VIRTIO_PCI_FLAG_INIT_PM_BIT, true),
> > > >>>>>>> +DEFINE_PROP_BIT("x-pcie-pm-no-soft-reset", VirtIOPCIProxy, 
> > > >>>>>>> flags,
> > > >>>>>>> +VIRTIO_PCI_FLAG_PM_NO_SOFT_RESET_BIT, false),
> > > >>
> > > >> Why does it come with an x prefix?
> > > >>
> > > >>>>>>>  DEFINE_PROP_BIT("x-pcie-flr-init", VirtIOPCIProxy, flags,
> > > >>>>>>>  VIRTIO_PCI_FLAG_INIT_FLR_BIT, true),
> > > >>>>>>>  DEFINE_PROP_BIT("aer", VirtIOPCIProxy, flags,
> > > >>>>>>
> > > >>>>>> I am a bit confused about this part.
> > > >>>>>> Do you want to make this software controllable?
> > > >>>>> Yes, because even on real hardware, this bit is not always set.
> > > >>
> > > >> We are talking about emulated devices here.
> > > >>
> > > >>>>
> > > >>>> So which virtio devices should and which should not set this bit?
> > > >>> This depends on the scenario in which the virtio-device is used: if 
> > > >>> we want to trigger an internal soft reset for the virtio-device 
> > > >>> during S3, this bit shouldn't be set.
> > > >>
> > > >> If the device doesn't need reset, why bother the driver for this?
> > > >>
> > > >> Btw, no_soft_reset is insufficient for some cases, there's a proposal
> > > >> for the virtio-spec. I think we need to wait until it is done.
> > > >
> > > > That seems orthogonal or did I miss something?
> > > Yes, I looked at the details of the proposal, and I also think they are unrelated.
> >
> > The point is the proposal said
> >
> > """
> > Without a mechanism to
> > suspend/resume virtio devices when the driver is suspended/resumed in
> > the early phase of suspend/late phase of resume, there is a window where
> > interrupts can be lost.
> > """
> >
> > It looks safe to enable it with the suspend bit. Or if you think it's
> > wrong, please comment on the virtio spec patch.
> >
> > > I will set the default value of the No_Soft_Reset bit to true in the 
> > > next version, per your suggestion.
> > > About compatibility with old machine types, which types should I 
> > > consider? The same as x-pcie-pm-init (hw_compat_2_8)?
> > > Forgive me for not knowing much about compatibility.
> >
> > "x" means no compatibility at all, please drop the "x" prefix. And it
> > looks more safe to start as "false" by default.
> >
> > Thanks
>
>
> Not sure I agree. External flags are for when users want to tweak them.
> When would users want it to be off?

If I understand the suspending status proposal correctly, there is at
least one device that is not safe. And this series does not mention
which devices it has been tested with.

This means that if we enable it unconditionally, guests may break.

Thanks

> What is done here feels sane to me; just add machine compat machinery
> to change it to off for old machine types.
>
>
> > > >
> > > >>> In my use case on my environment, I don't want to reset virtio-gpu 
> > > >>> during S3,
> > > >>> because once the display resources are destroyed, there is not 
> > > >>> enough information to re-create them, so this bit should be set.
> > > >>> Making this bit software controllable is convenient for users to take 
> > > >>> their own choices.
> > > >>
> > > >> Thanks
> > > >>
> > > >>>
> > > >>>>
> > > >>>>>> Or should this be set to true by default and then
> > > >>>>>> changed to false for old machine types?
> > > >>>>> How can I do so?
> > > >>>>> Do you mean set this to true by default, and if old machine types 
> > > >>>>> don't need this bit, they can pass false config to qemu when 
> > > >>>>> running qemu?
> > > >>>>
> > > >>>> No, you would use compat machinery. See how is x-pcie-flr-init 
> > > >>>> handled.
> > > >>>>
> > > >>>>
> > > >>>
> > > >>> --
> > > >>> Best regards,
> > > >>> Jiqian Chen.
> > > >
> > >
> > > --
> > > Best regards,
> > > Jiqian Chen.
>




Re: [PATCH 1/1] virtio-pci: Fix the crash when the vector changes back from VIRTIO_NO_VECTOR

2024-04-07 Thread Jason Wang
On Sun, Apr 7, 2024 at 7:53 PM Michael S. Tsirkin  wrote:
>
> On Sun, Apr 07, 2024 at 12:19:57PM +0800, Jason Wang wrote:
> > On Tue, Apr 2, 2024 at 11:02 PM Cindy Lu  wrote:
> > >
> > > When the guest calls virtio_stop and then virtio_reset,
> >
> > Guests cannot call those functions directly; this is triggered by, for
> > example, writing to some of the registers, like the reset register.
> >
> > > the vector will change
> > > to VIRTIO_NO_VECTOR and the IRQFD for this vector will be released.
> > > After that, if you want to change the vector back,
> >
> > What do you mean by "change the vector back"? Something like
> >
> > assign VIRTIO_MSI_NO_VECTOR to vector 0
> > assign X to vector 0
> >
> > And I guess what you meant is to configure the vector after DRIVER_OK.
> >
> >
> > > it will cause a crash.
> > >
> > > To fix this, we need to call the function 
> > > "kvm_virtio_pci_vector_use_one()"
> > > when the vector changes back from VIRTIO_NO_VECTOR
> > >
> > > Signed-off-by: Cindy Lu 
> > > ---
> > >  hw/virtio/virtio-pci.c | 31 ---
> > >  1 file changed, 28 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> > > index e433879542..45f3ab38c3 100644
> > > --- a/hw/virtio/virtio-pci.c
> > > +++ b/hw/virtio/virtio-pci.c
> > > @@ -874,12 +874,14 @@ static int virtio_pci_get_notifier(VirtIOPCIProxy 
> > > *proxy, int queue_no,
> > >  return 0;
> > >  }
> > >
> > > -static int kvm_virtio_pci_vector_use_one(VirtIOPCIProxy *proxy, int 
> > > queue_no)
> > > +static int kvm_virtio_pci_vector_use_one(VirtIOPCIProxy *proxy, int 
> > > queue_no,
> > > + bool recovery)
> > >  {
> > >  unsigned int vector;
> > >  int ret;
> > >  EventNotifier *n;
> > >  PCIDevice *dev = >pci_dev;
> > > +VirtIOIRQFD *irqfd;
> > >  VirtIODevice *vdev = virtio_bus_get_device(>bus);
> > >  VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
> > >
> > > @@ -890,10 +892,21 @@ static int 
> > > kvm_virtio_pci_vector_use_one(VirtIOPCIProxy *proxy, int queue_no)
> > >  if (vector >= msix_nr_vectors_allocated(dev)) {
> > >  return 0;
> > >  }
> > > +/*
> > > + * if this is recovery and irqfd still in use, means the irqfd was 
> > > not
> > > + * release before and don't need to set up again
> > > + */
> > > +if (recovery) {
> > > +irqfd = >vector_irqfd[vector];
> > > +if (irqfd->users != 0) {
> > > +return 0;
> > > +}
> > > +}
> > >  ret = kvm_virtio_pci_vq_vector_use(proxy, vector);
> > >  if (ret < 0) {
> > >  goto undo;
> > >  }
> > > +
> > >  /*
> > >   * If guest supports masking, set up irqfd now.
> > >   * Otherwise, delay until unmasked in the frontend.
> > > @@ -932,14 +945,14 @@ static int 
> > > kvm_virtio_pci_vector_vq_use(VirtIOPCIProxy *proxy, int nvqs)
> > >  if (!virtio_queue_get_num(vdev, queue_no)) {
> > >  return -1;
> > >  }
> > > -ret = kvm_virtio_pci_vector_use_one(proxy, queue_no);
> > > +ret = kvm_virtio_pci_vector_use_one(proxy, queue_no, false);
> > >  }
> > >  return ret;
> > >  }
> > >
> > >  static int kvm_virtio_pci_vector_config_use(VirtIOPCIProxy *proxy)
> > >  {
> > > -return kvm_virtio_pci_vector_use_one(proxy, VIRTIO_CONFIG_IRQ_IDX);
> > > +return kvm_virtio_pci_vector_use_one(proxy, VIRTIO_CONFIG_IRQ_IDX, 
> > > false);
> > >  }
> > >
> > >  static void kvm_virtio_pci_vector_release_one(VirtIOPCIProxy *proxy,
> > > @@ -1570,7 +1583,13 @@ static void virtio_pci_common_write(void *opaque, 
> > > hwaddr addr,
> > >  } else {
> > >  val = VIRTIO_NO_VECTOR;
> > >  }
> > > +vector = vdev->config_vector;
> > >  vdev->config_vector = val;
> > > +/*check if the vector need to recovery*/
> > > +if ((val != VIRTIO_NO_VECTOR) && (vector == VIRTIO_NO_VECTOR) &&
> > > +(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
> > > +kvm_virtio_pci_vector_use_one(proxy, VIRTIO_CONFIG_IRQ_IDX, true);
> > > +}

Re: [PATCH 1/1] virtio-pci: Fix the crash when the vector changes back from VIRTIO_NO_VECTOR

2024-04-06 Thread Jason Wang
On Tue, Apr 2, 2024 at 11:02 PM Cindy Lu  wrote:
>
> When the guest calls virtio_stop and then virtio_reset,

Guests cannot call those functions directly; this is triggered by, for
example, writing to some of the registers, like the reset register.

> the vector will change
> to VIRTIO_NO_VECTOR and the IRQFD for this vector will be released. After
> that, if you want to change the vector back,

What do you mean by "change the vector back"? Something like

assign VIRTIO_MSI_NO_VECTOR to vector 0
assign X to vector 0

And I guess what you meant is to configure the vector after DRIVER_OK.


> it will cause a crash.
>
> To fix this, we need to call the function "kvm_virtio_pci_vector_use_one()"
> when the vector changes back from VIRTIO_NO_VECTOR
>
> Signed-off-by: Cindy Lu 
> ---
>  hw/virtio/virtio-pci.c | 31 ---
>  1 file changed, 28 insertions(+), 3 deletions(-)
>
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index e433879542..45f3ab38c3 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -874,12 +874,14 @@ static int virtio_pci_get_notifier(VirtIOPCIProxy 
> *proxy, int queue_no,
>  return 0;
>  }
>
> -static int kvm_virtio_pci_vector_use_one(VirtIOPCIProxy *proxy, int queue_no)
> +static int kvm_virtio_pci_vector_use_one(VirtIOPCIProxy *proxy, int queue_no,
> + bool recovery)
>  {
>  unsigned int vector;
>  int ret;
>  EventNotifier *n;
>  PCIDevice *dev = >pci_dev;
> +VirtIOIRQFD *irqfd;
>  VirtIODevice *vdev = virtio_bus_get_device(>bus);
>  VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
>
> @@ -890,10 +892,21 @@ static int kvm_virtio_pci_vector_use_one(VirtIOPCIProxy 
> *proxy, int queue_no)
>  if (vector >= msix_nr_vectors_allocated(dev)) {
>  return 0;
>  }
> +/*
> + * if this is recovery and irqfd still in use, means the irqfd was not
> + * release before and don't need to set up again
> + */
> +if (recovery) {
> +irqfd = >vector_irqfd[vector];
> +if (irqfd->users != 0) {
> +return 0;
> +}
> +}
>  ret = kvm_virtio_pci_vq_vector_use(proxy, vector);
>  if (ret < 0) {
>  goto undo;
>  }
> +
>  /*
>   * If guest supports masking, set up irqfd now.
>   * Otherwise, delay until unmasked in the frontend.
> @@ -932,14 +945,14 @@ static int kvm_virtio_pci_vector_vq_use(VirtIOPCIProxy 
> *proxy, int nvqs)
>  if (!virtio_queue_get_num(vdev, queue_no)) {
>  return -1;
>  }
> -ret = kvm_virtio_pci_vector_use_one(proxy, queue_no);
> +ret = kvm_virtio_pci_vector_use_one(proxy, queue_no, false);
>  }
>  return ret;
>  }
>
>  static int kvm_virtio_pci_vector_config_use(VirtIOPCIProxy *proxy)
>  {
> -return kvm_virtio_pci_vector_use_one(proxy, VIRTIO_CONFIG_IRQ_IDX);
> +return kvm_virtio_pci_vector_use_one(proxy, VIRTIO_CONFIG_IRQ_IDX, 
> false);
>  }
>
>  static void kvm_virtio_pci_vector_release_one(VirtIOPCIProxy *proxy,
> @@ -1570,7 +1583,13 @@ static void virtio_pci_common_write(void *opaque, 
> hwaddr addr,
>  } else {
>  val = VIRTIO_NO_VECTOR;
>  }
> +vector = vdev->config_vector;
>  vdev->config_vector = val;
> +/*check if the vector need to recovery*/
> +if ((val != VIRTIO_NO_VECTOR) && (vector == VIRTIO_NO_VECTOR) &&
> +(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
> +kvm_virtio_pci_vector_use_one(proxy, VIRTIO_CONFIG_IRQ_IDX, 
> true);
> +}

This looks too tricky.

Thinking hard about this, I think it's better to split it into two parts:

1) a series that disables the config irqfd for vhost-net; this series
needs to be backported to -stable, so it needs to be conservative. It
looks more like your V1, but let's add a boolean to the PCI proxy.
2) a series that deals with the MSI-X vector configuration after
DRIVER_OK; we probably need some refactoring to do per-vq use instead
of the current loop at DRIVER_OK.

Does this make sense?

Thanks
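
For part 1, the conservative shape might be something like the following.
This is purely a hypothetical sketch against the upstream two-argument
kvm_virtio_pci_vector_use_one(); the field name and its wiring are
assumptions, not an actual patch:

/* New field in VirtIOPCIProxy, off by default: bool disable_config_irqfd; */

static int kvm_virtio_pci_vector_config_use(VirtIOPCIProxy *proxy)
{
    if (proxy->disable_config_irqfd) {
        /* Conservatively skip irqfd setup for the config vector. */
        return 0;
    }
    return kvm_virtio_pci_vector_use_one(proxy, VIRTIO_CONFIG_IRQ_IDX);
}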

>  break;
>  case VIRTIO_PCI_COMMON_STATUS:
>  if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
> @@ -1611,6 +1630,12 @@ static void virtio_pci_common_write(void *opaque, 
> hwaddr addr,
>  val = VIRTIO_NO_VECTOR;
>  }
>  virtio_queue_set_vector(vdev, vdev->queue_sel, val);
> +
> +/*check if the vector need to recovery*/
> +if ((val != VIRTIO_NO_VECTOR) && (vector == VIRTIO_NO_VECTOR) &&
> +(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
> +kvm_virtio_pci_vector_use_one(proxy, vdev->queue_sel, true);
> +}
>  break;
>  case VIRTIO_PCI_COMMON_Q_ENABLE:
>  if (val == 1) {
> --
> 2.43.0
>




Re: [PATCH] vdpa-dev: Fix the issue of device status not updating when configuration interruption is triggered

2024-04-06 Thread Jason Wang
On Sun, Apr 7, 2024 at 11:22 AM lyx634449800  wrote:
>
> The get_config callback function vhost_vdpa_device_get_config in
> vdpa-dev does not fetch the current device status from the hardware
> device, causing the GUEST OS to not receive the latest device status

nit: no need for upper case here.

> information.
>
> The hardware updates the config status of the vdpa device and then
> notifies the OS. The GUEST OS receives an interrupt notification,
> triggering a get_config access in the kernel, which then enters QEMU
> internally. Ultimately, the vhost_vdpa_device_get_config function of
> vdpa-dev is called.
>
> One scenario encountered is when the hardware needs to bring down the
> vdpa net device. After modifying the status field of virtio_net_config
> in the hardware, it sends an interrupt notification. However, the guest
> OS always receives the STATUS field as VIRTIO_NET_S_LINK_UP.
>
> Signed-off-by: Yuxue Liu 

This aligns with the vhost-net support for vDPA.

Acked-by: Jason Wang 

Thanks

> ---
>  hw/virtio/vdpa-dev.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/hw/virtio/vdpa-dev.c b/hw/virtio/vdpa-dev.c
> index 13e87f06f6..64b96b226c 100644
> --- a/hw/virtio/vdpa-dev.c
> +++ b/hw/virtio/vdpa-dev.c
> @@ -195,7 +195,14 @@ static void
>  vhost_vdpa_device_get_config(VirtIODevice *vdev, uint8_t *config)
>  {
>  VhostVdpaDevice *s = VHOST_VDPA_DEVICE(vdev);
> +int ret;
>
> +ret = vhost_dev_get_config(>dev, s->config, s->config_size,
> +NULL);
> +if (ret < 0) {
> +error_report("get device config space failed");
> +return;
> +}
>  memcpy(config, s->config, s->config_size);
>  }
>
> --
> 2.43.0
>




Re: [RFC QEMU PATCH v8 2/2] virtio-pci: implement No_Soft_Reset bit

2024-04-06 Thread Jason Wang
On Tue, Apr 2, 2024 at 11:03 AM Chen, Jiqian  wrote:
>
> On 2024/3/29 18:44, Michael S. Tsirkin wrote:
> > On Fri, Mar 29, 2024 at 03:20:59PM +0800, Jason Wang wrote:
> >> On Fri, Mar 29, 2024 at 3:07 PM Chen, Jiqian  wrote:
> >>>
> >>> On 2024/3/28 20:36, Michael S. Tsirkin wrote:
> >>>>>>> +}
> >>>>>>> +
> >>>>>>>  static void virtio_pci_bus_reset_hold(Object *obj)
> >>>>>>>  {
> >>>>>>>  PCIDevice *dev = PCI_DEVICE(obj);
> >>>>>>>  DeviceState *qdev = DEVICE(obj);
> >>>>>>>
> >>>>>>> +if (virtio_pci_no_soft_reset(dev)) {
> >>>>>>> +return;
> >>>>>>> +}
> >>>>>>> +
> >>>>>>>  virtio_pci_reset(qdev);
> >>>>>>>
> >>>>>>>  if (pci_is_express(dev)) {
> >>>>>>> @@ -2484,6 +2511,8 @@ static Property virtio_pci_properties[] = {
> >>>>>>>  VIRTIO_PCI_FLAG_INIT_LNKCTL_BIT, true),
> >>>>>>>  DEFINE_PROP_BIT("x-pcie-pm-init", VirtIOPCIProxy, flags,
> >>>>>>>  VIRTIO_PCI_FLAG_INIT_PM_BIT, true),
> >>>>>>> +DEFINE_PROP_BIT("x-pcie-pm-no-soft-reset", VirtIOPCIProxy, flags,
> >>>>>>> +VIRTIO_PCI_FLAG_PM_NO_SOFT_RESET_BIT, false),
> >>
> >> Why does it come with an x prefix?
> >>
> >>>>>>>  DEFINE_PROP_BIT("x-pcie-flr-init", VirtIOPCIProxy, flags,
> >>>>>>>  VIRTIO_PCI_FLAG_INIT_FLR_BIT, true),
> >>>>>>>  DEFINE_PROP_BIT("aer", VirtIOPCIProxy, flags,
> >>>>>>
> >>>>>> I am a bit confused about this part.
> >>>>>> Do you want to make this software controllable?
> >>>>> Yes, because even on real hardware, this bit is not always set.
> >>
> >> We are talking about emulated devices here.
> >>
> >>>>
> >>>> So which virtio devices should and which should not set this bit?
> >> This depends on the scenario in which the virtio-device is used: if we 
> >> want to trigger an internal soft reset for the virtio-device during S3, 
> >> this bit shouldn't be set.
> >>
> >> If the device doesn't need reset, why bother the driver for this?
> >>
> >> Btw, no_soft_reset is insufficient for some cases, there's a proposal
> >> for the virtio-spec. I think we need to wait until it is done.
> >
> > That seems orthogonal or did I miss something?
> Yes, I looked at the details of the proposal, and I also think they are unrelated.

The point is the proposal said

"""
Without a mechanism to
suspend/resume virtio devices when the driver is suspended/resumed in
the early phase of suspend/late phase of resume, there is a window where
interrupts can be lost.
"""

It looks safe to enable it with the suspend bit. Or if you think it's
wrong, please comment on the virtio spec patch.

> I will set the default value of the No_Soft_Reset bit to true in the next 
> version, per your suggestion.
> About compatibility with old machine types, which types should I consider? 
> The same as x-pcie-pm-init (hw_compat_2_8)?
> Forgive me for not knowing much about compatibility.

"x" means no compatibility at all, please drop the "x" prefix. And it
looks more safe to start as "false" by default.
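
As a concrete illustration of the compat machinery: if the default is later
flipped to true, older machine types would keep the old behavior via a
compat entry. A sketch only; the array name hw_compat_9_0 and the property
name with the "x" prefix dropped are assumptions, to be adjusted to
whichever release the change lands in.

/* Pin the old default for older machine types via a compat entry. */
GlobalProperty hw_compat_9_0[] = {
    { "virtio-pci", "pcie-pm-no-soft-reset", "off" },
};
const size_t hw_compat_9_0_len = G_N_ELEMENTS(hw_compat_9_0);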

Thanks

> >
> >>> In my use case on my environment, I don't want to reset virtio-gpu during 
> >>> S3,
> >>> because once the display resources are destroyed, there are not enough 
> >>> information to re-create them, so this bit should be set.
> >>> Making this bit software controllable is convenient for users to take 
> >>> their own choices.
> >>
> >> Thanks
> >>
> >>>
> >>>>
> >>>>>> Or should this be set to true by default and then
> >>>>>> changed to false for old machine types?
> >>>>> How can I do so?
> >>>>> Do you mean set this to true by default, and if old machine types don't 
> >>>>> need this bit, they can pass false config to qemu when running qemu?
> >>>>
> >>>> No, you would use compat machinery. See how is x-pcie-flr-init handled.
> >>>>
> >>>>
> >>>
> >>> --
> >>> Best regards,
> >>> Jiqian Chen.
> >
>
> --
> Best regards,
> Jiqian Chen.




Re: [RFC QEMU PATCH v8 2/2] virtio-pci: implement No_Soft_Reset bit

2024-03-29 Thread Jason Wang
On Fri, Mar 29, 2024 at 4:00 PM Chen, Jiqian  wrote:
>
> On 2024/3/29 15:20, Jason Wang wrote:
> > On Fri, Mar 29, 2024 at 3:07 PM Chen, Jiqian  wrote:
> >>
> >> On 2024/3/28 20:36, Michael S. Tsirkin wrote:
> >>>>>> +}
> >>>>>> +
> >>>>>>  static void virtio_pci_bus_reset_hold(Object *obj)
> >>>>>>  {
> >>>>>>  PCIDevice *dev = PCI_DEVICE(obj);
> >>>>>>  DeviceState *qdev = DEVICE(obj);
> >>>>>>
> >>>>>> +if (virtio_pci_no_soft_reset(dev)) {
> >>>>>> +return;
> >>>>>> +}
> >>>>>> +
> >>>>>>  virtio_pci_reset(qdev);
> >>>>>>
> >>>>>>  if (pci_is_express(dev)) {
> >>>>>> @@ -2484,6 +2511,8 @@ static Property virtio_pci_properties[] = {
> >>>>>>  VIRTIO_PCI_FLAG_INIT_LNKCTL_BIT, true),
> >>>>>>  DEFINE_PROP_BIT("x-pcie-pm-init", VirtIOPCIProxy, flags,
> >>>>>>  VIRTIO_PCI_FLAG_INIT_PM_BIT, true),
> >>>>>> +DEFINE_PROP_BIT("x-pcie-pm-no-soft-reset", VirtIOPCIProxy, flags,
> >>>>>> +VIRTIO_PCI_FLAG_PM_NO_SOFT_RESET_BIT, false),
> >
> > Why does it come with an x prefix?
> Sorry, I misunderstood this prefix; if No_Soft_Reset doesn't need
> it, I will delete it in the next version.
> Does the x prefix mean compat machinery, or something else?
>
> >
> >>>>>>  DEFINE_PROP_BIT("x-pcie-flr-init", VirtIOPCIProxy, flags,
> >>>>>>  VIRTIO_PCI_FLAG_INIT_FLR_BIT, true),
> >>>>>>  DEFINE_PROP_BIT("aer", VirtIOPCIProxy, flags,
> >>>>>
> >>>>> I am a bit confused about this part.
> >>>>> Do you want to make this software controllable?
> >>>> Yes, because even the real hardware, this bit is not always set.
> >
> > We are talking about emulated devices here.
> Yes, I just gave an example. Actually, this bit is not always set. What's
> your opinion on when to set this bit, or which virtio devices should set
> it?

If the QEMU implementation is correct, we should set it unless we
need compatibility with old machine types.
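
For reference, the virtio_pci_no_soft_reset() check quoted above plausibly
boils down to reading the No_Soft_Reset bit of the PM capability's PMCSR
register; a sketch under that assumption (not necessarily the exact code
in the patch):

static bool virtio_pci_no_soft_reset(PCIDevice *dev)
{
    uint16_t pmcsr;

    if (!pci_is_express(dev) || !dev->exp.pm_cap) {
        return false;
    }

    pmcsr = pci_get_word(dev->config + dev->exp.pm_cap + PCI_PM_CTRL);

    /* No_Soft_Reset set and device in D3hot: skip the internal reset */
    return (pmcsr & PCI_PM_CTRL_NO_SOFT_RESET) &&
           (pmcsr & PCI_PM_CTRL_STATE_MASK) == 3;
}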

>
> >
> >>>
> >>> So which virtio devices should and which should not set this bit?
> >> This depends on the scenario the virtio-device is used, if we want to 
> >> trigger an internal soft reset for the virtio-device during S3, this bit 
> >> shouldn't be set.
> >
> > If the device doesn't need reset, why bother the driver for this?
> I don't know what you mean.
> If the device doesn't need a reset, we can configure this bit to true; then
> on the driver side, the driver finds the bit set and will not trigger a
> soft reset.

I mean if the device can suspend without reset, we don't need to
bother the driver to save and load states.

>
> >
> > Btw, no_soft_reset is insufficient for some cases,
> May I know which cases?
>
> > there's a proposal for the virtio-spec. I think we need to wait until it is 
> > done.
> Can you share the proposal?

See this

https://lore.kernel.org/all/20240227015345.3614965-1-steve...@chromium.org/T/

Thanks

>
> >
> >> In my use case on my environment, I don't want to reset virtio-gpu during 
> >> S3,
> >> because once the display resources are destroyed, there are not enough 
> >> information to re-create them, so this bit should be set.
> >> Making this bit software controllable is convenient for users to take 
> >> their own choices.
> >
> > Thanks
> >
> >>
> >>>
> >>>>> Or should this be set to true by default and then
> >>>>> changed to false for old machine types?
> >>>> How can I do so?
> >>>> Do you mean set this to true by default, and if old machine types don't 
> >>>> need this bit, they can pass false config to qemu when running qemu?
> >>>
> >>> No, you would use compat machinery. See how is x-pcie-flr-init handled.
> >>>
> >>>
> >>
> >> --
> >> Best regards,
> >> Jiqian Chen.
> >
>
> --
> Best regards,
> Jiqian Chen.




Re: [RFC QEMU PATCH v8 2/2] virtio-pci: implement No_Soft_Reset bit

2024-03-29 Thread Jason Wang
On Fri, Mar 29, 2024 at 3:07 PM Chen, Jiqian  wrote:
>
> On 2024/3/28 20:36, Michael S. Tsirkin wrote:
>  +}
>  +
>   static void virtio_pci_bus_reset_hold(Object *obj)
>   {
>   PCIDevice *dev = PCI_DEVICE(obj);
>   DeviceState *qdev = DEVICE(obj);
> 
>  +if (virtio_pci_no_soft_reset(dev)) {
>  +return;
>  +}
>  +
>   virtio_pci_reset(qdev);
> 
>   if (pci_is_express(dev)) {
>  @@ -2484,6 +2511,8 @@ static Property virtio_pci_properties[] = {
>   VIRTIO_PCI_FLAG_INIT_LNKCTL_BIT, true),
>   DEFINE_PROP_BIT("x-pcie-pm-init", VirtIOPCIProxy, flags,
>   VIRTIO_PCI_FLAG_INIT_PM_BIT, true),
>  +DEFINE_PROP_BIT("x-pcie-pm-no-soft-reset", VirtIOPCIProxy, flags,
>  +VIRTIO_PCI_FLAG_PM_NO_SOFT_RESET_BIT, false),

Why does it come with an x prefix?

>   DEFINE_PROP_BIT("x-pcie-flr-init", VirtIOPCIProxy, flags,
>   VIRTIO_PCI_FLAG_INIT_FLR_BIT, true),
>   DEFINE_PROP_BIT("aer", VirtIOPCIProxy, flags,
> >>>
> >>> I am a bit confused about this part.
> >>> Do you want to make this software controllable?
> >> Yes, because even the real hardware, this bit is not always set.

We are talking about emulated devices here.

> >
> > So which virtio devices should and which should not set this bit?
> This depends on the scenario in which the virtio device is used: if we want
> to trigger an internal soft reset for the virtio device during S3, this bit
> shouldn't be set.

If the device doesn't need reset, why bother the driver for this?

Btw, no_soft_reset is insufficient for some cases, there's a proposal
for the virtio-spec. I think we need to wait until it is done.

> In my use case and environment, I don't want to reset virtio-gpu during S3,
> because once the display resources are destroyed there is not enough
> information to re-create them, so this bit should be set.
> Making this bit software-controllable lets users make their
> own choices.

Thanks

>
> >
> >>> Or should this be set to true by default and then
> >>> changed to false for old machine types?
> >> How can I do so?
> >> Do you mean set this to true by default, and if old machine types don't 
> >> need this bit, they can pass false config to qemu when running qemu?
> >
> > No, you would use compat machinery. See how is x-pcie-flr-init handled.
> >
> >
>
> --
> Best regards,
> Jiqian Chen.




[PULL 2/5] ebpf: Fix indirections table setting

2024-03-29 Thread Jason Wang
From: Akihiko Odaki 

The kernel documentation says:
> The value stored can be of any size, however, all array elements are
> aligned to 8 bytes.
https://www.kernel.org/doc/html/v6.8/bpf/map_array.html

Fixes: 333b3e5fab75 ("ebpf: Added eBPF map update through mmap.")
Signed-off-by: Akihiko Odaki 
Acked-by: Andrew Melnychenko 
Signed-off-by: Jason Wang 
---
 ebpf/ebpf_rss.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/ebpf/ebpf_rss.c b/ebpf/ebpf_rss.c
index 2e506f9743..d102f3dd09 100644
--- a/ebpf/ebpf_rss.c
+++ b/ebpf/ebpf_rss.c
@@ -185,13 +185,18 @@ static bool ebpf_rss_set_indirections_table(struct 
EBPFRSSContext *ctx,
 uint16_t *indirections_table,
 size_t len)
 {
+char *cursor = ctx->mmap_indirections_table;
+
 if (!ebpf_rss_is_loaded(ctx) || indirections_table == NULL ||
len > VIRTIO_NET_RSS_MAX_TABLE_LEN) {
 return false;
 }
 
-memcpy(ctx->mmap_indirections_table, indirections_table,
-sizeof(*indirections_table) * len);
+for (size_t i = 0; i < len; i++) {
+*(uint16_t *)cursor = indirections_table[i];
+cursor += 8;
+}
+
 return true;
 }
 
-- 
2.42.0
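
The key detail behind the fix: in a BPF array map every element is padded
to 8 bytes, so the i-th uint16_t entry of the mmap()ed indirection table
lives at byte offset i * 8, and a packed memcpy() lands entries on the
wrong boundaries. A standalone sketch of the corrected layout (not the
QEMU code itself):

#include <stdint.h>
#include <string.h>

static void write_indirections(void *map_base,
                               const uint16_t *table, size_t len)
{
    char *cursor = map_base;

    for (size_t i = 0; i < len; i++) {
        memcpy(cursor, &table[i], sizeof(table[i])); /* entry i at i * 8 */
        cursor += 8; /* the kernel pads each array element to 8 bytes */
    }
}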




[PULL 3/5] hw/net/net_tx_pkt: Fix virtio header without checksum offloading

2024-03-29 Thread Jason Wang
From: Akihiko Odaki 

It is incorrect to have VIRTIO_NET_HDR_F_NEEDS_CSUM set when
checksum offloading is disabled, so clear the bit.

The TCP/UDP checksum is usually offloaded when the peer requires virtio
headers, because the header can instruct the peer to compute the checksum.
However, igb disables TX checksum offloading when a VF is enabled, whether
or not the peer requires virtio headers, because a transmitted packet can
be routed to the VF and it expects the packet to have a proper checksum.
Therefore, it is necessary to have a correct virtio header even when
checksum offloading is disabled.

A real TCP/UDP checksum will be computed and saved in the buffer when
checksum offloading is disabled. The virtio specification requires the
packet checksum stored in the buffer to be the TCP/UDP pseudo-header
checksum when the VIRTIO_NET_HDR_F_NEEDS_CSUM bit is set, so the bit must
be cleared in that case.

Fixes: ffbd2dbd8e64 ("e1000e: Perform software segmentation for loopback")
Buglink: https://issues.redhat.com/browse/RHEL-23067
Signed-off-by: Akihiko Odaki 
Signed-off-by: Jason Wang 
---
 hw/net/net_tx_pkt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
index 2e5f58b3c9..2134a18c4c 100644
--- a/hw/net/net_tx_pkt.c
+++ b/hw/net/net_tx_pkt.c
@@ -833,6 +833,7 @@ bool net_tx_pkt_send_custom(struct NetTxPkt *pkt, bool 
offload,
 
 if (offload || gso_type == VIRTIO_NET_HDR_GSO_NONE) {
 if (!offload && pkt->virt_hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
+pkt->virt_hdr.flags &= ~VIRTIO_NET_HDR_F_NEEDS_CSUM;
net_tx_pkt_do_sw_csum(pkt, &pkt->vec[NET_TX_PKT_L2HDR_FRAG],
   pkt->payload_frags + 
NET_TX_PKT_PL_START_FRAG - 1,
   pkt->payload_len);
-- 
2.42.0
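
For context on the pseudo-header rule the message cites: with
VIRTIO_NET_HDR_F_NEEDS_CSUM set, the checksum field must hold only the
pseudo-header sum (the receiver finishes it over the payload); with the
bit clear, as after this patch, the field must already hold the full
checksum. A minimal IPv4/TCP sketch of the pseudo-header sum
(illustrative only, not QEMU code):

#include <stdint.h>
#include <stddef.h>

static uint32_t sum16(uint32_t acc, const uint8_t *p, size_t len)
{
    for (size_t i = 0; i + 1 < len; i += 2) {
        acc += ((uint32_t)p[i] << 8) | p[i + 1];
    }
    if (len & 1) {
        acc += (uint32_t)p[len - 1] << 8;
    }
    return acc;
}

static uint16_t pseudo_hdr_sum(const uint8_t saddr[4], const uint8_t daddr[4],
                               uint16_t tcp_len)
{
    uint32_t acc = 0;

    acc = sum16(acc, saddr, 4);
    acc = sum16(acc, daddr, 4);
    acc += 6;       /* protocol number: TCP */
    acc += tcp_len; /* TCP header + payload length */

    while (acc >> 16) {
        acc = (acc & 0xffff) + (acc >> 16);
    }
    return (uint16_t)acc; /* not complemented: the receiver continues the sum */
}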




[PULL 4/5] tap-win32: Remove unnecessary stubs

2024-03-29 Thread Jason Wang
From: Akihiko Odaki 

Some of them are only necessary for POSIX systems. The others are
assigned to function pointers in NetClientInfo that can actually be
NULL.

Signed-off-by: Akihiko Odaki 
Signed-off-by: Jason Wang 
---
 net/tap-win32.c | 54 -
 1 file changed, 54 deletions(-)

diff --git a/net/tap-win32.c b/net/tap-win32.c
index 7b8b4be02c..7edbd71633 100644
--- a/net/tap-win32.c
+++ b/net/tap-win32.c
@@ -707,70 +707,16 @@ static void tap_win32_send(void *opaque)
 }
 }
 
-static bool tap_has_ufo(NetClientState *nc)
-{
-return false;
-}
-
-static bool tap_has_vnet_hdr(NetClientState *nc)
-{
-return false;
-}
-
-int tap_probe_vnet_hdr_len(int fd, int len)
-{
-return 0;
-}
-
-void tap_fd_set_vnet_hdr_len(int fd, int len)
-{
-}
-
-int tap_fd_set_vnet_le(int fd, int is_le)
-{
-return -EINVAL;
-}
-
-int tap_fd_set_vnet_be(int fd, int is_be)
-{
-return -EINVAL;
-}
-
-static void tap_using_vnet_hdr(NetClientState *nc, bool using_vnet_hdr)
-{
-}
-
-static void tap_set_offload(NetClientState *nc, int csum, int tso4,
- int tso6, int ecn, int ufo, int uso4, int uso6)
-{
-}
-
 struct vhost_net *tap_get_vhost_net(NetClientState *nc)
 {
 return NULL;
 }
 
-static bool tap_has_vnet_hdr_len(NetClientState *nc, int len)
-{
-return false;
-}
-
-static void tap_set_vnet_hdr_len(NetClientState *nc, int len)
-{
-abort();
-}
-
 static NetClientInfo net_tap_win32_info = {
 .type = NET_CLIENT_DRIVER_TAP,
 .size = sizeof(TAPState),
 .receive = tap_receive,
 .cleanup = tap_cleanup,
-.has_ufo = tap_has_ufo,
-.has_vnet_hdr = tap_has_vnet_hdr,
-.has_vnet_hdr_len = tap_has_vnet_hdr_len,
-.using_vnet_hdr = tap_using_vnet_hdr,
-.set_offload = tap_set_offload,
-.set_vnet_hdr_len = tap_set_vnet_hdr_len,
 };
 
 static int tap_win32_init(NetClientState *peer, const char *model,
-- 
2.42.0
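
The removal works because callers of these NetClientInfo hooks already
treat a NULL function pointer as "not supported", roughly like the sketch
below (the caller-side pattern, assuming the NetClientState and
NetClientInfo types from include/net/net.h; not the exact net/net.c code):

bool qemu_has_vnet_hdr(NetClientState *nc)
{
    if (!nc || !nc->info->has_vnet_hdr) {
        return false; /* a missing hook means the feature is absent */
    }

    return nc->info->has_vnet_hdr(nc);
}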




[PULL 1/5] virtio-net: Fix vhost virtqueue notifiers for RSS

2024-03-29 Thread Jason Wang
From: Akihiko Odaki 

virtio_net_guest_notifier_pending() and virtio_net_guest_notifier_mask()
checked VIRTIO_NET_F_MQ to know whether there are multiple queues, but
VIRTIO_NET_F_RSS also enables multiple queues. Refer to n->multiqueue,
which is set to true when either VIRTIO_NET_F_MQ or VIRTIO_NET_F_RSS is
enabled.

Fixes: 68b0a6395f36 ("virtio-net: align ctrl_vq index for non-mq guest for 
vhost_vdpa")
Signed-off-by: Akihiko Odaki 
Signed-off-by: Jason Wang 
---
 hw/net/virtio-net.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 9959f1932b..a6ff000cd9 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3426,7 +3426,7 @@ static bool 
virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx)
 VirtIONet *n = VIRTIO_NET(vdev);
 NetClientState *nc;
 assert(n->vhost_started);
-if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_MQ) && idx == 2) {
+if (!n->multiqueue && idx == 2) {
 /* Must guard against invalid features and bogus queue index
  * from being set by malicious guest, or penetrated through
  * buggy migration stream.
@@ -3458,7 +3458,7 @@ static void virtio_net_guest_notifier_mask(VirtIODevice 
*vdev, int idx,
 VirtIONet *n = VIRTIO_NET(vdev);
 NetClientState *nc;
 assert(n->vhost_started);
-if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_MQ) && idx == 2) {
+if (!n->multiqueue && idx == 2) {
 /* Must guard against invalid features and bogus queue index
  * from being set by malicious guest, or penetrated through
  * buggy migration stream.
-- 
2.42.0
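
For context, n->multiqueue is the right predicate here because it already
folds in both feature bits when the guest acks its features, roughly as
follows (paraphrased from virtio_net_set_features()):

/* multiqueue is enabled by either feature bit, which is exactly
 * what the guards above need to check. */
bool multiqueue = virtio_has_feature(features, VIRTIO_NET_F_MQ) ||
                  virtio_has_feature(features, VIRTIO_NET_F_RSS);

virtio_net_set_multiqueue(n, multiqueue);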




[PULL 5/5] Revert "tap: setting error appropriately when calling net_init_tap_one()"

2024-03-29 Thread Jason Wang
From: Akihiko Odaki 

This reverts commit 46d4d36d0bf2b24b205f2f604f0905db80264eef.

The reverted commit changed to emit warnings instead of errors when
vhost is requested but vhost initialization fails if vhostforce option
is not set.

However, vhostforce is not meant to ignore vhost errors. It was once
introduced as an option by commit 5430a28fe4 ("vhost: force vhost off
for non-MSI guests") to force enabling vhost for non-MSI guests, which
will have worse performance with vhost. The option was deprecated by
commit 1e7398a140 ("vhost: enable vhost without without MSI-X") and
changed to behave identically to the vhost option for compatibility.

Worse, commit bf769f742c ("virtio: del net client if net_init_tap_one
failed") changed to delete the client when vhost fails, even when the
failure only results in a warning. This leads to an assertion failure
for the -netdev command line option.

The reverted commit was intended to ensure that a vhost initialization
failure does not result in a corrupted netdev. This problem should have
been fixed by deleting the netdev when initialization fails instead of
ignoring the failure with an arbitrary option. Fortunately, commit
bf769f742c ("virtio: del net client if net_init_tap_one failed"),
mentioned earlier, implements this behavior.

Restore the correct semantics and fix the assertion failure for the
-netdev command line option by reverting the problematic commit.

Signed-off-by: Akihiko Odaki 
Signed-off-by: Jason Wang 
---
 include/net/vhost_net.h |  3 ---
 net/tap.c   | 22 +-
 2 files changed, 5 insertions(+), 20 deletions(-)

diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h
index c37aba35e6..c6a5361a2a 100644
--- a/include/net/vhost_net.h
+++ b/include/net/vhost_net.h
@@ -4,9 +4,6 @@
 #include "net/net.h"
 #include "hw/virtio/vhost-backend.h"
 
-#define VHOST_NET_INIT_FAILED \
-"vhost-net requested but could not be initialized"
-
 struct vhost_net;
 typedef struct vhost_net VHostNetState;
 
diff --git a/net/tap.c b/net/tap.c
index c698b70475..baaa2f7a9a 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -743,11 +743,7 @@ static void net_init_tap_one(const NetdevTapOptions *tap, 
NetClientState *peer,
 if (vhostfdname) {
vhostfd = monitor_fd_param(monitor_cur(), vhostfdname, &err);
 if (vhostfd == -1) {
-if (tap->has_vhostforce && tap->vhostforce) {
-error_propagate(errp, err);
-} else {
-warn_report_err(err);
-}
+error_propagate(errp, err);
 goto failed;
 }
 if (!g_unix_set_fd_nonblocking(vhostfd, true, NULL)) {
@@ -758,13 +754,8 @@ static void net_init_tap_one(const NetdevTapOptions *tap, 
NetClientState *peer,
 } else {
 vhostfd = open("/dev/vhost-net", O_RDWR);
 if (vhostfd < 0) {
-if (tap->has_vhostforce && tap->vhostforce) {
-error_setg_errno(errp, errno,
- "tap: open vhost char device failed");
-} else {
-warn_report("tap: open vhost char device failed: %s",
-strerror(errno));
-}
+error_setg_errno(errp, errno,
+ "tap: open vhost char device failed");
 goto failed;
 }
 if (!g_unix_set_fd_nonblocking(vhostfd, true, NULL)) {
@@ -777,11 +768,8 @@ static void net_init_tap_one(const NetdevTapOptions *tap, 
NetClientState *peer,
 
 s->vhost_net = vhost_net_init();
 if (!s->vhost_net) {
-if (tap->has_vhostforce && tap->vhostforce) {
-error_setg(errp, VHOST_NET_INIT_FAILED);
-} else {
-warn_report(VHOST_NET_INIT_FAILED);
-}
+error_setg(errp,
+   "vhost-net requested but could not be initialized");
 goto failed;
 }
 } else if (vhostfdname) {
-- 
2.42.0




[PULL 0/5] Net patches

2024-03-29 Thread Jason Wang
The following changes since commit 5012e522aca161be5c141596c66e5cc6082538a9:

  Update version for v9.0.0-rc1 release (2024-03-26 19:46:55 +)

are available in the Git repository at:

  https://github.com/jasowang/qemu.git tags/net-pull-request

for you to fetch changes up to d9b33018a0da51eddceb48c42345cfb351065f3e:

  Revert "tap: setting error appropriately when calling net_init_tap_one()" 
(2024-03-29 14:59:07 +0800)


-----BEGIN PGP SIGNATURE-----

iQEzBAABCAAdFiEEIV1G9IJGaJ7HfzVi7wSWWzmNYhEFAmYGZ7EACgkQ7wSWWzmN
YhHvxgf/SDEYYMlxU7PA1SfwlIYtUG8K1zQnwLXNY6ySCJuCn1IdVoITaUt3BtE5
OtrhKI8cW5WwL4qzkElWlL431vyqomGdmJQedF8agwoR2aIo24i/Ue09MHxJxXUB
ONEOv3bizDCYWUjz+PMHRdIbo0AiSNaUDnB8iY59yD6HZqSLVMDx8Ia2KVrzUKwc
nMuqkDsVIc3gwqFNPbTl3yqVt6k1x+vBCGQUg9BiKE3pkUcONhsJpBYYj4hlY9mn
/BPlQBcRUoLHQD7KGSUKVFSODHPYzDg7BsSz2+EpuZucRRI3VEyHlcB5A6LIVhrK
fpqd+80Fb7VE9CAxA2gFj7gh5uPJ1A==
=shO6
-----END PGP SIGNATURE-----


Akihiko Odaki (5):
  virtio-net: Fix vhost virtqueue notifiers for RSS
  ebpf: Fix indirections table setting
  hw/net/net_tx_pkt: Fix virtio header without checksum offloading
  tap-win32: Remove unnecessary stubs
  Revert "tap: setting error appropriately when calling net_init_tap_one()"

 ebpf/ebpf_rss.c |  9 +++--
 hw/net/net_tx_pkt.c |  1 +
 hw/net/virtio-net.c |  4 ++--
 include/net/vhost_net.h |  3 ---
 net/tap-win32.c | 54 -
 net/tap.c   | 22 +---
 6 files changed, 15 insertions(+), 78 deletions(-)




Re: [RFC 0/2] disable the configuration interrupt for the unsupported device

2024-03-28 Thread Jason Wang
On Fri, Mar 29, 2024 at 11:02 AM Cindy Lu  wrote:
>
> On Thu, Mar 28, 2024 at 12:12 PM Jason Wang  wrote:
> >
> > On Wed, Mar 27, 2024 at 5:33 PM Cindy Lu  wrote:
> > >
> > > On Wed, Mar 27, 2024 at 5:12 PM Jason Wang  wrote:
> > > >
> > > > On Wed, Mar 27, 2024 at 4:28 PM Cindy Lu  wrote:
> > > > >
> > > > > On Wed, Mar 27, 2024 at 3:54 PM Jason Wang  
> > > > > wrote:
> > > > > >
> > > > > > On Wed, Mar 27, 2024 at 2:03 PM Cindy Lu  wrote:
> > > > > > >
> > > > > > > On Wed, Mar 27, 2024 at 11:05 AM Jason Wang  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Hi Cindy:
> > > > > > > >
> > > > > > > > On Wed, Mar 27, 2024 at 9:29 AM Cindy Lu  
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > we need a crash in Non-standard image, here is the jira for 
> > > > > > > > > this https://issues.redhat.com/browse/RHEL-28522
> > > > > > > > > The root cause of the issue is that an IRQFD was used without 
> > > > > > > > > initialization..
> > > > > > > > >
> > > > > > > > > During the booting process of the Vyatta image, the behavior 
> > > > > > > > > of the called function in qemu is as follows:
> > > > > > > > >
> > > > > > > > > 1. vhost_net_stop() was called, this will call the function
> > > > > > > > > virtio_pci_set_guest_notifiers() with assgin= false, and
> > > > > > > > > virtio_pci_set_guest_notifiers() will release the irqfd for 
> > > > > > > > > vector 0
> > > > > > > >
> > > > > > > > Before vhost_net_stop(), do we know which vector is used by 
> > > > > > > > which queue?
> > > > > > > >
> > > > > > > before this stop, vdev->config_verctor is get from
> > > > > > > virtio_pci_common_read/virtio_pci_common_write
> > > > > > > it was set to vector 0
> > > > > >
> > > > > > I basically meant if vector 0 is shared with some virtqueues here.
> > > > > >
> > > > > Really sorry for this, vq's vector is 1,2, and will not share with the
> > > > > configure vector
> > > > > > > > >
> > > > > > > > > 2. virtio_reset() was called -->set configure vector to 
> > > > > > > > > VIRTIO_NO_VECTORt
> > > > > > > > >
> > > > > > > > > 3.vhost_net_start() was called (at this time the configure 
> > > > > > > > > vector is
> > > > > > > > > still VIRTIO_NO_VECTOR) and call 
> > > > > > > > > virtio_pci_set_guest_notifiers() with
> > > > > > > > > assgin= true, so the irqfd for vector 0 was not "init" during 
> > > > > > > > > this process
> > > > > > > >
> > > > > > > > How does the configure vector differ from the virtqueue vector 
> > > > > > > > here?
> > > > > > > >
> > > > > > > All the vectors are VIRTIO_NO_VECTOR (including vq). any
> > > > > > > msix_fire_vector_notifier()
> > > > > > > been called will cause the crash at this time.
> > > > > >
> > > > > > Won't virtio_pci_set_guest_notifiers() will try to allocate irqfd 
> > > > > > when
> > > > > > the assignment is true?
> > > > > >
> > > > > It will allocate, but  the vector is VIRTIO_NO_VECTOR (0x)
> > > > >
> > > > > then it will called kvm_virtio_pci_vector_use_one()
> > > > >
> > > > > in this function, there is a check for
> > > > >
> > > > > if (vector >= msix_nr_vectors_allocated(dev))
> > > > >
> > > > > { return 0; }
> > > > >
> > > > > So it will return.
> > > >
> > > > How about let's just fix this?
> > > >
> > > > Btw, it's better to explain in detail like the above in the next version.

Re: [PATCH v2] hw/net/net_tx_pkt: Fix virtio header without checksum offloading

2024-03-27 Thread Jason Wang
On Wed, Mar 27, 2024 at 4:43 PM Akihiko Odaki  wrote:
>
> It is incorrect to have VIRTIO_NET_HDR_F_NEEDS_CSUM set when
> checksum offloading is disabled, so clear the bit.
>
> The TCP/UDP checksum is usually offloaded when the peer requires virtio
> headers, because the header can instruct the peer to compute the checksum.
> However, igb disables TX checksum offloading when a VF is enabled, whether
> or not the peer requires virtio headers, because a transmitted packet can
> be routed to the VF and it expects the packet to have a proper checksum.
> Therefore, it is necessary to have a correct virtio header even when
> checksum offloading is disabled.
>
> A real TCP/UDP checksum will be computed and saved in the buffer when
> checksum offloading is disabled. The virtio specification requires the
> packet checksum stored in the buffer to be the TCP/UDP pseudo-header
> checksum when the VIRTIO_NET_HDR_F_NEEDS_CSUM bit is set, so the bit must
> be cleared in that case.
>
> Fixes: ffbd2dbd8e64 ("e1000e: Perform software segmentation for loopback")
> Buglink: https://issues.redhat.com/browse/RHEL-23067
> Signed-off-by: Akihiko Odaki 
> ---
> Changes in v2:
> - Dropped VIRTIO_NET_HDR_F_DATA_VALID. (Jason Wang)
> - Link to v1: 
> https://lore.kernel.org/r/20240324-tx-v1-1-a3b413574...@daynix.com

Queued.

Thanks

> ---
>  hw/net/net_tx_pkt.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
> index 2e5f58b3c9cc..2134a18c4c90 100644
> --- a/hw/net/net_tx_pkt.c
> +++ b/hw/net/net_tx_pkt.c
> @@ -833,6 +833,7 @@ bool net_tx_pkt_send_custom(struct NetTxPkt *pkt, bool 
> offload,
>
>  if (offload || gso_type == VIRTIO_NET_HDR_GSO_NONE) {
>  if (!offload && pkt->virt_hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
> +pkt->virt_hdr.flags &= ~VIRTIO_NET_HDR_F_NEEDS_CSUM;
>  net_tx_pkt_do_sw_csum(pkt, &pkt->vec[NET_TX_PKT_L2HDR_FRAG],
>pkt->payload_frags + 
> NET_TX_PKT_PL_START_FRAG - 1,
>pkt->payload_len);
>
> ---
> base-commit: ba49d760eb04630e7b15f423ebecf6c871b8f77b
> change-id: 20240324-tx-c57d3c22ad73
>
> Best regards,
> --
> Akihiko Odaki 
>




Re: [RFC 0/2] disable the configuration interrupt for the unsupported device

2024-03-27 Thread Jason Wang
On Wed, Mar 27, 2024 at 5:44 PM Cindy Lu  wrote:
>
> On Wed, Mar 27, 2024 at 5:13 PM Jason Wang  wrote:
> >
> > On Wed, Mar 27, 2024 at 5:12 PM Jason Wang  wrote:
> > >
> > > On Wed, Mar 27, 2024 at 4:28 PM Cindy Lu  wrote:
> > > >
> > > > On Wed, Mar 27, 2024 at 3:54 PM Jason Wang  wrote:
> > > > >
> > > > > On Wed, Mar 27, 2024 at 2:03 PM Cindy Lu  wrote:
> > > > > >
> > > > > > On Wed, Mar 27, 2024 at 11:05 AM Jason Wang  
> > > > > > wrote:
> > > > > > >
> > > > > > > Hi Cindy:
> > > > > > >
> > > > > > > On Wed, Mar 27, 2024 at 9:29 AM Cindy Lu  wrote:
> > > > > > > >
> > > > > > > > we need a crash in Non-standard image, here is the jira for 
> > > > > > > > this https://issues.redhat.com/browse/RHEL-28522
> > > > > > > > The root cause of the issue is that an IRQFD was used without 
> > > > > > > > initialization..
> > > > > > > >
> > > > > > > > During the booting process of the Vyatta image, the behavior of 
> > > > > > > > the called function in qemu is as follows:
> > > > > > > >
> > > > > > > > 1. vhost_net_stop() was called, this will call the function
> > > > > > > > virtio_pci_set_guest_notifiers() with assgin= false, and
> > > > > > > > virtio_pci_set_guest_notifiers() will release the irqfd for 
> > > > > > > > vector 0
> > > > > > >
> > > > > > > Before vhost_net_stop(), do we know which vector is used by which 
> > > > > > > queue?
> > > > > > >
> > > > > > before this stop, vdev->config_verctor is get from
> > > > > > virtio_pci_common_read/virtio_pci_common_write
> > > > > > it was set to vector 0
> > > > >
> > > > > I basically meant if vector 0 is shared with some virtqueues here.
> > > > >
> > > > Really sorry for this, vq's vector is 1,2, and will not share with the
> > > > configure vector
> > > > > > > >
> > > > > > > > 2. virtio_reset() was called -->set configure vector to 
> > > > > > > > VIRTIO_NO_VECTORt
> > > > > > > >
> > > > > > > > 3.vhost_net_start() was called (at this time the configure 
> > > > > > > > vector is
> > > > > > > > still VIRTIO_NO_VECTOR) and call 
> > > > > > > > virtio_pci_set_guest_notifiers() with
> > > > > > > > assgin= true, so the irqfd for vector 0 was not "init" during 
> > > > > > > > this process
> > > > > > >
> > > > > > > How does the configure vector differ from the virtqueue vector 
> > > > > > > here?
> > > > > > >
> > > > > > All the vectors are VIRTIO_NO_VECTOR (including vq). any
> > > > > > msix_fire_vector_notifier()
> > > > > > been called will cause the crash at this time.
> > > > >
> > > > > Won't virtio_pci_set_guest_notifiers() will try to allocate irqfd when
> > > > > the assignment is true?
> > > > >
> > > > It will allocate, but  the vector is VIRTIO_NO_VECTOR (0x)
> > > >
> > > > then it will called kvm_virtio_pci_vector_use_one()
> > > >
> > > > in this function, there is a check for
> > > >
> > > > if (vector >= msix_nr_vectors_allocated(dev))
> > > >
> > > > { return 0; }
> > > >
> > > > So it will return.
> > >
> > > How about let's just fix this?
> >
> > Btw, another question, how does vDPA work here?
> >
> > Thanks
> >
> the rhel/fedora guest image will not call vhost_net_stop and virtio_reset
> during boot,
> so the vector will not change to VIRTIO_NO_VECTOR. So the vdpa's
> configure interrupt
> should work OK and there is no crash

I mean:

1) if vDPA can work with the image you used to reproduce the issue
2) if current QEMU can work on an old kernel without configure interrupt
support for vDPA

Thanks

> Thanks
> cindy
>
> > >
> > > Btw, it's better to explain in detail like the above in the next version.

Re: [RFC 0/2] disable the configuration interrupt for the unsupported device

2024-03-27 Thread Jason Wang
On Wed, Mar 27, 2024 at 5:33 PM Cindy Lu  wrote:
>
> On Wed, Mar 27, 2024 at 5:12 PM Jason Wang  wrote:
> >
> > On Wed, Mar 27, 2024 at 4:28 PM Cindy Lu  wrote:
> > >
> > > On Wed, Mar 27, 2024 at 3:54 PM Jason Wang  wrote:
> > > >
> > > > On Wed, Mar 27, 2024 at 2:03 PM Cindy Lu  wrote:
> > > > >
> > > > > On Wed, Mar 27, 2024 at 11:05 AM Jason Wang  
> > > > > wrote:
> > > > > >
> > > > > > Hi Cindy:
> > > > > >
> > > > > > On Wed, Mar 27, 2024 at 9:29 AM Cindy Lu  wrote:
> > > > > > >
> > > > > > > we need a crash in Non-standard image, here is the jira for this 
> > > > > > > https://issues.redhat.com/browse/RHEL-28522
> > > > > > > The root cause of the issue is that an IRQFD was used without 
> > > > > > > initialization..
> > > > > > >
> > > > > > > During the booting process of the Vyatta image, the behavior of 
> > > > > > > the called function in qemu is as follows:
> > > > > > >
> > > > > > > 1. vhost_net_stop() was called, this will call the function
> > > > > > > virtio_pci_set_guest_notifiers() with assgin= false, and
> > > > > > > virtio_pci_set_guest_notifiers() will release the irqfd for 
> > > > > > > vector 0
> > > > > >
> > > > > > Before vhost_net_stop(), do we know which vector is used by which 
> > > > > > queue?
> > > > > >
> > > > > before this stop, vdev->config_verctor is get from
> > > > > virtio_pci_common_read/virtio_pci_common_write
> > > > > it was set to vector 0
> > > >
> > > > I basically meant if vector 0 is shared with some virtqueues here.
> > > >
> > > Really sorry for this, vq's vector is 1,2, and will not share with the
> > > configure vector
> > > > > > >
> > > > > > > 2. virtio_reset() was called -->set configure vector to 
> > > > > > > VIRTIO_NO_VECTORt
> > > > > > >
> > > > > > > 3.vhost_net_start() was called (at this time the configure vector 
> > > > > > > is
> > > > > > > still VIRTIO_NO_VECTOR) and call virtio_pci_set_guest_notifiers() 
> > > > > > > with
> > > > > > > assgin= true, so the irqfd for vector 0 was not "init" during 
> > > > > > > this process
> > > > > >
> > > > > > How does the configure vector differ from the virtqueue vector here?
> > > > > >
> > > > > All the vectors are VIRTIO_NO_VECTOR (including vq). any
> > > > > msix_fire_vector_notifier()
> > > > > been called will cause the crash at this time.
> > > >
> > > > Won't virtio_pci_set_guest_notifiers() will try to allocate irqfd when
> > > > the assignment is true?
> > > >
> > > It will allocate, but  the vector is VIRTIO_NO_VECTOR (0x)
> > >
> > > then it will called kvm_virtio_pci_vector_use_one()
> > >
> > > in this function, there is a check for
> > >
> > > if (vector >= msix_nr_vectors_allocated(dev))
> > >
> > > { return 0; }
> > >
> > > So it will return.
> >
> > How about let's just fix this?
> >
> > Btw, it's better to explain in detail like the above in the next version.
> >
> > Thanks
> >
> The problem is that I think the behavior here is correct: the vector here is
> VIRTIO_NO_VECTOR, so we should return.

So if I understand correctly, the configure vector is configured after
DRIVER_OK?

The spec doesn't forbid this; it is something we need to support.

It looks to me that the correct fix is to call kvm_virtio_pci_vector_use_one()
when the guest writes to msix_vector after DRIVER_OK.
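
A minimal sketch of that suggestion: kvm_virtio_pci_vector_use_one() and
VIRTIO_CONFIG_IRQ_IDX are existing virtio-pci names, while the helper
below, its release counterpart, and the exact placement in the
VIRTIO_PCI_COMMON_MSIX write path are assumptions.

static void virtio_pci_write_config_vector(VirtIOPCIProxy *proxy,
                                           VirtIODevice *vdev, uint16_t val)
{
    if (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) {
        /* drop the irqfd wired to the previous vector, if any */
        kvm_virtio_pci_vector_release_one(proxy, VIRTIO_CONFIG_IRQ_IDX);
    }

    vdev->config_vector = val;

    if (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) {
        /* wire up the irqfd for the vector written after DRIVER_OK */
        kvm_virtio_pci_vector_use_one(proxy, VIRTIO_CONFIG_IRQ_IDX);
    }
}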

Thanks

> A fix that could maybe work is to try to find out whether this was changed
> from another value
> and use that one? This seems strange.
> Thanks
> cindy
> > >
> > > > > So I think this should
> > > > > be a bug in this guest image
> > > >
> > > > The point is Qemu should not crash even if the guest driver is buggy.
> > > >
> > > > It would be nice if we can have a qtest for this on top.
> > > >
> > > > Thanks
> > > >

Re: [RFC 0/2] disable the configuration interrupt for the unsupported device

2024-03-27 Thread Jason Wang
On Wed, Mar 27, 2024 at 5:12 PM Jason Wang  wrote:
>
> On Wed, Mar 27, 2024 at 4:28 PM Cindy Lu  wrote:
> >
> > On Wed, Mar 27, 2024 at 3:54 PM Jason Wang  wrote:
> > >
> > > On Wed, Mar 27, 2024 at 2:03 PM Cindy Lu  wrote:
> > > >
> > > > On Wed, Mar 27, 2024 at 11:05 AM Jason Wang  wrote:
> > > > >
> > > > > Hi Cindy:
> > > > >
> > > > > On Wed, Mar 27, 2024 at 9:29 AM Cindy Lu  wrote:
> > > > > >
> > > > > > we need a crash in Non-standard image, here is the jira for this 
> > > > > > https://issues.redhat.com/browse/RHEL-28522
> > > > > > The root cause of the issue is that an IRQFD was used without 
> > > > > > initialization..
> > > > > >
> > > > > > During the booting process of the Vyatta image, the behavior of the 
> > > > > > called function in qemu is as follows:
> > > > > >
> > > > > > 1. vhost_net_stop() was called, this will call the function
> > > > > > virtio_pci_set_guest_notifiers() with assgin= false, and
> > > > > > virtio_pci_set_guest_notifiers() will release the irqfd for vector 0
> > > > >
> > > > > Before vhost_net_stop(), do we know which vector is used by which 
> > > > > queue?
> > > > >
> > > > before this stop, vdev->config_verctor is get from
> > > > virtio_pci_common_read/virtio_pci_common_write
> > > > it was set to vector 0
> > >
> > > I basically meant if vector 0 is shared with some virtqueues here.
> > >
> > Really sorry for this, vq's vector is 1,2, and will not share with the
> > configure vector
> > > > > >
> > > > > > 2. virtio_reset() was called -->set configure vector to 
> > > > > > VIRTIO_NO_VECTORt
> > > > > >
> > > > > > 3.vhost_net_start() was called (at this time the configure vector is
> > > > > > still VIRTIO_NO_VECTOR) and call virtio_pci_set_guest_notifiers() 
> > > > > > with
> > > > > > assgin= true, so the irqfd for vector 0 was not "init" during this 
> > > > > > process
> > > > >
> > > > > How does the configure vector differ from the virtqueue vector here?
> > > > >
> > > > All the vectors are VIRTIO_NO_VECTOR (including vq). any
> > > > msix_fire_vector_notifier()
> > > > been called will cause the crash at this time.
> > >
> > > Won't virtio_pci_set_guest_notifiers() will try to allocate irqfd when
> > > the assignment is true?
> > >
> > It will allocate, but  the vector is VIRTIO_NO_VECTOR (0x)
> >
> > then it will called kvm_virtio_pci_vector_use_one()
> >
> > in this function, there is a check for
> >
> > if (vector >= msix_nr_vectors_allocated(dev))
> >
> > { return 0; }
> >
> > So it will return.
>
> How about let's just fix this?

Btw, another question, how does vDPA work here?

Thanks

>
> Btw, it's better to explain in detail like the above in the next version.
>
> Thanks
>
> >
> > > > So I think this should
> > > > be a bug in this guest image
> > >
> > > The point is Qemu should not crash even if the guest driver is buggy.
> > >
> > > It would be nice if we can have a qtest for this on top.
> > >
> > > Thanks
> > >
> > sure, got it, I have done the Qtest, and it passed
> > here is the result
> >
> > Ok: 794
> > Expected Fail:  0
> > Fail:   0
> > Unexpected Pass:0
> > Skipped:32
> > Timeout:0
> >
> > > > > >
> > > > > > 4. The system continues to boot and msix_fire_vector_notifier() was
> > > > > > called unmask the vector 0 and then met the crash
> > > > > > [msix_fire_vector_notifier] 112 called vector 0 is_masked 1
> > > > > > [msix_fire_vector_notifier] 112 called vector 0 is_masked 0
> > > > > >
> > > > > > The reason for not reproducing in RHEL/fedora guest image is because
> > > > > > REHL/Fedora doesn't have the behavior of calling vhost_net_stop and 
> > > > > > then virtio_reset, and also won't call msix_fire_vector_notifier 
> > > > > > for vector 0 during system boot.

Re: [RFC 0/2] disable the configuration interrupt for the unsupported device

2024-03-27 Thread Jason Wang
On Wed, Mar 27, 2024 at 4:28 PM Cindy Lu  wrote:
>
> On Wed, Mar 27, 2024 at 3:54 PM Jason Wang  wrote:
> >
> > On Wed, Mar 27, 2024 at 2:03 PM Cindy Lu  wrote:
> > >
> > > On Wed, Mar 27, 2024 at 11:05 AM Jason Wang  wrote:
> > > >
> > > > Hi Cindy:
> > > >
> > > > On Wed, Mar 27, 2024 at 9:29 AM Cindy Lu  wrote:
> > > > >
> > > > > we need a crash in Non-standard image, here is the jira for this 
> > > > > https://issues.redhat.com/browse/RHEL-28522
> > > > > The root cause of the issue is that an IRQFD was used without 
> > > > > initialization..
> > > > >
> > > > > During the booting process of the Vyatta image, the behavior of the 
> > > > > called function in qemu is as follows:
> > > > >
> > > > > 1. vhost_net_stop() was called, this will call the function
> > > > > virtio_pci_set_guest_notifiers() with assgin= false, and
> > > > > virtio_pci_set_guest_notifiers() will release the irqfd for vector 0
> > > >
> > > > Before vhost_net_stop(), do we know which vector is used by which queue?
> > > >
> > > before this stop, vdev->config_verctor is get from
> > > virtio_pci_common_read/virtio_pci_common_write
> > > it was set to vector 0
> >
> > I basically meant if vector 0 is shared with some virtqueues here.
> >
> Really sorry for this; the vqs' vectors are 1 and 2, and will not be shared
> with the configure vector
> > > > >
> > > > > 2. virtio_reset() was called -->set configure vector to 
> > > > > VIRTIO_NO_VECTORt
> > > > >
> > > > > 3.vhost_net_start() was called (at this time the configure vector is
> > > > > still VIRTIO_NO_VECTOR) and call virtio_pci_set_guest_notifiers() with
> > > > > assgin= true, so the irqfd for vector 0 was not "init" during this 
> > > > > process
> > > >
> > > > How does the configure vector differ from the virtqueue vector here?
> > > >
> > > All the vectors are VIRTIO_NO_VECTOR (including vq). any
> > > msix_fire_vector_notifier()
> > > been called will cause the crash at this time.
> >
> > Won't virtio_pci_set_guest_notifiers() will try to allocate irqfd when
> > the assignment is true?
> >
> It will allocate, but the vector is VIRTIO_NO_VECTOR (0xffff)
>
> then it will call kvm_virtio_pci_vector_use_one()
>
> in this function, there is a check:
>
> if (vector >= msix_nr_vectors_allocated(dev))
>
> { return 0; }
>
> So it will return.

How about let's just fix this?

Btw, it's better to explain in detail like the above in the next version.

Thanks

>
> > > So I think this should
> > > be a bug in this guest image
> >
> > The point is Qemu should not crash even if the guest driver is buggy.
> >
> > It would be nice if we can have a qtest for this on top.
> >
> > Thanks
> >
> sure, got it, I have done the Qtest, and it passed
> here is the result
>
> Ok: 794
> Expected Fail:  0
> Fail:   0
> Unexpected Pass:0
> Skipped:32
> Timeout:0
>
> > > > >
> > > > > 4. The system continues to boot and msix_fire_vector_notifier() was
> > > > > called unmask the vector 0 and then met the crash
> > > > > [msix_fire_vector_notifier] 112 called vector 0 is_masked 1
> > > > > [msix_fire_vector_notifier] 112 called vector 0 is_masked 0
> > > > >
> > > > > The reason for not reproducing in RHEL/fedora guest image is because
> > > > > REHL/Fedora doesn't have the behavior of calling vhost_net_stop and 
> > > > > then virtio_reset, and also won't call msix_fire_vector_notifier for 
> > > > > vector 0 during system boot.
> > > > >
> > > > > The reason for not reproducing before configure interrupt support is 
> > > > > because
> > > > > vector 0 is for configure interrupt,  before the support for 
> > > > > configure interrupts, the notifier process will not handle vector 0.
> > > > >
> > > > > For the device Vyatta using, it doesn't support configure interrupts 
> > > > > at all, So we plan to disable the configure interrupts in unsupported 
> > > > > device
> > > >
> > > > Btw, let's tweak the changelog, it's a little bit hard to understand.
> > > >
> > > sure will do
> > > thanks
> > > Cindy
> > > > Thanks
> > > >
> > > > >
> > > > > Signed-off-by: Cindy Lu 
> > > > >
> > > > > Cindy Lu (2):
> > > > >   virtio-net: disable the configure interrupt for not support device
> > > > >   virtio-pci: check if the configure interrupt enable
> > > > >
> > > > >  hw/net/virtio-net.c|  5 -
> > > > >  hw/virtio/virtio-pci.c | 41 
> > > > > +-
> > > > >  hw/virtio/virtio.c |  1 +
> > > > >  include/hw/virtio/virtio.h |  1 +
> > > > >  4 files changed, 29 insertions(+), 19 deletions(-)
> > > > >
> > > > > --
> > > > > 2.43.0
> > > > >
> > > >
> > >
> >
>




Re: [RFC 0/2] disable the configuration interrupt for the unsupported device

2024-03-27 Thread Jason Wang
On Wed, Mar 27, 2024 at 2:03 PM Cindy Lu  wrote:
>
> On Wed, Mar 27, 2024 at 11:05 AM Jason Wang  wrote:
> >
> > Hi Cindy:
> >
> > On Wed, Mar 27, 2024 at 9:29 AM Cindy Lu  wrote:
> > >
> > > we need a crash in Non-standard image, here is the jira for this 
> > > https://issues.redhat.com/browse/RHEL-28522
> > > The root cause of the issue is that an IRQFD was used without 
> > > initialization..
> > >
> > > During the booting process of the Vyatta image, the behavior of the 
> > > called function in qemu is as follows:
> > >
> > > 1. vhost_net_stop() was called, this will call the function
> > > virtio_pci_set_guest_notifiers() with assgin= false, and
> > > virtio_pci_set_guest_notifiers() will release the irqfd for vector 0
> >
> > Before vhost_net_stop(), do we know which vector is used by which queue?
> >
> before this stop, vdev->config_vector is set via
> virtio_pci_common_read/virtio_pci_common_write;
> it was set to vector 0

I basically meant if vector 0 is shared with some virtqueues here.

> > >
> > > 2. virtio_reset() was called -->set configure vector to VIRTIO_NO_VECTORt
> > >
> > > 3.vhost_net_start() was called (at this time the configure vector is
> > > still VIRTIO_NO_VECTOR) and call virtio_pci_set_guest_notifiers() with
> > > assgin= true, so the irqfd for vector 0 was not "init" during this process
> >
> > How does the configure vector differ from the virtqueue vector here?
> >
> All the vectors are VIRTIO_NO_VECTOR (including the vqs'). Any call to
> msix_fire_vector_notifier()
> will cause the crash at this time.

Won't virtio_pci_set_guest_notifiers() will try to allocate irqfd when
the assignment is true?

> So I think this should
> be a bug in this guest image

The point is that QEMU should not crash even if the guest driver is buggy.

It would be nice if we can have a qtest for this on top.
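
A rough shape such a qtest could take; everything here is an assumption
(test name, command line, and the exact register pokes would follow the
existing virtio qtest helpers):

static void test_config_vector_unmask_no_crash(void)
{
    QTestState *qts = qtest_init("-device virtio-net-pci,netdev=n0 "
                                 "-netdev user,id=n0");

    /* ... initialize the device, set DRIVER_OK, leave the config
     * vector at VIRTIO_NO_VECTOR, then mask and unmask MSI-X vector 0
     * via config-space writes and verify the process is still alive ... */

    qtest_quit(qts);
}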

Thanks

> > >
> > > 4. The system continues to boot and msix_fire_vector_notifier() was
> > > called unmask the vector 0 and then met the crash
> > > [msix_fire_vector_notifier] 112 called vector 0 is_masked 1
> > > [msix_fire_vector_notifier] 112 called vector 0 is_masked 0
> > >
> > > The reason for not reproducing in RHEL/fedora guest image is because
> > > REHL/Fedora doesn't have the behavior of calling vhost_net_stop and then 
> > > virtio_reset, and also won't call msix_fire_vector_notifier for vector 0 
> > > during system boot.
> > >
> > > The reason for not reproducing before configure interrupt support is 
> > > because
> > > vector 0 is for configure interrupt,  before the support for configure 
> > > interrupts, the notifier process will not handle vector 0.
> > >
> > > For the device Vyatta using, it doesn't support configure interrupts at 
> > > all, So we plan to disable the configure interrupts in unsupported device
> >
> > Btw, let's tweak the changelog, it's a little bit hard to understand.
> >
> sure will do
> thanks
> Cindy
> > Thanks
> >
> > >
> > > Signed-off-by: Cindy Lu 
> > >
> > > Cindy Lu (2):
> > >   virtio-net: disable the configure interrupt for not support device
> > >   virtio-pci: check if the configure interrupt enable
> > >
> > >  hw/net/virtio-net.c|  5 -
> > >  hw/virtio/virtio-pci.c | 41 +-
> > >  hw/virtio/virtio.c |  1 +
> > >  include/hw/virtio/virtio.h |  1 +
> > >  4 files changed, 29 insertions(+), 19 deletions(-)
> > >
> > > --
> > > 2.43.0
> > >
> >
>




Re: [PATCH] hw/net/net_tx_pkt: Fix virtio header without checksum offloading

2024-03-26 Thread Jason Wang
On Wed, Mar 27, 2024 at 11:11 AM Akihiko Odaki  wrote:
>
> On 2024/03/27 12:06, Jason Wang wrote:
> > On Wed, Mar 27, 2024 at 11:05 AM Akihiko Odaki  
> > wrote:
> >>
> >> On 2024/03/27 11:59, Jason Wang wrote:
> >>> On Wed, Mar 27, 2024 at 10:53 AM Akihiko Odaki  
> >>> wrote:
> >>>>
> >>>> On 2024/03/27 11:50, Jason Wang wrote:
> >>>>> On Tue, Mar 26, 2024 at 3:04 PM Akihiko Odaki 
> >>>>>  wrote:
> >>>>>>
> >>>>>> On 2024/03/26 15:51, Jason Wang wrote:
> >>>>>>> On Sun, Mar 24, 2024 at 4:32 PM Akihiko Odaki 
> >>>>>>>  wrote:
> >>>>>>>>
> >>>>>>>> It is incorrect to have the VIRTIO_NET_HDR_F_NEEDS_CSUM set when
> >>>>>>>> checksum offloading is disabled so clear the bit. Set the
> >>>>>>>> VIRTIO_NET_HDR_F_DATA_VALID bit instead to tell the checksum is 
> >>>>>>>> valid.
> >>>>>>>>
> >>>>>>>> TCP/UDP checksum is usually offloaded when the peer requires virtio
> >>>>>>>> headers because they can instruct the peer to compute checksum. 
> >>>>>>>> However,
> >>>>>>>> igb disables TX checksum offloading when a VF is enabled whether the
> >>>>>>>> peer requires virtio headers because a transmitted packet can be 
> >>>>>>>> routed
> >>>>>>>> to it and it expects the packet has a proper checksum. Therefore, it
> >>>>>>>> is necessary to have a correct virtio header even when checksum
> >>>>>>>> offloading is disabled.
> >>>>>>>>
> >>>>>>>> A real TCP/UDP checksum will be computed and saved in the buffer when
> >>>>>>>> checksum offloading is disabled. The virtio specification requires to
> >>>>>>>> set the packet checksum stored in the buffer to the TCP/UDP pseudo
> >>>>>>>> header when the VIRTIO_NET_HDR_F_NEEDS_CSUM bit is set so the bit 
> >>>>>>>> must
> >>>>>>>> be cleared in that case.
> >>>>>>>>
> >>>>>>>> The VIRTIO_NET_HDR_F_NEEDS_CSUM bit also tells to skip checksum
> >>>>>>>> validation. Even if checksum offloading is disabled, it is desirable 
> >>>>>>>> to
> >>>>>>>> skip checksum validation because the checksum is always correct. Use 
> >>>>>>>> the
> >>>>>>>> VIRTIO_NET_HDR_F_DATA_VALID bit to claim the validity of the 
> >>>>>>>> checksum.
> >>>>>>>>
> >>>>>>>> Fixes: ffbd2dbd8e64 ("e1000e: Perform software segmentation for 
> >>>>>>>> loopback")
> >>>>>>>> Buglink: https://issues.redhat.com/browse/RHEL-23067
> >>>>>>>> Signed-off-by: Akihiko Odaki 
> >>>>>>>> ---
> >>>>>>>>  hw/net/net_tx_pkt.c | 3 +++
> >>>>>>>>  1 file changed, 3 insertions(+)
> >>>>>>>>
> >>>>>>>> diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
> >>>>>>>> index 2e5f58b3c9cc..c225cf706513 100644
> >>>>>>>> --- a/hw/net/net_tx_pkt.c
> >>>>>>>> +++ b/hw/net/net_tx_pkt.c
> >>>>>>>> @@ -833,6 +833,9 @@ bool net_tx_pkt_send_custom(struct NetTxPkt 
> >>>>>>>> *pkt, bool offload,
> >>>>>>>>
> >>>>>>>>  if (offload || gso_type == VIRTIO_NET_HDR_GSO_NONE) {
> >>>>>>>>  if (!offload && pkt->virt_hdr.flags & 
> >>>>>>>> VIRTIO_NET_HDR_F_NEEDS_CSUM) {
> >>>>>>>> +pkt->virt_hdr.flags =
> >>>>>>>> +(pkt->virt_hdr.flags & 
> >>>>>>>> ~VIRTIO_NET_HDR_F_NEEDS_CSUM) |
> >>>>>>>> +VIRTIO_NET_HDR_F_DATA_VALID;
> >>>>>>>
> >>>>>>> Why VIRTIO_NET_HDR_F_DATA_VALID is used in TX path?
> >>>>>>
> >>>>>> On igb, a packet sent from a PCI function may be routed to another
> >>>>>> function. The virtio header updated here will be directly provided to
> >>>>>> the RX path in such a case.

Re: [PATCH] hw/net/net_tx_pkt: Fix virtio header without checksum offloading

2024-03-26 Thread Jason Wang
On Wed, Mar 27, 2024 at 11:05 AM Akihiko Odaki  wrote:
>
> On 2024/03/27 11:59, Jason Wang wrote:
> > On Wed, Mar 27, 2024 at 10:53 AM Akihiko Odaki  
> > wrote:
> >>
> >> On 2024/03/27 11:50, Jason Wang wrote:
> >>> On Tue, Mar 26, 2024 at 3:04 PM Akihiko Odaki  
> >>> wrote:
> >>>>
> >>>> On 2024/03/26 15:51, Jason Wang wrote:
> >>>>> On Sun, Mar 24, 2024 at 4:32 PM Akihiko Odaki 
> >>>>>  wrote:
> >>>>>>
> >>>>>> It is incorrect to have the VIRTIO_NET_HDR_F_NEEDS_CSUM set when
> >>>>>> checksum offloading is disabled so clear the bit. Set the
> >>>>>> VIRTIO_NET_HDR_F_DATA_VALID bit instead to tell the checksum is valid.
> >>>>>>
> >>>>>> TCP/UDP checksum is usually offloaded when the peer requires virtio
> >>>>>> headers because they can instruct the peer to compute checksum. 
> >>>>>> However,
> >>>>>> igb disables TX checksum offloading when a VF is enabled whether the
> >>>>>> peer requires virtio headers because a transmitted packet can be routed
> >>>>>> to it and it expects the packet has a proper checksum. Therefore, it
> >>>>>> is necessary to have a correct virtio header even when checksum
> >>>>>> offloading is disabled.
> >>>>>>
> >>>>>> A real TCP/UDP checksum will be computed and saved in the buffer when
> >>>>>> checksum offloading is disabled. The virtio specification requires to
> >>>>>> set the packet checksum stored in the buffer to the TCP/UDP pseudo
> >>>>>> header when the VIRTIO_NET_HDR_F_NEEDS_CSUM bit is set so the bit must
> >>>>>> be cleared in that case.
> >>>>>>
> >>>>>> The VIRTIO_NET_HDR_F_NEEDS_CSUM bit also tells to skip checksum
> >>>>>> validation. Even if checksum offloading is disabled, it is desirable to
> >>>>>> skip checksum validation because the checksum is always correct. Use 
> >>>>>> the
> >>>>>> VIRTIO_NET_HDR_F_DATA_VALID bit to claim the validity of the checksum.
> >>>>>>
> >>>>>> Fixes: ffbd2dbd8e64 ("e1000e: Perform software segmentation for 
> >>>>>> loopback")
> >>>>>> Buglink: https://issues.redhat.com/browse/RHEL-23067
> >>>>>> Signed-off-by: Akihiko Odaki 
> >>>>>> ---
> >>>>>> hw/net/net_tx_pkt.c | 3 +++
> >>>>>> 1 file changed, 3 insertions(+)
> >>>>>>
> >>>>>> diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
> >>>>>> index 2e5f58b3c9cc..c225cf706513 100644
> >>>>>> --- a/hw/net/net_tx_pkt.c
> >>>>>> +++ b/hw/net/net_tx_pkt.c
> >>>>>> @@ -833,6 +833,9 @@ bool net_tx_pkt_send_custom(struct NetTxPkt *pkt, 
> >>>>>> bool offload,
> >>>>>>
> >>>>>> if (offload || gso_type == VIRTIO_NET_HDR_GSO_NONE) {
> >>>>>> if (!offload && pkt->virt_hdr.flags & 
> >>>>>> VIRTIO_NET_HDR_F_NEEDS_CSUM) {
> >>>>>> +pkt->virt_hdr.flags =
> >>>>>> +(pkt->virt_hdr.flags & ~VIRTIO_NET_HDR_F_NEEDS_CSUM) |
> >>>>>> +VIRTIO_NET_HDR_F_DATA_VALID;
> >>>>>
> >>>>> Why VIRTIO_NET_HDR_F_DATA_VALID is used in TX path?
> >>>>
> >>>> On igb, a packet sent from a PCI function may be routed to another
> >>>> function. The virtio header updated here will be directly provided to
> >>>> the RX path in such a case.
> >>>
> >>> But I meant for example net_tx_pkt_send_custom() is used in
> >>> e1000e_tx_pkt_send() which is the tx path on the host.
> >>>
> >>> VIRTIO_NET_HDR_F_DATA_VALID is not necessary in the tx path.
> >>
> >> igb passes igb_tx_pkt_vmdq_callback to net_tx_pkt_send_custom().
> >> igb_tx_pkt_vmdq_callback() passes the packet to its rx path for loopback.
> >>
> >
> > You are right, how about igb_tx_pkt_vmdq_callback()?
> >
> > We probably need to tweak the name if it is only used in rx path.
>
> igb_tx_pkt_vmdq_callback() itself is part of the tx path of a PCI
> function, and invokes the rx path of another PCI function in case of
> loopback, or triggers the transmission to the external peer.

Right, so if it's an external TX, VIRTIO_NET_HDR_F_DATA_VALID may not
work there.

Thanks
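
For readers following the flag semantics: the two bits point in opposite
directions, which is why DATA_VALID helps the internal loopback receiver
but says nothing useful on an external transmit. The values below are
from the Linux uapi virtio_net.h header that this code mirrors:

#define VIRTIO_NET_HDR_F_NEEDS_CSUM 1 /* TX hint: receiver must finish csum */
#define VIRTIO_NET_HDR_F_DATA_VALID 2 /* RX hint: csum already validated    */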

>
> Regards,
> Akihiko Odaki
>
> >
> > Thanks
> >
> >> Regards,
> >> Akihiko Odaki
> >>
> >>>
> >>> Thanks
> >>>
> >>>>
> >>>> Regards,
> >>>> Akihiko Odaki
> >>>>
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>>>> net_tx_pkt_do_sw_csum(pkt, 
> >>>>>> >vec[NET_TX_PKT_L2HDR_FRAG],
> >>>>>>   pkt->payload_frags + 
> >>>>>> NET_TX_PKT_PL_START_FRAG - 1,
> >>>>>>   pkt->payload_len);
> >>>>>>
> >>>>>> ---
> >>>>>> base-commit: ba49d760eb04630e7b15f423ebecf6c871b8f77b
> >>>>>> change-id: 20240324-tx-c57d3c22ad73
> >>>>>>
> >>>>>> Best regards,
> >>>>>> --
> >>>>>> Akihiko Odaki 
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>




Re: [RFC 0/2] disable the configuration interrupt for the unsupported device

2024-03-26 Thread Jason Wang
Hi Cindy:

On Wed, Mar 27, 2024 at 9:29 AM Cindy Lu  wrote:
>
> we hit a crash with a non-standard image; here is the Jira ticket for this:
> https://issues.redhat.com/browse/RHEL-28522
> The root cause of the issue is that an irqfd was used without initialization.
>
> During the booting process of the Vyatta image, the behavior of the called 
> function in qemu is as follows:
>
> 1. vhost_net_stop() was called; this will call the function
> virtio_pci_set_guest_notifiers() with assign = false, and
> virtio_pci_set_guest_notifiers() will release the irqfd for vector 0

Before vhost_net_stop(), do we know which vector is used by which queue?

>
> 2. virtio_reset() was called --> set the configure vector to VIRTIO_NO_VECTOR
>
> 3. vhost_net_start() was called (at this time the configure vector is
> still VIRTIO_NO_VECTOR) and called virtio_pci_set_guest_notifiers() with
> assign = true, so the irqfd for vector 0 was not initialized during this process

How does the configure vector differ from the virtqueue vector here?

>
> 4. The system continues to boot and msix_fire_vector_notifier() was
> called to unmask vector 0, and then we hit the crash:
> [msix_fire_vector_notifier] 112 called vector 0 is_masked 1
> [msix_fire_vector_notifier] 112 called vector 0 is_masked 0
>
> The reason this does not reproduce with a RHEL/Fedora guest image is that
> RHEL/Fedora doesn't have the behavior of calling vhost_net_stop and then
> virtio_reset, and also won't call msix_fire_vector_notifier for vector 0
> during system boot.
>
> The reason this did not reproduce before configure interrupt support is that
> vector 0 is for the configure interrupt; before the support for configure
> interrupts, the notifier process would not handle vector 0.
>
> The device Vyatta uses doesn't support configure interrupts at all,
> so we plan to disable the configure interrupt for unsupported devices

Btw, let's tweak the changelog, it's a little bit hard to understand.

Thanks

>
> Signed-off-by: Cindy Lu 
>
> Cindy Lu (2):
>   virtio-net: disable the configure interrupt for not support device
>   virtio-pci: check if the configure interrupt enable
>
>  hw/net/virtio-net.c|  5 -
>  hw/virtio/virtio-pci.c | 41 +-
>  hw/virtio/virtio.c |  1 +
>  include/hw/virtio/virtio.h |  1 +
>  4 files changed, 29 insertions(+), 19 deletions(-)
>
> --
> 2.43.0
>




Re: [PATCH] hw/net/net_tx_pkt: Fix virtio header without checksum offloading

2024-03-26 Thread Jason Wang
On Wed, Mar 27, 2024 at 10:53 AM Akihiko Odaki  wrote:
>
> On 2024/03/27 11:50, Jason Wang wrote:
> > On Tue, Mar 26, 2024 at 3:04 PM Akihiko Odaki  
> > wrote:
> >>
> >> On 2024/03/26 15:51, Jason Wang wrote:
> >>> On Sun, Mar 24, 2024 at 4:32 PM Akihiko Odaki  
> >>> wrote:
> >>>>
> >>>> It is incorrect to have the VIRTIO_NET_HDR_F_NEEDS_CSUM set when
> >>>> checksum offloading is disabled so clear the bit. Set the
> >>>> VIRTIO_NET_HDR_F_DATA_VALID bit instead to tell the checksum is valid.
> >>>>
> >>>> TCP/UDP checksum is usually offloaded when the peer requires virtio
> >>>> headers because they can instruct the peer to compute checksum. However,
> >>>> igb disables TX checksum offloading when a VF is enabled whether the
> >>>> peer requires virtio headers because a transmitted packet can be routed
> >>>> to it and it expects the packet has a proper checksum. Therefore, it
> >>>> is necessary to have a correct virtio header even when checksum
> >>>> offloading is disabled.
> >>>>
> >>>> A real TCP/UDP checksum will be computed and saved in the buffer when
> >>>> checksum offloading is disabled. The virtio specification requires to
> >>>> set the packet checksum stored in the buffer to the TCP/UDP pseudo
> >>>> header when the VIRTIO_NET_HDR_F_NEEDS_CSUM bit is set so the bit must
> >>>> be cleared in that case.
> >>>>
> >>>> The VIRTIO_NET_HDR_F_NEEDS_CSUM bit also tells to skip checksum
> >>>> validation. Even if checksum offloading is disabled, it is desirable to
> >>>> skip checksum validation because the checksum is always correct. Use the
> >>>> VIRTIO_NET_HDR_F_DATA_VALID bit to claim the validity of the checksum.
> >>>>
> >>>> Fixes: ffbd2dbd8e64 ("e1000e: Perform software segmentation for 
> >>>> loopback")
> >>>> Buglink: https://issues.redhat.com/browse/RHEL-23067
> >>>> Signed-off-by: Akihiko Odaki 
> >>>> ---
> >>>>hw/net/net_tx_pkt.c | 3 +++
> >>>>1 file changed, 3 insertions(+)
> >>>>
> >>>> diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
> >>>> index 2e5f58b3c9cc..c225cf706513 100644
> >>>> --- a/hw/net/net_tx_pkt.c
> >>>> +++ b/hw/net/net_tx_pkt.c
> >>>> @@ -833,6 +833,9 @@ bool net_tx_pkt_send_custom(struct NetTxPkt *pkt, 
> >>>> bool offload,
> >>>>
> >>>>if (offload || gso_type == VIRTIO_NET_HDR_GSO_NONE) {
> >>>>if (!offload && pkt->virt_hdr.flags & 
> >>>> VIRTIO_NET_HDR_F_NEEDS_CSUM) {
> >>>> +pkt->virt_hdr.flags =
> >>>> +(pkt->virt_hdr.flags & ~VIRTIO_NET_HDR_F_NEEDS_CSUM) |
> >>>> +VIRTIO_NET_HDR_F_DATA_VALID;
> >>>
> >>> Why VIRTIO_NET_HDR_F_DATA_VALID is used in TX path?
> >>
> >> On igb, a packet sent from a PCI function may be routed to another
> >> function. The virtio header updated here will be directly provided to
> >> the RX path in such a case.
> >
> > But I meant for example net_tx_pkt_send_custom() is used in
> > e1000e_tx_pkt_send() which is the tx path on the host.
> >
> > VIRTIO_NET_HDR_F_DATA_VALID is not necessary in the tx path.
>
> igb passes igb_tx_pkt_vmdq_callback to net_tx_pkt_send_custom().
> igb_tx_pkt_vmdq_callback() passes the packet to its rx path for loopback.
>

You are right. How about igb_tx_pkt_vmdq_callback()?

We probably need to tweak its name if it is only used in the rx path.

Thanks

> Regards,
> Akihiko Odaki
>
> >
> > Thanks
> >
> >>
> >> Regards,
> >> Akihiko Odaki
> >>
> >>>
> >>> Thanks
> >>>
> >>>>net_tx_pkt_do_sw_csum(pkt, &pkt->vec[NET_TX_PKT_L2HDR_FRAG],
> >>>>  pkt->payload_frags + 
> >>>> NET_TX_PKT_PL_START_FRAG - 1,
> >>>>  pkt->payload_len);
> >>>>
> >>>> ---
> >>>> base-commit: ba49d760eb04630e7b15f423ebecf6c871b8f77b
> >>>> change-id: 20240324-tx-c57d3c22ad73
> >>>>
> >>>> Best regards,
> >>>> --
> >>>> Akihiko Odaki 
> >>>>
> >>>
> >>
> >
>
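
For readers following the flag semantics above, here is a minimal,
self-contained sketch of the rewrite under discussion (it assumes only the
standard struct virtio_net_hdr from <linux/virtio_net.h>;
mark_csum_complete() is a hypothetical helper, not QEMU code):

#include <linux/virtio_net.h>

/* Once the TCP/UDP checksum has been computed in software, NEEDS_CSUM
 * (which says the buffer holds only the pseudo-header checksum) must be
 * cleared, and DATA_VALID may be set so that a receiver which later sees
 * this header can skip checksum validation. */
static void mark_csum_complete(struct virtio_net_hdr *hdr)
{
    if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
        hdr->flags &= ~VIRTIO_NET_HDR_F_NEEDS_CSUM;
        hdr->flags |= VIRTIO_NET_HDR_F_DATA_VALID;
    }
}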




Re: [RFC 1/2] virtio-net: disable the configure interrupt for not support device

2024-03-26 Thread Jason Wang
On Wed, Mar 27, 2024 at 9:29 AM Cindy Lu  wrote:
>
> Only the vdpa device supports the configure interrupt; we need to disable
> configure interrupt processing in all other devices.

I think we need to tweak the terminology here at least.

It's not about the configure interrupt itself; it's about whether or not we
can try to use irqfd for the configure interrupt.

Btw, have you tried this on an old kernel that doesn't support the
configure interrupt for vDPA?

> In order to achieve this, I added a check in the virtio_net_device_realize().
> When the peer's type is vdpa, the value of config_irq_enabled in the
> structure VirtIODevice will be set to true.
>
> Signed-off-by: Cindy Lu 
> ---
>  hw/net/virtio-net.c| 5 -
>  hw/virtio/virtio.c | 1 +
>  include/hw/virtio/virtio.h | 1 +
>  3 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 80c56f0cfc..3b487864a8 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -3749,12 +3749,15 @@ static void virtio_net_device_realize(DeviceState 
> *dev, Error **errp)
>
>  nc = qemu_get_queue(n->nic);
>  nc->rxfilter_notify_enabled = 1;
> +vdev->config_irq_enabled = false;

Let's tweak the name of the variable.

But on second thought, there's no easy way to know whether vDPA supports
the configure interrupt at device realization time.

We need a graceful fallback, or to just disable irqfd for the configure irq.

>
> -   if (nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
> +if (nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
>  struct virtio_net_config netcfg = {};
> >  memcpy(&netcfg.mac, &n->nic_conf.macaddr, ETH_ALEN);
> >  vhost_net_set_config(get_vhost_net(nc->peer),
> >  (uint8_t *)&netcfg, 0, ETH_ALEN, VHOST_SET_CONFIG_TYPE_FRONTEND);
> +
> +vdev->config_irq_enabled = true;
>  }
>  QTAILQ_INIT(>rsc_chains);
>  n->qdev = dev;
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 3a160f86ed..6b52a7190d 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -3255,6 +3255,7 @@ void virtio_init(VirtIODevice *vdev, uint16_t 
> device_id, size_t config_size)
>  virtio_vmstate_change, vdev);
>  vdev->device_endian = virtio_default_endian();
>  vdev->use_guest_notifier_mask = true;
> +vdev->config_irq_enabled = false;
>  }
>
>  /*
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index c8f72850bc..a7763b71e0 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -164,6 +164,7 @@ struct VirtIODevice
>   */
>  EventNotifier config_notifier;
>  bool device_iotlb_enabled;
> +bool config_irq_enabled;
>  };
>
>  struct VirtioDeviceClass {

Thanks

> --
> 2.43.0
>
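
To make the suggested fallback concrete, a rough sketch (the helper name and
the kernel_supports_config_irqfd flag are hypothetical, and it assumes the
config_irq_enabled field added by this RFC; this is not the actual
virtio-pci code):

/* Decide whether the configure interrupt may be routed through irqfd.
 * Anything that is not a vDPA backend, or a vDPA backend on a kernel
 * without configure-interrupt support, falls back to the QEMU-injected
 * interrupt path instead of failing device realize. */
static bool config_irqfd_usable(VirtIODevice *vdev,
                                bool kernel_supports_config_irqfd)
{
    return vdev->config_irq_enabled && kernel_supports_config_irqfd;
}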




Re: [PATCH] hw/net/net_tx_pkt: Fix virtio header without checksum offloading

2024-03-26 Thread Jason Wang
On Tue, Mar 26, 2024 at 3:04 PM Akihiko Odaki  wrote:
>
> On 2024/03/26 15:51, Jason Wang wrote:
> > On Sun, Mar 24, 2024 at 4:32 PM Akihiko Odaki  
> > wrote:
> >>
> >> It is incorrect to have the VIRTIO_NET_HDR_F_NEEDS_CSUM set when
> >> checksum offloading is disabled so clear the bit. Set the
> >> VIRTIO_NET_HDR_F_DATA_VALID bit instead to tell the checksum is valid.
> >>
> >> TCP/UDP checksum is usually offloaded when the peer requires virtio
> >> headers because they can instruct the peer to compute checksum. However,
> >> igb disables TX checksum offloading when a VF is enabled whether or
> >> not the peer requires virtio headers, because a transmitted packet can
> >> be routed to it and it expects the packet to have a proper checksum.
> >> Therefore, it
> >> is necessary to have a correct virtio header even when checksum
> >> offloading is disabled.
> >>
> >> A real TCP/UDP checksum will be computed and saved in the buffer when
> >> checksum offloading is disabled. The virtio specification requires to
> >> set the packet checksum stored in the buffer to the TCP/UDP pseudo
> >> header when the VIRTIO_NET_HDR_F_NEEDS_CSUM bit is set so the bit must
> >> be cleared in that case.
> >>
> >> The VIRTIO_NET_HDR_F_NEEDS_CSUM bit also tells to skip checksum
> >> validation. Even if checksum offloading is disabled, it is desirable to
> >> skip checksum validation because the checksum is always correct. Use the
> >> VIRTIO_NET_HDR_F_DATA_VALID bit to claim the validity of the checksum.
> >>
> >> Fixes: ffbd2dbd8e64 ("e1000e: Perform software segmentation for loopback")
> >> Buglink: https://issues.redhat.com/browse/RHEL-23067
> >> Signed-off-by: Akihiko Odaki 
> >> ---
> >>   hw/net/net_tx_pkt.c | 3 +++
> >>   1 file changed, 3 insertions(+)
> >>
> >> diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
> >> index 2e5f58b3c9cc..c225cf706513 100644
> >> --- a/hw/net/net_tx_pkt.c
> >> +++ b/hw/net/net_tx_pkt.c
> >> @@ -833,6 +833,9 @@ bool net_tx_pkt_send_custom(struct NetTxPkt *pkt, bool 
> >> offload,
> >>
> >>   if (offload || gso_type == VIRTIO_NET_HDR_GSO_NONE) {
> >>   if (!offload && pkt->virt_hdr.flags & 
> >> VIRTIO_NET_HDR_F_NEEDS_CSUM) {
> >> +pkt->virt_hdr.flags =
> >> +(pkt->virt_hdr.flags & ~VIRTIO_NET_HDR_F_NEEDS_CSUM) |
> >> +VIRTIO_NET_HDR_F_DATA_VALID;
> >
> > Why VIRTIO_NET_HDR_F_DATA_VALID is used in TX path?
>
> On igb, a packet sent from a PCI function may be routed to another
> function. The virtio header updated here will be directly provided to
> the RX path in such a case.

But I meant for example net_tx_pkt_send_custom() is used in
e1000e_tx_pkt_send() which is the tx path on the host.

VIRTIO_NET_HDR_F_DATA_VALID is not necessary in the tx path.

Thanks

>
> Regards,
> Akihiko Odaki
>
> >
> > Thanks
> >
> >>   net_tx_pkt_do_sw_csum(pkt, &pkt->vec[NET_TX_PKT_L2HDR_FRAG],
> >> pkt->payload_frags + 
> >> NET_TX_PKT_PL_START_FRAG - 1,
> >> pkt->payload_len);
> >>
> >> ---
> >> base-commit: ba49d760eb04630e7b15f423ebecf6c871b8f77b
> >> change-id: 20240324-tx-c57d3c22ad73
> >>
> >> Best regards,
> >> --
> >> Akihiko Odaki 
> >>
> >
>




Re: [PATCH] hw/net/net_tx_pkt: Fix virtio header without checksum offloading

2024-03-26 Thread Jason Wang
On Sun, Mar 24, 2024 at 4:32 PM Akihiko Odaki  wrote:
>
> It is incorrect to have the VIRTIO_NET_HDR_F_NEEDS_CSUM set when
> checksum offloading is disabled so clear the bit. Set the
> VIRTIO_NET_HDR_F_DATA_VALID bit instead to tell the checksum is valid.
>
> TCP/UDP checksum is usually offloaded when the peer requires virtio
> headers because they can instruct the peer to compute checksum. However,
> igb disables TX checksum offloading when a VF is enabled whether or
> not the peer requires virtio headers, because a transmitted packet can
> be routed to it and it expects the packet to have a proper checksum.
> Therefore, it
> is necessary to have a correct virtio header even when checksum
> offloading is disabled.
>
> A real TCP/UDP checksum will be computed and saved in the buffer when
> checksum offloading is disabled. The virtio specification requires to
> set the packet checksum stored in the buffer to the TCP/UDP pseudo
> header when the VIRTIO_NET_HDR_F_NEEDS_CSUM bit is set so the bit must
> be cleared in that case.
>
> The VIRTIO_NET_HDR_F_NEEDS_CSUM bit also tells to skip checksum
> validation. Even if checksum offloading is disabled, it is desirable to
> skip checksum validation because the checksum is always correct. Use the
> VIRTIO_NET_HDR_F_DATA_VALID bit to claim the validity of the checksum.
>
> Fixes: ffbd2dbd8e64 ("e1000e: Perform software segmentation for loopback")
> Buglink: https://issues.redhat.com/browse/RHEL-23067
> Signed-off-by: Akihiko Odaki 
> ---
>  hw/net/net_tx_pkt.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
> index 2e5f58b3c9cc..c225cf706513 100644
> --- a/hw/net/net_tx_pkt.c
> +++ b/hw/net/net_tx_pkt.c
> @@ -833,6 +833,9 @@ bool net_tx_pkt_send_custom(struct NetTxPkt *pkt, bool 
> offload,
>
>  if (offload || gso_type == VIRTIO_NET_HDR_GSO_NONE) {
>  if (!offload && pkt->virt_hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
> +pkt->virt_hdr.flags =
> +(pkt->virt_hdr.flags & ~VIRTIO_NET_HDR_F_NEEDS_CSUM) |
> +VIRTIO_NET_HDR_F_DATA_VALID;

Why VIRTIO_NET_HDR_F_DATA_VALID is used in TX path?

Thanks

>  net_tx_pkt_do_sw_csum(pkt, &pkt->vec[NET_TX_PKT_L2HDR_FRAG],
>pkt->payload_frags + 
> NET_TX_PKT_PL_START_FRAG - 1,
>pkt->payload_len);
>
> ---
> base-commit: ba49d760eb04630e7b15f423ebecf6c871b8f77b
> change-id: 20240324-tx-c57d3c22ad73
>
> Best regards,
> --
> Akihiko Odaki 
>




Re: [PATCH 0/2] tap: Use g_spawn_sync() and g_spawn_check_wait_status()

2024-03-26 Thread Jason Wang
On Tue, Dec 19, 2023 at 7:59 PM Akihiko Odaki  wrote:
>
> g_spawn_sync() gives an informative message if it fails to execute
> the script, instead of reporting exit status 1.
>
> g_spawn_check_wait_status() also gives a message that is easier to
> understand than the raw value returned by waitpid().
>
> Signed-off-by: Akihiko Odaki 
> ---
> Akihiko Odaki (2):
>   glib-compat: Define g_spawn_check_wait_status()
>   tap: Use g_spawn_sync() and g_spawn_check_wait_status()
>
>  include/glib-compat.h |  2 ++
>  net/tap.c | 52 
> ++-
>  2 files changed, 24 insertions(+), 30 deletions(-)
> ---
> base-commit: 9c74490bff6c8886a922008d0c9ce6cae70dd17e
> change-id: 20231219-glib-034a34bb05d8
>
> Best regards,
> --
> Akihiko Odaki 

I've queued this for 9.1

Thanks

>
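
For context, the GLib pattern the series moves to looks roughly like this
(a sketch, not the actual net/tap.c code; run_helper_script() is a
hypothetical name, and g_spawn_check_wait_status() needs GLib >= 2.70 or
the compat shim from patch 1/2):

#include <glib.h>

/* Run a helper script synchronously. g_spawn_sync() sets an informative
 * GError if the script cannot be executed at all, and
 * g_spawn_check_wait_status() turns a non-zero wait status into a
 * readable GError instead of a raw waitpid() value. */
static gboolean run_helper_script(const char *path, const char *arg,
                                  GError **errp)
{
    gchar *argv[] = { (gchar *)path, (gchar *)arg, NULL };
    gint wait_status = 0;

    if (!g_spawn_sync(NULL, argv, NULL, G_SPAWN_DEFAULT,
                      NULL, NULL, NULL, NULL, &wait_status, errp)) {
        return FALSE;
    }
    return g_spawn_check_wait_status(wait_status, errp);
}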




Re: [PATCH v2] Revert "tap: setting error appropriately when calling net_init_tap_one()"

2024-03-26 Thread Jason Wang
On Thu, Sep 21, 2023 at 5:48 PM Akihiko Odaki  wrote:
>
> This reverts commit 46d4d36d0bf2b24b205f2f604f0905db80264eef.
>
> The reverted commit changed to emit warnings instead of errors when
> vhost is requested but vhost initialization fails, if the vhostforce
> option is not set.
>
> However, vhostforce is not meant to change the error handling. It was
> once introduced as an option to commit 5430a28fe4 ("vhost: force vhost
> off for non-MSI guests") to force enabling vhost for non-MSI guests,
> which will have worse performance with vhost. It was deprecated with
> commit 1e7398a140 ("vhost: enable vhost without without MSI-X") and
> changed to behave identically to the vhost option, for compatibility.
>
> Worse, commit bf769f742c ("virtio: del net client if net_init_tap_one
> failed") changed to delete the client when vhost fails even when the
> failure only results in a warning. This leads to an assertion failure
> for the -netdev command line option.
>
> The reverted commit was intended to ensure that the vhost initialization
> failure won't result in a corrupted netdev. This problem should have
> been fixed by deleting netdev when the initialization fails instead of
> ignoring the failure by converting it into a warning. Fortunately,
> commit bf769f742c ("virtio: del net client if net_init_tap_one failed"),
> mentioned earlier, implements this behavior.
>
> Restore the correct semantics and fix the assertion failure for the
> -netdev command line option by reverting the problematic commit.
>
> Signed-off-by: Akihiko Odaki 
> ---
> V1 -> V2: Corrected the message.
>

Queued.

Thanks




Re: [PATCH v2] tap-win32: Remove unnecessary stubs

2024-03-26 Thread Jason Wang
On Mon, Feb 12, 2024 at 10:04 PM Akihiko Odaki  wrote:
>
> Some of them are only necessary for POSIX systems. The others are
> assigned to function pointers in NetClientInfo that can actually be
> NULL.
>
> Signed-off-by: Akihiko Odaki 
> ---
> Changes in v2:
> - Rebased.
> - Link to v1: 
> https://lore.kernel.org/r/20231006051127.5429-1-akihiko.od...@daynix.com
> ---

Queued.

Thanks




Re: [External] : Re: [PATCH v4 2/2] vhost: Perform memory section dirty scans once per iteration

2024-03-25 Thread Jason Wang
On Tue, Mar 26, 2024 at 7:21 AM Si-Wei Liu  wrote:
>
>
>
> On 3/24/2024 11:13 PM, Jason Wang wrote:
> > On Sat, Mar 23, 2024 at 5:14 AM Si-Wei Liu  wrote:
> >>
> >>
> >> On 3/21/2024 10:08 PM, Jason Wang wrote:
> >>> On Fri, Mar 22, 2024 at 5:43 AM Si-Wei Liu  wrote:
> >>>>
> >>>> On 3/20/2024 8:56 PM, Jason Wang wrote:
> >>>>> On Thu, Mar 21, 2024 at 5:03 AM Si-Wei Liu  
> >>>>> wrote:
> >>>>>> On 3/19/2024 8:27 PM, Jason Wang wrote:
> >>>>>>> On Tue, Mar 19, 2024 at 6:16 AM Si-Wei Liu  
> >>>>>>> wrote:
> >>>>>>>> On 3/17/2024 8:22 PM, Jason Wang wrote:
> >>>>>>>>> On Sat, Mar 16, 2024 at 2:45 AM Si-Wei Liu  
> >>>>>>>>> wrote:
> >>>>>>>>>> On 3/14/2024 9:03 PM, Jason Wang wrote:
> >>>>>>>>>>> On Fri, Mar 15, 2024 at 5:39 AM Si-Wei Liu 
> >>>>>>>>>>>  wrote:
> >>>>>>>>>>>> On setups with one or more virtio-net devices with vhost on,
> >>>>>>>>>>>> dirty tracking iteration increases in cost with the number of
> >>>>>>>>>>>> queues that are set up, e.g. on idle guests migration the
> >>>>>>>>>>>> following is observed with virtio-net with vhost=on:
> >>>>>>>>>>>>
> >>>>>>>>>>>> 48 queues -> 78.11%  [.] vhost_dev_sync_region.isra.13
> >>>>>>>>>>>> 8 queues -> 40.50%   [.] vhost_dev_sync_region.isra.13
> >>>>>>>>>>>> 1 queue -> 6.89% [.] vhost_dev_sync_region.isra.13
> >>>>>>>>>>>> 2 devices, 1 queue -> 18.60%  [.] vhost_dev_sync_region.isra.14
> >>>>>>>>>>>>
> >>>>>>>>>>>> With high memory rates the symptom is lack of convergence as
> >>>>>>>>>>>> soon as there is a vhost device with a sufficiently high number
> >>>>>>>>>>>> of queues, or a sufficient number of vhost devices.
> >>>>>>>>>>>>
> >>>>>>>>>>>> On every migration iteration (every 100msecs) it will
> >>>>>>>>>>>> redundantly query the *shared log* once per queue configured
> >>>>>>>>>>>> with vhost that exists in the guest. For the virtqueue data,
> >>>>>>>>>>>> this is necessary, but not for the memory sections, which are
> >>>>>>>>>>>> the same. So essentially we end up scanning the dirty log
> >>>>>>>>>>>> too often.
> >>>>>>>>>>>>
> >>>>>>>>>>>> To fix that, select a vhost device responsible for scanning the
> >>>>>>>>>>>> log with regards to memory sections dirty tracking. It is 
> >>>>>>>>>>>> selected
> >>>>>>>>>>>> when we enable the logger (during migration) and cleared when we
> >>>>>>>>>>>> disable the logger. If the vhost logger device goes away for some
> >>>>>>>>>>>> reason, the logger will be re-selected from the rest of vhost
> >>>>>>>>>>>> devices.
> >>>>>>>>>>>>
> >>>>>>>>>>>> After making mem-section logger a singleton instance, constant 
> >>>>>>>>>>>> cost
> >>>>>>>>>>>> of 7%-9% (like the 1 queue report) will be seen, no matter how 
> >>>>>>>>>>>> many
> >>>>>>>>>>>> queues or how many vhost devices are configured:
> >>>>>>>>>>>>
> >>>>>>>>>>>> 48 queues -> 8.71%[.] vhost_dev_sync_region.isra.13
> >>>>>>>>>>>> 2 devices, 8 queues -> 7.97%   [.] vhost_dev_sync_region.isra.14
> >>>>>>>>>>>>
> >>>>>>>>>

Re: [PATCH v4 2/2] vhost: Perform memory section dirty scans once per iteration

2024-03-25 Thread Jason Wang
On Sat, Mar 23, 2024 at 5:14 AM Si-Wei Liu  wrote:
>
>
>
> On 3/21/2024 10:08 PM, Jason Wang wrote:
> > On Fri, Mar 22, 2024 at 5:43 AM Si-Wei Liu  wrote:
> >>
> >>
> >> On 3/20/2024 8:56 PM, Jason Wang wrote:
> >>> On Thu, Mar 21, 2024 at 5:03 AM Si-Wei Liu  wrote:
> >>>>
> >>>> On 3/19/2024 8:27 PM, Jason Wang wrote:
> >>>>> On Tue, Mar 19, 2024 at 6:16 AM Si-Wei Liu  
> >>>>> wrote:
> >>>>>> On 3/17/2024 8:22 PM, Jason Wang wrote:
> >>>>>>> On Sat, Mar 16, 2024 at 2:45 AM Si-Wei Liu  
> >>>>>>> wrote:
> >>>>>>>> On 3/14/2024 9:03 PM, Jason Wang wrote:
> >>>>>>>>> On Fri, Mar 15, 2024 at 5:39 AM Si-Wei Liu  
> >>>>>>>>> wrote:
> >>>>>>>>>> On setups with one or more virtio-net devices with vhost on,
> >>>>>>>>>> dirty tracking iteration increases in cost with the number of
> >>>>>>>>>> queues that are set up, e.g. on idle guests migration the
> >>>>>>>>>> following is observed with virtio-net with vhost=on:
> >>>>>>>>>>
> >>>>>>>>>> 48 queues -> 78.11%  [.] vhost_dev_sync_region.isra.13
> >>>>>>>>>> 8 queues -> 40.50%   [.] vhost_dev_sync_region.isra.13
> >>>>>>>>>> 1 queue -> 6.89% [.] vhost_dev_sync_region.isra.13
> >>>>>>>>>> 2 devices, 1 queue -> 18.60%  [.] vhost_dev_sync_region.isra.14
> >>>>>>>>>>
> >>>>>>>>>> With high memory rates the symptom is lack of convergence as
> >>>>>>>>>> soon as there is a vhost device with a sufficiently high number
> >>>>>>>>>> of queues, or a sufficient number of vhost devices.
> >>>>>>>>>>
> >>>>>>>>>> On every migration iteration (every 100msecs) it will
> >>>>>>>>>> redundantly query the *shared log* once per queue configured
> >>>>>>>>>> with vhost that exists in the guest. For the virtqueue data,
> >>>>>>>>>> this is necessary, but not for the memory sections, which are
> >>>>>>>>>> the same. So essentially we end up scanning the dirty log
> >>>>>>>>>> too often.
> >>>>>>>>>>
> >>>>>>>>>> To fix that, select a vhost device responsible for scanning the
> >>>>>>>>>> log with regards to memory sections dirty tracking. It is selected
> >>>>>>>>>> when we enable the logger (during migration) and cleared when we
> >>>>>>>>>> disable the logger. If the vhost logger device goes away for some
> >>>>>>>>>> reason, the logger will be re-selected from the rest of vhost
> >>>>>>>>>> devices.
> >>>>>>>>>>
> >>>>>>>>>> After making mem-section logger a singleton instance, constant cost
> >>>>>>>>>> of 7%-9% (like the 1 queue report) will be seen, no matter how many
> >>>>>>>>>> queues or how many vhost devices are configured:
> >>>>>>>>>>
> >>>>>>>>>> 48 queues -> 8.71%[.] vhost_dev_sync_region.isra.13
> >>>>>>>>>> 2 devices, 8 queues -> 7.97%   [.] vhost_dev_sync_region.isra.14
> >>>>>>>>>>
> >>>>>>>>>> Co-developed-by: Joao Martins 
> >>>>>>>>>> Signed-off-by: Joao Martins 
> >>>>>>>>>> Signed-off-by: Si-Wei Liu 
> >>>>>>>>>>
> >>>>>>>>>> ---
> >>>>>>>>>> v3 -> v4:
> >>>>>>>>>>- add comment to clarify effect on cache locality and
> >>>>>>>>>>  performance
> >>>>>>>>>>
> >>>>>>>>>> v2 -> v3:
> >>>>>>>>>>- add after-fix benchmark to commit log
> >>>>>>>>>>- rename vhost_log_dev_enabled to vhost_dev_should_log
> >>>>>>>>>>

Re: [PATCH v4 2/2] vhost: Perform memory section dirty scans once per iteration

2024-03-21 Thread Jason Wang
On Fri, Mar 22, 2024 at 5:43 AM Si-Wei Liu  wrote:
>
>
>
> On 3/20/2024 8:56 PM, Jason Wang wrote:
> > On Thu, Mar 21, 2024 at 5:03 AM Si-Wei Liu  wrote:
> >>
> >>
> >> On 3/19/2024 8:27 PM, Jason Wang wrote:
> >>> On Tue, Mar 19, 2024 at 6:16 AM Si-Wei Liu  wrote:
> >>>>
> >>>> On 3/17/2024 8:22 PM, Jason Wang wrote:
> >>>>> On Sat, Mar 16, 2024 at 2:45 AM Si-Wei Liu  
> >>>>> wrote:
> >>>>>> On 3/14/2024 9:03 PM, Jason Wang wrote:
> >>>>>>> On Fri, Mar 15, 2024 at 5:39 AM Si-Wei Liu  
> >>>>>>> wrote:
> >>>>>>>> On setups with one or more virtio-net devices with vhost on,
> >>>>>>>> dirty tracking iteration increases in cost with the number of
> >>>>>>>> queues that are set up, e.g. on idle guests migration the
> >>>>>>>> following is observed with virtio-net with vhost=on:
> >>>>>>>>
> >>>>>>>> 48 queues -> 78.11%  [.] vhost_dev_sync_region.isra.13
> >>>>>>>> 8 queues -> 40.50%   [.] vhost_dev_sync_region.isra.13
> >>>>>>>> 1 queue -> 6.89% [.] vhost_dev_sync_region.isra.13
> >>>>>>>> 2 devices, 1 queue -> 18.60%  [.] vhost_dev_sync_region.isra.14
> >>>>>>>>
> >>>>>>>> With high memory rates the symptom is lack of convergence as
> >>>>>>>> soon as there is a vhost device with a sufficiently high number
> >>>>>>>> of queues, or a sufficient number of vhost devices.
> >>>>>>>>
> >>>>>>>> On every migration iteration (every 100msecs) it will
> >>>>>>>> redundantly query the *shared log* once per queue configured
> >>>>>>>> with vhost that exists in the guest. For the virtqueue data,
> >>>>>>>> this is necessary, but not for the memory sections, which are
> >>>>>>>> the same. So essentially we end up scanning the dirty log
> >>>>>>>> too often.
> >>>>>>>>
> >>>>>>>> To fix that, select a vhost device responsible for scanning the
> >>>>>>>> log with regards to memory sections dirty tracking. It is selected
> >>>>>>>> when we enable the logger (during migration) and cleared when we
> >>>>>>>> disable the logger. If the vhost logger device goes away for some
> >>>>>>>> reason, the logger will be re-selected from the rest of vhost
> >>>>>>>> devices.
> >>>>>>>>
> >>>>>>>> After making mem-section logger a singleton instance, constant cost
> >>>>>>>> of 7%-9% (like the 1 queue report) will be seen, no matter how many
> >>>>>>>> queues or how many vhost devices are configured:
> >>>>>>>>
> >>>>>>>> 48 queues -> 8.71%[.] vhost_dev_sync_region.isra.13
> >>>>>>>> 2 devices, 8 queues -> 7.97%   [.] vhost_dev_sync_region.isra.14
> >>>>>>>>
> >>>>>>>> Co-developed-by: Joao Martins 
> >>>>>>>> Signed-off-by: Joao Martins 
> >>>>>>>> Signed-off-by: Si-Wei Liu 
> >>>>>>>>
> >>>>>>>> ---
> >>>>>>>> v3 -> v4:
> >>>>>>>>   - add comment to clarify effect on cache locality and
> >>>>>>>> performance
> >>>>>>>>
> >>>>>>>> v2 -> v3:
> >>>>>>>>   - add after-fix benchmark to commit log
> >>>>>>>>   - rename vhost_log_dev_enabled to vhost_dev_should_log
> >>>>>>>>   - remove unneeded comparisons for backend_type
> >>>>>>>>   - use QLIST array instead of single flat list to store vhost
> >>>>>>>> logger devices
> >>>>>>>>   - simplify logger election logic
> >>>>>>>> ---
> >>>>>>>>  hw/virtio/vhost.c | 67 
> >>>>>>>> ++-
> >>>>>>>>  include/hw/virtio/vhost.h |  1 +
> >>>>

Re: [PATCH v4 1/2] vhost: dirty log should be per backend type

2024-03-20 Thread Jason Wang
On Thu, Mar 21, 2024 at 4:29 AM Si-Wei Liu  wrote:
>
>
>
> On 3/19/2024 8:25 PM, Jason Wang wrote:
> > On Tue, Mar 19, 2024 at 6:06 AM Si-Wei Liu  wrote:
> >>
> >>
> >> On 3/17/2024 8:20 PM, Jason Wang wrote:
> >>> On Sat, Mar 16, 2024 at 2:33 AM Si-Wei Liu  wrote:
> >>>>
> >>>> On 3/14/2024 8:50 PM, Jason Wang wrote:
> >>>>> On Fri, Mar 15, 2024 at 5:39 AM Si-Wei Liu  
> >>>>> wrote:
> >>>>>> There could be a mix of both vhost-user and vhost-kernel clients
> >>>>>> in the same QEMU process, where separate vhost loggers for the
> >>>>>> specific vhost type have to be used. Make the vhost logger per
> >>>>>> backend type, and have them properly reference counted.
> >>>>> It's better to describe what's the advantage of doing this.
> >>>> Yes, I can add that to the log. Although it's a niche use case, it was
> >>>> actually a long-standing limitation / bug that vhost-user and
> >>>> vhost-kernel loggers can't co-exist per QEMU process; today this just
> >>>> ends up as a silent failure. This bug fix removes that implicit
> >>>> limitation in the code.
> >>> Ok.
> >>>
> >>>>>> Suggested-by: Michael S. Tsirkin 
> >>>>>> Signed-off-by: Si-Wei Liu 
> >>>>>>
> >>>>>> ---
> >>>>>> v3->v4:
> >>>>>>  - remove checking NULL return value from vhost_log_get
> >>>>>>
> >>>>>> v2->v3:
> >>>>>>  - remove non-effective assertion that never be reached
> >>>>>>  - do not return NULL from vhost_log_get()
> >>>>>>  - add neccessary assertions to vhost_log_get()
> >>>>>> ---
> >>>>>> hw/virtio/vhost.c | 45 
> >>>>>> +
> >>>>>> 1 file changed, 33 insertions(+), 12 deletions(-)
> >>>>>>
> >>>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >>>>>> index 2c9ac79..612f4db 100644
> >>>>>> --- a/hw/virtio/vhost.c
> >>>>>> +++ b/hw/virtio/vhost.c
> >>>>>> @@ -43,8 +43,8 @@
> >>>>>> do { } while (0)
> >>>>>> #endif
> >>>>>>
> >>>>>> -static struct vhost_log *vhost_log;
> >>>>>> -static struct vhost_log *vhost_log_shm;
> >>>>>> +static struct vhost_log *vhost_log[VHOST_BACKEND_TYPE_MAX];
> >>>>>> +static struct vhost_log *vhost_log_shm[VHOST_BACKEND_TYPE_MAX];
> >>>>>>
> >>>>>> /* Memslots used by backends that support private memslots 
> >>>>>> (without an fd). */
> >>>>>> static unsigned int used_memslots;
> >>>>>> @@ -287,6 +287,10 @@ static int vhost_set_backend_type(struct 
> >>>>>> vhost_dev *dev,
> >>>>>> r = -1;
> >>>>>> }
> >>>>>>
> >>>>>> +if (r == 0) {
> >>>>>> +assert(dev->vhost_ops->backend_type == backend_type);
> >>>>>> +}
> >>>>>> +
> >>>>> Under which condition could we hit this?
> >>>> Just in case some other function inadvertently corrupted this earlier,
> >>>> we have to capture the discrepancy in the first place... On the other
> >>>> hand, it will be helpful for other vhost backend writers to diagnose
> >>>> day-one bugs in the code. I feel a code comment alone here will not be
> >>>> sufficient/helpful.
> >>> See below.
> >>>
> >>>>> It seems not good to assert a local logic.
> >>>> It seems to me quite a few local asserts are in the same file already,
> >>>> vhost_save_backend_state,
> >>> For example it has assert for
> >>>
> >>> assert(!dev->started);
> >>>
> >>> which is not the logic of the function itself but require
> >>> vhost_dev_start() not to be called before.
> >>>
> >>> But it looks like this patch you assert the code just a few lines
> >>> above the assert itself?
> >> Yes, that was the intent - for e.g. 

Re: [PATCH v4 2/2] vhost: Perform memory section dirty scans once per iteration

2024-03-20 Thread Jason Wang
On Thu, Mar 21, 2024 at 5:03 AM Si-Wei Liu  wrote:
>
>
>
> On 3/19/2024 8:27 PM, Jason Wang wrote:
> > On Tue, Mar 19, 2024 at 6:16 AM Si-Wei Liu  wrote:
> >>
> >>
> >> On 3/17/2024 8:22 PM, Jason Wang wrote:
> >>> On Sat, Mar 16, 2024 at 2:45 AM Si-Wei Liu  wrote:
> >>>>
> >>>> On 3/14/2024 9:03 PM, Jason Wang wrote:
> >>>>> On Fri, Mar 15, 2024 at 5:39 AM Si-Wei Liu  
> >>>>> wrote:
> >>>>>> On setups with one or more virtio-net devices with vhost on,
> >>>>>> dirty tracking iteration increases in cost with the number of
> >>>>>> queues that are set up, e.g. on idle guests migration the
> >>>>>> following is observed with virtio-net with vhost=on:
> >>>>>>
> >>>>>> 48 queues -> 78.11%  [.] vhost_dev_sync_region.isra.13
> >>>>>> 8 queues -> 40.50%   [.] vhost_dev_sync_region.isra.13
> >>>>>> 1 queue -> 6.89% [.] vhost_dev_sync_region.isra.13
> >>>>>> 2 devices, 1 queue -> 18.60%  [.] vhost_dev_sync_region.isra.14
> >>>>>>
> >>>>>> With high memory rates the symptom is lack of convergence as
> >>>>>> soon as there is a vhost device with a sufficiently high number
> >>>>>> of queues, or a sufficient number of vhost devices.
> >>>>>>
> >>>>>> On every migration iteration (every 100msecs) it will
> >>>>>> redundantly query the *shared log* once per queue configured
> >>>>>> with vhost that exists in the guest. For the virtqueue data,
> >>>>>> this is necessary, but not for the memory sections, which are
> >>>>>> the same. So essentially we end up scanning the dirty log
> >>>>>> too often.
> >>>>>>
> >>>>>> To fix that, select a vhost device responsible for scanning the
> >>>>>> log with regards to memory sections dirty tracking. It is selected
> >>>>>> when we enable the logger (during migration) and cleared when we
> >>>>>> disable the logger. If the vhost logger device goes away for some
> >>>>>> reason, the logger will be re-selected from the rest of vhost
> >>>>>> devices.
> >>>>>>
> >>>>>> After making mem-section logger a singleton instance, constant cost
> >>>>>> of 7%-9% (like the 1 queue report) will be seen, no matter how many
> >>>>>> queues or how many vhost devices are configured:
> >>>>>>
> >>>>>> 48 queues -> 8.71%[.] vhost_dev_sync_region.isra.13
> >>>>>> 2 devices, 8 queues -> 7.97%   [.] vhost_dev_sync_region.isra.14
> >>>>>>
> >>>>>> Co-developed-by: Joao Martins 
> >>>>>> Signed-off-by: Joao Martins 
> >>>>>> Signed-off-by: Si-Wei Liu 
> >>>>>>
> >>>>>> ---
> >>>>>> v3 -> v4:
> >>>>>>  - add comment to clarify effect on cache locality and
> >>>>>>performance
> >>>>>>
> >>>>>> v2 -> v3:
> >>>>>>  - add after-fix benchmark to commit log
> >>>>>>  - rename vhost_log_dev_enabled to vhost_dev_should_log
> >>>>>>  - remove unneeded comparisons for backend_type
> >>>>>>  - use QLIST array instead of single flat list to store vhost
> >>>>>>logger devices
> >>>>>>  - simplify logger election logic
> >>>>>> ---
> >>>>>> hw/virtio/vhost.c | 67 
> >>>>>> ++-
> >>>>>> include/hw/virtio/vhost.h |  1 +
> >>>>>> 2 files changed, 62 insertions(+), 6 deletions(-)
> >>>>>>
> >>>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >>>>>> index 612f4db..58522f1 100644
> >>>>>> --- a/hw/virtio/vhost.c
> >>>>>> +++ b/hw/virtio/vhost.c
> >>>>>> @@ -45,6 +45,7 @@
> >>>>>>
> >>>>>> static struct vhost_log *vhost_log[VHOST_BACKEND_TYPE_MAX];
> >>>>>> static struct vhost_log *vhost_log_shm[VHOST_BACKEND_TYPE_MAX];

Re: [PATCH] vhost-vdpa: check vhost_vdpa_set_vring_ready() return value

2024-03-19 Thread Jason Wang
On Mon, Mar 18, 2024 at 4:27 PM Stefano Garzarella  wrote:
>
> On Mon, Mar 18, 2024 at 12:31:59PM +0800, Jason Wang wrote:
> >On Fri, Mar 15, 2024 at 4:23 PM Stefano Garzarella  
> >wrote:
> >>
> >> On Thu, Mar 14, 2024 at 11:17:01AM +0800, Jason Wang wrote:
> >> >On Wed, Feb 7, 2024 at 5:27 PM Stefano Garzarella  
> >> >wrote:
> >> >>
> >> >> vhost_vdpa_set_vring_ready() could already fail, but if Linux's
> >> >> patch [1] will be merged, it may fail with more chance if
> >> >> userspace does not activate virtqueues before DRIVER_OK when
> >> >> VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is not negotiated.
> >> >
> >> >I wonder what happens if we just leave it as is.
> >>
> >> Are you referring to this patch or the kernel patch?
> >
> >This patch.
> >
> >>
> >> Here I'm just checking the return value of vhost_vdpa_set_vring_ready().
> >> It can return an error even without that kernel patch, so IMHO it is
> >> better to check the return value here in QEMU.
> >>
> >> What issue do you see with this patch applied?
> >
> >For the parent which can enable after driver_ok but not advertise it.
>
> But this patch is not changing anything in that sense, it just controls
> the return value of the VHOST_VDPA_SET_VRING_ENABLE ioctl.
>
> Why would QEMU ignore an error if it can't activate vrings?
> If we really want to ignore it, we should document it both in QEMU and
> in the kernel, because honestly, the way the code is now, it shouldn't
> fail, from what I understand.
>
> That said, even if we ignore it, IMHO we should at least print a warning
> in QEMU.

Right.

>
> >
> >(To say the truth, I'm not sure if we need to care about this)
>
> I agree on that, but this is related to the patch in the kernel, not
> this simple patch to fix the QEMU error path, right?

Or it's the job of the QEMU vDPA layer to avoid calling set_vq_ready()
after driver_ok if VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is not
negotiated. Or it might be too late.

>
> >
> >>
> >> >
> >> >VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK: We do know enabling could be
> >> >done after driver_ok.
> >> >Without VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK: We don't know whether
> >> >enabling could be done after driver_ok or not.
> >>
> >> I see your point, indeed I didn't send a v2 of that patch.
> >> Maybe we should document that, because it could be interpreted that if
> >> VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is not negotiated the enabling
> >> should always be done before driver_ok (which is true for example in
> >> VDUSE).
> >
> >I see, so I think we probably need the fix.
> >
> >>
> >> BTW I think we should discuss it in the kernel patch.
> >>
> >> Thanks,
> >> Stefano
> >>
> >> >
> >> >Thanks
> >> >
> >> >>
> >> >> So better check its return value anyway.
> >> >>
> >> >> [1] 
> >> >> https://lore.kernel.org/virtualization/20240206145154.118044-1-sgarz...@redhat.com/T/#u
> >> >>
> >> >> Signed-off-by: Stefano Garzarella 
> >> >> ---
> >> >> Note: This patch conflicts with [2], but the resolution is simple,
> >> >> so for now I sent a patch for the current master, but I'll rebase
> >> >> this patch if we merge the other one first.
> >
> >Will go through [2].
>
> Here I meant that the conflict is only in the code touched, because
> Kevin's patch remove/move some of the code touched by this patch.
> And rightly he checked the return value of the ioctl as I would like to
> do in the other places where we call the same ioctl.
>
> So honestly I still don't understand what's wrong with this patch...

Nothing wrong now.

Acked-by: Jason Wang 

Thanks

>
> Thanks,
> Stefano
>
> >
> >Thanks
> >
> >
> >> >>
> >> >> [2]
> >> >> https://lore.kernel.org/qemu-devel/20240202132521.32714-1-kw...@redhat.com/
> >> >> ---
> >> >>  hw/virtio/vdpa-dev.c |  8 +++-
> >> >>  net/vhost-vdpa.c | 15 ---
> >> >>  2 files changed, 19 insertions(+), 4 deletions(-)
> >> >>
> >> >> diff --git a/hw/virtio/vdpa-dev.c b/hw/virtio/vdpa-dev.c
> >> >> index eb9ecea83b..d57cd76c18 100644
> >> >> --- a/hw/virtio/vdpa-dev.c
> >> >> +++ b/hw/
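
A sketch of the gating discussed above (vhost_vdpa_enable_one_vring() is a
hypothetical wrapper; the feature bit and dev->backend_cap are real, but
this is not the actual QEMU call path):

/* Refuse to enable a vring after DRIVER_OK unless the backend negotiated
 * VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK, instead of issuing an ioctl
 * that may fail or silently misbehave. */
static int vhost_vdpa_enable_one_vring(struct vhost_vdpa *v, int idx,
                                       bool after_driver_ok)
{
    if (after_driver_ok &&
        !(v->dev->backend_cap &
          BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK))) {
        return -EINVAL;
    }
    return vhost_vdpa_set_vring_ready(v, idx);
}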

Re: Pending network patches

2024-03-19 Thread Jason Wang
On Wed, Mar 20, 2024 at 11:33 AM Akihiko Odaki  wrote:
>
> Hi Jason,
>
> I have this and a few other network-related patches not reviewed. Can
> you review them?
> I have the following patches ready for review:
>
> https://patchew.org/QEMU/20240212-tap-v2-1-94e2ee18b...@daynix.com/
> ("[PATCH v2] tap-win32: Remove unnecessary stubs")
>
> https://patchew.org/QEMU/20230921094851.36295-1-akihiko.od...@daynix.com/
> ("[PATCH v2] Revert "tap: setting error appropriately when calling
> net_init_tap_one()"")
>
> https://patchew.org/QEMU/20231219-glib-v1-0-1b040d286...@daynix.com/
> ("[PATCH 0/2] tap: Use g_spawn_sync() and g_spawn_check_wait_status()")
>
> Regards,
> Akihiko Odaki

Will do.

Thanks

>




Re: [PATCH v4 2/2] vhost: Perform memory section dirty scans once per iteration

2024-03-19 Thread Jason Wang
On Tue, Mar 19, 2024 at 6:16 AM Si-Wei Liu  wrote:
>
>
>
> On 3/17/2024 8:22 PM, Jason Wang wrote:
> > On Sat, Mar 16, 2024 at 2:45 AM Si-Wei Liu  wrote:
> >>
> >>
> >> On 3/14/2024 9:03 PM, Jason Wang wrote:
> >>> On Fri, Mar 15, 2024 at 5:39 AM Si-Wei Liu  wrote:
> >>>> On setups with one or more virtio-net devices with vhost on,
> >>>> dirty tracking iteration increases in cost with the number of
> >>>> queues that are set up, e.g. on idle guests migration the
> >>>> following is observed with virtio-net with vhost=on:
> >>>>
> >>>> 48 queues -> 78.11%  [.] vhost_dev_sync_region.isra.13
> >>>> 8 queues -> 40.50%   [.] vhost_dev_sync_region.isra.13
> >>>> 1 queue -> 6.89% [.] vhost_dev_sync_region.isra.13
> >>>> 2 devices, 1 queue -> 18.60%  [.] vhost_dev_sync_region.isra.14
> >>>>
> >>>> With high memory rates the symptom is lack of convergence as
> >>>> soon as there is a vhost device with a sufficiently high number
> >>>> of queues, or a sufficient number of vhost devices.
> >>>>
> >>>> On every migration iteration (every 100msecs) it will
> >>>> redundantly query the *shared log* once per queue configured
> >>>> with vhost that exists in the guest. For the virtqueue data,
> >>>> this is necessary, but not for the memory sections, which are
> >>>> the same. So essentially we end up scanning the dirty log
> >>>> too often.
> >>>>
> >>>> To fix that, select a vhost device responsible for scanning the
> >>>> log with regards to memory sections dirty tracking. It is selected
> >>>> when we enable the logger (during migration) and cleared when we
> >>>> disable the logger. If the vhost logger device goes away for some
> >>>> reason, the logger will be re-selected from the rest of vhost
> >>>> devices.
> >>>>
> >>>> After making mem-section logger a singleton instance, constant cost
> >>>> of 7%-9% (like the 1 queue report) will be seen, no matter how many
> >>>> queues or how many vhost devices are configured:
> >>>>
> >>>> 48 queues -> 8.71%[.] vhost_dev_sync_region.isra.13
> >>>> 2 devices, 8 queues -> 7.97%   [.] vhost_dev_sync_region.isra.14
> >>>>
> >>>> Co-developed-by: Joao Martins 
> >>>> Signed-off-by: Joao Martins 
> >>>> Signed-off-by: Si-Wei Liu 
> >>>>
> >>>> ---
> >>>> v3 -> v4:
> >>>> - add comment to clarify effect on cache locality and
> >>>>   performance
> >>>>
> >>>> v2 -> v3:
> >>>> - add after-fix benchmark to commit log
> >>>> - rename vhost_log_dev_enabled to vhost_dev_should_log
> >>>> - remove unneeded comparisons for backend_type
> >>>> - use QLIST array instead of single flat list to store vhost
> >>>>   logger devices
> >>>> - simplify logger election logic
> >>>> ---
> >>>>hw/virtio/vhost.c | 67 
> >>>> ++-
> >>>>include/hw/virtio/vhost.h |  1 +
> >>>>2 files changed, 62 insertions(+), 6 deletions(-)
> >>>>
> >>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >>>> index 612f4db..58522f1 100644
> >>>> --- a/hw/virtio/vhost.c
> >>>> +++ b/hw/virtio/vhost.c
> >>>> @@ -45,6 +45,7 @@
> >>>>
> >>>>static struct vhost_log *vhost_log[VHOST_BACKEND_TYPE_MAX];
> >>>>static struct vhost_log *vhost_log_shm[VHOST_BACKEND_TYPE_MAX];
> >>>> +static QLIST_HEAD(, vhost_dev) vhost_log_devs[VHOST_BACKEND_TYPE_MAX];
> >>>>
> >>>>/* Memslots used by backends that support private memslots (without 
> >>>> an fd). */
> >>>>static unsigned int used_memslots;
> >>>> @@ -149,6 +150,47 @@ bool vhost_dev_has_iommu(struct vhost_dev *dev)
> >>>>}
> >>>>}
> >>>>
> >>>> +static inline bool vhost_dev_should_log(struct vhost_dev *dev)
> >>>> +{
> >>>> +assert(dev->vhost_ops);
> >>>> +assert(dev->vhost_ops->backend_type > VHOST_BAC

Re: [PATCH v4 1/2] vhost: dirty log should be per backend type

2024-03-19 Thread Jason Wang
On Tue, Mar 19, 2024 at 6:06 AM Si-Wei Liu  wrote:
>
>
>
> On 3/17/2024 8:20 PM, Jason Wang wrote:
> > On Sat, Mar 16, 2024 at 2:33 AM Si-Wei Liu  wrote:
> >>
> >>
> >> On 3/14/2024 8:50 PM, Jason Wang wrote:
> >>> On Fri, Mar 15, 2024 at 5:39 AM Si-Wei Liu  wrote:
> >>>> There could be a mix of both vhost-user and vhost-kernel clients
> >>>> in the same QEMU process, where separate vhost loggers for the
> >>>> specific vhost type have to be used. Make the vhost logger per
> >>>> backend type, and have them properly reference counted.
> >>> It's better to describe what's the advantage of doing this.
> >> Yes, I can add that to the log. Although it's a niche use case, it was
> >> actually a long-standing limitation / bug that vhost-user and
> >> vhost-kernel loggers can't co-exist per QEMU process; today this just
> >> ends up as a silent failure. This bug fix removes that implicit
> >> limitation in the code.
> > Ok.
> >
> >>>> Suggested-by: Michael S. Tsirkin 
> >>>> Signed-off-by: Si-Wei Liu 
> >>>>
> >>>> ---
> >>>> v3->v4:
> >>>> - remove checking NULL return value from vhost_log_get
> >>>>
> >>>> v2->v3:
> >>>> - remove non-effective assertion that never be reached
> >>>> - do not return NULL from vhost_log_get()
> >>>> - add neccessary assertions to vhost_log_get()
> >>>> ---
> >>>>hw/virtio/vhost.c | 45 +
> >>>>1 file changed, 33 insertions(+), 12 deletions(-)
> >>>>
> >>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >>>> index 2c9ac79..612f4db 100644
> >>>> --- a/hw/virtio/vhost.c
> >>>> +++ b/hw/virtio/vhost.c
> >>>> @@ -43,8 +43,8 @@
> >>>>do { } while (0)
> >>>>#endif
> >>>>
> >>>> -static struct vhost_log *vhost_log;
> >>>> -static struct vhost_log *vhost_log_shm;
> >>>> +static struct vhost_log *vhost_log[VHOST_BACKEND_TYPE_MAX];
> >>>> +static struct vhost_log *vhost_log_shm[VHOST_BACKEND_TYPE_MAX];
> >>>>
> >>>>/* Memslots used by backends that support private memslots (without 
> >>>> an fd). */
> >>>>static unsigned int used_memslots;
> >>>> @@ -287,6 +287,10 @@ static int vhost_set_backend_type(struct vhost_dev 
> >>>> *dev,
> >>>>r = -1;
> >>>>}
> >>>>
> >>>> +if (r == 0) {
> >>>> +assert(dev->vhost_ops->backend_type == backend_type);
> >>>> +}
> >>>> +
> >>> Under which condition could we hit this?
> >> Just in case some other function inadvertently corrupted this earlier,
> >> we have to capture the discrepancy in the first place... On the other
> >> hand, it will be helpful for other vhost backend writers to diagnose
> >> day-one bugs in the code. I feel a code comment alone here will not be
> >> sufficient/helpful.
> > See below.
> >
> >>>It seems not good to assert a local logic.
> >> It seems to me quite a few local asserts are in the same file already,
> >> vhost_save_backend_state,
> > For example it has assert for
> >
> > assert(!dev->started);
> >
> > which is not the logic of the function itself but require
> > vhost_dev_start() not to be called before.
> >
> > But it looks like this patch you assert the code just a few lines
> > above the assert itself?
> Yes, that was the intent - for e.g. xxx_ops may contain corrupted
> xxx_ops.backend_type already before coming to this
> vhost_set_backend_type() function. And we may capture this corrupted
> state by asserting the expected xxx_ops.backend_type (to be consistent
> with the backend_type passed in),

This can happen for all variables. Not sure why backend_ops is special.

> which needs to be done in the first place
> when this discrepancy is detected. In practice I think there should be
> no harm to add this assert, but this will add warranted guarantee to the
> current code.

For example, such corruption can happen after the assert(), so this is a
TOCTOU issue.

Thanks

>
> Regards,
> -Siwei
>
> >
> > dev->vhost_ops = &xxx_ops;
> >
> > ...
> >
> > assert(dev->vhost_ops->backend_type == backend_type)
> >
> > ?
> >
> > Thanks
> >
> >> vhost_load_backend_state,
> >> vhost_virtqueue_mask, vhost_config_mask, just to name a few. Why local
> >> assert a problem?
> >>
> >> Thanks,
> >> -Siwei
> >>
> >>> Thanks
> >>>
>
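
The TOCTOU point can be illustrated with a toy example (not QEMU code):

#include <assert.h>

struct toy_dev { int backend_type; };

static void toy_set_backend(struct toy_dev *d, int type)
{
    d->backend_type = type;
    /* Time of check: this only validates the value at this instant. */
    assert(d->backend_type == type);
    /* Any buggy write to d->backend_type after this point (the time of
     * use) is not caught, which is why the assert adds little protection. */
}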




Re: [PATCH] vhost-vdpa: check vhost_vdpa_set_vring_ready() return value

2024-03-17 Thread Jason Wang
On Fri, Mar 15, 2024 at 4:23 PM Stefano Garzarella  wrote:
>
> On Thu, Mar 14, 2024 at 11:17:01AM +0800, Jason Wang wrote:
> >On Wed, Feb 7, 2024 at 5:27 PM Stefano Garzarella  
> >wrote:
> >>
> >> vhost_vdpa_set_vring_ready() could already fail, but if Linux's
> >> patch [1] will be merged, it may fail with more chance if
> >> userspace does not activate virtqueues before DRIVER_OK when
> >> VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is not negotiated.
> >
> >I wonder what happens if we just leave it as is.
>
> Are you referring to this patch or the kernel patch?

This patch.

>
> Here I'm just checking the return value of vhost_vdpa_set_vring_ready().
> It can return an error even without that kernel patch, so IMHO it is
> better to check the return value here in QEMU.
>
> What issue do you see with this patch applied?

For the parent which can enable after driver_ok but not advertise it.

(To say the truth, I'm not sure if we need to care about this)

>
> >
> >VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK: We do know enabling could be
> >done after driver_ok.
> >Without VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK: We don't know whether
> >enabling could be done after driver_ok or not.
>
> I see your point, indeed I didn't send a v2 of that patch.
> Maybe we should document that, because it could be interpreted that if
> VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is not negotiated the enabling
> should always be done before driver_ok (which is true for example in
> VDUSE).

I see, so I think we probably need the fix.

>
> BTW I think we should discuss it in the kernel patch.
>
> Thanks,
> Stefano
>
> >
> >Thanks
> >
> >>
> >> So better check its return value anyway.
> >>
> >> [1] 
> >> https://lore.kernel.org/virtualization/20240206145154.118044-1-sgarz...@redhat.com/T/#u
> >>
> >> Signed-off-by: Stefano Garzarella 
> >> ---
> >> Note: This patch conflicts with [2], but the resolution is simple,
> >> so for now I sent a patch for the current master, but I'll rebase
> >> this patch if we merge the other one first.

Will go through [2].

Thanks


> >>
> >> [2]
> >> https://lore.kernel.org/qemu-devel/20240202132521.32714-1-kw...@redhat.com/
> >> ---
> >>  hw/virtio/vdpa-dev.c |  8 +++-
> >>  net/vhost-vdpa.c | 15 ---
> >>  2 files changed, 19 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/hw/virtio/vdpa-dev.c b/hw/virtio/vdpa-dev.c
> >> index eb9ecea83b..d57cd76c18 100644
> >> --- a/hw/virtio/vdpa-dev.c
> >> +++ b/hw/virtio/vdpa-dev.c
> >> @@ -259,7 +259,11 @@ static int vhost_vdpa_device_start(VirtIODevice 
> >> *vdev, Error **errp)
> >>  goto err_guest_notifiers;
> >>  }
> >>  for (i = 0; i < s->dev.nvqs; ++i) {
> >> -vhost_vdpa_set_vring_ready(&s->vdpa, i);
> >> +ret = vhost_vdpa_set_vring_ready(&s->vdpa, i);
> >> +if (ret < 0) {
> >> +error_setg_errno(errp, -ret, "Error starting vring %d", i);
> >> +goto err_dev_stop;
> >> +}
> >>  }
> >>  s->started = true;
> >>
> >> @@ -274,6 +278,8 @@ static int vhost_vdpa_device_start(VirtIODevice *vdev, 
> >> Error **errp)
> >>
> >>  return ret;
> >>
> >> +err_dev_stop:
> >> +vhost_dev_stop(&s->dev, vdev, false);
> >>  err_guest_notifiers:
> >>  k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
> >>  err_host_notifiers:
> >> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> >> index 3726ee5d67..e3d8036479 100644
> >> --- a/net/vhost-vdpa.c
> >> +++ b/net/vhost-vdpa.c
> >> @@ -381,7 +381,10 @@ static int vhost_vdpa_net_data_load(NetClientState 
> >> *nc)
> >>  }
> >>
> >>  for (int i = 0; i < v->dev->nvqs; ++i) {
> >> -vhost_vdpa_set_vring_ready(v, i + v->dev->vq_index);
> >> +int ret = vhost_vdpa_set_vring_ready(v, i + v->dev->vq_index);
> >> +if (ret < 0) {
> >> +return ret;
> >> +}
> >>  }
> >>  return 0;
> >>  }
> >> @@ -1213,7 +1216,10 @@ static int vhost_vdpa_net_cvq_load(NetClientState 
> >> *nc)
> >>
> >>  assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> >>
> >> -vhost_vdpa_set_vring_ready(v, v->dev->vq_index);
> >> +r = vhost_vdpa_set_vring_ready(v, v->dev->vq_index);
> >> +if (unlikely(r < 0)) {
> >> +return r;
> >> +}
> >>
> >>  if (v->shadow_vqs_enabled) {
> >>  n = VIRTIO_NET(v->dev->vdev);
> >> @@ -1252,7 +1258,10 @@ static int vhost_vdpa_net_cvq_load(NetClientState 
> >> *nc)
> >>  }
> >>
> >>  for (int i = 0; i < v->dev->vq_index; ++i) {
> >> -vhost_vdpa_set_vring_ready(v, i);
> >> +r = vhost_vdpa_set_vring_ready(v, i);
> >> +if (unlikely(r < 0)) {
> >> +return r;
> >> +}
> >>  }
> >>
> >>  return 0;
> >> --
> >> 2.43.0
> >>
> >
>




Re: [PATCH for-9.0 v3] vdpa-dev: Fix initialisation order to restore VDUSE compatibility

2024-03-17 Thread Jason Wang
On Fri, Mar 15, 2024 at 11:59 PM Kevin Wolf  wrote:
>
> VDUSE requires that virtqueues are first enabled before the DRIVER_OK
> status flag is set; with the current API of the kernel module, it is
> impossible to enable the opposite order in our block export code because
> userspace is not notified when a virtqueue is enabled.
>
> This requirement also matches the normal initialisation order as done by
> the generic vhost code in QEMU. However, commit 6c482547 accidentally
> changed the order for vdpa-dev and broke access to VDUSE devices with
> this.
>
> This changes vdpa-dev to use the normal order again and use the standard
> vhost callback .vhost_set_vring_enable for this. VDUSE devices can be
> used with vdpa-dev again after this fix.
>
> vhost_net intentionally avoided enabling the vrings for vdpa and does
> this manually later while it does enable them for other vhost backends.
> Reflect this in the vhost_net code and return early for vdpa, so that
> the behaviour doesn't change for this device.
>
> Cc: qemu-sta...@nongnu.org
> Fixes: 6c4825476a4351530bcac17abab72295b75ffe98
> Signed-off-by: Kevin Wolf 
> ---
> v2:
> - Actually make use of the @enable parameter
> - Change vhost_net to preserve the current behaviour
>
> v3:
> - Updated trace point [Stefano]
> - Fixed typo in comment [Stefano]
>
>  hw/net/vhost_net.c | 10 ++
>  hw/virtio/vdpa-dev.c   |  5 +
>  hw/virtio/vhost-vdpa.c | 29 ++---
>  hw/virtio/vhost.c  |  8 +++-
>  hw/virtio/trace-events |  2 +-
>  5 files changed, 45 insertions(+), 9 deletions(-)
>
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index e8e1661646..fd1a93701a 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -541,6 +541,16 @@ int vhost_set_vring_enable(NetClientState *nc, int 
> enable)
>  VHostNetState *net = get_vhost_net(nc);
>  const VhostOps *vhost_ops = net->dev.vhost_ops;
>
> +/*
> + * vhost-vdpa network devices need to enable dataplane virtqueues after
> + * DRIVER_OK, so they can recover device state before starting dataplane.
> + * Because of that, we don't enable virtqueues here and leave it to
> + * net/vhost-vdpa.c.
> + */
> +if (nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
> +return 0;
> +}

I think we need some inputs from Eugenio; this is only needed for the
shadow virtqueue during live migration, not for other cases.

Thanks
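
For readers, the ordering constraint VDUSE imposes can be sketched as
follows (enable_vring() and set_device_status() are hypothetical wrappers;
the real flow goes through vhost_dev_start() and the
.vhost_set_vring_enable callback):

/* Every vring must be enabled *before* DRIVER_OK is set. */
static int start_vdpa_device(struct vhost_dev *dev)
{
    int i, r;

    for (i = 0; i < dev->nvqs; i++) {
        r = enable_vring(dev, i);   /* must happen first for VDUSE */
        if (r < 0) {
            return r;
        }
    }
    return set_device_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
}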




Re: [PATCH v4 2/2] vhost: Perform memory section dirty scans once per iteration

2024-03-17 Thread Jason Wang
On Sat, Mar 16, 2024 at 2:45 AM Si-Wei Liu  wrote:
>
>
>
> On 3/14/2024 9:03 PM, Jason Wang wrote:
> > On Fri, Mar 15, 2024 at 5:39 AM Si-Wei Liu  wrote:
> >> On setups with one or more virtio-net devices with vhost on,
> >> dirty tracking iteration increases in cost with the number of
> >> queues that are set up, e.g. on idle guests migration the
> >> following is observed with virtio-net with vhost=on:
> >>
> >> 48 queues -> 78.11%  [.] vhost_dev_sync_region.isra.13
> >> 8 queues -> 40.50%   [.] vhost_dev_sync_region.isra.13
> >> 1 queue -> 6.89% [.] vhost_dev_sync_region.isra.13
> >> 2 devices, 1 queue -> 18.60%  [.] vhost_dev_sync_region.isra.14
> >>
> >> With high memory rates the symptom is lack of convergence as
> >> soon as there is a vhost device with a sufficiently high number
> >> of queues, or a sufficient number of vhost devices.
> >>
> >> On every migration iteration (every 100msecs) it will
> >> redundantly query the *shared log* once per queue configured
> >> with vhost that exists in the guest. For the virtqueue data,
> >> this is necessary, but not for the memory sections, which are
> >> the same. So essentially we end up scanning the dirty log
> >> too often.
> >>
> >> To fix that, select a vhost device responsible for scanning the
> >> log with regards to memory sections dirty tracking. It is selected
> >> when we enable the logger (during migration) and cleared when we
> >> disable the logger. If the vhost logger device goes away for some
> >> reason, the logger will be re-selected from the rest of vhost
> >> devices.
> >>
> >> After making mem-section logger a singleton instance, constant cost
> >> of 7%-9% (like the 1 queue report) will be seen, no matter how many
> >> queues or how many vhost devices are configured:
> >>
> >> 48 queues -> 8.71%[.] vhost_dev_sync_region.isra.13
> >> 2 devices, 8 queues -> 7.97%   [.] vhost_dev_sync_region.isra.14
> >>
> >> Co-developed-by: Joao Martins 
> >> Signed-off-by: Joao Martins 
> >> Signed-off-by: Si-Wei Liu 
> >>
> >> ---
> >> v3 -> v4:
> >>- add comment to clarify effect on cache locality and
> >>  performance
> >>
> >> v2 -> v3:
> >>- add after-fix benchmark to commit log
> >>- rename vhost_log_dev_enabled to vhost_dev_should_log
> >>- remove unneeded comparisons for backend_type
> >>- use QLIST array instead of single flat list to store vhost
> >>  logger devices
> >>- simplify logger election logic
> >> ---
> >>   hw/virtio/vhost.c | 67 
> >> ++-
> >>   include/hw/virtio/vhost.h |  1 +
> >>   2 files changed, 62 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >> index 612f4db..58522f1 100644
> >> --- a/hw/virtio/vhost.c
> >> +++ b/hw/virtio/vhost.c
> >> @@ -45,6 +45,7 @@
> >>
> >>   static struct vhost_log *vhost_log[VHOST_BACKEND_TYPE_MAX];
> >>   static struct vhost_log *vhost_log_shm[VHOST_BACKEND_TYPE_MAX];
> >> +static QLIST_HEAD(, vhost_dev) vhost_log_devs[VHOST_BACKEND_TYPE_MAX];
> >>
> >>   /* Memslots used by backends that support private memslots (without an 
> >> fd). */
> >>   static unsigned int used_memslots;
> >> @@ -149,6 +150,47 @@ bool vhost_dev_has_iommu(struct vhost_dev *dev)
> >>   }
> >>   }
> >>
> >> +static inline bool vhost_dev_should_log(struct vhost_dev *dev)
> >> +{
> >> +assert(dev->vhost_ops);
> >> +assert(dev->vhost_ops->backend_type > VHOST_BACKEND_TYPE_NONE);
> >> +assert(dev->vhost_ops->backend_type < VHOST_BACKEND_TYPE_MAX);
> >> +
> >> +return dev == QLIST_FIRST(&vhost_log_devs[dev->vhost_ops->backend_type]);
> > A dumb question: why not simply check
> >
> > dev->log == vhost_log_shm[dev->vhost_ops->backend_type]
> Because we are not sure if the logger comes from vhost_log_shm[] or
> vhost_log[]. Don't want to complicate the check here by calling into
> vhost_dev_log_is_shared() every time .log_sync() is called.

It has very low overhead, doesn't it?

static bool vhost_dev_log_is_shared(struct vhost_dev *dev)
{
    return dev->vhost_ops->vhost_requires_shm_log &&
           dev->vhost_ops->vhost_requires_shm_log(dev);
}

And it helps to simplify the logic.

Thanks

>
> -Siwei
> > ?
> >
> > Thanks
> >
>
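
Putting the two checks side by side, the elected-logger idea reduces to a
sketch like this (simplified; the list and the logdev_entry field follow
the patch, but this is not the full implementation):

static QLIST_HEAD(, vhost_dev) vhost_log_devs[VHOST_BACKEND_TYPE_MAX];

/* Only the list head (the elected device) scans the memory-section
 * dirty bitmap; every device still syncs its own virtqueues. */
static bool vhost_dev_should_log(struct vhost_dev *dev)
{
    return dev == QLIST_FIRST(&vhost_log_devs[dev->vhost_ops->backend_type]);
}

/* Election: insert on log enable, remove on disable; the next device on
 * the list automatically takes over if the current logger goes away. */
static void vhost_dev_elect_mem_logger(struct vhost_dev *hdev, bool add)
{
    VhostBackendType type = hdev->vhost_ops->backend_type;

    if (add) {
        QLIST_INSERT_HEAD(&vhost_log_devs[type], hdev, logdev_entry);
    } else {
        QLIST_REMOVE(hdev, logdev_entry);
    }
}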




Re: [PATCH v4 1/2] vhost: dirty log should be per backend type

2024-03-17 Thread Jason Wang
On Sat, Mar 16, 2024 at 2:33 AM Si-Wei Liu  wrote:
>
>
>
> On 3/14/2024 8:50 PM, Jason Wang wrote:
> > On Fri, Mar 15, 2024 at 5:39 AM Si-Wei Liu  wrote:
> >> There could be a mix of both vhost-user and vhost-kernel clients
> >> in the same QEMU process, where separate vhost loggers for the
> >> specific vhost type have to be used. Make the vhost logger per
> >> backend type, and have them properly reference counted.
> > It's better to describe what's the advantage of doing this.
> Yes, I can add that to the log. Although it's a niche use case, it was
> actually a long-standing limitation / bug that vhost-user and
> vhost-kernel loggers can't co-exist per QEMU process; today this just
> ends up as a silent failure. This bug fix removes that implicit
> limitation in the code.

Ok.

> >
> >> Suggested-by: Michael S. Tsirkin 
> >> Signed-off-by: Si-Wei Liu 
> >>
> >> ---
> >> v3->v4:
> >>- remove checking NULL return value from vhost_log_get
> >>
> >> v2->v3:
> >>- remove non-effective assertion that never be reached
> >>- do not return NULL from vhost_log_get()
> >>- add neccessary assertions to vhost_log_get()
> >> ---
> >>   hw/virtio/vhost.c | 45 +
> >>   1 file changed, 33 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >> index 2c9ac79..612f4db 100644
> >> --- a/hw/virtio/vhost.c
> >> +++ b/hw/virtio/vhost.c
> >> @@ -43,8 +43,8 @@
> >>   do { } while (0)
> >>   #endif
> >>
> >> -static struct vhost_log *vhost_log;
> >> -static struct vhost_log *vhost_log_shm;
> >> +static struct vhost_log *vhost_log[VHOST_BACKEND_TYPE_MAX];
> >> +static struct vhost_log *vhost_log_shm[VHOST_BACKEND_TYPE_MAX];
> >>
> >>   /* Memslots used by backends that support private memslots (without an 
> >> fd). */
> >>   static unsigned int used_memslots;
> >> @@ -287,6 +287,10 @@ static int vhost_set_backend_type(struct vhost_dev 
> >> *dev,
> >>   r = -1;
> >>   }
> >>
> >> +if (r == 0) {
> >> +assert(dev->vhost_ops->backend_type == backend_type);
> >> +}
> >> +
> > Under which condition could we hit this?
> Just in case some other function inadvertently corrupted this earlier,
> we have to capture the discrepancy in the first place... On the other
> hand, it will be helpful for other vhost backend writers to diagnose
> day-one bugs in the code. I feel a code comment alone here will not be
> sufficient/helpful.

See below.

>
> >   It seems not good to assert a local logic.
> It seems to me quite a few local asserts are in the same file already,
> vhost_save_backend_state,

For example it has assert for

assert(!dev->started);

which is not the logic of the function itself but require
vhost_dev_start() not to be called before.

But it looks like this patch you assert the code just a few lines
above the assert itself?

dev->vhost_ops = &xxx_ops;

...

assert(dev->vhost_ops->backend_type == backend_type)

?

Thanks

> vhost_load_backend_state,
> vhost_virtqueue_mask, vhost_config_mask, just to name a few. Why local
> assert a problem?
>
> Thanks,
> -Siwei
>
> > Thanks
> >
>
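
As a compact restatement of what the patch does, a sketch assuming glib
(the struct is simplified; the real vhost_log_get() also handles the shm
variant, the fd, and log resizing):

struct vhost_log {
    uint64_t size;      /* in chunks */
    int refcnt;
    uint64_t *log;      /* the dirty bitmap */
};

static struct vhost_log *vhost_log[VHOST_BACKEND_TYPE_MAX];

/* One refcounted dirty log per backend type, so that vhost-user and
 * vhost-kernel backends in the same process never share a logger. */
static struct vhost_log *vhost_log_get(VhostBackendType type, uint64_t size)
{
    struct vhost_log *log;

    assert(type > VHOST_BACKEND_TYPE_NONE && type < VHOST_BACKEND_TYPE_MAX);

    log = vhost_log[type];
    if (!log || log->size != size) {
        log = g_new0(struct vhost_log, 1);
        log->log = g_malloc0(size * sizeof(*log->log));
        log->size = size;
        log->refcnt = 1;
        vhost_log[type] = log;
    } else {
        log->refcnt++;
    }
    return log;
}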




Re: [PATCH v4 2/2] vhost: Perform memory section dirty scans once per iteration

2024-03-14 Thread Jason Wang
On Fri, Mar 15, 2024 at 5:39 AM Si-Wei Liu  wrote:
>
> On setups with one or more virtio-net devices with vhost on,
> dirty tracking iteration increases cost the bigger the number
> amount of queues are set up e.g. on idle guests migration the
> following is observed with virtio-net with vhost=on:
>
> 48 queues -> 78.11%  [.] vhost_dev_sync_region.isra.13
> 8 queues -> 40.50%   [.] vhost_dev_sync_region.isra.13
> 1 queue -> 6.89% [.] vhost_dev_sync_region.isra.13
> 2 devices, 1 queue -> 18.60%  [.] vhost_dev_sync_region.isra.14
>
> With high memory rates the symptom is lack of convergence as soon
> as it has a vhost device with a sufficiently high number of queues,
> the sufficient number of vhost devices.
>
> On every migration iteration (every 100 msecs) the *shared log* is
> redundantly queried once per queue configured with vhost in the
> guest. For the virtqueue data this is necessary, but not for the
> memory sections, which are the same across queues. So essentially
> we end up scanning the dirty log too often.
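
In rough pseudo-code, the redundant work looks like this (a sketch,
not the actual code):

    /* per migration iteration, before this patch: every vhost_dev
     * (one per queue pair for multiqueue vhost-net) rescans the same
     * shared dirty log for all memory sections:
     *
     *   for each vhost_dev d:
     *       for each memory section s:
     *           vhost_dev_sync_region(d, ..., s)   // N redundant scans
     */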
>
> To fix that, select one vhost device to be responsible for scanning
> the log with regard to memory-section dirty tracking. It is selected
> when we enable the logger (during migration) and cleared when we
> disable the logger. If the vhost logger device goes away for some
> reason, the logger will be re-selected from the remaining vhost
> devices.
>
> After making the mem-section logger a singleton instance, a constant
> cost of 7%-9% (like the 1-queue report above) is seen, no matter how
> many queues or how many vhost devices are configured:
>
> 48 queues -> 8.71%[.] vhost_dev_sync_region.isra.13
> 2 devices, 8 queues -> 7.97%   [.] vhost_dev_sync_region.isra.14
>
> Co-developed-by: Joao Martins 
> Signed-off-by: Joao Martins 
> Signed-off-by: Si-Wei Liu 
>
> ---
> v3 -> v4:
>   - add comment to clarify effect on cache locality and
> performance
>
> v2 -> v3:
>   - add after-fix benchmark to commit log
>   - rename vhost_log_dev_enabled to vhost_dev_should_log
>   - remove unneeded comparisons for backend_type
>   - use QLIST array instead of single flat list to store vhost
> logger devices
>   - simplify logger election logic
> ---
>  hw/virtio/vhost.c | 67 
> ++-
>  include/hw/virtio/vhost.h |  1 +
>  2 files changed, 62 insertions(+), 6 deletions(-)
>
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 612f4db..58522f1 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -45,6 +45,7 @@
>
>  static struct vhost_log *vhost_log[VHOST_BACKEND_TYPE_MAX];
>  static struct vhost_log *vhost_log_shm[VHOST_BACKEND_TYPE_MAX];
> +static QLIST_HEAD(, vhost_dev) vhost_log_devs[VHOST_BACKEND_TYPE_MAX];
>
>  /* Memslots used by backends that support private memslots (without an fd). 
> */
>  static unsigned int used_memslots;
> @@ -149,6 +150,47 @@ bool vhost_dev_has_iommu(struct vhost_dev *dev)
>  }
>  }
>
> +static inline bool vhost_dev_should_log(struct vhost_dev *dev)
> +{
> +assert(dev->vhost_ops);
> +assert(dev->vhost_ops->backend_type > VHOST_BACKEND_TYPE_NONE);
> +assert(dev->vhost_ops->backend_type < VHOST_BACKEND_TYPE_MAX);
> +
> +return dev == QLIST_FIRST(&vhost_log_devs[dev->vhost_ops->backend_type]);

A dumb question, why not simple check

dev->log == vhost_log_shm[dev->vhost_ops->backend_type]

?

Thanks
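
One way to see the difference, as an illustration (not code from the
patch): when two devices of the same backend type share one shm log,
the pointer comparison holds for both, so it cannot pick a single
scanner:

    /* both devices share the same log object, so:
     *   a->log == vhost_log_shm[type]   -> true
     *   b->log == vhost_log_shm[type]   -> true as well
     * while dev == QLIST_FIRST(&vhost_log_devs[type]) holds for
     * exactly one elected device. */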




Re: [PATCH v4 1/2] vhost: dirty log should be per backend type

2024-03-14 Thread Jason Wang
On Fri, Mar 15, 2024 at 5:39 AM Si-Wei Liu  wrote:
>
> There could be a mix of both vhost-user and vhost-kernel clients
> in the same QEMU process, where separate vhost loggers for the
> specific vhost type have to be used. Make the vhost logger per
> backend type, and have them properly reference counted.

It's better to describe what the advantage of doing this is.

>
> Suggested-by: Michael S. Tsirkin 
> Signed-off-by: Si-Wei Liu 
>
> ---
> v3->v4:
>   - remove checking NULL return value from vhost_log_get
>
> v2->v3:
>   - remove non-effective assertion that can never be reached
>   - do not return NULL from vhost_log_get()
>   - add necessary assertions to vhost_log_get()
> ---
>  hw/virtio/vhost.c | 45 +
>  1 file changed, 33 insertions(+), 12 deletions(-)
>
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 2c9ac79..612f4db 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -43,8 +43,8 @@
>  do { } while (0)
>  #endif
>
> -static struct vhost_log *vhost_log;
> -static struct vhost_log *vhost_log_shm;
> +static struct vhost_log *vhost_log[VHOST_BACKEND_TYPE_MAX];
> +static struct vhost_log *vhost_log_shm[VHOST_BACKEND_TYPE_MAX];
>
>  /* Memslots used by backends that support private memslots (without an fd). 
> */
>  static unsigned int used_memslots;
> @@ -287,6 +287,10 @@ static int vhost_set_backend_type(struct vhost_dev *dev,
>  r = -1;
>  }
>
> +if (r == 0) {
> +assert(dev->vhost_ops->backend_type == backend_type);
> +}
> +

Under which condition could we hit this? It doesn't seem good to
assert local logic.

Thanks




Re: [PATCH] hw/virtio: Add support for VDPA network simulation devices

2024-03-13 Thread Jason Wang
On Thu, Mar 14, 2024 at 3:52 AM Michael S. Tsirkin  wrote:
>
> On Wed, Mar 13, 2024 at 07:51:08PM +0100, Thomas Weißschuh wrote:
> > On 2024-02-21 15:38:02+0800, Hao Chen wrote:
> > > This patch adds support for VDPA network simulation devices.
> > > The device is developed based on virtio-net and tap backend,
> > > and supports hardware live migration function.
> > >
> > > For more details, please refer to "docs/system/devices/vdpa-net.rst"
> > >
> > > Signed-off-by: Hao Chen 
> > > ---
> > >  MAINTAINERS |   5 +
> > >  docs/system/device-emulation.rst|   1 +
> > >  docs/system/devices/vdpa-net.rst| 121 +
> > >  hw/net/virtio-net.c |  16 ++
> > >  hw/virtio/virtio-pci.c  | 189 +++-

I think those modifications should belong in a separate file, as they
might conflict with virtio features in the future.

> > >  hw/virtio/virtio.c  |  39 
> > >  include/hw/virtio/virtio-pci.h  |   5 +
> > >  include/hw/virtio/virtio.h  |  19 ++
> > >  include/standard-headers/linux/virtio_pci.h |   7 +
> > >  9 files changed, 399 insertions(+), 3 deletions(-)
> > >  create mode 100644 docs/system/devices/vdpa-net.rst
> >
> > [..]
> >
> > > diff --git a/include/standard-headers/linux/virtio_pci.h 
> > > b/include/standard-headers/linux/virtio_pci.h
> > > index b7fdfd0668..fb5391cef6 100644
> > > --- a/include/standard-headers/linux/virtio_pci.h
> > > +++ b/include/standard-headers/linux/virtio_pci.h
> > > @@ -216,6 +216,13 @@ struct virtio_pci_cfg_cap {
> > >  #define VIRTIO_PCI_COMMON_Q_NDATA  56
> > >  #define VIRTIO_PCI_COMMON_Q_RESET  58
> > >
> > > +#define LM_LOGGING_CTRL 0
> > > +#define LM_BASE_ADDR_LOW4
> > > +#define LM_BASE_ADDR_HIGH   8
> > > +#define LM_END_ADDR_LOW 12
> > > +#define LM_END_ADDR_HIGH16
> > > +#define LM_VRING_STATE_OFFSET   0x20
> >
> > These changes are not in upstream Linux and will be undone by
> > ./scripts/update-linux-headers.sh.
> >
> > Are they intentionally in this header?
>
>
> Good point. Pls move.

Right, and this part is not part of standard virtio.

Thanks

>
> > > +
> > >  #endif /* VIRTIO_PCI_NO_MODERN */
> > >
> > >  #endif
>




Re: [PATCH] vhost-vdpa: check vhost_vdpa_set_vring_ready() return value

2024-03-13 Thread Jason Wang
On Wed, Feb 7, 2024 at 5:27 PM Stefano Garzarella  wrote:
>
> vhost_vdpa_set_vring_ready() can already fail, but if Linux's
> patch [1] is merged, it will fail more often when
> userspace does not activate virtqueues before DRIVER_OK and
> VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is not negotiated.

I wonder what happens if we just leave it as is.

VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK: We do know enabling could be
done after driver_ok.
Without VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK: We don't know whether
enabling could be done after driver_ok or not.
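
In code, that distinction might look like this (a sketch; the exact
feature-bit plumbing is an assumption):

    if (virtio_has_feature(dev->backend_cap,
                           VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK)) {
        /* enabling vrings after DRIVER_OK is known to be safe */
    } else {
        /* unknown: enable all vrings before DRIVER_OK to stay safe */
    }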

Thanks

>
> So better check its return value anyway.
>
> [1] 
> https://lore.kernel.org/virtualization/20240206145154.118044-1-sgarz...@redhat.com/T/#u
>
> Signed-off-by: Stefano Garzarella 
> ---
> Note: This patch conflicts with [2], but the resolution is simple,
> so for now I sent a patch for the current master, but I'll rebase
> this patch if we merge the other one first.
>
> [2] 
> https://lore.kernel.org/qemu-devel/20240202132521.32714-1-kw...@redhat.com/
> ---
>  hw/virtio/vdpa-dev.c |  8 +++-
>  net/vhost-vdpa.c | 15 ---
>  2 files changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/hw/virtio/vdpa-dev.c b/hw/virtio/vdpa-dev.c
> index eb9ecea83b..d57cd76c18 100644
> --- a/hw/virtio/vdpa-dev.c
> +++ b/hw/virtio/vdpa-dev.c
> @@ -259,7 +259,11 @@ static int vhost_vdpa_device_start(VirtIODevice *vdev, 
> Error **errp)
>  goto err_guest_notifiers;
>  }
>  for (i = 0; i < s->dev.nvqs; ++i) {
> -vhost_vdpa_set_vring_ready(&s->vdpa, i);
> +ret = vhost_vdpa_set_vring_ready(&s->vdpa, i);
> +if (ret < 0) {
> +error_setg_errno(errp, -ret, "Error starting vring %d", i);
> +goto err_dev_stop;
> +}
>  }
>  s->started = true;
>
> @@ -274,6 +278,8 @@ static int vhost_vdpa_device_start(VirtIODevice *vdev, 
> Error **errp)
>
>  return ret;
>
> +err_dev_stop:
> +vhost_dev_stop(&s->dev, vdev, false);
>  err_guest_notifiers:
>  k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
>  err_host_notifiers:
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 3726ee5d67..e3d8036479 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -381,7 +381,10 @@ static int vhost_vdpa_net_data_load(NetClientState *nc)
>  }
>
>  for (int i = 0; i < v->dev->nvqs; ++i) {
> -vhost_vdpa_set_vring_ready(v, i + v->dev->vq_index);
> +int ret = vhost_vdpa_set_vring_ready(v, i + v->dev->vq_index);
> +if (ret < 0) {
> +return ret;
> +}
>  }
>  return 0;
>  }
> @@ -1213,7 +1216,10 @@ static int vhost_vdpa_net_cvq_load(NetClientState *nc)
>
>  assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>
> -vhost_vdpa_set_vring_ready(v, v->dev->vq_index);
> +r = vhost_vdpa_set_vring_ready(v, v->dev->vq_index);
> +if (unlikely(r < 0)) {
> +return r;
> +}
>
>  if (v->shadow_vqs_enabled) {
>  n = VIRTIO_NET(v->dev->vdev);
> @@ -1252,7 +1258,10 @@ static int vhost_vdpa_net_cvq_load(NetClientState *nc)
>  }
>
>  for (int i = 0; i < v->dev->vq_index; ++i) {
> -vhost_vdpa_set_vring_ready(v, i);
> +r = vhost_vdpa_set_vring_ready(v, i);
> +if (unlikely(r < 0)) {
> +return r;
> +}
>  }
>
>  return 0;
> --
> 2.43.0
>




Re: [PATCH v2 1/6] virtio/virtio-pci: Handle extra notification data

2024-03-13 Thread Jason Wang
On Wed, Mar 13, 2024 at 7:55 PM Jonah Palmer  wrote:
>
> Add support to virtio-pci devices for handling the extra data sent
> from the driver to the device when the VIRTIO_F_NOTIFICATION_DATA
> transport feature has been negotiated.
>
> The extra data that's passed to the virtio-pci device when this
> feature is enabled varies depending on the device's virtqueue
> layout.
>
> In a split virtqueue layout, this data includes:
>  - upper 16 bits: shadow_avail_idx
>  - lower 16 bits: virtqueue index
>
> In a packed virtqueue layout, this data includes:
>  - upper 16 bits: 1-bit wrap counter & 15-bit shadow_avail_idx
>  - lower 16 bits: virtqueue index
>
> Tested-by: Lei Yang 
> Reviewed-by: Eugenio Pérez 
> Signed-off-by: Jonah Palmer 
> ---
>  hw/virtio/virtio-pci.c | 10 +++---
>  hw/virtio/virtio.c | 18 ++
>  include/hw/virtio/virtio.h |  1 +
>  3 files changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index cb6940fc0e..0f5c3c3b2f 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -384,7 +384,7 @@ static void virtio_ioport_write(void *opaque, uint32_t 
> addr, uint32_t val)
>  {
>  VirtIOPCIProxy *proxy = opaque;
>  VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
> -uint16_t vector;
> +uint16_t vector, vq_idx;
>  hwaddr pa;
>
>  switch (addr) {
> @@ -408,8 +408,12 @@ static void virtio_ioport_write(void *opaque, uint32_t 
> addr, uint32_t val)
>  vdev->queue_sel = val;
>  break;
>  case VIRTIO_PCI_QUEUE_NOTIFY:
> -if (val < VIRTIO_QUEUE_MAX) {
> -virtio_queue_notify(vdev, val);
> +vq_idx = val;
> +if (vq_idx < VIRTIO_QUEUE_MAX) {
> +if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
> +virtio_queue_set_shadow_avail_data(vdev, val);
> +}
> +virtio_queue_notify(vdev, vq_idx);
>  }
>  break;
>  case VIRTIO_PCI_STATUS:
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index d229755eae..bcb9e09df0 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -2255,6 +2255,24 @@ void virtio_queue_set_align(VirtIODevice *vdev, int n, 
> int align)
>  }
>  }
>
> +void virtio_queue_set_shadow_avail_data(VirtIODevice *vdev, uint32_t data)
> +{
> +/* Lower 16 bits is the virtqueue index */
> +uint16_t i = data;
> +VirtQueue *vq = &vdev->vq[i];
> +
> +if (!vq->vring.desc) {
> +return;
> +}
> +
> +if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> +vq->shadow_avail_wrap_counter = (data >> 31) & 0x1;
> +vq->shadow_avail_idx = (data >> 16) & 0x7FFF;
> +} else {
> +vq->shadow_avail_idx = (data >> 16);

Do we need to do a sanity check for this value?

Thanks
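
A possible shape for such a check, assuming split-ring semantics
where the available index is free-running (a sketch, not a confirmed
fix):

    uint16_t idx = data >> 16;
    /* the driver's avail idx may lead used_idx by at most the ring size */
    if ((uint16_t)(idx - vq->used_idx) > vq->vring.num) {
        virtio_error(vdev, "notification data: bad avail idx %u", idx);
        return;
    }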

> +}
> +}
> +
>  static void virtio_queue_notify_vq(VirtQueue *vq)
>  {
>  if (vq->vring.desc && vq->handle_output) {
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index c8f72850bc..53915947a7 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -335,6 +335,7 @@ void virtio_queue_update_rings(VirtIODevice *vdev, int n);
>  void virtio_init_region_cache(VirtIODevice *vdev, int n);
>  void virtio_queue_set_align(VirtIODevice *vdev, int n, int align);
>  void virtio_queue_notify(VirtIODevice *vdev, int n);
> +void virtio_queue_set_shadow_avail_data(VirtIODevice *vdev, uint32_t data);
>  uint16_t virtio_queue_vector(VirtIODevice *vdev, int n);
>  void virtio_queue_set_vector(VirtIODevice *vdev, int n, uint16_t vector);
>  int virtio_queue_set_host_notifier_mr(VirtIODevice *vdev, int n,
> --
> 2.39.3
>




Re: [PULL 0/8] Net patches

2024-03-13 Thread Jason Wang
On Wed, Mar 13, 2024 at 1:56 AM Michael Tokarev  wrote:
>
> 12.03.2024 14:36, Jason Wang wrote:
> ...
> > 
> > Andrew Melnychenko (5):
> >ebpf: Added eBPF map update through mmap.
> >ebpf: Added eBPF initialization by fds.
> >virtio-net: Added property to load eBPF RSS with fds.
> >qmp: Added new command to retrieve eBPF blob.
> >ebpf: Updated eBPF program and skeleton.
> >
> > Laurent Vivier (2):
> >igb: fix link state on resume
> >e1000e: fix link state on resume
> >
> > Nick Briggs (1):
> >Avoid unaligned fetch in ladr_match()
>
>  From the above, I'm picking up igb & e1000e "fix link state on resume"
> and "Avoid unaligned fetch in ladr_match()" for stable.
>
> Please let me know if this is incorrect.
>

It's correct.

Thanks

> Thanks,
>
> /mjt
>




Re: [PATCH v9 0/5] eBPF RSS through QMP support.

2024-03-12 Thread Jason Wang
Hi Andrew:

On Wed, Mar 13, 2024 at 7:11 AM Andrew Melnichenko  wrote:
>
> Hi all,
> Apparently, eBPF code from ebpf/* can't be a part of the 'common'
> library - that breaks non-system/user build. I'll change it to be a
> 'system' library.

I've dropped some of the tracing as a workaround (due to schedule
pressure) since yesterday was a soft freeze and I don't want to miss
it again.

The pull request has been merged. Please fix that on top (add some
tracing back probably).

Thanks

>
> On Fri, Mar 8, 2024 at 10:06 AM Jason Wang  wrote:
> >
> > On Fri, Mar 8, 2024 at 2:30 PM Jason Wang  wrote:
> > >
> > > On Mon, Feb 26, 2024 at 6:23 PM Andrew Melnichenko  
> > > wrote:
> > > >
> > > > Hi all,
> > > > Jason, can you please review the patch set, thank you.
> > >
> > > Queued.
> > >
> > > Thanks
> >
> > This seems to fail CI at:
> >
> > https://gitlab.com/jasowang/qemu/-/jobs/6348725269
> >
> > Please fix this.
> >
> > Thanks
> >
>




Re: [PATCH v9 4/5] qmp: Added new command to retrieve eBPF blob.

2024-03-12 Thread Jason Wang
On Wed, Mar 13, 2024 at 7:13 AM Andrew Melnichenko  wrote:
>
> Hi all,
> I've checked - apparently, qapi/ebpf.json should be added to
> MAINTAINERS - I'll fix it.

I've fixed this by myself and the pull request has been merged.

Thanks

>
> On Fri, Mar 8, 2024 at 10:14 AM Jason Wang  wrote:
> >
> > On Tue, Feb 6, 2024 at 12:55 AM Andrew Melnychenko  
> > wrote:
> > >
> > > Now, the binary objects may be retrieved by id.
> > > This will be required by future QMP commands that need a
> > > specific eBPF blob.
> > >
> > > Added the command "request-ebpf". This command returns
> > > an eBPF program encoded in base64. The program is taken from
> > > the skeleton and is essentially an ELF object that can later
> > > be loaded with libbpf.
> > >
> > > The reason to use a command to provide the eBPF object,
> > > instead of a separate artifact, was to avoid issues related
> > > to finding the eBPF itself. The eBPF object is an ELF binary
> > > that contains the eBPF program and the eBPF map description (BTF).
> > > Overall, the eBPF object should contain the program and enough
> > > metadata to create/load the eBPF with libbpf. As the eBPF
> > > maps/program should correspond to QEMU, the eBPF can't
> > > be used with a different QEMU build.
> > >
> > > The first solution considered was a helper that comes with QEMU
> > > and loads the appropriate eBPF objects. The issue with that is
> > > finding the proper helper when the system has several
> > > different QEMUs installed and/or built from source,
> > > whose helpers may not be compatible.
> > >
> > > Another issue is QEMU being updated while there is a running
> > > QEMU instance. With an updated helper, it may not be
> > > possible to hotplug a virtio-net device into the already
> > > running QEMU. Overall, requesting the eBPF object from
> > > QEMU itself solves possible failures with acceptable effort.
> > >
> > > Links:
> > > [PATCH 3/5] qmp: Added the helper stamp check.
> > > https://lore.kernel.org/all/20230219162100.174318-4-and...@daynix.com/
> > >
> > > Signed-off-by: Andrew Melnychenko 
> > > ---
> > >  ebpf/ebpf.c   | 69 +++
> >
> > Let's add ebpf.c to MAINTAINERS otherwise CI may warn like:
> >
> > https://gitlab.com/jasowang/qemu/-/jobs/6349138969
> >
> > Thanks
> >
>




[PULL 5/8] ebpf: Added eBPF initialization by fds.

2024-03-12 Thread Jason Wang
From: Andrew Melnychenko 

It allows using eBPF file descriptors provided from
outside of QEMU.
QEMU may then run without eBPF capabilities, using an
RSS program provided by a management tool (e.g. libvirt).
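
A hypothetical caller (the fd variables are assumptions; the fallback
mirrors how virtio-net uses this later in the series):

    /* try the fds handed over by the management tool first; fall back
     * to loading the built-in program, which needs eBPF capabilities */
    if (!ebpf_rss_load_fds(&ctx, prog_fd, config_fd, toeplitz_fd, table_fd)) {
        ebpf_rss_load(&ctx);
    }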

Signed-off-by: Andrew Melnychenko 
Signed-off-by: Jason Wang 
---
 ebpf/ebpf_rss-stub.c |  6 ++
 ebpf/ebpf_rss.c  | 27 +++
 ebpf/ebpf_rss.h  |  5 +
 3 files changed, 38 insertions(+)

diff --git a/ebpf/ebpf_rss-stub.c b/ebpf/ebpf_rss-stub.c
index e71e229190..8d7fae2ad9 100644
--- a/ebpf/ebpf_rss-stub.c
+++ b/ebpf/ebpf_rss-stub.c
@@ -28,6 +28,12 @@ bool ebpf_rss_load(struct EBPFRSSContext *ctx)
 return false;
 }
 
+bool ebpf_rss_load_fds(struct EBPFRSSContext *ctx, int program_fd,
+   int config_fd, int toeplitz_fd, int table_fd)
+{
+return false;
+}
+
 bool ebpf_rss_set_all(struct EBPFRSSContext *ctx, struct EBPFRSSConfig *config,
   uint16_t *indirections_table, uint8_t *toeplitz_key)
 {
diff --git a/ebpf/ebpf_rss.c b/ebpf/ebpf_rss.c
index f774d9636b..150aa40813 100644
--- a/ebpf/ebpf_rss.c
+++ b/ebpf/ebpf_rss.c
@@ -146,6 +146,33 @@ error:
 return false;
 }
 
+bool ebpf_rss_load_fds(struct EBPFRSSContext *ctx, int program_fd,
+   int config_fd, int toeplitz_fd, int table_fd)
+{
+if (ebpf_rss_is_loaded(ctx)) {
+return false;
+}
+
+if (program_fd < 0 || config_fd < 0 || toeplitz_fd < 0 || table_fd < 0) {
+return false;
+}
+
+ctx->program_fd = program_fd;
+ctx->map_configuration = config_fd;
+ctx->map_toeplitz_key = toeplitz_fd;
+ctx->map_indirections_table = table_fd;
+
+if (!ebpf_rss_mmap(ctx)) {
+ctx->program_fd = -1;
+ctx->map_configuration = -1;
+ctx->map_toeplitz_key = -1;
+ctx->map_indirections_table = -1;
+return false;
+}
+
+return true;
+}
+
 static bool ebpf_rss_set_config(struct EBPFRSSContext *ctx,
 struct EBPFRSSConfig *config)
 {
diff --git a/ebpf/ebpf_rss.h b/ebpf/ebpf_rss.h
index ab08a7266d..239242b0d2 100644
--- a/ebpf/ebpf_rss.h
+++ b/ebpf/ebpf_rss.h
@@ -14,6 +14,8 @@
 #ifndef QEMU_EBPF_RSS_H
 #define QEMU_EBPF_RSS_H
 
+#define EBPF_RSS_MAX_FDS 4
+
 struct EBPFRSSContext {
 void *obj;
 int program_fd;
@@ -41,6 +43,9 @@ bool ebpf_rss_is_loaded(struct EBPFRSSContext *ctx);
 
 bool ebpf_rss_load(struct EBPFRSSContext *ctx);
 
+bool ebpf_rss_load_fds(struct EBPFRSSContext *ctx, int program_fd,
+   int config_fd, int toeplitz_fd, int table_fd);
+
 bool ebpf_rss_set_all(struct EBPFRSSContext *ctx, struct EBPFRSSConfig *config,
   uint16_t *indirections_table, uint8_t *toeplitz_key);
 
-- 
2.42.0




[PULL 3/8] Avoid unaligned fetch in ladr_match()

2024-03-12 Thread Jason Wang
From: Nick Briggs 

There is no guarantee that the PCNetState is allocated such that
csr[8] lands on an 8-byte boundary.  Since not all hosts are
capable of unaligned fetches, the 16-bit elements need to be fetched
individually to avoid a potential fault.  Closes issue #2143.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2143
Signed-off-by: Nick Briggs 
Reviewed-by: Peter Maydell 
Signed-off-by: Jason Wang 
---
 hw/net/pcnet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/pcnet.c b/hw/net/pcnet.c
index 494eab8479..ad675ab29d 100644
--- a/hw/net/pcnet.c
+++ b/hw/net/pcnet.c
@@ -632,7 +632,7 @@ static inline int ladr_match(PCNetState *s, const uint8_t 
*buf, int size)
 {
 struct qemu_ether_header *hdr = (void *)buf;
 if ((*(hdr->ether_dhost)&0x01) &&
-((uint64_t *)&s->csr[8])[0] != 0LL) {
+(s->csr[8] | s->csr[9] | s->csr[10] | s->csr[11]) != 0) {
 uint8_t ladr[8] = {
 s->csr[8] & 0xff, s->csr[8] >> 8,
 s->csr[9] & 0xff, s->csr[9] >> 8,
-- 
2.42.0




[PULL 8/8] ebpf: Updated eBPF program and skeleton.

2024-03-12 Thread Jason Wang
From: Andrew Melnychenko 

Updated the section name, so libbpf can infer the proper
program type without explicit specification during open/load.
Also, added map_flags with explicitly declared BPF_F_MMAPABLE.
Added a check for the BPF_F_MMAPABLE flag to the meson script,
along with a libbpf version requirement.
Also changed the fragmentation flag check - previously, some TCP/UDP
packets could be considered fragmented merely because the DF flag was set.
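
The intended distinction, roughly (a sketch; the real change is in
tools/ebpf/rss.bpf.c, and the field names here are assumptions):

    /* a packet is a real fragment only if MF is set or the fragment
     * offset is non-zero; DF (0x4000) alone does not make it one */
    bool is_fragmented = (iph->frag_off & bpf_htons(0x2000 | 0x1fff)) != 0;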

Signed-off-by: Andrew Melnychenko 
Signed-off-by: Jason Wang 
---
 ebpf/rss.bpf.skeleton.h | 1343 ---
 meson.build |   10 +-
 tools/ebpf/rss.bpf.c|7 +-
 3 files changed, 687 insertions(+), 673 deletions(-)

diff --git a/ebpf/rss.bpf.skeleton.h b/ebpf/rss.bpf.skeleton.h
index 18eb2adb12..aed4ef9a03 100644
--- a/ebpf/rss.bpf.skeleton.h
+++ b/ebpf/rss.bpf.skeleton.h
@@ -176,642 +176,647 @@ err:
 
 static inline const void *rss_bpf__elf_bytes(size_t *sz)
 {
-   *sz = 20440;
+   *sz = 20600;
return (const void *)"\
[~50 further lines of escaped rss.bpf.skeleton.h bytes in this hunk
were truncated by the list archive and are omitted here]

[PULL 6/8] virtio-net: Added property to load eBPF RSS with fds.

2024-03-12 Thread Jason Wang
From: Andrew Melnychenko 

The eBPF RSS program and maps may now be passed in during initialization.
This was initially implemented so that libvirt can launch QEMU without
extra permissions, with the eBPF program initialized through a helper.
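
A plausible invocation (assumptions: JSON-list syntax for qdev array
properties, and fd names previously registered with getfd):

    {"execute": "getfd", "arguments": {"fdname": "rss-prog"}}
    (... likewise for "rss-cfg", "rss-key" and "rss-table" ...)
    {"execute": "device_add", "arguments": {
        "driver": "virtio-net-pci", "netdev": "nd0", "rss": true,
        "ebpf-rss-fds": ["rss-prog", "rss-cfg", "rss-key", "rss-table"]}}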

Signed-off-by: Andrew Melnychenko 
Signed-off-by: Jason Wang 
---
 hw/net/virtio-net.c| 54 ++
 include/hw/virtio/virtio-net.h |  2 ++
 2 files changed, 50 insertions(+), 6 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index a3c711b56d..403a693baf 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -42,6 +42,7 @@
 #include "sysemu/sysemu.h"
 #include "trace.h"
 #include "monitor/qdev.h"
+#include "monitor/monitor.h"
 #include "hw/pci/pci_device.h"
 #include "net_rx_pkt.h"
 #include "hw/virtio/vhost.h"
@@ -1328,14 +1329,53 @@ static void virtio_net_detach_epbf_rss(VirtIONet *n)
 virtio_net_attach_ebpf_to_backend(n->nic, -1);
 }
 
-static bool virtio_net_load_ebpf(VirtIONet *n)
+static bool virtio_net_load_ebpf_fds(VirtIONet *n, Error **errp)
 {
-if (!virtio_net_attach_ebpf_to_backend(n->nic, -1)) {
-/* backend doesn't support steering ebpf */
-return false;
+int fds[EBPF_RSS_MAX_FDS] = { [0 ... EBPF_RSS_MAX_FDS - 1] = -1};
+int ret = true;
+int i = 0;
+
+ERRP_GUARD();
+
+if (n->nr_ebpf_rss_fds != EBPF_RSS_MAX_FDS) {
+error_setg(errp,
+  "Expected %d file descriptors but got %d",
+  EBPF_RSS_MAX_FDS, n->nr_ebpf_rss_fds);
+   return false;
+   }
+
+for (i = 0; i < n->nr_ebpf_rss_fds; i++) {
+fds[i] = monitor_fd_param(monitor_cur(), n->ebpf_rss_fds[i], errp);
+if (*errp) {
+ret = false;
+goto exit;
+}
+}
+
+ret = ebpf_rss_load_fds(&n->ebpf_rss, fds[0], fds[1], fds[2], fds[3]);
+
+exit:
+if (!ret || *errp) {
+for (i = 0; i < n->nr_ebpf_rss_fds && fds[i] != -1; i++) {
+close(fds[i]);
+}
 }
 
-return ebpf_rss_load(&n->ebpf_rss);
+return ret;
+}
+
+static bool virtio_net_load_ebpf(VirtIONet *n, Error **errp)
+{
+bool ret = false;
+
+if (virtio_net_attach_ebpf_to_backend(n->nic, -1)) {
+if (!(n->ebpf_rss_fds
+&& virtio_net_load_ebpf_fds(n, errp))) {
+ret = ebpf_rss_load(&n->ebpf_rss);
+}
+}
+
+return ret;
 }
 
 static void virtio_net_unload_ebpf(VirtIONet *n)
@@ -3768,7 +3808,7 @@ static void virtio_net_device_realize(DeviceState *dev, 
Error **errp)
 net_rx_pkt_init(&n->rx_pkt);
 
 if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
-virtio_net_load_ebpf(n);
+virtio_net_load_ebpf(n, errp);
 }
 }
 
@@ -3930,6 +3970,8 @@ static Property virtio_net_properties[] = {
 VIRTIO_NET_F_RSS, false),
 DEFINE_PROP_BIT64("hash", VirtIONet, host_features,
 VIRTIO_NET_F_HASH_REPORT, false),
+DEFINE_PROP_ARRAY("ebpf-rss-fds", VirtIONet, nr_ebpf_rss_fds,
+  ebpf_rss_fds, qdev_prop_string, char*),
 DEFINE_PROP_BIT64("guest_rsc_ext", VirtIONet, host_features,
 VIRTIO_NET_F_RSC_EXT, false),
 DEFINE_PROP_UINT32("rsc_interval", VirtIONet, rsc_timeout,
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index eaee8f4243..060c23c04d 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -225,6 +225,8 @@ struct VirtIONet {
 VirtioNetRssData rss_data;
 struct NetRxPkt *rx_pkt;
 struct EBPFRSSContext ebpf_rss;
+uint32_t nr_ebpf_rss_fds;
+char **ebpf_rss_fds;
 };
 
 size_t virtio_net_handle_ctrl_iov(VirtIODevice *vdev,
-- 
2.42.0




[PULL 2/8] e1000e: fix link state on resume

2024-03-12 Thread Jason Wang
From: Laurent Vivier 

On resume e1000e_vm_state_change() always calls e1000e_autoneg_resume()
that sets link_down to false, and thus activates the link even
if we have disabled it.

The problem can be reproduced starting qemu in paused state (-S) and
then set the link to down. When we resume the machine the link appears
to be up.

Reproducer:

   # qemu-system-x86_64 ... -device e1000e,netdev=netdev0,id=net0 -S

   {"execute": "qmp_capabilities" }
   {"execute": "set_link", "arguments": {"name": "net0", "up": false}}
   {"execute": "cont" }

To fix the problem, merge the content of e1000e_vm_state_change()
into e1000e_core_post_load() as e1000 does.

Buglink: https://issues.redhat.com/browse/RHEL-21867
Fixes: 6f3fbe4ed06a ("net: Introduce e1000e device emulation")
Suggested-by: Akihiko Odaki 
Signed-off-by: Laurent Vivier 
Signed-off-by: Jason Wang 
---
 hw/net/e1000e_core.c | 60 ++--
 hw/net/e1000e_core.h |  2 --
 2 files changed, 7 insertions(+), 55 deletions(-)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index e324c02dd5..3ae2a184d5 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -123,14 +123,6 @@ e1000e_intmgr_timer_resume(E1000IntrDelayTimer *timer)
 }
 }
 
-static void
-e1000e_intmgr_timer_pause(E1000IntrDelayTimer *timer)
-{
-if (timer->running) {
-timer_del(timer->timer);
-}
-}
-
 static inline void
 e1000e_intrmgr_stop_timer(E1000IntrDelayTimer *timer)
 {
@@ -398,24 +390,6 @@ e1000e_intrmgr_resume(E1000ECore *core)
 }
 }
 
-static void
-e1000e_intrmgr_pause(E1000ECore *core)
-{
-int i;
-
-e1000e_intmgr_timer_pause(&core->radv);
-e1000e_intmgr_timer_pause(&core->rdtr);
-e1000e_intmgr_timer_pause(&core->raid);
-e1000e_intmgr_timer_pause(&core->tidv);
-e1000e_intmgr_timer_pause(&core->tadv);
-
-e1000e_intmgr_timer_pause(&core->itr);
-
-for (i = 0; i < E1000E_MSIX_VEC_NUM; i++) {
-e1000e_intmgr_timer_pause(&core->eitr[i]);
-}
-}
-
 static void
 e1000e_intrmgr_reset(E1000ECore *core)
 {
@@ -3334,12 +3308,6 @@ e1000e_core_read(E1000ECore *core, hwaddr addr, unsigned 
size)
 return 0;
 }
 
-static inline void
-e1000e_autoneg_pause(E1000ECore *core)
-{
-timer_del(core->autoneg_timer);
-}
-
 static void
 e1000e_autoneg_resume(E1000ECore *core)
 {
@@ -3351,22 +3319,6 @@ e1000e_autoneg_resume(E1000ECore *core)
 }
 }
 
-static void
-e1000e_vm_state_change(void *opaque, bool running, RunState state)
-{
-E1000ECore *core = opaque;
-
-if (running) {
-trace_e1000e_vm_state_running();
-e1000e_intrmgr_resume(core);
-e1000e_autoneg_resume(core);
-} else {
-trace_e1000e_vm_state_stopped();
-e1000e_autoneg_pause(core);
-e1000e_intrmgr_pause(core);
-}
-}
-
 void
 e1000e_core_pci_realize(E1000ECore *core,
 const uint16_t *eeprom_templ,
@@ -3379,9 +3331,6 @@ e1000e_core_pci_realize(E1000ECore *core,
e1000e_autoneg_timer, core);
 e1000e_intrmgr_pci_realize(core);
 
-core->vmstate =
-qemu_add_vm_change_state_handler(e1000e_vm_state_change, core);
-
 for (i = 0; i < E1000E_NUM_QUEUES; i++) {
 net_tx_pkt_init(&core->tx[i].tx_pkt, E1000E_MAX_TX_FRAGS);
 }
@@ -3405,8 +3354,6 @@ e1000e_core_pci_uninit(E1000ECore *core)
 
 e1000e_intrmgr_pci_unint(core);
 
-qemu_del_vm_change_state_handler(core->vmstate);
-
 for (i = 0; i < E1000E_NUM_QUEUES; i++) {
 net_tx_pkt_uninit(core->tx[i].tx_pkt);
 }
@@ -3576,5 +3523,12 @@ e1000e_core_post_load(E1000ECore *core)
  */
 nc->link_down = (core->mac[STATUS] & E1000_STATUS_LU) == 0;
 
+/*
+ * we need to restart intrmgr timers, as an older version of
+ * QEMU may have stopped them before migration
+ */
+e1000e_intrmgr_resume(core);
+e1000e_autoneg_resume(core);
+
 return 0;
 }
diff --git a/hw/net/e1000e_core.h b/hw/net/e1000e_core.h
index 66b025cc43..01510ca78b 100644
--- a/hw/net/e1000e_core.h
+++ b/hw/net/e1000e_core.h
@@ -98,8 +98,6 @@ struct E1000Core {
 
 E1000IntrDelayTimer eitr[E1000E_MSIX_VEC_NUM];
 
-VMChangeStateEntry *vmstate;
-
 uint32_t itr_guest_value;
 uint32_t eitr_guest_value[E1000E_MSIX_VEC_NUM];
 
-- 
2.42.0




[PULL 7/8] qmp: Added new command to retrieve eBPF blob.

2024-03-12 Thread Jason Wang
From: Andrew Melnychenko 

Now, the binary objects may be retrieved by id.
This will be required by future QMP commands that need a
specific eBPF blob.

Added the command "request-ebpf". This command returns
an eBPF program encoded in base64. The program is taken from
the skeleton and is essentially an ELF object that can later
be loaded with libbpf.

The reason to use a command to provide the eBPF object,
instead of a separate artifact, was to avoid issues related
to finding the eBPF itself. The eBPF object is an ELF binary
that contains the eBPF program and the eBPF map description (BTF).
Overall, the eBPF object should contain the program and enough
metadata to create/load the eBPF with libbpf. As the eBPF
maps/program should correspond to QEMU, the eBPF can't
be used with a different QEMU build.

The first solution considered was a helper that comes with QEMU
and loads the appropriate eBPF objects. The issue with that is
finding the proper helper when the system has several
different QEMUs installed and/or built from source,
whose helpers may not be compatible.

Another issue is QEMU being updated while there is a running
QEMU instance. With an updated helper, it may not be
possible to hotplug a virtio-net device into the already
running QEMU. Overall, requesting the eBPF object from
QEMU itself solves possible failures with acceptable effort.
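
A usage sketch (assuming the QAPI schema registers the program id as
"rss"; the returned base64 payload is truncated here):

    {"execute": "request-ebpf", "arguments": {"id": "rss"}}
    {"return": {"object": "f0VMRgIBAQAAAA..."}}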

Links:
[PATCH 3/5] qmp: Added the helper stamp check.
https://lore.kernel.org/all/20230219162100.174318-4-and...@daynix.com/

Signed-off-by: Andrew Melnychenko 
Signed-off-by: Jason Wang 
---
 ebpf/ebpf.c   | 69 +++
 ebpf/ebpf.h   | 29 ++
 ebpf/ebpf_rss.c   | 11 ---
 ebpf/meson.build  |  2 +-
 ebpf/trace.h  |  1 -
 qapi/ebpf.json| 66 +
 qapi/meson.build  |  1 +
 qapi/qapi-schema.json |  1 +
 8 files changed, 172 insertions(+), 8 deletions(-)
 create mode 100644 ebpf/ebpf.c
 create mode 100644 ebpf/ebpf.h
 delete mode 100644 ebpf/trace.h
 create mode 100644 qapi/ebpf.json

diff --git a/ebpf/ebpf.c b/ebpf/ebpf.c
new file mode 100644
index 00..2d73beb479
--- /dev/null
+++ b/ebpf/ebpf.c
@@ -0,0 +1,69 @@
+/*
+ * QEMU eBPF binary declaration routine.
+ *
+ * Developed by Daynix Computing LTD (http://www.daynix.com)
+ *
+ * Authors:
+ *  Andrew Melnychenko 
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/queue.h"
+#include "qapi/error.h"
+#include "qapi/qapi-commands-ebpf.h"
+#include "ebpf/ebpf.h"
+
+typedef struct ElfBinaryDataEntry {
+int id;
+const void *data;
+size_t datalen;
+
+QSLIST_ENTRY(ElfBinaryDataEntry) node;
+} ElfBinaryDataEntry;
+
+static QSLIST_HEAD(, ElfBinaryDataEntry) ebpf_elf_obj_list =
+QSLIST_HEAD_INITIALIZER();
+
+void ebpf_register_binary_data(int id, const void *data, size_t datalen)
+{
+struct ElfBinaryDataEntry *dataentry = NULL;
+
+dataentry = g_new0(struct ElfBinaryDataEntry, 1);
+dataentry->data = data;
+dataentry->datalen = datalen;
+dataentry->id = id;
+
+QSLIST_INSERT_HEAD(&ebpf_elf_obj_list, dataentry, node);
+}
+
+const void *ebpf_find_binary_by_id(int id, size_t *sz, Error **errp)
+{
+struct ElfBinaryDataEntry *it = NULL;
+QSLIST_FOREACH(it, &ebpf_elf_obj_list, node) {
+if (id == it->id) {
+*sz = it->datalen;
+return it->data;
+}
+}
+
+error_setg(errp, "can't find eBPF object with id: %d", id);
+
+return NULL;
+}
+
+EbpfObject *qmp_request_ebpf(EbpfProgramID id, Error **errp)
+{
+EbpfObject *ret = NULL;
+size_t size = 0;
+const void *data = ebpf_find_binary_by_id(id, &size, errp);
+if (!data) {
+return NULL;
+}
+
+ret = g_new0(EbpfObject, 1);
+ret->object = g_base64_encode(data, size);
+
+return ret;
+}
diff --git a/ebpf/ebpf.h b/ebpf/ebpf.h
new file mode 100644
index 00..378d4e9c70
--- /dev/null
+++ b/ebpf/ebpf.h
@@ -0,0 +1,29 @@
+/*
+ * QEMU eBPF binary declaration routine.
+ *
+ * Developed by Daynix Computing LTD (http://www.daynix.com)
+ *
+ * Authors:
+ *  Andrew Melnychenko 
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef EBPF_H
+#define EBPF_H
+
+
+void ebpf_register_binary_data(int id, const void *data,
+   size_t datalen);
+const void *ebpf_find_binary_by_id(int id, size_t *sz,
+   struct Error **errp);
+
+#define ebpf_binary_init(id, fn)   \
+static void __attribute__((constructor)) ebpf_binary_init_ ## fn(void) \
+{  \
+size_t datalen = 0;\
+const void *data = fn();   \
+ebpf_register_binary_data(id, data, datalen

[PULL 4/8] ebpf: Added eBPF map update through mmap.

2024-03-12 Thread Jason Wang
From: Andrew Melnychenko 

Changed eBPF map updates to go through mmapped arrays.
Mmapped arrays provide direct access to map data.
This avoids the bpf_map_update_elem() call,
which may require capabilities that are not present.
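
The mechanism in miniature (a sketch, assuming the map was created
with BPF_F_MMAPABLE):

    /* an mmapable array map can be updated with plain stores */
    uint16_t *table = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                           MAP_SHARED, map_fd, 0);
    table[hash % table_len] = queue;   /* no bpf() syscall, no extra caps */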

Signed-off-by: Andrew Melnychenko 
Signed-off-by: Jason Wang 
---
 ebpf/ebpf_rss.c | 117 ++--
 ebpf/ebpf_rss.h |   5 +++
 2 files changed, 99 insertions(+), 23 deletions(-)

diff --git a/ebpf/ebpf_rss.c b/ebpf/ebpf_rss.c
index cee658c158..f774d9636b 100644
--- a/ebpf/ebpf_rss.c
+++ b/ebpf/ebpf_rss.c
@@ -27,19 +27,83 @@ void ebpf_rss_init(struct EBPFRSSContext *ctx)
 {
 if (ctx != NULL) {
 ctx->obj = NULL;
+ctx->program_fd = -1;
+ctx->map_configuration = -1;
+ctx->map_toeplitz_key = -1;
+ctx->map_indirections_table = -1;
+
+ctx->mmap_configuration = NULL;
+ctx->mmap_toeplitz_key = NULL;
+ctx->mmap_indirections_table = NULL;
 }
 }
 
 bool ebpf_rss_is_loaded(struct EBPFRSSContext *ctx)
 {
-return ctx != NULL && ctx->obj != NULL;
+return ctx != NULL && (ctx->obj != NULL || ctx->program_fd != -1);
+}
+
+static bool ebpf_rss_mmap(struct EBPFRSSContext *ctx)
+{
+if (!ebpf_rss_is_loaded(ctx)) {
+return false;
+}
+
+ctx->mmap_configuration = mmap(NULL, qemu_real_host_page_size(),
+   PROT_READ | PROT_WRITE, MAP_SHARED,
+   ctx->map_configuration, 0);
+if (ctx->mmap_configuration == MAP_FAILED) {
+trace_ebpf_error("eBPF RSS", "can not mmap eBPF configuration array");
+return false;
+}
+ctx->mmap_toeplitz_key = mmap(NULL, qemu_real_host_page_size(),
+   PROT_READ | PROT_WRITE, MAP_SHARED,
+   ctx->map_toeplitz_key, 0);
+if (ctx->mmap_toeplitz_key == MAP_FAILED) {
+trace_ebpf_error("eBPF RSS", "can not mmap eBPF toeplitz key");
+goto toeplitz_fail;
+}
+ctx->mmap_indirections_table = mmap(NULL, qemu_real_host_page_size(),
+   PROT_READ | PROT_WRITE, MAP_SHARED,
+   ctx->map_indirections_table, 0);
+if (ctx->mmap_indirections_table == MAP_FAILED) {
+trace_ebpf_error("eBPF RSS", "can not mmap eBPF indirection table");
+goto indirection_fail;
+}
+
+return true;
+
+indirection_fail:
+munmap(ctx->mmap_toeplitz_key, qemu_real_host_page_size());
+ctx->mmap_toeplitz_key = NULL;
+toeplitz_fail:
+munmap(ctx->mmap_configuration, qemu_real_host_page_size());
+ctx->mmap_configuration = NULL;
+
+ctx->mmap_indirections_table = NULL;
+return false;
+}
+
+static void ebpf_rss_munmap(struct EBPFRSSContext *ctx)
+{
+if (!ebpf_rss_is_loaded(ctx)) {
+return;
+}
+
+munmap(ctx->mmap_indirections_table, qemu_real_host_page_size());
+munmap(ctx->mmap_toeplitz_key, qemu_real_host_page_size());
+munmap(ctx->mmap_configuration, qemu_real_host_page_size());
+
+ctx->mmap_configuration = NULL;
+ctx->mmap_toeplitz_key = NULL;
+ctx->mmap_indirections_table = NULL;
 }
 
 bool ebpf_rss_load(struct EBPFRSSContext *ctx)
 {
 struct rss_bpf *rss_bpf_ctx;
 
-if (ctx == NULL) {
+if (ebpf_rss_is_loaded(ctx)) {
 return false;
 }
 
@@ -66,10 +130,18 @@ bool ebpf_rss_load(struct EBPFRSSContext *ctx)
 ctx->map_toeplitz_key = bpf_map__fd(
 rss_bpf_ctx->maps.tap_rss_map_toeplitz_key);
 
+if (!ebpf_rss_mmap(ctx)) {
+goto error;
+}
+
 return true;
 error:
 rss_bpf__destroy(rss_bpf_ctx);
 ctx->obj = NULL;
+ctx->program_fd = -1;
+ctx->map_configuration = -1;
+ctx->map_toeplitz_key = -1;
+ctx->map_indirections_table = -1;
 
 return false;
 }
@@ -77,15 +149,11 @@ error:
 static bool ebpf_rss_set_config(struct EBPFRSSContext *ctx,
 struct EBPFRSSConfig *config)
 {
-uint32_t map_key = 0;
-
 if (!ebpf_rss_is_loaded(ctx)) {
 return false;
 }
-if (bpf_map_update_elem(ctx->map_configuration,
-&map_key, config, 0) < 0) {
-return false;
-}
+
+memcpy(ctx->mmap_configuration, config, sizeof(*config));
 return true;
 }
 
@@ -93,27 +161,19 @@ static bool ebpf_rss_set_indirections_table(struct 
EBPFRSSContext *ctx,
 uint16_t *indirections_table,
 size_t len)
 {
-uint32_t i = 0;
-
 if (!ebpf_rss_is_loaded(ctx) || indirections_table == NULL ||
len > VIRTIO_NET_RSS_MAX_TABLE_LEN) {
 return false;
 }
 
-for (; i < len; ++i) {
-if (bpf_map_update_elem(ctx->map_indirec

[PULL 1/8] igb: fix link state on resume

2024-03-12 Thread Jason Wang
From: Laurent Vivier 

On resume igb_vm_state_change() always calls igb_autoneg_resume()
that sets link_down to false, and thus activates the link even
if we have disabled it.

The problem can be reproduced starting qemu in paused state (-S) and
then set the link to down. When we resume the machine the link appears
to be up.

Reproducer:

   # qemu-system-x86_64 ... -device igb,netdev=netdev0,id=net0 -S

   {"execute": "qmp_capabilities" }
   {"execute": "set_link", "arguments": {"name": "net0", "up": false}}
   {"execute": "cont" }

To fix the problem, merge the content of igb_vm_state_change()
into igb_core_post_load() as e1000 does.

Buglink: https://issues.redhat.com/browse/RHEL-21867
Fixes: 3a977deebe6b ("Intrdocue igb device emulation")
Cc: akihiko.od...@daynix.com
Suggested-by: Akihiko Odaki 
Signed-off-by: Laurent Vivier 
Signed-off-by: Jason Wang 
---
 hw/net/igb_core.c | 51 +++
 hw/net/igb_core.h |  2 --
 2 files changed, 7 insertions(+), 46 deletions(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 2a7a11aa9e..bcd5f6cd9c 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -160,14 +160,6 @@ igb_intmgr_timer_resume(IGBIntrDelayTimer *timer)
 }
 }
 
-static void
-igb_intmgr_timer_pause(IGBIntrDelayTimer *timer)
-{
-if (timer->running) {
-timer_del(timer->timer);
-}
-}
-
 static void
 igb_intrmgr_on_msix_throttling_timer(void *opaque)
 {
@@ -212,16 +204,6 @@ igb_intrmgr_resume(IGBCore *core)
 }
 }
 
-static void
-igb_intrmgr_pause(IGBCore *core)
-{
-int i;
-
-for (i = 0; i < IGB_INTR_NUM; i++) {
-igb_intmgr_timer_pause(&core->eitr[i]);
-}
-}
-
 static void
 igb_intrmgr_reset(IGBCore *core)
 {
@@ -4290,12 +4272,6 @@ igb_core_read(IGBCore *core, hwaddr addr, unsigned size)
 return 0;
 }
 
-static inline void
-igb_autoneg_pause(IGBCore *core)
-{
-timer_del(core->autoneg_timer);
-}
-
 static void
 igb_autoneg_resume(IGBCore *core)
 {
@@ -4307,22 +4283,6 @@ igb_autoneg_resume(IGBCore *core)
 }
 }
 
-static void
-igb_vm_state_change(void *opaque, bool running, RunState state)
-{
-IGBCore *core = opaque;
-
-if (running) {
-trace_e1000e_vm_state_running();
-igb_intrmgr_resume(core);
-igb_autoneg_resume(core);
-} else {
-trace_e1000e_vm_state_stopped();
-igb_autoneg_pause(core);
-igb_intrmgr_pause(core);
-}
-}
-
 void
 igb_core_pci_realize(IGBCore*core,
  const uint16_t *eeprom_templ,
@@ -4335,8 +4295,6 @@ igb_core_pci_realize(IGBCore*core,
igb_autoneg_timer, core);
 igb_intrmgr_pci_realize(core);
 
-core->vmstate = qemu_add_vm_change_state_handler(igb_vm_state_change, 
core);
-
 for (i = 0; i < IGB_NUM_QUEUES; i++) {
 net_tx_pkt_init(&core->tx[i].tx_pkt, E1000E_MAX_TX_FRAGS);
 }
@@ -4360,8 +4318,6 @@ igb_core_pci_uninit(IGBCore *core)
 
 igb_intrmgr_pci_unint(core);
 
-qemu_del_vm_change_state_handler(core->vmstate);
-
 for (i = 0; i < IGB_NUM_QUEUES; i++) {
 net_tx_pkt_uninit(core->tx[i].tx_pkt);
 }
@@ -4586,5 +4542,12 @@ igb_core_post_load(IGBCore *core)
  */
 nc->link_down = (core->mac[STATUS] & E1000_STATUS_LU) == 0;
 
+/*
+ * we need to restart intrmgr timers, as an older version of
+ * QEMU may have stopped them before migration
+ */
+igb_intrmgr_resume(core);
+igb_autoneg_resume(core);
+
 return 0;
 }
diff --git a/hw/net/igb_core.h b/hw/net/igb_core.h
index bf8c46f26b..d70b54e318 100644
--- a/hw/net/igb_core.h
+++ b/hw/net/igb_core.h
@@ -90,8 +90,6 @@ struct IGBCore {
 
 IGBIntrDelayTimer eitr[IGB_INTR_NUM];
 
-VMChangeStateEntry *vmstate;
-
 uint32_t eitr_guest_value[IGB_INTR_NUM];
 
 uint8_t permanent_mac[ETH_ALEN];
-- 
2.42.0




[PULL 0/8] Net patches

2024-03-12 Thread Jason Wang
The following changes since commit 05ec974671200814fa5c1d5db710e0e4b88a40af:

  Merge tag 'm68k-for-9.0-pull-request' of https://github.com/vivier/qemu-m68k 
into staging (2024-03-11 18:42:53 +)

are available in the Git repository at:

  https://github.com/jasowang/qemu.git tags/net-pull-request

for you to fetch changes up to 0cc14182aba961f4c34a21dd202ce6e4a87470f5:

  ebpf: Updated eBPF program and skeleton. (2024-03-12 19:31:47 +0800)


-BEGIN PGP SIGNATURE-

iQEzBAABCAAdFiEEIV1G9IJGaJ7HfzVi7wSWWzmNYhEFAmXwPUAACgkQ7wSWWzmN
YhFnIwgAgctDniJwlRxXB01eVlzXz7IulHnpSby07XEJxENSpGB8ufaeE4eK5gJy
NVK6C2+1EU2vRxm4oIdcvtN4C4/jtRbYYjiSTx7eE4FmSkqshSnR5XCV72LDqG3i
WbzInjMvYfysmcMXLfrWgxOnVew9WqEzlpEWlc7FfNKnkzBVf+JDztfqCUx0XM7H
qefw4ImjqQw993QxJpipXC7aEGUyouB0RIBB71FkCa9ihlh9x7W68evbOI/jTn5q
HWuStgS02sKHjRFliMbdbMY77FNUz4Yroo/GKSvGt64atxkQSJqPNAV+/9n18LNy
QAH5eK6cXFPOIAaYpADU5kHDVVAFiw==
=iBdx
-END PGP SIGNATURE-


Andrew Melnychenko (5):
  ebpf: Added eBPF map update through mmap.
  ebpf: Added eBPF initialization by fds.
  virtio-net: Added property to load eBPF RSS with fds.
  qmp: Added new command to retrieve eBPF blob.
  ebpf: Updated eBPF program and skeleton.

Laurent Vivier (2):
  igb: fix link state on resume
  e1000e: fix link state on resume

Nick Briggs (1):
  Avoid unaligned fetch in ladr_match()

 ebpf/ebpf.c|   69 +++
 ebpf/ebpf.h|   29 +
 ebpf/ebpf_rss-stub.c   |6 +
 ebpf/ebpf_rss.c|  149 -
 ebpf/ebpf_rss.h|   10 +
 ebpf/meson.build   |2 +-
 ebpf/rss.bpf.skeleton.h| 1343 
 ebpf/trace.h   |1 -
 hw/net/e1000e_core.c   |   60 +-
 hw/net/e1000e_core.h   |2 -
 hw/net/igb_core.c  |   51 +-
 hw/net/igb_core.h  |2 -
 hw/net/pcnet.c |2 +-
 hw/net/virtio-net.c|   54 +-
 include/hw/virtio/virtio-net.h |2 +
 meson.build|   10 +-
 qapi/ebpf.json |   66 ++
 qapi/meson.build   |1 +
 qapi/qapi-schema.json  |1 +
 tools/ebpf/rss.bpf.c   |7 +-
 20 files changed, 1058 insertions(+), 809 deletions(-)
 create mode 100644 ebpf/ebpf.c
 create mode 100644 ebpf/ebpf.h
 delete mode 100644 ebpf/trace.h
 create mode 100644 qapi/ebpf.json




Re: [PATCH v9 4/5] qmp: Added new command to retrieve eBPF blob.

2024-03-08 Thread Jason Wang
On Tue, Feb 6, 2024 at 12:55 AM Andrew Melnychenko  wrote:
>
> Now, the binary objects may be retrieved by id.
> This will be required by future QMP commands that need a
> specific eBPF blob.
>
> Added the command "request-ebpf". This command returns
> an eBPF program encoded in base64. The program is taken from
> the skeleton and is essentially an ELF object that can later
> be loaded with libbpf.
>
> The reason to use a command to provide the eBPF object,
> instead of a separate artifact, was to avoid issues related
> to finding the eBPF itself. The eBPF object is an ELF binary
> that contains the eBPF program and the eBPF map description (BTF).
> Overall, the eBPF object should contain the program and enough
> metadata to create/load the eBPF with libbpf. As the eBPF
> maps/program should correspond to QEMU, the eBPF can't
> be used with a different QEMU build.
>
> The first solution considered was a helper that comes with QEMU
> and loads the appropriate eBPF objects. The issue with that is
> finding the proper helper when the system has several
> different QEMUs installed and/or built from source,
> whose helpers may not be compatible.
>
> Another issue is QEMU being updated while there is a running
> QEMU instance. With an updated helper, it may not be
> possible to hotplug a virtio-net device into the already
> running QEMU. Overall, requesting the eBPF object from
> QEMU itself solves possible failures with acceptable effort.
>
> Links:
> [PATCH 3/5] qmp: Added the helper stamp check.
> https://lore.kernel.org/all/20230219162100.174318-4-and...@daynix.com/
>
> Signed-off-by: Andrew Melnychenko 
> ---
>  ebpf/ebpf.c   | 69 +++

Let's add ebpf.c to MAINTAINERS otherwise CI may warn like:

https://gitlab.com/jasowang/qemu/-/jobs/6349138969

Thanks




Re: [PATCH v2 2/2] e1000e: fix link state on resume

2024-03-08 Thread Jason Wang
On Tue, Mar 5, 2024 at 6:07 PM Laurent Vivier  wrote:
>
> On 2/1/24 06:45, Jason Wang wrote:
> > On Wed, Jan 24, 2024 at 6:40 PM Laurent Vivier  wrote:
> >>
> >> On resume e1000e_vm_state_change() always calls e1000e_autoneg_resume()
> >> that sets link_down to false, and thus activates the link even
> >> if we have disabled it.
> >>
> >> The problem can be reproduced starting qemu in paused state (-S) and
> >> then set the link to down. When we resume the machine the link appears
> >> to be up.
> >>
> >> Reproducer:
> >>
> >> # qemu-system-x86_64 ... -device e1000e,netdev=netdev0,id=net0 -S
> >>
> >> {"execute": "qmp_capabilities" }
> >> {"execute": "set_link", "arguments": {"name": "net0", "up": false}}
> >> {"execute": "cont" }
> >>
> >> To fix the problem, merge the content of e1000e_vm_state_change()
> >> into e1000e_core_post_load() as e1000 does.
> >>
> >> Buglink: https://issues.redhat.com/browse/RHEL-21867
> >> Fixes: 6f3fbe4ed06a ("net: Introduce e1000e device emulation")
> >> Suggested-by: Akihiko Odaki 
> >> Signed-off-by: Laurent Vivier 
> >> ---
> >>
> >
> > I've queued this.
>
> Ping?
>
> Thanks,
> Laurent
>

This fails CI at:

https://gitlab.com/jasowang/qemu/-/jobs/6348725267

It looks to me we can safely drop e1000e_autoneg_pause()?

Thanks




Re: [PATCH v9 0/5] eBPF RSS through QMP support.

2024-03-08 Thread Jason Wang
On Fri, Mar 8, 2024 at 2:30 PM Jason Wang  wrote:
>
> On Mon, Feb 26, 2024 at 6:23 PM Andrew Melnichenko  wrote:
> >
> > Hi all,
> > Jason, can you please review the patch set, thank you.
>
> Queued.
>
> Thanks

This seems to fail CI at:

https://gitlab.com/jasowang/qemu/-/jobs/6348725269

Please fix this.

Thanks




Re: [PATCH v9 0/5] eBPF RSS through QMP support.

2024-03-07 Thread Jason Wang
On Mon, Feb 26, 2024 at 6:23 PM Andrew Melnichenko  wrote:
>
> Hi all,
> Jason, can you please review the patch set, thank you.

Queued.

Thanks



