date:20221122

Re: [PATCH] vhost-vdpa: skip TPM CRB memory section

2022-11-22 Thread Eugenio Perez Martin

On Wed, Nov 23, 2022 at 8:20 AM Marc-André Lureau
 wrote:
>
> Hi
>
> On Wed, Nov 23, 2022 at 12:32 AM Michael S. Tsirkin  wrote:
>>
>> On Tue, Nov 22, 2022 at 06:53:49PM +0400, marcandre.lur...@redhat.com wrote:
>> > From: Marc-André Lureau 
>> >
>> > 851d6d1a0f ("vfio/common: remove spurious tpm-crb-cmd misalignment
>> > warning") removed the warning on vfio_listener_region_add() path.
>> >
>> > An error is reported for vhost-vdpa case:
>> > qemu-kvm: vhost_vdpa_listener_region_add received unaligned region
>> >
>> > Skip the CRB device.
>> >
>> > Fixes:
>> > https://bugzilla.redhat.com/show_bug.cgi?id=2141965
>> >
>> > Signed-off-by: Marc-André Lureau 
>> > ---
>> >  hw/virtio/vhost-vdpa.c | 6 ++
>> >  1 file changed, 6 insertions(+)
>> >
>> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>> > index 7468e44b87..9d7206e4b8 100644
>> > --- a/hw/virtio/vhost-vdpa.c
>> > +++ b/hw/virtio/vhost-vdpa.c
>> > @@ -19,6 +19,7 @@
>> >  #include "hw/virtio/virtio-net.h"
>> >  #include "hw/virtio/vhost-shadow-virtqueue.h"
>> >  #include "hw/virtio/vhost-vdpa.h"
>> > +#include "sysemu/tpm.h"
>> >  #include "exec/address-spaces.h"
>> >  #include "migration/blocker.h"
>> >  #include "qemu/cutils.h"
>> > @@ -46,6 +47,11 @@ static bool 
>> > vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
>> >  {
>> >  Int128 llend;
>> >
>> > +if (TPM_IS_CRB(section->mr->owner)) {
>> > +/* The CRB command buffer has its base address unaligned. */
>> > +return true;
>> > +}
>> > +
>>
>> Quite a hack. We can't really keep adding dependency on random devices
>
>
> Agree it's not great. but it's not strictly a dependency. At least you can 
> still build with !CONFIG_TPM.
>
>> to vhost. And would you add hacks like this to listeners?
>> Pls figure out what's special about this buffer.
>> Also if this section is unaligned then doesn't it break up
>> other aligned sections?
>
>
> See the original discussion:
> https://patchew.org/QEMU/20220208133842.112017-1-eric.au...@redhat.com/20220208133842.112017-2-eric.au...@redhat.com/
>

Based on that, I'd tune the comment to something like:
QEMU represents the CRB cmd/response buffer as a standard RAM region
to the listeners, but real HW would not be able to access them.
Besides, these regions may not be properly page aligned.

Is that more accurate?

Thanks!

> It is not clear whether aligning the tpm-crb-cmd region would work 
> (overlapping tpm-crb-mmio).
>
> Peter Maydell said: "There's nothing that guarantees alignment for memory 
> regions at all, whether they're RAM, IO or anything else.".
>
> Maybe vfio/vhost should simply skip those odd regions silently.
>
>
>>
>>
>> >  if ((!memory_region_is_ram(section->mr) &&
>> >   !memory_region_is_iommu(section->mr)) ||
>> >  memory_region_is_protected(section->mr) ||
>> > --
>> > 2.38.1
>>

Re: [PATCH] vhost-vdpa: skip TPM CRB memory section

2022-11-22 Thread Michael S. Tsirkin

On Wed, Nov 23, 2022 at 11:20:41AM +0400, Marc-André Lureau wrote:
> Hi
> 
> On Wed, Nov 23, 2022 at 12:32 AM Michael S. Tsirkin  wrote:
> 
> On Tue, Nov 22, 2022 at 06:53:49PM +0400, marcandre.lur...@redhat.com
> wrote:
> > From: Marc-André Lureau 
> >
> > 851d6d1a0f ("vfio/common: remove spurious tpm-crb-cmd misalignment
> > warning") removed the warning on vfio_listener_region_add() path.
> >
> > An error is reported for vhost-vdpa case:
> > qemu-kvm: vhost_vdpa_listener_region_add received unaligned region
> >
> > Skip the CRB device.
> >
> > Fixes:
> > https://bugzilla.redhat.com/show_bug.cgi?id=2141965
> >
> > Signed-off-by: Marc-André Lureau 
> > ---
> >  hw/virtio/vhost-vdpa.c | 6 ++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 7468e44b87..9d7206e4b8 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -19,6 +19,7 @@
> >  #include "hw/virtio/virtio-net.h"
> >  #include "hw/virtio/vhost-shadow-virtqueue.h"
> >  #include "hw/virtio/vhost-vdpa.h"
> > +#include "sysemu/tpm.h"
> >  #include "exec/address-spaces.h"
> >  #include "migration/blocker.h"
> >  #include "qemu/cutils.h"
> > @@ -46,6 +47,11 @@ static bool vhost_vdpa_listener_skipped_section
> (MemoryRegionSection *section,
> >  {
> >      Int128 llend;
> > 
> > +    if (TPM_IS_CRB(section->mr->owner)) {
> > +        /* The CRB command buffer has its base address unaligned. */
> > +        return true;
> > +    }
> > +
> 
> Quite a hack. We can't really keep adding dependency on random devices
> 
> 
> Agree it's not great. but it's not strictly a dependency. At least you can
> still build with !CONFIG_TPM.

but what does it have to do with tpm?


> 
> to vhost. And would you add hacks like this to listeners?
> Pls figure out what's special about this buffer.
> Also if this section is unaligned then doesn't it break up
> other aligned sections?
> 
> 
> See the original discussion:
> https://patchew.org/QEMU/20220208133842.112017-1-eric.au...@redhat.com/
> 20220208133842.112017-2-eric.au...@redhat.com/
> 
> It is not clear whether aligning the tpm-crb-cmd region would work 
> (overlapping
> tpm-crb-mmio).
> 
> Peter Maydell said: "There's nothing that guarantees alignment for memory
> regions at all, whether they're RAM, IO or anything else.".
> 
> Maybe vfio/vhost should simply skip those odd regions silently.


How do we detect them? Are these regions not DMA-able on real hardware?

> 
> 
> 
> 
> >      if ((!memory_region_is_ram(section->mr) &&
> >           !memory_region_is_iommu(section->mr)) ||
> >          memory_region_is_protected(section->mr) ||
> > --
> > 2.38.1
> 
>

Re: [PATCH] vhost-vdpa: skip TPM CRB memory section

2022-11-22 Thread Marc-André Lureau

Hi

On Wed, Nov 23, 2022 at 12:32 AM Michael S. Tsirkin  wrote:

> On Tue, Nov 22, 2022 at 06:53:49PM +0400, marcandre.lur...@redhat.com
> wrote:
> > From: Marc-André Lureau 
> >
> > 851d6d1a0f ("vfio/common: remove spurious tpm-crb-cmd misalignment
> > warning") removed the warning on vfio_listener_region_add() path.
> >
> > An error is reported for vhost-vdpa case:
> > qemu-kvm: vhost_vdpa_listener_region_add received unaligned region
> >
> > Skip the CRB device.
> >
> > Fixes:
> > https://bugzilla.redhat.com/show_bug.cgi?id=2141965
> >
> > Signed-off-by: Marc-André Lureau 
> > ---
> >  hw/virtio/vhost-vdpa.c | 6 ++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 7468e44b87..9d7206e4b8 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -19,6 +19,7 @@
> >  #include "hw/virtio/virtio-net.h"
> >  #include "hw/virtio/vhost-shadow-virtqueue.h"
> >  #include "hw/virtio/vhost-vdpa.h"
> > +#include "sysemu/tpm.h"
> >  #include "exec/address-spaces.h"
> >  #include "migration/blocker.h"
> >  #include "qemu/cutils.h"
> > @@ -46,6 +47,11 @@ static bool
> vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
> >  {
> >  Int128 llend;
> >
> > +if (TPM_IS_CRB(section->mr->owner)) {
> > +/* The CRB command buffer has its base address unaligned. */
> > +return true;
> > +}
> > +
>
> Quite a hack. We can't really keep adding dependency on random devices
>

Agree it's not great. but it's not strictly a dependency. At least you can
still build with !CONFIG_TPM.

to vhost. And would you add hacks like this to listeners?
> Pls figure out what's special about this buffer.
> Also if this section is unaligned then doesn't it break up
> other aligned sections?
>

See the original discussion:
https://patchew.org/QEMU/20220208133842.112017-1-eric.au...@redhat.com/20220208133842.112017-2-eric.au...@redhat.com/

It is not clear whether aligning the tpm-crb-cmd region would work
(overlapping tpm-crb-mmio).

Peter Maydell said: "There's nothing that guarantees alignment for memory
regions at all, whether they're RAM, IO or anything else.".

Maybe vfio/vhost should simply skip those odd regions silently.



>
> >  if ((!memory_region_is_ram(section->mr) &&
> >   !memory_region_is_iommu(section->mr)) ||
> >  memory_region_is_protected(section->mr) ||
> > --
> > 2.38.1
>
>

Re: [PATCH for 7.2?] vhost: fix vq dirt bitmap syncing when vIOMMU is enabled

2022-11-22 Thread Michael S. Tsirkin

On Wed, Nov 23, 2022 at 03:08:25PM +0800, Jason Wang wrote:
> On Wed, Nov 23, 2022 at 2:15 PM Michael S. Tsirkin  wrote:
> >
> > On Wed, Nov 23, 2022 at 01:47:04PM +0800, Jason Wang wrote:
> > > On Wed, Nov 23, 2022 at 1:26 PM Jason Wang  wrote:
> > > >
> > > > On Wed, Nov 23, 2022 at 12:28 AM Eric Auger  
> > > > wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > On 11/22/22 10:43, Michael S. Tsirkin wrote:
> > > > > > On Tue, Nov 22, 2022 at 11:01:11AM +0800, Jason Wang wrote:
> > > > > >> When vIOMMU is enabled, the vq->used_phys is actually the IOVA not
> > > > > >> GPA. So we need to translate it to GPA before the syncing 
> > > > > >> otherwise we
> > > > > >> may hit the following crash since IOVA could be out of the scope of
> > > > > >> the GPA log size. This could be noted when using virtio-IOMMU with
> > > > > >> vhost using 1G memory.
> > > > > > Noted how exactly? What does "using 1G memory" mean?
> > > > >
> > > > > We hit the following assertion:
> > > > >
> > > > > qemu-system-x86_64: ../hw/virtio/vhost.c:85: vhost_dev_sync_region: 
> > > > > Assertion `end / VHOST_LOG_CHUNK < dev->log_size' failed.
> > > > >
> > > > > On the last time vhost_get_log_size() is called it takes into account 
> > > > > 2 regions when computing the log_size:
> > > > > qemu-system-x86_64: vhost_get_log_size region 0 last=0x9 updated 
> > > > > log_size=0x3
> > > > > qemu-system-x86_64: vhost_get_log_size region 1 last=0x3fff 
> > > > > updated log_size=0x1000
> > > > > so in vhost_migration_log() vhost_get_log_size(dev) returns 0x1000
> > > > >
> > > > > In the test case, memory_region_sync_dirty_bitmap() gets called for 
> > > > > mem-machine_mem, vga.vram (several times) and eventually on pc.bios. 
> > > > > This latter is reponsible for the assertion:
> > > > >
> > > > > qemu-system-x86_64: vhost_log_sync calls sync_dirty_map on pc.bios 
> > > > > for the full range
> > > > > qemu-system-x86_64: vhost_sync_dirty_bitmap calls 
> > > > > vhost_dev_sync_region on region 0
> > > > > qemu-system-x86_64: vhost_dev_sync_region end=0x9 < 
> > > > > start=0xfffc
> > > > > qemu-system-x86_64: vhost_sync_dirty_bitmap calls 
> > > > > vhost_dev_sync_region on region 1
> > > > > qemu-system-x86_64: vhost_dev_sync_region end=0x3fff < 
> > > > > start=0xfffc
> > > > > qemu-system-x86_64: vhost_sync_dirty_bitmap calls 
> > > > > vhost_dev_sync_region on vq 0 <-
> > > > > qemu-system-x86_64: vhost_dev_sync_region pc.bios mfirst=0xfffc 
> > > > > mlast=0x rfirst=0xf240 rlast=0xfa45
> > > > > qemu-system-x86_64: vhost_dev_sync_region pc.bios end=0xfa45 
> > > > > VHOST_LOG_CHUNK=0x4 end/VHOST_LOG_CHUNK=0x3fff 
> > > > > dev->log_size=0x1000
> > > > > qemu-system-x86_64: ../hw/virtio/vhost.c:85: vhost_dev_sync_region: 
> > > > > Assertion `end / VHOST_LOG_CHUNK < dev->log_size' failed.
> > > > >
> > > > >
> > > > >
> > > > > "using 1G memory": We hit the issue with a guest started with 1GB 
> > > > > initial RAM.
> > > >
> > > > Yes, so in the case the guest iova allocator may try to use an IOVA
> > > > that is beyond 1G.
> > > >
> > > > >
> > > > >
> > > > > >
> > > > > >> Fixes: c471ad0e9bd46 ("vhost_net: device IOTLB support")
> > > > > >> Cc: qemu-sta...@nongnu.org
> > > > > >> Reported-by: Yalan Zhang 
> > > > > >> Tested-by: Eric Auger 
> > > > > >> Tested-by: Lei Yang 
> > > > > >> Signed-off-by: Jason Wang 
> > > > > >> ---
> > > > > >>  hw/virtio/vhost.c | 65 
> > > > > >> ---
> > > > > >>  1 file changed, 45 insertions(+), 20 deletions(-)
> > > > > >>
> > > > > >> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > > > > >> index d1c4c20b8c..26b319f34e 100644
> > > > > >> --- a/hw/virtio/vhost.c
> > > > > >> +++ b/hw/virtio/vhost.c
> > > > > >> @@ -106,11 +106,30 @@ static void vhost_dev_sync_region(struct 
> > > > > >> vhost_dev *dev,
> > > > > >>  }
> > > > > >>  }
> > > > > >>
> > > > > >> +static bool vhost_dev_has_iommu(struct vhost_dev *dev)
> > > > > >> +{
> > > > > >> +VirtIODevice *vdev = dev->vdev;
> > > > > >> +
> > > > > >> +/*
> > > > > >> + * For vhost, VIRTIO_F_IOMMU_PLATFORM means the backend 
> > > > > >> support
> > > > > >> + * incremental memory mapping API via IOTLB API. For platform 
> > > > > >> that
> > > > > >> + * does not have IOMMU, there's no need to enable this feature
> > > > > >> + * which may cause unnecessary IOTLB miss/update transactions.
> > > > > >> + */
> > > > > >> +if (vdev) {
> > > > > >> +return virtio_bus_device_iommu_enabled(vdev) &&
> > > > > >> +virtio_host_has_feature(vdev, 
> > > > > >> VIRTIO_F_IOMMU_PLATFORM);
> > > > > >> +} else {
> > > > > >> +return false;
> > > > > >> +}
> > > > > >> +}
> > > > > >> +
> > > > > >>  static int vhost_sync_dirty_bitmap(struct vhost_dev *dev,
> > > > > >> MemoryRegionSection *section,
> > > > > >>

Re: [PATCH for 7.2?] vhost: fix vq dirt bitmap syncing when vIOMMU is enabled

2022-11-22 Thread Jason Wang

On Wed, Nov 23, 2022 at 2:15 PM Michael S. Tsirkin  wrote:
>
> On Wed, Nov 23, 2022 at 01:47:04PM +0800, Jason Wang wrote:
> > On Wed, Nov 23, 2022 at 1:26 PM Jason Wang  wrote:
> > >
> > > On Wed, Nov 23, 2022 at 12:28 AM Eric Auger  wrote:
> > > >
> > > > Hi,
> > > >
> > > > On 11/22/22 10:43, Michael S. Tsirkin wrote:
> > > > > On Tue, Nov 22, 2022 at 11:01:11AM +0800, Jason Wang wrote:
> > > > >> When vIOMMU is enabled, the vq->used_phys is actually the IOVA not
> > > > >> GPA. So we need to translate it to GPA before the syncing otherwise 
> > > > >> we
> > > > >> may hit the following crash since IOVA could be out of the scope of
> > > > >> the GPA log size. This could be noted when using virtio-IOMMU with
> > > > >> vhost using 1G memory.
> > > > > Noted how exactly? What does "using 1G memory" mean?
> > > >
> > > > We hit the following assertion:
> > > >
> > > > qemu-system-x86_64: ../hw/virtio/vhost.c:85: vhost_dev_sync_region: 
> > > > Assertion `end / VHOST_LOG_CHUNK < dev->log_size' failed.
> > > >
> > > > On the last time vhost_get_log_size() is called it takes into account 2 
> > > > regions when computing the log_size:
> > > > qemu-system-x86_64: vhost_get_log_size region 0 last=0x9 updated 
> > > > log_size=0x3
> > > > qemu-system-x86_64: vhost_get_log_size region 1 last=0x3fff updated 
> > > > log_size=0x1000
> > > > so in vhost_migration_log() vhost_get_log_size(dev) returns 0x1000
> > > >
> > > > In the test case, memory_region_sync_dirty_bitmap() gets called for 
> > > > mem-machine_mem, vga.vram (several times) and eventually on pc.bios. 
> > > > This latter is reponsible for the assertion:
> > > >
> > > > qemu-system-x86_64: vhost_log_sync calls sync_dirty_map on pc.bios for 
> > > > the full range
> > > > qemu-system-x86_64: vhost_sync_dirty_bitmap calls vhost_dev_sync_region 
> > > > on region 0
> > > > qemu-system-x86_64: vhost_dev_sync_region end=0x9 < start=0xfffc
> > > > qemu-system-x86_64: vhost_sync_dirty_bitmap calls vhost_dev_sync_region 
> > > > on region 1
> > > > qemu-system-x86_64: vhost_dev_sync_region end=0x3fff < 
> > > > start=0xfffc
> > > > qemu-system-x86_64: vhost_sync_dirty_bitmap calls vhost_dev_sync_region 
> > > > on vq 0 <-
> > > > qemu-system-x86_64: vhost_dev_sync_region pc.bios mfirst=0xfffc 
> > > > mlast=0x rfirst=0xf240 rlast=0xfa45
> > > > qemu-system-x86_64: vhost_dev_sync_region pc.bios end=0xfa45 
> > > > VHOST_LOG_CHUNK=0x4 end/VHOST_LOG_CHUNK=0x3fff dev->log_size=0x1000
> > > > qemu-system-x86_64: ../hw/virtio/vhost.c:85: vhost_dev_sync_region: 
> > > > Assertion `end / VHOST_LOG_CHUNK < dev->log_size' failed.
> > > >
> > > >
> > > >
> > > > "using 1G memory": We hit the issue with a guest started with 1GB 
> > > > initial RAM.
> > >
> > > Yes, so in the case the guest iova allocator may try to use an IOVA
> > > that is beyond 1G.
> > >
> > > >
> > > >
> > > > >
> > > > >> Fixes: c471ad0e9bd46 ("vhost_net: device IOTLB support")
> > > > >> Cc: qemu-sta...@nongnu.org
> > > > >> Reported-by: Yalan Zhang 
> > > > >> Tested-by: Eric Auger 
> > > > >> Tested-by: Lei Yang 
> > > > >> Signed-off-by: Jason Wang 
> > > > >> ---
> > > > >>  hw/virtio/vhost.c | 65 
> > > > >> ---
> > > > >>  1 file changed, 45 insertions(+), 20 deletions(-)
> > > > >>
> > > > >> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > > > >> index d1c4c20b8c..26b319f34e 100644
> > > > >> --- a/hw/virtio/vhost.c
> > > > >> +++ b/hw/virtio/vhost.c
> > > > >> @@ -106,11 +106,30 @@ static void vhost_dev_sync_region(struct 
> > > > >> vhost_dev *dev,
> > > > >>  }
> > > > >>  }
> > > > >>
> > > > >> +static bool vhost_dev_has_iommu(struct vhost_dev *dev)
> > > > >> +{
> > > > >> +VirtIODevice *vdev = dev->vdev;
> > > > >> +
> > > > >> +/*
> > > > >> + * For vhost, VIRTIO_F_IOMMU_PLATFORM means the backend support
> > > > >> + * incremental memory mapping API via IOTLB API. For platform 
> > > > >> that
> > > > >> + * does not have IOMMU, there's no need to enable this feature
> > > > >> + * which may cause unnecessary IOTLB miss/update transactions.
> > > > >> + */
> > > > >> +if (vdev) {
> > > > >> +return virtio_bus_device_iommu_enabled(vdev) &&
> > > > >> +virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
> > > > >> +} else {
> > > > >> +return false;
> > > > >> +}
> > > > >> +}
> > > > >> +
> > > > >>  static int vhost_sync_dirty_bitmap(struct vhost_dev *dev,
> > > > >> MemoryRegionSection *section,
> > > > >> hwaddr first,
> > > > >> hwaddr last)
> > > > >>  {
> > > > >> +IOMMUTLBEntry iotlb;
> > > > > why don't we move this inside the scope where it's used?
> > >
> > > That's fine.
> > >
> > > > >
> > > > >>  int i;
> > > > >>  hwaddr start_addr;
> > > > >>  hwaddr end_addr;
> >

[PATCH 3/2] ppc4xx_sdram: Simplify sdram_ddr_size() to return

2022-11-22 Thread Markus Armbruster

Suggested-by: BALATON Zoltan 
Signed-off-by: Markus Armbruster 
---
 hw/ppc/ppc4xx_sdram.c | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/hw/ppc/ppc4xx_sdram.c b/hw/ppc/ppc4xx_sdram.c
index 54bf9a2b44..a24c80b1d2 100644
--- a/hw/ppc/ppc4xx_sdram.c
+++ b/hw/ppc/ppc4xx_sdram.c
@@ -192,17 +192,13 @@ static inline hwaddr sdram_ddr_base(uint32_t bcr)
 
 static hwaddr sdram_ddr_size(uint32_t bcr)
 {
-hwaddr size;
-int sh;
+int sh = (bcr >> 17) & 0x7;
 
-sh = (bcr >> 17) & 0x7;
 if (sh == 7) {
-size = -1;
-} else {
-size = (4 * MiB) << sh;
+return -1;
 }
 
-return size;
+return (4 * MiB) << sh;
 }
 
 static uint32_t sdram_ddr_dcr_read(void *opaque, int dcrn)
-- 
2.37.3

Re: [PATCH for-7.2 v2 3/4] libvhost-user: Fix two more format strings

2022-11-22 Thread Michael S. Tsirkin

On Wed, Nov 23, 2022 at 07:49:53AM +0100, Stefan Weil wrote:
> Am 23.11.22 um 07:35 schrieb Stefan Weil:
> > Am 15.11.22 um 08:25 schrieb Stefan Weil:
> > > Am 05.11.22 um 11:24 schrieb Stefan Weil:
> > > 
> > > > This fix is required for 32 bit host. The bug was detected by CI
> > > > for arm-linux, but is also relevant for i386-linux.
> > > > 
> > > > Reported-by: Stefan Hajnoczi 
> > > > Signed-off-by: Stefan Weil 
> > > > ---
> > > >   subprojects/libvhost-user/libvhost-user.c | 3 ++-
> > > >   1 file changed, 2 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/subprojects/libvhost-user/libvhost-user.c
> > > > b/subprojects/libvhost-user/libvhost-user.c
> > > > index d67953a1c3..80f9952e71 100644
> > > > --- a/subprojects/libvhost-user/libvhost-user.c
> > > > +++ b/subprojects/libvhost-user/libvhost-user.c
> > > > @@ -651,7 +651,8 @@ generate_faults(VuDev *dev) {
> > > >   if (ioctl(dev->postcopy_ufd, UFFDIO_REGISTER, _struct)) {
> > > >   vu_panic(dev, "%s: Failed to userfault region %d "
> > > > -  "@%" PRIx64 " + size:%zx offset: %zx:
> > > > (ufd=%d)%s\n",
> > > > +  "@%" PRIx64 " + size:%" PRIx64 "
> > > > offset: %" PRIx64
> > > > +  ": (ufd=%d)%s\n",
> > > >    __func__, i,
> > > >    dev_region->mmap_addr,
> > > >    dev_region->size, dev_region->mmap_offset,
> > > 
> > > 
> > > There is still no review for this patch.
> > > 
> > > I added "for-7.2" to the subject here in my answer. How can we get
> > > all 4 patches of this series into the new release?
> > > 
> > > Stefan
> > 
> > Ping? Those bug fixes are still missing in -rc2.
> > 
> > Stefan
> 
> Hello Michael,
> 
> I just noticed that MAINTAINERS has no entry for the files in
> subprojects/libvhost-user, so I did not cc you in my previous e-mails.
> Should that directory be added to the "vhost" section"?
> 
> Stefan

pls do

Re: [PATCH for-7.2 v2 3/4] libvhost-user: Fix two more format strings

2022-11-22 Thread Stefan Weil via


Am 23.11.22 um 07:35 schrieb Stefan Weil:

Am 15.11.22 um 08:25 schrieb Stefan Weil:

Am 05.11.22 um 11:24 schrieb Stefan Weil:


This fix is required for 32 bit host. The bug was detected by CI
for arm-linux, but is also relevant for i386-linux.

Reported-by: Stefan Hajnoczi 
Signed-off-by: Stefan Weil 
---
  subprojects/libvhost-user/libvhost-user.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/subprojects/libvhost-user/libvhost-user.c 
b/subprojects/libvhost-user/libvhost-user.c

index d67953a1c3..80f9952e71 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -651,7 +651,8 @@ generate_faults(VuDev *dev) {
  if (ioctl(dev->postcopy_ufd, UFFDIO_REGISTER, _struct)) {
  vu_panic(dev, "%s: Failed to userfault region %d "
-  "@%" PRIx64 " + size:%zx offset: %zx: 
(ufd=%d)%s\n",
+  "@%" PRIx64 " + size:%" PRIx64 " offset: 
%" PRIx64

+  ": (ufd=%d)%s\n",
   __func__, i,
   dev_region->mmap_addr,
   dev_region->size, dev_region->mmap_offset,



There is still no review for this patch.

I added "for-7.2" to the subject here in my answer. How can we get all 
4 patches of this series into the new release?


Stefan


Ping? Those bug fixes are still missing in -rc2.

Stefan


Hello Michael,

I just noticed that MAINTAINERS has no entry for the files in 
subprojects/libvhost-user, so I did not cc you in my previous e-mails. 
Should that directory be added to the "vhost" section"?


Stefan

Re: [PATCH v5 1/2] sysemu: tpm: Add a stub function for TPM_IS_CRB

2022-11-22 Thread Michael S. Tsirkin

On Fri, May 06, 2022 at 09:47:52AM -0400, Stefan Berger wrote:
> 
> 
> On 5/6/22 09:25, Eric Auger wrote:
> > In a subsequent patch, VFIO will need to recognize if
> > a memory region owner is a TPM CRB device. Hence VFIO
> > needs to use TPM_IS_CRB() even if CONFIG_TPM is unset. So
> > let's add a stub function.
> > 
> > Signed-off-by: Eric Auger 
> > Suggested-by: Cornelia Huck 
> Reviewed-by: Stefan Berger 

... and now in 7.2 vdpa needs a dependency on tpm too, what a hack :(
And what exactly is it about TPM CRB that everyone needs to
know about it and skip it? The API does not tell ...

> > ---
> >   include/sysemu/tpm.h | 6 ++
> >   1 file changed, 6 insertions(+)
> > 
> > diff --git a/include/sysemu/tpm.h b/include/sysemu/tpm.h
> > index 68b2206463c..fb40e30ff60 100644
> > --- a/include/sysemu/tpm.h
> > +++ b/include/sysemu/tpm.h
> > @@ -80,6 +80,12 @@ static inline TPMVersion tpm_get_version(TPMIf *ti)
> >   #define tpm_init()  (0)
> >   #define tpm_cleanup()
> > 
> > +/* needed for an alignment check in non-tpm code */
> > +static inline Object *TPM_IS_CRB(Object *obj)
> > +{
> > + return NULL;
> > +}
> > +
> >   #endif /* CONFIG_TPM */
> > 
> >   #endif /* QEMU_TPM_H */
> 
>

Re: [PATCH for-7.2 v2 3/4] libvhost-user: Fix two more format strings

2022-11-22 Thread Stefan Weil via


Am 15.11.22 um 08:25 schrieb Stefan Weil:

Am 05.11.22 um 11:24 schrieb Stefan Weil:


This fix is required for 32 bit host. The bug was detected by CI
for arm-linux, but is also relevant for i386-linux.

Reported-by: Stefan Hajnoczi 
Signed-off-by: Stefan Weil 
---
  subprojects/libvhost-user/libvhost-user.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/subprojects/libvhost-user/libvhost-user.c 
b/subprojects/libvhost-user/libvhost-user.c

index d67953a1c3..80f9952e71 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -651,7 +651,8 @@ generate_faults(VuDev *dev) {
  if (ioctl(dev->postcopy_ufd, UFFDIO_REGISTER, _struct)) {
  vu_panic(dev, "%s: Failed to userfault region %d "
-  "@%" PRIx64 " + size:%zx offset: %zx: 
(ufd=%d)%s\n",
+  "@%" PRIx64 " + size:%" PRIx64 " offset: %" 
PRIx64

+  ": (ufd=%d)%s\n",
   __func__, i,
   dev_region->mmap_addr,
   dev_region->size, dev_region->mmap_offset,



There is still no review for this patch.

I added "for-7.2" to the subject here in my answer. How can we get all 4 
patches of this series into the new release?


Stefan


Ping? Those bug fixes are still missing in -rc2.

Stefan


OpenPGP_0xE08C21D5677450AD.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature

Re: [PATCH v5 0/2] vfio/common: remove spurious tpm-crb-cmd misalignment warning

2022-11-22 Thread Michael S. Tsirkin

On Fri, May 06, 2022 at 03:25:08PM +0200, Eric Auger wrote:
> The CRB command buffer currently is a RAM MemoryRegion and given
> its base address alignment, it causes an error report on
> vfio_listener_region_add(). This region could have been a RAM device
> region, easing the detection of such safe situation but this option
> was not well received.

Eric could you point me at this discussion please?
We are now asked to proliferate stuff like this into vdpa
as well, this just doesn't scale. I'd like to see whether we
can make it a RAM device region after all - was a patch
like that posted?

> So let's add a helper function that uses the
> memory region owner type to detect the situation is safe wrt
> the assignment. Other device types can be checked here if such kind
> of problem occurs again.
> 
> As TPM devices can be compiled out we need to introduce a stub
> for TPM_IS_CRB.
> 
> Best Regards
> 
> Eric
> 
> This series can be found at:
> https://github.com/eauger/qemu/tree/tpm-crb-vfio-v5
> 
> History:
> 
> v4 -> v5:
> - Add sysemu: tpm: Add a stub function for TPM_IS_CRB to fix
>   compilation error if CONFIG_TPM is unset
> 
> Eric Auger (2):
>   sysemu: tpm: Add a stub function for TPM_IS_CRB
>   vfio/common: remove spurious tpm-crb-cmd misalignment warning
> 
>  hw/vfio/common.c | 27 ++-
>  hw/vfio/trace-events |  1 +
>  include/sysemu/tpm.h |  6 ++
>  3 files changed, 33 insertions(+), 1 deletion(-)
> 
> -- 
> 2.35.1
> 
> 
>

Re: [PATCH v1 5/9] hw/virtio: introduce virtio_device_should_start

2022-11-22 Thread Christian Borntraeger


Am 21.11.22 um 23:37 schrieb Michael S. Tsirkin:
[...]

qemu-system-x86_64: ../hw/virtio/vhost-vsock-common.c:203: 
vhost_vsock_common_pre_save: Assertion `!vhost_dev_is_started(>vhost_dev)' 
failed.
2022-11-15 16:38:46.096+: shutting down, reason=crashed


Alex were you able to replicate? Just curious.


Ping?

Re: [PATCH for 7.2?] vhost: fix vq dirt bitmap syncing when vIOMMU is enabled

2022-11-22 Thread Michael S. Tsirkin

On Wed, Nov 23, 2022 at 01:47:04PM +0800, Jason Wang wrote:
> On Wed, Nov 23, 2022 at 1:26 PM Jason Wang  wrote:
> >
> > On Wed, Nov 23, 2022 at 12:28 AM Eric Auger  wrote:
> > >
> > > Hi,
> > >
> > > On 11/22/22 10:43, Michael S. Tsirkin wrote:
> > > > On Tue, Nov 22, 2022 at 11:01:11AM +0800, Jason Wang wrote:
> > > >> When vIOMMU is enabled, the vq->used_phys is actually the IOVA not
> > > >> GPA. So we need to translate it to GPA before the syncing otherwise we
> > > >> may hit the following crash since IOVA could be out of the scope of
> > > >> the GPA log size. This could be noted when using virtio-IOMMU with
> > > >> vhost using 1G memory.
> > > > Noted how exactly? What does "using 1G memory" mean?
> > >
> > > We hit the following assertion:
> > >
> > > qemu-system-x86_64: ../hw/virtio/vhost.c:85: vhost_dev_sync_region: 
> > > Assertion `end / VHOST_LOG_CHUNK < dev->log_size' failed.
> > >
> > > On the last time vhost_get_log_size() is called it takes into account 2 
> > > regions when computing the log_size:
> > > qemu-system-x86_64: vhost_get_log_size region 0 last=0x9 updated 
> > > log_size=0x3
> > > qemu-system-x86_64: vhost_get_log_size region 1 last=0x3fff updated 
> > > log_size=0x1000
> > > so in vhost_migration_log() vhost_get_log_size(dev) returns 0x1000
> > >
> > > In the test case, memory_region_sync_dirty_bitmap() gets called for 
> > > mem-machine_mem, vga.vram (several times) and eventually on pc.bios. This 
> > > latter is reponsible for the assertion:
> > >
> > > qemu-system-x86_64: vhost_log_sync calls sync_dirty_map on pc.bios for 
> > > the full range
> > > qemu-system-x86_64: vhost_sync_dirty_bitmap calls vhost_dev_sync_region 
> > > on region 0
> > > qemu-system-x86_64: vhost_dev_sync_region end=0x9 < start=0xfffc
> > > qemu-system-x86_64: vhost_sync_dirty_bitmap calls vhost_dev_sync_region 
> > > on region 1
> > > qemu-system-x86_64: vhost_dev_sync_region end=0x3fff < 
> > > start=0xfffc
> > > qemu-system-x86_64: vhost_sync_dirty_bitmap calls vhost_dev_sync_region 
> > > on vq 0 <-
> > > qemu-system-x86_64: vhost_dev_sync_region pc.bios mfirst=0xfffc 
> > > mlast=0x rfirst=0xf240 rlast=0xfa45
> > > qemu-system-x86_64: vhost_dev_sync_region pc.bios end=0xfa45 
> > > VHOST_LOG_CHUNK=0x4 end/VHOST_LOG_CHUNK=0x3fff dev->log_size=0x1000
> > > qemu-system-x86_64: ../hw/virtio/vhost.c:85: vhost_dev_sync_region: 
> > > Assertion `end / VHOST_LOG_CHUNK < dev->log_size' failed.
> > >
> > >
> > >
> > > "using 1G memory": We hit the issue with a guest started with 1GB initial 
> > > RAM.
> >
> > Yes, so in the case the guest iova allocator may try to use an IOVA
> > that is beyond 1G.
> >
> > >
> > >
> > > >
> > > >> Fixes: c471ad0e9bd46 ("vhost_net: device IOTLB support")
> > > >> Cc: qemu-sta...@nongnu.org
> > > >> Reported-by: Yalan Zhang 
> > > >> Tested-by: Eric Auger 
> > > >> Tested-by: Lei Yang 
> > > >> Signed-off-by: Jason Wang 
> > > >> ---
> > > >>  hw/virtio/vhost.c | 65 ---
> > > >>  1 file changed, 45 insertions(+), 20 deletions(-)
> > > >>
> > > >> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > > >> index d1c4c20b8c..26b319f34e 100644
> > > >> --- a/hw/virtio/vhost.c
> > > >> +++ b/hw/virtio/vhost.c
> > > >> @@ -106,11 +106,30 @@ static void vhost_dev_sync_region(struct 
> > > >> vhost_dev *dev,
> > > >>  }
> > > >>  }
> > > >>
> > > >> +static bool vhost_dev_has_iommu(struct vhost_dev *dev)
> > > >> +{
> > > >> +VirtIODevice *vdev = dev->vdev;
> > > >> +
> > > >> +/*
> > > >> + * For vhost, VIRTIO_F_IOMMU_PLATFORM means the backend support
> > > >> + * incremental memory mapping API via IOTLB API. For platform that
> > > >> + * does not have IOMMU, there's no need to enable this feature
> > > >> + * which may cause unnecessary IOTLB miss/update transactions.
> > > >> + */
> > > >> +if (vdev) {
> > > >> +return virtio_bus_device_iommu_enabled(vdev) &&
> > > >> +virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
> > > >> +} else {
> > > >> +return false;
> > > >> +}
> > > >> +}
> > > >> +
> > > >>  static int vhost_sync_dirty_bitmap(struct vhost_dev *dev,
> > > >> MemoryRegionSection *section,
> > > >> hwaddr first,
> > > >> hwaddr last)
> > > >>  {
> > > >> +IOMMUTLBEntry iotlb;
> > > > why don't we move this inside the scope where it's used?
> >
> > That's fine.
> >
> > > >
> > > >>  int i;
> > > >>  hwaddr start_addr;
> > > >>  hwaddr end_addr;
> > > >> @@ -132,13 +151,37 @@ static int vhost_sync_dirty_bitmap(struct 
> > > >> vhost_dev *dev,
> > > >>  }
> > > >>  for (i = 0; i < dev->nvqs; ++i) {
> > > >>  struct vhost_virtqueue *vq = dev->vqs + i;
> > > >> +hwaddr used_phys = vq->used_phys, used_size = vq->used_size;
> > > >> +

Re: [PATCH for 7.2?] vhost: fix vq dirt bitmap syncing when vIOMMU is enabled

2022-11-22 Thread Jason Wang

On Wed, Nov 23, 2022 at 1:26 PM Jason Wang  wrote:
>
> On Wed, Nov 23, 2022 at 12:28 AM Eric Auger  wrote:
> >
> > Hi,
> >
> > On 11/22/22 10:43, Michael S. Tsirkin wrote:
> > > On Tue, Nov 22, 2022 at 11:01:11AM +0800, Jason Wang wrote:
> > >> When vIOMMU is enabled, the vq->used_phys is actually the IOVA not
> > >> GPA. So we need to translate it to GPA before the syncing otherwise we
> > >> may hit the following crash since IOVA could be out of the scope of
> > >> the GPA log size. This could be noted when using virtio-IOMMU with
> > >> vhost using 1G memory.
> > > Noted how exactly? What does "using 1G memory" mean?
> >
> > We hit the following assertion:
> >
> > qemu-system-x86_64: ../hw/virtio/vhost.c:85: vhost_dev_sync_region: 
> > Assertion `end / VHOST_LOG_CHUNK < dev->log_size' failed.
> >
> > On the last time vhost_get_log_size() is called it takes into account 2 
> > regions when computing the log_size:
> > qemu-system-x86_64: vhost_get_log_size region 0 last=0x9 updated 
> > log_size=0x3
> > qemu-system-x86_64: vhost_get_log_size region 1 last=0x3fff updated 
> > log_size=0x1000
> > so in vhost_migration_log() vhost_get_log_size(dev) returns 0x1000
> >
> > In the test case, memory_region_sync_dirty_bitmap() gets called for 
> > mem-machine_mem, vga.vram (several times) and eventually on pc.bios. This 
> > latter is reponsible for the assertion:
> >
> > qemu-system-x86_64: vhost_log_sync calls sync_dirty_map on pc.bios for the 
> > full range
> > qemu-system-x86_64: vhost_sync_dirty_bitmap calls vhost_dev_sync_region on 
> > region 0
> > qemu-system-x86_64: vhost_dev_sync_region end=0x9 < start=0xfffc
> > qemu-system-x86_64: vhost_sync_dirty_bitmap calls vhost_dev_sync_region on 
> > region 1
> > qemu-system-x86_64: vhost_dev_sync_region end=0x3fff < start=0xfffc
> > qemu-system-x86_64: vhost_sync_dirty_bitmap calls vhost_dev_sync_region on 
> > vq 0 <-
> > qemu-system-x86_64: vhost_dev_sync_region pc.bios mfirst=0xfffc 
> > mlast=0x rfirst=0xf240 rlast=0xfa45
> > qemu-system-x86_64: vhost_dev_sync_region pc.bios end=0xfa45 
> > VHOST_LOG_CHUNK=0x4 end/VHOST_LOG_CHUNK=0x3fff dev->log_size=0x1000
> > qemu-system-x86_64: ../hw/virtio/vhost.c:85: vhost_dev_sync_region: 
> > Assertion `end / VHOST_LOG_CHUNK < dev->log_size' failed.
> >
> >
> >
> > "using 1G memory": We hit the issue with a guest started with 1GB initial 
> > RAM.
>
> Yes, so in the case the guest iova allocator may try to use an IOVA
> that is beyond 1G.
>
> >
> >
> > >
> > >> Fixes: c471ad0e9bd46 ("vhost_net: device IOTLB support")
> > >> Cc: qemu-sta...@nongnu.org
> > >> Reported-by: Yalan Zhang 
> > >> Tested-by: Eric Auger 
> > >> Tested-by: Lei Yang 
> > >> Signed-off-by: Jason Wang 
> > >> ---
> > >>  hw/virtio/vhost.c | 65 ---
> > >>  1 file changed, 45 insertions(+), 20 deletions(-)
> > >>
> > >> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > >> index d1c4c20b8c..26b319f34e 100644
> > >> --- a/hw/virtio/vhost.c
> > >> +++ b/hw/virtio/vhost.c
> > >> @@ -106,11 +106,30 @@ static void vhost_dev_sync_region(struct vhost_dev 
> > >> *dev,
> > >>  }
> > >>  }
> > >>
> > >> +static bool vhost_dev_has_iommu(struct vhost_dev *dev)
> > >> +{
> > >> +VirtIODevice *vdev = dev->vdev;
> > >> +
> > >> +/*
> > >> + * For vhost, VIRTIO_F_IOMMU_PLATFORM means the backend support
> > >> + * incremental memory mapping API via IOTLB API. For platform that
> > >> + * does not have IOMMU, there's no need to enable this feature
> > >> + * which may cause unnecessary IOTLB miss/update transactions.
> > >> + */
> > >> +if (vdev) {
> > >> +return virtio_bus_device_iommu_enabled(vdev) &&
> > >> +virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
> > >> +} else {
> > >> +return false;
> > >> +}
> > >> +}
> > >> +
> > >>  static int vhost_sync_dirty_bitmap(struct vhost_dev *dev,
> > >> MemoryRegionSection *section,
> > >> hwaddr first,
> > >> hwaddr last)
> > >>  {
> > >> +IOMMUTLBEntry iotlb;
> > > why don't we move this inside the scope where it's used?
>
> That's fine.
>
> > >
> > >>  int i;
> > >>  hwaddr start_addr;
> > >>  hwaddr end_addr;
> > >> @@ -132,13 +151,37 @@ static int vhost_sync_dirty_bitmap(struct 
> > >> vhost_dev *dev,
> > >>  }
> > >>  for (i = 0; i < dev->nvqs; ++i) {
> > >>  struct vhost_virtqueue *vq = dev->vqs + i;
> > >> +hwaddr used_phys = vq->used_phys, used_size = vq->used_size;
> > >> +hwaddr phys, s;
> > > these two, too.
>
> Right.
>
> > >
> > >>
> > >>  if (!vq->used_phys && !vq->used_size) {
> > >>  continue;
> > >>  }
> > >>
> > >> -vhost_dev_sync_region(dev, section, start_addr, end_addr, 
> > >> vq->used_phys,
> > >> -

Re: [PATCH for 7.2?] vhost: fix vq dirt bitmap syncing when vIOMMU is enabled

2022-11-22 Thread Jason Wang

On Wed, Nov 23, 2022 at 12:28 AM Eric Auger  wrote:
>
> Hi,
>
> On 11/22/22 10:43, Michael S. Tsirkin wrote:
> > On Tue, Nov 22, 2022 at 11:01:11AM +0800, Jason Wang wrote:
> >> When vIOMMU is enabled, the vq->used_phys is actually the IOVA not
> >> GPA. So we need to translate it to GPA before the syncing otherwise we
> >> may hit the following crash since IOVA could be out of the scope of
> >> the GPA log size. This could be noted when using virtio-IOMMU with
> >> vhost using 1G memory.
> > Noted how exactly? What does "using 1G memory" mean?
>
> We hit the following assertion:
>
> qemu-system-x86_64: ../hw/virtio/vhost.c:85: vhost_dev_sync_region: Assertion 
> `end / VHOST_LOG_CHUNK < dev->log_size' failed.
>
> On the last time vhost_get_log_size() is called it takes into account 2 
> regions when computing the log_size:
> qemu-system-x86_64: vhost_get_log_size region 0 last=0x9 updated 
> log_size=0x3
> qemu-system-x86_64: vhost_get_log_size region 1 last=0x3fff updated 
> log_size=0x1000
> so in vhost_migration_log() vhost_get_log_size(dev) returns 0x1000
>
> In the test case, memory_region_sync_dirty_bitmap() gets called for 
> mem-machine_mem, vga.vram (several times) and eventually on pc.bios. This 
> latter is reponsible for the assertion:
>
> qemu-system-x86_64: vhost_log_sync calls sync_dirty_map on pc.bios for the 
> full range
> qemu-system-x86_64: vhost_sync_dirty_bitmap calls vhost_dev_sync_region on 
> region 0
> qemu-system-x86_64: vhost_dev_sync_region end=0x9 < start=0xfffc
> qemu-system-x86_64: vhost_sync_dirty_bitmap calls vhost_dev_sync_region on 
> region 1
> qemu-system-x86_64: vhost_dev_sync_region end=0x3fff < start=0xfffc
> qemu-system-x86_64: vhost_sync_dirty_bitmap calls vhost_dev_sync_region on vq 
> 0 <-
> qemu-system-x86_64: vhost_dev_sync_region pc.bios mfirst=0xfffc 
> mlast=0x rfirst=0xf240 rlast=0xfa45
> qemu-system-x86_64: vhost_dev_sync_region pc.bios end=0xfa45 
> VHOST_LOG_CHUNK=0x4 end/VHOST_LOG_CHUNK=0x3fff dev->log_size=0x1000
> qemu-system-x86_64: ../hw/virtio/vhost.c:85: vhost_dev_sync_region: Assertion 
> `end / VHOST_LOG_CHUNK < dev->log_size' failed.
>
>
>
> "using 1G memory": We hit the issue with a guest started with 1GB initial RAM.

Yes, so in the case the guest iova allocator may try to use an IOVA
that is beyond 1G.

>
>
> >
> >> Fixes: c471ad0e9bd46 ("vhost_net: device IOTLB support")
> >> Cc: qemu-sta...@nongnu.org
> >> Reported-by: Yalan Zhang 
> >> Tested-by: Eric Auger 
> >> Tested-by: Lei Yang 
> >> Signed-off-by: Jason Wang 
> >> ---
> >>  hw/virtio/vhost.c | 65 ---
> >>  1 file changed, 45 insertions(+), 20 deletions(-)
> >>
> >> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >> index d1c4c20b8c..26b319f34e 100644
> >> --- a/hw/virtio/vhost.c
> >> +++ b/hw/virtio/vhost.c
> >> @@ -106,11 +106,30 @@ static void vhost_dev_sync_region(struct vhost_dev 
> >> *dev,
> >>  }
> >>  }
> >>
> >> +static bool vhost_dev_has_iommu(struct vhost_dev *dev)
> >> +{
> >> +VirtIODevice *vdev = dev->vdev;
> >> +
> >> +/*
> >> + * For vhost, VIRTIO_F_IOMMU_PLATFORM means the backend support
> >> + * incremental memory mapping API via IOTLB API. For platform that
> >> + * does not have IOMMU, there's no need to enable this feature
> >> + * which may cause unnecessary IOTLB miss/update transactions.
> >> + */
> >> +if (vdev) {
> >> +return virtio_bus_device_iommu_enabled(vdev) &&
> >> +virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
> >> +} else {
> >> +return false;
> >> +}
> >> +}
> >> +
> >>  static int vhost_sync_dirty_bitmap(struct vhost_dev *dev,
> >> MemoryRegionSection *section,
> >> hwaddr first,
> >> hwaddr last)
> >>  {
> >> +IOMMUTLBEntry iotlb;
> > why don't we move this inside the scope where it's used?

That's fine.

> >
> >>  int i;
> >>  hwaddr start_addr;
> >>  hwaddr end_addr;
> >> @@ -132,13 +151,37 @@ static int vhost_sync_dirty_bitmap(struct vhost_dev 
> >> *dev,
> >>  }
> >>  for (i = 0; i < dev->nvqs; ++i) {
> >>  struct vhost_virtqueue *vq = dev->vqs + i;
> >> +hwaddr used_phys = vq->used_phys, used_size = vq->used_size;
> >> +hwaddr phys, s;
> > these two, too.

Right.

> >
> >>
> >>  if (!vq->used_phys && !vq->used_size) {
> >>  continue;
> >>  }
> >>
> >> -vhost_dev_sync_region(dev, section, start_addr, end_addr, 
> >> vq->used_phys,
> >> -  range_get_last(vq->used_phys, 
> >> vq->used_size));
> >> +if (vhost_dev_has_iommu(dev)) {
> >> +while (used_size) {
> >> +rcu_read_lock();
> >> +iotlb = address_space_get_iotlb_entry(dev->vdev->dma_as,
> >> +

Re: [PATCH] vhost-vdpa: skip TPM CRB memory section

2022-11-22 Thread Jason Wang

On Wed, Nov 23, 2022 at 12:50 AM Eugenio Perez Martin
 wrote:
>
> On Tue, Nov 22, 2022 at 3:53 PM  wrote:
> >
> > From: Marc-André Lureau 
> >
> > 851d6d1a0f ("vfio/common: remove spurious tpm-crb-cmd misalignment
> > warning") removed the warning on vfio_listener_region_add() path.
> >
> > An error is reported for vhost-vdpa case:
> > qemu-kvm: vhost_vdpa_listener_region_add received unaligned region
> >
> > Skip the CRB device.
> >
> > Fixes:
> > https://bugzilla.redhat.com/show_bug.cgi?id=2141965
> >
> > Signed-off-by: Marc-André Lureau 
>
> Acked-by: Eugenio Pérez 
>
> And CCing Jason too.

Adding Cindy.

Acked-by: Jason Wang 

Thanks

>
> > ---
> >  hw/virtio/vhost-vdpa.c | 6 ++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 7468e44b87..9d7206e4b8 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -19,6 +19,7 @@
> >  #include "hw/virtio/virtio-net.h"
> >  #include "hw/virtio/vhost-shadow-virtqueue.h"
> >  #include "hw/virtio/vhost-vdpa.h"
> > +#include "sysemu/tpm.h"
> >  #include "exec/address-spaces.h"
> >  #include "migration/blocker.h"
> >  #include "qemu/cutils.h"
> > @@ -46,6 +47,11 @@ static bool 
> > vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
> >  {
> >  Int128 llend;
> >
> > +if (TPM_IS_CRB(section->mr->owner)) {
> > +/* The CRB command buffer has its base address unaligned. */
> > +return true;
> > +}
> > +
> >  if ((!memory_region_is_ram(section->mr) &&
> >   !memory_region_is_iommu(section->mr)) ||
> >  memory_region_is_protected(section->mr) ||
> > --
> > 2.38.1
> >
>

Re: [PATCH] target/riscv: Dump sstatus CSR in riscv_cpu_dump_state()

2022-11-22 Thread Bin Meng

Hi Alistair,

On Wed, Nov 23, 2022 at 8:03 AM Alistair Francis  wrote:
>
> On Wed, Nov 23, 2022 at 2:07 AM Bin Meng  wrote:
> >
> > sstatus register dump is currently missing in riscv_cpu_dump_state().
> >
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1332
> > Signed-off-by: Bin Meng 
> >
> > ---
> >
> >  target/riscv/cpu.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> > index d14e95c9dc..80d76f0181 100644
> > --- a/target/riscv/cpu.c
> > +++ b/target/riscv/cpu.c
> > @@ -382,6 +382,7 @@ static void riscv_cpu_dump_state(CPUState *cs, FILE *f, 
> > int flags)
> >  CSR_MHARTID,
> >  CSR_MSTATUS,
> >  CSR_MSTATUSH,
> > +CSR_SSTATUS,
>
> I don't think we need this. mstatus contains all of the information
> already and there is limited space to print all of this information
> out.
>

I am not sure what limited space restricts this? This is CPU state
dump, and printing sstatus CSR seems reasonable to me. We do the
similar thing in the gdb stub too.

Regards,
Bin

[PATCH] hw/usb: add configuration flags for emulated and passthru usb smartcard

2022-11-22 Thread Jon Maloy

We add two new configuration flags, USB_SMARTCARD_PASSTHRU and
USB_SMARTCARD_EMULATED in order to improve configurability
of these two functionalities.

Signed-off-by: Jon Maloy 
---
 hw/usb/Kconfig | 12 
 hw/usb/meson.build |  4 ++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/hw/usb/Kconfig b/hw/usb/Kconfig
index ce4f433976..50a82badd6 100644
--- a/hw/usb/Kconfig
+++ b/hw/usb/Kconfig
@@ -108,6 +108,18 @@ config USB_SMARTCARD
 bool
 default y
 depends on USB
+select USB_SMARTCARD_PASSTHRU
+select USB_SMARTCARD_EMULATED
+
+config USB_SMARTCARD_PASSTHRU
+bool
+default y
+depends on USB
+
+config USB_SMARTCARD_EMULATED
+bool
+default y
+depends on USB
 
 config USB_STORAGE_MTP
 bool
diff --git a/hw/usb/meson.build b/hw/usb/meson.build
index 793df42e21..353006fb6c 100644
--- a/hw/usb/meson.build
+++ b/hw/usb/meson.build
@@ -51,8 +51,8 @@ softmmu_ss.add(when: 'CONFIG_USB_SMARTCARD', if_true: 
files('dev-smartcard-reade
 
 if cacard.found()
   usbsmartcard_ss = ss.source_set()
-  usbsmartcard_ss.add(when: 'CONFIG_USB_SMARTCARD',
-  if_true: [cacard, files('ccid-card-emulated.c', 
'ccid-card-passthru.c')])
+  usbsmartcard_ss.add(when: 'CONFIG_USB_SMARTCARD_EMULATED', if_true: [cacard, 
files('ccid-card-emulated.c')])
+  usbsmartcard_ss.add(when: 'CONFIG_USB_SMARTCARD_PASSTHRU', if_true: [cacard, 
files('ccid-card-passthru.c')])
   hw_usb_modules += {'smartcard': usbsmartcard_ss}
 endif
 
-- 
2.35.3

Re: [PATCH v2 1/7] qemu/main-loop: Introduce QEMU_IOTHREAD_LOCK_GUARD, WITH_QEMU_IOTHREAD_LOCK

2022-11-22 Thread Richard Henderson


On 11/22/22 12:57, Richard Henderson wrote:

+#define QEMU_IOTHREAD_LOCK_GUARD()  \
+g_auto(IOThreadLockAuto) _iothread_lock_auto\
+= qemu_iothread_auto_lock(__FILE__, __LINE__)   \


Need an extra G_GNUC_UNUSED for (some versions of?) clang.
Which is really a bug, because it's always used via the cleanup.


r~

[PATCH v2] hw/loongarch: Add cfi01 pflash device

2022-11-22 Thread Xiaojuan Yang

Add cfi01 pflash device for LoongArch virt machine

Signed-off-by: Xiaojuan Yang 
---
 hw/loongarch/Kconfig|  1 +
 hw/loongarch/acpi-build.c   | 18 +
 hw/loongarch/virt.c | 80 -
 include/hw/loongarch/virt.h |  5 +++
 4 files changed, 103 insertions(+), 1 deletion(-)

diff --git a/hw/loongarch/Kconfig b/hw/loongarch/Kconfig
index 17d15b6c90..eb112af990 100644
--- a/hw/loongarch/Kconfig
+++ b/hw/loongarch/Kconfig
@@ -20,3 +20,4 @@ config LOONGARCH_VIRT
 select ACPI_HW_REDUCED
 select FW_CFG_DMA
 select DIMM
+select PFLASH_CFI01
diff --git a/hw/loongarch/acpi-build.c b/hw/loongarch/acpi-build.c
index 7d5f5a757d..bc5f5fe9dc 100644
--- a/hw/loongarch/acpi-build.c
+++ b/hw/loongarch/acpi-build.c
@@ -279,6 +279,23 @@ static void build_pci_device_aml(Aml *scope, 
LoongArchMachineState *lams)
 acpi_dsdt_add_gpex(scope, );
 }
 
+static void build_flash_aml(Aml *scope, LoongArchMachineState *lams)
+{
+Aml *dev, *crs;
+
+hwaddr flash0_base = VIRT_FLASH0_BASE;
+hwaddr flash0_size = VIRT_FLASH0_SIZE;
+
+dev = aml_device("FLS0");
+aml_append(dev, aml_name_decl("_HID", aml_string("LNRO0015")));
+aml_append(dev, aml_name_decl("_UID", aml_int(0)));
+
+crs = aml_resource_template();
+aml_append(crs, aml_memory32_fixed(flash0_base, flash0_size, 
AML_READ_WRITE));
+aml_append(dev, aml_name_decl("_CRS", crs));
+aml_append(scope, dev);
+}
+
 #ifdef CONFIG_TPM
 static void acpi_dsdt_add_tpm(Aml *scope, LoongArchMachineState *vms)
 {
@@ -328,6 +345,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
 build_uart_device_aml(dsdt);
 build_pci_device_aml(dsdt, lams);
 build_la_ged_aml(dsdt, machine);
+build_flash_aml(dsdt, lams);
 #ifdef CONFIG_TPM
 acpi_dsdt_add_tpm(dsdt, lams);
 #endif
diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index 958be74fa1..ded77a2f3e 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -42,6 +42,74 @@
 #include "hw/display/ramfb.h"
 #include "hw/mem/pc-dimm.h"
 #include "sysemu/tpm.h"
+#include "sysemu/block-backend.h"
+#include "hw/block/flash.h"
+
+static PFlashCFI01 *virt_flash_create1(LoongArchMachineState *lams,
+   const char *name,
+   const char *alias_prop_name)
+{
+DeviceState *dev = qdev_new(TYPE_PFLASH_CFI01);
+
+qdev_prop_set_uint64(dev, "sector-length", VIRT_FLASH_SECTOR_SIZE);
+qdev_prop_set_uint8(dev, "width", 4);
+qdev_prop_set_uint8(dev, "device-width", 2);
+qdev_prop_set_bit(dev, "big-endian", false);
+qdev_prop_set_uint16(dev, "id0", 0x89);
+qdev_prop_set_uint16(dev, "id1", 0x18);
+qdev_prop_set_uint16(dev, "id2", 0x00);
+qdev_prop_set_uint16(dev, "id3", 0x00);
+qdev_prop_set_string(dev, "name", name);
+object_property_add_child(OBJECT(lams), name, OBJECT(dev));
+object_property_add_alias(OBJECT(lams), alias_prop_name,
+  OBJECT(dev), "drive");
+return PFLASH_CFI01(dev);
+}
+
+static void virt_flash_create(LoongArchMachineState *lams)
+{
+lams->flash[0] = virt_flash_create1(lams, "virt.flash0", "pflash0");
+}
+
+static void virt_flash_map1(PFlashCFI01 *flash,
+hwaddr base, hwaddr size,
+MemoryRegion *sysmem)
+{
+DeviceState *dev = DEVICE(flash);
+
+assert(QEMU_IS_ALIGNED(size, VIRT_FLASH_SECTOR_SIZE));
+assert(size / VIRT_FLASH_SECTOR_SIZE <= UINT32_MAX);
+
+qdev_prop_set_uint32(dev, "num-blocks", size / VIRT_FLASH_SECTOR_SIZE);
+sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), _fatal);
+memory_region_add_subregion(sysmem, base,
+sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 
0));
+}
+
+static void virt_flash_map(LoongArchMachineState *lams,
+   MemoryRegion *sysmem)
+{
+PFlashCFI01 *flash0 = lams->flash[0];
+
+virt_flash_map1(flash0, VIRT_FLASH0_BASE, VIRT_FLASH0_SIZE, sysmem);
+}
+
+static void fdt_add_flash_node(LoongArchMachineState *lams)
+{
+MachineState *ms = MACHINE(lams);
+char *nodename;
+
+hwaddr flash0_base = VIRT_FLASH0_BASE;
+hwaddr flash0_size = VIRT_FLASH0_SIZE;
+
+nodename = g_strdup_printf("/flash@%" PRIx64, flash0_base);
+qemu_fdt_add_subnode(ms->fdt, nodename);
+qemu_fdt_setprop_string(ms->fdt, nodename, "compatible", "cfi-flash");
+qemu_fdt_setprop_sized_cells(ms->fdt, nodename, "reg",
+ 2, flash0_base, 2, flash0_size);
+qemu_fdt_setprop_cell(ms->fdt, nodename, "bank-width", 4);
+g_free(nodename);
+}
 
 static void fdt_add_rtc_node(LoongArchMachineState *lams)
 {
@@ -593,9 +661,17 @@ static void loongarch_firmware_init(LoongArchMachineState 
*lams)
 {
 char *filename = MACHINE(lams)->firmware;
 char *bios_name = NULL;
-int bios_size;
+int bios_size, i;
 
 lams->bios_loaded = false;
+/* Map legacy

[PATCH v2] vfio/pci: Verify each MSI vector to avoid invalid MSI vectors

2022-11-22 Thread chenxiang via

From: Xiang Chen 

Currently the number of MSI vectors comes from register PCI_MSI_FLAGS
which should be power-of-2 in qemu, in some scenaries it is not the same as
the number that driver requires in guest, for example, a PCI driver wants
to allocate 6 MSI vecotrs in guest, but as the limitation, it will allocate
8 MSI vectors. So it requires 8 MSI vectors in qemu while the driver in
guest only wants to allocate 6 MSI vectors.

When GICv4.1 is enabled, it iterates over all possible MSIs and enable the
forwarding while the guest has only created some of mappings in the virtual
ITS, so some calls fail. The exception print is as following:
vfio-pci :3a:00.1: irq bypass producer (token 8f08224d) registration
fails:66311

To avoid the issue, verify each MSI vector, skip some operations such as
request_irq() and irq_bypass_register_producer() for those invalid MSI vectors.

Signed-off-by: Xiang Chen 
---
I reported the issue at the link:
https://lkml.kernel.org/lkml/87cze9lcut.wl-...@kernel.org/T/

Change Log:
v1 -> v2:
Verify each MSI vector in kernel instead of adding systemcall according to
Mar's suggestion
---
 arch/arm64/kvm/vgic/vgic-irqfd.c  | 13 +
 arch/arm64/kvm/vgic/vgic-its.c| 36 
 arch/arm64/kvm/vgic/vgic.h|  1 +
 drivers/vfio/pci/vfio_pci_intrs.c | 33 +
 include/linux/kvm_host.h  |  2 ++
 5 files changed, 85 insertions(+)

diff --git a/arch/arm64/kvm/vgic/vgic-irqfd.c b/arch/arm64/kvm/vgic/vgic-irqfd.c
index 475059b..71f6af57 100644
--- a/arch/arm64/kvm/vgic/vgic-irqfd.c
+++ b/arch/arm64/kvm/vgic/vgic-irqfd.c
@@ -98,6 +98,19 @@ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
return vgic_its_inject_msi(kvm, );
 }
 
+int kvm_verify_msi(struct kvm *kvm,
+  struct kvm_kernel_irq_routing_entry *irq_entry)
+{
+   struct kvm_msi msi;
+
+   if (!vgic_has_its(kvm))
+   return -ENODEV;
+
+   kvm_populate_msi(irq_entry, );
+
+   return vgic_its_verify_msi(kvm, );
+}
+
 /**
  * kvm_arch_set_irq_inatomic: fast-path for irqfd injection
  */
diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
index 94a666d..8312a4a 100644
--- a/arch/arm64/kvm/vgic/vgic-its.c
+++ b/arch/arm64/kvm/vgic/vgic-its.c
@@ -767,6 +767,42 @@ int vgic_its_inject_cached_translation(struct kvm *kvm, 
struct kvm_msi *msi)
return 0;
 }
 
+int vgic_its_verify_msi(struct kvm *kvm, struct kvm_msi *msi)
+{
+   struct vgic_its *its;
+   struct its_ite *ite;
+   struct kvm_vcpu *vcpu;
+   int ret = 0;
+
+   if (!irqchip_in_kernel(kvm) || (msi->flags & ~KVM_MSI_VALID_DEVID))
+   return -EINVAL;
+
+   if (!vgic_has_its(kvm))
+   return -ENODEV;
+
+   its = vgic_msi_to_its(kvm, msi);
+   if (IS_ERR(its))
+   return PTR_ERR(its);
+
+   mutex_lock(>its_lock);
+   if (!its->enabled) {
+   ret = -EBUSY;
+   goto unlock;
+   }
+   ite = find_ite(its, msi->devid, msi->data);
+   if (!ite || !its_is_collection_mapped(ite->collection)) {
+   ret = E_ITS_INT_UNMAPPED_INTERRUPT;
+   goto unlock;
+   }
+
+   vcpu = kvm_get_vcpu(kvm, ite->collection->target_addr);
+   if (!vcpu)
+   ret = E_ITS_INT_UNMAPPED_INTERRUPT;
+unlock:
+   mutex_unlock(>its_lock);
+   return ret;
+}
+
 /*
  * Queries the KVM IO bus framework to get the ITS pointer from the given
  * doorbell address.
diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
index 0c8da72..d452150 100644
--- a/arch/arm64/kvm/vgic/vgic.h
+++ b/arch/arm64/kvm/vgic/vgic.h
@@ -240,6 +240,7 @@ int kvm_vgic_register_its_device(void);
 void vgic_enable_lpis(struct kvm_vcpu *vcpu);
 void vgic_flush_pending_lpis(struct kvm_vcpu *vcpu);
 int vgic_its_inject_msi(struct kvm *kvm, struct kvm_msi *msi);
+int vgic_its_verify_msi(struct kvm *kvm, struct kvm_msi *msi);
 int vgic_v3_has_attr_regs(struct kvm_device *dev, struct kvm_device_attr 
*attr);
 int vgic_v3_dist_uaccess(struct kvm_vcpu *vcpu, bool is_write,
 int offset, u32 *val);
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c 
b/drivers/vfio/pci/vfio_pci_intrs.c
index 40c3d7c..3027805 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "vfio_pci_priv.h"
 
@@ -315,6 +316,28 @@ static int vfio_msi_enable(struct vfio_pci_core_device 
*vdev, int nvec, bool msi
return 0;
 }
 
+static int vfio_pci_verify_msi_entry(struct vfio_pci_core_device *vdev,
+   struct eventfd_ctx *trigger)
+{
+   struct kvm *kvm = vdev->vdev.kvm;
+   struct kvm_kernel_irqfd *tmp;
+   struct kvm_kernel_irq_routing_entry irq_entry;
+   int ret = -ENODEV;
+
+   spin_lock_irq(>irqfds.lock);
+   list_for_each_entry(tmp, >irqfds.items, list) {
+

RE: [PATCH V2 0/4] hw/block/nvme: Implement ZNS finish-zone ZDC AEN

2022-11-22 Thread Clay Mayers

> From: Klaus Jensen 
> Sent: Wednesday, November 9, 2022 12:40 AM
> 
> On Oct 21 16:10, clay.may...@kioxia.com wrote:
> > From: Clay Mayers 
> >
> > ZNS controllers have the option to limit the time a zone can remain in
> > the active state.  It begins with a background process in the controller
> > setting the finish-zone-recommended FZR attribute for a zone.  As part of
> > setting this attribute, the zone's id is added to the namespace's
> > zone-descriptor-changed (ZDC) log page. If enabled, items added to the
> > ZDC log page generate a ZDC "asynchronous event notification" AEN.
> Optionally,
> > the control can induce a "zone excursion" forcing the zone into the finished
> > state that also generates a ZDC event.
> >
> > Zone enabled applications need to properly handle ZDC events. In a real
> device,
> > the timeout is many hours making testing an application difficult.
> > Implemented is the generation of FZR ZDC events to speed up O/S and
> application
> > testing.
> >
> > Added to the zoned NVMe command set is an optional, per-namespace timer
> > (zoned.finish_time) to set the FZR attr for long-lived active zones; A per
> > namespace ZDC log page; AEN results to including CQE.DW1 (the NSID of the
> ZDC
> > AEN) and generating a ZDC AEN if it's been enabled. Zone excursions are not
> > modeled.
> >
> > See section 5.5 of the NVMe Zoned Namespace Command Set Specification
> v1.1
> > for more details.
> >
> > Changes since v1
> > - Fixed offset length checking in zdc log page
> > - Moved zdc_event_queued to the patch 4
> > - Unwatched zdc events in nvme_exit()
> >
> > Clay Mayers (4):
> >   hw/block/nvme: add ZONE_FINISH_RECOMMENDED functionality
> >   hw/block/nvme: add zone descriptor changed log page
> >   hw/block/nvme: supply dw1 for aen result
> >   hw/block/nvme: add zone descriptor changed AEN
> >
> >  docs/system/devices/nvme.rst |   5 +
> >  hw/nvme/ctrl.c   | 174 +--
> >  hw/nvme/ns.c |  15 +++
> >  hw/nvme/nvme.h   |  37 +++-
> >  hw/nvme/trace-events |   3 +-
> >  include/block/nvme.h |  14 ++-
> >  6 files changed, 233 insertions(+), 15 deletions(-)
> >
> > --
> > 2.27.0
> >
> 
> Nice work Clay!

Thanks! I think you had implemented excursions at one point and I'm
not sure why it didn't make it in. I originally left out excursions because
all I needed was the AEN to test my Linux patch.  I have a V3 in the works
with excursions so I can test what happens to libzbd applications. I can't
tell if there is real interest in this but I've tested rocksDB using zenfs and
it has difficulties with zone excursions.  What I don't know is if excursions
are a corner case or not - in my world, they are not.

> 
> Series looks good to me,
> 
> Reviewed-by: Klaus Jensen

Re: [PATCH] target/riscv: Dump sstatus CSR in riscv_cpu_dump_state()

2022-11-22 Thread Alistair Francis

On Wed, Nov 23, 2022 at 2:07 AM Bin Meng  wrote:
>
> sstatus register dump is currently missing in riscv_cpu_dump_state().
>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1332
> Signed-off-by: Bin Meng 
>
> ---
>
>  target/riscv/cpu.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index d14e95c9dc..80d76f0181 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -382,6 +382,7 @@ static void riscv_cpu_dump_state(CPUState *cs, FILE *f, 
> int flags)
>  CSR_MHARTID,
>  CSR_MSTATUS,
>  CSR_MSTATUSH,
> +CSR_SSTATUS,

I don't think we need this. mstatus contains all of the information
already and there is limited space to print all of this information
out.

Alistair

>  CSR_HSTATUS,
>  CSR_VSSTATUS,
>  CSR_MIP,
> --
> 2.34.1
>
>

[PATCH v2] target/arm: align exposed ID registers with Linux

2022-11-22 Thread Zhuojia Shen

In CPUID registers exposed to userspace, some registers were missing
and some fields were not exposed.  This patch aligns exposed ID
registers and their fields with what the upstream kernel currently
exposes.

Specifically, the following new ID registers/fields are exposed to
userspace:

ID_AA64PFR1_EL1.BT:   bits 3-0
ID_AA64PFR1_EL1.MTE:  bits 11-8
ID_AA64PFR1_EL1.SME:  bits 27-24

ID_AA64ZFR0_EL1.SVEver:   bits 3-0
ID_AA64ZFR0_EL1.AES:  bits 7-4
ID_AA64ZFR0_EL1.BitPerm:  bits 19-16
ID_AA64ZFR0_EL1.BF16: bits 23-20
ID_AA64ZFR0_EL1.SHA3: bits 35-32
ID_AA64ZFR0_EL1.SM4:  bits 43-40
ID_AA64ZFR0_EL1.I8MM: bits 47-44
ID_AA64ZFR0_EL1.F32MM:bits 55-52
ID_AA64ZFR0_EL1.F64MM:bits 59-56

ID_AA64SMFR0_EL1.F32F32:  bit 32
ID_AA64SMFR0_EL1.B16F32:  bit 34
ID_AA64SMFR0_EL1.F16F32:  bit 35
ID_AA64SMFR0_EL1.I8I32:   bits 39-36
ID_AA64SMFR0_EL1.F64F64:  bit 48
ID_AA64SMFR0_EL1.I16I64:  bits 55-52
ID_AA64SMFR0_EL1.FA64:bit 63

ID_AA64MMFR0_EL1.ECV: bits 63-60

ID_AA64MMFR1_EL1.AFP: bits 47-44

ID_AA64MMFR2_EL1.AT:  bits 35-32

ID_AA64ISAR0_EL1.RNDR:bits 63-60

ID_AA64ISAR1_EL1.FRINTTS: bits 35-32
ID_AA64ISAR1_EL1.BF16:bits 47-44
ID_AA64ISAR1_EL1.DGH: bits 51-48
ID_AA64ISAR1_EL1.I8MM:bits 55-52

ID_AA64ISAR2_EL1.WFxT:bits 3-0
ID_AA64ISAR2_EL1.RPRES:   bits 7-4
ID_AA64ISAR2_EL1.GPA3:bits 11-8
ID_AA64ISAR2_EL1.APA3:bits 15-12

The code is also refactored to use symbolic names for ID register fields
for better readability and maintainability.

Signed-off-by: Zhuojia Shen 
---
 target/arm/helper.c | 96 +
 1 file changed, 79 insertions(+), 17 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index d8c8223ec3..82e45be04a 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -7823,31 +7823,89 @@ void register_cp_regs_for_features(ARMCPU *cpu)
 #ifdef CONFIG_USER_ONLY
 static const ARMCPRegUserSpaceInfo v8_user_idregs[] = {
 { .name = "ID_AA64PFR0_EL1",
-  .exported_bits = 0x000f000f00ff,
-  .fixed_bits= 0x0011 },
+  .exported_bits = R_ID_AA64PFR0_FP_MASK |
+   R_ID_AA64PFR0_ADVSIMD_MASK |
+   R_ID_AA64PFR0_SVE_MASK |
+   R_ID_AA64PFR0_DIT_MASK,
+  .fixed_bits = (0x1 << R_ID_AA64PFR0_EL0_SHIFT) |
+(0x1 << R_ID_AA64PFR0_EL1_SHIFT) },
 { .name = "ID_AA64PFR1_EL1",
-  .exported_bits = 0x00f0 },
+  .exported_bits = R_ID_AA64PFR1_BT_MASK |
+   R_ID_AA64PFR1_SSBS_MASK |
+   R_ID_AA64PFR1_MTE_MASK |
+   R_ID_AA64PFR1_SME_MASK },
 { .name = "ID_AA64PFR*_EL1_RESERVED",
-  .is_glob = true },
-{ .name = "ID_AA64ZFR0_EL1"   },
+  .is_glob = true },
+{ .name = "ID_AA64ZFR0_EL1",
+  .exported_bits = R_ID_AA64ZFR0_SVEVER_MASK |
+   R_ID_AA64ZFR0_AES_MASK |
+   R_ID_AA64ZFR0_BITPERM_MASK |
+   R_ID_AA64ZFR0_BFLOAT16_MASK |
+   R_ID_AA64ZFR0_SHA3_MASK |
+   R_ID_AA64ZFR0_SM4_MASK |
+   R_ID_AA64ZFR0_I8MM_MASK |
+   R_ID_AA64ZFR0_F32MM_MASK |
+   R_ID_AA64ZFR0_F64MM_MASK },
+{ .name = "ID_AA64SMFR0_EL1",
+  .exported_bits = R_ID_AA64SMFR0_F32F32_MASK |
+   R_ID_AA64SMFR0_B16F32_MASK |
+   R_ID_AA64SMFR0_F16F32_MASK |
+   R_ID_AA64SMFR0_I8I32_MASK |
+   R_ID_AA64SMFR0_F64F64_MASK |
+   R_ID_AA64SMFR0_I16I64_MASK |
+   R_ID_AA64SMFR0_FA64_MASK },
 { .name = "ID_AA64MMFR0_EL1",
-  .fixed_bits= 0xff00 },
-{ .name = "ID_AA64MMFR1_EL1"  },
+  .exported_bits = R_ID_AA64MMFR0_ECV_MASK,
+  .fixed_bits = (0xf << R_ID_AA64MMFR0_TGRAN64_SHIFT) |
+(0xf << R_ID_AA64MMFR0_TGRAN4_SHIFT) },
+{ .name = "ID_AA64MMFR1_EL1",
+  .exported_bits = R_ID_AA64MMFR1_AFP_MASK },
+{ .name = "ID_AA64MMFR2_EL1",
+  .exported_bits = R_ID_AA64MMFR2_AT_MASK },
 { .name = "ID_AA64MMFR*_EL1_RESERVED",
-  .is_glob = true },
+  .is_glob = true },
 { .name = "ID_AA64DFR0_EL1",
-  .fixed_bits= 0x0006 },
-{ .name = "ID_AA64DFR1_EL1"   },
+  .fixed_bits = (0x6 << R_ID_AA64DFR0_DEBUGVER_SHIFT) },
+{ .name =

Re: [PATCH v2 1/7] qemu/main-loop: Introduce QEMU_IOTHREAD_LOCK_GUARD, WITH_QEMU_IOTHREAD_LOCK

2022-11-22 Thread Philippe Mathieu-Daudé


On 22/11/22 21:57, Richard Henderson wrote:

Create a couple of wrappers for locking/unlocking the iothread lock.

Signed-off-by: Richard Henderson 
---
  include/qemu/main-loop.h | 39 +++
  1 file changed, 39 insertions(+)




+#define QEMU_IOTHREAD_LOCK_GUARD()  \
+g_auto(IOThreadLockAuto) _iothread_lock_auto\
+= qemu_iothread_auto_lock(__FILE__, __LINE__)   \
+
+#define WITH_QEMU_IOTHREAD_LOCK()   \
+for (QEMU_IOTHREAD_LOCK_GUARD();\
+ _iothread_lock_auto.iterate;   \
+ _iothread_lock_auto.iterate = false)


Nice, thanks!

Reviewed-by: Philippe Mathieu-Daudé

[PATCH v2 2/7] hw/mips: Use WITH_QEMU_IOTHREAD_LOCK in cpu_mips_irq_request

2022-11-22 Thread Richard Henderson

Reviewed-by: Paolo Bonzini 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 hw/mips/mips_int.c | 37 ++---
 1 file changed, 14 insertions(+), 23 deletions(-)

diff --git a/hw/mips/mips_int.c b/hw/mips/mips_int.c
index 2db5e10fe0..c22598d353 100644
--- a/hw/mips/mips_int.c
+++ b/hw/mips/mips_int.c
@@ -32,36 +32,27 @@ static void cpu_mips_irq_request(void *opaque, int irq, int 
level)
 MIPSCPU *cpu = opaque;
 CPUMIPSState *env = >env;
 CPUState *cs = CPU(cpu);
-bool locked = false;
 
 if (irq < 0 || irq > 7) {
 return;
 }
 
-/* Make sure locking works even if BQL is already held by the caller */
-if (!qemu_mutex_iothread_locked()) {
-locked = true;
-qemu_mutex_lock_iothread();
-}
+WITH_QEMU_IOTHREAD_LOCK() {
+if (level) {
+env->CP0_Cause |= 1 << (irq + CP0Ca_IP);
+} else {
+env->CP0_Cause &= ~(1 << (irq + CP0Ca_IP));
+}
 
-if (level) {
-env->CP0_Cause |= 1 << (irq + CP0Ca_IP);
-} else {
-env->CP0_Cause &= ~(1 << (irq + CP0Ca_IP));
-}
+if (kvm_enabled() && (irq == 2 || irq == 3)) {
+kvm_mips_set_interrupt(cpu, irq, level);
+}
 
-if (kvm_enabled() && (irq == 2 || irq == 3)) {
-kvm_mips_set_interrupt(cpu, irq, level);
-}
-
-if (env->CP0_Cause & CP0Ca_IP_mask) {
-cpu_interrupt(cs, CPU_INTERRUPT_HARD);
-} else {
-cpu_reset_interrupt(cs, CPU_INTERRUPT_HARD);
-}
-
-if (locked) {
-qemu_mutex_unlock_iothread();
+if (env->CP0_Cause & CP0Ca_IP_mask) {
+cpu_interrupt(cs, CPU_INTERRUPT_HARD);
+} else {
+cpu_reset_interrupt(cs, CPU_INTERRUPT_HARD);
+}
 }
 }
 
-- 
2.34.1

[PATCH v2 1/7] qemu/main-loop: Introduce QEMU_IOTHREAD_LOCK_GUARD, WITH_QEMU_IOTHREAD_LOCK

2022-11-22 Thread Richard Henderson

Create a couple of wrappers for locking/unlocking the iothread lock.

Signed-off-by: Richard Henderson 
---
 include/qemu/main-loop.h | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index 3c9a9a982d..73c60a9af4 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -343,6 +343,45 @@ void qemu_mutex_lock_iothread_impl(const char *file, int 
line);
  */
 void qemu_mutex_unlock_iothread(void);
 
+/**
+ * QEMU_IOTHREAD_LOCK_GUARD
+ * WITH_QEMU_IOTHREAD_LOCK
+ *
+ * Wrap a block of code in a conditional qemu_mutex_{lock,unlock}_iothread.
+ */
+typedef struct IOThreadLockAuto {
+bool locked;
+bool iterate;
+} IOThreadLockAuto;
+
+static inline IOThreadLockAuto qemu_iothread_auto_lock(const char *file,
+   int line)
+{
+bool need = !qemu_mutex_iothread_locked();
+if (need) {
+qemu_mutex_lock_iothread_impl(file, line);
+}
+return (IOThreadLockAuto){ .locked = need, .iterate = true };
+}
+
+static inline void qemu_iothread_auto_unlock(IOThreadLockAuto *l)
+{
+if (l->locked) {
+qemu_mutex_unlock_iothread();
+}
+}
+
+G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(IOThreadLockAuto, qemu_iothread_auto_unlock)
+
+#define QEMU_IOTHREAD_LOCK_GUARD()  \
+g_auto(IOThreadLockAuto) _iothread_lock_auto\
+= qemu_iothread_auto_lock(__FILE__, __LINE__)   \
+
+#define WITH_QEMU_IOTHREAD_LOCK()   \
+for (QEMU_IOTHREAD_LOCK_GUARD();\
+ _iothread_lock_auto.iterate;   \
+ _iothread_lock_auto.iterate = false)
+
 /*
  * qemu_cond_wait_iothread: Wait on condition for the main loop mutex
  *
-- 
2.34.1

[PATCH] hw/intc/arm_gicv3: Fix GICD_TYPER ITLinesNumber advertisement

2022-11-22 Thread Luke Starrett

The ARM GICv3 TRM describes that the ITLinesNumber field of GICD_TYPER
register:

"indicates the maximum SPI INTID that the GIC implementation supports"

As SPI #0 is absolute IRQ #32, the max SPI INTID should have accounted
for the internal 16x SGI's and 16x PPI's.  However, the original GICv3
model subtracted off the SGI/PPI.  Cosmetically this can be seen at OS
boot (Linux) showing 32 shy of what should be there, i.e.:

[0.00] GICv3: 224 SPIs implemented

Though in hw/arm/virt.c, the machine is configured for 256 SPI's.  ARM
virt machine likely doesn't have a problem with this because the upper
32 IRQ's don't actually have anything meaningful wired. But, this does
become a functional issue on a custom use case which wants to make use
of these IRQ's.  Additionally, boot code (i.e. TF-A) will only init up
to the number (blocks of 32) that it believes to actually be there.

Signed-off-by: Luke Starrett 
---
 hw/intc/arm_gicv3_dist.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/intc/arm_gicv3_dist.c b/hw/intc/arm_gicv3_dist.c
index eea0368118..d599fefcbc 100644
--- a/hw/intc/arm_gicv3_dist.c
+++ b/hw/intc/arm_gicv3_dist.c
@@ -390,9 +390,9 @@ static bool gicd_readl(GICv3State *s, hwaddr offset,
  * MBIS == 0 (message-based SPIs not supported)
  * SecurityExtn == 1 if security extns supported
  * CPUNumber == 0 since for us ARE is always 1
- * ITLinesNumber == (num external irqs / 32) - 1
+ * ITLinesNumber == (((max SPI IntID + 1) / 32) - 1)
  */
-int itlinesnumber = ((s->num_irq - GIC_INTERNAL) / 32) - 1;
+int itlinesnumber = (s->num_irq / 32) - 1;
 /*
  * SecurityExtn must be RAZ if GICD_CTLR.DS == 1, and
  * "security extensions not supported" always implies DS == 1,
-- 
2.27.0

Subject: [PATCH] hw/intc/arm_gicv3: Fix GICD_TYPER ITLinesNumber advertisement

2022-11-22 Thread Luke Starrett


The ARM GICv3 TRM describes that the ITLinesNumber field of GICD_TYPER
register:

"indicates the maximum SPI INTID that the GIC implementation supports"

As SPI #0 is absolute IRQ #32, the max SPI INTID should have accounted
for the internal 16x SGI's and 16x PPI's.  However, the original GICv3
model subtracted off the SGI/PPI.  Cosmetically this can be seen at OS
boot (Linux) showing 32 shy of what should be there, i.e.:

    [    0.00] GICv3: 224 SPIs implemented

Though in hw/arm/virt.c, the machine is configured for 256 SPI's. ARM
virt machine likely doesn't have a problem with this because the upper
32 IRQ's don't actually have anything meaningful wired. But, this does
become a functional issue on a custom use case which wants to make use
of these IRQ's.  Additionally, boot code (i.e. TF-A) will only init up
to the number (blocks of 32) that it believes to actually be there.

Signed-off-by: Luke Starrett 
---
 hw/intc/arm_gicv3_dist.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/intc/arm_gicv3_dist.c b/hw/intc/arm_gicv3_dist.c
index eea0368118..d599fefcbc 100644
--- a/hw/intc/arm_gicv3_dist.c
+++ b/hw/intc/arm_gicv3_dist.c
@@ -390,9 +390,9 @@ static bool gicd_readl(GICv3State *s, hwaddr offset,
  * MBIS == 0 (message-based SPIs not supported)
  * SecurityExtn == 1 if security extns supported
  * CPUNumber == 0 since for us ARE is always 1
- * ITLinesNumber == (num external irqs / 32) - 1
+ * ITLinesNumber == (((max SPI IntID + 1) / 32) - 1)
  */
-    int itlinesnumber = ((s->num_irq - GIC_INTERNAL) / 32) - 1;
+    int itlinesnumber = (s->num_irq / 32) - 1;
 /*
  * SecurityExtn must be RAZ if GICD_CTLR.DS == 1, and
  * "security extensions not supported" always implies DS == 1,
--
2.27.0

[PATCH v2 4/7] target/ppc: Use QEMU_IOTHREAD_LOCK_GUARD in cpu_interrupt_exittb

2022-11-22 Thread Richard Henderson

In addition, use tcg_enabled instead of !kvm_enabled.

Reviewed-by: Paolo Bonzini 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Richard Henderson 
---
 target/ppc/helper_regs.c | 14 --
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/target/ppc/helper_regs.c b/target/ppc/helper_regs.c
index c0aee5855b..779e7db513 100644
--- a/target/ppc/helper_regs.c
+++ b/target/ppc/helper_regs.c
@@ -22,6 +22,7 @@
 #include "qemu/main-loop.h"
 #include "exec/exec-all.h"
 #include "sysemu/kvm.h"
+#include "sysemu/tcg.h"
 #include "helper_regs.h"
 #include "power8-pmu.h"
 #include "cpu-models.h"
@@ -203,17 +204,10 @@ void cpu_interrupt_exittb(CPUState *cs)
 {
 /*
  * We don't need to worry about translation blocks
- * when running with KVM.
+ * unless running with TCG.
  */
-if (kvm_enabled()) {
-return;
-}
-
-if (!qemu_mutex_iothread_locked()) {
-qemu_mutex_lock_iothread();
-cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
-qemu_mutex_unlock_iothread();
-} else {
+if (tcg_enabled()) {
+QEMU_IOTHREAD_LOCK_GUARD();
 cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
 }
 }
-- 
2.34.1

[PATCH v2 0/7] main-loop: Introduce QEMU_IOTHREAD_LOCK_GUARD

2022-11-22 Thread Richard Henderson

Simplify the usage of qemu_mutex_lock_iothread.
Split out for ease of review.

Changes for v2:
  * Add WITH_QEMU_IOTHREAD_LOCK and use it a couple of places.
This re-implements patch 1, so r-b's dropped.

r~


Richard Henderson (7):
  qemu/main-loop: Introduce QEMU_IOTHREAD_LOCK_GUARD,
WITH_QEMU_IOTHREAD_LOCK
  hw/mips: Use WITH_QEMU_IOTHREAD_LOCK in cpu_mips_irq_request
  target/ppc: Use QEMU_IOTHREAD_LOCK_GUARD in ppc_maybe_interrupt
  target/ppc: Use QEMU_IOTHREAD_LOCK_GUARD in cpu_interrupt_exittb
  hw/ppc: Use QEMU_IOTHREAD_LOCK_GUARD in ppc_set_irq
  target/riscv: Use QEMU_IOTHREAD_LOCK_GUARD in riscv_cpu_update_mip
  accel/tcg: Use WITH_QEMU_IOTHREAD_LOCK in io_readx/io_writex

 include/qemu/main-loop.h  | 39 +++
 accel/tcg/cputlb.c| 23 ++-
 hw/mips/mips_int.c| 37 ++---
 hw/ppc/ppc.c  | 10 +-
 target/ppc/excp_helper.c  | 11 +--
 target/ppc/helper_regs.c  | 14 --
 target/riscv/cpu_helper.c | 22 +++---
 7 files changed, 72 insertions(+), 84 deletions(-)

-- 
2.34.1

Re: Subject: [PATCH] hw/intc/arm_gicv3: Fix GICD_TYPER ITLinesNumber advertisement

2022-11-22 Thread Luke Starrett


Please disregard this one.  Sent in error.

On 11/22/2022 2:18 PM, Luke Starrett wrote:

The ARM GICv3 TRM describes that the ITLinesNumber field of GICD_TYPER
register:

"indicates the maximum SPI INTID that the GIC implementation supports"

As SPI #0 is absolute IRQ #32, the max SPI INTID should have accounted
for the internal 16x SGI's and 16x PPI's.  However, the original GICv3
model subtracted off the SGI/PPI.  Cosmetically this can be seen at OS
boot (Linux) showing 32 shy of what should be there, i.e.:

    [    0.00] GICv3: 224 SPIs implemented

Though in hw/arm/virt.c, the machine is configured for 256 SPI's. ARM
virt machine likely doesn't have a problem with this because the upper
32 IRQ's don't actually have anything meaningful wired. But, this does
become a functional issue on a custom use case which wants to make use
of these IRQ's.  Additionally, boot code (i.e. TF-A) will only init up
to the number (blocks of 32) that it believes to actually be there.

Signed-off-by: Luke Starrett 
---
 hw/intc/arm_gicv3_dist.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/intc/arm_gicv3_dist.c b/hw/intc/arm_gicv3_dist.c
index eea0368118..d599fefcbc 100644
--- a/hw/intc/arm_gicv3_dist.c
+++ b/hw/intc/arm_gicv3_dist.c
@@ -390,9 +390,9 @@ static bool gicd_readl(GICv3State *s, hwaddr offset,
  * MBIS == 0 (message-based SPIs not supported)
  * SecurityExtn == 1 if security extns supported
  * CPUNumber == 0 since for us ARE is always 1
- * ITLinesNumber == (num external irqs / 32) - 1
+ * ITLinesNumber == (((max SPI IntID + 1) / 32) - 1)
  */
-    int itlinesnumber = ((s->num_irq - GIC_INTERNAL) / 32) - 1;
+    int itlinesnumber = (s->num_irq / 32) - 1;
 /*
  * SecurityExtn must be RAZ if GICD_CTLR.DS == 1, and
  * "security extensions not supported" always implies DS == 1,

[PATCH v2 7/7] accel/tcg: Use WITH_QEMU_IOTHREAD_LOCK in io_readx/io_writex

2022-11-22 Thread Richard Henderson

Narrow the scope of the lock to the actual read/write,
moving the cpu_transation_failed call outside the lock.

Reviewed-by: Paolo Bonzini 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 accel/tcg/cputlb.c | 23 ++-
 1 file changed, 6 insertions(+), 17 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 6f1c00682b..8b86b70c60 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1356,7 +1356,6 @@ static uint64_t io_readx(CPUArchState *env, 
CPUTLBEntryFull *full,
 MemoryRegionSection *section;
 MemoryRegion *mr;
 uint64_t val;
-bool locked = false;
 MemTxResult r;
 
 section = iotlb_to_section(cpu, full->xlat_section, full->attrs);
@@ -1367,11 +1366,10 @@ static uint64_t io_readx(CPUArchState *env, 
CPUTLBEntryFull *full,
 cpu_io_recompile(cpu, retaddr);
 }
 
-if (!qemu_mutex_iothread_locked()) {
-qemu_mutex_lock_iothread();
-locked = true;
+WITH_QEMU_IOTHREAD_LOCK() {
+r = memory_region_dispatch_read(mr, mr_offset, , op, full->attrs);
 }
-r = memory_region_dispatch_read(mr, mr_offset, , op, full->attrs);
+
 if (r != MEMTX_OK) {
 hwaddr physaddr = mr_offset +
 section->offset_within_address_space -
@@ -1380,10 +1378,6 @@ static uint64_t io_readx(CPUArchState *env, 
CPUTLBEntryFull *full,
 cpu_transaction_failed(cpu, physaddr, addr, memop_size(op), 
access_type,
mmu_idx, full->attrs, r, retaddr);
 }
-if (locked) {
-qemu_mutex_unlock_iothread();
-}
-
 return val;
 }
 
@@ -1410,7 +1404,6 @@ static void io_writex(CPUArchState *env, CPUTLBEntryFull 
*full,
 hwaddr mr_offset;
 MemoryRegionSection *section;
 MemoryRegion *mr;
-bool locked = false;
 MemTxResult r;
 
 section = iotlb_to_section(cpu, full->xlat_section, full->attrs);
@@ -1427,11 +1420,10 @@ static void io_writex(CPUArchState *env, 
CPUTLBEntryFull *full,
  */
 save_iotlb_data(cpu, section, mr_offset);
 
-if (!qemu_mutex_iothread_locked()) {
-qemu_mutex_lock_iothread();
-locked = true;
+WITH_QEMU_IOTHREAD_LOCK() {
+r = memory_region_dispatch_write(mr, mr_offset, val, op, full->attrs);
 }
-r = memory_region_dispatch_write(mr, mr_offset, val, op, full->attrs);
+
 if (r != MEMTX_OK) {
 hwaddr physaddr = mr_offset +
 section->offset_within_address_space -
@@ -1441,9 +1433,6 @@ static void io_writex(CPUArchState *env, CPUTLBEntryFull 
*full,
MMU_DATA_STORE, mmu_idx, full->attrs, r,
retaddr);
 }
-if (locked) {
-qemu_mutex_unlock_iothread();
-}
 }
 
 static inline target_ulong tlb_read_ofs(CPUTLBEntry *entry, size_t ofs)
-- 
2.34.1

[PATCH v2 6/7] target/riscv: Use QEMU_IOTHREAD_LOCK_GUARD in riscv_cpu_update_mip

2022-11-22 Thread Richard Henderson

Reviewed-by: Paolo Bonzini 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alistair Francis 
Signed-off-by: Richard Henderson 
---
 target/riscv/cpu_helper.c | 22 +++---
 1 file changed, 7 insertions(+), 15 deletions(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 278d163803..4c4b404dad 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -610,7 +610,6 @@ uint64_t riscv_cpu_update_mip(RISCVCPU *cpu, uint64_t mask, 
uint64_t value)
 CPURISCVState *env = >env;
 CPUState *cs = CPU(cpu);
 uint64_t gein, vsgein = 0, vstip = 0, old = env->mip;
-bool locked = false;
 
 if (riscv_cpu_virt_enabled(env)) {
 gein = get_field(env->hstatus, HSTATUS_VGEIN);
@@ -621,21 +620,14 @@ uint64_t riscv_cpu_update_mip(RISCVCPU *cpu, uint64_t 
mask, uint64_t value)
 mask = ((mask == MIP_VSTIP) && env->vstime_irq) ? 0 : mask;
 vstip = env->vstime_irq ? MIP_VSTIP : 0;
 
-if (!qemu_mutex_iothread_locked()) {
-locked = true;
-qemu_mutex_lock_iothread();
-}
+WITH_QEMU_IOTHREAD_LOCK() {
+env->mip = (env->mip & ~mask) | (value & mask);
 
-env->mip = (env->mip & ~mask) | (value & mask);
-
-if (env->mip | vsgein | vstip) {
-cpu_interrupt(cs, CPU_INTERRUPT_HARD);
-} else {
-cpu_reset_interrupt(cs, CPU_INTERRUPT_HARD);
-}
-
-if (locked) {
-qemu_mutex_unlock_iothread();
+if (env->mip | vsgein | vstip) {
+cpu_interrupt(cs, CPU_INTERRUPT_HARD);
+} else {
+cpu_reset_interrupt(cs, CPU_INTERRUPT_HARD);
+}
 }
 
 return old;
-- 
2.34.1

[PATCH v2 5/7] hw/ppc: Use QEMU_IOTHREAD_LOCK_GUARD in ppc_set_irq

2022-11-22 Thread Richard Henderson

Reviewed-by: Paolo Bonzini 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Richard Henderson 
---
 hw/ppc/ppc.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index dc86c1c7db..4e816c68c7 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -44,13 +44,9 @@ void ppc_set_irq(PowerPCCPU *cpu, int irq, int level)
 {
 CPUPPCState *env = >env;
 unsigned int old_pending;
-bool locked = false;
 
 /* We may already have the BQL if coming from the reset path */
-if (!qemu_mutex_iothread_locked()) {
-locked = true;
-qemu_mutex_lock_iothread();
-}
+QEMU_IOTHREAD_LOCK_GUARD();
 
 old_pending = env->pending_interrupts;
 
@@ -67,10 +63,6 @@ void ppc_set_irq(PowerPCCPU *cpu, int irq, int level)
 
 trace_ppc_irq_set_exit(env, irq, level, env->pending_interrupts,
CPU(cpu)->interrupt_request);
-
-if (locked) {
-qemu_mutex_unlock_iothread();
-}
 }
 
 /* PowerPC 6xx / 7xx internal IRQ controller */
-- 
2.34.1

[PATCH v2 3/7] target/ppc: Use QEMU_IOTHREAD_LOCK_GUARD in ppc_maybe_interrupt

2022-11-22 Thread Richard Henderson

Reviewed-by: Paolo Bonzini 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Richard Henderson 
---
 target/ppc/excp_helper.c | 11 +--
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 94adcb766b..8591bb3f73 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -2163,22 +2163,13 @@ static int ppc_next_unmasked_interrupt(CPUPPCState *env)
 void ppc_maybe_interrupt(CPUPPCState *env)
 {
 CPUState *cs = env_cpu(env);
-bool locked = false;
-
-if (!qemu_mutex_iothread_locked()) {
-locked = true;
-qemu_mutex_lock_iothread();
-}
+QEMU_IOTHREAD_LOCK_GUARD();
 
 if (ppc_next_unmasked_interrupt(env)) {
 cpu_interrupt(cs, CPU_INTERRUPT_HARD);
 } else {
 cpu_reset_interrupt(cs, CPU_INTERRUPT_HARD);
 }
-
-if (locked) {
-qemu_mutex_unlock_iothread();
-}
 }
 
 #if defined(TARGET_PPC64)
-- 
2.34.1

Re: [PATCH] multifd: Updated QAPI format for 'migrate' qemu monitor command

2022-11-22 Thread Het Gala




On 22/11/22 2:53 pm, Daniel P. Berrangé wrote:

On Mon, Nov 21, 2022 at 01:40:27PM +0100, Juan Quintela wrote:

Het Gala  wrote:

To prevent double data encoding of uris, instead of passing transport
mechanisms, host address and port all together in form of a single string
and writing different parsing functions, we intend the user to explicitly
mention most of the parameters through the medium of qmp command itself.
+{ 'struct': 'MigrateChannel',
+  'data' : {
+'channeltype' : 'MigrateChannelType',
+'*src-addr' : 'MigrateAddress',
+'dest-addr' : 'MigrateAddress',

Why do we want *both* addresses?

This is part of their goal to have source based routing, so that
traffic exits the src host via a particular NIC.

I think this patch would be better if we split it into two parts.

First introduce "MigrateChannel" struct with *only* the 'dest-addr'
field, and only allow one element in the list, making 'uri' optional.

Basically the first patch would *only* be about switching the
'migrate' command from using a plain string to a MigrateAddress
structure.

A second patch would then extend it to allow multiple MigrateChannel
elements, to support different destinations for each channel.

A third patch would then  extend it to add src-addr to attain the
source based routing.

A fourth patch can then deprecate the existing 'uri' field.
> Thanks Daniel. This is a nice idea. I will break this patch into 4 
different patches, so it would be clear to see how the QAPI design is 
evolving.

+'*multifd-count' : 'int' } }

And if we are passing a list, why do we want to pass the real number?

Yeah, technically I think this field is redundant, as you can just
add many entires to the 'channels' list, using the same addresses.
All this field does is allow a more compact JSON arg list. I'm
not sure this is neccessary, unless we're expecting huge numbers
of 'channels', and even then this isn't likely to be a performance
issue.
> I have tried to explain this to Juan. The main purpose is, if we want 
to add 3 channels to one pair of interface and 4 pair of channels to 
another pair of interface, instead of writing the same interface 3 or 4 
times, this field saves that redundancy, and I personally feel, it gives 
one clear representation of multifd capability.

+# -> { "execute": "migrate",
+#  "arguments": {
+#  "channels": [ { 'channeltype': 'control',
+#  'dest-addr': {'transport': 'socket',
+#'type': 'inet',
+#'host': '10.12.34.9', 'port': 
'1050'}},
+#{ 'channeltype': 'data',
+#  'src-addr': {'transport': 'socket',
+#'type': 'inet',
+#'host': '10.12.34.9',
+#'port': '4000', 'ipv4': 'true'},
+#  'dest-addr': { 'transport': 'socket',
+#  'type': 'inet',
+#  'host': '10.12.34.92',
+#  'port': '1234', 'ipv4': 'true'},
+#  'multifd-count': 5 },
+#{ 'channeltype': 'data',
+#  'src-addr': {'transport': 'socket',
+#'type': 'inet',
+#'host': '10.2.3.4', 'port': '1000'},
+#  'dest-addr': {'transport': 'socket',
+# 'type': 'inet',
+# 'host': '0.0.0.4', 'port': '3000'},
+#  'multifd-count': 3 } ] } }
+# <- { "return": {} }
+#
  ##
  { 'command': 'migrate',
-  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool',
-   '*detach': 'bool', '*resume': 'bool' } }
+  'data': {'*uri': 'str', '*channels': ['MigrateChannel'], '*blk': 'bool',


- I would not care at all about the "exec" protocol, just leave that
   alone in the deprecated command.  Right now:
   * we can't move it to multifd without a lot of PAIN
   * there are patches on the list suggesting that what we really want is
 to create a file that is the size of RAM and just write all the RAM
 at the right place.
   * that would make the way to create snapshots (I don't know if anyones
 still wants them, much easier).
   * I think that the only real use of exec migration was to create
 snapshots, for real migration, using a socket is much, much saner.
   * I.e. what I mean here is that for exec migration, we need to think
 if we want to continue supporting it for normal migration, or only
 as a way to create snapshots.


With regards,
Daniel


Thanks and regards,

Het Gala

Re: [PULL 0/2] target-arm queue

2022-11-22 Thread Stefan Hajnoczi

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.2 for any 
user-visible changes.


signature.asc
Description: PGP signature

Re: [PATCH] vhost-vdpa: skip TPM CRB memory section

2022-11-22 Thread Michael S. Tsirkin

On Tue, Nov 22, 2022 at 06:53:49PM +0400, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> 851d6d1a0f ("vfio/common: remove spurious tpm-crb-cmd misalignment
> warning") removed the warning on vfio_listener_region_add() path.
> 
> An error is reported for vhost-vdpa case:
> qemu-kvm: vhost_vdpa_listener_region_add received unaligned region
> 
> Skip the CRB device.
> 
> Fixes:
> https://bugzilla.redhat.com/show_bug.cgi?id=2141965
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  hw/virtio/vhost-vdpa.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 7468e44b87..9d7206e4b8 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -19,6 +19,7 @@
>  #include "hw/virtio/virtio-net.h"
>  #include "hw/virtio/vhost-shadow-virtqueue.h"
>  #include "hw/virtio/vhost-vdpa.h"
> +#include "sysemu/tpm.h"
>  #include "exec/address-spaces.h"
>  #include "migration/blocker.h"
>  #include "qemu/cutils.h"
> @@ -46,6 +47,11 @@ static bool 
> vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
>  {
>  Int128 llend;
>  
> +if (TPM_IS_CRB(section->mr->owner)) {
> +/* The CRB command buffer has its base address unaligned. */
> +return true;
> +}
> +

Quite a hack. We can't really keep adding dependency on random devices
to vhost. And would you add hacks like this to listeners?
Pls figure out what's special about this buffer.
Also if this section is unaligned then doesn't it break up
other aligned sections?


>  if ((!memory_region_is_ram(section->mr) &&
>   !memory_region_is_iommu(section->mr)) ||
>  memory_region_is_protected(section->mr) ||
> -- 
> 2.38.1

Re: [RFC 4/7] migration: Split save_live_pending() into state_pending_*

2022-11-22 Thread Dr. David Alan Gilbert

* Juan Quintela (quint...@redhat.com) wrote:
> We split the function into to:
> 
> - state_pending_estimate: We estimate the remaining state size without
>   stopping the machine.
> 
> - state pending_exact: We calculate the exact amount of remaining
>   state.
> 
> The only "device" that implements different functions for _estimate()
> and _exact() is ram.
> 
> Signed-off-by: Juan Quintela 

Yeh I think that's OK; I'm a little worried whether you end up calling
the two functions in migration_iteration_run a lot, but still


Reviewed-by: Dr. David Alan Gilbert 

> ---
>  docs/devel/migration.rst   | 18 ++
>  docs/devel/vfio-migration.rst  |  4 ++--
>  include/migration/register.h   | 12 
>  migration/savevm.h |  8 ++--
>  hw/s390x/s390-stattrib.c   |  7 ---
>  hw/vfio/migration.c|  9 +
>  migration/block-dirty-bitmap.c | 11 ++-
>  migration/block.c  | 11 ++-
>  migration/migration.c  | 13 +
>  migration/ram.c| 31 ---
>  migration/savevm.c | 34 +-
>  hw/vfio/trace-events   |  2 +-
>  migration/trace-events |  7 ---
>  13 files changed, 114 insertions(+), 53 deletions(-)
> 
> diff --git a/docs/devel/migration.rst b/docs/devel/migration.rst
> index 3e9656d8e0..6f65c23b47 100644
> --- a/docs/devel/migration.rst
> +++ b/docs/devel/migration.rst
> @@ -482,15 +482,17 @@ An iterative device must provide:
>- A ``load_setup`` function that initialises the data structures on the
>  destination.
>  
> -  - A ``save_live_pending`` function that is called repeatedly and must
> -indicate how much more data the iterative data must save.  The core
> -migration code will use this to determine when to pause the CPUs
> -and complete the migration.
> +  - A ``state_pending_exact`` function that indicates how much more
> +data we must save.  The core migration code will use this to
> +determine when to pause the CPUs and complete the migration.
>  
> -  - A ``save_live_iterate`` function (called after ``save_live_pending``
> -when there is significant data still to be sent).  It should send
> -a chunk of data until the point that stream bandwidth limits tell it
> -to stop.  Each call generates one section.
> +  - A ``state_pending_estimate`` function that indicates how much more
> +data we must save.  When the estimated amount is smaller than the
> +threshold, we call ``state_pending_exact``.
> +
> +  - A ``save_live_iterate`` function should send a chunk of data until
> +the point that stream bandwidth limits tell it to stop.  Each call
> +generates one section.
>  
>- A ``save_live_complete_precopy`` function that must transmit the
>  last section for the device containing any remaining data.
> diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
> index 9ff6163c88..673057c90d 100644
> --- a/docs/devel/vfio-migration.rst
> +++ b/docs/devel/vfio-migration.rst
> @@ -28,7 +28,7 @@ VFIO implements the device hooks for the iterative approach 
> as follows:
>  * A ``load_setup`` function that sets up the migration region on the
>destination and sets _RESUMING flag in the VFIO device state.
>  
> -* A ``save_live_pending`` function that reads pending_bytes from the vendor
> +* A ``state_pending_exact`` function that reads pending_bytes from the vendor
>driver, which indicates the amount of data that the vendor driver has yet 
> to
>save for the VFIO device.
>  
> @@ -114,7 +114,7 @@ Live migration save path
>  (RUNNING, _SETUP, _RUNNING|_SAVING)
>|
>  (RUNNING, _ACTIVE, _RUNNING|_SAVING)
> - If device is active, get pending_bytes by .save_live_pending()
> + If device is active, get pending_bytes by .state_pending_exact()
>If total pending_bytes >= threshold_size, call .save_live_iterate()
>Data of VFIO device for pre-copy phase is copied
>  Iterate till total pending bytes converge and are less than threshold
> diff --git a/include/migration/register.h b/include/migration/register.h
> index 5b5424ed8f..313b8e1c3b 100644
> --- a/include/migration/register.h
> +++ b/include/migration/register.h
> @@ -46,9 +46,7 @@ typedef struct SaveVMHandlers {
>  
>  /* This runs outside the iothread lock!  */
>  int (*save_setup)(QEMUFile *f, void *opaque);
> -void (*save_live_pending)(void *opaque,  uint64_t threshold_size,
> -  uint64_t *rest_precopy, uint64_t 
> *rest_postcopy);
> -/* Note for save_live_pending:
> +/* Note for state_pending_*:
>   * - res_precopy is for data which must be migrated in precopy
>   * phase or in stopped state, in other words - before target
>   * vm start
> @@ -59,7 +57,13 @@ typedef

Re: [PATCH] multifd: Updated QAPI format for 'migrate' qemu monitor command

2022-11-22 Thread Het Gala




On 21/11/22 6:10 pm, Juan Quintela wrote:

Het Gala  wrote:

To prevent double data encoding of uris, instead of passing transport
mechanisms, host address and port all together in form of a single string
and writing different parsing functions, we intend the user to explicitly
mention most of the parameters through the medium of qmp command itself.

Hi

1st of all, I can't see how this is 7.1 material, I guess we need to
move it to 8.0.


> Thankyou for informing. I will change it to 8.0

+##
+# @MigrateTransport:
+#
+# The supported communication transport mechanisms for migration
+#
+# @socket: Supported communication type between two devices for migration.
+#  Socket is able to cover all of 'tcp', 'unix', 'vsock' and
+#  'fd' already
+#
+# @exec: Supported communication type to redirect migration stream into file.
+#
+# Since 7.1
+##
+{ 'enum': 'MigrateTransport',
+  'data': ['socket', 'exec'] }

I haven't looked too much into this, but as Danield told in the past, I
can see where the rdma falls into this scheme.  I guess it is going to
be its own, but who knows.
>So do you mean, 'data' : ['socket', 'exec', 'rdma'] ? or it will be 
separately represented

+# The supported options for migration channel type requests
+#
+# @control: Support request for main outbound migration control channel
+#
+# @data: Supported Channel type for multifd data channels
+#
+# @async: Supported Channel type for post-copy async requests
+#
+# Since 7.1
+##
+{ 'enum': 'MigrateChannelType',
+  'data': ['control', 'data', 'async'] }
+

'data': ['main', 'data', 'ram-async'] } ???

I don't like the 'control' name because without multifd we still pass
everything through it.

And with multifd, we still pass all devices through it.
> I agree with you Juan. 'main' seams a better name here. Thanks for 
the suggestion :)

About the asynchronous channel, I don't know if calling it postcopy is
better.

> Okay, noted. Will change in the next patchset

+{ 'struct': 'MigrateChannel',
+  'data' : {
+'channeltype' : 'MigrateChannelType',
+'*src-addr' : 'MigrateAddress',
+'dest-addr' : 'MigrateAddress',

Why do we want *both* addresses?


> As mentioned by Daniel, multifd right now is supported for tcp 
socket, and we want to make multifd migration as bi-directional 
migration stream.



+'*multifd-count' : 'int' } }

And if we are passing a list, why do we want to pass the real number?


> I think multifd-count variable would be handy. Just to take a used 
case here, we have heavy workloads to migrate, and in our system we have 
a 25 Gig X 2 NIC cards available, and we want to ensure that migration 
does not fail due to less available network (because IO, and other 
processes might also consume network). By experiments, we know that per 
multifd channel - on an average we get around 8.5-10 Gbps. Writing the 
same channel atleast 3 times to cover one NIC seems again redundant. So 
to prevent this, we think inclusion of multifd-count would be useful 
there. And anyways it is an optional, so if you don't mention it, it 
will take into account as a single thread by default. Let me know if 
this is convincing or we can discuss more on this ?



+# -> { "execute": "migrate",
+#  "arguments": {
+#  "channels": [ { 'channeltype': 'control',
+#  'dest-addr': {'transport': 'socket',
+#'type': 'inet',
+#'host': '10.12.34.9', 'port': 
'1050'}},
+#{ 'channeltype': 'data',
+#  'src-addr': {'transport': 'socket',
+#'type': 'inet',
+#'host': '10.12.34.9',
+#'port': '4000', 'ipv4': 'true'},
+#  'dest-addr': { 'transport': 'socket',
+#  'type': 'inet',
+#  'host': '10.12.34.92',
+#  'port': '1234', 'ipv4': 'true'},
+#  'multifd-count': 5 },
+#{ 'channeltype': 'data',
+#  'src-addr': {'transport': 'socket',
+#'type': 'inet',
+#'host': '10.2.3.4', 'port': '1000'},
+#  'dest-addr': {'transport': 'socket',
+# 'type': 'inet',
+# 'host': '0.0.0.4', 'port': '3000'},
+#  'multifd-count': 3 } ] } }
+# <- { "return": {} }
+#
  ##
  { 'command': 'migrate',
-  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool',
-   '*detach': 'bool', '*resume': 'bool' } }
+  'data': {'*uri': 'str', '*channels': ['MigrateChannel'], '*blk': 'bool',

I think that "uri" bit should be dropped, right?
> yes, it will be dropped at a later point, right

Re: [PATCH] target/arm: align exposed ID registers with Linux

2022-11-22 Thread Zhuojia Shen

On 11/22/2022 10:12 AM -0800, Richard Henderson wrote:
> On 11/21/22 18:48, Zhuojia Shen wrote:
> > In CPUID registers exposed to userspace, some registers were missing
> > and some fields were not exposed.  This patch aligns exposed ID
> > registers and their fields with what Linux exposes currently.
> > 
> > Specifically, the following new ID registers/fields are exposed to
> > userspace:
> > 
> > ID_AA64PFR1_EL1.BT:   bits 3-0
> > ID_AA64PFR1_EL1.MTE:  bits 11-8
> > ID_AA64PFR1_EL1.SME:  bits 27-24
> > 
> > ID_AA64ZFR0_EL1.SVEver:   bits 3-0
> > ID_AA64ZFR0_EL1.AES:  bits 7-4
> > ID_AA64ZFR0_EL1.BitPerm:  bits 19-16
> > ID_AA64ZFR0_EL1.BF16: bits 23-20
> > ID_AA64ZFR0_EL1.SHA3: bits 35-32
> > ID_AA64ZFR0_EL1.SM4:  bits 43-40
> > ID_AA64ZFR0_EL1.I8MM: bits 47-44
> > ID_AA64ZFR0_EL1.F32MM:bits 55-52
> > ID_AA64ZFR0_EL1.F64MM:bits 59-56
> > 
> > ID_AA64SMFR0_EL1.F32F32:  bit 32
> > ID_AA64SMFR0_EL1.B16F32:  bit 34
> > ID_AA64SMFR0_EL1.F16F32:  bit 35
> > ID_AA64SMFR0_EL1.I8I32:   bits 39-36
> > ID_AA64SMFR0_EL1.F64F64:  bit 48
> > ID_AA64SMFR0_EL1.I16I64:  bits 55-52
> > ID_AA64SMFR0_EL1.FA64:bit 63
> > 
> > ID_AA64MMFR0_EL1.ECV: bits 63-60
> > 
> > ID_AA64MMFR1_EL1.AFP: bits 47-44
> > 
> > ID_AA64MMFR2_EL1.AT:  bits 35-32
> > 
> > ID_AA64ISAR0_EL1.RNDR:bits 63-60
> > 
> > ID_AA64ISAR1_EL1.FRINTTS: bits 35-32
> > ID_AA64ISAR1_EL1.BF16:bits 47-44
> > ID_AA64ISAR1_EL1.DGH: bits 51-48
> > ID_AA64ISAR1_EL1.I8MM:bits 55-52
> > 
> > ID_AA64ISAR2_EL1.WFxT:bits 3-0
> > ID_AA64ISAR2_EL1.RPRES:   bits 7-4
> > ID_AA64ISAR2_EL1.GPA3:bits 11-8
> > ID_AA64ISAR2_EL1.APA3:bits 15-12
> > 
> > Signed-off-by: Zhuojia Shen 
> > ---
> >   target/arm/helper.c | 19 ++-
> >   1 file changed, 14 insertions(+), 5 deletions(-)
> > 
> > diff --git a/target/arm/helper.c b/target/arm/helper.c
> > index d8c8223ec3..ce6fd7a96d 100644
> > --- a/target/arm/helper.c
> > +++ b/target/arm/helper.c
> > @@ -7826,13 +7826,20 @@ void register_cp_regs_for_features(ARMCPU *cpu)
> > .exported_bits = 0x000f000f00ff,
> > .fixed_bits= 0x0011 },
> >   { .name = "ID_AA64PFR1_EL1",
> > -  .exported_bits = 0x00f0 },
> > +  .exported_bits = 0x0f000fff },
> 
> Existing, but I think it would be nicer to do this symbolically.  e.g.
> 
>.exported_bits = R_ID_AA64PFR1_BT_MASK |
> R_ID_AA64PFR1_SBSS_MASK |
> R_ID_AA64PFR1_MTE_MASK |
> R_ID_AA64PFR1_SME_MASK,
> 
> etc.

It would be more readable but longer.  I can try to refactor this way.

> 
> 
> r~

Re: [PATCH] target/arm: align exposed ID registers with Linux

2022-11-22 Thread Zhuojia Shen

On 11/22/2022 06:26 PM +, Peter Maydell wrote:
> On Tue, 22 Nov 2022 at 03:05, Zhuojia Shen  
> wrote:
> >
> > In CPUID registers exposed to userspace, some registers were missing
> > and some fields were not exposed.  This patch aligns exposed ID
> > registers and their fields with what Linux exposes currently.
> >
> > Specifically, the following new ID registers/fields are exposed to
> > userspace:
> 
> These changes don't quite seem to line up with what the kernel
> documents that it exposes:
> https://www.kernel.org/doc/Documentation/arm64/cpu-feature-registers.rst

This kernel document seems not quite up-to-date.  The patch is based on
the upstream code:
https://github.com/torvalds/linux/blob/master/arch/arm64/kernel/cpufeature.c

> 
> > ID_AA64PFR1_EL1.BT:   bits 3-0
> > ID_AA64PFR1_EL1.MTE:  bits 11-8
> > ID_AA64PFR1_EL1.SME:  bits 27-24
> 
> .SME not listed as exposed.
> 
> > ID_AA64SMFR0_EL1.F32F32:  bit 32
> > ID_AA64SMFR0_EL1.B16F32:  bit 34
> > ID_AA64SMFR0_EL1.F16F32:  bit 35
> > ID_AA64SMFR0_EL1.I8I32:   bits 39-36
> > ID_AA64SMFR0_EL1.F64F64:  bit 48
> > ID_AA64SMFR0_EL1.I16I64:  bits 55-52
> > ID_AA64SMFR0_EL1.FA64:bit 63
> 
> This register not listed as exposed.

This register and ID_AA64PFR1_EL1.SME are exposed since v5.19.

> 
> > ID_AA64ISAR2_EL1.WFxT:bits 3-0
> > ID_AA64ISAR2_EL1.RPRES:   bits 7-4
> > ID_AA64ISAR2_EL1.GPA3:bits 11-8
> > ID_AA64ISAR2_EL1.APA3:bits 15-12
> 
> GPA3 and APA3 not listed as exposed.

These two are exposed since v5.18.

> 
> thanks
> -- PMM

Re: [PATCH 4/4] be less conservative with the 64bit pci io window

2022-11-22 Thread Kevin O'Connor

On Mon, Nov 21, 2022 at 11:32:13AM +0100, Gerd Hoffmann wrote:
> Current seabios code will only enable and use the 64bit pci io window in
> case it runs out of space in the 32bit pci mmio window below 4G.
> 
> This patch will also enable the 64bit pci io window when
>   (a) RAM above 4G is present, and
>   (b) the physical address space size is known, and
>   (c) seabios is running on a 64bit capable processor.
> 
> This operates with the assumption that guests which are ok with memory
> above 4G most likely can handle mmio above 4G too.

Thanks.  In general, the series looks good to me.  Can you elaborate
on the background to this change though?  It sounds like there is a
(small) risk of a regression, so I think it would be good to have a
high level understanding of what is driving this memory reorg.

Cheers,
-Kevin


> 
> In case the 64bit pci io window is enabled also assign more memory to
> prefetchable pci bridge windows (scale with address space).
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  src/fw/pciinit.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/src/fw/pciinit.c b/src/fw/pciinit.c
> index ad6def93633b..3e9636b139a4 100644
> --- a/src/fw/pciinit.c
> +++ b/src/fw/pciinit.c
> @@ -51,6 +51,7 @@ u64 pcimem_end = BUILD_PCIMEM_END;
>  u64 pcimem64_start = BUILD_PCIMEM64_START;
>  u64 pcimem64_end   = BUILD_PCIMEM64_END;
>  u64 pci_io_low_end = 0xa000;
> +u32 pci_use_64bit  = 0;
>  
>  struct pci_region_entry {
>  struct pci_device *dev;
> @@ -920,6 +921,8 @@ static int pci_bios_check_devices(struct pci_bus *busses)
>  for (type = 0; type < PCI_REGION_TYPE_COUNT; type++) {
>  u64 align = (type == PCI_REGION_TYPE_IO) ?
>  PCI_BRIDGE_IO_MIN : PCI_BRIDGE_MEM_MIN;
> +if (pci_use_64bit && (type == PCI_REGION_TYPE_PREFMEM))
> +align = (u64)1 << (PhysBits - 11);
>  if (!pci_bridge_has_region(s->bus_dev, type))
>  continue;
>  u64 size = 0;
> @@ -1108,7 +,7 @@ static void pci_bios_map_devices(struct pci_bus *busses)
>  panic("PCI: out of I/O address space\n");
>  
>  dprintf(1, "PCI: 32: %016llx - %016llx\n", pcimem_start, pcimem_end);
> -if (pci_bios_init_root_regions_mem(busses)) {
> +if (pci_use_64bit || pci_bios_init_root_regions_mem(busses)) {
>  struct pci_region r64_mem, r64_pref;
>  r64_mem.list.first = NULL;
>  r64_pref.list.first = NULL;
> @@ -1174,6 +1177,9 @@ pci_setup(void)
>  
>  dprintf(3, "pci setup\n");
>  
> +if (PhysBits >= 36 && LongMode && RamSizeOver4G)
> +pci_use_64bit = 1;
> +
>  dprintf(1, "=== PCI bus & bridge init ===\n");
>  if (pci_probe_host() != 0) {
>  return;
> -- 
> 2.38.1
> 
>

Re: [PATCH for-8.0 15/29] include/qemu/int128: Add vector type to Int128Alias

2022-11-22 Thread Philippe Mathieu-Daudé


On 22/11/22 19:21, Philippe Mathieu-Daudé wrote:

On 18/11/22 10:47, Richard Henderson wrote:

Adding a vector type will make it easier to handle i386
have_atomic16 via AVX.

Signed-off-by: Richard Henderson 
---
  include/qemu/int128.h | 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index f62a46b48c..f29f90e6f4 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -479,16 +479,16 @@ static inline void bswap128s(Int128 *s)
  /*
   * When compiler supports a 128-bit type, define a combination of
   * a possible structure and the native types.  Ease parameter passing
- * via use of the transparent union extension.
+ * via use of the transparent union extension.  Provide a vector type
+ * for use in atomicity on some hosts.
   */
-#ifdef CONFIG_INT128
  typedef union {
  Int128 s;
+    uint64_t v __attribute__((vector_size(16)));
+#ifdef CONFIG_INT128
  __int128_t i;
  __uint128_t u;
-} Int128Alias __attribute__((transparent_union));
-#else
-typedef Int128 Int128Alias;
  #endif /* CONFIG_INT128 */
+} Int128Alias __attribute__((transparent_union));
  #endif /* INT128_H */


This triggers a warning with GCC:


Ah no, looking closer, even configured as ''--cc=gcc-12 --host-cc=gcc-12
--cxx=/bin/false', Clang got selected for ObjC, and this warning comes
from it:

Compiling Objective-C object libcommon.fa.p/ui_cocoa.m.o
In file included from ../../ui/cocoa.m:36:
In file included from include/sysemu/sysemu.h:5:
In file included from include/qemu/timer.h:4:
In file included from include/qemu/bitops.h:16:
In file included from include/qemu/host-utils.h:35:

include/qemu/int128.h:487:14: warning: alignment of field 'v' (128 bits) 
does not match the alignment of the first field in transparent union; 
transparent_union attribute ignored [-Wignored-attributes]

     uint64_t v __attribute__((vector_size(16)));
  ^
include/qemu/int128.h:486:12: note: alignment of first field is 64 bits
     Int128 s;
    ^
Meson:

Project version: 7.1.91
C compiler for the host machine: gcc-12 (gcc 12.2.0 "gcc-12 (Homebrew 
GCC 12.2.0) 12.2.0")

C linker for the host machine: gcc-12 ld64 819.6
Host machine cpu family: aarch64
Host machine cpu: arm64


Objective-C compiler for the host machine: clang (clang 14.0.0)
Objective-C linker for the host machine: clang ld64 819.6

Regards,

Phil.

Re: [PATCH] target/arm: align exposed ID registers with Linux

2022-11-22 Thread Peter Maydell

On Tue, 22 Nov 2022 at 03:05, Zhuojia Shen  wrote:
>
> In CPUID registers exposed to userspace, some registers were missing
> and some fields were not exposed.  This patch aligns exposed ID
> registers and their fields with what Linux exposes currently.
>
> Specifically, the following new ID registers/fields are exposed to
> userspace:

These changes don't quite seem to line up with what the kernel
documents that it exposes:
https://www.kernel.org/doc/Documentation/arm64/cpu-feature-registers.rst

> ID_AA64PFR1_EL1.BT:   bits 3-0
> ID_AA64PFR1_EL1.MTE:  bits 11-8
> ID_AA64PFR1_EL1.SME:  bits 27-24

.SME not listed as exposed.

> ID_AA64SMFR0_EL1.F32F32:  bit 32
> ID_AA64SMFR0_EL1.B16F32:  bit 34
> ID_AA64SMFR0_EL1.F16F32:  bit 35
> ID_AA64SMFR0_EL1.I8I32:   bits 39-36
> ID_AA64SMFR0_EL1.F64F64:  bit 48
> ID_AA64SMFR0_EL1.I16I64:  bits 55-52
> ID_AA64SMFR0_EL1.FA64:bit 63

This register not listed as exposed.

> ID_AA64ISAR2_EL1.WFxT:bits 3-0
> ID_AA64ISAR2_EL1.RPRES:   bits 7-4
> ID_AA64ISAR2_EL1.GPA3:bits 11-8
> ID_AA64ISAR2_EL1.APA3:bits 15-12

GPA3 and APA3 not listed as exposed.

thanks
-- PMM

Re: [PATCH for-8.0 15/29] include/qemu/int128: Add vector type to Int128Alias

2022-11-22 Thread Philippe Mathieu-Daudé


On 18/11/22 10:47, Richard Henderson wrote:

Adding a vector type will make it easier to handle i386
have_atomic16 via AVX.

Signed-off-by: Richard Henderson 
---
  include/qemu/int128.h | 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index f62a46b48c..f29f90e6f4 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -479,16 +479,16 @@ static inline void bswap128s(Int128 *s)
  /*
   * When compiler supports a 128-bit type, define a combination of
   * a possible structure and the native types.  Ease parameter passing
- * via use of the transparent union extension.
+ * via use of the transparent union extension.  Provide a vector type
+ * for use in atomicity on some hosts.
   */
-#ifdef CONFIG_INT128
  typedef union {
  Int128 s;
+uint64_t v __attribute__((vector_size(16)));
+#ifdef CONFIG_INT128
  __int128_t i;
  __uint128_t u;
-} Int128Alias __attribute__((transparent_union));
-#else
-typedef Int128 Int128Alias;
  #endif /* CONFIG_INT128 */
+} Int128Alias __attribute__((transparent_union));
  
  #endif /* INT128_H */


This triggers a warning with GCC:

include/qemu/int128.h:487:14: warning: alignment of field 'v' (128 bits) 
does not match the alignment of the first field in transparent union; 
transparent_union attribute ignored [-Wignored-attributes]

uint64_t v __attribute__((vector_size(16)));
 ^
include/qemu/int128.h:486:12: note: alignment of first field is 64 bits
Int128 s;
   ^
Meson:

Project version: 7.1.91
C compiler for the host machine: gcc-12 (gcc 12.2.0 "gcc-12 (Homebrew 
GCC 12.2.0) 12.2.0")

C linker for the host machine: gcc-12 ld64 819.6
Host machine cpu family: aarch64
Host machine cpu: arm64

Re: [PATCH for-8.0 v3 14/45] tcg: Introduce tcg_type_size

2022-11-22 Thread Richard Henderson


On 11/22/22 10:14, Philippe Mathieu-Daudé wrote:

I'd feel safer if we assign TCG_TYPE_I32 .. TCG_TYPE_V256 in TCGType,
just in case.


What do you mean?


-- >8 --
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
@@ -289,8 +289,8 @@ typedef struct TCGPool {
  typedef enum TCGType {
-    TCG_TYPE_I32,
-    TCG_TYPE_I64,
+    TCG_TYPE_I32  = 0,
+    TCG_TYPE_I64  = 1,

-    TCG_TYPE_V64,
-    TCG_TYPE_V128,
-    TCG_TYPE_V256,
+    TCG_TYPE_V64  = 2,
+    TCG_TYPE_V128 = 3,
+    TCG_TYPE_V256 = 4,


But that's what C does.  I don't see the point.


r~

Re: [PATCH for-8.0 v3 14/45] tcg: Introduce tcg_type_size

2022-11-22 Thread Philippe Mathieu-Daudé


On 22/11/22 17:54, Richard Henderson wrote:

On 11/22/22 03:30, Philippe Mathieu-Daudé wrote:

On 11/11/22 08:40, Richard Henderson wrote:

Add a helper function for computing the size of a type.

Signed-off-by: Richard Henderson 
---
  include/tcg/tcg.h | 16 
  tcg/tcg.c | 26 --
  2 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index f2da340bb9..8bcd60d0ed 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -319,6 +319,22 @@ typedef enum TCGType {
  #endif
  } TCGType;
+/**
+ * tcg_type_size
+ * @t: type
+ *
+ * Return the size of the type in bytes.
+ */
+static inline int tcg_type_size(TCGType t)
+{
+    unsigned i = t;
+    if (i >= TCG_TYPE_V64) {
+    tcg_debug_assert(i < TCG_TYPE_COUNT);
+    i -= TCG_TYPE_V64 - 1;
+    }
+    return 4 << i;


I'd feel safer if we assign TCG_TYPE_I32 .. TCG_TYPE_V256 in TCGType,
just in case.


What do you mean?


-- >8 --
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
@@ -289,8 +289,8 @@ typedef struct TCGPool {
 typedef enum TCGType {
-TCG_TYPE_I32,
-TCG_TYPE_I64,
+TCG_TYPE_I32  = 0,
+TCG_TYPE_I64  = 1,

-TCG_TYPE_V64,
-TCG_TYPE_V128,
-TCG_TYPE_V256,
+TCG_TYPE_V64  = 2,
+TCG_TYPE_V128 = 3,
+TCG_TYPE_V256 = 4,

---

Re: [PATCH] target/arm: align exposed ID registers with Linux

2022-11-22 Thread Richard Henderson


On 11/21/22 18:48, Zhuojia Shen wrote:

In CPUID registers exposed to userspace, some registers were missing
and some fields were not exposed.  This patch aligns exposed ID
registers and their fields with what Linux exposes currently.

Specifically, the following new ID registers/fields are exposed to
userspace:

ID_AA64PFR1_EL1.BT:   bits 3-0
ID_AA64PFR1_EL1.MTE:  bits 11-8
ID_AA64PFR1_EL1.SME:  bits 27-24

ID_AA64ZFR0_EL1.SVEver:   bits 3-0
ID_AA64ZFR0_EL1.AES:  bits 7-4
ID_AA64ZFR0_EL1.BitPerm:  bits 19-16
ID_AA64ZFR0_EL1.BF16: bits 23-20
ID_AA64ZFR0_EL1.SHA3: bits 35-32
ID_AA64ZFR0_EL1.SM4:  bits 43-40
ID_AA64ZFR0_EL1.I8MM: bits 47-44
ID_AA64ZFR0_EL1.F32MM:bits 55-52
ID_AA64ZFR0_EL1.F64MM:bits 59-56

ID_AA64SMFR0_EL1.F32F32:  bit 32
ID_AA64SMFR0_EL1.B16F32:  bit 34
ID_AA64SMFR0_EL1.F16F32:  bit 35
ID_AA64SMFR0_EL1.I8I32:   bits 39-36
ID_AA64SMFR0_EL1.F64F64:  bit 48
ID_AA64SMFR0_EL1.I16I64:  bits 55-52
ID_AA64SMFR0_EL1.FA64:bit 63

ID_AA64MMFR0_EL1.ECV: bits 63-60

ID_AA64MMFR1_EL1.AFP: bits 47-44

ID_AA64MMFR2_EL1.AT:  bits 35-32

ID_AA64ISAR0_EL1.RNDR:bits 63-60

ID_AA64ISAR1_EL1.FRINTTS: bits 35-32
ID_AA64ISAR1_EL1.BF16:bits 47-44
ID_AA64ISAR1_EL1.DGH: bits 51-48
ID_AA64ISAR1_EL1.I8MM:bits 55-52

ID_AA64ISAR2_EL1.WFxT:bits 3-0
ID_AA64ISAR2_EL1.RPRES:   bits 7-4
ID_AA64ISAR2_EL1.GPA3:bits 11-8
ID_AA64ISAR2_EL1.APA3:bits 15-12

Signed-off-by: Zhuojia Shen 
---
  target/arm/helper.c | 19 ++-
  1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index d8c8223ec3..ce6fd7a96d 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -7826,13 +7826,20 @@ void register_cp_regs_for_features(ARMCPU *cpu)
.exported_bits = 0x000f000f00ff,
.fixed_bits= 0x0011 },
  { .name = "ID_AA64PFR1_EL1",
-  .exported_bits = 0x00f0 },
+  .exported_bits = 0x0f000fff },


Existing, but I think it would be nicer to do this symbolically.  e.g.

   .exported_bits = R_ID_AA64PFR1_BT_MASK |
R_ID_AA64PFR1_SBSS_MASK |
R_ID_AA64PFR1_MTE_MASK |
R_ID_AA64PFR1_SME_MASK,

etc.


r~

[PATCH 2/3] tcg: Factor init_ffi_layouts() out of tcg_context_init()

2022-11-22 Thread Philippe Mathieu-Daudé

Signed-off-by: Richard Henderson 
Message-Id: <2022074101.2069454-27-richard.hender...@linaro.org>
[PMD: Split from bigger patch]
Signed-off-by: Philippe Mathieu-Daudé 
---
 tcg/tcg.c | 83 +--
 1 file changed, 44 insertions(+), 39 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 8aa6fc9a25..9b24b4d863 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -572,7 +572,49 @@ static ffi_type *typecode_to_ffi(int argmask)
 }
 g_assert_not_reached();
 }
-#endif
+
+static void init_ffi_layouts(void)
+{
+/* g_direct_hash/equal for direct comparisons on uint32_t.  */
+ffi_table = g_hash_table_new(NULL, NULL);
+for (int i = 0; i < ARRAY_SIZE(all_helpers); ++i) {
+uint32_t typemask = all_helpers[i].typemask;
+gpointer hash = (gpointer)(uintptr_t)typemask;
+struct {
+ffi_cif cif;
+ffi_type *args[];
+} *ca;
+ffi_status status;
+int nargs;
+
+if (g_hash_table_lookup(ffi_table, hash)) {
+continue;
+}
+
+/* Ignoring the return type, find the last non-zero field. */
+nargs = 32 - clz32(typemask >> 3);
+nargs = DIV_ROUND_UP(nargs, 3);
+
+ca = g_malloc0(sizeof(*ca) + nargs * sizeof(ffi_type *));
+ca->cif.rtype = typecode_to_ffi(typemask & 7);
+ca->cif.nargs = nargs;
+
+if (nargs != 0) {
+ca->cif.arg_types = ca->args;
+for (int j = 0; j < nargs; ++j) {
+int typecode = extract32(typemask, (j + 1) * 3, 3);
+ca->args[j] = typecode_to_ffi(typecode);
+}
+}
+
+status = ffi_prep_cif(>cif, FFI_DEFAULT_ABI, nargs,
+  ca->cif.rtype, ca->cif.arg_types);
+assert(status == FFI_OK);
+
+g_hash_table_insert(ffi_table, hash, (gpointer)>cif);
+}
+}
+#endif /* CONFIG_TCG_INTERPRETER */
 
 typedef struct TCGCumulativeArgs {
 int arg_idx;/* tcg_gen_callN args[] */
@@ -753,44 +795,7 @@ static void tcg_context_init(unsigned max_cpus)
 }
 
 #ifdef CONFIG_TCG_INTERPRETER
-/* g_direct_hash/equal for direct comparisons on uint32_t.  */
-ffi_table = g_hash_table_new(NULL, NULL);
-for (i = 0; i < ARRAY_SIZE(all_helpers); ++i) {
-struct {
-ffi_cif cif;
-ffi_type *args[];
-} *ca;
-uint32_t typemask = all_helpers[i].typemask;
-gpointer hash = (gpointer)(uintptr_t)typemask;
-ffi_status status;
-int nargs;
-
-if (g_hash_table_lookup(ffi_table, hash)) {
-continue;
-}
-
-/* Ignoring the return type, find the last non-zero field. */
-nargs = 32 - clz32(typemask >> 3);
-nargs = DIV_ROUND_UP(nargs, 3);
-
-ca = g_malloc0(sizeof(*ca) + nargs * sizeof(ffi_type *));
-ca->cif.rtype = typecode_to_ffi(typemask & 7);
-ca->cif.nargs = nargs;
-
-if (nargs != 0) {
-ca->cif.arg_types = ca->args;
-for (int j = 0; j < nargs; ++j) {
-int typecode = extract32(typemask, (j + 1) * 3, 3);
-ca->args[j] = typecode_to_ffi(typecode);
-}
-}
-
-status = ffi_prep_cif(>cif, FFI_DEFAULT_ABI, nargs,
-  ca->cif.rtype, ca->cif.arg_types);
-assert(status == FFI_OK);
-
-g_hash_table_insert(ffi_table, hash, (gpointer)>cif);
-}
+init_ffi_layouts();
 #endif
 
 tcg_target_init(s);
-- 
2.38.1

[PATCH 1/3] tcg: Convert typecode_to_ffi from array to function

2022-11-22 Thread Philippe Mathieu-Daudé

In the unlikely case of invalid typecode mask, the function
will abort instead of returning a NULL pointer.

Signed-off-by: Richard Henderson 
Message-Id: <2022074101.2069454-27-richard.hender...@linaro.org>
[PMD: Split from bigger patch]
Signed-off-by: Philippe Mathieu-Daudé 
---
 tcg/tcg.c | 30 --
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index cabc397a38..8aa6fc9a25 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -554,14 +554,24 @@ static GHashTable *helper_table;
 #ifdef CONFIG_TCG_INTERPRETER
 static GHashTable *ffi_table;
 
-static ffi_type * const typecode_to_ffi[8] = {
-[dh_typecode_void] = _type_void,
-[dh_typecode_i32]  = _type_uint32,
-[dh_typecode_s32]  = _type_sint32,
-[dh_typecode_i64]  = _type_uint64,
-[dh_typecode_s64]  = _type_sint64,
-[dh_typecode_ptr]  = _type_pointer,
-};
+static ffi_type *typecode_to_ffi(int argmask)
+{
+switch (argmask) {
+case dh_typecode_void:
+return _type_void;
+case dh_typecode_i32:
+return _type_uint32;
+case dh_typecode_s32:
+return _type_sint32;
+case dh_typecode_i64:
+return _type_uint64;
+case dh_typecode_s64:
+return _type_sint64;
+case dh_typecode_ptr:
+return _type_pointer;
+}
+g_assert_not_reached();
+}
 #endif
 
 typedef struct TCGCumulativeArgs {
@@ -764,14 +774,14 @@ static void tcg_context_init(unsigned max_cpus)
 nargs = DIV_ROUND_UP(nargs, 3);
 
 ca = g_malloc0(sizeof(*ca) + nargs * sizeof(ffi_type *));
-ca->cif.rtype = typecode_to_ffi[typemask & 7];
+ca->cif.rtype = typecode_to_ffi(typemask & 7);
 ca->cif.nargs = nargs;
 
 if (nargs != 0) {
 ca->cif.arg_types = ca->args;
 for (int j = 0; j < nargs; ++j) {
 int typecode = extract32(typemask, (j + 1) * 3, 3);
-ca->args[j] = typecode_to_ffi[typecode];
+ca->args[j] = typecode_to_ffi(typecode);
 }
 }
 
-- 
2.38.1

[PATCH 3/3] tcg: Move ffi_cif pointer into TCGHelperInfo

2022-11-22 Thread Philippe Mathieu-Daudé

From: Richard Henderson 

Instead of requiring a separate hash table lookup,
put a pointer to the CIF into TCGHelperInfo.

Signed-off-by: Richard Henderson 
Message-Id: <2022074101.2069454-27-richard.hender...@linaro.org>
[PMD: Split from bigger patch]
Signed-off-by: Philippe Mathieu-Daudé 
---
 tcg/tcg-internal.h |  7 +++
 tcg/tcg.c  | 26 ++
 2 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/tcg/tcg-internal.h b/tcg/tcg-internal.h
index c7e87e193d..6e50aeba3a 100644
--- a/tcg/tcg-internal.h
+++ b/tcg/tcg-internal.h
@@ -25,6 +25,10 @@
 #ifndef TCG_INTERNAL_H
 #define TCG_INTERNAL_H
 
+#ifdef CONFIG_TCG_INTERPRETER
+#include 
+#endif
+
 #define TCG_HIGHWATER 1024
 
 /*
@@ -57,6 +61,9 @@ typedef struct TCGCallArgumentLoc {
 typedef struct TCGHelperInfo {
 void *func;
 const char *name;
+#ifdef CONFIG_TCG_INTERPRETER
+ffi_cif *cif;
+#endif
 unsigned typemask   : 32;
 unsigned flags  : 8;
 unsigned nr_in  : 8;
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 9b24b4d863..d6a3036412 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -552,8 +552,6 @@ static TCGHelperInfo all_helpers[] = {
 static GHashTable *helper_table;
 
 #ifdef CONFIG_TCG_INTERPRETER
-static GHashTable *ffi_table;
-
 static ffi_type *typecode_to_ffi(int argmask)
 {
 switch (argmask) {
@@ -576,9 +574,11 @@ static ffi_type *typecode_to_ffi(int argmask)
 static void init_ffi_layouts(void)
 {
 /* g_direct_hash/equal for direct comparisons on uint32_t.  */
-ffi_table = g_hash_table_new(NULL, NULL);
+GHashTable *ffi_table = g_hash_table_new(NULL, NULL);
+
 for (int i = 0; i < ARRAY_SIZE(all_helpers); ++i) {
-uint32_t typemask = all_helpers[i].typemask;
+TCGHelperInfo *info = _helpers[i];
+unsigned typemask = info->typemask;
 gpointer hash = (gpointer)(uintptr_t)typemask;
 struct {
 ffi_cif cif;
@@ -586,8 +586,11 @@ static void init_ffi_layouts(void)
 } *ca;
 ffi_status status;
 int nargs;
+ffi_cif *cif;
 
-if (g_hash_table_lookup(ffi_table, hash)) {
+cif = g_hash_table_lookup(ffi_table, hash);
+if (cif) {
+info->cif = cif;
 continue;
 }
 
@@ -611,8 +614,12 @@ static void init_ffi_layouts(void)
   ca->cif.rtype, ca->cif.arg_types);
 assert(status == FFI_OK);
 
-g_hash_table_insert(ffi_table, hash, (gpointer)>cif);
+cif = >cif;
+info->cif = cif;
+g_hash_table_insert(ffi_table, hash, (gpointer)cif);
 }
+
+g_hash_table_destroy(ffi_table);
 }
 #endif /* CONFIG_TCG_INTERPRETER */
 
@@ -4413,12 +4420,7 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
 }
 
 #ifdef CONFIG_TCG_INTERPRETER
-{
-gpointer hash = (gpointer)(uintptr_t)info->typemask;
-ffi_cif *cif = g_hash_table_lookup(ffi_table, hash);
-assert(cif != NULL);
-tcg_out_call(s, tcg_call_func(op), cif);
-}
+tcg_out_call(s, tcg_call_func(op), info->cif);
 #else
 tcg_out_call(s, tcg_call_func(op));
 #endif
-- 
2.38.1

[PATCH 0/3] tcg: Move ffi_cif pointer into TCGHelperInfo (splitted)

2022-11-22 Thread Philippe Mathieu-Daudé

Hi Richard,

Your "Move ffi_cif pointer into TCGHelperInfo" patch was
not obvious (to me) at first, so I split it in 3 trivial
patches.

Philippe Mathieu-Daudé (2):
  tcg: Convert typecode_to_ffi from array to function
  tcg: Factor init_ffi_layouts() out of tcg_context_init()

Richard Henderson (1):
  tcg: Move ffi_cif pointer into TCGHelperInfo

 tcg/tcg-internal.h |   7 +++
 tcg/tcg.c  | 125 +
 2 files changed, 78 insertions(+), 54 deletions(-)

-- 
2.38.1

Re: [PATCH for-8.0 07/29] accel/tcg: Honor atomicity of loads

2022-11-22 Thread Richard Henderson


On 11/22/22 06:35, Peter Maydell wrote:

+/*
+ * Here we have the architectural atomicity of the operation.
+ * However, when executing in a serial context, we need no extra
+ * host atomicity in order to avoid racing.  This reduction
+ * avoids looping with cpu_loop_exit_atomic.
+ */
+if (cpu_in_serial_context(env_cpu(env))) {


Is it OK to use cpu_in_serial_context() here ? Even if
there's no other vCPU executing in parallel, there might
be device model code doing a memory write in the iothread,
I think.


Well, it's no different from how we currently treat compare-and-swap expansion.  But you 
have a point -- we should probably be doing something with the iothread lock for both 
EXCP_ATOMIC and round-robin mode.



r~

Re: [PATCH] vhost: mask VIRTIO_F_RING_RESET for vhost and vhost-user devices

2022-11-22 Thread Eugenio Perez Martin

On Tue, Nov 22, 2022 at 4:13 AM Jason Wang  wrote:
>
> On Mon, Nov 21, 2022 at 6:11 PM Stefano Garzarella  
> wrote:
> >
> > Commit 69e1c14aa2 ("virtio: core: vq reset feature negotation support")
> > enabled VIRTIO_F_RING_RESET by default for all virtio devices.
> >
> > This feature is not currently emulated by QEMU, so for vhost and
> > vhost-user devices we need to make sure it is supported by the offloaded
> > device emulation (in-kernel or in another process).
> > To do this we need to add VIRTIO_F_RING_RESET to the features bitmap
> > passed to vhost_get_features(). This way it will be masked if the device
> > does not support it.
> >
> > This issue was initially discovered with vhost-vsock and vhost-user-vsock,
> > and then also tested with vhost-user-rng which confirmed the same issue.
> > They fail when sending features through VHOST_SET_FEATURES ioctl or
> > VHOST_USER_SET_FEATURES message, since VIRTIO_F_RING_RESET is negotiated
> > by the guest (Linux >= v6.0), but not supported by the device.
> >
> > Fixes: 69e1c14aa2 ("virtio: core: vq reset feature negotation support")
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1318
> > Signed-off-by: Stefano Garzarella 
>
> Acked-by: Jason Wang 
>
> > ---
> >
> > To prevent this problem in the future, perhaps we should provide a function
> > (e.g. vhost_device_get_features) where we go to mask all non-device-specific
> > features (e.g VIRTIO_F_*, VIRTIO_RING_F_*) that are not emulated by QEMU but
> > we expect them to be emulated by the vhost or vhost-user devices.
>
> Adding Eugenio, this might not be true if we want to use shadow
> virtqueue for emulating some features?
>

I think we should be able to introduce that in the (hypothetical)
vhost_device_get_features if SVQ starts emulating a ring feature,
isn't it? I think a first version not aware of SVQ is fine anyway, and
to introduce it later should be easy.

Thanks!

> Thanks
>
> > Then we can call it in all .get_features callbacks just before return the
> > features.
> >
> > What do you think?
> >
> > But maybe better to do that for the next release, I will send an RFC.
> >
> > Thanks,
> > Stefano
> > ---
> >  hw/block/vhost-user-blk.c  |  1 +
> >  hw/net/vhost_net.c |  1 +
> >  hw/scsi/vhost-scsi.c   |  1 +
> >  hw/scsi/vhost-user-scsi.c  |  1 +
> >  hw/virtio/vhost-user-fs.c  |  1 +
> >  hw/virtio/vhost-user-gpio.c|  1 +
> >  hw/virtio/vhost-user-i2c.c |  1 +
> >  hw/virtio/vhost-user-rng.c | 11 +--
> >  hw/virtio/vhost-vsock-common.c |  1 +
> >  net/vhost-vdpa.c   |  1 +
> >  10 files changed, 18 insertions(+), 2 deletions(-)
> >
> > diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> > index 16ad400889..0d5190accf 100644
> > --- a/hw/block/vhost-user-blk.c
> > +++ b/hw/block/vhost-user-blk.c
> > @@ -52,6 +52,7 @@ static const int user_feature_bits[] = {
> >  VIRTIO_F_NOTIFY_ON_EMPTY,
> >  VIRTIO_F_RING_PACKED,
> >  VIRTIO_F_IOMMU_PLATFORM,
> > +VIRTIO_F_RING_RESET,
> >  VHOST_INVALID_FEATURE_BIT
> >  };
> >
> > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> > index feda448878..26e4930676 100644
> > --- a/hw/net/vhost_net.c
> > +++ b/hw/net/vhost_net.c
> > @@ -75,6 +75,7 @@ static const int user_feature_bits[] = {
> >  VIRTIO_NET_F_MTU,
> >  VIRTIO_F_IOMMU_PLATFORM,
> >  VIRTIO_F_RING_PACKED,
> > +VIRTIO_F_RING_RESET,
> >  VIRTIO_NET_F_RSS,
> >  VIRTIO_NET_F_HASH_REPORT,
> >
> > diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
> > index bdf337a7a2..6a0fd0dfb1 100644
> > --- a/hw/scsi/vhost-scsi.c
> > +++ b/hw/scsi/vhost-scsi.c
> > @@ -38,6 +38,7 @@ static const int kernel_feature_bits[] = {
> >  VIRTIO_RING_F_INDIRECT_DESC,
> >  VIRTIO_RING_F_EVENT_IDX,
> >  VIRTIO_SCSI_F_HOTPLUG,
> > +VIRTIO_F_RING_RESET,
> >  VHOST_INVALID_FEATURE_BIT
> >  };
> >
> > diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> > index bc37317d55..b7a71a802c 100644
> > --- a/hw/scsi/vhost-user-scsi.c
> > +++ b/hw/scsi/vhost-user-scsi.c
> > @@ -36,6 +36,7 @@ static const int user_feature_bits[] = {
> >  VIRTIO_RING_F_INDIRECT_DESC,
> >  VIRTIO_RING_F_EVENT_IDX,
> >  VIRTIO_SCSI_F_HOTPLUG,
> > +VIRTIO_F_RING_RESET,
> >  VHOST_INVALID_FEATURE_BIT
> >  };
> >
> > diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> > index 1c40f42045..dc4014cdef 100644
> > --- a/hw/virtio/vhost-user-fs.c
> > +++ b/hw/virtio/vhost-user-fs.c
> > @@ -32,6 +32,7 @@ static const int user_feature_bits[] = {
> >  VIRTIO_F_NOTIFY_ON_EMPTY,
> >  VIRTIO_F_RING_PACKED,
> >  VIRTIO_F_IOMMU_PLATFORM,
> > +VIRTIO_F_RING_RESET,
> >
> >  VHOST_INVALID_FEATURE_BIT
> >  };
> > diff --git a/hw/virtio/vhost-user-gpio.c b/hw/virtio/vhost-user-gpio.c
> > index 677d1c7730..5851cb3bc9 100644
> > --- a/hw/virtio/vhost-user-gpio.c
> > +++ b/hw/virtio/vhost-user-gpio.c
> > @@ -24,6 +24,7 @@ static const int feature_bits[] =

Re: [RFC 2/7] migration: No save_live_pending() method uses the QEMUFile parameter

2022-11-22 Thread Dr. David Alan Gilbert

* Juan Quintela (quint...@redhat.com) wrote:
> So remove it everywhere.
> 
> Signed-off-by: Juan Quintela 

Reviewed-by: Dr. David Alan Gilbert 

> ---
>  include/migration/register.h   | 6 ++
>  migration/savevm.h | 2 +-
>  hw/s390x/s390-stattrib.c   | 2 +-
>  hw/vfio/migration.c| 6 ++
>  migration/block-dirty-bitmap.c | 5 ++---
>  migration/block.c  | 2 +-
>  migration/migration.c  | 3 +--
>  migration/ram.c| 2 +-
>  migration/savevm.c | 5 ++---
>  9 files changed, 13 insertions(+), 20 deletions(-)
> 
> diff --git a/include/migration/register.h b/include/migration/register.h
> index 1950fee6a8..5b5424ed8f 100644
> --- a/include/migration/register.h
> +++ b/include/migration/register.h
> @@ -46,10 +46,8 @@ typedef struct SaveVMHandlers {
>  
>  /* This runs outside the iothread lock!  */
>  int (*save_setup)(QEMUFile *f, void *opaque);
> -void (*save_live_pending)(QEMUFile *f, void *opaque,
> -  uint64_t threshold_size,
> -  uint64_t *rest_precopy,
> -  uint64_t *rest_postcopy);
> +void (*save_live_pending)(void *opaque,  uint64_t threshold_size,
> +  uint64_t *rest_precopy, uint64_t 
> *rest_postcopy);
>  /* Note for save_live_pending:
>   * - res_precopy is for data which must be migrated in precopy
>   * phase or in stopped state, in other words - before target
> diff --git a/migration/savevm.h b/migration/savevm.h
> index 9bd55c336c..98fae6f9b3 100644
> --- a/migration/savevm.h
> +++ b/migration/savevm.h
> @@ -40,7 +40,7 @@ void qemu_savevm_state_cleanup(void);
>  void qemu_savevm_state_complete_postcopy(QEMUFile *f);
>  int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
> bool inactivate_disks);
> -void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
> +void qemu_savevm_state_pending(uint64_t max_size,
> uint64_t *res_precopy, uint64_t 
> *res_postcopy);
>  void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
>  void qemu_savevm_send_open_return_path(QEMUFile *f);
> diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c
> index ee60b53da4..9b74eeadf3 100644
> --- a/hw/s390x/s390-stattrib.c
> +++ b/hw/s390x/s390-stattrib.c
> @@ -182,7 +182,7 @@ static int cmma_save_setup(QEMUFile *f, void *opaque)
>  return 0;
>  }
>  
> -static void cmma_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
> +static void cmma_save_pending(void *opaque, uint64_t max_size,
>uint64_t *res_precopy, uint64_t *res_postcopy)
>  {
>  S390StAttribState *sas = S390_STATTRIB(opaque);
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 3423f113f0..760d5f3c5c 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -456,10 +456,8 @@ static void vfio_save_cleanup(void *opaque)
>  trace_vfio_save_cleanup(vbasedev->name);
>  }
>  
> -static void vfio_save_pending(QEMUFile *f, void *opaque,
> -  uint64_t threshold_size,
> -  uint64_t *res_precopy,
> -  uint64_t *res_postcopy)
> +static void vfio_save_pending(void *opaque,  uint64_t threshold_size,
> +  uint64_t *res_precopy, uint64_t *res_postcopy)
>  {
>  VFIODevice *vbasedev = opaque;
>  VFIOMigration *migration = vbasedev->migration;
> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
> index dfea546330..a445bdc3c3 100644
> --- a/migration/block-dirty-bitmap.c
> +++ b/migration/block-dirty-bitmap.c
> @@ -761,9 +761,8 @@ static int dirty_bitmap_save_complete(QEMUFile *f, void 
> *opaque)
>  return 0;
>  }
>  
> -static void dirty_bitmap_save_pending(QEMUFile *f, void *opaque,
> -  uint64_t max_size,
> -  uint64_t *res_precopy,
> +static void dirty_bitmap_save_pending(void *opaque, uint64_t max_size,
> +  uint64_t *res_precopy, 
>uint64_t *res_postcopy)
>  {
>  DBMSaveState *s = &((DBMState *)opaque)->save;
> diff --git a/migration/block.c b/migration/block.c
> index 4ae8f837b0..b3d680af75 100644
> --- a/migration/block.c
> +++ b/migration/block.c
> @@ -862,7 +862,7 @@ static int block_save_complete(QEMUFile *f, void *opaque)
>  return 0;
>  }
>  
> -static void block_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
> +static void block_save_pending(void *opaque, uint64_t max_size,
> uint64_t *res_precopy,
> uint64_t *res_postcopy)
>  {
> diff --git a/migration/migration.c b/migration/migration.c
> index 440aa62f16..038fc58a96 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@

Re: [PATCH v3 1/2] io: Add support for MSG_PEEK for socket channel

2022-11-22 Thread Daniel P . Berrangé

On Tue, Nov 22, 2022 at 12:16:08PM -0500, Peter Xu wrote:
> On Tue, Nov 22, 2022 at 10:12:25PM +0530, manish.mishra wrote:
> > 
> > On 22/11/22 10:03 pm, Peter Xu wrote:
> > > On Tue, Nov 22, 2022 at 11:29:05AM -0500, Peter Xu wrote:
> > > > On Tue, Nov 22, 2022 at 11:10:18AM -0500, Peter Xu wrote:
> > > > > On Tue, Nov 22, 2022 at 09:01:59PM +0530, manish.mishra wrote:
> > > > > > On 22/11/22 8:19 pm, Daniel P. Berrangé wrote:
> > > > > > > On Tue, Nov 22, 2022 at 09:41:01AM -0500, Peter Xu wrote:
> > > > > > > > On Tue, Nov 22, 2022 at 02:38:53PM +0530, manish.mishra wrote:
> > > > > > > > > On 22/11/22 2:30 pm, Daniel P. Berrangé wrote:
> > > > > > > > > > On Sat, Nov 19, 2022 at 09:36:14AM +, manish.mishra 
> > > > > > > > > > wrote:
> > > > > > > > > > > MSG_PEEK reads from the peek of channel, The data is 
> > > > > > > > > > > treated as
> > > > > > > > > > > unread and the next read shall still return this data. 
> > > > > > > > > > > This
> > > > > > > > > > > support is currently added only for socket class. Extra 
> > > > > > > > > > > parameter
> > > > > > > > > > > 'flags' is added to io_readv calls to pass extra read 
> > > > > > > > > > > flags like
> > > > > > > > > > > MSG_PEEK.
> > > > > > > > > > > 
> > > > > > > > > > > Suggested-by: Daniel P. Berrangé  > > > > > > > > > > Signed-off-by: manish.mishra
> > > > > > > > > > > ---
> > > > > > > > > > > chardev/char-socket.c   |  4 +-
> > > > > > > > > > > include/io/channel.h| 83 
> > > > > > > > > > > +
> > > > > > > > > > > io/channel-buffer.c |  1 +
> > > > > > > > > > > io/channel-command.c|  1 +
> > > > > > > > > > > io/channel-file.c   |  1 +
> > > > > > > > > > > io/channel-null.c   |  1 +
> > > > > > > > > > > io/channel-socket.c | 16 +-
> > > > > > > > > > > io/channel-tls.c|  1 +
> > > > > > > > > > > io/channel-websock.c|  1 +
> > > > > > > > > > > io/channel.c| 73 
> > > > > > > > > > > +++--
> > > > > > > > > > > migration/channel-block.c   |  1 +
> > > > > > > > > > > scsi/qemu-pr-helper.c   |  2 +-
> > > > > > > > > > > tests/qtest/tpm-emu.c   |  2 +-
> > > > > > > > > > > tests/unit/test-io-channel-socket.c |  1 +
> > > > > > > > > > > util/vhost-user-server.c|  2 +-
> > > > > > > > > > > 15 files changed, 179 insertions(+), 11 deletions(-)
> > > > > > > > > > > diff --git a/io/channel-socket.c b/io/channel-socket.c
> > > > > > > > > > > index b76dca9cc1..a06b24766d 100644
> > > > > > > > > > > --- a/io/channel-socket.c
> > > > > > > > > > > +++ b/io/channel-socket.c
> > > > > > > > > > > @@ -406,6 +406,8 @@ 
> > > > > > > > > > > qio_channel_socket_accept(QIOChannelSocket *ioc,
> > > > > > > > > > > }
> > > > > > > > > > > #endif /* WIN32 */
> > > > > > > > > > > +qio_channel_set_feature(QIO_CHANNEL(cioc), 
> > > > > > > > > > > QIO_CHANNEL_FEATURE_READ_MSG_PEEK);
> > > > > > > > > > > +
> > > > > > > > > > This covers the incoming server side socket.
> > > > > > > > > > 
> > > > > > > > > > This also needs to be set in outgoing client side socket in
> > > > > > > > > > qio_channel_socket_connect_async
> > > > > > > > > Yes sorry, i considered only current use-case, but as it is 
> > > > > > > > > generic one both should be there. Thanks will update it.
> > > > > > > > > 
> > > > > > > > > > > @@ -705,7 +718,6 @@ static ssize_t 
> > > > > > > > > > > qio_channel_socket_writev(QIOChannel *ioc,
> > > > > > > > > > > }
> > > > > > > > > > > #endif /* WIN32 */
> > > > > > > > > > > -
> > > > > > > > > > > #ifdef QEMU_MSG_ZEROCOPY
> > > > > > > > > > > static int qio_channel_socket_flush(QIOChannel *ioc,
> > > > > > > > > > > Error **errp)
> > > > > > > > > > Please remove this unrelated whitespace change.
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > > @@ -109,6 +117,37 @@ int 
> > > > > > > > > > > qio_channel_readv_all_eof(QIOChannel *ioc,
> > > > > > > > > > > return qio_channel_readv_full_all_eof(ioc, iov, 
> > > > > > > > > > > niov, NULL, NULL, errp);
> > > > > > > > > > > }
> > > > > > > > > > > +int qio_channel_readv_peek_all_eof(QIOChannel *ioc,
> > > > > > > > > > > +   const struct iovec 
> > > > > > > > > > > *iov,
> > > > > > > > > > > +   size_t niov,
> > > > > > > > > > > +   Error **errp)
> > > > > > > > > > > +{
> > > > > > > > > > > +   ssize_t len = 0;
> > > > > > > > > > > +   ssize_t total = iov_size(iov, niov);
> > > > > > > > > > > +
> > > > > > > > > > > +   while (len < total) {
> > > > > > > > > > > +   len = qio_channel_readv_full(ioc, iov, niov, NULL,
> > > > > > > > > > > +

Re: [PATCH for-7.2 1/5] hw/nvme: fix aio cancel in format

2022-11-22 Thread Keith Busch

On Tue, Nov 22, 2022 at 09:13:44AM +0100, Klaus Jensen wrote:
> There are several bugs in the async cancel code for the Format command.
> 
> Firstly, cancelling a format operation neglects to set iocb->ret as well
> as clearing the iocb->aiocb after cancelling the underlying aiocb which
> causes the aio callback to ignore the cancellation. Trivial fix.
> 
> Secondly, and worse, because the request is queued up for posting to the
> CQ in a bottom half, if the cancellation is due to the submission queue
> being deleted (which calls blk_aio_cancel), the req structure is
> deallocated in nvme_del_sq prior to the bottom half being schedulued.
> 
> Fix this by simply removing the bottom half, there is no reason to defer
> it anyway.

I thought for sure I'd find a reason defered execution was needed, but
hey, it looks perfectly fine with this change!
 
> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> index ac3885ce5079..26b53469328f 100644
> --- a/hw/nvme/ctrl.c
> +++ b/hw/nvme/ctrl.c
> @@ -5756,14 +5756,15 @@ typedef struct NvmeFormatAIOCB {
>  uint8_t pil;
>  } NvmeFormatAIOCB;

I think you can remove the QEMUBH member from the above struct with this
patch.

Otherwise looks good.

Reviewed-by: Keith Busch

Re: [PATCH v3 1/2] io: Add support for MSG_PEEK for socket channel

2022-11-22 Thread Peter Xu

On Tue, Nov 22, 2022 at 10:12:25PM +0530, manish.mishra wrote:
> 
> On 22/11/22 10:03 pm, Peter Xu wrote:
> > On Tue, Nov 22, 2022 at 11:29:05AM -0500, Peter Xu wrote:
> > > On Tue, Nov 22, 2022 at 11:10:18AM -0500, Peter Xu wrote:
> > > > On Tue, Nov 22, 2022 at 09:01:59PM +0530, manish.mishra wrote:
> > > > > On 22/11/22 8:19 pm, Daniel P. Berrangé wrote:
> > > > > > On Tue, Nov 22, 2022 at 09:41:01AM -0500, Peter Xu wrote:
> > > > > > > On Tue, Nov 22, 2022 at 02:38:53PM +0530, manish.mishra wrote:
> > > > > > > > On 22/11/22 2:30 pm, Daniel P. Berrangé wrote:
> > > > > > > > > On Sat, Nov 19, 2022 at 09:36:14AM +, manish.mishra wrote:
> > > > > > > > > > MSG_PEEK reads from the peek of channel, The data is 
> > > > > > > > > > treated as
> > > > > > > > > > unread and the next read shall still return this data. This
> > > > > > > > > > support is currently added only for socket class. Extra 
> > > > > > > > > > parameter
> > > > > > > > > > 'flags' is added to io_readv calls to pass extra read flags 
> > > > > > > > > > like
> > > > > > > > > > MSG_PEEK.
> > > > > > > > > > 
> > > > > > > > > > Suggested-by: Daniel P. Berrangé  > > > > > > > > > Signed-off-by: manish.mishra
> > > > > > > > > > ---
> > > > > > > > > > chardev/char-socket.c   |  4 +-
> > > > > > > > > > include/io/channel.h| 83 
> > > > > > > > > > +
> > > > > > > > > > io/channel-buffer.c |  1 +
> > > > > > > > > > io/channel-command.c|  1 +
> > > > > > > > > > io/channel-file.c   |  1 +
> > > > > > > > > > io/channel-null.c   |  1 +
> > > > > > > > > > io/channel-socket.c | 16 +-
> > > > > > > > > > io/channel-tls.c|  1 +
> > > > > > > > > > io/channel-websock.c|  1 +
> > > > > > > > > > io/channel.c| 73 
> > > > > > > > > > +++--
> > > > > > > > > > migration/channel-block.c   |  1 +
> > > > > > > > > > scsi/qemu-pr-helper.c   |  2 +-
> > > > > > > > > > tests/qtest/tpm-emu.c   |  2 +-
> > > > > > > > > > tests/unit/test-io-channel-socket.c |  1 +
> > > > > > > > > > util/vhost-user-server.c|  2 +-
> > > > > > > > > > 15 files changed, 179 insertions(+), 11 deletions(-)
> > > > > > > > > > diff --git a/io/channel-socket.c b/io/channel-socket.c
> > > > > > > > > > index b76dca9cc1..a06b24766d 100644
> > > > > > > > > > --- a/io/channel-socket.c
> > > > > > > > > > +++ b/io/channel-socket.c
> > > > > > > > > > @@ -406,6 +406,8 @@ 
> > > > > > > > > > qio_channel_socket_accept(QIOChannelSocket *ioc,
> > > > > > > > > > }
> > > > > > > > > > #endif /* WIN32 */
> > > > > > > > > > +qio_channel_set_feature(QIO_CHANNEL(cioc), 
> > > > > > > > > > QIO_CHANNEL_FEATURE_READ_MSG_PEEK);
> > > > > > > > > > +
> > > > > > > > > This covers the incoming server side socket.
> > > > > > > > > 
> > > > > > > > > This also needs to be set in outgoing client side socket in
> > > > > > > > > qio_channel_socket_connect_async
> > > > > > > > Yes sorry, i considered only current use-case, but as it is 
> > > > > > > > generic one both should be there. Thanks will update it.
> > > > > > > > 
> > > > > > > > > > @@ -705,7 +718,6 @@ static ssize_t 
> > > > > > > > > > qio_channel_socket_writev(QIOChannel *ioc,
> > > > > > > > > > }
> > > > > > > > > > #endif /* WIN32 */
> > > > > > > > > > -
> > > > > > > > > > #ifdef QEMU_MSG_ZEROCOPY
> > > > > > > > > > static int qio_channel_socket_flush(QIOChannel *ioc,
> > > > > > > > > > Error **errp)
> > > > > > > > > Please remove this unrelated whitespace change.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > > @@ -109,6 +117,37 @@ int 
> > > > > > > > > > qio_channel_readv_all_eof(QIOChannel *ioc,
> > > > > > > > > > return qio_channel_readv_full_all_eof(ioc, iov, 
> > > > > > > > > > niov, NULL, NULL, errp);
> > > > > > > > > > }
> > > > > > > > > > +int qio_channel_readv_peek_all_eof(QIOChannel *ioc,
> > > > > > > > > > +   const struct iovec *iov,
> > > > > > > > > > +   size_t niov,
> > > > > > > > > > +   Error **errp)
> > > > > > > > > > +{
> > > > > > > > > > +   ssize_t len = 0;
> > > > > > > > > > +   ssize_t total = iov_size(iov, niov);
> > > > > > > > > > +
> > > > > > > > > > +   while (len < total) {
> > > > > > > > > > +   len = qio_channel_readv_full(ioc, iov, niov, NULL,
> > > > > > > > > > +NULL, 
> > > > > > > > > > QIO_CHANNEL_READ_FLAG_MSG_PEEK, errp);
> > > > > > > > > > +
> > > > > > > > > > +   if (len == QIO_CHANNEL_ERR_BLOCK) {
> > > > > > > > > > +if (qemu_in_coroutine()) {
> > > > > > > > > > +

Re: [PATCH 1/2] hw/xen/xen_pt: Call default handler only if no custom one is set

2022-11-22 Thread Anthony PERARD via

On Mon, Nov 14, 2022 at 08:20:10PM +0100, Marek Marczykowski-Górecki wrote:
> diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
> index 0ec7e52183..269bd26109 100644
> --- a/hw/xen/xen_pt.c
> +++ b/hw/xen/xen_pt.c
> @@ -255,6 +255,7 @@ static void xen_pt_pci_write_config(PCIDevice *d, 
> uint32_t addr,
>  uint32_t find_addr = addr;
>  XenPTRegInfo *reg = NULL;
>  bool wp_flag = false;
> +uint32_t emul_mask = 0, write_val;
>  
>  if (xen_pt_pci_config_access_check(d, addr, len)) {
>  return;
> @@ -310,7 +311,6 @@ static void xen_pt_pci_write_config(PCIDevice *d, 
> uint32_t addr,
>  }
>  
>  memory_region_transaction_begin();
> -pci_default_write_config(d, addr, val, len);
>  
>  /* adjust the read and write value to appropriate CFC-CFF window */
>  read_val <<= (addr & 3) << 3;
> @@ -370,6 +370,8 @@ static void xen_pt_pci_write_config(PCIDevice *d, 
> uint32_t addr,
>  return;
>  }
>  
> +emul_mask |= ( (1 << (reg->size * 8) ) - 1 ) << ((find_addr & 3) 
> * 8);
> +
>  /* calculate next address to find */
>  emul_len -= reg->size;
>  if (emul_len > 0) {
> @@ -396,6 +398,24 @@ static void xen_pt_pci_write_config(PCIDevice *d, 
> uint32_t addr,
>  /* need to shift back before passing them to xen_host_pci_set_block. */
>  val >>= (addr & 3) << 3;
>  
> +/* store emulated registers that didn't have specific hooks */
> +write_val = val;
> +for (index = 0; emul_mask; index += emul_len) {

`index` isn't used, was it meant to be use for something?

> +emul_len = 0;
> +while (emul_mask & 0xff) {
> +emul_len++;

This seems to count the number of byte that have a hook
(xen_pt_find_reg() found a `reg_entry`).
This loop should count instead the number of bytes for which no
`reg_entry` have been found, right? Shouldn't the loop count when a byte
in emul_mask is unset?

> +emul_mask >>= 8;
> +}
> +if (emul_len) {
> +uint32_t mask = ((1 << (emul_len * 8)) - 1);
> +pci_default_write_config(d, addr, write_val & mask, emul_len);

`addr` isn't updated in the loop, aren't we going to write bytes to the
wrong place? If for example "emul_mask == 0x00ff00ff" ?

> +write_val >>= emul_len * 8;
> +} else {
> +emul_mask >>= 8;
> +write_val >>= 8;
> +}
> +}

Thanks,

-- 
Anthony PERARD

Re: [PULL for 7.2-rc2 0/3] loongarch for 7.2-rc2 patches

2022-11-22 Thread Stefan Hajnoczi

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.2 for any 
user-visible changes.


signature.asc
Description: PGP signature

Re: [PULL 0/8] pc,virtio: regression, test fixes

2022-11-22 Thread Stefan Hajnoczi

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.2 for any 
user-visible changes.


signature.asc
Description: PGP signature

Re: [PULL v2 for 7.2-rc2 00/11] testing and doc updates

2022-11-22 Thread Stefan Hajnoczi

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.2 for any 
user-visible changes.


signature.asc
Description: PGP signature

Re: Plugin Memory Callback Debugging

2022-11-22 Thread Aaron Lindsay via

On Nov 21 22:02, Alex Bennée wrote:
> 
> Aaron Lindsay  writes:
> 
> > Sorry, left off the very end of my timeline:
> >
> > On Nov 18 16:58, Aaron Lindsay wrote:
> >> I have, so far, discovered the following timeline:
> >> 1. My plugin receives a instruction execution callback for a load
> >>instruction. At this time, cpu->plugin_mem_cbs points to the same
> >>memory which will later be freed
> >> 2. During the handling of this callback, my plugin calls
> >qemu_plugin_reset()
> 
> The final plugin reset should only execute in the safe async context
> (i.e. no other vCPUs running code). That flushes all current generated
> code.
> 
> >> 3. Ostensibly something goes wrong here with the cleanup of
> >>cpu->plugin_mem_cbs???
> 
> This may be missed by the reset path (hence your patch) but it should be
> being reset every instruction we instrument.
> 
> >> 4. Step 2 triggers the TBs to be flushed, which frees the memory pointed
> >>to by cpu->plugin_mem_cbs 
> >
> > 5. A store exclusive instruction is translated and then executed, which
> >requires the use of a helper. When executed, this helper checks
> >cpu->plugin_mem_cbs, which is non-null, so it attempts to dereference
> >and use it, resulting in the assertion.
> 
> It should be being reset for each instruction I think.

FYI - I suspect my above presentation of the problem suffered from the
"searching where the streetlamp is instead of where you lost something"
problem. In other words, I did/do observe the error at reset, but I now
believe that is merely where it is easiest to observe. The
cpu->plugin_mem_cbs doesn't appear to be reset at the end of
instructions and manifests at reset because that's when the underlying
memory is freed.

-Aaron

Re: [PATCH] multifd: Updated QAPI format for 'migrate' qemu monitor command

2022-11-22 Thread Daniel P . Berrangé

On Tue, Nov 22, 2022 at 09:27:52PM +0530, manish.mishra wrote:
> 
> On 22/11/22 2:53 pm, Daniel P. Berrangé wrote:
> > For our future sanity I think we need to define a brand new migration
> > protocol which is bidirectional from the start. A large number of the
> > MigrateParameters would become obsolete immediately, as QEMU could
> > negotiate them automatically. This would let QEMU introduce new
> > migration features without requiring any work in libvirt in many
> > cases. Libvirt should only be required when there are performance
> > tunables that need exposing, never for protocol feature negotiation.
> > 
> > I think introducing a new QMP command, without introducing a fully
> > new protocol would be a big mistake as the QMP command is not the
> > problem we have.
> > 
> 
> Daniel, Totally agree on above mentioned discussion. Does this
> account for verifying other things too along with migration
> capabilities e.g. if qemu machine type, vm config or cpu
> features are excatly same of both side. Currently libvirt
> takes that responsibility, but as you mentioned libvirt may
> take some time to get to level where qemu is so causing
> potential issues till then. Similar to this discussion we
> had on libvirt list 
> https://www.mail-archive.com/libvir-list@redhat.com/msg233725.html.
> If these things too were managed by qemu indepenedelty
> things could have been much better. I mean those too are
> kind of related to live migration. :)

Today libvirt has no choice, because if there's an ABI compat
mistake in the dest QEMU config, vs the src QEMU config, then
when loading VMstate on the target, the dst QEMU will often be
unable to de-serialize the migration device stream and end
up printing an error on stderr and exiting. Getting that info
back to the src QEMU is impossible.

If QEMU had a bi-directional migration stream, then the dst
QEMU could simply send an error message back to the src QEMU
and 'query-migrate' could actually display this.

There may still be validation libvirt wants to do as well,
but at least there would be the possiblity of offloading
to QEMU without sacrificing error reporting.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH for-8.0 v3 14/45] tcg: Introduce tcg_type_size

2022-11-22 Thread Richard Henderson


On 11/22/22 03:30, Philippe Mathieu-Daudé wrote:

On 11/11/22 08:40, Richard Henderson wrote:

Add a helper function for computing the size of a type.

Signed-off-by: Richard Henderson 
---
  include/tcg/tcg.h | 16 
  tcg/tcg.c | 26 --
  2 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index f2da340bb9..8bcd60d0ed 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -319,6 +319,22 @@ typedef enum TCGType {
  #endif
  } TCGType;
+/**
+ * tcg_type_size
+ * @t: type
+ *
+ * Return the size of the type in bytes.
+ */
+static inline int tcg_type_size(TCGType t)
+{
+    unsigned i = t;
+    if (i >= TCG_TYPE_V64) {
+    tcg_debug_assert(i < TCG_TYPE_COUNT);
+    i -= TCG_TYPE_V64 - 1;
+    }
+    return 4 << i;


I'd feel safer if we assign TCG_TYPE_I32 .. TCG_TYPE_V256 in TCGType,
just in case.


What do you mean?

r~

Re: [PATCH] vhost-vdpa: skip TPM CRB memory section

2022-11-22 Thread Eugenio Perez Martin

On Tue, Nov 22, 2022 at 3:53 PM  wrote:
>
> From: Marc-André Lureau 
>
> 851d6d1a0f ("vfio/common: remove spurious tpm-crb-cmd misalignment
> warning") removed the warning on vfio_listener_region_add() path.
>
> An error is reported for vhost-vdpa case:
> qemu-kvm: vhost_vdpa_listener_region_add received unaligned region
>
> Skip the CRB device.
>
> Fixes:
> https://bugzilla.redhat.com/show_bug.cgi?id=2141965
>
> Signed-off-by: Marc-André Lureau 

Acked-by: Eugenio Pérez 

And CCing Jason too.

> ---
>  hw/virtio/vhost-vdpa.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 7468e44b87..9d7206e4b8 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -19,6 +19,7 @@
>  #include "hw/virtio/virtio-net.h"
>  #include "hw/virtio/vhost-shadow-virtqueue.h"
>  #include "hw/virtio/vhost-vdpa.h"
> +#include "sysemu/tpm.h"
>  #include "exec/address-spaces.h"
>  #include "migration/blocker.h"
>  #include "qemu/cutils.h"
> @@ -46,6 +47,11 @@ static bool 
> vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
>  {
>  Int128 llend;
>
> +if (TPM_IS_CRB(section->mr->owner)) {
> +/* The CRB command buffer has its base address unaligned. */
> +return true;
> +}
> +
>  if ((!memory_region_is_ram(section->mr) &&
>   !memory_region_is_iommu(section->mr)) ||
>  memory_region_is_protected(section->mr) ||
> --
> 2.38.1
>

Re: [PATCH v3 1/2] io: Add support for MSG_PEEK for socket channel

2022-11-22 Thread manish.mishra




On 22/11/22 10:03 pm, Peter Xu wrote:

On Tue, Nov 22, 2022 at 11:29:05AM -0500, Peter Xu wrote:

On Tue, Nov 22, 2022 at 11:10:18AM -0500, Peter Xu wrote:

On Tue, Nov 22, 2022 at 09:01:59PM +0530, manish.mishra wrote:

On 22/11/22 8:19 pm, Daniel P. Berrangé wrote:

On Tue, Nov 22, 2022 at 09:41:01AM -0500, Peter Xu wrote:

On Tue, Nov 22, 2022 at 02:38:53PM +0530, manish.mishra wrote:

On 22/11/22 2:30 pm, Daniel P. Berrangé wrote:

On Sat, Nov 19, 2022 at 09:36:14AM +, manish.mishra wrote:

MSG_PEEK reads from the peek of channel, The data is treated as
unread and the next read shall still return this data. This
support is currently added only for socket class. Extra parameter
'flags' is added to io_readv calls to pass extra read flags like
MSG_PEEK.

Suggested-by: Daniel P. Berrangé 
---
chardev/char-socket.c   |  4 +-
include/io/channel.h| 83 +
io/channel-buffer.c |  1 +
io/channel-command.c|  1 +
io/channel-file.c   |  1 +
io/channel-null.c   |  1 +
io/channel-socket.c | 16 +-
io/channel-tls.c|  1 +
io/channel-websock.c|  1 +
io/channel.c| 73 +++--
migration/channel-block.c   |  1 +
scsi/qemu-pr-helper.c   |  2 +-
tests/qtest/tpm-emu.c   |  2 +-
tests/unit/test-io-channel-socket.c |  1 +
util/vhost-user-server.c|  2 +-
15 files changed, 179 insertions(+), 11 deletions(-)
diff --git a/io/channel-socket.c b/io/channel-socket.c
index b76dca9cc1..a06b24766d 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -406,6 +406,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
}
#endif /* WIN32 */
+qio_channel_set_feature(QIO_CHANNEL(cioc), 
QIO_CHANNEL_FEATURE_READ_MSG_PEEK);
+

This covers the incoming server side socket.

This also needs to be set in outgoing client side socket in
qio_channel_socket_connect_async

Yes sorry, i considered only current use-case, but as it is generic one both 
should be there. Thanks will update it.


@@ -705,7 +718,6 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
}
#endif /* WIN32 */
-
#ifdef QEMU_MSG_ZEROCOPY
static int qio_channel_socket_flush(QIOChannel *ioc,
Error **errp)

Please remove this unrelated whitespace change.



@@ -109,6 +117,37 @@ int qio_channel_readv_all_eof(QIOChannel *ioc,
return qio_channel_readv_full_all_eof(ioc, iov, niov, NULL, NULL, errp);
}
+int qio_channel_readv_peek_all_eof(QIOChannel *ioc,
+   const struct iovec *iov,
+   size_t niov,
+   Error **errp)
+{
+   ssize_t len = 0;
+   ssize_t total = iov_size(iov, niov);
+
+   while (len < total) {
+   len = qio_channel_readv_full(ioc, iov, niov, NULL,
+NULL, QIO_CHANNEL_READ_FLAG_MSG_PEEK, 
errp);
+
+   if (len == QIO_CHANNEL_ERR_BLOCK) {
+if (qemu_in_coroutine()) {
+qio_channel_yield(ioc, G_IO_IN);
+} else {
+qio_channel_wait(ioc, G_IO_IN);
+}
+continue;
+   }
+   if (len == 0) {
+   return 0;
+   }
+   if (len < 0) {
+   return -1;
+   }
+   }

This will busy wait burning CPU where there is a read > 0 and < total.


Daniel, i could use MSG_WAITALL too if that works but then we will lose 
opportunity to yield. Or if you have some other idea.

How easy would this happen?

Another alternative is we could just return the partial len to caller then
we fallback to the original channel orders if it happens.  And then if it
mostly will never happen it'll behave merely the same as what we want.

Well we're trying to deal with a bug where the slow and/or unreliable
network causes channels to arrive in unexpected order. Given we know
we're having network trouble, I wouldn't want to make more assumptions
about things happening correctly.


With regards,
Daniel


Peter, I have seen MSG_PEEK used in combination with MSG_WAITALL, but looks 
like even though chances are less it can still return partial data even with 
multiple retries for signal case, so is not full proof.

*MSG_WAITALL *(since Linux 2.2)
   This flag requests that the operation block until the full
   request is satisfied.  However, the call may still return
   less data than requested if a signal is caught, an error
   or disconnect occurs, or the next data to be received is
   of a different type than that returned.  This flag has no
   effect for datagram sockets.

Actual read ahead will be little hackish, so just confirming we all are in 
agreement to do actual read ahead and i can

[PULL 2/2] target/arm: Use signed quantity to represent VMSAv8-64 translation level

2022-11-22 Thread Peter Maydell

From: Ard Biesheuvel 

The LPA2 extension implements 52-bit virtual addressing for 4k and 16k
translation granules, and for the former, this means an additional level
of translation is needed. This means we start counting at -1 instead of
0 when doing a walk, and so 'level' is now a signed quantity, and should
be typed as such. So turn it from uint32_t into int32_t.

This avoids a level of -1 getting misinterpreted as being >= 3, and
terminating a page table walk prematurely with a bogus output address.

Cc: Peter Maydell 
Cc: Philippe Mathieu-Daudé 
Cc: Richard Henderson 
Signed-off-by: Ard Biesheuvel 
Reviewed-by: Peter Maydell 
Signed-off-by: Peter Maydell 
---
 target/arm/ptw.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 8ca468d65bc..f812734bfb2 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -1172,7 +1172,7 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
S1Translate *ptw,
 ARMCPU *cpu = env_archcpu(env);
 ARMMMUIdx mmu_idx = ptw->in_mmu_idx;
 bool is_secure = ptw->in_secure;
-uint32_t level;
+int32_t level;
 ARMVAParameters param;
 uint64_t ttbr;
 hwaddr descaddr, indexmask, indexmask_grainsize;
@@ -1302,7 +1302,7 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
S1Translate *ptw,
  */
 uint32_t sl0 = extract32(tcr, 6, 2);
 uint32_t sl2 = extract64(tcr, 33, 1);
-uint32_t startlevel;
+int32_t startlevel;
 bool ok;
 
 /* SL2 is RES0 unless DS=1 & 4kb granule. */
-- 
2.25.1

[PULL 0/2] target-arm queue

2022-11-22 Thread Peter Maydell

Hi; this pull request has a couple of fixes for bugs in
the Arm page-table-walk code, which arrived in the last
day or so.

I'm sending this out now in the hope it might just sneak
in before rc2 gets tagged, so the fixes can get more
testing time before the 7.2 release; but if they don't
make it then this should go into rc3.

thanks
-- PMM

The following changes since commit 6d71357a3b651ec9db126e4862b77e13165427f5:

  rtl8139: honor large send MSS value (2022-11-21 09:28:43 -0500)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git 
tags/pull-target-arm-20221122

for you to fetch changes up to 15f8f4671afd22491ce99d28a296514717fead4f:

  target/arm: Use signed quantity to represent VMSAv8-64 translation level 
(2022-11-22 16:10:25 +)


target-arm:
 * Fix broken 5-level pagetable handling
 * Fix debug accesses when EL2 is present


Ard Biesheuvel (1):
  target/arm: Use signed quantity to represent VMSAv8-64 translation level

Peter Maydell (1):
  target/arm: Don't do two-stage lookup if stage 2 is disabled

 target/arm/ptw.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

[PULL 1/2] target/arm: Don't do two-stage lookup if stage 2 is disabled

2022-11-22 Thread Peter Maydell

In get_phys_addr_with_struct(), we call get_phys_addr_twostage() if
the CPU supports EL2.  However, we don't check here that stage 2 is
actually enabled.  Instead we only check that inside
get_phys_addr_twostage() to skip stage 2 translation.  This means
that even if stage 2 is disabled we still tell the stage 1 lookup to
do its page table walks via stage 2.

This works by luck for normal CPU accesses, but it breaks for debug
accesses, which are used by the disassembler and also by semihosting
file reads and writes, because the debug case takes a different code
path inside S1_ptw_translate().

This means that setups that use semihosting for file loads are broken
(a regression since 7.1, introduced in recent ptw refactoring), and
that sometimes disassembly in debug logs reports "unable to read
memory" rather than showing the guest insns.

Fix the bug by hoisting the "is stage 2 enabled?" check up to
get_phys_addr_with_struct(), so that we handle S2 disabled the same
way we do the "no EL2" case, with a simple single stage lookup.

Reported-by: Jens Wiklander 
Reviewed-by: Richard Henderson 
Signed-off-by: Peter Maydell 
Message-id: 20221121212404.1450382-1-peter.mayd...@linaro.org
---
 target/arm/ptw.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 9a6277d862f..8ca468d65bc 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -2612,8 +2612,8 @@ static bool get_phys_addr_twostage(CPUARMState *env, 
S1Translate *ptw,
 
 ret = get_phys_addr_with_struct(env, ptw, address, access_type, result, 
fi);
 
-/* If S1 fails or S2 is disabled, return early.  */
-if (ret || regime_translation_disabled(env, ARMMMUIdx_Stage2, is_secure)) {
+/* If S1 fails, return early.  */
+if (ret) {
 return ret;
 }
 
@@ -2739,7 +2739,8 @@ static bool get_phys_addr_with_struct(CPUARMState *env, 
S1Translate *ptw,
  * Otherwise, a stage1+stage2 translation is just stage 1.
  */
 ptw->in_mmu_idx = mmu_idx = s1_mmu_idx;
-if (arm_feature(env, ARM_FEATURE_EL2)) {
+if (arm_feature(env, ARM_FEATURE_EL2) &&
+!regime_translation_disabled(env, ARMMMUIdx_Stage2, is_secure)) {
 return get_phys_addr_twostage(env, ptw, address, access_type,
   result, fi);
 }
-- 
2.25.1

Re: [PATCH v3 1/2] io: Add support for MSG_PEEK for socket channel

2022-11-22 Thread Peter Xu

On Tue, Nov 22, 2022 at 11:29:05AM -0500, Peter Xu wrote:
> On Tue, Nov 22, 2022 at 11:10:18AM -0500, Peter Xu wrote:
> > On Tue, Nov 22, 2022 at 09:01:59PM +0530, manish.mishra wrote:
> > > 
> > > On 22/11/22 8:19 pm, Daniel P. Berrangé wrote:
> > > > On Tue, Nov 22, 2022 at 09:41:01AM -0500, Peter Xu wrote:
> > > > > On Tue, Nov 22, 2022 at 02:38:53PM +0530, manish.mishra wrote:
> > > > > > On 22/11/22 2:30 pm, Daniel P. Berrangé wrote:
> > > > > > > On Sat, Nov 19, 2022 at 09:36:14AM +, manish.mishra wrote:
> > > > > > > > MSG_PEEK reads from the peek of channel, The data is treated as
> > > > > > > > unread and the next read shall still return this data. This
> > > > > > > > support is currently added only for socket class. Extra 
> > > > > > > > parameter
> > > > > > > > 'flags' is added to io_readv calls to pass extra read flags like
> > > > > > > > MSG_PEEK.
> > > > > > > > 
> > > > > > > > Suggested-by: Daniel P. Berrangé  > > > > > > > Signed-off-by: manish.mishra
> > > > > > > > ---
> > > > > > > >chardev/char-socket.c   |  4 +-
> > > > > > > >include/io/channel.h| 83 
> > > > > > > > +
> > > > > > > >io/channel-buffer.c |  1 +
> > > > > > > >io/channel-command.c|  1 +
> > > > > > > >io/channel-file.c   |  1 +
> > > > > > > >io/channel-null.c   |  1 +
> > > > > > > >io/channel-socket.c | 16 +-
> > > > > > > >io/channel-tls.c|  1 +
> > > > > > > >io/channel-websock.c|  1 +
> > > > > > > >io/channel.c| 73 
> > > > > > > > +++--
> > > > > > > >migration/channel-block.c   |  1 +
> > > > > > > >scsi/qemu-pr-helper.c   |  2 +-
> > > > > > > >tests/qtest/tpm-emu.c   |  2 +-
> > > > > > > >tests/unit/test-io-channel-socket.c |  1 +
> > > > > > > >util/vhost-user-server.c|  2 +-
> > > > > > > >15 files changed, 179 insertions(+), 11 deletions(-)
> > > > > > > 
> > > > > > > > diff --git a/io/channel-socket.c b/io/channel-socket.c
> > > > > > > > index b76dca9cc1..a06b24766d 100644
> > > > > > > > --- a/io/channel-socket.c
> > > > > > > > +++ b/io/channel-socket.c
> > > > > > > > @@ -406,6 +406,8 @@ qio_channel_socket_accept(QIOChannelSocket 
> > > > > > > > *ioc,
> > > > > > > >}
> > > > > > > >#endif /* WIN32 */
> > > > > > > > +qio_channel_set_feature(QIO_CHANNEL(cioc), 
> > > > > > > > QIO_CHANNEL_FEATURE_READ_MSG_PEEK);
> > > > > > > > +
> > > > > > > This covers the incoming server side socket.
> > > > > > > 
> > > > > > > This also needs to be set in outgoing client side socket in
> > > > > > > qio_channel_socket_connect_async
> > > > > > 
> > > > > > Yes sorry, i considered only current use-case, but as it is generic 
> > > > > > one both should be there. Thanks will update it.
> > > > > > 
> > > > > > > > @@ -705,7 +718,6 @@ static ssize_t 
> > > > > > > > qio_channel_socket_writev(QIOChannel *ioc,
> > > > > > > >}
> > > > > > > >#endif /* WIN32 */
> > > > > > > > -
> > > > > > > >#ifdef QEMU_MSG_ZEROCOPY
> > > > > > > >static int qio_channel_socket_flush(QIOChannel *ioc,
> > > > > > > >Error **errp)
> > > > > > > Please remove this unrelated whitespace change.
> > > > > > > 
> > > > > > > 
> > > > > > > > @@ -109,6 +117,37 @@ int qio_channel_readv_all_eof(QIOChannel 
> > > > > > > > *ioc,
> > > > > > > >return qio_channel_readv_full_all_eof(ioc, iov, niov, 
> > > > > > > > NULL, NULL, errp);
> > > > > > > >}
> > > > > > > > +int qio_channel_readv_peek_all_eof(QIOChannel *ioc,
> > > > > > > > +   const struct iovec *iov,
> > > > > > > > +   size_t niov,
> > > > > > > > +   Error **errp)
> > > > > > > > +{
> > > > > > > > +   ssize_t len = 0;
> > > > > > > > +   ssize_t total = iov_size(iov, niov);
> > > > > > > > +
> > > > > > > > +   while (len < total) {
> > > > > > > > +   len = qio_channel_readv_full(ioc, iov, niov, NULL,
> > > > > > > > +NULL, 
> > > > > > > > QIO_CHANNEL_READ_FLAG_MSG_PEEK, errp);
> > > > > > > > +
> > > > > > > > +   if (len == QIO_CHANNEL_ERR_BLOCK) {
> > > > > > > > +if (qemu_in_coroutine()) {
> > > > > > > > +qio_channel_yield(ioc, G_IO_IN);
> > > > > > > > +} else {
> > > > > > > > +qio_channel_wait(ioc, G_IO_IN);
> > > > > > > > +}
> > > > > > > > +continue;
> > > > > > > > +   }
> > > > > > > > +   if (len == 0) {
> > > > > > > > +   return 0;
> > > > > > > > +   }
> > > > > > > > +   if (len < 0) {
> > > > > > > > +   return -1;
> > > > > > > > +   }
> > > > > > > > +   }
> > > > > > > This will busy wait

Re: [PATCH v3 1/2] io: Add support for MSG_PEEK for socket channel

2022-11-22 Thread Peter Xu

On Tue, Nov 22, 2022 at 11:10:18AM -0500, Peter Xu wrote:
> On Tue, Nov 22, 2022 at 09:01:59PM +0530, manish.mishra wrote:
> > 
> > On 22/11/22 8:19 pm, Daniel P. Berrangé wrote:
> > > On Tue, Nov 22, 2022 at 09:41:01AM -0500, Peter Xu wrote:
> > > > On Tue, Nov 22, 2022 at 02:38:53PM +0530, manish.mishra wrote:
> > > > > On 22/11/22 2:30 pm, Daniel P. Berrangé wrote:
> > > > > > On Sat, Nov 19, 2022 at 09:36:14AM +, manish.mishra wrote:
> > > > > > > MSG_PEEK reads from the peek of channel, The data is treated as
> > > > > > > unread and the next read shall still return this data. This
> > > > > > > support is currently added only for socket class. Extra parameter
> > > > > > > 'flags' is added to io_readv calls to pass extra read flags like
> > > > > > > MSG_PEEK.
> > > > > > > 
> > > > > > > Suggested-by: Daniel P. Berrangé  > > > > > > Signed-off-by: manish.mishra
> > > > > > > ---
> > > > > > >chardev/char-socket.c   |  4 +-
> > > > > > >include/io/channel.h| 83 
> > > > > > > +
> > > > > > >io/channel-buffer.c |  1 +
> > > > > > >io/channel-command.c|  1 +
> > > > > > >io/channel-file.c   |  1 +
> > > > > > >io/channel-null.c   |  1 +
> > > > > > >io/channel-socket.c | 16 +-
> > > > > > >io/channel-tls.c|  1 +
> > > > > > >io/channel-websock.c|  1 +
> > > > > > >io/channel.c| 73 
> > > > > > > +++--
> > > > > > >migration/channel-block.c   |  1 +
> > > > > > >scsi/qemu-pr-helper.c   |  2 +-
> > > > > > >tests/qtest/tpm-emu.c   |  2 +-
> > > > > > >tests/unit/test-io-channel-socket.c |  1 +
> > > > > > >util/vhost-user-server.c|  2 +-
> > > > > > >15 files changed, 179 insertions(+), 11 deletions(-)
> > > > > > 
> > > > > > > diff --git a/io/channel-socket.c b/io/channel-socket.c
> > > > > > > index b76dca9cc1..a06b24766d 100644
> > > > > > > --- a/io/channel-socket.c
> > > > > > > +++ b/io/channel-socket.c
> > > > > > > @@ -406,6 +406,8 @@ qio_channel_socket_accept(QIOChannelSocket 
> > > > > > > *ioc,
> > > > > > >}
> > > > > > >#endif /* WIN32 */
> > > > > > > +qio_channel_set_feature(QIO_CHANNEL(cioc), 
> > > > > > > QIO_CHANNEL_FEATURE_READ_MSG_PEEK);
> > > > > > > +
> > > > > > This covers the incoming server side socket.
> > > > > > 
> > > > > > This also needs to be set in outgoing client side socket in
> > > > > > qio_channel_socket_connect_async
> > > > > 
> > > > > Yes sorry, i considered only current use-case, but as it is generic 
> > > > > one both should be there. Thanks will update it.
> > > > > 
> > > > > > > @@ -705,7 +718,6 @@ static ssize_t 
> > > > > > > qio_channel_socket_writev(QIOChannel *ioc,
> > > > > > >}
> > > > > > >#endif /* WIN32 */
> > > > > > > -
> > > > > > >#ifdef QEMU_MSG_ZEROCOPY
> > > > > > >static int qio_channel_socket_flush(QIOChannel *ioc,
> > > > > > >Error **errp)
> > > > > > Please remove this unrelated whitespace change.
> > > > > > 
> > > > > > 
> > > > > > > @@ -109,6 +117,37 @@ int qio_channel_readv_all_eof(QIOChannel 
> > > > > > > *ioc,
> > > > > > >return qio_channel_readv_full_all_eof(ioc, iov, niov, 
> > > > > > > NULL, NULL, errp);
> > > > > > >}
> > > > > > > +int qio_channel_readv_peek_all_eof(QIOChannel *ioc,
> > > > > > > +   const struct iovec *iov,
> > > > > > > +   size_t niov,
> > > > > > > +   Error **errp)
> > > > > > > +{
> > > > > > > +   ssize_t len = 0;
> > > > > > > +   ssize_t total = iov_size(iov, niov);
> > > > > > > +
> > > > > > > +   while (len < total) {
> > > > > > > +   len = qio_channel_readv_full(ioc, iov, niov, NULL,
> > > > > > > +NULL, 
> > > > > > > QIO_CHANNEL_READ_FLAG_MSG_PEEK, errp);
> > > > > > > +
> > > > > > > +   if (len == QIO_CHANNEL_ERR_BLOCK) {
> > > > > > > +if (qemu_in_coroutine()) {
> > > > > > > +qio_channel_yield(ioc, G_IO_IN);
> > > > > > > +} else {
> > > > > > > +qio_channel_wait(ioc, G_IO_IN);
> > > > > > > +}
> > > > > > > +continue;
> > > > > > > +   }
> > > > > > > +   if (len == 0) {
> > > > > > > +   return 0;
> > > > > > > +   }
> > > > > > > +   if (len < 0) {
> > > > > > > +   return -1;
> > > > > > > +   }
> > > > > > > +   }
> > > > > > This will busy wait burning CPU where there is a read > 0 and < 
> > > > > > total.
> > > > > > 
> > > > > Daniel, i could use MSG_WAITALL too if that works but then we will 
> > > > > lose opportunity to yield. Or if you have some other idea.
> > > > How easy would this happen?
> > > > 
> > > >

Re: [PATCH for 7.2?] vhost: fix vq dirt bitmap syncing when vIOMMU is enabled

2022-11-22 Thread Eric Auger

Hi,

On 11/22/22 10:43, Michael S. Tsirkin wrote:
> On Tue, Nov 22, 2022 at 11:01:11AM +0800, Jason Wang wrote:
>> When vIOMMU is enabled, the vq->used_phys is actually the IOVA not
>> GPA. So we need to translate it to GPA before the syncing otherwise we
>> may hit the following crash since IOVA could be out of the scope of
>> the GPA log size. This could be noted when using virtio-IOMMU with
>> vhost using 1G memory.
> Noted how exactly? What does "using 1G memory" mean?

We hit the following assertion:

qemu-system-x86_64: ../hw/virtio/vhost.c:85: vhost_dev_sync_region: Assertion 
`end / VHOST_LOG_CHUNK < dev->log_size' failed.

On the last time vhost_get_log_size() is called it takes into account 2 regions 
when computing the log_size:
qemu-system-x86_64: vhost_get_log_size region 0 last=0x9 updated 
log_size=0x3
qemu-system-x86_64: vhost_get_log_size region 1 last=0x3fff updated 
log_size=0x1000
so in vhost_migration_log() vhost_get_log_size(dev) returns 0x1000

In the test case, memory_region_sync_dirty_bitmap() gets called for 
mem-machine_mem, vga.vram (several times) and eventually on pc.bios. This 
latter is reponsible for the assertion:

qemu-system-x86_64: vhost_log_sync calls sync_dirty_map on pc.bios for the full 
range
qemu-system-x86_64: vhost_sync_dirty_bitmap calls vhost_dev_sync_region on 
region 0
qemu-system-x86_64: vhost_dev_sync_region end=0x9 < start=0xfffc
qemu-system-x86_64: vhost_sync_dirty_bitmap calls vhost_dev_sync_region on 
region 1
qemu-system-x86_64: vhost_dev_sync_region end=0x3fff < start=0xfffc
qemu-system-x86_64: vhost_sync_dirty_bitmap calls vhost_dev_sync_region on vq 0 
<-
qemu-system-x86_64: vhost_dev_sync_region pc.bios mfirst=0xfffc 
mlast=0x rfirst=0xf240 rlast=0xfa45
qemu-system-x86_64: vhost_dev_sync_region pc.bios end=0xfa45 
VHOST_LOG_CHUNK=0x4 end/VHOST_LOG_CHUNK=0x3fff dev->log_size=0x1000
qemu-system-x86_64: ../hw/virtio/vhost.c:85: vhost_dev_sync_region: Assertion 
`end / VHOST_LOG_CHUNK < dev->log_size' failed.



"using 1G memory": We hit the issue with a guest started with 1GB initial RAM.


>
>> Fixes: c471ad0e9bd46 ("vhost_net: device IOTLB support")
>> Cc: qemu-sta...@nongnu.org
>> Reported-by: Yalan Zhang 
>> Tested-by: Eric Auger 
>> Tested-by: Lei Yang 
>> Signed-off-by: Jason Wang 
>> ---
>>  hw/virtio/vhost.c | 65 ---
>>  1 file changed, 45 insertions(+), 20 deletions(-)
>>
>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> index d1c4c20b8c..26b319f34e 100644
>> --- a/hw/virtio/vhost.c
>> +++ b/hw/virtio/vhost.c
>> @@ -106,11 +106,30 @@ static void vhost_dev_sync_region(struct vhost_dev 
>> *dev,
>>  }
>>  }
>>  
>> +static bool vhost_dev_has_iommu(struct vhost_dev *dev)
>> +{
>> +VirtIODevice *vdev = dev->vdev;
>> +
>> +/*
>> + * For vhost, VIRTIO_F_IOMMU_PLATFORM means the backend support
>> + * incremental memory mapping API via IOTLB API. For platform that
>> + * does not have IOMMU, there's no need to enable this feature
>> + * which may cause unnecessary IOTLB miss/update transactions.
>> + */
>> +if (vdev) {
>> +return virtio_bus_device_iommu_enabled(vdev) &&
>> +virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
>> +} else {
>> +return false;
>> +}
>> +}
>> +
>>  static int vhost_sync_dirty_bitmap(struct vhost_dev *dev,
>> MemoryRegionSection *section,
>> hwaddr first,
>> hwaddr last)
>>  {
>> +IOMMUTLBEntry iotlb;
> why don't we move this inside the scope where it's used?
>
>>  int i;
>>  hwaddr start_addr;
>>  hwaddr end_addr;
>> @@ -132,13 +151,37 @@ static int vhost_sync_dirty_bitmap(struct vhost_dev 
>> *dev,
>>  }
>>  for (i = 0; i < dev->nvqs; ++i) {
>>  struct vhost_virtqueue *vq = dev->vqs + i;
>> +hwaddr used_phys = vq->used_phys, used_size = vq->used_size;
>> +hwaddr phys, s;
> these two, too.
>
>>  
>>  if (!vq->used_phys && !vq->used_size) {
>>  continue;
>>  }
>>  
>> -vhost_dev_sync_region(dev, section, start_addr, end_addr, 
>> vq->used_phys,
>> -  range_get_last(vq->used_phys, vq->used_size));
>> +if (vhost_dev_has_iommu(dev)) {
>> +while (used_size) {
>> +rcu_read_lock();
>> +iotlb = address_space_get_iotlb_entry(dev->vdev->dma_as,
>> +  used_phys,
>> +  true, 
>> MEMTXATTRS_UNSPECIFIED);
>> +rcu_read_unlock();
>> +
>> +if (iotlb.target_as == NULL) {
>> +return -EINVAL;
> I am not sure how this can trigger. I don't like == NULL:
> !iotlb.target_as is more succint. But a bigger question is how to
> handle this. callers

Re: [PATCH v2] target/arm: Use signed quantity to represent VMSAv8-64 translation level

2022-11-22 Thread Peter Maydell

On Tue, 22 Nov 2022 at 15:55, Ard Biesheuvel  wrote:
>
> The LPA2 extension implements 52-bit virtual addressing for 4k and 16k
> translation granules, and for the former, this means an additional level
> of translation is needed. This means we start counting at -1 instead of
> 0 when doing a walk, and so 'level' is now a signed quantity, and should
> be typed as such. So turn it from uint32_t into int32_t.
>
> This avoids a level of -1 getting misinterpreted as being >= 3, and
> terminating a page table walk prematurely with a bogus output address.
>
> Cc: Peter Maydell 
> Cc: Philippe Mathieu-Daudé 
> Cc: Richard Henderson 
> Signed-off-by: Ard Biesheuvel 



Applied to target-arm.next for 7.2, thanks.

-- PMM

Re: [PATCH v3 1/2] io: Add support for MSG_PEEK for socket channel

2022-11-22 Thread Peter Xu

On Tue, Nov 22, 2022 at 09:01:59PM +0530, manish.mishra wrote:
> 
> On 22/11/22 8:19 pm, Daniel P. Berrangé wrote:
> > On Tue, Nov 22, 2022 at 09:41:01AM -0500, Peter Xu wrote:
> > > On Tue, Nov 22, 2022 at 02:38:53PM +0530, manish.mishra wrote:
> > > > On 22/11/22 2:30 pm, Daniel P. Berrangé wrote:
> > > > > On Sat, Nov 19, 2022 at 09:36:14AM +, manish.mishra wrote:
> > > > > > MSG_PEEK reads from the peek of channel, The data is treated as
> > > > > > unread and the next read shall still return this data. This
> > > > > > support is currently added only for socket class. Extra parameter
> > > > > > 'flags' is added to io_readv calls to pass extra read flags like
> > > > > > MSG_PEEK.
> > > > > > 
> > > > > > Suggested-by: Daniel P. Berrangé  > > > > > Signed-off-by: manish.mishra
> > > > > > ---
> > > > > >chardev/char-socket.c   |  4 +-
> > > > > >include/io/channel.h| 83 
> > > > > > +
> > > > > >io/channel-buffer.c |  1 +
> > > > > >io/channel-command.c|  1 +
> > > > > >io/channel-file.c   |  1 +
> > > > > >io/channel-null.c   |  1 +
> > > > > >io/channel-socket.c | 16 +-
> > > > > >io/channel-tls.c|  1 +
> > > > > >io/channel-websock.c|  1 +
> > > > > >io/channel.c| 73 
> > > > > > +++--
> > > > > >migration/channel-block.c   |  1 +
> > > > > >scsi/qemu-pr-helper.c   |  2 +-
> > > > > >tests/qtest/tpm-emu.c   |  2 +-
> > > > > >tests/unit/test-io-channel-socket.c |  1 +
> > > > > >util/vhost-user-server.c|  2 +-
> > > > > >15 files changed, 179 insertions(+), 11 deletions(-)
> > > > > 
> > > > > > diff --git a/io/channel-socket.c b/io/channel-socket.c
> > > > > > index b76dca9cc1..a06b24766d 100644
> > > > > > --- a/io/channel-socket.c
> > > > > > +++ b/io/channel-socket.c
> > > > > > @@ -406,6 +406,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
> > > > > >}
> > > > > >#endif /* WIN32 */
> > > > > > +qio_channel_set_feature(QIO_CHANNEL(cioc), 
> > > > > > QIO_CHANNEL_FEATURE_READ_MSG_PEEK);
> > > > > > +
> > > > > This covers the incoming server side socket.
> > > > > 
> > > > > This also needs to be set in outgoing client side socket in
> > > > > qio_channel_socket_connect_async
> > > > 
> > > > Yes sorry, i considered only current use-case, but as it is generic one 
> > > > both should be there. Thanks will update it.
> > > > 
> > > > > > @@ -705,7 +718,6 @@ static ssize_t 
> > > > > > qio_channel_socket_writev(QIOChannel *ioc,
> > > > > >}
> > > > > >#endif /* WIN32 */
> > > > > > -
> > > > > >#ifdef QEMU_MSG_ZEROCOPY
> > > > > >static int qio_channel_socket_flush(QIOChannel *ioc,
> > > > > >Error **errp)
> > > > > Please remove this unrelated whitespace change.
> > > > > 
> > > > > 
> > > > > > @@ -109,6 +117,37 @@ int qio_channel_readv_all_eof(QIOChannel *ioc,
> > > > > >return qio_channel_readv_full_all_eof(ioc, iov, niov, NULL, 
> > > > > > NULL, errp);
> > > > > >}
> > > > > > +int qio_channel_readv_peek_all_eof(QIOChannel *ioc,
> > > > > > +   const struct iovec *iov,
> > > > > > +   size_t niov,
> > > > > > +   Error **errp)
> > > > > > +{
> > > > > > +   ssize_t len = 0;
> > > > > > +   ssize_t total = iov_size(iov, niov);
> > > > > > +
> > > > > > +   while (len < total) {
> > > > > > +   len = qio_channel_readv_full(ioc, iov, niov, NULL,
> > > > > > +NULL, 
> > > > > > QIO_CHANNEL_READ_FLAG_MSG_PEEK, errp);
> > > > > > +
> > > > > > +   if (len == QIO_CHANNEL_ERR_BLOCK) {
> > > > > > +if (qemu_in_coroutine()) {
> > > > > > +qio_channel_yield(ioc, G_IO_IN);
> > > > > > +} else {
> > > > > > +qio_channel_wait(ioc, G_IO_IN);
> > > > > > +}
> > > > > > +continue;
> > > > > > +   }
> > > > > > +   if (len == 0) {
> > > > > > +   return 0;
> > > > > > +   }
> > > > > > +   if (len < 0) {
> > > > > > +   return -1;
> > > > > > +   }
> > > > > > +   }
> > > > > This will busy wait burning CPU where there is a read > 0 and < total.
> > > > > 
> > > > Daniel, i could use MSG_WAITALL too if that works but then we will lose 
> > > > opportunity to yield. Or if you have some other idea.
> > > How easy would this happen?
> > > 
> > > Another alternative is we could just return the partial len to caller then
> > > we fallback to the original channel orders if it happens.  And then if it
> > > mostly will never happen it'll behave merely the same as what we want.
> > Well we're trying to deal with a bug where the slow and/or unreliable

Re: [PATCH] vhost-vdpa: skip TPM CRB memory section

2022-11-22 Thread Eric Auger

Hi Marc-André,
On 11/22/22 15:53, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
>
> 851d6d1a0f ("vfio/common: remove spurious tpm-crb-cmd misalignment
> warning") removed the warning on vfio_listener_region_add() path.
>
> An error is reported for vhost-vdpa case:
> qemu-kvm: vhost_vdpa_listener_region_add received unaligned region
>
> Skip the CRB device.
>
> Fixes:
> https://bugzilla.redhat.com/show_bug.cgi?id=2141965
>
> Signed-off-by: Marc-André Lureau 
Reviewed-by: Eric Auger 

Eric
> ---
>  hw/virtio/vhost-vdpa.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 7468e44b87..9d7206e4b8 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -19,6 +19,7 @@
>  #include "hw/virtio/virtio-net.h"
>  #include "hw/virtio/vhost-shadow-virtqueue.h"
>  #include "hw/virtio/vhost-vdpa.h"
> +#include "sysemu/tpm.h"
>  #include "exec/address-spaces.h"
>  #include "migration/blocker.h"
>  #include "qemu/cutils.h"
> @@ -46,6 +47,11 @@ static bool 
> vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
>  {
>  Int128 llend;
>  
> +if (TPM_IS_CRB(section->mr->owner)) {
> +/* The CRB command buffer has its base address unaligned. */
> +return true;
> +}
> +
>  if ((!memory_region_is_ram(section->mr) &&
>   !memory_region_is_iommu(section->mr)) ||
>  memory_region_is_protected(section->mr) ||

[PATCH] target/riscv: Dump sstatus CSR in riscv_cpu_dump_state()

2022-11-22 Thread Bin Meng

sstatus register dump is currently missing in riscv_cpu_dump_state().

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1332
Signed-off-by: Bin Meng 

---

 target/riscv/cpu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index d14e95c9dc..80d76f0181 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -382,6 +382,7 @@ static void riscv_cpu_dump_state(CPUState *cs, FILE *f, int 
flags)
 CSR_MHARTID,
 CSR_MSTATUS,
 CSR_MSTATUSH,
+CSR_SSTATUS,
 CSR_HSTATUS,
 CSR_VSSTATUS,
 CSR_MIP,
-- 
2.34.1

Re: [PATCH] multifd: Updated QAPI format for 'migrate' qemu monitor command

2022-11-22 Thread manish.mishra




On 22/11/22 2:53 pm, Daniel P. Berrangé wrote:

On Mon, Nov 21, 2022 at 01:40:27PM +0100, Juan Quintela wrote:

Het Gala  wrote:

To prevent double data encoding of uris, instead of passing transport
mechanisms, host address and port all together in form of a single string
and writing different parsing functions, we intend the user to explicitly
mention most of the parameters through the medium of qmp command itself.

The current patch is continuation of patchset series
https://urldefense.proofpoint.com/v2/url?u=https-3A__www.mail-2Darchive.com_qemu-2Ddevel-40nongnu.org_msg901274.html=DwIBaQ=s883GpUCOChKOHiocYtGcg=c4KON2DiMd-szjwjggQcuUvTsPWblztAL0gVzaHnNmc=htOVZmDRCu27EhvYDu-28snN6GV0a6-ffX34joSwBkELLLVGHzRqD0LicZed3Xtw=xvzrWRBN4S5l3orPqu6di0MRq-gSGWZZU6-e7wpn8w4=
and reply to the ongoing discussion for better QAPI design here
https://urldefense.proofpoint.com/v2/url?u=https-3A__www.mail-2Darchive.com_qemu-2Ddevel-40nongnu.org_msg903753.html=DwIBaQ=s883GpUCOChKOHiocYtGcg=c4KON2DiMd-szjwjggQcuUvTsPWblztAL0gVzaHnNmc=htOVZmDRCu27EhvYDu-28snN6GV0a6-ffX34joSwBkELLLVGHzRqD0LicZed3Xtw=anLqZbhqa73P9SyPUaGk5q8R0SYR6IUtH_FWFML3lZY=
 .

Suggested-by: Daniel P. Berrange 
Suggested-by: Aravind Retnakaran 
Suggested-by: Manish Mishra 
Signed-off-by: Het Gala 





+{ 'struct': 'MigrateChannel',
+  'data' : {
+'channeltype' : 'MigrateChannelType',
+'*src-addr' : 'MigrateAddress',
+'dest-addr' : 'MigrateAddress',

Why do we want *both* addresses?

This is part of their goal to have source based routing, so that
traffic exits the src host via a particular NIC.

I think this patch would be better if we split it into two parts.

First introduce "MigrateChannel" struct with *only* the 'dest-addr'
field, and only allow one element in the list, making 'uri' optional.

Basically the first patch would *only* be about switching the
'migrate' command from using a plain string to a MigrateAddress
structure.

A second patch would then extend it to allow multiple MigrateChannel
elements, to support different destinations for each channel.

A third patch would then  extend it to add src-addr to attain the
source based routing.

A fourth patch can then deprecate the existing 'uri' field.


+'*multifd-count' : 'int' } }

And if we are passing a list, why do we want to pass the real number?

Yeah, technically I think this field is redundant, as you can just
add many entires to the 'channels' list, using the same addresses.
All this field does is allow a more compact JSON arg list. I'm
not sure this is neccessary, unless we're expecting huge numbers
of 'channels', and even then this isn't likely to be a performance
issue.


  # -> { "execute": "migrate", "arguments": { "uri": "tcp:0:4446" } }
  # <- { "return": {} }
  #
+# -> { "execute": "migrate",
+#  "arguments": {
+#  "channels": [ { 'channeltype': 'control',
+#  'dest-addr': {'transport': 'socket',
+#'type': 'inet',
+#'host': '10.12.34.9', 'port': 
'1050'}},
+#{ 'channeltype': 'data',
+#  'src-addr': {'transport': 'socket',
+#'type': 'inet',
+#'host': '10.12.34.9',
+#'port': '4000', 'ipv4': 'true'},
+#  'dest-addr': { 'transport': 'socket',
+#  'type': 'inet',
+#  'host': '10.12.34.92',
+#  'port': '1234', 'ipv4': 'true'},
+#  'multifd-count': 5 },
+#{ 'channeltype': 'data',
+#  'src-addr': {'transport': 'socket',
+#'type': 'inet',
+#'host': '10.2.3.4', 'port': '1000'},
+#  'dest-addr': {'transport': 'socket',
+# 'type': 'inet',
+# 'host': '0.0.0.4', 'port': '3000'},
+#  'multifd-count': 3 } ] } }
+# <- { "return": {} }
+#
  ##
  { 'command': 'migrate',
-  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool',
-   '*detach': 'bool', '*resume': 'bool' } }
+  'data': {'*uri': 'str', '*channels': ['MigrateChannel'], '*blk': 'bool',

I think that "uri" bit should be dropped, right?

No, it is required for back compatibility with existing clients.
It should be marked deprecated, and removed several releases later,
at which point 'chanels' can become mandatory instead of optional.


+   '*inc': 'bool', '*detach': 'bool', '*resume': 'bool' } }
  
  ##

  # @migrate-incoming:

I can't see how to make the old one to work on top of this one (i.e. we
would have to create strings from lists on QAPI, I think that is just
too much).

All we need is a piece of code

Re: Plugin Memory Callback Debugging

2022-11-22 Thread Aaron Lindsay via

On Nov 21 18:22, Richard Henderson wrote:
> On 11/21/22 13:51, Alex Bennée wrote:
> > 
> > Aaron Lindsay  writes:
> > 
> > > On Nov 15 22:36, Alex Bennée wrote:
> > > > Aaron Lindsay  writes:
> > > > > I believe the code *should* always reset `cpu->plugin_mem_cbs` to 
> > > > > NULL at the
> > > > > end of an instruction/TB's execution, so its not exactly clear to me 
> > > > > how this
> > > > > is occurring. However, I suspect it may be relevant that we are 
> > > > > calling
> > > > > `free_dyn_cb_arr()` because my plugin called `qemu_plugin_reset()`.
> > > > 
> > > > Hmm I'm going to have to remind myself about how this bit works.
> > > 
> > > When is it expected that cpu->plugin_mem_cbs is reset to NULL if it is
> > > set for an instruction? Is it guaranteed it is reset by the end of the
> > > tb?
> > 
> > It should be by the end of the instruction. See
> > inject_mem_disable_helper() which inserts TCG code to disable the
> > helpers. We also have plugin_gen_disable_mem_helpers() which should
> > catch every exit out of a block (exit_tb, goto_tb, goto_ptr). That is
> > why qemu_plugin_disable_mem_helpers() is only really concerned about
> > when we longjmp out of the loop.
> > 
> > > If I were to put an assertion in cpu_tb_exec() just after the call
> > > to tcg_qemu_tb_exec(), should cpu->plugin_mem_cbs always be NULL
> > > there?
> > 
> > Yes I think so.
> 
> Indeed.

Well, the good news is that if this is an assumption we're relying on, it is
now trivial to reproduce the problem!

Compile some simple program (doesn't really matter, the issue gets triggered
early):

$ echo "int main() { return 0; }" > simple.c && gcc simple.c -o simple

Make this change to cpu_tb_exec():

> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index 356fe348de..50a010327d 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -436,6 +436,9 @@ cpu_tb_exec(CPUState *cpu, TranslationBlock *itb, int 
> *tb_exit)
> 
>  qemu_thread_jit_execute();
>  ret = tcg_qemu_tb_exec(env, tb_ptr);
> +if (cpu->plugin_mem_cbs != NULL) {
> +g_assert_not_reached();
> +}
>  cpu->can_do_io = 1;
>  /*
>   * TODO: Delay swapping back to the read-write region of the TB

And run:

$ ./build/qemu-aarch64 -plugin contrib/plugins/libexeclog.so -d plugin ./simple

You should fairly quickly see something like:

> [snip]
> 0, 0x5502814d04, 0xb482, ""
> 0, 0x5502814d08, 0xf9400440, "", load, 0x5502844ed0
> 0, 0x5502814d0c, 0xf1001c1f, ""
> **
> ERROR:../accel/tcg/cpu-exec.c:440:cpu_tb_exec: code should not be reached
> Bail out! ERROR:../accel/tcg/cpu-exec.c:440:cpu_tb_exec: code should not be 
> reached

When digging through my other failure in `rr` I saw the cpu->plugin_mem_cbs
pointer changing from one non-null value to another (which also seems to
indicate it is not being cleared between instructions).

Does this hint that there are cases where reset cpu->plugin_mem_cbs to NULL is
getting optimized away, but not the code to set it in the first place?

-Aaron

[PATCH v2] target/arm: Use signed quantity to represent VMSAv8-64 translation level

2022-11-22 Thread Ard Biesheuvel

The LPA2 extension implements 52-bit virtual addressing for 4k and 16k
translation granules, and for the former, this means an additional level
of translation is needed. This means we start counting at -1 instead of
0 when doing a walk, and so 'level' is now a signed quantity, and should
be typed as such. So turn it from uint32_t into int32_t.

This avoids a level of -1 getting misinterpreted as being >= 3, and
terminating a page table walk prematurely with a bogus output address.

Cc: Peter Maydell 
Cc: Philippe Mathieu-Daudé 
Cc: Richard Henderson 
Signed-off-by: Ard Biesheuvel 
---
 target/arm/ptw.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 9a6277d862fac229..1d9bb4448761ddf4 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -1172,7 +1172,7 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
S1Translate *ptw,
 ARMCPU *cpu = env_archcpu(env);
 ARMMMUIdx mmu_idx = ptw->in_mmu_idx;
 bool is_secure = ptw->in_secure;
-uint32_t level;
+int32_t level;
 ARMVAParameters param;
 uint64_t ttbr;
 hwaddr descaddr, indexmask, indexmask_grainsize;
@@ -1302,7 +1302,7 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
S1Translate *ptw,
  */
 uint32_t sl0 = extract32(tcr, 6, 2);
 uint32_t sl2 = extract64(tcr, 33, 1);
-uint32_t startlevel;
+int32_t startlevel;
 bool ok;
 
 /* SL2 is RES0 unless DS=1 & 4kb granule. */
-- 
2.35.1

Re: [PATCH] target/arm: Use signed quantity to represent VMSAv8-64 translation level

2022-11-22 Thread Ard Biesheuvel

On Tue, 22 Nov 2022 at 14:21, Peter Maydell  wrote:
>
> On Mon, 21 Nov 2022 at 19:02, Ard Biesheuvel  wrote:
> >
> > On Mon, 21 Nov 2022 at 19:51, Peter Maydell  
> > wrote:
> > >
> > > On Mon, 21 Nov 2022 at 17:43, Ard Biesheuvel  wrote:
> > > >
> > > > The LPA2 extension implements 52-bit virtual addressing for 4k and 16k
> > > > translation granules, and for the former, this means an additional level
> > > > of translation is needed. This means we start counting at -1 instead of
> > > > 0 when doing a walk, and so 'level' is now a signed quantity, and should
> > > > be typed as such. So turn it from uint32_t into int32_t.
> > > >
> > >
> > > Does this cause any visible wrong behaviour, or is it just
> > > a cleanup thing ?
> > >
> >
> > No, 5 level paging is completely broken because of this, given that
> > the 'level < 3' tests give the wrong result for (uint32_t)-1
>
> Right, thanks. This seems like a bug worth fixing for 7.2.
>

Indeed. And the other patch I sent is needed too if you want to run with LPA2

'target/arm: Limit LPA2 effective output address when TCR.DS == 0'

In case it is useful, I have a WIP kernel branch here which can be
built with 52-bit virtual addressing for 4k or 16k pages.

https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=arm64-4k-lpa2


> We should make 'uint32_t startlevel' also an int32_t
> for consistency, I think, given that it is also sometimes
> negative, though in that case it doesn't get used in any
> comparisons so it's not going to cause wrong behaviour.
>

Indeed. I'll send a v2 and fold that in.

Re: [PATCH v2 1/2] cleanup: Tweak and re-run return_directly.cocci

2022-11-22 Thread Markus Armbruster

BALATON Zoltan  writes:

> On Tue, 22 Nov 2022, Markus Armbruster wrote:
>> Tweak the semantic patch to drop redundant parenthesis around the
>> return expression.
>>
>> Coccinelle drops a comment in hw/rdma/vmw/pvrdma_cmd.c; restored
>> manually.
>>
>> Coccinelle messes up vmdk_co_create(), not sure why.  Change dropped,
>> will be done manually in the next commit.
>>
>> Line breaks in target/avr/cpu.h and hw/rdma/vmw/pvrdma_cmd.c tidied up
>> manually.
>>
>> Whitespace in tools/virtiofsd/fuse_lowlevel.c tidied up manually.
>>
>> checkpatch.pl complains "return of an errno should typically be -ve"
>> two times for hw/9pfs/9p-synth.c.  Preexisting, the patch merely makes
>> it visible to checkpatch.pl.
>>
>> Signed-off-by: Markus Armbruster 

[...]

>> diff --git a/hw/ppc/ppc4xx_sdram.c b/hw/ppc/ppc4xx_sdram.c
>> index 8d7137faf3..54bf9a2b44 100644
>> --- a/hw/ppc/ppc4xx_sdram.c
>> +++ b/hw/ppc/ppc4xx_sdram.c
>> @@ -520,13 +520,10 @@ static inline hwaddr sdram_ddr2_base(uint32_t bcr)
>>
>> static hwaddr sdram_ddr2_size(uint32_t bcr)
>> {
>> -hwaddr size;
>>  int sh;
>>
>>  sh = 1024 - ((bcr >> 6) & 0x3ff);
>> -size = 8 * MiB * sh;
>> -
>> -return size;
>> +return 8 * MiB * sh;
>> }
>>
>> static uint32_t sdram_ddr2_dcr_read(void *opaque, int dcrn)
>
> There's also an sdram_ddr_size() that's similar and could be changed to
>
> return sh == 7 ? -1 : (4 * MiB) << sh;
>
> just to keep these two functions simliar but Coccinelle probably does not 
> catch that. Also while you're at it the assigmment of sh could be moved to 
> the declaration to save even more lines. As this then becomes more of a 
> handwritten patch, maybe it should be a separate patch cleaning these two 
> functions before the rest.

I think it needs to be separate to keep me off Peter's naughty list ;)

> Otherwise for this part (or separate patch as above):
>
> Reviewed-by: BALATON Zoltan 

Thanks!

Re: [PATCH v3 1/2] io: Add support for MSG_PEEK for socket channel

2022-11-22 Thread manish.mishra



On 22/11/22 8:19 pm, Daniel P. Berrangé wrote:

On Tue, Nov 22, 2022 at 09:41:01AM -0500, Peter Xu wrote:

On Tue, Nov 22, 2022 at 02:38:53PM +0530, manish.mishra wrote:

On 22/11/22 2:30 pm, Daniel P. Berrangé wrote:

On Sat, Nov 19, 2022 at 09:36:14AM +, manish.mishra wrote:

MSG_PEEK reads from the peek of channel, The data is treated as
unread and the next read shall still return this data. This
support is currently added only for socket class. Extra parameter
'flags' is added to io_readv calls to pass extra read flags like
MSG_PEEK.

Suggested-by: Daniel P. Berrangé 
---
   chardev/char-socket.c   |  4 +-
   include/io/channel.h| 83 +
   io/channel-buffer.c |  1 +
   io/channel-command.c|  1 +
   io/channel-file.c   |  1 +
   io/channel-null.c   |  1 +
   io/channel-socket.c | 16 +-
   io/channel-tls.c|  1 +
   io/channel-websock.c|  1 +
   io/channel.c| 73 +++--
   migration/channel-block.c   |  1 +
   scsi/qemu-pr-helper.c   |  2 +-
   tests/qtest/tpm-emu.c   |  2 +-
   tests/unit/test-io-channel-socket.c |  1 +
   util/vhost-user-server.c|  2 +-
   15 files changed, 179 insertions(+), 11 deletions(-)



diff --git a/io/channel-socket.c b/io/channel-socket.c
index b76dca9cc1..a06b24766d 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -406,6 +406,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
   }
   #endif /* WIN32 */
+qio_channel_set_feature(QIO_CHANNEL(cioc), 
QIO_CHANNEL_FEATURE_READ_MSG_PEEK);
+

This covers the incoming server side socket.

This also needs to be set in outgoing client side socket in
qio_channel_socket_connect_async


Yes sorry, i considered only current use-case, but as it is generic one both 
should be there. Thanks will update it.


@@ -705,7 +718,6 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
   }
   #endif /* WIN32 */
-
   #ifdef QEMU_MSG_ZEROCOPY
   static int qio_channel_socket_flush(QIOChannel *ioc,
   Error **errp)

Please remove this unrelated whitespace change.



@@ -109,6 +117,37 @@ int qio_channel_readv_all_eof(QIOChannel *ioc,
   return qio_channel_readv_full_all_eof(ioc, iov, niov, NULL, NULL, errp);
   }
+int qio_channel_readv_peek_all_eof(QIOChannel *ioc,
+   const struct iovec *iov,
+   size_t niov,
+   Error **errp)
+{
+   ssize_t len = 0;
+   ssize_t total = iov_size(iov, niov);
+
+   while (len < total) {
+   len = qio_channel_readv_full(ioc, iov, niov, NULL,
+NULL, QIO_CHANNEL_READ_FLAG_MSG_PEEK, 
errp);
+
+   if (len == QIO_CHANNEL_ERR_BLOCK) {
+if (qemu_in_coroutine()) {
+qio_channel_yield(ioc, G_IO_IN);
+} else {
+qio_channel_wait(ioc, G_IO_IN);
+}
+continue;
+   }
+   if (len == 0) {
+   return 0;
+   }
+   if (len < 0) {
+   return -1;
+   }
+   }

This will busy wait burning CPU where there is a read > 0 and < total.


Daniel, i could use MSG_WAITALL too if that works but then we will lose 
opportunity to yield. Or if you have some other idea.

How easy would this happen?

Another alternative is we could just return the partial len to caller then
we fallback to the original channel orders if it happens.  And then if it
mostly will never happen it'll behave merely the same as what we want.

Well we're trying to deal with a bug where the slow and/or unreliable
network causes channels to arrive in unexpected order. Given we know
we're having network trouble, I wouldn't want to make more assumptions
about things happening correctly.


With regards,
Daniel



Peter, I have seen MSG_PEEK used in combination with MSG_WAITALL, but looks 
like even though chances are less it can still return partial data even with 
multiple retries for signal case, so is not full proof.

*MSG_WAITALL *(since Linux 2.2)
  This flag requests that the operation block until the full
  request is satisfied.  However, the call may still return
  less data than requested if a signal is caught, an error
  or disconnect occurs, or the next data to be received is
  of a different type than that returned.  This flag has no
  effect for datagram sockets.

Actual read ahead will be little hackish, so just confirming we all are in 
agreement to do actual read ahead and i can send patch? :)

Thanks
Manish Mishra

Re: [PATCH for-8.0 v3 24/45] tcg: Use output_pref wrapper function

2022-11-22 Thread Philippe Mathieu-Daudé


On 11/11/22 08:40, Richard Henderson wrote:

We will shortly have the possibility of more that two outputs,
though only for calls (for which preferences are moot).  Avoid
direct references to op->output_pref[] when possible.

Signed-off-by: Richard Henderson 
---
  include/tcg/tcg.h |  5 +
  tcg/tcg.c | 34 ++
  2 files changed, 23 insertions(+), 16 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH for-8.0 v3 22/45] tci: MAX_OPC_PARAM_IARGS is no longer used

2022-11-22 Thread Philippe Mathieu-Daudé


On 11/11/22 08:40, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---
  tcg/tci.c| 1 -
  tcg/tci/tcg-target.c.inc | 4 
  2 files changed, 5 deletions(-)


Since commit 7b7d8b2d9a ("tcg/tci: Use ffi for calls")?

Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH for-8.0 v3 21/45] accel/tcg/plugin: Use copy_op in append_{udata, mem}_cb

2022-11-22 Thread Philippe Mathieu-Daudé


On 11/11/22 08:40, Richard Henderson wrote:

Better to re-use the existing function for copying ops.

Signed-off-by: Richard Henderson 
---
  accel/tcg/plugin-gen.c | 16 
  1 file changed, 8 insertions(+), 8 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH for-8.0 v3 20/45] accel/tcg/plugin: Avoid duplicate copy in copy_call

2022-11-22 Thread Philippe Mathieu-Daudé


On 11/11/22 08:40, Richard Henderson wrote:

We copied all of the arguments in copy_op_nocheck.
We only need to replace the one argument that we change.

Signed-off-by: Richard Henderson 
---
  accel/tcg/plugin-gen.c | 2 --
  1 file changed, 2 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH-for-7.2] vhost-vdpa: skip TPM CRB memory section

2022-11-22 Thread Philippe Mathieu-Daudé


On 22/11/22 15:53, marcandre.lur...@redhat.com wrote:

From: Marc-André Lureau 

851d6d1a0f ("vfio/common: remove spurious tpm-crb-cmd misalignment
warning") removed the warning on vfio_listener_region_add() path.

An error is reported for vhost-vdpa case:
qemu-kvm: vhost_vdpa_listener_region_add received unaligned region

Skip the CRB device.

Fixes:
https://bugzilla.redhat.com/show_bug.cgi?id=2141965

Signed-off-by: Marc-André Lureau 
---
  hw/virtio/vhost-vdpa.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 7468e44b87..9d7206e4b8 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -19,6 +19,7 @@
  #include "hw/virtio/virtio-net.h"
  #include "hw/virtio/vhost-shadow-virtqueue.h"
  #include "hw/virtio/vhost-vdpa.h"
+#include "sysemu/tpm.h"
  #include "exec/address-spaces.h"
  #include "migration/blocker.h"
  #include "qemu/cutils.h"
@@ -46,6 +47,11 @@ static bool 
vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
  {
  Int128 llend;
  
+if (TPM_IS_CRB(section->mr->owner)) {

+/* The CRB command buffer has its base address unaligned. */
+return true;
+}


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH] cleanup: Tweak and re-run return_directly.cocci

2022-11-22 Thread Peter Maydell

On Tue, 22 Nov 2022 at 15:04, Philippe Mathieu-Daudé  wrote:
>
> On 21/11/22 17:42, Max Filippov wrote:
> > On Mon, Nov 21, 2022 at 6:01 AM Markus Armbruster  wrote:
> >>   .../xtensa/core-dsp3400/xtensa-modules.c.inc  | 136 +-
> >>   target/xtensa/core-lx106/xtensa-modules.c.inc |  16 +--
> >
> > These files are generated and were imported from xtensa configuration
> > overlays, they're not supposed to be changed.
>
> Tools can get the repository file list using 'git ls-files', which
> itself support file pattern exclusion [*].
>
> We can create i.e. 'scripts/imported-files.txt' with:
>
>linux-headers/
>target/hexagon/imported/
>target/xtensa/core*
>tests/tcg/mips/user/

Good idea. scripts/clean-header-guards.pl also has these to add
to the exclude list:

include/standard-headers/
pc-bios/
tests/tcg/
tests/multiboot/

thanks
-- PMM

Re: [PATCH v2 1/2] cleanup: Tweak and re-run return_directly.cocci

2022-11-22 Thread Philippe Mathieu-Daudé


On 22/11/22 14:49, Markus Armbruster wrote:

Tweak the semantic patch to drop redundant parenthesis around the
return expression.

Coccinelle drops a comment in hw/rdma/vmw/pvrdma_cmd.c; restored
manually.

Coccinelle messes up vmdk_co_create(), not sure why.  Change dropped,
will be done manually in the next commit.

Line breaks in target/avr/cpu.h and hw/rdma/vmw/pvrdma_cmd.c tidied up
manually.

Whitespace in tools/virtiofsd/fuse_lowlevel.c tidied up manually.

checkpatch.pl complains "return of an errno should typically be -ve"
two times for hw/9pfs/9p-synth.c.  Preexisting, the patch merely makes
it visible to checkpatch.pl.

Signed-off-by: Markus Armbruster 
---
  scripts/coccinelle/return_directly.cocci |  5 +--
  include/hw/pci/pci.h |  7 +--
  target/avr/cpu.h |  4 +-
  hw/9pfs/9p-synth.c   | 14 ++
  hw/char/sifive_uart.c|  4 +-
  hw/ppc/ppc4xx_sdram.c|  5 +--
  hw/rdma/vmw/pvrdma_cmd.c | 57 +---
  hw/virtio/vhost-user.c   |  6 +--
  migration/dirtyrate.c| 10 +
  migration/tls.c  |  6 +--
  replay/replay-time.c |  5 +--
  semihosting/console.c|  4 +-
  softmmu/memory.c | 11 +
  softmmu/physmem.c|  9 +---
  target/loongarch/cpu.c   |  4 +-
  target/mips/tcg/dsp_helper.c | 15 ++-
  target/riscv/debug.c |  6 +--
  target/riscv/vector_helper.c | 28 +++-
  tests/bench/benchmark-crypto-akcipher.c  |  6 +--
  tests/qtest/erst-test.c  |  5 +--
  tests/qtest/hexloader-test.c |  6 +--
  tests/qtest/pvpanic-pci-test.c   |  6 +--
  tests/qtest/pvpanic-test.c   |  6 +--
  tests/qtest/test-filter-mirror.c |  6 +--
  tests/qtest/virtio-ccw-test.c|  6 +--
  tests/tcg/multiarch/sha512.c |  9 +---
  tools/virtiofsd/fuse_lowlevel.c  | 24 +++---
  27 files changed, 70 insertions(+), 204 deletions(-)

diff --git a/scripts/coccinelle/return_directly.cocci 
b/scripts/coccinelle/return_directly.cocci
index 4cf50e75ea..6cb1b3c99a 100644
--- a/scripts/coccinelle/return_directly.cocci
+++ b/scripts/coccinelle/return_directly.cocci
@@ -11,9 +11,8 @@ identifier F;
  -T VAR;
   ... when != VAR
  
--VAR =

-+return
- E;
+-VAR = (E);
  -return VAR;
++return E;
   ... when != VAR
   }


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v2 2/2] block/vmdk: Simplify vmdk_co_create() to return directly

2022-11-22 Thread Philippe Mathieu-Daudé


On 22/11/22 14:49, Markus Armbruster wrote:

Cc: Fam Zheng 
Cc: Kevin Wolf 
Cc: Hanna Reitz 
Cc: qemu-bl...@nongnu.org
Signed-off-by: Markus Armbruster 
---
  block/vmdk.c | 28 +++-
  1 file changed, 11 insertions(+), 17 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH] cleanup: Tweak and re-run return_directly.cocci

2022-11-22 Thread Philippe Mathieu-Daudé


On 21/11/22 17:42, Max Filippov wrote:

On Mon, Nov 21, 2022 at 6:01 AM Markus Armbruster  wrote:

  .../xtensa/core-dsp3400/xtensa-modules.c.inc  | 136 +-
  target/xtensa/core-lx106/xtensa-modules.c.inc |  16 +--


These files are generated and were imported from xtensa configuration
overlays, they're not supposed to be changed.


Tools can get the repository file list using 'git ls-files', which
itself support file pattern exclusion [*].

We can create i.e. 'scripts/imported-files.txt' with:

  linux-headers/
  target/hexagon/imported/
  target/xtensa/core*
  tests/tcg/mips/user/

Then use 'git ls-files --exclude-from=scripts/imported-files.txt' ...

[*] 
https://git-scm.com/docs/git-ls-files#Documentation/git-ls-files.txt---exclude-fromltfilegt

[PATCH] vhost-vdpa: skip TPM CRB memory section

2022-11-22 Thread marcandre . lureau

From: Marc-André Lureau 

851d6d1a0f ("vfio/common: remove spurious tpm-crb-cmd misalignment
warning") removed the warning on vfio_listener_region_add() path.

An error is reported for vhost-vdpa case:
qemu-kvm: vhost_vdpa_listener_region_add received unaligned region

Skip the CRB device.

Fixes:
https://bugzilla.redhat.com/show_bug.cgi?id=2141965

Signed-off-by: Marc-André Lureau 
---
 hw/virtio/vhost-vdpa.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 7468e44b87..9d7206e4b8 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -19,6 +19,7 @@
 #include "hw/virtio/virtio-net.h"
 #include "hw/virtio/vhost-shadow-virtqueue.h"
 #include "hw/virtio/vhost-vdpa.h"
+#include "sysemu/tpm.h"
 #include "exec/address-spaces.h"
 #include "migration/blocker.h"
 #include "qemu/cutils.h"
@@ -46,6 +47,11 @@ static bool 
vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
 {
 Int128 llend;
 
+if (TPM_IS_CRB(section->mr->owner)) {
+/* The CRB command buffer has its base address unaligned. */
+return true;
+}
+
 if ((!memory_region_is_ram(section->mr) &&
  !memory_region_is_iommu(section->mr)) ||
 memory_region_is_protected(section->mr) ||
-- 
2.38.1

Re: [PATCH] cleanup: Tweak and re-run return_directly.cocci

2022-11-22 Thread Philippe Mathieu-Daudé


On 22/11/22 09:58, Markus Armbruster wrote:

Thomas Huth  writes:


On 21/11/2022 17.32, Markus Armbruster wrote:

Philippe Mathieu-Daudé  writes:


On 21/11/22 15:36, Peter Maydell wrote:

On Mon, 21 Nov 2022 at 14:03, Markus Armbruster  wrote:


Tweak the semantic patch to drop redundant parenthesis around the
return expression.

Coccinelle drops comments in hw/rdma/vmw/pvrdma_cmd.c; restored
manually.

Coccinelle messes up vmdk_co_create(), not sure why.  Transformed
manually.

Line breaks in target/avr/cpu.h and hw/rdma/vmw/pvrdma_cmd.c tidied up
manually.

Whitespace in fuse_reply_iov() tidied up manually.

checkpatch.pl complains "return of an errno should typically be -ve"
two times for hw/9pfs/9p-synth.c.  Preexisting, the patch merely makes
it visible to checkpatch.pl.

checkpatch.pl complains "return is not a function, parentheses are not
required" three times for target/mips/tcg/dsp_helper.c.  False
positives.

Signed-off-by: Markus Armbruster 



.../user/ase/msa/bit-count/test_msa_nloc_b.c  |   9 +-
.../user/ase/msa/bit-count/test_msa_nloc_d.c  |   9 +-

[snip long list of other mips test files]


328 files changed, 989 insertions(+), 2099 deletions(-)

This patch seems to almost entirely be huge because of these
mips test case files. Are they specific to QEMU or are they
effectively a 3rd-party import that it doesn't make sense
to make local changes to?


They are imported and will unlikely be modified.


Not obvious to me from git-log.

Should I drop the changes to tests/tcg/mips/?


I'd say yes. At least move them to a separate patch.


Possible status of tests/tcg/mips/:

1. Imported, should not be modified

Drop from the patch.

2. Not imported, should be modified

2a. To be reviewed separately from the remainder of the patch

 Split off.

2b. Likewise, but nobody will care to review, realistically

 Split off and merge anyway, or drop.  I'd go for the latter.

2c. To be reviewed together with the remainder of the patch

 Keep as is.

Which one is it?


"1. Imported, should not be modified" please :)

Re: [PATCH v3 1/2] io: Add support for MSG_PEEK for socket channel

2022-11-22 Thread Daniel P . Berrangé

On Tue, Nov 22, 2022 at 09:41:01AM -0500, Peter Xu wrote:
> On Tue, Nov 22, 2022 at 02:38:53PM +0530, manish.mishra wrote:
> > 
> > On 22/11/22 2:30 pm, Daniel P. Berrangé wrote:
> > > On Sat, Nov 19, 2022 at 09:36:14AM +, manish.mishra wrote:
> > > > MSG_PEEK reads from the peek of channel, The data is treated as
> > > > unread and the next read shall still return this data. This
> > > > support is currently added only for socket class. Extra parameter
> > > > 'flags' is added to io_readv calls to pass extra read flags like
> > > > MSG_PEEK.
> > > > 
> > > > Suggested-by: Daniel P. Berrangé  > > > Signed-off-by: manish.mishra 
> > > > ---
> > > >   chardev/char-socket.c   |  4 +-
> > > >   include/io/channel.h| 83 +
> > > >   io/channel-buffer.c |  1 +
> > > >   io/channel-command.c|  1 +
> > > >   io/channel-file.c   |  1 +
> > > >   io/channel-null.c   |  1 +
> > > >   io/channel-socket.c | 16 +-
> > > >   io/channel-tls.c|  1 +
> > > >   io/channel-websock.c|  1 +
> > > >   io/channel.c| 73 +++--
> > > >   migration/channel-block.c   |  1 +
> > > >   scsi/qemu-pr-helper.c   |  2 +-
> > > >   tests/qtest/tpm-emu.c   |  2 +-
> > > >   tests/unit/test-io-channel-socket.c |  1 +
> > > >   util/vhost-user-server.c|  2 +-
> > > >   15 files changed, 179 insertions(+), 11 deletions(-)
> > > 
> > > 
> > > > diff --git a/io/channel-socket.c b/io/channel-socket.c
> > > > index b76dca9cc1..a06b24766d 100644
> > > > --- a/io/channel-socket.c
> > > > +++ b/io/channel-socket.c
> > > > @@ -406,6 +406,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
> > > >   }
> > > >   #endif /* WIN32 */
> > > > +qio_channel_set_feature(QIO_CHANNEL(cioc), 
> > > > QIO_CHANNEL_FEATURE_READ_MSG_PEEK);
> > > > +
> > > This covers the incoming server side socket.
> > > 
> > > This also needs to be set in outgoing client side socket in
> > > qio_channel_socket_connect_async
> > 
> > 
> > Yes sorry, i considered only current use-case, but as it is generic one 
> > both should be there. Thanks will update it.
> > 
> > > 
> > > > @@ -705,7 +718,6 @@ static ssize_t qio_channel_socket_writev(QIOChannel 
> > > > *ioc,
> > > >   }
> > > >   #endif /* WIN32 */
> > > > -
> > > >   #ifdef QEMU_MSG_ZEROCOPY
> > > >   static int qio_channel_socket_flush(QIOChannel *ioc,
> > > >   Error **errp)
> > > Please remove this unrelated whitespace change.
> > > 
> > > 
> > > > @@ -109,6 +117,37 @@ int qio_channel_readv_all_eof(QIOChannel *ioc,
> > > >   return qio_channel_readv_full_all_eof(ioc, iov, niov, NULL, NULL, 
> > > > errp);
> > > >   }
> > > > +int qio_channel_readv_peek_all_eof(QIOChannel *ioc,
> > > > +   const struct iovec *iov,
> > > > +   size_t niov,
> > > > +   Error **errp)
> > > > +{
> > > > +   ssize_t len = 0;
> > > > +   ssize_t total = iov_size(iov, niov);
> > > > +
> > > > +   while (len < total) {
> > > > +   len = qio_channel_readv_full(ioc, iov, niov, NULL,
> > > > +NULL, 
> > > > QIO_CHANNEL_READ_FLAG_MSG_PEEK, errp);
> > > > +
> > > > +   if (len == QIO_CHANNEL_ERR_BLOCK) {
> > > > +if (qemu_in_coroutine()) {
> > > > +qio_channel_yield(ioc, G_IO_IN);
> > > > +} else {
> > > > +qio_channel_wait(ioc, G_IO_IN);
> > > > +}
> > > > +continue;
> > > > +   }
> > > > +   if (len == 0) {
> > > > +   return 0;
> > > > +   }
> > > > +   if (len < 0) {
> > > > +   return -1;
> > > > +   }
> > > > +   }
> > > This will busy wait burning CPU where there is a read > 0 and < total.
> > > 
> > 
> > Daniel, i could use MSG_WAITALL too if that works but then we will lose 
> > opportunity to yield. Or if you have some other idea.
> 
> How easy would this happen?
> 
> Another alternative is we could just return the partial len to caller then
> we fallback to the original channel orders if it happens.  And then if it
> mostly will never happen it'll behave merely the same as what we want.

Well we're trying to deal with a bug where the slow and/or unreliable
network causes channels to arrive in unexpected order. Given we know
we're having network trouble, I wouldn't want to make more assumptions
about things happening correctly.


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH] target/arm: added cortex-a55 CPU support for qemu-virt

2022-11-22 Thread Peter Maydell

On Mon, 21 Nov 2022 at 15:08, Timofey Kutergin  wrote:
>
>   cortex-a55 is one of newer armv8.2+ CPUs supporting native
>   Privileged Access Never (PAN) feature. Using this CPU
>   provides access to this feature without using fictitious "max"
>   CPU.
>
> Signed-off-by: Timofey Kutergin 

Thanks; I've applied this to my target-arm-for-8.0 branch,
which will eventually become target-arm when 7.2 is
released in a couple of weeks.

-- PMM

Re: [PATCH v3 1/2] io: Add support for MSG_PEEK for socket channel

2022-11-22 Thread Peter Xu

On Tue, Nov 22, 2022 at 02:38:53PM +0530, manish.mishra wrote:
> 
> On 22/11/22 2:30 pm, Daniel P. Berrangé wrote:
> > On Sat, Nov 19, 2022 at 09:36:14AM +, manish.mishra wrote:
> > > MSG_PEEK reads from the peek of channel, The data is treated as
> > > unread and the next read shall still return this data. This
> > > support is currently added only for socket class. Extra parameter
> > > 'flags' is added to io_readv calls to pass extra read flags like
> > > MSG_PEEK.
> > > 
> > > Suggested-by: Daniel P. Berrangé  > > Signed-off-by: manish.mishra 
> > > ---
> > >   chardev/char-socket.c   |  4 +-
> > >   include/io/channel.h| 83 +
> > >   io/channel-buffer.c |  1 +
> > >   io/channel-command.c|  1 +
> > >   io/channel-file.c   |  1 +
> > >   io/channel-null.c   |  1 +
> > >   io/channel-socket.c | 16 +-
> > >   io/channel-tls.c|  1 +
> > >   io/channel-websock.c|  1 +
> > >   io/channel.c| 73 +++--
> > >   migration/channel-block.c   |  1 +
> > >   scsi/qemu-pr-helper.c   |  2 +-
> > >   tests/qtest/tpm-emu.c   |  2 +-
> > >   tests/unit/test-io-channel-socket.c |  1 +
> > >   util/vhost-user-server.c|  2 +-
> > >   15 files changed, 179 insertions(+), 11 deletions(-)
> > 
> > 
> > > diff --git a/io/channel-socket.c b/io/channel-socket.c
> > > index b76dca9cc1..a06b24766d 100644
> > > --- a/io/channel-socket.c
> > > +++ b/io/channel-socket.c
> > > @@ -406,6 +406,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
> > >   }
> > >   #endif /* WIN32 */
> > > +qio_channel_set_feature(QIO_CHANNEL(cioc), 
> > > QIO_CHANNEL_FEATURE_READ_MSG_PEEK);
> > > +
> > This covers the incoming server side socket.
> > 
> > This also needs to be set in outgoing client side socket in
> > qio_channel_socket_connect_async
> 
> 
> Yes sorry, i considered only current use-case, but as it is generic one both 
> should be there. Thanks will update it.
> 
> > 
> > > @@ -705,7 +718,6 @@ static ssize_t qio_channel_socket_writev(QIOChannel 
> > > *ioc,
> > >   }
> > >   #endif /* WIN32 */
> > > -
> > >   #ifdef QEMU_MSG_ZEROCOPY
> > >   static int qio_channel_socket_flush(QIOChannel *ioc,
> > >   Error **errp)
> > Please remove this unrelated whitespace change.
> > 
> > 
> > > @@ -109,6 +117,37 @@ int qio_channel_readv_all_eof(QIOChannel *ioc,
> > >   return qio_channel_readv_full_all_eof(ioc, iov, niov, NULL, NULL, 
> > > errp);
> > >   }
> > > +int qio_channel_readv_peek_all_eof(QIOChannel *ioc,
> > > +   const struct iovec *iov,
> > > +   size_t niov,
> > > +   Error **errp)
> > > +{
> > > +   ssize_t len = 0;
> > > +   ssize_t total = iov_size(iov, niov);
> > > +
> > > +   while (len < total) {
> > > +   len = qio_channel_readv_full(ioc, iov, niov, NULL,
> > > +NULL, 
> > > QIO_CHANNEL_READ_FLAG_MSG_PEEK, errp);
> > > +
> > > +   if (len == QIO_CHANNEL_ERR_BLOCK) {
> > > +if (qemu_in_coroutine()) {
> > > +qio_channel_yield(ioc, G_IO_IN);
> > > +} else {
> > > +qio_channel_wait(ioc, G_IO_IN);
> > > +}
> > > +continue;
> > > +   }
> > > +   if (len == 0) {
> > > +   return 0;
> > > +   }
> > > +   if (len < 0) {
> > > +   return -1;
> > > +   }
> > > +   }
> > This will busy wait burning CPU where there is a read > 0 and < total.
> > 
> 
> Daniel, i could use MSG_WAITALL too if that works but then we will lose 
> opportunity to yield. Or if you have some other idea.

How easy would this happen?

Another alternative is we could just return the partial len to caller then
we fallback to the original channel orders if it happens.  And then if it
mostly will never happen it'll behave merely the same as what we want.

The thing is the other approach will be hacky in another way (have a flag
migration_consumed_4_bytes_header to either main and multifd channels),
then if it'll solve 99.99% cases I'd think it's good enough.  Anyway we're
working on a corner case already on unreliable network, and even if failure
triggered it's not so bad - we just redo the migration.

-- 
Peter Xu

Re: [PATCH for-8.0 07/29] accel/tcg: Honor atomicity of loads

2022-11-22 Thread Peter Maydell

On Fri, 18 Nov 2022 at 09:50, Richard Henderson
 wrote:
>
> Create ldst_atomicity.c.inc.
>
> Not required for user-only code loads, because we've ensured that
> the page is read-only before beginning to translate code.
>
> Signed-off-by: Richard Henderson 

> +/**
> + * required_atomicity:
> + *
> + * Return the lg2 bytes of atomicity required by @memop for @p.
> + * If the operation must be split into two operations to be
> + * examined separately for atomicity, return -lg2.
> + */
> +static int required_atomicity(CPUArchState *env, uintptr_t p, MemOp memop)
> +{
> +int atmax = memop & MO_ATMAX_MASK;
> +int size = memop & MO_SIZE;
> +unsigned tmp;
> +
> +if (atmax == MO_ATMAX_SIZE) {
> +atmax = size;
> +} else {
> +atmax >>= MO_ATMAX_SHIFT;
> +}
> +
> +switch (memop & MO_ATOM_MASK) {
> +case MO_ATOM_IFALIGN:
> +tmp = (1 << atmax) - 1;
> +if (p & tmp) {
> +return MO_8;
> +}
> +break;
> +case MO_ATOM_NONE:
> +return MO_8;
> +case MO_ATOM_SUBALIGN:
> +tmp = p & -p;
> +if (tmp != 0 && tmp < atmax) {
> +atmax = tmp;
> +}
> +break;
> +case MO_ATOM_WITHIN16:
> +tmp = p & 15;
> +if (tmp + (1 << size) <= 16) {
> +atmax = size;
> +} else if (atmax < size && tmp + (1 << atmax) != 16) {
> +/*
> + * Paired load/store, where the pairs aren't aligned.
> + * One of the two must still be handled atomically.
> + */
> +atmax = -atmax;
> +}
> +break;
> +default:
> +g_assert_not_reached();
> +}
> +
> +/*
> + * Here we have the architectural atomicity of the operation.
> + * However, when executing in a serial context, we need no extra
> + * host atomicity in order to avoid racing.  This reduction
> + * avoids looping with cpu_loop_exit_atomic.
> + */
> +if (cpu_in_serial_context(env_cpu(env))) {

Is it OK to use cpu_in_serial_context() here ? Even if
there's no other vCPU executing in parallel, there might
be device model code doing a memory write in the iothread,
I think.

> +return MO_8;
> +}
> +return atmax;
> +}

thanks
-- PMM

1 2 3 >

1 - 100 of 202 matches

Mail list logo