date:20231005

Re: [PATCH v2 2/3] target/hexagon: fix some occurrences of -Wshadow=local

2023-10-05 Thread Philippe Mathieu-Daudé


On 6/10/23 00:22, Brian Cain wrote:

Of the changes in this commit, the changes in `HELPER(commit_hvx_stores)()`
are less obvious.  They are required because of some macro invocations like
SCATTER_OP_WRITE_TO_MEM().

e.g.:

 In file included from ../target/hexagon/op_helper.c:31:
 ../target/hexagon/mmvec/macros.h:205:18: error: declaration of ‘i’ shadows 
a previous local [-Werror=shadow=compatible-local]
   205 | for (int i = 0; i < sizeof(MMVector); i += sizeof(TYPE)) { 
\
   |  ^
 ../target/hexagon/op_helper.c:157:17: note: in expansion of macro 
‘SCATTER_OP_WRITE_TO_MEM’
   157 | SCATTER_OP_WRITE_TO_MEM(uint16_t);
   | ^~~
 ../target/hexagon/op_helper.c:135:9: note: shadowed declaration is here
   135 | int i;
   | ^
 In file included from ../target/hexagon/op_helper.c:31:
 ../target/hexagon/mmvec/macros.h:204:19: error: declaration of ‘ra’ 
shadows a previous local [-Werror=shadow=compatible-local]
   204 | uintptr_t ra = GETPC(); \
   |   ^~
 ../target/hexagon/op_helper.c:160:17: note: in expansion of macro 
‘SCATTER_OP_WRITE_TO_MEM’
   160 | SCATTER_OP_WRITE_TO_MEM(uint32_t);
   | ^~~
 ../target/hexagon/op_helper.c:134:15: note: shadowed declaration is here
   134 | uintptr_t ra = GETPC();
   |   ^~

Reviewed-by: Matheus Tavares Bernardino 
Signed-off-by: Brian Cain 
---
  target/hexagon/imported/alu.idef | 6 +++---
  target/hexagon/mmvec/macros.h| 6 +++---
  target/hexagon/op_helper.c   | 9 +++--
  target/hexagon/translate.c   | 9 -
  4 files changed, 13 insertions(+), 17 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

[PATCH] target/ppc: Clean up local variable shadowing in kvm_arch_*_registers()

2023-10-05 Thread Cédric Le Goater

Remove extra 'i' variable to fix this warning :

  ../target/ppc/kvm.c: In function ‘kvm_arch_put_registers’:
  ../target/ppc/kvm.c:963:13: warning: declaration of ‘i’ shadows a previous 
local [-Wshadow=compatible-local]
963 | int i;
| ^
  ../target/ppc/kvm.c:906:9: note: shadowed declaration is here
906 | int i;
| ^
  ../target/ppc/kvm.c: In function ‘kvm_arch_get_registers’:
  ../target/ppc/kvm.c:1265:13: warning: declaration of ‘i’ shadows a previous 
local [-Wshadow=compatible-local]
   1265 | int i;
| ^
  ../target/ppc/kvm.c:1212:9: note: shadowed declaration is here
   1212 | int i, ret;
| ^

Signed-off-by: Cédric Le Goater 
---
 target/ppc/kvm.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 51112bd3670d..d0e2dcdc7734 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -960,8 +960,6 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 }
 
 if (cap_one_reg) {
-int i;
-
 /*
  * We deliberately ignore errors here, for kernels which have
  * the ONE_REG calls, but don't support the specific
@@ -1262,8 +1260,6 @@ int kvm_arch_get_registers(CPUState *cs)
 }
 
 if (cap_one_reg) {
-int i;
-
 /*
  * We deliberately ignore errors here, for kernels which have
  * the ONE_REG calls, but don't support the specific
-- 
2.41.0

Re: [PATCH qemu] timer/i8254: Fix one shot PIT mode

2023-10-05 Thread Michael S. Tsirkin

On Fri, Oct 06, 2023 at 02:36:52AM +, Damien Zammit wrote:
> >From: Michael Tokarev 
> >26.02.2023 04:58, Damien Zammit wrote:
> >> Currently, the one-shot (mode 1) PIT expires far too quickly,
> >> due to the output being set under the wrong logic.
> >> This change fixes the one-shot PIT mode to behave similarly to mode 0.
> >> 
> >> TESTED: using the one-shot PIT mode to calibrate a local apic timer.
> >
> >Has this been forgotten, or is it not needed anymore?
> 
> This is still required but nobody uses the
> PIT one-shot mode (probably because it *is* currently broken).
> 
> Can it be merged?
> 
> Thanks,
> Damien

OK, I tagged it. thanks!

[PATCH] tap-win32: Remove unnecessary stubs

2023-10-05 Thread Akihiko Odaki

Some of them are only necessary for POSIX systems. The others are
assigned to function pointers in NetClientInfo that can actually be
NULL.

Signed-off-by: Akihiko Odaki 
---
 net/tap-win32.c | 54 -
 1 file changed, 54 deletions(-)

diff --git a/net/tap-win32.c b/net/tap-win32.c
index f327d62ab0..7edbd71633 100644
--- a/net/tap-win32.c
+++ b/net/tap-win32.c
@@ -707,70 +707,16 @@ static void tap_win32_send(void *opaque)
 }
 }
 
-static bool tap_has_ufo(NetClientState *nc)
-{
-return false;
-}
-
-static bool tap_has_vnet_hdr(NetClientState *nc)
-{
-return false;
-}
-
-int tap_probe_vnet_hdr_len(int fd, int len)
-{
-return 0;
-}
-
-void tap_fd_set_vnet_hdr_len(int fd, int len)
-{
-}
-
-int tap_fd_set_vnet_le(int fd, int is_le)
-{
-return -EINVAL;
-}
-
-int tap_fd_set_vnet_be(int fd, int is_be)
-{
-return -EINVAL;
-}
-
-static void tap_using_vnet_hdr(NetClientState *nc, bool using_vnet_hdr)
-{
-}
-
-static void tap_set_offload(NetClientState *nc, int csum, int tso4,
- int tso6, int ecn, int ufo)
-{
-}
-
 struct vhost_net *tap_get_vhost_net(NetClientState *nc)
 {
 return NULL;
 }
 
-static bool tap_has_vnet_hdr_len(NetClientState *nc, int len)
-{
-return false;
-}
-
-static void tap_set_vnet_hdr_len(NetClientState *nc, int len)
-{
-abort();
-}
-
 static NetClientInfo net_tap_win32_info = {
 .type = NET_CLIENT_DRIVER_TAP,
 .size = sizeof(TAPState),
 .receive = tap_receive,
 .cleanup = tap_cleanup,
-.has_ufo = tap_has_ufo,
-.has_vnet_hdr = tap_has_vnet_hdr,
-.has_vnet_hdr_len = tap_has_vnet_hdr_len,
-.using_vnet_hdr = tap_using_vnet_hdr,
-.set_offload = tap_set_offload,
-.set_vnet_hdr_len = tap_set_vnet_hdr_len,
 };
 
 static int tap_win32_init(NetClientState *peer, const char *model,
-- 
2.42.0

Re: [PATCH] scripts/xml-preprocess: Make sure this script is invoked via the right Python

2023-10-05 Thread Marc-André Lureau

On Fri, Oct 6, 2023 at 8:53 AM Thomas Huth  wrote:
>
> If a script is executable and has a shebang line, Meson treats it as
> a normal executable, so that this script here is run via the "python3"
> binary in the $PATH. However, "python3" might not be in the $PATH at
> all, or it might be a wrong version, so we should make sure to run
> this script via the Python version that has been chosen for the QEMU
> build process. The best way to do this is to remove the executable bit
> from the access mode bits. (See also commit 4b424c757188f7a4)
>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1918
> Signed-off-by: Thomas Huth 

Reviewed-by: Marc-André Lureau 

thanks

> ---
>  scripts/xml-preprocess.py | 0
>  1 file changed, 0 insertions(+), 0 deletions(-)
>  mode change 100755 => 100644 scripts/xml-preprocess.py
>
> diff --git a/scripts/xml-preprocess.py b/scripts/xml-preprocess.py
> old mode 100755
> new mode 100644
> --
> 2.41.0
>

[PATCH] scripts/xml-preprocess: Make sure this script is invoked via the right Python

2023-10-05 Thread Thomas Huth

If a script is executable and has a shebang line, Meson treats it as
a normal executable, so that this script here is run via the "python3"
binary in the $PATH. However, "python3" might not be in the $PATH at
all, or it might be a wrong version, so we should make sure to run
this script via the Python version that has been chosen for the QEMU
build process. The best way to do this is to remove the executable bit
from the access mode bits. (See also commit 4b424c757188f7a4)

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1918
Signed-off-by: Thomas Huth 
---
 scripts/xml-preprocess.py | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 mode change 100755 => 100644 scripts/xml-preprocess.py

diff --git a/scripts/xml-preprocess.py b/scripts/xml-preprocess.py
old mode 100755
new mode 100644
-- 
2.41.0

Re: [PATCH v4 3/3] hw/vfio: add ramfb migration support

2023-10-05 Thread Laszlo Ersek

On 10/5/23 18:34, Cédric Le Goater wrote:
> On 10/5/23 13:30, marcandre.lur...@redhat.com wrote:
>> From: Marc-André Lureau 
>>
>> Add a "VFIODisplay" subsection whenever "x-ramfb-migrate" is turned on.
>>
>> Turn it off by default on machines <= 8.1 for compatibility reasons.
>>
>> Signed-off-by: Marc-André Lureau 
>> ---
>>   hw/vfio/pci.h |  3 +++
>>   hw/core/machine.c |  1 +
>>   hw/vfio/display.c | 20 
>>   hw/vfio/pci.c | 44 
>>   stubs/ramfb.c |  2 ++
>>   5 files changed, 70 insertions(+)
>>
>> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
>> index 2d836093a8..fd06695542 100644
>> --- a/hw/vfio/pci.h
>> +++ b/hw/vfio/pci.h
>> @@ -173,6 +173,7 @@ struct VFIOPCIDevice {
>>   bool no_kvm_ioeventfd;
>>   bool no_vfio_ioeventfd;
>>   bool enable_ramfb;
>> +    OnOffAuto ramfb_migrate;
>>   bool defer_kvm_irq_routing;
>>   bool clear_parent_atomics_on_exit;
>>   VFIODisplay *dpy;
>> @@ -226,4 +227,6 @@ void vfio_display_reset(VFIOPCIDevice *vdev);
>>   int vfio_display_probe(VFIOPCIDevice *vdev, Error **errp);
>>   void vfio_display_finalize(VFIOPCIDevice *vdev);
>>   +extern const VMStateDescription vfio_display_vmstate;
>> +
>>   #endif /* HW_VFIO_VFIO_PCI_H */
>> diff --git a/hw/core/machine.c b/hw/core/machine.c
>> index e4361e3d48..f2c59a293c 100644
>> --- a/hw/core/machine.c
>> +++ b/hw/core/machine.c
>> @@ -33,6 +33,7 @@
>>     GlobalProperty hw_compat_8_1[] = {
>>   { "ramfb", "x-migrate", "off" },
>> +    { "vfio-pci-nohotplug", "x-ramfb-migrate", "off" }
>>   };
>>   const size_t hw_compat_8_1_len = G_N_ELEMENTS(hw_compat_8_1);
>>   diff --git a/hw/vfio/display.c b/hw/vfio/display.c
>> index bec864f482..0bdb807642 100644
>> --- a/hw/vfio/display.c
>> +++ b/hw/vfio/display.c
>> @@ -542,3 +542,23 @@ void vfio_display_finalize(VFIOPCIDevice *vdev)
>>   vfio_display_edid_exit(vdev->dpy);
>>   g_free(vdev->dpy);
>>   }
>> +
>> +static bool migrate_needed(void *opaque)
>> +{
>> +    VFIODisplay *dpy = opaque;
>> +    bool ramfb_exists = dpy->ramfb != NULL;
>> +
>> +    /* see vfio_display_migration_needed() */
>> +    assert(ramfb_exists);
>> +    return ramfb_exists;
>> +}
>> +
>> +const VMStateDescription vfio_display_vmstate = {
>> +    .name = "VFIODisplay",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .needed = migrate_needed,
>> +    .fields = (VMStateField[]) {
>> +    VMSTATE_STRUCT_POINTER(ramfb, VFIODisplay, ramfb_vmstate,
>> RAMFBState),
>> +    }
>> +};
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 3b2ca3c24c..d2ede2f1a2 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -2608,6 +2608,32 @@ static bool vfio_msix_present(void *opaque, int
>> version_id)
>>   return msix_present(pdev);
>>   }
>>   +static bool vfio_display_migration_needed(void *opaque)
>> +{
>> +    VFIOPCIDevice *vdev = opaque;
>> +
>> +    /*
>> + * We need to migrate the VFIODisplay object if ramfb *migration*
>> was
>> + * explicitly requested (in which case we enforced both ramfb=on and
>> + * display=on), or ramfb migration was left at the default "auto"
>> + * setting, and *ramfb* was explicitly requested (in which case we
>> + * enforced display=on).
>> + */
>> +    return vdev->ramfb_migrate == ON_OFF_AUTO_ON ||
>> +    (vdev->ramfb_migrate == ON_OFF_AUTO_AUTO &&
>> vdev->enable_ramfb);> +}
>> +
>> +const VMStateDescription vmstate_vfio_display = {
>> +    .name = "VFIOPCIDevice/VFIODisplay",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .needed = vfio_display_migration_needed,
>> +    .fields = (VMStateField[]){
>> +    VMSTATE_STRUCT_POINTER(dpy, VFIOPCIDevice,
>> vfio_display_vmstate, VFIODisplay),
>> +    VMSTATE_END_OF_LIST()
>> +    }
>> +};
>> +
>>   const VMStateDescription vmstate_vfio_pci_config = {
>>   .name = "VFIOPCIDevice",
>>   .version_id = 1,
>> @@ -2616,6 +2642,10 @@ const VMStateDescription
>> vmstate_vfio_pci_config = {
>>   VMSTATE_PCI_DEVICE(pdev, VFIOPCIDevice),
>>   VMSTATE_MSIX_TEST(pdev, VFIOPCIDevice, vfio_msix_present),
>>   VMSTATE_END_OF_LIST()
>> +    },
>> +    .subsections = (const VMStateDescription*[]) {
>> +    _vfio_display,
>> +    NULL
>>   }
>>   };
>>   @@ -3271,6 +3301,19 @@ static void vfio_realize(PCIDevice *pdev,
>> Error **errp)
>>   }
>>   }
>>   +    if (vdev->ramfb_migrate == ON_OFF_AUTO_ON &&
>> !vdev->enable_ramfb) {
>> +    error_setg(errp, "x-ramfb-migrate requires ramfb=on");
> 
> This is a case where QEMU would be run from the command line. Could we
> ouput a warning and force "ramfb_migrate" to OFF in that case ? since
> the machine would still run.

Sounds like a valid idea to me:

- consistency between x-ramfb-migrate and ramfb would be maintained

- x-* properties are not meant as user interface, so QEMU doesn't
guarantee anything about them, AIUI

Laszlo

> 
> Thanks,
> 
> C.
> 
> 
>> +

[PATCH] hw/ide: Add command to sync cache on ATAPI

2023-10-05 Thread Damien Zammit

Bumping this thread[1] as it seems to have gotten lost:

[1] https://lists.gnu.org/archive/html/qemu-block/2022-03/msg00586.html

Damien

Re: [PATCH qemu] timer/i8254: Fix one shot PIT mode

2023-10-05 Thread Damien Zammit

>From: Michael Tokarev 
>26.02.2023 04:58, Damien Zammit wrote:
>> Currently, the one-shot (mode 1) PIT expires far too quickly,
>> due to the output being set under the wrong logic.
>> This change fixes the one-shot PIT mode to behave similarly to mode 0.
>> 
>> TESTED: using the one-shot PIT mode to calibrate a local apic timer.
>
>Has this been forgotten, or is it not needed anymore?

This is still required but nobody uses the
PIT one-shot mode (probably because it *is* currently broken).

Can it be merged?

Thanks,
Damien

RE: [PATCH v4] target/riscv: update checks on writing pmpcfg for Smepmp to version 1.0

2023-10-05 Thread Chang Alvin

> On Mon, Sep 25, 2023 at 2:11 AM Alvin Chang 

> wrote:

> >

> > Current checks on writing pmpcfg for Smepmp follows Smepmp version

> > 0.9.1. However, Smepmp specification has already been ratified, and

> > there are some differences between version 0.9.1 and 1.0. In this

> > commit we update the checks of writing pmpcfg to follow Smepmp version

> > 1.0.

> >

> > When mseccfg.MML is set, the constraints to modify PMP rules are:

> > 1. Locked rules cannot be removed or modified until a PMP reset, unless

> >mseccfg.RLB is set.

> > 2. From Smepmp specification version 1.0, chapter 2 section 4b:

> >Adding a rule with executable privileges that either is M-mode-only

> >or a locked Shared-Region is not possible and such pmpcfg writes are

> >ignored, leaving pmpcfg unchanged.

> >

> > The commit transfers the value of pmpcfg into the index of the Smepmp

> > truth table, and checks the rules by aforementioned specification

> > changes.

> >

> > Signed-off-by: Alvin Chang 

> > ---

> > Changes from v3: Modify "epmp_operation" to "smepmp_operation".

>

> This has the same issue as all the previous versions.

>

> QEMU is currently not shipping with Smepmp support. So updating some of
the

> code to support Smepmp is confusing.

>

> As I pointed out for the v3, we currently only support ePMP 0.9.3. So
that is

> what the code must work with.

>

> In order for this change to go in, we also need to upgrade QEMU to support

> Smepmp 1.0.

>

> This patch is an attempt to do that:

> https://www.mail-archive.com/qemu-devel@nongnu.org/msg967676.html

>

> You basically need to combine the changes from

> https://www.mail-archive.com/qemu-devel@nongnu.org/msg967676.html into

> this patch. So there is a single patch that updates to Smepmp.

>

> Alistair

>


Hi Alistair,


Mayuresh has sent that patch again recently:

https://www.mail-archive.com/qemu-devel@nongnu.org/msg991428.html

Since the author is from Ventana, I would like to send my patch separately.

I think our patches can be merged/applied together.

Thanks!


Alvin


> >

> > Changes from v2: Adopt switch case ranges and numerical order.

> >

> > Changes from v1: Convert ePMP over to Smepmp.

> >

> >  target/riscv/pmp.c | 40 

> >  1 file changed, 32 insertions(+), 8 deletions(-)

> >

> > diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c index

> > a08cd95658..2ebf18c941 100644

> > --- a/target/riscv/pmp.c

> > +++ b/target/riscv/pmp.c

> > @@ -99,16 +99,40 @@ static void pmp_write_cfg(CPURISCVState *env,

> uint32_t pmp_index, uint8_t val)

> >  locked = false;

> >  }

> >

> > -/* mseccfg.MML is set */

> > -if (MSECCFG_MML_ISSET(env)) {

> > -/* not adding execute bit */

> > -if ((val & PMP_LOCK) != 0 && (val & PMP_EXEC) !=

> PMP_EXEC) {

> > +/*

> > + * mseccfg.MML is set. Locked rules cannot be removed or

> modified

> > + * until a PMP reset. Besides, from Smepmp specification

> version 1.0

> > + * , chapter 2 section 4b says:

> > + * Adding a rule with executable privileges that either is

> > + * M-mode-only or a locked Shared-Region is not possible

> and such

> > + * pmpcfg writes are ignored, leaving pmpcfg unchanged.

> > + */

> > +if (MSECCFG_MML_ISSET(env) && !pmp_is_locked(env,

> pmp_index)) {

> > +/*

> > + * Convert the PMP permissions to match the truth

> table in the

> > + * Smepmp spec.

> > + */

> > +const uint8_t smepmp_operation =

> > +((val & PMP_LOCK) >> 4) | ((val & PMP_READ) <<

> 2) |

> > +(val & PMP_WRITE) | ((val & PMP_EXEC) >> 2);

> > +

> > +switch (smepmp_operation) {

> > +case 0 ... 8:

> >  locked = false;

> > -}

> > -/* shared region and not adding X bit */

> > -if ((val & PMP_LOCK) != PMP_LOCK &&

> > -(val & 0x7) != (PMP_WRITE | PMP_EXEC)) {

> > +break;

> > +case 9 ... 11:

> > +break;

> > +case 12:

> > +locked = false;

> > +break;

> > +case 13:

> > +break;

> > +case 14:

> > +case 15:

> >  locked = false;

> > +break;

> > +default:

> > +g_assert_not_reached();

> >  }

> >  }

> >  } else {

> > --

> > 2.34.1

> >

> >

[PATCH v17 4/9] virtio-gpu: blob prep

2023-10-05 Thread Gurchetan Singh

From: Antonio Caggiano 

This adds preparatory functions needed to:

 - decode blob cmds
 - tracking iovecs

Signed-off-by: Antonio Caggiano 
Signed-off-by: Dmitry Osipenko 
Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Tested-by: Emmanouil Pitsidianakis 
Tested-by: Akihiko Odaki 
Tested-by: Huang Rui 
Acked-by: Huang Rui 
Reviewed-by: Emmanouil Pitsidianakis 
Reviewed-by: Akihiko Odaki 
---
 hw/display/virtio-gpu.c  | 10 +++---
 include/hw/virtio/virtio-gpu-bswap.h | 15 +++
 include/hw/virtio/virtio-gpu.h   |  5 +
 3 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 5585558855..be16efbd38 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -33,15 +33,11 @@
 
 #define VIRTIO_GPU_VM_VERSION 1
 
-static struct virtio_gpu_simple_resource*
-virtio_gpu_find_resource(VirtIOGPU *g, uint32_t resource_id);
 static struct virtio_gpu_simple_resource *
 virtio_gpu_find_check_resource(VirtIOGPU *g, uint32_t resource_id,
bool require_backing,
const char *caller, uint32_t *error);
 
-static void virtio_gpu_cleanup_mapping(VirtIOGPU *g,
-   struct virtio_gpu_simple_resource *res);
 static void virtio_gpu_reset_bh(void *opaque);
 
 void virtio_gpu_update_cursor_data(VirtIOGPU *g,
@@ -116,7 +112,7 @@ static void update_cursor(VirtIOGPU *g, struct 
virtio_gpu_update_cursor *cursor)
   cursor->resource_id ? 1 : 0);
 }
 
-static struct virtio_gpu_simple_resource *
+struct virtio_gpu_simple_resource *
 virtio_gpu_find_resource(VirtIOGPU *g, uint32_t resource_id)
 {
 struct virtio_gpu_simple_resource *res;
@@ -904,8 +900,8 @@ void virtio_gpu_cleanup_mapping_iov(VirtIOGPU *g,
 g_free(iov);
 }
 
-static void virtio_gpu_cleanup_mapping(VirtIOGPU *g,
-   struct virtio_gpu_simple_resource *res)
+void virtio_gpu_cleanup_mapping(VirtIOGPU *g,
+struct virtio_gpu_simple_resource *res)
 {
 virtio_gpu_cleanup_mapping_iov(g, res->iov, res->iov_cnt);
 res->iov = NULL;
diff --git a/include/hw/virtio/virtio-gpu-bswap.h 
b/include/hw/virtio/virtio-gpu-bswap.h
index 637a0585d0..dd1975e2d4 100644
--- a/include/hw/virtio/virtio-gpu-bswap.h
+++ b/include/hw/virtio/virtio-gpu-bswap.h
@@ -70,6 +70,21 @@ virtio_gpu_create_blob_bswap(struct 
virtio_gpu_resource_create_blob *cblob)
 le64_to_cpus(>size);
 }
 
+static inline void
+virtio_gpu_map_blob_bswap(struct virtio_gpu_resource_map_blob *mblob)
+{
+virtio_gpu_ctrl_hdr_bswap(>hdr);
+le32_to_cpus(>resource_id);
+le64_to_cpus(>offset);
+}
+
+static inline void
+virtio_gpu_unmap_blob_bswap(struct virtio_gpu_resource_unmap_blob *ublob)
+{
+virtio_gpu_ctrl_hdr_bswap(>hdr);
+le32_to_cpus(>resource_id);
+}
+
 static inline void
 virtio_gpu_scanout_blob_bswap(struct virtio_gpu_set_scanout_blob *ssb)
 {
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index de4f624e94..55973e112f 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -257,6 +257,9 @@ void virtio_gpu_base_fill_display_info(VirtIOGPUBase *g,
 void virtio_gpu_base_generate_edid(VirtIOGPUBase *g, int scanout,
struct virtio_gpu_resp_edid *edid);
 /* virtio-gpu.c */
+struct virtio_gpu_simple_resource *
+virtio_gpu_find_resource(VirtIOGPU *g, uint32_t resource_id);
+
 void virtio_gpu_ctrl_response(VirtIOGPU *g,
   struct virtio_gpu_ctrl_command *cmd,
   struct virtio_gpu_ctrl_hdr *resp,
@@ -275,6 +278,8 @@ int virtio_gpu_create_mapping_iov(VirtIOGPU *g,
   uint32_t *niov);
 void virtio_gpu_cleanup_mapping_iov(VirtIOGPU *g,
 struct iovec *iov, uint32_t count);
+void virtio_gpu_cleanup_mapping(VirtIOGPU *g,
+struct virtio_gpu_simple_resource *res);
 void virtio_gpu_process_cmdq(VirtIOGPU *g);
 void virtio_gpu_device_realize(DeviceState *qdev, Error **errp);
 void virtio_gpu_reset(VirtIODevice *vdev);
-- 
2.42.0.609.gbb76f46606-goog

[PATCH v17 3/9] virtio-gpu: hostmem

2023-10-05 Thread Gurchetan Singh

From: Gerd Hoffmann 

Use VIRTIO_GPU_SHM_ID_HOST_VISIBLE as id for virtio-gpu.

Signed-off-by: Antonio Caggiano 
Tested-by: Alyssa Ross 
Tested-by: Akihiko Odaki 
Tested-by: Huang Rui 
Acked-by: Huang Rui 
Acked-by: Michael S. Tsirkin 
Reviewed-by: Akihiko Odaki 
---
 hw/display/virtio-gpu-pci.c| 14 ++
 hw/display/virtio-gpu.c|  1 +
 hw/display/virtio-vga.c| 33 -
 include/hw/virtio/virtio-gpu.h |  5 +
 4 files changed, 44 insertions(+), 9 deletions(-)

diff --git a/hw/display/virtio-gpu-pci.c b/hw/display/virtio-gpu-pci.c
index 93f214ff58..da6a99f038 100644
--- a/hw/display/virtio-gpu-pci.c
+++ b/hw/display/virtio-gpu-pci.c
@@ -33,6 +33,20 @@ static void virtio_gpu_pci_base_realize(VirtIOPCIProxy 
*vpci_dev, Error **errp)
 DeviceState *vdev = DEVICE(g);
 int i;
 
+if (virtio_gpu_hostmem_enabled(g->conf)) {
+vpci_dev->msix_bar_idx = 1;
+vpci_dev->modern_mem_bar_idx = 2;
+memory_region_init(>hostmem, OBJECT(g), "virtio-gpu-hostmem",
+   g->conf.hostmem);
+pci_register_bar(_dev->pci_dev, 4,
+ PCI_BASE_ADDRESS_SPACE_MEMORY |
+ PCI_BASE_ADDRESS_MEM_PREFETCH |
+ PCI_BASE_ADDRESS_MEM_TYPE_64,
+ >hostmem);
+virtio_pci_add_shm_cap(vpci_dev, 4, 0, g->conf.hostmem,
+   VIRTIO_GPU_SHM_ID_HOST_VISIBLE);
+}
+
 virtio_pci_force_virtio_1(vpci_dev);
 if (!qdev_realize(vdev, BUS(_dev->bus), errp)) {
 return;
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 93857ad523..5585558855 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1511,6 +1511,7 @@ static Property virtio_gpu_properties[] = {
  256 * MiB),
 DEFINE_PROP_BIT("blob", VirtIOGPU, parent_obj.conf.flags,
 VIRTIO_GPU_FLAG_BLOB_ENABLED, false),
+DEFINE_PROP_SIZE("hostmem", VirtIOGPU, parent_obj.conf.hostmem, 0),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/display/virtio-vga.c b/hw/display/virtio-vga.c
index e6fb0aa876..c8552ff760 100644
--- a/hw/display/virtio-vga.c
+++ b/hw/display/virtio-vga.c
@@ -115,17 +115,32 @@ static void virtio_vga_base_realize(VirtIOPCIProxy 
*vpci_dev, Error **errp)
 pci_register_bar(_dev->pci_dev, 0,
  PCI_BASE_ADDRESS_MEM_PREFETCH, >vram);
 
-/*
- * Configure virtio bar and regions
- *
- * We use bar #2 for the mmio regions, to be compatible with stdvga.
- * virtio regions are moved to the end of bar #2, to make room for
- * the stdvga mmio registers at the start of bar #2.
- */
-vpci_dev->modern_mem_bar_idx = 2;
-vpci_dev->msix_bar_idx = 4;
 vpci_dev->modern_io_bar_idx = 5;
 
+if (!virtio_gpu_hostmem_enabled(g->conf)) {
+/*
+ * Configure virtio bar and regions
+ *
+ * We use bar #2 for the mmio regions, to be compatible with stdvga.
+ * virtio regions are moved to the end of bar #2, to make room for
+ * the stdvga mmio registers at the start of bar #2.
+ */
+vpci_dev->modern_mem_bar_idx = 2;
+vpci_dev->msix_bar_idx = 4;
+} else {
+vpci_dev->msix_bar_idx = 1;
+vpci_dev->modern_mem_bar_idx = 2;
+memory_region_init(>hostmem, OBJECT(g), "virtio-gpu-hostmem",
+   g->conf.hostmem);
+pci_register_bar(_dev->pci_dev, 4,
+ PCI_BASE_ADDRESS_SPACE_MEMORY |
+ PCI_BASE_ADDRESS_MEM_PREFETCH |
+ PCI_BASE_ADDRESS_MEM_TYPE_64,
+ >hostmem);
+virtio_pci_add_shm_cap(vpci_dev, 4, 0, g->conf.hostmem,
+   VIRTIO_GPU_SHM_ID_HOST_VISIBLE);
+}
+
 if (!(vpci_dev->flags & VIRTIO_PCI_FLAG_PAGE_PER_VQ)) {
 /*
  * with page-per-vq=off there is no padding space we can use
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 8377c365ef..de4f624e94 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -108,12 +108,15 @@ enum virtio_gpu_base_conf_flags {
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_BLOB_ENABLED))
 #define virtio_gpu_context_init_enabled(_cfg) \
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED))
+#define virtio_gpu_hostmem_enabled(_cfg) \
+(_cfg.hostmem > 0)
 
 struct virtio_gpu_base_conf {
 uint32_t max_outputs;
 uint32_t flags;
 uint32_t xres;
 uint32_t yres;
+uint64_t hostmem;
 };
 
 struct virtio_gpu_ctrl_command {
@@ -137,6 +140,8 @@ struct VirtIOGPUBase {
 int renderer_blocked;
 int enable;
 
+MemoryRegion hostmem;
+
 struct virtio_gpu_scanout scanout[VIRTIO_GPU_MAX_SCANOUTS];
 
 int enabled_output_bitmask;
-- 
2.42.0.609.gbb76f46606-goog

[PATCH v17 8/9] gfxstream + rutabaga: enable rutabaga

2023-10-05 Thread Gurchetan Singh

This change enables rutabaga to receive virtio-gpu-3d hypercalls
when it is active.

Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Tested-by: Emmanouil Pitsidianakis 
Tested-by: Akihiko Odaki 
Reviewed-by: Antonio Caggiano 
Reviewed-by: Emmanouil Pitsidianakis 
Reviewed-by: Akihiko Odaki 
---
 hw/display/virtio-gpu-base.c | 3 ++-
 hw/display/virtio-gpu.c  | 5 +++--
 softmmu/qdev-monitor.c   | 3 +++
 softmmu/vl.c | 1 +
 4 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c
index 4f2b0ba1f3..50c5373b65 100644
--- a/hw/display/virtio-gpu-base.c
+++ b/hw/display/virtio-gpu-base.c
@@ -223,7 +223,8 @@ virtio_gpu_base_get_features(VirtIODevice *vdev, uint64_t 
features,
 {
 VirtIOGPUBase *g = VIRTIO_GPU_BASE(vdev);
 
-if (virtio_gpu_virgl_enabled(g->conf)) {
+if (virtio_gpu_virgl_enabled(g->conf) ||
+virtio_gpu_rutabaga_enabled(g->conf)) {
 features |= (1 << VIRTIO_GPU_F_VIRGL);
 }
 if (virtio_gpu_edid_enabled(g->conf)) {
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index be16efbd38..6efd15b6ae 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1363,8 +1363,9 @@ void virtio_gpu_device_realize(DeviceState *qdev, Error 
**errp)
 VirtIOGPU *g = VIRTIO_GPU(qdev);
 
 if (virtio_gpu_blob_enabled(g->parent_obj.conf)) {
-if (!virtio_gpu_have_udmabuf()) {
-error_setg(errp, "cannot enable blob resources without udmabuf");
+if (!virtio_gpu_rutabaga_enabled(g->parent_obj.conf) &&
+!virtio_gpu_have_udmabuf()) {
+error_setg(errp, "need rutabaga or udmabuf for blob resources");
 return;
 }
 
diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
index 74f4e41338..1b8005ae55 100644
--- a/softmmu/qdev-monitor.c
+++ b/softmmu/qdev-monitor.c
@@ -86,6 +86,9 @@ static const QDevAlias qdev_alias_table[] = {
 { "virtio-gpu-pci", "virtio-gpu", QEMU_ARCH_VIRTIO_PCI },
 { "virtio-gpu-gl-device", "virtio-gpu-gl", QEMU_ARCH_VIRTIO_MMIO },
 { "virtio-gpu-gl-pci", "virtio-gpu-gl", QEMU_ARCH_VIRTIO_PCI },
+{ "virtio-gpu-rutabaga-device", "virtio-gpu-rutabaga",
+  QEMU_ARCH_VIRTIO_MMIO },
+{ "virtio-gpu-rutabaga-pci", "virtio-gpu-rutabaga", QEMU_ARCH_VIRTIO_PCI },
 { "virtio-input-host-device", "virtio-input-host", QEMU_ARCH_VIRTIO_MMIO },
 { "virtio-input-host-ccw", "virtio-input-host", QEMU_ARCH_VIRTIO_CCW },
 { "virtio-input-host-pci", "virtio-input-host", QEMU_ARCH_VIRTIO_PCI },
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 98e071e63b..dd82c6eb13 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -215,6 +215,7 @@ static struct {
 { .driver = "ati-vga",  .flag = _vga   },
 { .driver = "vhost-user-vga",   .flag = _vga   },
 { .driver = "virtio-vga-gl",.flag = _vga   },
+{ .driver = "virtio-vga-rutabaga",  .flag = _vga   },
 };
 
 static QemuOptsList qemu_rtc_opts = {
-- 
2.42.0.609.gbb76f46606-goog

[PATCH v17 5/9] gfxstream + rutabaga prep: added need defintions, fields, and options

2023-10-05 Thread Gurchetan Singh

This modifies the common virtio-gpu.h file have the fields and
defintions needed by gfxstream/rutabaga, by VirtioGpuRutabaga.

Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Tested-by: Emmanouil Pitsidianakis 
Tested-by: Akihiko Odaki 
Reviewed-by: Emmanouil Pitsidianakis 
Reviewed-by: Antonio Caggiano 
Reviewed-by: Akihiko Odaki 
---
 include/hw/virtio/virtio-gpu.h | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 55973e112f..39018377d2 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -38,6 +38,9 @@ OBJECT_DECLARE_SIMPLE_TYPE(VirtIOGPUGL, VIRTIO_GPU_GL)
 #define TYPE_VHOST_USER_GPU "vhost-user-gpu"
 OBJECT_DECLARE_SIMPLE_TYPE(VhostUserGPU, VHOST_USER_GPU)
 
+#define TYPE_VIRTIO_GPU_RUTABAGA "virtio-gpu-rutabaga-device"
+OBJECT_DECLARE_SIMPLE_TYPE(VirtIOGPURutabaga, VIRTIO_GPU_RUTABAGA)
+
 struct virtio_gpu_simple_resource {
 uint32_t resource_id;
 uint32_t width;
@@ -94,6 +97,7 @@ enum virtio_gpu_base_conf_flags {
 VIRTIO_GPU_FLAG_DMABUF_ENABLED,
 VIRTIO_GPU_FLAG_BLOB_ENABLED,
 VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED,
+VIRTIO_GPU_FLAG_RUTABAGA_ENABLED,
 };
 
 #define virtio_gpu_virgl_enabled(_cfg) \
@@ -108,6 +112,8 @@ enum virtio_gpu_base_conf_flags {
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_BLOB_ENABLED))
 #define virtio_gpu_context_init_enabled(_cfg) \
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED))
+#define virtio_gpu_rutabaga_enabled(_cfg) \
+(_cfg.flags & (1 << VIRTIO_GPU_FLAG_RUTABAGA_ENABLED))
 #define virtio_gpu_hostmem_enabled(_cfg) \
 (_cfg.hostmem > 0)
 
@@ -232,6 +238,27 @@ struct VhostUserGPU {
 bool backend_blocked;
 };
 
+#define MAX_SLOTS 4096
+
+struct MemoryRegionInfo {
+int used;
+MemoryRegion mr;
+uint32_t resource_id;
+};
+
+struct rutabaga;
+
+struct VirtIOGPURutabaga {
+VirtIOGPU parent_obj;
+struct MemoryRegionInfo memory_regions[MAX_SLOTS];
+uint64_t capset_mask;
+char *wayland_socket_path;
+char *wsi;
+bool headless;
+uint32_t num_capsets;
+struct rutabaga *rutabaga;
+};
+
 #define VIRTIO_GPU_FILL_CMD(out) do {   \
 size_t s;   \
 s = iov_to_buf(cmd->elem.out_sg, cmd->elem.out_num, 0,  \
-- 
2.42.0.609.gbb76f46606-goog

[PATCH v17 2/9] virtio-gpu: CONTEXT_INIT feature

2023-10-05 Thread Gurchetan Singh

From: Antonio Caggiano 

The feature can be enabled when a backend wants it.

Signed-off-by: Antonio Caggiano 
Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Tested-by: Akihiko Odaki 
Tested-by: Huang Rui 
Acked-by: Huang Rui 
Reviewed-by: Marc-André Lureau 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Akihiko Odaki 
---
 hw/display/virtio-gpu-base.c   | 3 +++
 include/hw/virtio/virtio-gpu.h | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c
index ca1fb7b16f..4f2b0ba1f3 100644
--- a/hw/display/virtio-gpu-base.c
+++ b/hw/display/virtio-gpu-base.c
@@ -232,6 +232,9 @@ virtio_gpu_base_get_features(VirtIODevice *vdev, uint64_t 
features,
 if (virtio_gpu_blob_enabled(g->conf)) {
 features |= (1 << VIRTIO_GPU_F_RESOURCE_BLOB);
 }
+if (virtio_gpu_context_init_enabled(g->conf)) {
+features |= (1 << VIRTIO_GPU_F_CONTEXT_INIT);
+}
 
 return features;
 }
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 390c4642b8..8377c365ef 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -93,6 +93,7 @@ enum virtio_gpu_base_conf_flags {
 VIRTIO_GPU_FLAG_EDID_ENABLED,
 VIRTIO_GPU_FLAG_DMABUF_ENABLED,
 VIRTIO_GPU_FLAG_BLOB_ENABLED,
+VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED,
 };
 
 #define virtio_gpu_virgl_enabled(_cfg) \
@@ -105,6 +106,8 @@ enum virtio_gpu_base_conf_flags {
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_DMABUF_ENABLED))
 #define virtio_gpu_blob_enabled(_cfg) \
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_BLOB_ENABLED))
+#define virtio_gpu_context_init_enabled(_cfg) \
+(_cfg.flags & (1 << VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED))
 
 struct virtio_gpu_base_conf {
 uint32_t max_outputs;
-- 
2.42.0.609.gbb76f46606-goog

[PATCH v17 1/9] virtio: Add shared memory capability

2023-10-05 Thread Gurchetan Singh

From: "Dr. David Alan Gilbert" 

Define a new capability type 'VIRTIO_PCI_CAP_SHARED_MEMORY_CFG' to allow
defining shared memory regions with sizes and offsets of 2^32 and more.
Multiple instances of the capability are allowed and distinguished
by a device-specific 'id'.

Signed-off-by: Dr. David Alan Gilbert 
Signed-off-by: Antonio Caggiano 
Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Tested-by: Huang Rui 
Tested-by: Akihiko Odaki 
Acked-by: Huang Rui 
Reviewed-by: Gurchetan Singh 
Reviewed-by: Akihiko Odaki 
---
 hw/virtio/virtio-pci.c | 18 ++
 include/hw/virtio/virtio-pci.h |  4 
 2 files changed, 22 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index edbc0daa18..da8c9ea12d 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1435,6 +1435,24 @@ static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
 return offset;
 }
 
+int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy,
+   uint8_t bar, uint64_t offset, uint64_t length,
+   uint8_t id)
+{
+struct virtio_pci_cap64 cap = {
+.cap.cap_len = sizeof cap,
+.cap.cfg_type = VIRTIO_PCI_CAP_SHARED_MEMORY_CFG,
+};
+
+cap.cap.bar = bar;
+cap.cap.length = cpu_to_le32(length);
+cap.length_hi = cpu_to_le32(length >> 32);
+cap.cap.offset = cpu_to_le32(offset);
+cap.offset_hi = cpu_to_le32(offset >> 32);
+cap.cap.id = id;
+return virtio_pci_add_mem_cap(proxy, );
+}
+
 static uint64_t virtio_pci_common_read(void *opaque, hwaddr addr,
unsigned size)
 {
diff --git a/include/hw/virtio/virtio-pci.h b/include/hw/virtio/virtio-pci.h
index ab2051b64b..5a3f182f99 100644
--- a/include/hw/virtio/virtio-pci.h
+++ b/include/hw/virtio/virtio-pci.h
@@ -264,4 +264,8 @@ unsigned virtio_pci_optimal_num_queues(unsigned 
fixed_queues);
 void virtio_pci_set_guest_notifier_fd_handler(VirtIODevice *vdev, VirtQueue 
*vq,
   int n, bool assign,
   bool with_irqfd);
+
+int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy, uint8_t bar, uint64_t offset,
+   uint64_t length, uint8_t id);
+
 #endif
-- 
2.42.0.609.gbb76f46606-goog

[PATCH v17 7/9] gfxstream + rutabaga: meson support

2023-10-05 Thread Gurchetan Singh

- Add meson detection of rutabaga_gfx
- Build virtio-gpu-rutabaga.c + associated vga/pci files when
  present

Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Tested-by: Emmanouil Pitsidianakis 
Tested-by: Akihiko Odaki 
Reviewed-by: Emmanouil Pitsidianakis 
Reviewed-by: Antonio Caggiano 
Reviewed-by: Akihiko Odaki 
---
 hw/display/meson.build| 22 ++
 meson.build   |  7 +++
 meson_options.txt |  2 ++
 scripts/meson-buildoptions.sh |  3 +++
 4 files changed, 34 insertions(+)

diff --git a/hw/display/meson.build b/hw/display/meson.build
index 05619c6968..2b64fd9f9d 100644
--- a/hw/display/meson.build
+++ b/hw/display/meson.build
@@ -80,6 +80,13 @@ if config_all_devices.has_key('CONFIG_VIRTIO_GPU')
  if_true: [files('virtio-gpu-gl.c', 
'virtio-gpu-virgl.c'), pixman, virgl])
 hw_display_modules += {'virtio-gpu-gl': virtio_gpu_gl_ss}
   endif
+
+  if rutabaga.found()
+virtio_gpu_rutabaga_ss = ss.source_set()
+virtio_gpu_rutabaga_ss.add(when: ['CONFIG_VIRTIO_GPU', rutabaga],
+   if_true: [files('virtio-gpu-rutabaga.c'), 
pixman])
+hw_display_modules += {'virtio-gpu-rutabaga': virtio_gpu_rutabaga_ss}
+  endif
 endif
 
 if config_all_devices.has_key('CONFIG_VIRTIO_PCI')
@@ -96,6 +103,12 @@ if config_all_devices.has_key('CONFIG_VIRTIO_PCI')
  if_true: [files('virtio-gpu-pci-gl.c'), pixman])
 hw_display_modules += {'virtio-gpu-pci-gl': virtio_gpu_pci_gl_ss}
   endif
+  if rutabaga.found()
+virtio_gpu_pci_rutabaga_ss = ss.source_set()
+virtio_gpu_pci_rutabaga_ss.add(when: ['CONFIG_VIRTIO_GPU', 
'CONFIG_VIRTIO_PCI', rutabaga],
+   if_true: 
[files('virtio-gpu-pci-rutabaga.c'), pixman])
+hw_display_modules += {'virtio-gpu-pci-rutabaga': 
virtio_gpu_pci_rutabaga_ss}
+  endif
 endif
 
 if config_all_devices.has_key('CONFIG_VIRTIO_VGA')
@@ -114,6 +127,15 @@ if config_all_devices.has_key('CONFIG_VIRTIO_VGA')
   virtio_vga_gl_ss.add(when: 'CONFIG_ACPI', if_true: files('acpi-vga.c'),
 if_false: files('acpi-vga-stub.c'))
   hw_display_modules += {'virtio-vga-gl': virtio_vga_gl_ss}
+
+  if rutabaga.found()
+virtio_vga_rutabaga_ss = ss.source_set()
+virtio_vga_rutabaga_ss.add(when: ['CONFIG_VIRTIO_VGA', rutabaga],
+   if_true: [files('virtio-vga-rutabaga.c'), 
pixman])
+virtio_vga_rutabaga_ss.add(when: 'CONFIG_ACPI', if_true: 
files('acpi-vga.c'),
+if_false: 
files('acpi-vga-stub.c'))
+hw_display_modules += {'virtio-vga-rutabaga': virtio_vga_rutabaga_ss}
+  endif
 endif
 
 system_ss.add(when: 'CONFIG_OMAP', if_true: files('omap_lcdc.c'))
diff --git a/meson.build b/meson.build
index 3bb64b536c..65848a662a 100644
--- a/meson.build
+++ b/meson.build
@@ -1046,6 +1046,12 @@ if not get_option('virglrenderer').auto() or have_system 
or have_vhost_user_gpu
dependencies: virgl))
   endif
 endif
+rutabaga = not_found
+if not get_option('rutabaga_gfx').auto() or have_system or have_vhost_user_gpu
+  rutabaga = dependency('rutabaga_gfx_ffi',
+ method: 'pkg-config',
+ required: get_option('rutabaga_gfx'))
+endif
 blkio = not_found
 if not get_option('blkio').auto() or have_block
   blkio = dependency('blkio',
@@ -4277,6 +4283,7 @@ summary_info += {'libtasn1':  tasn1}
 summary_info += {'PAM':   pam}
 summary_info += {'iconv support': iconv}
 summary_info += {'virgl support': virgl}
+summary_info += {'rutabaga support':  rutabaga}
 summary_info += {'blkio support': blkio}
 summary_info += {'curl support':  curl}
 summary_info += {'Multipath support': mpathpersist}
diff --git a/meson_options.txt b/meson_options.txt
index 6a17b90968..e49309dd78 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -230,6 +230,8 @@ option('vmnet', type : 'feature', value : 'auto',
description: 'vmnet.framework network backend support')
 option('virglrenderer', type : 'feature', value : 'auto',
description: 'virgl rendering support')
+option('rutabaga_gfx', type : 'feature', value : 'auto',
+   description: 'rutabaga_gfx support')
 option('png', type : 'feature', value : 'auto',
description: 'PNG support with libpng')
 option('vnc', type : 'feature', value : 'auto',
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index 2a74b0275b..a28ccbcaf6 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -156,6 +156,7 @@ meson_options_help() {
   printf "%s\n" '  rbd Ceph block device driver'
   printf "%s\n" '  rdmaEnable RDMA-based migration'
   printf "%s\n" '  replication replication support'
+  printf "%s\n" '  rutabaga-gfxrutabaga_gfx support'
   printf "%s\n" '  sdl SDL

[PATCH v17 9/9] docs/system: add basic virtio-gpu documentation

2023-10-05 Thread Gurchetan Singh

This adds basic documentation for virtio-gpu.

Suggested-by: Akihiko Odaki 
Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Tested-by: Emmanouil Pitsidianakis 
Tested-by: Akihiko Odaki 
Reviewed-by: Emmanouil Pitsidianakis 
Reviewed-by: Antonio Caggiano 
Reviewed-by: Akihiko Odaki 
---
 docs/system/device-emulation.rst   |   1 +
 docs/system/devices/virtio-gpu.rst | 112 +
 2 files changed, 113 insertions(+)
 create mode 100644 docs/system/devices/virtio-gpu.rst

diff --git a/docs/system/device-emulation.rst b/docs/system/device-emulation.rst
index 4491c4cbf7..1167f3a9f2 100644
--- a/docs/system/device-emulation.rst
+++ b/docs/system/device-emulation.rst
@@ -91,6 +91,7 @@ Emulated Devices
devices/nvme.rst
devices/usb.rst
devices/vhost-user.rst
+   devices/virtio-gpu.rst
devices/virtio-pmem.rst
devices/vhost-user-rng.rst
devices/canokey.rst
diff --git a/docs/system/devices/virtio-gpu.rst 
b/docs/system/devices/virtio-gpu.rst
new file mode 100644
index 00..cb73dd7998
--- /dev/null
+++ b/docs/system/devices/virtio-gpu.rst
@@ -0,0 +1,112 @@
+..
+   SPDX-License-Identifier: GPL-2.0-or-later
+
+virtio-gpu
+==
+
+This document explains the setup and usage of the virtio-gpu device.
+The virtio-gpu device paravirtualizes the GPU and display controller.
+
+Linux kernel support
+
+
+virtio-gpu requires a guest Linux kernel built with the
+``CONFIG_DRM_VIRTIO_GPU`` option.
+
+QEMU virtio-gpu variants
+
+
+QEMU virtio-gpu device variants come in the following form:
+
+ * ``virtio-vga[-BACKEND]``
+ * ``virtio-gpu[-BACKEND][-INTERFACE]``
+ * ``vhost-user-vga``
+ * ``vhost-user-pci``
+
+**Backends:** QEMU provides a 2D virtio-gpu backend, and two accelerated
+backends: virglrenderer ('gl' device label) and rutabaga_gfx ('rutabaga'
+device label).  There is a vhost-user backend that runs the graphics stack
+in a separate process for improved isolation.
+
+**Interfaces:** QEMU further categorizes virtio-gpu device variants based
+on the interface exposed to the guest. The interfaces can be classified
+into VGA and non-VGA variants. The VGA ones are prefixed with virtio-vga
+or vhost-user-vga while the non-VGA ones are prefixed with virtio-gpu or
+vhost-user-gpu.
+
+The VGA ones always use the PCI interface, but for the non-VGA ones, the
+user can further pick between MMIO or PCI. For MMIO, the user can suffix
+the device name with -device, though vhost-user-gpu does not support MMIO.
+For PCI, the user can suffix it with -pci. Without these suffixes, the
+platform default will be chosen.
+
+virtio-gpu 2d
+-
+
+The default 2D backend only performs 2D operations. The guest needs to
+employ a software renderer for 3D graphics.
+
+Typically, the software renderer is provided by `Mesa`_ or `SwiftShader`_.
+Mesa's implementations (LLVMpipe, Lavapipe and virgl below) work out of box
+on typical modern Linux distributions.
+
+.. parsed-literal::
+-device virtio-gpu
+
+.. _Mesa: https://www.mesa3d.org/
+.. _SwiftShader: https://github.com/google/swiftshader
+
+virtio-gpu virglrenderer
+
+
+When using virgl accelerated graphics mode in the guest, OpenGL API calls
+are translated into an intermediate representation (see `Gallium3D`_). The
+intermediate representation is communicated to the host and the
+`virglrenderer`_ library on the host translates the intermediate
+representation back to OpenGL API calls.
+
+.. parsed-literal::
+-device virtio-gpu-gl
+
+.. _Gallium3D: https://www.freedesktop.org/wiki/Software/gallium/
+.. _virglrenderer: https://gitlab.freedesktop.org/virgl/virglrenderer/
+
+virtio-gpu rutabaga
+---
+
+virtio-gpu can also leverage rutabaga_gfx to provide `gfxstream`_
+rendering and `Wayland display passthrough`_.  With the gfxstream rendering
+mode, GLES and Vulkan calls are forwarded to the host with minimal
+modification.
+
+The crosvm book provides directions on how to build a `gfxstream-enabled
+rutabaga`_ and launch a `guest Wayland proxy`_.
+
+This device does require host blob support (``hostmem`` field below). The
+``hostmem`` field specifies the size of virtio-gpu host memory window.
+This is typically between 256M and 8G.
+
+At least one virtio-gpu capability set ("capset") must be specified when
+starting the device.  The currently capsets supported are ``gfxstream-vulkan``
+and ``cross-domain`` for Linux guests. For Android guests, the experimental
+``x-gfxstream-gles`` and ``x-gfxstream-composer`` capsets are also supported.
+
+The device will try to auto-detect the wayland socket path if the
+``cross-domain`` capset name is set.  The user may optionally specify
+``wayland-socket-path`` for non-standard paths.
+
+The ``wsi`` option can be set to ``surfaceless`` or ``headless``.
+Surfaceless doesn't create a native window surface, but does copy from the
+render target to the Pixman buffer if a virtio-gpu 2D hypercall is

[PATCH v17 0/9] gfxstream + rutabaga_gfx

2023-10-05 Thread Gurchetan Singh

From: Gurchetan Singh 

Branch containing changes:

https://gitlab.com/gurchetansingh/qemu/-/commits/qemu-gfxstream-v17

Changes since v16:

- Fixed typo mentioned here:

https://lists.gnu.org/archive/html/qemu-devel/2023-10/msg01407.html

Antonio Caggiano (2):
  virtio-gpu: CONTEXT_INIT feature
  virtio-gpu: blob prep

Dr. David Alan Gilbert (1):
  virtio: Add shared memory capability

Gerd Hoffmann (1):
  virtio-gpu: hostmem

Gurchetan Singh (5):
  gfxstream + rutabaga prep: added need defintions, fields, and options
  gfxstream + rutabaga: add initial support for gfxstream
  gfxstream + rutabaga: meson support
  gfxstream + rutabaga: enable rutabaga
  docs/system: add basic virtio-gpu documentation

 docs/system/device-emulation.rst |1 +
 docs/system/devices/virtio-gpu.rst   |  112 +++
 hw/display/meson.build   |   22 +
 hw/display/virtio-gpu-base.c |6 +-
 hw/display/virtio-gpu-pci-rutabaga.c |   47 ++
 hw/display/virtio-gpu-pci.c  |   14 +
 hw/display/virtio-gpu-rutabaga.c | 1113 ++
 hw/display/virtio-gpu.c  |   16 +-
 hw/display/virtio-vga-rutabaga.c |   50 ++
 hw/display/virtio-vga.c  |   33 +-
 hw/virtio/virtio-pci.c   |   18 +
 include/hw/virtio/virtio-gpu-bswap.h |   15 +
 include/hw/virtio/virtio-gpu.h   |   40 +
 include/hw/virtio/virtio-pci.h   |4 +
 meson.build  |7 +
 meson_options.txt|2 +
 scripts/meson-buildoptions.sh|3 +
 softmmu/qdev-monitor.c   |3 +
 softmmu/vl.c |1 +
 19 files changed, 1488 insertions(+), 19 deletions(-)
 create mode 100644 docs/system/devices/virtio-gpu.rst
 create mode 100644 hw/display/virtio-gpu-pci-rutabaga.c
 create mode 100644 hw/display/virtio-gpu-rutabaga.c
 create mode 100644 hw/display/virtio-vga-rutabaga.c

-- 
2.42.0.609.gbb76f46606-goog

[PATCH v17 6/9] gfxstream + rutabaga: add initial support for gfxstream

2023-10-05 Thread Gurchetan Singh

This adds initial support for gfxstream and cross-domain.  Both
features rely on virtio-gpu blob resources and context types, which
are also implemented in this patch.

gfxstream has a long and illustrious history in Android graphics
paravirtualization.  It has been powering graphics in the Android
Studio Emulator for more than a decade, which is the main developer
platform.

Originally conceived by Jesse Hall, it was first known as "EmuGL" [a].
The key design characteristic was a 1:1 threading model and
auto-generation, which fit nicely with the OpenGLES spec.  It also
allowed easy layering with ANGLE on the host, which provides the GLES
implementations on Windows or MacOS enviroments.

gfxstream has traditionally been maintained by a single engineer, and
between 2015 to 2021, the goldfish throne passed to Frank Yang.
Historians often remark this glorious reign ("pax gfxstreama" is the
academic term) was comparable to that of Augustus and both Queen
Elizabeths.  Just to name a few accomplishments in a resplendent
panoply: higher versions of GLES, address space graphics, snapshot
support and CTS compliant Vulkan [b].

One major drawback was the use of out-of-tree goldfish drivers.
Android engineers didn't know much about DRM/KMS and especially TTM so
a simple guest to host pipe was conceived.

Luckily, virtio-gpu 3D started to emerge in 2016 due to the work of
the Mesa/virglrenderer communities.  In 2018, the initial virtio-gpu
port of gfxstream was done by Cuttlefish enthusiast Alistair Delva.
It was a symbol compatible replacement of virglrenderer [c] and named
"AVDVirglrenderer".  This implementation forms the basis of the
current gfxstream host implementation still in use today.

cross-domain support follows a similar arc.  Originally conceived by
Wayland aficionado David Reveman and crosvm enjoyer Zach Reizner in
2018, it initially relied on the downstream "virtio-wl" device.

In 2020 and 2021, virtio-gpu was extended to include blob resources
and multiple timelines by yours truly, features gfxstream/cross-domain
both require to function correctly.

Right now, we stand at the precipice of a truly fantastic possibility:
the Android Emulator powered by upstream QEMU and upstream Linux
kernel.  gfxstream will then be packaged properfully, and app
developers can even fix gfxstream bugs on their own if they encounter
them.

It's been quite the ride, my friends.  Where will gfxstream head next,
nobody really knows.  I wouldn't be surprised if it's around for
another decade, maintained by a new generation of Android graphics
enthusiasts.

Technical details:
  - Very simple initial display integration: just used Pixman
  - Largely, 1:1 mapping of virtio-gpu hypercalls to rutabaga function
calls

Next steps for Android VMs:
  - The next step would be improving display integration and UI interfaces
with the goal of the QEMU upstream graphics being in an emulator
release [d].

Next steps for Linux VMs for display virtualization:
  - For widespread distribution, someone needs to package Sommelier or the
wayland-proxy-virtwl [e] ideally into Debian main. In addition, newer
versions of the Linux kernel come with DRM_VIRTIO_GPU_KMS option,
which allows disabling KMS hypercalls.  If anyone cares enough, it'll
probably be possible to build a custom VM variant that uses this display
virtualization strategy.

[a] https://android-review.googlesource.com/c/platform/development/+/34470
[b] 
https://android-review.googlesource.com/q/topic:%22vulkan-hostconnection-start%22
[c] 
https://android-review.googlesource.com/c/device/generic/goldfish-opengl/+/761927
[d] https://developer.android.com/studio/releases/emulator
[e] https://github.com/talex5/wayland-proxy-virtwl

Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Tested-by: Emmanouil Pitsidianakis 
Tested-by: Akihiko Odaki 
Reviewed-by: Emmanouil Pitsidianakis 
Reviewed-by: Antonio Caggiano 
Reviewed-by: Akihiko Odaki 
---
 hw/display/virtio-gpu-pci-rutabaga.c |   47 ++
 hw/display/virtio-gpu-rutabaga.c | 1113 ++
 hw/display/virtio-vga-rutabaga.c |   50 ++
 3 files changed, 1210 insertions(+)
 create mode 100644 hw/display/virtio-gpu-pci-rutabaga.c
 create mode 100644 hw/display/virtio-gpu-rutabaga.c
 create mode 100644 hw/display/virtio-vga-rutabaga.c

diff --git a/hw/display/virtio-gpu-pci-rutabaga.c 
b/hw/display/virtio-gpu-pci-rutabaga.c
new file mode 100644
index 00..c96729e198
--- /dev/null
+++ b/hw/display/virtio-gpu-pci-rutabaga.c
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/module.h"
+#include "hw/pci/pci.h"
+#include "hw/qdev-properties.h"
+#include "hw/virtio/virtio.h"
+#include "hw/virtio/virtio-bus.h"
+#include "hw/virtio/virtio-gpu-pci.h"
+#include "qom/object.h"
+
+#define TYPE_VIRTIO_GPU_RUTABAGA_PCI "virtio-gpu-rutabaga-pci"
+OBJECT_DECLARE_SIMPLE_TYPE(VirtIOGPURutabagaPCI,

Re: [RFC PATCH v2 1/9] Add Rust SEV library as subproject

2023-10-05 Thread Tyler Fanelli


On 10/5/23 2:03 AM, Philippe Mathieu-Daudé wrote:

Hi Tyler,

On 4/10/23 22:34, Tyler Fanelli wrote:

The Rust sev library provides a C API for the AMD SEV launch ioctls, as
well as the ability to build with meson. Add the Rust sev library as a
QEMU subproject with the goal of outsourcing all SEV launch ioctls to C
APIs provided by it.

Signed-off-by: Tyler Fanelli 
---
  meson.build   | 8 
  meson_options.txt | 2 ++
  scripts/meson-buildoptions.sh | 3 +++
  subprojects/sev.wrap  | 6 ++
  target/i386/meson.build   | 2 +-
  5 files changed, 20 insertions(+), 1 deletion(-)
  create mode 100644 subprojects/sev.wrap




diff --git a/subprojects/sev.wrap b/subprojects/sev.wrap
new file mode 100644
index 00..5be1faccf6
--- /dev/null
+++ b/subprojects/sev.wrap
@@ -0,0 +1,6 @@
+[wrap-git]
+url = https://github.com/tylerfanelli/sev
+revision = b81b1da5df50055600a5b0349b0c4afda677cccb


Why use your tree instead of the mainstream one?

Before this gets merged we need to mirror the subproject
on our GitLab namespace, then use the mirror URL here.

The required meson changes for the sev library are still in review, so 
I'm still working on a personal branch. Those patches are a blocker for 
this series right now.


This is moreso another RFC to get feedback on building Rust libraries as 
QEMU subprojects (and if this is the proper way to do so).



Tyler

Re: [PATCH v2 5/5] hw/intc/apic: Pass CPU using QOM link property

2023-10-05 Thread Paolo Bonzini


On 10/3/23 10:27, Philippe Mathieu-Daudé wrote:

-/* TODO: convert to link<> */
-apic = APIC_COMMON(cpu->apic_state);
-apic->cpu = cpu;
-apic->apicbase = APIC_DEFAULT_ADDRESS | MSR_IA32_APICBASE_ENABLE;
+qdev_prop_set_uint32(cpu->apic_state, "base-addr",
+ APIC_DEFAULT_ADDRESS | MSR_IA32_APIC


For this to use a link, it's missing the corresponding 
object_unref(apic->cpu) + apic->cpu = NULL assignment somewhere.  For 
example you can add it in apic_common_unrealize (called by 
device_unparent - which is called in turn by x86_cpu_unrealizefn).


Paolo

Re: [PATCH v2 4/5] hw/intc/apic: Rename x86_cpu_apic_create() -> x86_cpu_apic_new()

2023-10-05 Thread Paolo Bonzini


On 10/3/23 10:27, Philippe Mathieu-Daudé wrote:

-x86_cpu_apic_create(cpu, _err);
-if (local_err != NULL) {
-goto out;
-}
+x86_cpu_apic_new(cpu);


I don't like this, "*_new" is generally for functions that return what 
they create.


Patch 2 is scary with the newly-introduced possible failure, but I 
suppose it's safer if you reason that any problem will occur at startup, 
not at hotplug time for example.


Paolo

Re: [PATCH v2] coverity: physmem: use simple assertions instead of modelling

2023-10-05 Thread Paolo Bonzini

On Thu, Oct 5, 2023 at 4:04 PM Vladimir Sementsov-Ogievskiy
 wrote:
> +/*
> + * Assure Coverity (and ourselves) that we are not going to 
> OVERRUN
> + * the buffer by following ldn_he_p().
> + */
> +assert((l == 1 && len >= 1) ||
> +   (l == 2 && len >= 2) ||
> +   (l == 4 && len >= 4) ||
> +   (l == 8 && len >= 8));

I'll queue it shortly, but perhaps you can try if assert(l <= len) is enough?

Alternatively I can try applying the patch on top of the tree that we
test with, and see how things go.

Paolo

>  val = ldn_he_p(buf, l);
>  result |= memory_region_dispatch_write(mr, addr1, val,
> size_memop(l), attrs);
> @@ -2784,6 +2793,15 @@ MemTxResult flatview_read_continue(FlatView *fv, 
> hwaddr addr,
>  l = memory_access_size(mr, l, addr1);
>  result |= memory_region_dispatch_read(mr, addr1, ,
>size_memop(l), attrs);
> +
> +/*
> + * Assure Coverity (and ourselves) that we are not going to 
> OVERRUN
> + * the buffer by following stn_he_p().
> + */
> +assert((l == 1 && len >= 1) ||
> +   (l == 2 && len >= 2) ||
> +   (l == 4 && len >= 4) ||
> +   (l == 8 && len >= 8));
>  stn_he_p(buf, l, val);
>  } else {
>  /* RAM case */
> --
> 2.34.1
>

[PATCH v2 2/3] target/hexagon: fix some occurrences of -Wshadow=local

2023-10-05 Thread Brian Cain

Of the changes in this commit, the changes in `HELPER(commit_hvx_stores)()`
are less obvious.  They are required because of some macro invocations like
SCATTER_OP_WRITE_TO_MEM().

e.g.:

In file included from ../target/hexagon/op_helper.c:31:
../target/hexagon/mmvec/macros.h:205:18: error: declaration of ‘i’ shadows 
a previous local [-Werror=shadow=compatible-local]
  205 | for (int i = 0; i < sizeof(MMVector); i += sizeof(TYPE)) { \
  |  ^
../target/hexagon/op_helper.c:157:17: note: in expansion of macro 
‘SCATTER_OP_WRITE_TO_MEM’
  157 | SCATTER_OP_WRITE_TO_MEM(uint16_t);
  | ^~~
../target/hexagon/op_helper.c:135:9: note: shadowed declaration is here
  135 | int i;
  | ^
In file included from ../target/hexagon/op_helper.c:31:
../target/hexagon/mmvec/macros.h:204:19: error: declaration of ‘ra’ shadows 
a previous local [-Werror=shadow=compatible-local]
  204 | uintptr_t ra = GETPC(); \
  |   ^~
../target/hexagon/op_helper.c:160:17: note: in expansion of macro 
‘SCATTER_OP_WRITE_TO_MEM’
  160 | SCATTER_OP_WRITE_TO_MEM(uint32_t);
  | ^~~
../target/hexagon/op_helper.c:134:15: note: shadowed declaration is here
  134 | uintptr_t ra = GETPC();
  |   ^~

Reviewed-by: Matheus Tavares Bernardino 
Signed-off-by: Brian Cain 
---
 target/hexagon/imported/alu.idef | 6 +++---
 target/hexagon/mmvec/macros.h| 6 +++---
 target/hexagon/op_helper.c   | 9 +++--
 target/hexagon/translate.c   | 9 -
 4 files changed, 13 insertions(+), 17 deletions(-)

diff --git a/target/hexagon/imported/alu.idef b/target/hexagon/imported/alu.idef
index 12d2aac5d4..b855676989 100644
--- a/target/hexagon/imported/alu.idef
+++ b/target/hexagon/imported/alu.idef
@@ -1142,9 +1142,9 @@ 
Q6INSN(A4_cround_rr,"Rd32=cround(Rs32,Rt32)",ATTRIBS(),"Convergent Round", {RdV
 tmp128 = fSHIFTR128(tmp128, SHIFT);\
 DST =  fCAST16S_8S(tmp128);\
 } else {\
-size16s_t rndbit_128 =  fCAST8S_16S((1LL << (SHIFT - 1))); \
-size16s_t src_128 =  fCAST8S_16S(SRC); \
-size16s_t tmp128 = fADD128(src_128, rndbit_128);\
+rndbit_128 =  fCAST8S_16S((1LL << (SHIFT - 1))); \
+src_128 =  fCAST8S_16S(SRC); \
+tmp128 = fADD128(src_128, rndbit_128);\
 tmp128 = fSHIFTR128(tmp128, SHIFT);\
 DST =  fCAST16S_8S(tmp128);\
 }
diff --git a/target/hexagon/mmvec/macros.h b/target/hexagon/mmvec/macros.h
index a655634fd1..728a63d35f 100644
--- a/target/hexagon/mmvec/macros.h
+++ b/target/hexagon/mmvec/macros.h
@@ -201,14 +201,14 @@
 } while (0)
 #define SCATTER_OP_WRITE_TO_MEM(TYPE) \
 do { \
-uintptr_t ra = GETPC(); \
+uintptr_t ra_ = GETPC(); \
 for (int i = 0; i < sizeof(MMVector); i += sizeof(TYPE)) { \
 if (test_bit(i, env->vtcm_log.mask)) { \
 TYPE dst = 0; \
 TYPE inc = 0; \
 for (int j = 0; j < sizeof(TYPE); j++) { \
 uint8_t val; \
-val = cpu_ldub_data_ra(env, env->vtcm_log.va[i + j], ra); \
+val = cpu_ldub_data_ra(env, env->vtcm_log.va[i + j], ra_); 
\
 dst |= val << (8 * j); \
 inc |= env->vtcm_log.data.ub[j + i] << (8 * j); \
 clear_bit(j + i, env->vtcm_log.mask); \
@@ -217,7 +217,7 @@
 dst += inc; \
 for (int j = 0; j < sizeof(TYPE); j++) { \
 cpu_stb_data_ra(env, env->vtcm_log.va[i + j], \
-(dst >> (8 * j)) & 0xFF, ra); \
+(dst >> (8 * j)) & 0xFF, ra_); \
 } \
 } \
 } \
diff --git a/target/hexagon/op_helper.c b/target/hexagon/op_helper.c
index 8ca3976a65..da10ac5847 100644
--- a/target/hexagon/op_helper.c
+++ b/target/hexagon/op_helper.c
@@ -132,10 +132,9 @@ void HELPER(gather_store)(CPUHexagonState *env, uint32_t 
addr, int slot)
 void HELPER(commit_hvx_stores)(CPUHexagonState *env)
 {
 uintptr_t ra = GETPC();
-int i;
 
 /* Normal (possibly masked) vector store */
-for (i = 0; i < VSTORES_MAX; i++) {
+for (int i = 0; i < VSTORES_MAX; i++) {
 if (env->vstore_pending[i]) {
 env->vstore_pending[i] = 0;
 target_ulong va = env->vstore[i].va;
@@ -162,7 +161,7 @@ void HELPER(commit_hvx_stores)(CPUHexagonState *env)
 g_assert_not_reached();
 }
 } else {
-for (i = 0; i < sizeof(MMVector); i++) {
+for (int i = 0; i < sizeof(MMVector); i++) {
 if (test_bit(i, env->vtcm_log.mask)) {
 cpu_stb_data_ra(env, env->vtcm_log.va[i],

[PATCH v2 1/3] target/hexagon: move GETPC() calls to top level helpers

2023-10-05 Thread Brian Cain

From: Matheus Tavares Bernardino 

As docs/devel/loads-stores.rst states:

  ``GETPC()`` should be used with great care: calling
  it in other functions that are *not* the top level
  ``HELPER(foo)`` will cause unexpected behavior. Instead, the
  value of ``GETPC()`` should be read from the helper and passed
  if needed to the functions that the helper calls.

Let's fix the GETPC() usage in Hexagon, making sure it's always called
from top level helpers and passed down to the places where it's
needed. There are a few snippets where that is not currently the case:

- probe_store(), which is only called from two helpers, so it's easy to
  move GETPC() up.

- mem_load*() functions, which are also called directly from helpers,
  but through the MEM_LOAD*() set of macros. Note that this are only
  used when compiling with --disable-hexagon-idef-parser.

  In this case, we also take this opportunity to simplify the code,
  unifying the mem_load*() functions.

- HELPER(probe_hvx_stores), when called from another helper, ends up
  using its own GETPC() expansion instead of the top level caller.

Signed-off-by: Matheus Tavares Bernardino 
Reviewed-by: Taylor Simpson 
Message-Id: 
<2c74c3696946edba7cc5b2942cf296a5af532052.1689070412.git.quic_mathb...@quicinc.com>-ne
Reviewed-by: Brian Cain 
Signed-off-by: Brian Cain 
---
 target/hexagon/macros.h| 19 +-
 target/hexagon/op_helper.c | 75 +++---
 target/hexagon/op_helper.h |  9 -
 3 files changed, 38 insertions(+), 65 deletions(-)

diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
index 5451b061ee..dafa0df6ed 100644
--- a/target/hexagon/macros.h
+++ b/target/hexagon/macros.h
@@ -173,15 +173,6 @@
 #define MEM_STORE8(VA, DATA, SLOT) \
 MEM_STORE8_FUNC(DATA)(cpu_env, VA, DATA, SLOT)
 #else
-#define MEM_LOAD1s(VA) ((int8_t)mem_load1(env, pkt_has_store_s1, slot, VA))
-#define MEM_LOAD1u(VA) ((uint8_t)mem_load1(env, pkt_has_store_s1, slot, VA))
-#define MEM_LOAD2s(VA) ((int16_t)mem_load2(env, pkt_has_store_s1, slot, VA))
-#define MEM_LOAD2u(VA) ((uint16_t)mem_load2(env, pkt_has_store_s1, slot, VA))
-#define MEM_LOAD4s(VA) ((int32_t)mem_load4(env, pkt_has_store_s1, slot, VA))
-#define MEM_LOAD4u(VA) ((uint32_t)mem_load4(env, pkt_has_store_s1, slot, VA))
-#define MEM_LOAD8s(VA) ((int64_t)mem_load8(env, pkt_has_store_s1, slot, VA))
-#define MEM_LOAD8u(VA) ((uint64_t)mem_load8(env, pkt_has_store_s1, slot, VA))
-
 #define MEM_STORE1(VA, DATA, SLOT) log_store32(env, VA, DATA, 1, SLOT)
 #define MEM_STORE2(VA, DATA, SLOT) log_store32(env, VA, DATA, 2, SLOT)
 #define MEM_STORE4(VA, DATA, SLOT) log_store32(env, VA, DATA, 4, SLOT)
@@ -530,8 +521,16 @@ static inline TCGv gen_read_ireg(TCGv result, TCGv val, 
int shift)
 #ifdef QEMU_GENERATE
 #define fLOAD(NUM, SIZE, SIGN, EA, DST) MEM_LOAD##SIZE##SIGN(DST, EA)
 #else
+#define MEM_LOAD1 cpu_ldub_data_ra
+#define MEM_LOAD2 cpu_lduw_data_ra
+#define MEM_LOAD4 cpu_ldl_data_ra
+#define MEM_LOAD8 cpu_ldq_data_ra
+
 #define fLOAD(NUM, SIZE, SIGN, EA, DST) \
-DST = (size##SIZE##SIGN##_t)MEM_LOAD##SIZE##SIGN(EA)
+do { \
+check_noshuf(env, pkt_has_store_s1, slot, EA, SIZE, GETPC()); \
+DST = (size##SIZE##SIGN##_t)MEM_LOAD##SIZE(env, EA, GETPC()); \
+} while (0)
 #endif
 
 #define fMEMOP(NUM, SIZE, SIGN, EA, FNTYPE, VALUE)
diff --git a/target/hexagon/op_helper.c b/target/hexagon/op_helper.c
index 12967ac21e..8ca3976a65 100644
--- a/target/hexagon/op_helper.c
+++ b/target/hexagon/op_helper.c
@@ -95,9 +95,8 @@ void HELPER(debug_check_store_width)(CPUHexagonState *env, 
int slot, int check)
 }
 }
 
-void HELPER(commit_store)(CPUHexagonState *env, int slot_num)
+static void commit_store(CPUHexagonState *env, int slot_num, uintptr_t ra)
 {
-uintptr_t ra = GETPC();
 uint8_t width = env->mem_log_stores[slot_num].width;
 target_ulong va = env->mem_log_stores[slot_num].va;
 
@@ -119,6 +118,12 @@ void HELPER(commit_store)(CPUHexagonState *env, int 
slot_num)
 }
 }
 
+void HELPER(commit_store)(CPUHexagonState *env, int slot_num)
+{
+uintptr_t ra = GETPC();
+commit_store(env, slot_num, ra);
+}
+
 void HELPER(gather_store)(CPUHexagonState *env, uint32_t addr, int slot)
 {
 mem_gather_store(env, addr, slot);
@@ -467,13 +472,12 @@ int32_t HELPER(cabacdecbin_pred)(int64_t RssV, int64_t 
RttV)
 }
 
 static void probe_store(CPUHexagonState *env, int slot, int mmu_idx,
-bool is_predicated)
+bool is_predicated, uintptr_t retaddr)
 {
 if (!is_predicated || !(env->slot_cancelled & (1 << slot))) {
 size1u_t width = env->mem_log_stores[slot].width;
 target_ulong va = env->mem_log_stores[slot].va;
-uintptr_t ra = GETPC();
-probe_write(env, va, width, mmu_idx, ra);
+probe_write(env, va, width, mmu_idx, retaddr);
 }
 }
 
@@ -494,12 +498,13 @@ void HELPER(probe_pkt_scalar_store_s0)(CPUHexagonState 
*env, int args)
 int mmu_idx = FIELD_EX32(args,

[PATCH v2 3/3] target/hexagon: avoid shadowing globals

2023-10-05 Thread Brian Cain

The typedef `vaddr` is shadowed by `vaddr` identifiers, so we rename the
identifiers to avoid shadowing the type name.

The global `cpu_env` is shadowed by local `cpu_env` arguments, so we
rename the function arguments to avoid shadowing the global.

Signed-off-by: Brian Cain 
---
 target/hexagon/genptr.c | 56 -
 target/hexagon/genptr.h | 18 
 target/hexagon/mmvec/system_ext_mmvec.c |  4 +-
 target/hexagon/mmvec/system_ext_mmvec.h |  2 +-
 target/hexagon/op_helper.c  |  4 +-
 5 files changed, 42 insertions(+), 42 deletions(-)

diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 217bc7bb5a..11377ac92b 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -334,28 +334,28 @@ void gen_set_byte_i64(int N, TCGv_i64 result, TCGv src)
 tcg_gen_deposit_i64(result, result, src64, N * 8, 8);
 }
 
-static inline void gen_load_locked4u(TCGv dest, TCGv vaddr, int mem_index)
+static inline void gen_load_locked4u(TCGv dest, TCGv v_addr, int mem_index)
 {
-tcg_gen_qemu_ld_tl(dest, vaddr, mem_index, MO_TEUL);
-tcg_gen_mov_tl(hex_llsc_addr, vaddr);
+tcg_gen_qemu_ld_tl(dest, v_addr, mem_index, MO_TEUL);
+tcg_gen_mov_tl(hex_llsc_addr, v_addr);
 tcg_gen_mov_tl(hex_llsc_val, dest);
 }
 
-static inline void gen_load_locked8u(TCGv_i64 dest, TCGv vaddr, int mem_index)
+static inline void gen_load_locked8u(TCGv_i64 dest, TCGv v_addr, int mem_index)
 {
-tcg_gen_qemu_ld_i64(dest, vaddr, mem_index, MO_TEUQ);
-tcg_gen_mov_tl(hex_llsc_addr, vaddr);
+tcg_gen_qemu_ld_i64(dest, v_addr, mem_index, MO_TEUQ);
+tcg_gen_mov_tl(hex_llsc_addr, v_addr);
 tcg_gen_mov_i64(hex_llsc_val_i64, dest);
 }
 
 static inline void gen_store_conditional4(DisasContext *ctx,
-  TCGv pred, TCGv vaddr, TCGv src)
+  TCGv pred, TCGv v_addr, TCGv src)
 {
 TCGLabel *fail = gen_new_label();
 TCGLabel *done = gen_new_label();
 TCGv one, zero, tmp;
 
-tcg_gen_brcond_tl(TCG_COND_NE, vaddr, hex_llsc_addr, fail);
+tcg_gen_brcond_tl(TCG_COND_NE, v_addr, hex_llsc_addr, fail);
 
 one = tcg_constant_tl(0xff);
 zero = tcg_constant_tl(0);
@@ -374,13 +374,13 @@ static inline void gen_store_conditional4(DisasContext 
*ctx,
 }
 
 static inline void gen_store_conditional8(DisasContext *ctx,
-  TCGv pred, TCGv vaddr, TCGv_i64 src)
+  TCGv pred, TCGv v_addr, TCGv_i64 src)
 {
 TCGLabel *fail = gen_new_label();
 TCGLabel *done = gen_new_label();
 TCGv_i64 one, zero, tmp;
 
-tcg_gen_brcond_tl(TCG_COND_NE, vaddr, hex_llsc_addr, fail);
+tcg_gen_brcond_tl(TCG_COND_NE, v_addr, hex_llsc_addr, fail);
 
 one = tcg_constant_i64(0xff);
 zero = tcg_constant_i64(0);
@@ -407,57 +407,57 @@ static TCGv gen_slotval(DisasContext *ctx)
 }
 #endif
 
-void gen_store32(TCGv vaddr, TCGv src, int width, uint32_t slot)
+void gen_store32(TCGv v_addr, TCGv src, int width, uint32_t slot)
 {
-tcg_gen_mov_tl(hex_store_addr[slot], vaddr);
+tcg_gen_mov_tl(hex_store_addr[slot], v_addr);
 tcg_gen_movi_tl(hex_store_width[slot], width);
 tcg_gen_mov_tl(hex_store_val32[slot], src);
 }
 
-void gen_store1(TCGv_env cpu_env, TCGv vaddr, TCGv src, uint32_t slot)
+void gen_store1(TCGv_env cpu_env_, TCGv v_addr, TCGv src, uint32_t slot)
 {
-gen_store32(vaddr, src, 1, slot);
+gen_store32(v_addr, src, 1, slot);
 }
 
-void gen_store1i(TCGv_env cpu_env, TCGv vaddr, int32_t src, uint32_t slot)
+void gen_store1i(TCGv_env cpu_env_, TCGv v_addr, int32_t src, uint32_t slot)
 {
 TCGv tmp = tcg_constant_tl(src);
-gen_store1(cpu_env, vaddr, tmp, slot);
+gen_store1(cpu_env_, v_addr, tmp, slot);
 }
 
-void gen_store2(TCGv_env cpu_env, TCGv vaddr, TCGv src, uint32_t slot)
+void gen_store2(TCGv_env cpu_env_, TCGv v_addr, TCGv src, uint32_t slot)
 {
-gen_store32(vaddr, src, 2, slot);
+gen_store32(v_addr, src, 2, slot);
 }
 
-void gen_store2i(TCGv_env cpu_env, TCGv vaddr, int32_t src, uint32_t slot)
+void gen_store2i(TCGv_env cpu_env_, TCGv v_addr, int32_t src, uint32_t slot)
 {
 TCGv tmp = tcg_constant_tl(src);
-gen_store2(cpu_env, vaddr, tmp, slot);
+gen_store2(cpu_env_, v_addr, tmp, slot);
 }
 
-void gen_store4(TCGv_env cpu_env, TCGv vaddr, TCGv src, uint32_t slot)
+void gen_store4(TCGv_env cpu_env_, TCGv v_addr, TCGv src, uint32_t slot)
 {
-gen_store32(vaddr, src, 4, slot);
+gen_store32(v_addr, src, 4, slot);
 }
 
-void gen_store4i(TCGv_env cpu_env, TCGv vaddr, int32_t src, uint32_t slot)
+void gen_store4i(TCGv_env cpu_env_, TCGv v_addr, int32_t src, uint32_t slot)
 {
 TCGv tmp = tcg_constant_tl(src);
-gen_store4(cpu_env, vaddr, tmp, slot);
+gen_store4(cpu_env_, v_addr, tmp, slot);
 }
 
-void gen_store8(TCGv_env cpu_env, TCGv vaddr, TCGv_i64 src, uint32_t slot)
+void gen_store8(TCGv_env cpu_env_, TCGv

[PATCH v2 0/3] hexagon: GETPC() and shadowing fixes

2023-10-05 Thread Brian Cain

In v2: reworked with suggestions from Philippe and added a new patch
to cover -Wshadow=global.

Brian Cain (2):
  target/hexagon: fix some occurrences of -Wshadow=local
  target/hexagon: avoid shadowing globals

Matheus Tavares Bernardino (1):
  target/hexagon: move GETPC() calls to top level helpers

 target/hexagon/genptr.c | 56 -
 target/hexagon/genptr.h | 18 +++---
 target/hexagon/imported/alu.idef|  6 +-
 target/hexagon/macros.h | 19 +++---
 target/hexagon/mmvec/macros.h   |  6 +-
 target/hexagon/mmvec/system_ext_mmvec.c |  4 +-
 target/hexagon/mmvec/system_ext_mmvec.h |  2 +-
 target/hexagon/op_helper.c  | 84 ++---
 target/hexagon/op_helper.h  |  9 ---
 target/hexagon/translate.c  |  9 ++-
 10 files changed, 91 insertions(+), 122 deletions(-)

-- 
2.25.1

[PATCH 3/3] hw/ppc: Add emulation of AmigaOne XE board

2023-10-05 Thread BALATON Zoltan

The AmigaOne is a rebranded MAI Teron board that uses U-Boot firmware
with patches to support AmigaOS and is very similar to pegasos2 so can
be easily emulated sharing most code with pegasos2. The reason to
emulate it is that AmigaOS comes in different versions for AmigaOne
and PegasosII which only have drivers for one machine and firmware so
these only run on the specific machine. Adding this board allows
another AmigaOS version to be used reusing already existing peagasos2
emulation. (The AmigaOne was the first of these boards so likely most
widespread which then inspired Pegasos that was later replaced with
PegasosII due to problems with Articia S, so these have a lot of
similarity. Pegasos mainly ran MorphOS while the PegasosII version of
AmigaOS was added later and therefore less common than the AmigaOne
version.)

Signed-off-by: BALATON Zoltan 
---
 MAINTAINERS |   8 ++
 configs/devices/ppc-softmmu/default.mak |   1 +
 hw/ppc/Kconfig  |   7 +
 hw/ppc/amigaone.c   | 164 
 hw/ppc/meson.build  |   2 +
 5 files changed, 182 insertions(+)
 create mode 100644 hw/ppc/amigaone.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 7f0e20fde6..03f908c153 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1490,6 +1490,14 @@ F: hw/pci-host/mv64361.c
 F: hw/pci-host/mv643xx.h
 F: include/hw/pci-host/mv64361.h
 
+amigaone
+M: BALATON Zoltan 
+L: qemu-...@nongnu.org
+S: Maintained
+F: hw/ppc/amigaone.c
+F: hw/pci-host/articia.c
+F: include/hw/pci-host/articia.h
+
 Virtual Open Firmware (VOF)
 M: Alexey Kardashevskiy 
 R: David Gibson 
diff --git a/configs/devices/ppc-softmmu/default.mak 
b/configs/devices/ppc-softmmu/default.mak
index a887f5438b..b85fd2bcd7 100644
--- a/configs/devices/ppc-softmmu/default.mak
+++ b/configs/devices/ppc-softmmu/default.mak
@@ -14,6 +14,7 @@ CONFIG_SAM460EX=y
 CONFIG_MAC_OLDWORLD=y
 CONFIG_MAC_NEWWORLD=y
 
+CONFIG_AMIGAONE=y
 CONFIG_PEGASOS2=y
 
 # For PReP
diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig
index 5dfbf47ef5..56f0475a8e 100644
--- a/hw/ppc/Kconfig
+++ b/hw/ppc/Kconfig
@@ -69,6 +69,13 @@ config SAM460EX
 select USB_OHCI
 select FDT_PPC
 
+config AMIGAONE
+bool
+imply ATI_VGA
+select ARTICIA
+select VT82C686
+select SMBUS_EEPROM
+
 config PEGASOS2
 bool
 imply ATI_VGA
diff --git a/hw/ppc/amigaone.c b/hw/ppc/amigaone.c
new file mode 100644
index 00..3589924c8a
--- /dev/null
+++ b/hw/ppc/amigaone.c
@@ -0,0 +1,164 @@
+/*
+ * QEMU Eyetech AmigaOne/Mai Logic Teron emulation
+ *
+ * Copyright (c) 2023 BALATON Zoltan
+ *
+ * This work is licensed under the GNU GPL license version 2 or later.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "qemu/datadir.h"
+#include "qemu/log.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+#include "hw/ppc/ppc.h"
+#include "hw/boards.h"
+#include "hw/loader.h"
+#include "hw/pci-host/articia.h"
+#include "hw/isa/vt82c686.h"
+#include "hw/ide/pci.h"
+#include "hw/i2c/smbus_eeprom.h"
+#include "hw/ppc/ppc.h"
+#include "sysemu/reset.h"
+#include "kvm_ppc.h"
+
+#define BUS_FREQ_HZ 1
+
+/*
+ * Firmware binary available at
+ * 
https://www.hyperion-entertainment.com/index.php/downloads?view=files=28
+ * then "tail -c 524288 updater.image >u-boot-amigaone.bin"
+ *
+ * BIOS emulator in firmware cannot run QEMU vgabios and hangs on it, use
+ * -device VGA,romfile=VGABIOS-lgpl-latest.bin
+ * from http://www.nongnu.org/vgabios/ instead.
+ */
+#define PROM_FILENAME "u-boot-amigaone.bin"
+#define PROM_ADDR 0xfff0
+#define PROM_SIZE (512 * KiB)
+
+static void amigaone_cpu_reset(void *opaque)
+{
+PowerPCCPU *cpu = opaque;
+
+cpu_reset(CPU(cpu));
+cpu_ppc_tb_reset(>env);
+}
+
+static void fix_spd_data(uint8_t *spd)
+{
+uint32_t bank_size = 4 * MiB * spd[31];
+uint32_t rows = bank_size / spd[13] / spd[17];
+spd[3] = ctz32(rows) - spd[4];
+}
+
+static void amigaone_init(MachineState *machine)
+{
+PowerPCCPU *cpu;
+CPUPPCState *env;
+MemoryRegion *rom, *pci_mem, *mr;
+const char *fwname = machine->firmware ?: PROM_FILENAME;
+char *filename;
+ssize_t sz;
+PCIBus *pci_bus;
+Object *via;
+DeviceState *dev;
+I2CBus *i2c_bus;
+uint8_t *spd_data;
+int i;
+
+/* init CPU */
+cpu = POWERPC_CPU(cpu_create(machine->cpu_type));
+env = >env;
+if (PPC_INPUT(env) != PPC_FLAGS_INPUT_6xx) {
+error_report("Incompatible CPU, only 6xx bus supported");
+exit(1);
+}
+cpu_ppc_tb_init(env, BUS_FREQ_HZ / 4);
+qemu_register_reset(amigaone_cpu_reset, cpu);
+
+/* RAM */
+if (machine->ram_size > 2 * GiB) {
+error_report("RAM size more than 2 GiB is not supported");
+exit(1);
+}
+memory_region_add_subregion(get_system_memory(), 0, machine->ram);
+if (machine->ram_size < 1 * GiB + 32 * KiB) {
+/* Firmware uses this area for startup */
+mr =

[PATCH 0/3] Add emulation of AmigaOne XE board

2023-10-05 Thread BALATON Zoltan

This small series adds amigaone PPC machine which can be emulated
mostly reusing existing models used by pegasos2 as these machines are
very similar. The reason to add another board is that AmigaOS has
different versions for different machines that only run on that
machine and the AmigaOne version is more common than the PegasosII one.
Also it's useful for debugging to be able to test with both machines
as problems in emulation can be identified if they occur on both
machines as opposed to problems specific only to one version. Since
this only needs a very minimal north bridge emulation and a simple
board code with everything else shared with pegasos2 it is not a big
maintenance burden.

The board uses a modofied U-Boot that is needed to boot AmigaOS which
is freely available and distributable under GPL (see comment in
hw/ppc/amigaone.c added in patch 3 for details on how to get firmware
binary) but the sources for it could not be recovered so I could not
reproduce it from source. Ideally this firmware should be included
here for convenience which should be OK due to its GPL licence but
since there's only a binary I did not include it here. Let me know if
that would be OK to include but even without that firmware adding the
machine is useful as users can get the rom binary separately which
they are used to as pegasos2 also needed a firmware image to boot
AmigaOS until recently.

I hope this could be merged for 8.2 with this series being the minimum
set of patches needed to add this machine. This was tested by a Few
people already and can run AmigaOS and Linux distro for AmigaOne well.

Regards,
BALATON Zoltan

BALATON Zoltan (3):
  via-ide: Fix legacy mode emulation
  hw/pci-host: Add emulation of Mai Logic Articia S
  hw/ppc: Add emulation of AmigaOne XE board

 MAINTAINERS |   8 +
 configs/devices/ppc-softmmu/default.mak |   1 +
 hw/ide/via.c|  35 +++-
 hw/pci-host/Kconfig |   5 +
 hw/pci-host/articia.c   | 266 
 hw/pci-host/meson.build |   2 +
 hw/ppc/Kconfig  |   7 +
 hw/ppc/amigaone.c   | 164 +++
 hw/ppc/meson.build  |   2 +
 include/hw/pci-host/articia.h   |  17 ++
 10 files changed, 502 insertions(+), 5 deletions(-)
 create mode 100644 hw/pci-host/articia.c
 create mode 100644 hw/ppc/amigaone.c
 create mode 100644 include/hw/pci-host/articia.h

-- 
2.30.9

[PATCH 2/3] hw/pci-host: Add emulation of Mai Logic Articia S

2023-10-05 Thread BALATON Zoltan

The Articia S is a generic chipset supporting several different CPUs
that were used on some PPC boards. This is a minimal emulation of the
parts needed for emulating the AmigaOne board.

Signed-off-by: BALATON Zoltan 
---
 hw/pci-host/Kconfig   |   5 +
 hw/pci-host/articia.c | 266 ++
 hw/pci-host/meson.build   |   2 +
 include/hw/pci-host/articia.h |  17 +++
 4 files changed, 290 insertions(+)
 create mode 100644 hw/pci-host/articia.c
 create mode 100644 include/hw/pci-host/articia.h

diff --git a/hw/pci-host/Kconfig b/hw/pci-host/Kconfig
index a07070eddf..33014c80a4 100644
--- a/hw/pci-host/Kconfig
+++ b/hw/pci-host/Kconfig
@@ -73,6 +73,11 @@ config SH_PCI
 bool
 select PCI
 
+config ARTICIA
+bool
+select PCI
+select I8259
+
 config MV64361
 bool
 select PCI
diff --git a/hw/pci-host/articia.c b/hw/pci-host/articia.c
new file mode 100644
index 00..80558e1c47
--- /dev/null
+++ b/hw/pci-host/articia.c
@@ -0,0 +1,266 @@
+/*
+ * Mai Logic Articia S emulation
+ *
+ * Copyright (c) 2023 BALATON Zoltan
+ *
+ * This work is licensed under the GNU GPL license version 2 or later.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qapi/error.h"
+#include "hw/pci/pci_device.h"
+#include "hw/pci/pci_host.h"
+#include "hw/irq.h"
+#include "hw/i2c/bitbang_i2c.h"
+#include "hw/intc/i8259.h"
+#include "hw/pci-host/articia.h"
+
+OBJECT_DECLARE_SIMPLE_TYPE(ArticiaState, ARTICIA)
+
+OBJECT_DECLARE_SIMPLE_TYPE(ArticiaHostState, ARTICIA_PCI_HOST)
+struct ArticiaHostState {
+PCIDevice parent_obj;
+
+ArticiaState *as;
+};
+
+/* TYPE_ARTICIA */
+
+struct ArticiaState {
+PCIHostState parent_obj;
+
+qemu_irq irq[PCI_NUM_PINS];
+MemoryRegion io;
+MemoryRegion mem;
+MemoryRegion reg;
+
+bitbang_i2c_interface smbus;
+uint32_t gpio; /* bits 0-7 in, 8-15 out, 16-23 direction (0 in, 1 out) */
+hwaddr gpio_base;
+MemoryRegion gpio_reg;
+};
+
+static uint64_t articia_gpio_read(void *opaque, hwaddr addr, unsigned int size)
+{
+ArticiaState *s = opaque;
+
+return (s->gpio >> (addr * 8)) & 0xff;
+}
+
+static void articia_gpio_write(void *opaque, hwaddr addr, uint64_t val,
+   unsigned int size)
+{
+ArticiaState *s = opaque;
+uint32_t sh = addr * 8;
+
+if (addr == 0) {
+/* in bits read only? */
+return;
+}
+
+if ((s->gpio & (0xff << sh)) != (val & 0xff) << sh) {
+s->gpio &= ~(0xff << sh | 0xff);
+s->gpio |= (val & 0xff) << sh;
+s->gpio |= bitbang_i2c_set(>smbus, BITBANG_I2C_SDA,
+   s->gpio & BIT(16) ?
+   !!(s->gpio & BIT(8)) : 1);
+if ((s->gpio & BIT(17))) {
+s->gpio &= ~BIT(0);
+s->gpio |= bitbang_i2c_set(>smbus, BITBANG_I2C_SCL,
+   !!(s->gpio & BIT(9)));
+}
+}
+}
+
+static const MemoryRegionOps articia_gpio_ops = {
+.read = articia_gpio_read,
+.write = articia_gpio_write,
+.valid.min_access_size = 1,
+.valid.max_access_size = 1,
+.endianness = DEVICE_LITTLE_ENDIAN,
+};
+
+static uint64_t articia_reg_read(void *opaque, hwaddr addr, unsigned int size)
+{
+ArticiaState *s = opaque;
+uint64_t ret = UINT_MAX;
+
+switch (addr) {
+case 0xc00cf8:
+ret = pci_host_conf_le_ops.read(PCI_HOST_BRIDGE(s), 0, size);
+break;
+case 0xe00cfc ... 0xe00cff:
+ret = pci_host_data_le_ops.read(PCI_HOST_BRIDGE(s), addr - 0xe00cfc, 
size);
+break;
+case 0xf0:
+ret = pic_read_irq(isa_pic);
+break;
+default:
+qemu_log_mask(LOG_UNIMP, "%s: Unimplemented register read 0x%"
+  HWADDR_PRIx " %d\n", __func__, addr, size);
+break;
+}
+return ret;
+}
+
+static void articia_reg_write(void *opaque, hwaddr addr, uint64_t val,
+  unsigned int size)
+{
+ArticiaState *s = opaque;
+
+switch (addr) {
+case 0xc00cf8:
+pci_host_conf_le_ops.write(PCI_HOST_BRIDGE(s), 0, val, size);
+break;
+case 0xe00cfc ... 0xe00cff:
+pci_host_data_le_ops.write(PCI_HOST_BRIDGE(s), addr, val, size);
+break;
+default:
+qemu_log_mask(LOG_UNIMP, "%s: Unimplemented register write 0x%"
+  HWADDR_PRIx " %d <- %"PRIx64"\n", __func__, addr, size, 
val);
+break;
+}
+}
+
+static const MemoryRegionOps articia_reg_ops = {
+.read = articia_reg_read,
+.write = articia_reg_write,
+.valid.min_access_size = 1,
+.valid.max_access_size = 4,
+.endianness = DEVICE_LITTLE_ENDIAN,
+};
+
+static void articia_pcihost_set_irq(void *opaque, int n, int level)
+{
+ArticiaState *s = opaque;
+qemu_set_irq(s->irq[n], level);
+}
+
+static void articia_realize(DeviceState *dev, Error **errp)
+{
+ArticiaState *s = ARTICIA(dev);
+PCIHostState *h =

[PATCH 1/3] via-ide: Fix legacy mode emulation

2023-10-05 Thread BALATON Zoltan

The initial value for BARs were set in reset method for emulating
legacy mode at start but this does not work because PCI code resets
BARs after calling device reset method. Additionally the values
written to BARs were also wrong.

Move setting the BARs to a callback on writing the PCI config regsiter
that sets the compatibility mode (which firmwares needing this mode
seem to do) and fix their values to program it to use legacy port
numbers. As noted in a comment, we only do this when the BARs were
unset before, because logs from real machine show this is how real
chip works, even if it contradicts the data sheet which is not very
clear about this.

Signed-off-by: BALATON Zoltan 
---
 hw/ide/via.c | 35 ++-
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/hw/ide/via.c b/hw/ide/via.c
index fff23803a6..8186190207 100644
--- a/hw/ide/via.c
+++ b/hw/ide/via.c
@@ -132,11 +132,6 @@ static void via_ide_reset(DeviceState *dev)
 pci_set_word(pci_conf + PCI_STATUS, PCI_STATUS_FAST_BACK |
  PCI_STATUS_DEVSEL_MEDIUM);
 
-pci_set_long(pci_conf + PCI_BASE_ADDRESS_0, 0x01f0);
-pci_set_long(pci_conf + PCI_BASE_ADDRESS_1, 0x03f4);
-pci_set_long(pci_conf + PCI_BASE_ADDRESS_2, 0x0170);
-pci_set_long(pci_conf + PCI_BASE_ADDRESS_3, 0x0374);
-pci_set_long(pci_conf + PCI_BASE_ADDRESS_4, 0xcc01); /* BMIBA: 20-23h 
*/
 pci_set_long(pci_conf + PCI_INTERRUPT_LINE, 0x010e);
 
 /* IDE chip enable, IDE configuration 1/2, IDE FIFO Configuration*/
@@ -159,6 +154,35 @@ static void via_ide_reset(DeviceState *dev)
 pci_set_long(pci_conf + 0xc0, 0x00020001);
 }
 
+static void via_ide_cfg_write(PCIDevice *pd, uint32_t addr,
+  uint32_t val, int len)
+{
+pci_default_write_config(pd, addr, val, len);
+/*
+ * Only set BARs if they are unset. Logs from real hardware show that
+ * writing class_prog to enable compatibility mode after BARs were set
+ * (possibly by firmware) it will use ports set by BARs not ISA ports
+ * (e.g. pegasos2 Linux does this and calls it non-100% native mode).
+ * But if 0x8a is written after reset without setting BARs then we want
+ * legacy ports (this is done by AmigaOne firmware for example).
+ */
+if (addr == PCI_CLASS_PROG && val == 0x8a &&
+pci_get_long(pd->config + PCI_BASE_ADDRESS_0) ==
+PCI_BASE_ADDRESS_SPACE_IO) {
+pci_set_long(pd->config + PCI_BASE_ADDRESS_0, 0x1f0
+ | PCI_BASE_ADDRESS_SPACE_IO);
+pci_set_long(pd->config + PCI_BASE_ADDRESS_1, 0x3f6
+ | PCI_BASE_ADDRESS_SPACE_IO);
+pci_set_long(pd->config + PCI_BASE_ADDRESS_2, 0x170
+ | PCI_BASE_ADDRESS_SPACE_IO);
+pci_set_long(pd->config + PCI_BASE_ADDRESS_3, 0x376
+ | PCI_BASE_ADDRESS_SPACE_IO);
+/* BMIBA: 20-23h */
+pci_set_long(pd->config + PCI_BASE_ADDRESS_4, 0xcc00
+ | PCI_BASE_ADDRESS_SPACE_IO);
+}
+}
+
 static void via_ide_realize(PCIDevice *dev, Error **errp)
 {
 PCIIDEState *d = PCI_IDE(dev);
@@ -221,6 +245,7 @@ static void via_ide_class_init(ObjectClass *klass, void 
*data)
 /* Reason: only works as function of VIA southbridge */
 dc->user_creatable = false;
 
+k->config_write = via_ide_cfg_write;
 k->realize = via_ide_realize;
 k->exit = via_ide_exitfn;
 k->vendor_id = PCI_VENDOR_ID_VIA;
-- 
2.30.9

Re: [PATCH v3 10/10] tests/migration-test: Add a test for postcopy hangs during RECOVER

2023-10-05 Thread Fabiano Rosas

Peter Xu  writes:

> On Thu, Oct 05, 2023 at 06:10:20PM -0300, Fabiano Rosas wrote:
>> Peter Xu  writes:
>> 
>> > On Thu, Oct 05, 2023 at 10:37:56AM -0300, Fabiano Rosas wrote:
>> >> >> +/*
>> >> >> + * Make sure both QEMU instances will go into RECOVER stage, then 
>> >> >> test
>> >> >> + * kicking them out using migrate-pause.
>> >> >> + */
>> >> >> +wait_for_postcopy_status(from, "postcopy-recover");
>> >> >> +wait_for_postcopy_status(to, "postcopy-recover");
>> >> >
>> >> > Is this wait out of place? I think we're trying to resume too fast after
>> >> > migrate_recover():
>> >> >
>> >> > # {
>> >> > # "error": {
>> >> > # "class": "GenericError",
>> >> > # "desc": "Cannot resume if there is no paused migration"
>> >> > # }
>> >> > # }
>> >> >
>> >> 
>> >> Ugh, sorry about the long lines:
>> >> 
>> >> {
>> >> "error": {
>> >> "class": "GenericError",
>> >> "desc": "Cannot resume if there is no paused migration"
>> >> }
>> >> }
>> >
>> > Sorry I didn't get you here.  Could you elaborate your question?
>> >
>> 
>> The test is sometimes failing with the above message.
>> 
>> But indeed my question doesn't make sense. I forgot migrate_recover
>> happens on the destination. Nevermind.
>> 
>> The bug is still present nonetheless. We're going into migrate_prepare
>> in some state other than POSTCOPY_PAUSED.
>
> Oh I see.  Interestingly I cannot reproduce on my host, just like last
> time..
>
> What is your setup for running the test?  Anything special?  Here's my
> cmdline:

The crudest oneliner:

for i in $(seq 1 ); do echo "$i ="; \
QTEST_QEMU_BINARY=./qemu-system-x86_64 \
./tests/qtest/migration-test -r /x86_64/migration/postcopy/recovery || break ; 
done

I suspect my system has something specific to it that affects the timing
of the tests. But I have no idea what it could be.

$ lscpu   
Architecture:x86_64  
  CPU op-mode(s):32-bit, 64-bit
  Address sizes: 39 bits physical, 48 bits virtual
  Byte Order:Little Endian
CPU(s):  16
  On-line CPU(s) list:   0-15
Vendor ID:   GenuineIntel
  Model name:11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
CPU family:  6
Model:   141
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s):   1
Stepping:1
CPU max MHz: 4800.
CPU min MHz: 800.
BogoMIPS:4992.00

>
> $ cat reproduce.sh 
> index=$1
> loop=0
>
> while :; do
> echo "Starting loop=$loop..."
> QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test 
> -p /x86_64/migration/postcopy/recovery/double-failures
> if [[ $? != 0 ]]; then
> echo "index $index REPRODUCED (loop=$loop) !"
> break
> fi
> loop=$(( loop + 1 ))
> done
>
> Survives 200+ loops and kept going.
>
> However I think I saw what's wrong here, could you help try below fixup?
>

Sure. I won't get to it until tomorrow though.

Re: [PATCH v3 10/10] tests/migration-test: Add a test for postcopy hangs during RECOVER

2023-10-05 Thread Peter Xu

On Thu, Oct 05, 2023 at 06:10:20PM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> > On Thu, Oct 05, 2023 at 10:37:56AM -0300, Fabiano Rosas wrote:
> >> >> +/*
> >> >> + * Make sure both QEMU instances will go into RECOVER stage, then 
> >> >> test
> >> >> + * kicking them out using migrate-pause.
> >> >> + */
> >> >> +wait_for_postcopy_status(from, "postcopy-recover");
> >> >> +wait_for_postcopy_status(to, "postcopy-recover");
> >> >
> >> > Is this wait out of place? I think we're trying to resume too fast after
> >> > migrate_recover():
> >> >
> >> > # {
> >> > # "error": {
> >> > # "class": "GenericError",
> >> > # "desc": "Cannot resume if there is no paused migration"
> >> > # }
> >> > # }
> >> >
> >> 
> >> Ugh, sorry about the long lines:
> >> 
> >> {
> >> "error": {
> >> "class": "GenericError",
> >> "desc": "Cannot resume if there is no paused migration"
> >> }
> >> }
> >
> > Sorry I didn't get you here.  Could you elaborate your question?
> >
> 
> The test is sometimes failing with the above message.
> 
> But indeed my question doesn't make sense. I forgot migrate_recover
> happens on the destination. Nevermind.
> 
> The bug is still present nonetheless. We're going into migrate_prepare
> in some state other than POSTCOPY_PAUSED.

Oh I see.  Interestingly I cannot reproduce on my host, just like last
time..

What is your setup for running the test?  Anything special?  Here's my
cmdline:

$ cat reproduce.sh 
index=$1
loop=0

while :; do
echo "Starting loop=$loop..."
QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test -p 
/x86_64/migration/postcopy/recovery/double-failures
if [[ $? != 0 ]]; then
echo "index $index REPRODUCED (loop=$loop) !"
break
fi
loop=$(( loop + 1 ))
done

Survives 200+ loops and kept going.

However I think I saw what's wrong here, could you help try below fixup?

Thanks,

===8<===
>From 52bd2cd5ddf472e0bb99789dba3660a626382630 Mon Sep 17 00:00:00 2001
From: Peter Xu 
Date: Thu, 5 Oct 2023 17:38:42 -0400
Subject: [PATCH] fixup! tests/migration-test: Add a test for postcopy hangs
 during RECOVER

Signed-off-by: Peter Xu 
---
 tests/qtest/migration-test.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index fb7a3765e4..1bdae0a579 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1489,9 +1489,8 @@ static void test_postcopy_recovery_common(MigrateCommon 
*args)
  * migrate-recover command can only succeed if destination machine
  * is in the paused state
  */
-wait_for_migration_status(to, "postcopy-paused",
-  (const char * []) { "failed", "active",
-  "completed", NULL });
+wait_for_postcopy_status(to, "postcopy-paused");
+wait_for_postcopy_status(from, "postcopy-paused");
 
 if (args->postcopy_recovery_test_fail) {
 /*
@@ -1514,9 +1513,6 @@ static void test_postcopy_recovery_common(MigrateCommon 
*args)
  * Try to rebuild the migration channel using the resume flag and
  * the newly created channel
  */
-wait_for_migration_status(from, "postcopy-paused",
-  (const char * []) { "failed", "active",
-  "completed", NULL });
 migrate_qmp(from, uri, "{'resume': true}");
 
 /* Restore the postcopy bandwidth to unlimited */
-- 
2.41.0

-- 
Peter Xu

Re: [PATCH] qtest/migration: Add a test for the analyze-migration script

2023-10-05 Thread Fabiano Rosas

Peter Xu  writes:

> On Wed, Sep 27, 2023 at 06:47:56PM -0300, Fabiano Rosas wrote:
>> I know this adds a python dependency to qtests and I'm not sure how
>> much we care about this script, but on the other hand it would be nice
>> to catch these errors early on.
>> 
>> This would also help with future work that touches the migration
>> stream (moving multifd out of ram.c and fixed-ram).
>> 
>> Let me know what you think.
>
> The test is good, but I think it'll be definitely less burden and cleaner
> if it can be written just in shell scripts.. that can even be put in a
> single line..
>
> $ (echo "migrate exec:cat>$IMAGE"; echo "quit") | $QEMU_BIN -monitor stdio; 
> $DIR/scripts/analyze-migration.py -f $IMAGE
>
> Any chance to hook that in?

I tried but it's way worse.

We need to add the script to meson, pass the Python and QEMU binaries
in, make it be invoked for each architecture, but skip s390x and ppc
when using TCG only.

Then we need to add code to map target vs. machine type because arm has
no default machine.

We also need to produce TAP output for meson. And the migration script
outputs a ton of lines so Python barfs due to EAGAIN so we need to
handle that as well.

All of that is already there in migration-test.

I'll send a v2 of this patch with the improvements Thomas suggested and
try to dig out a fix for the script on ppc64 I made about 3 years ago
that seemed to have not been merged.

[PATCH v2] misc/pca9552: Fix for pca9552 not getting reset

2023-10-05 Thread Glenn Miles

Testing of the pca9552 device on the powernv platform
showed that the reset method was not being called when
an instance of the device was realized.  This was causing
the INPUT0/INPUT1 POR values to be incorrect.

Fixed by overriding the parent pca955x_realize method with a
new pca9552_realize method which first calls
the parent pca955x_realize method followed by the
pca9552_reset function.

Signed-off-by: Glenn Miles 
---
 hw/misc/pca9552.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/misc/pca9552.c b/hw/misc/pca9552.c
index fff19e369a..bc12dced7f 100644
--- a/hw/misc/pca9552.c
+++ b/hw/misc/pca9552.c
@@ -384,6 +384,12 @@ static void pca955x_realize(DeviceState *dev, Error **errp)
 qdev_init_gpio_out(dev, s->gpio, k->pin_count);
 }
 
+static void pca9552_realize(DeviceState *dev, Error **errp)
+{
+pca955x_realize(dev, errp);
+pca9552_reset(dev);
+}
+
 static Property pca955x_properties[] = {
 DEFINE_PROP_STRING("description", PCA955xState, description),
 DEFINE_PROP_END_OF_LIST(),
@@ -416,7 +422,7 @@ static void pca9552_class_init(ObjectClass *oc, void *data)
 DeviceClass *dc = DEVICE_CLASS(oc);
 PCA955xClass *pc = PCA955X_CLASS(oc);
 
-dc->reset = pca9552_reset;
+dc->realize = pca9552_realize;
 dc->vmsd = _vmstate;
 pc->max_reg = PCA9552_LS3;
 pc->pin_count = 16;
-- 
2.31.1

Re: [PATCH v3 10/10] tests/migration-test: Add a test for postcopy hangs during RECOVER

2023-10-05 Thread Fabiano Rosas

Peter Xu  writes:

> On Thu, Oct 05, 2023 at 10:37:56AM -0300, Fabiano Rosas wrote:
>> >> +/*
>> >> + * Make sure both QEMU instances will go into RECOVER stage, then 
>> >> test
>> >> + * kicking them out using migrate-pause.
>> >> + */
>> >> +wait_for_postcopy_status(from, "postcopy-recover");
>> >> +wait_for_postcopy_status(to, "postcopy-recover");
>> >
>> > Is this wait out of place? I think we're trying to resume too fast after
>> > migrate_recover():
>> >
>> > # {
>> > # "error": {
>> > # "class": "GenericError",
>> > # "desc": "Cannot resume if there is no paused migration"
>> > # }
>> > # }
>> >
>> 
>> Ugh, sorry about the long lines:
>> 
>> {
>> "error": {
>> "class": "GenericError",
>> "desc": "Cannot resume if there is no paused migration"
>> }
>> }
>
> Sorry I didn't get you here.  Could you elaborate your question?
>

The test is sometimes failing with the above message.

But indeed my question doesn't make sense. I forgot migrate_recover
happens on the destination. Nevermind.

The bug is still present nonetheless. We're going into migrate_prepare
in some state other than POSTCOPY_PAUSED.

Re: [PATCH v3 10/10] tests/migration-test: Add a test for postcopy hangs during RECOVER

2023-10-05 Thread Peter Xu

On Thu, Oct 05, 2023 at 10:37:56AM -0300, Fabiano Rosas wrote:
> >> +/*
> >> + * Make sure both QEMU instances will go into RECOVER stage, then test
> >> + * kicking them out using migrate-pause.
> >> + */
> >> +wait_for_postcopy_status(from, "postcopy-recover");
> >> +wait_for_postcopy_status(to, "postcopy-recover");
> >
> > Is this wait out of place? I think we're trying to resume too fast after
> > migrate_recover():
> >
> > # {
> > # "error": {
> > # "class": "GenericError",
> > # "desc": "Cannot resume if there is no paused migration"
> > # }
> > # }
> >
> 
> Ugh, sorry about the long lines:
> 
> {
> "error": {
> "class": "GenericError",
> "desc": "Cannot resume if there is no paused migration"
> }
> }

Sorry I didn't get you here.  Could you elaborate your question?

Here we wait on both sides and make sure they'll all be in RECOVER stage,
and we should know that they won't proceed further because the pipes
contain mostly nothing so they'll just block at the pipes.  What did I
miss?

Thanks,

-- 
Peter Xu

Re: [PATCH v3 07/10] migration: Add migration_rp_wait|kick()

2023-10-05 Thread Peter Xu

On Thu, Oct 05, 2023 at 09:49:25AM +0200, Juan Quintela wrote:
> Peter Xu  wrote:
> > It's just a simple wrapper for rp_sem on either wait() or kick(), make it
> > even clearer on how it is used.  Prepared to be used even for other things.
> >
> > Reviewed-by: Fabiano Rosas 
> > Signed-off-by: Peter Xu 
> 
> Reviewed-by: Juan Quintela 
> 
> I agree with the idea, but I think that the problem is the name of the
> semaphore.
> 
> > +void migration_rp_wait(MigrationState *s)
> > +{
> > +qemu_sem_wait(>rp_state.rp_sem);
> 
> I am not sure if it would be better to have the wrappers or just rename
> 
> If we rename the remaphore to migration_thread, this becomes:
> 
> qemu_sem_wait(>rp_state.return_path_ready);
> 
> qemu_sem_post(>rp_state.return_path_ready);
> 
> Or something similar?

I'd prefer keeping a pair of helpers, but I'm open to other suggestions,
e.g. I can rename the sem at the same time, or have a better name just for
the helpers.

Thanks,

-- 
Peter Xu

[PATCH] misc/pca9552: Fix for pca9552 not getting reset

2023-10-05 Thread Glenn Miles

Testing of the pca9552 device on the powernv platform
showed that the reset method was not being called when
an instance of the device was realized.  This was causing
the INPUT0/INPUT1 POR values to be incorrect.

Fixed by calling pca9552_reset from within the
pca9552_realize method.

Signed-off-by: Glenn Miles 
---
 hw/misc/pca9552.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/misc/pca9552.c b/hw/misc/pca9552.c
index f28b5ecd7e..4e198af137 100644
--- a/hw/misc/pca9552.c
+++ b/hw/misc/pca9552.c
@@ -418,6 +418,12 @@ static void pca955x_realize(DeviceState *dev, Error **errp)
 qdev_init_gpio_in(dev, pca955x_gpio_in_handler, k->pin_count);
 }
 
+static void pca9552_realize(DeviceState *dev, Error **errp)
+{
+pca955x_realize(dev, errp);
+pca9552_reset(dev);
+}
+
 static Property pca955x_properties[] = {
 DEFINE_PROP_STRING("description", PCA955xState, description),
 DEFINE_PROP_END_OF_LIST(),
@@ -450,7 +456,7 @@ static void pca9552_class_init(ObjectClass *oc, void *data)
 DeviceClass *dc = DEVICE_CLASS(oc);
 PCA955xClass *pc = PCA955X_CLASS(oc);
 
-dc->reset = pca9552_reset;
+dc->realize = pca9552_realize;
 dc->vmsd = _vmstate;
 pc->max_reg = PCA9552_LS3;
 pc->pin_count = 16;
-- 
2.31.1

[PATCH v2] misc/pca9552: Let external devices set pca9552 inputs

2023-10-05 Thread Glenn Miles

Allow external devices to drive pca9552 input pins by adding
input GPIO's to the model.  This allows a device to connect
its output GPIO's to the pca9552 input GPIO's.

In order for an external device to set the state of a pca9552
pin, the pin must first be configured for high impedance (LED
is off).  If the pca9552 pin is configured to drive the pin low
(LED is on), then external input will be ignored.

Signed-off-by: Glenn Miles 
---
Based-on: <20230927203221.3286895-1-mil...@linux.vnet.ibm.com>
([PATCH] misc/pca9552: Fix inverted input status)
 hw/misc/pca9552.c | 39 ++-
 include/hw/misc/pca9552.h |  3 ++-
 2 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/hw/misc/pca9552.c b/hw/misc/pca9552.c
index ad811fb249..f28b5ecd7e 100644
--- a/hw/misc/pca9552.c
+++ b/hw/misc/pca9552.c
@@ -113,16 +113,22 @@ static void pca955x_update_pin_input(PCA955xState *s)
 switch (config) {
 case PCA9552_LED_ON:
 /* Pin is set to 0V to turn on LED */
-qemu_set_irq(s->gpio[i], 0);
+qemu_set_irq(s->gpio_out[i], 0);
 s->regs[input_reg] &= ~(1 << input_shift);
 break;
 case PCA9552_LED_OFF:
 /*
  * Pin is set to Hi-Z to turn off LED and
- * pullup sets it to a logical 1.
+ * pullup sets it to a logical 1 unless
+ * external device drives it low.
  */
-qemu_set_irq(s->gpio[i], 1);
-s->regs[input_reg] |= 1 << input_shift;
+if (s->ext_state[i] == 0) {
+qemu_set_irq(s->gpio_out[i], 0);
+s->regs[input_reg] &= ~(1 << input_shift);
+} else {
+qemu_set_irq(s->gpio_out[i], 1);
+s->regs[input_reg] |= 1 << input_shift;
+}
 break;
 case PCA9552_LED_PWM0:
 case PCA9552_LED_PWM1:
@@ -337,6 +343,7 @@ static const VMStateDescription pca9552_vmstate = {
 VMSTATE_UINT8(len, PCA955xState),
 VMSTATE_UINT8(pointer, PCA955xState),
 VMSTATE_UINT8_ARRAY(regs, PCA955xState, PCA955X_NR_REGS),
+VMSTATE_UINT8_ARRAY(ext_state, PCA955xState, PCA955X_PIN_COUNT_MAX),
 VMSTATE_I2C_SLAVE(i2c, PCA955xState),
 VMSTATE_END_OF_LIST()
 }
@@ -355,6 +362,7 @@ static void pca9552_reset(DeviceState *dev)
 s->regs[PCA9552_LS2] = 0x55;
 s->regs[PCA9552_LS3] = 0x55;
 
+memset(s->ext_state, 1, PCA955X_PIN_COUNT_MAX);
 pca955x_update_pin_input(s);
 
 s->pointer = 0xFF;
@@ -377,6 +385,26 @@ static void pca955x_initfn(Object *obj)
 }
 }
 
+static void pca955x_set_ext_state(PCA955xState *s, int pin, int level)
+{
+if (s->ext_state[pin] != level) {
+uint16_t pins_status = pca955x_pins_get_status(s);
+s->ext_state[pin] = level;
+pca955x_update_pin_input(s);
+pca955x_display_pins_status(s, pins_status);
+}
+}
+
+static void pca955x_gpio_in_handler(void *opaque, int pin, int level)
+{
+
+PCA955xState *s = PCA955X(opaque);
+PCA955xClass *k = PCA955X_GET_CLASS(s);
+
+assert((pin >= 0) && (pin < k->pin_count));
+pca955x_set_ext_state(s, pin, level);
+}
+
 static void pca955x_realize(DeviceState *dev, Error **errp)
 {
 PCA955xClass *k = PCA955X_GET_CLASS(dev);
@@ -386,7 +414,8 @@ static void pca955x_realize(DeviceState *dev, Error **errp)
 s->description = g_strdup("pca-unspecified");
 }
 
-qdev_init_gpio_out(dev, s->gpio, k->pin_count);
+qdev_init_gpio_out(dev, s->gpio_out, k->pin_count);
+qdev_init_gpio_in(dev, pca955x_gpio_in_handler, k->pin_count);
 }
 
 static Property pca955x_properties[] = {
diff --git a/include/hw/misc/pca9552.h b/include/hw/misc/pca9552.h
index b6f4e264fe..c36525f0c3 100644
--- a/include/hw/misc/pca9552.h
+++ b/include/hw/misc/pca9552.h
@@ -30,7 +30,8 @@ struct PCA955xState {
 uint8_t pointer;
 
 uint8_t regs[PCA955X_NR_REGS];
-qemu_irq gpio[PCA955X_PIN_COUNT_MAX];
+qemu_irq gpio_out[PCA955X_PIN_COUNT_MAX];
+uint8_t ext_state[PCA955X_PIN_COUNT_MAX];
 char *description; /* For debugging purpose only */
 };
 
-- 
2.31.1

Re: [PATCH 0/3] hvf x86 correctness and efficiency improvements

2023-10-05 Thread Phil Dennis-Jordan

Ping - let me know if there's anything particularly controversial,
unclear, etc. about these patches or if I can do anything to make
reviewing easier.

Thanks!


On Fri, 22 Sept 2023 at 16:09, Phil Dennis-Jordan  wrote:
>
> This is a series of semi-related patches for the x86 macOS 
> Hypervisor.framework
> (hvf) accelerator backend. The intention is to make VMs run slightly more
> efficiently on macOS host machines. They have been subject to some months of
> CI workloads with macOS guest VMs without issues and they seem to give a few
> percent performance improvement. (Though this varies greatly with the type of
> workload.)
>
> Patch 1 enables the INVTSC CPUID bit when running with hvf. This can enable
> some optimisations in the guest OS, and I've not found any reason it shouldn't
> be allowed for hvf based hosts.
>
> Patch 2 fixes hvf_kick_vcpu_thread so it actually forces a VM exit instead of
> doing nothing. I guess this previously didn't cause any huge issues because
> hvf's hv_vcpu_run() would exit so extremely frequently on its own accord. The
> temp variable is needed because the pointer expected by the 
> hv_vcpu_interrupt()
> call doesn't match the fd field's type in the hvf accel's struct 
> AccelCPUState.
> I'm unsure if it would be better to change that struct field to the relevant
> architecture's handle types, hv_vcpuid_t (x86, unsigned int) and hv_vcpu_t
> (aarch64, uint64_t), perhaps via an intermediate typedef?
>
> Patch 3, which replaces the call to hv_vcpu_run() with the more modern
> hv_vcpu_run_until() for running the guest vCPU. The newer API is available
> from macOS 10.15 host systems onwards. This call causes significantly fewer
> VM exits, which also means we really need that exit-forcing interrupt from
> patch 2. The reduction in VM exits means less overhead from exits and less
> contention on the BQL. Using hv_vcpu_run_until() is also a prerequisite for
> using certain newer hvf features, though this patchset doesn't use any.
>
> Patches 2 & 3 must therefore be applied in that order, patch 1 is independent.
>
> This work has been sponsored by Sauce Labs Inc.
>
> Phil Dennis-Jordan (3):
>   i386: hvf: Adds support for INVTSC cpuid bit
>   i386: hvf: In kick_vcpu use hv_vcpu_interrupt to force exit
>   i386: hvf: Updates API usage to use modern vCPU run function
>
>  target/i386/hvf/hvf.c   | 26 +-
>  target/i386/hvf/x86_cpuid.c |  4 
>  2 files changed, 29 insertions(+), 1 deletion(-)
>
> --
> 2.36.1
>

Re: [PATCH v6 14/14] python: use vm.cmd() instead of vm.qmp() where appropriate

2023-10-05 Thread Eric Blake

On Thu, Oct 05, 2023 at 04:55:50PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> In many cases we just want an effect of qmp command and want to raise on
> failure. Use vm.cmd() method which does exactly this.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  tests/avocado/vnc.py  |  16 +-
>  tests/qemu-iotests/030| 168 +++---
>  tests/qemu-iotests/040| 172 +++
>  tests/qemu-iotests/041| 483 --
>  tests/qemu-iotests/045|  15 +-
>  tests/qemu-iotests/055|  62 +--
>  tests/qemu-iotests/056|  77 ++-
>  tests/qemu-iotests/093|  42 +-
>  tests/qemu-iotests/118| 225 
>  tests/qemu-iotests/124| 102 ++--
>  tests/qemu-iotests/129|  14 +-
>  tests/qemu-iotests/132|   5 +-
>  tests/qemu-iotests/139|  45 +-
>  tests/qemu-iotests/147|  31 +-
>  tests/qemu-iotests/151|  57 +--
>  tests/qemu-iotests/152|  10 +-
>  tests/qemu-iotests/155|  55 +-
>  tests/qemu-iotests/165|   8 +-
>  tests/qemu-iotests/196|   3 +-
>  tests/qemu-iotests/205|   6 +-
>  tests/qemu-iotests/218| 105 ++--
>  tests/qemu-iotests/245| 245 -
>  tests/qemu-iotests/264|  31 +-
>  tests/qemu-iotests/281|  21 +-
>  tests/qemu-iotests/295|  15 +-
>  tests/qemu-iotests/296|  15 +-
>  tests/qemu-iotests/298|  13 +-
>  tests/qemu-iotests/300|  54 +-
>  tests/qemu-iotests/iotests.py |   9 +-
>  .../tests/export-incoming-iothread|   6 +-
>  .../qemu-iotests/tests/graph-changes-while-io |   6 +-
>  tests/qemu-iotests/tests/image-fleecing   |   3 +-
>  .../tests/migrate-bitmaps-postcopy-test   |  31 +-
>  tests/qemu-iotests/tests/migrate-bitmaps-test |  47 +-
>  .../qemu-iotests/tests/migrate-during-backup  |  41 +-
>  .../qemu-iotests/tests/migration-permissions  |   9 +-
>  .../tests/mirror-ready-cancel-error   |  74 ++-
>  tests/qemu-iotests/tests/mirror-top-perms |  16 +-
>  tests/qemu-iotests/tests/nbd-multiconn|  12 +-
>  tests/qemu-iotests/tests/reopen-file  |   3 +-
>  .../qemu-iotests/tests/stream-error-on-reset  |   6 +-
>  41 files changed, 951 insertions(+), 1407 deletions(-)

Big but mechanical.  It would be worth amending the commit message to
describe how you found all these spots (in case someone backporting
this patch has to redo the work over a different subset of files based
on what has changed since the two trees diverged).

> 
> diff --git a/tests/avocado/vnc.py b/tests/avocado/vnc.py
> index aeeefc70be..862c8996a8 100644
> --- a/tests/avocado/vnc.py
> +++ b/tests/avocado/vnc.py
> @@ -88,9 +88,8 @@ def test_change_password(self):
>  self.vm.add_args('-nodefaults', '-S', '-vnc', ':0,password=on')
>  self.vm.launch()
>  self.assertTrue(self.vm.qmp('query-vnc')['return']['enabled'])
> -set_password_response = self.vm.qmp('change-vnc-password',
> -password='new_password')
> -self.assertEqual(set_password_response['return'], {})
> +self.vm.cmd('change-vnc-password',
> +password='new_password')

Indeed a nicer idiom, where you are able to use it (whether by
self.assertEqual or by self.assert_qmp).

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org

Re: [PATCH v6 13/14] tests/vm/basevm.py: use cmd() instead of qmp()

2023-10-05 Thread Eric Blake

On Thu, Oct 05, 2023 at 04:55:49PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> We don't expect failure here and need 'result' object. cmd() is better
> in this case.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  tests/vm/basevm.py | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/vm/basevm.py b/tests/vm/basevm.py
> index a97e23b0ce..8aef4cff96 100644
> --- a/tests/vm/basevm.py
> +++ b/tests/vm/basevm.py
> @@ -312,8 +312,8 @@ def boot(self, img, extra_args=[]):
>  self._guest = guest
>  # Init console so we can start consuming the chars.
>  self.console_init()
> -usernet_info = guest.qmp("human-monitor-command",
> - command_line="info usernet").get("return")
> +usernet_info = guest.cmd("human-monitor-command",
> + command_line="info usernet")
>  self.ssh_port = get_info_usernet_hostfwd_port(usernet_info)

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org

Re: [PATCH v6 12/14] iotests.py: pause_job(): drop return value

2023-10-05 Thread Eric Blake

On Thu, Oct 05, 2023 at 04:55:48PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> The returned value is unused. It's simple to check by command
> 
>  git grep -B 3 '\.pause_job('
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  tests/qemu-iotests/iotests.py | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org

Re: [PATCH v6 11/14] iotests: drop some extra ** in qmp() call

2023-10-05 Thread Eric Blake

On Thu, Oct 05, 2023 at 04:55:47PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> qmp() method supports passing dict (if it doesn't need substituting
> '_' to '-' in keys). So, drop some extra '**' operators.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  tests/qemu-iotests/040|  4 +-
>  tests/qemu-iotests/041| 14 +++---
>  tests/qemu-iotests/129|  2 +-
>  tests/qemu-iotests/147|  2 +-
>  tests/qemu-iotests/155|  2 +-
>  tests/qemu-iotests/264| 12 ++---
>  tests/qemu-iotests/295|  5 +-
>  tests/qemu-iotests/296| 15 +++---
>  tests/qemu-iotests/tests/migrate-bitmaps-test |  4 +-
>  .../tests/mirror-ready-cancel-error   | 50 +--
>  10 files changed, 54 insertions(+), 56 deletions(-)
>

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org

Re: [PATCH v6 10/14] iotests: drop some occasional semicolons

2023-10-05 Thread Eric Blake

On Thu, Oct 05, 2023 at 04:55:46PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  tests/qemu-iotests/041 | 2 +-
>  tests/qemu-iotests/196 | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)

Maybe in the subject s/occasional/extra/

Reviewed-by: Eric Blake 

> 
> diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
> index 4d7a829b65..550e4dc391 100755
> --- a/tests/qemu-iotests/041
> +++ b/tests/qemu-iotests/041
> @@ -1086,7 +1086,7 @@ class TestRepairQuorum(iotests.QMPTestCase):
>  def test_after_a_quorum_snapshot(self):
>  result = self.vm.qmp('blockdev-snapshot-sync', node_name='img1',
>   snapshot_file=quorum_snapshot_file,
> - snapshot_node_name="snap1");
> + snapshot_node_name="snap1")
>  self.assert_qmp(result, 'return', {})

I'm a bit surprised the linters don't pick up on this.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org

Re: [PATCH v6 09/14] iotests: refactor some common qmp result checks into generic pattern

2023-10-05 Thread Eric Blake

On Thu, Oct 05, 2023 at 04:55:45PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> To simplify further conversion.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  tests/qemu-iotests/040 | 3 ++-
>  tests/qemu-iotests/147 | 3 ++-
>  tests/qemu-iotests/155 | 4 ++--
>  tests/qemu-iotests/218 | 4 ++--
>  tests/qemu-iotests/296 | 3 ++-
>  5 files changed, 10 insertions(+), 7 deletions(-)
>

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org

Re: [PATCH v6 08/14] iotests: add some missed checks of qmp result

2023-10-05 Thread Eric Blake

On Thu, Oct 05, 2023 at 04:55:44PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  tests/qemu-iotests/041| 1 +
>  tests/qemu-iotests/151| 1 +
>  tests/qemu-iotests/152| 2 ++
>  tests/qemu-iotests/tests/migrate-bitmaps-test | 2 ++
>  4 files changed, 6 insertions(+)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org

Re: [PATCH v2 03/21] preallocate: Don't poll during permission updates

2023-10-05 Thread Vladimir Sementsov-Ogievskiy


On 11.09.23 12:46, Kevin Wolf wrote:

When the permission related BlockDriver callbacks are called, we are in
the middle of an operation traversing the block graph. Polling in such a
place is a very bad idea because the graph could change in unexpected
ways. In the future, callers will also hold the graph lock, which is
likely to turn polling into a deadlock.

So we need to get rid of calls to functions like bdrv_getlength() or
bdrv_truncate() there as these functions poll internally. They are
currently used so that when no parent has write/resize permissions on
the image any more, the preallocate filter drops the extra preallocated
area in the image file and gives up write/resize permissions itself.

In order to achieve this without polling in .bdrv_check_perm, don't
immediately truncate the image, but only schedule a BH to do so. The
filter keeps the write/resize permissions a bit longer now until the BH
has executed.

There is one case in which delaying doesn't work: Reopening the image
read-only. In this case, bs->file will likely be reopened read-only,
too, so keeping write permissions a bit longer on it doesn't work. But
we can already cover this case in preallocate_reopen_prepare() and not
rely on the permission updates for it.


Hmm, now I found one more "future" case.

I now try to rebase my "[PATCH v7 0/7] blockdev-replace" 
https://patchew.org/QEMU/20230421114102.884457-1-vsement...@yandex-team.ru/

And it breaks after this commit.

By accident, blockdev-replace series uses exactly "preallocate" filter to test 
insertion/removing of filters. And removing is broken now.

Removing is done as follows:

1. We have filter inserted: disk0 -file-> filter -file-> file0

2. blockdev-replace, replaces file child of disk0, so we should get the picture*: 
disk0 -file-> file0 <-file- filter

3. blockdev-del filter


But step [2] fails, as now preallocate filter doesn't drop permissions during 
the operation (postponing this for a while) and the picture* is impossible. 
Permission check fails.

Hmmm... Any idea how blockdev-replace and preallocate filter should work :) ? 
Maybe, doing truncation in .drain_begin() will help? Will try



Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Stefan Hajnoczi 
---



--
Best regards,
Vladimir

Re: [PATCH v6 07/14] iotests: QemuStorageDaemon: add cmd() method like in QEMUMachine.

2023-10-05 Thread Eric Blake

On Thu, Oct 05, 2023 at 04:55:43PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Add similar method for consistency.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  tests/qemu-iotests/iotests.py | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org

Re: [PATCH v6 06/14] python/machine.py: upgrade vm.cmd() method

2023-10-05 Thread Eric Blake

On Thu, Oct 05, 2023 at 04:55:42PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> The method is not popular in iotests, we prefer use vm.qmp() and then
> check success by hand.. But that's not optimal. To simplify movement to

extra '.'

> vm.cmd() let's support same interface improvements like in vm.qmp().
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  python/qemu/machine/machine.py | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org

Re: [PATCH v3 03/10] migration: Refactor error handling in source return path

2023-10-05 Thread Peter Xu

On Thu, Oct 05, 2023 at 10:22:52AM +0200, Juan Quintela wrote:
> Peter Xu  wrote:
> > rp_state.error was a boolean used to show error happened in return path
> > thread.  That's not only duplicating error reporting (migrate_set_error),
> > but also not good enough in that we only do error_report() and set it to
> > true, we never can keep a history of the exact error and show it in
> > query-migrate.
> >
> > To make this better, a few things done:
> >
> >   - Use error_setg() rather than error_report() across the whole lifecycle
> > of return path thread, keeping the error in an Error*.
> 
> Good.
> 
> >   - Use migrate_set_error() to apply that captured error to the global
> > migration object when error occured in this thread.
> 
> Good.
> 
> >   - With above, no need to have mark_source_rp_bad(), remove it, alongside
> > with rp_state.error itself.
> 
> Good.
> 
> >  uint64_t ram_pagesize_summary(void);
> > -int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t 
> > len);
> > +int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t 
> > len,
> > + Error **errp);
> 
> 
> good.
> 
> > @@ -1793,37 +1782,36 @@ static void 
> > migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
> >   */
> >  if (!QEMU_IS_ALIGNED(start, our_host_ps) ||
> >  !QEMU_IS_ALIGNED(len, our_host_ps)) {
> > -error_report("%s: Misaligned page request, start: " RAM_ADDR_FMT
> > - " len: %zd", __func__, start, len);
> > -mark_source_rp_bad(ms);
> > +error_setg(errp, "MIG_RP_MSG_REQ_PAGES: Misaligned page request, 
> > start:"
> > +   RAM_ADDR_FMT " len: %zd", start, len);
> >  return;
> >  }
> >  
> > -if (ram_save_queue_pages(rbname, start, len)) {
> > -mark_source_rp_bad(ms);
> > -}
> > +ram_save_queue_pages(rbname, start, len, errp);
> 
> ram_save_queue_pages() returns an int.
> I think this function should return an int.

Phil suggested something similar for the other patch, instead of also let
this function return int, I'll add one more patch to let it return boolean
to show whether there's an error, keeping the real error in errp.

> 
> Next is independent of this patch:
> 
> > -static int migrate_handle_rp_recv_bitmap(MigrationState *s, char 
> > *block_name)
> > +static int migrate_handle_rp_recv_bitmap(MigrationState *s, char 
> > *block_name,
> > + Error **errp)
> >  {
> >  RAMBlock *block = qemu_ram_block_by_name(block_name);
> >  
> >  if (!block) {
> > -error_report("%s: invalid block name '%s'", __func__, block_name);
> > +error_setg(errp, "MIG_RP_MSG_RECV_BITMAP has invalid block name 
> > '%s'",
> > +   block_name);
> >  return -EINVAL;
> 
> We sent -EINVAL.
> 
> >  }
> >  
> >  /* Fetch the received bitmap and refresh the dirty bitmap */
> > -return ram_dirty_bitmap_reload(s, block);
> > +return ram_dirty_bitmap_reload(s, block, errp);
> >  }
> >  
> > -static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value)
> > +static int migrate_handle_rp_resume_ack(MigrationState *s,
> > +uint32_t value, Error **errp)
> >  {
> >  trace_source_return_path_thread_resume_ack(value);
> >  
> >  if (value != MIGRATION_RESUME_ACK_VALUE) {
> > -error_report("%s: illegal resume_ack value %"PRIu32,
> > - __func__, value);
> > +error_setg(errp, "illegal resume_ack value %"PRIu32, value);
> >  return -1;
> 
> And here -1.
> 
> On both callers we just check if it is different from zero.  We never
> use the return value as errno, so I think we should move to -1, if there
> is an error, that is what errp is for.

Right.  I'll switch all rp-return thread paths to use boolean as return, as
long as there's errp.

> 
> 
> > -/* Returns 0 if the RP was ok, otherwise there was an error on the RP */
> > -static int await_return_path_close_on_source(MigrationState *ms)
> > +static void await_return_path_close_on_source(MigrationState *ms)
> >  {
> > -int ret;
> > -
> >  if (!ms->rp_state.rp_thread_created) {
> > -return 0;
> > +return;
> >  }
> >  
> >  trace_migration_return_path_end_before();
> > @@ -2060,18 +2050,10 @@ static int 
> > await_return_path_close_on_source(MigrationState *ms)
> >  }
> >  }
> >  
> > -trace_await_return_path_close_on_source_joining();
> >  qemu_thread_join(>rp_state.rp_thread);
> >  ms->rp_state.rp_thread_created = false;
> > -trace_await_return_path_close_on_source_close();
> > -
> > -ret = ms->rp_state.error;
> > -ms->rp_state.error = false;
> > -
> >  migration_release_dst_files(ms);
> > -
> > -trace_migration_return_path_end_after(ret);
> > -return ret;
> > +trace_migration_return_path_end_after();
> >  }
> >  
> >  static inline

Re: [PATCH v3 03/10] migration: Refactor error handling in source return path

2023-10-05 Thread Peter Xu

On Thu, Oct 05, 2023 at 09:57:58AM -0300, Fabiano Rosas wrote:
> > @@ -2008,9 +1996,14 @@ static void *source_return_path_thread(void *opaque)
> >  }
> >  
> >  out:
> > -if (qemu_file_get_error(rp)) {
> > +if (err) {
> 
> Need to keep both checks here.

The next patch did that.  Let me squash that into this..

Thanks,

-- 
Peter Xu

Re: [PATCH v8 2/2] tpm: add backend for mssim

2023-10-05 Thread James Bottomley

On Thu, 2023-10-05 at 18:11 +0200, Philippe Mathieu-Daudé wrote:
> On 5/10/23 15:57, James Bottomley wrote:
> > On Thu, 2023-10-05 at 08:49 +0200, Philippe Mathieu-Daudé wrote:
> > > On 4/10/23 20:42, James Bottomley wrote:
> > > > From: James Bottomley 
[...]
> > > > +.. code-block:: console
> > > > +
> > > > +  qemu-system-x86_64  \
> > > > +    -tpmdev
> > > > "{'type':'mssim','id':'tpm0','command':{'type':'inet','host':'r
> > > > emot
> > > > e','port':'2321'},'control':{'type':'inet','host':'remote','por
> > > > t':'
> > > > 2322'}}" \
> > > > +    -device tpm-crb,tpmdev=tpm0
> > > 
> > > Did you test running this command line on a big-endian host?
> > 
> > Well no, big endian machines are rather rare nowadays.  However,
> > since the QIOChannelSocket abstraction is based on SocketAddress,
> > which is a qapi wrapper around strings, what makes you think the
> > endianness would matter?
> 
> You use ntoh/hton in tpm_mssim_handle_request(), so I wonder about
> the 'uint32_t cmd' in tpm_send_ctrl().

tpm_send_ctrl has a htonl for the send control command as well (The TPM
server is always network ordered, i.e. big endian).  The reason it
doesn't have one for the receive is that it only checks against zero
which is endian invariant, if that's what you're asking?

James

Re: [PATCH] MAINTANERS: Split vt82c686 out of fuloong2e

2023-10-05 Thread BALATON Zoltan


On Thu, 5 Oct 2023, Philippe Mathieu-Daudé wrote:

On 5/10/23 20:18, BALATON Zoltan wrote:

It is used by other machines not just fuloong2e so put it in a
separate section and add myself as reviewer.

Signed-off-by: BALATON Zoltan 
---
By the way, PIIX4 already has a section just above where I've added
this but some files are still listed in Malta. You may want to have a
look at that.

  MAINTAINERS | 11 ---
  1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index ea91f9e804..7f0e20fde6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1307,10 +1307,7 @@ M: Philippe Mathieu-Daudé 
  R: Jiaxun Yang 
  S: Odd Fixes
  F: hw/mips/fuloong2e.c
-F: hw/isa/vt82c686.c
  F: hw/pci-host/bonito.c
-F: hw/usb/vt82c686-uhci-pci.c
-F: include/hw/isa/vt82c686.h
  F: include/hw/pci-host/bonito.h
  F: tests/avocado/machine_mips_fuloong2e.py
  @@ -2462,6 +2459,14 @@ S: Maintained
  F: hw/isa/piix4.c
  F: include/hw/southbridge/piix.h
  +VIA south bridges (VT82C686B, VT8231)
+M: Philippe Mathieu-Daudé 
+R: Jiaxun Yang 
+R: BALATON Zoltan 


Feel free to add yourself as maintainer if you rather.


I'm happy with you staying maintainer if you don't mind :-) (after all 
we'll still need you to send pull requests anyway). Just wanted to get 
cc'd on this to make sure my machines won't break.


Regards,
BALATON Zoltan


Reviewed-by: Philippe Mathieu-Daudé 


+F: hw/isa/vt82c686.c
+F: hw/usb/vt82c686-uhci-pci.c
+F: include/hw/isa/vt82c686.h
+
  Firmware configuration (fw_cfg)
  M: Philippe Mathieu-Daudé 
  R: Gerd Hoffmann

Re: [PATCH v6 05/14] python/qemu: rename command() to cmd()

2023-10-05 Thread Eric Blake

On Thu, Oct 05, 2023 at 04:55:41PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Use a shorter name. We are going to move in iotests from qmp() to
> command() where possible. But command() is longer than qmp() and don't
> look better. Let's rename.
> 
> You can simply grep for '\.command(' and for 'def command(' to check
> that everything is updated (command() in tests/docker/docker.py is
> unrelated).
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: Daniel P. Berrangé 
> ---
>  docs/devel/testing.rst|  10 +-

There's a risk that this will need rebasing on top of any other test
changes being made in parallel; hopefully we can get this series in
sooner rather than later.

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org

Re: [PATCH v6 04/14] python: rename QEMUMonitorProtocol.cmd() to cmd_raw()

2023-10-05 Thread Eric Blake

On Thu, Oct 05, 2023 at 04:55:40PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Having cmd() and command() methods in one class doesn't look good.
> Rename cmd() to cmd_raw(), to show its meaning better.
> 
> We also want to rename command() to cmd() in future, so this commit is
> a necessary step.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  python/qemu/machine/machine.py | 2 +-
>  python/qemu/qmp/legacy.py  | 4 ++--
>  tests/qemu-iotests/iotests.py  | 2 +-
>  3 files changed, 4 insertions(+), 4 deletions(-)

Reviewed-by: Eric Blake 

I was surprised how few users there are, but agree that this opens up
the door to let our common usage be the least typing.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org

Re: [PATCH v6 03/14] scripts/cpu-x86-uarch-abi.py: use .command() instead of .cmd()

2023-10-05 Thread Eric Blake

On Thu, Oct 05, 2023 at 04:55:39PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Here we don't expect a failure. In case on failure we'll crash on

s/case on/case of/

> trying to access ['return']. Let's better use .command() that clearly
> raise on failure.

Maybe: s/Let's better /Better is to/; s/raise/raises/

> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  scripts/cpu-x86-uarch-abi.py | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org

[PULL 13/15] nbd/server: Refactor list of negotiated meta contexts

2023-10-05 Thread Eric Blake

Peform several minor refactorings of how the list of negotiated meta
contexts is managed, to make upcoming patches easier: Promote the
internal type NBDExportMetaContexts to the public opaque type
NBDMetaContexts, and mark exp const.  Use a shorter member name in
NBDClient.  Hoist calls to nbd_check_meta_context() earlier in their
callers, as the number of negotiated contexts may impact the flags
exposed in regards to an export, which in turn requires a new
parameter.  Drop a redundant parameter to nbd_negotiate_meta_queries.
No semantic change intended on the success path; on the failure path,
dropping context in nbd_check_meta_export even when reporting an error
is safer.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Message-ID: <20230925192229.3186470-24-ebl...@redhat.com>
---
 include/block/nbd.h |  1 +
 nbd/server.c| 55 -
 2 files changed, 31 insertions(+), 25 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index ba8724f5336..2006497f987 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -29,6 +29,7 @@
 typedef struct NBDExport NBDExport;
 typedef struct NBDClient NBDClient;
 typedef struct NBDClientConnection NBDClientConnection;
+typedef struct NBDMetaContexts NBDMetaContexts;

 extern const BlockExportDriver blk_exp_nbd;

diff --git a/nbd/server.c b/nbd/server.c
index d3eed6535bf..2719992db7b 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -105,11 +105,13 @@ struct NBDExport {

 static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports);

-/* NBDExportMetaContexts represents a list of contexts to be exported,
+/*
+ * NBDMetaContexts represents a list of meta contexts in use,
  * as selected by NBD_OPT_SET_META_CONTEXT. Also used for
- * NBD_OPT_LIST_META_CONTEXT. */
-typedef struct NBDExportMetaContexts {
-NBDExport *exp;
+ * NBD_OPT_LIST_META_CONTEXT.
+ */
+struct NBDMetaContexts {
+const NBDExport *exp; /* associated export */
 size_t count; /* number of negotiated contexts */
 bool base_allocation; /* export base:allocation context (block status) */
 bool allocation_depth; /* export qemu:allocation-depth */
@@ -117,7 +119,7 @@ typedef struct NBDExportMetaContexts {
 * export qemu:dirty-bitmap:,
 * sized by exp->nr_export_bitmaps
 */
-} NBDExportMetaContexts;
+};

 struct NBDClient {
 int refcount;
@@ -144,7 +146,7 @@ struct NBDClient {
 uint32_t check_align; /* If non-zero, check for aligned client requests */

 NBDMode mode;
-NBDExportMetaContexts export_meta;
+NBDMetaContexts contexts; /* Negotiated meta contexts */

 uint32_t opt; /* Current option being negotiated */
 uint32_t optlen; /* remaining length of data in ioc for the option being
@@ -455,10 +457,10 @@ static int nbd_negotiate_handle_list(NBDClient *client, 
Error **errp)
 return nbd_negotiate_send_rep(client, NBD_REP_ACK, errp);
 }

-static void nbd_check_meta_export(NBDClient *client)
+static void nbd_check_meta_export(NBDClient *client, NBDExport *exp)
 {
-if (client->exp != client->export_meta.exp) {
-client->export_meta.count = 0;
+if (exp != client->contexts.exp) {
+client->contexts.count = 0;
 }
 }

@@ -504,6 +506,7 @@ static int nbd_negotiate_handle_export_name(NBDClient 
*client, bool no_zeroes,
 error_setg(errp, "export not found");
 return -EINVAL;
 }
+nbd_check_meta_export(client, client->exp);

 myflags = client->exp->nbdflags;
 if (client->mode >= NBD_MODE_STRUCTURED) {
@@ -521,7 +524,6 @@ static int nbd_negotiate_handle_export_name(NBDClient 
*client, bool no_zeroes,

 QTAILQ_INSERT_TAIL(>exp->clients, client, next);
 blk_exp_ref(>exp->common);
-nbd_check_meta_export(client);

 return 0;
 }
@@ -641,6 +643,9 @@ static int nbd_negotiate_handle_info(NBDClient *client, 
Error **errp)
   errp, "export '%s' not present",
   sane_name);
 }
+if (client->opt == NBD_OPT_GO) {
+nbd_check_meta_export(client, exp);
+}

 /* Don't bother sending NBD_INFO_NAME unless client requested it */
 if (sendname) {
@@ -729,7 +734,6 @@ static int nbd_negotiate_handle_info(NBDClient *client, 
Error **errp)
 client->check_align = check_align;
 QTAILQ_INSERT_TAIL(>exp->clients, client, next);
 blk_exp_ref(>exp->common);
-nbd_check_meta_export(client);
 rc = 1;
 }
 return rc;
@@ -852,7 +856,7 @@ static bool nbd_strshift(const char **str, const char 
*prefix)
  * Handle queries to 'base' namespace. For now, only the base:allocation
  * context is available.  Return true if @query has been handled.
  */
-static bool nbd_meta_base_query(NBDClient *client, NBDExportMetaContexts *meta,
+static bool nbd_meta_base_query(NBDClient *client, NBDMetaContexts *meta,

[PULL 11/15] nbd/client: Accept 64-bit block status chunks

2023-10-05 Thread Eric Blake

Once extended mode is enabled, we need to accept 64-bit status replies
(even for replies that don't exceed a 32-bit length).  It is easier to
normalize narrow replies into wide format so that the rest of our code
only has to handle one width.  Although a server is non-compliant if
it sends a 64-bit reply in compact mode, or a 32-bit reply in extended
mode, it is still easy enough to tolerate these mismatches.

In normal execution, we are only requesting "base:allocation" which
never exceeds 32 bits for flag values. But during testing with
x-dirty-bitmap, we can force qemu to connect to some other context
that might have 64-bit status bit; however, we ignore those upper bits
(other than mapping qemu:allocation-depth into something that
'qemu-img map --output=json' can expose), and since that only affects
testing, we really don't bother with checking whether more than the
two least-significant bits are set.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Message-ID: <20230925192229.3186470-22-ebl...@redhat.com>
---
 block/nbd.c| 49 --
 block/trace-events |  1 +
 2 files changed, 35 insertions(+), 15 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 76461430411..52ebc8b2f53 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -615,13 +615,17 @@ static int nbd_parse_offset_hole_payload(BDRVNBDState *s,
  */
 static int nbd_parse_blockstatus_payload(BDRVNBDState *s,
  NBDStructuredReplyChunk *chunk,
- uint8_t *payload, uint64_t 
orig_length,
- NBDExtent32 *extent, Error **errp)
+ uint8_t *payload, bool wide,
+ uint64_t orig_length,
+ NBDExtent64 *extent, Error **errp)
 {
 uint32_t context_id;
+uint32_t count;
+size_t ext_len = wide ? sizeof(*extent) : sizeof(NBDExtent32);
+size_t pay_len = sizeof(context_id) + wide * sizeof(count) + ext_len;

 /* The server succeeded, so it must have sent [at least] one extent */
-if (chunk->length < sizeof(context_id) + sizeof(*extent)) {
+if (chunk->length < pay_len) {
 error_setg(errp, "Protocol error: invalid payload for "
  "NBD_REPLY_TYPE_BLOCK_STATUS");
 return -EINVAL;
@@ -636,8 +640,15 @@ static int nbd_parse_blockstatus_payload(BDRVNBDState *s,
 return -EINVAL;
 }

-extent->length = payload_advance32();
-extent->flags = payload_advance32();
+if (wide) {
+count = payload_advance32();
+extent->length = payload_advance64();
+extent->flags = payload_advance64();
+} else {
+count = 0;
+extent->length = payload_advance32();
+extent->flags = payload_advance32();
+}

 if (extent->length == 0) {
 error_setg(errp, "Protocol error: server sent status chunk with "
@@ -658,7 +669,7 @@ static int nbd_parse_blockstatus_payload(BDRVNBDState *s,
  * (always a safe status, even if it loses information).
  */
 if (s->info.min_block && !QEMU_IS_ALIGNED(extent->length,
-   s->info.min_block)) {
+  s->info.min_block)) {
 trace_nbd_parse_blockstatus_compliance("extent length is unaligned");
 if (extent->length > s->info.min_block) {
 extent->length = QEMU_ALIGN_DOWN(extent->length,
@@ -672,13 +683,15 @@ static int nbd_parse_blockstatus_payload(BDRVNBDState *s,
 /*
  * We used NBD_CMD_FLAG_REQ_ONE, so the server should not have
  * sent us any more than one extent, nor should it have included
- * status beyond our request in that extent. However, it's easy
- * enough to ignore the server's noncompliance without killing the
+ * status beyond our request in that extent. Furthermore, a wide
+ * server should have replied with an accurate count (we left
+ * count at 0 for a narrow server).  However, it's easy enough to
+ * ignore the server's noncompliance without killing the
  * connection; just ignore trailing extents, and clamp things to
  * the length of our request.
  */
-if (chunk->length > sizeof(context_id) + sizeof(*extent)) {
-trace_nbd_parse_blockstatus_compliance("more than one extent");
+if (count != wide || chunk->length > pay_len) {
+trace_nbd_parse_blockstatus_compliance("unexpected extent count");
 }
 if (extent->length > orig_length) {
 extent->length = orig_length;
@@ -1124,7 +1137,7 @@ nbd_co_receive_cmdread_reply(BDRVNBDState *s, uint64_t 
cookie,

 static int coroutine_fn
 nbd_co_receive_blockstatus_reply(BDRVNBDState *s, uint64_t cookie,
- uint64_t length, NBDExtent32 *extent,
+ uint64_t length, NBDExtent64 *extent,

[PULL 12/15] nbd/client: Request extended headers during negotiation

2023-10-05 Thread Eric Blake

All the pieces are in place for a client to finally request extended
headers.  Note that we must not request extended headers when qemu-nbd
is used to connect to the kernel module (as nbd.ko does not expect
them, but expects us to do the negotiation in userspace before handing
the socket over to the kernel), but there is no harm in all other
clients requesting them.

Extended headers are not essential to the information collected during
'qemu-nbd --list', but probing for it gives us one more piece of
information in that output.  Update the iotests affected by the new
line of output.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Message-ID: <20230925192229.3186470-23-ebl...@redhat.com>
---
 nbd/client-connection.c   |  2 +-
 nbd/client.c  | 20 ++-
 qemu-nbd.c|  3 +++
 tests/qemu-iotests/223.out|  6 ++
 tests/qemu-iotests/233.out|  4 
 tests/qemu-iotests/241.out|  3 +++
 tests/qemu-iotests/307.out|  5 +
 .../tests/nbd-qemu-allocation.out |  1 +
 8 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/nbd/client-connection.c b/nbd/client-connection.c
index aa0201b7107..f9da67c87e3 100644
--- a/nbd/client-connection.c
+++ b/nbd/client-connection.c
@@ -93,7 +93,7 @@ NBDClientConnection *nbd_client_connection_new(const 
SocketAddress *saddr,
 .do_negotiation = do_negotiation,

 .initial_info.request_sizes = true,
-.initial_info.mode = NBD_MODE_STRUCTURED,
+.initial_info.mode = NBD_MODE_EXTENDED,
 .initial_info.base_allocation = true,
 .initial_info.x_dirty_bitmap = g_strdup(x_dirty_bitmap),
 .initial_info.name = g_strdup(export_name ?: "")
diff --git a/nbd/client.c b/nbd/client.c
index a2f253062aa..29ffc609a4b 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -953,15 +953,23 @@ static int nbd_start_negotiate(QIOChannel *ioc, 
QCryptoTLSCreds *tlscreds,
 if (fixedNewStyle) {
 int result = 0;

+if (max_mode >= NBD_MODE_EXTENDED) {
+result = nbd_request_simple_option(ioc,
+   NBD_OPT_EXTENDED_HEADERS,
+   false, errp);
+if (result) {
+return result < 0 ? -EINVAL : NBD_MODE_EXTENDED;
+}
+}
 if (max_mode >= NBD_MODE_STRUCTURED) {
 result = nbd_request_simple_option(ioc,
NBD_OPT_STRUCTURED_REPLY,
false, errp);
-if (result < 0) {
-return -EINVAL;
+if (result) {
+return result < 0 ? -EINVAL : NBD_MODE_STRUCTURED;
 }
 }
-return result ? NBD_MODE_STRUCTURED : NBD_MODE_SIMPLE;
+return NBD_MODE_SIMPLE;
 } else {
 return NBD_MODE_EXPORT_NAME;
 }
@@ -1034,6 +1042,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, 
QCryptoTLSCreds *tlscreds,
 }

 switch (info->mode) {
+case NBD_MODE_EXTENDED:
 case NBD_MODE_STRUCTURED:
 if (base_allocation) {
 result = nbd_negotiate_simple_meta_context(ioc, info, errp);
@@ -1144,7 +1153,7 @@ int nbd_receive_export_list(QIOChannel *ioc, 
QCryptoTLSCreds *tlscreds,

 *info = NULL;
 result = nbd_start_negotiate(ioc, tlscreds, hostname, ,
- NBD_MODE_STRUCTURED, NULL, errp);
+ NBD_MODE_EXTENDED, NULL, errp);
 if (tlscreds && sioc) {
 ioc = sioc;
 }
@@ -1155,6 +1164,7 @@ int nbd_receive_export_list(QIOChannel *ioc, 
QCryptoTLSCreds *tlscreds,
 switch ((NBDMode)result) {
 case NBD_MODE_SIMPLE:
 case NBD_MODE_STRUCTURED:
+case NBD_MODE_EXTENDED:
 /* newstyle - use NBD_OPT_LIST to populate array, then try
  * NBD_OPT_INFO on each array member. If structured replies
  * are enabled, also try NBD_OPT_LIST_META_CONTEXT. */
@@ -1191,7 +1201,7 @@ int nbd_receive_export_list(QIOChannel *ioc, 
QCryptoTLSCreds *tlscreds,
 break;
 }

-if (result == NBD_MODE_STRUCTURED &&
+if (result >= NBD_MODE_STRUCTURED &&
 nbd_list_meta_contexts(ioc, [i], errp) < 0) {
 goto out;
 }
diff --git a/qemu-nbd.c b/qemu-nbd.c
index 54faa87a0c0..1a39bb8fac0 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -235,6 +235,9 @@ static int qemu_nbd_client_list(SocketAddress *saddr, 
QCryptoTLSCreds *tls,
 printf("  opt block: %u\n", list[i].opt_block);
 printf("  max block: %u\n", list[i].max_block);
 }
+printf("  transaction size: %s\n",
+   list[i].mode >=

[PULL 06/15] nbd/server: Prepare to send extended header replies

2023-10-05 Thread Eric Blake

Although extended mode is not yet enabled, once we do turn it on, we
need to reply with extended headers to all messages.  Update the low
level entry points necessary so that all other callers automatically
get the right header based on the current mode.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Message-ID: <20230925192229.3186470-17-ebl...@redhat.com>
---
 nbd/server.c | 30 ++
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 501749d62b5..910c48c6467 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1938,8 +1938,6 @@ static inline void set_be_chunk(NBDClient *client, struct 
iovec *iov,
 size_t niov, uint16_t flags, uint16_t type,
 NBDRequest *request)
 {
-/* TODO - handle structured vs. extended replies */
-NBDStructuredReplyChunk *chunk = iov->iov_base;
 size_t i, length = 0;

 for (i = 1; i < niov; i++) {
@@ -1947,12 +1945,26 @@ static inline void set_be_chunk(NBDClient *client, 
struct iovec *iov,
 }
 assert(length <= NBD_MAX_BUFFER_SIZE + sizeof(NBDStructuredReadData));

-iov[0].iov_len = sizeof(*chunk);
-stl_be_p(>magic, NBD_STRUCTURED_REPLY_MAGIC);
-stw_be_p(>flags, flags);
-stw_be_p(>type, type);
-stq_be_p(>cookie, request->cookie);
-stl_be_p(>length, length);
+if (client->mode >= NBD_MODE_EXTENDED) {
+NBDExtendedReplyChunk *chunk = iov->iov_base;
+
+iov[0].iov_len = sizeof(*chunk);
+stl_be_p(>magic, NBD_EXTENDED_REPLY_MAGIC);
+stw_be_p(>flags, flags);
+stw_be_p(>type, type);
+stq_be_p(>cookie, request->cookie);
+stq_be_p(>offset, request->from);
+stq_be_p(>length, length);
+} else {
+NBDStructuredReplyChunk *chunk = iov->iov_base;
+
+iov[0].iov_len = sizeof(*chunk);
+stl_be_p(>magic, NBD_STRUCTURED_REPLY_MAGIC);
+stw_be_p(>flags, flags);
+stw_be_p(>type, type);
+stq_be_p(>cookie, request->cookie);
+stl_be_p(>length, length);
+}
 }

 static int coroutine_fn nbd_co_send_chunk_done(NBDClient *client,
@@ -2512,6 +2524,8 @@ static coroutine_fn int nbd_send_generic_reply(NBDClient 
*client,
 {
 if (client->mode >= NBD_MODE_STRUCTURED && ret < 0) {
 return nbd_co_send_chunk_error(client, request, -ret, error_msg, errp);
+} else if (client->mode >= NBD_MODE_EXTENDED) {
+return nbd_co_send_chunk_done(client, request, errp);
 } else {
 return nbd_co_send_simple_reply(client, request, ret < 0 ? -ret : 0,
 NULL, 0, errp);
-- 
2.41.0

[PULL 09/15] nbd/client: Plumb errp through nbd_receive_replies

2023-10-05 Thread Eric Blake

Instead of ignoring the low-level error just to refabricate our own
message to pass to the caller, we can just plumb the caller's errp
down to the low level.

Signed-off-by: Eric Blake 
Message-ID: <20230925192229.3186470-20-ebl...@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 4a7f37da1c6..22d3cb11ac8 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -416,7 +416,8 @@ static void coroutine_fn GRAPH_RDLOCK 
nbd_reconnect_attempt(BDRVNBDState *s)
 reconnect_delay_timer_del(s);
 }

-static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t cookie)
+static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t cookie,
+Error **errp)
 {
 int ret;
 uint64_t ind = COOKIE_TO_INDEX(cookie), ind2;
@@ -457,20 +458,25 @@ static coroutine_fn int nbd_receive_replies(BDRVNBDState 
*s, uint64_t cookie)

 /* We are under mutex and cookie is 0. We have to do the dirty work. */
 assert(s->reply.cookie == 0);
-ret = nbd_receive_reply(s->bs, s->ioc, >reply, NULL);
-if (ret <= 0) {
-ret = ret ? ret : -EIO;
+ret = nbd_receive_reply(s->bs, s->ioc, >reply, errp);
+if (ret == 0) {
+ret = -EIO;
+error_setg(errp, "server dropped connection");
+}
+if (ret < 0) {
 nbd_channel_error(s, ret);
 return ret;
 }
 if (nbd_reply_is_structured(>reply) &&
 s->info.mode < NBD_MODE_STRUCTURED) {
 nbd_channel_error(s, -EINVAL);
+error_setg(errp, "unexpected structured reply");
 return -EINVAL;
 }
 ind2 = COOKIE_TO_INDEX(s->reply.cookie);
 if (ind2 >= MAX_NBD_REQUESTS || !s->requests[ind2].coroutine) {
 nbd_channel_error(s, -EINVAL);
+error_setg(errp, "unexpected cookie value");
 return -EINVAL;
 }
 if (s->reply.cookie == cookie) {
@@ -842,9 +848,9 @@ static coroutine_fn int nbd_co_do_receive_one_chunk(
 }
 *request_ret = 0;

-ret = nbd_receive_replies(s, cookie);
+ret = nbd_receive_replies(s, cookie, errp);
 if (ret < 0) {
-error_setg(errp, "Connection closed");
+error_prepend(errp, "Connection closed: ");
 return -EIO;
 }
 assert(s->ioc);
-- 
2.41.0

[PULL 07/15] nbd/server: Support 64-bit block status

2023-10-05 Thread Eric Blake

The NBD spec states that if the client negotiates extended headers,
the server must avoid NBD_REPLY_TYPE_BLOCK_STATUS and instead use
NBD_REPLY_TYPE_BLOCK_STATUS_EXT which supports 64-bit lengths, even if
the reply does not need more than 32 bits.  As of this patch,
client->mode is still never NBD_MODE_EXTENDED, so the code added here
does not take effect until the next patch enables negotiation.

For now, all metacontexts that we know how to export never populate
more than 32 bits of information, so we don't have to worry about
NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we
always send all zeroes for the upper 32 bits of status during
NBD_CMD_BLOCK_STATUS.

Note that we previously had some interesting size-juggling on call
chains, such as:

nbd_co_send_block_status(uint32_t length)
-> blockstatus_to_extents(uint32_t bytes)
  -> bdrv_block_status_above(bytes, _t num)
  -> nbd_extent_array_add(uint64_t num)
-> store num in 32-bit length

But we were lucky that it never overflowed: bdrv_block_status_above
never sets num larger than bytes, and we had previously been capping
'bytes' at 32 bits (since the protocol does not allow sending a larger
request without extended headers).  This patch adds some assertions
that ensure we continue to avoid overflowing 32 bits for a narrow
client, while fully utilizing 64-bits all the way through when the
client understands that.  Even in 64-bit math, overflow is not an
issue, because all lengths are coming from the block layer, and we
know that the block layer does not support images larger than off_t
(if lengths were coming from the network, the story would be
different).

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Message-ID: <20230925192229.3186470-18-ebl...@redhat.com>
---
 nbd/server.c | 108 ++-
 1 file changed, 82 insertions(+), 26 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 910c48c6467..183efe27bf8 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2103,20 +2103,24 @@ static int coroutine_fn 
nbd_co_send_sparse_read(NBDClient *client,
 }

 typedef struct NBDExtentArray {
-NBDExtent32 *extents;
+NBDExtent64 *extents;
 unsigned int nb_alloc;
 unsigned int count;
 uint64_t total_length;
+bool extended;
 bool can_add;
 bool converted_to_be;
 } NBDExtentArray;

-static NBDExtentArray *nbd_extent_array_new(unsigned int nb_alloc)
+static NBDExtentArray *nbd_extent_array_new(unsigned int nb_alloc,
+NBDMode mode)
 {
 NBDExtentArray *ea = g_new0(NBDExtentArray, 1);

+assert(mode >= NBD_MODE_STRUCTURED);
 ea->nb_alloc = nb_alloc;
-ea->extents = g_new(NBDExtent32, nb_alloc);
+ea->extents = g_new(NBDExtent64, nb_alloc);
+ea->extended = mode >= NBD_MODE_EXTENDED;
 ea->can_add = true;

 return ea;
@@ -2135,15 +2139,36 @@ static void 
nbd_extent_array_convert_to_be(NBDExtentArray *ea)
 int i;

 assert(!ea->converted_to_be);
+assert(ea->extended);
 ea->can_add = false;
 ea->converted_to_be = true;

 for (i = 0; i < ea->count; i++) {
-ea->extents[i].flags = cpu_to_be32(ea->extents[i].flags);
-ea->extents[i].length = cpu_to_be32(ea->extents[i].length);
+ea->extents[i].length = cpu_to_be64(ea->extents[i].length);
+ea->extents[i].flags = cpu_to_be64(ea->extents[i].flags);
 }
 }

+/* Further modifications of the array after conversion are abandoned */
+static NBDExtent32 *nbd_extent_array_convert_to_narrow(NBDExtentArray *ea)
+{
+int i;
+NBDExtent32 *extents = g_new(NBDExtent32, ea->count);
+
+assert(!ea->converted_to_be);
+assert(!ea->extended);
+ea->can_add = false;
+ea->converted_to_be = true;
+
+for (i = 0; i < ea->count; i++) {
+assert((ea->extents[i].length | ea->extents[i].flags) <= UINT32_MAX);
+extents[i].length = cpu_to_be32(ea->extents[i].length);
+extents[i].flags = cpu_to_be32(ea->extents[i].flags);
+}
+
+return extents;
+}
+
 /*
  * Add extent to NBDExtentArray. If extent can't be added (no available space),
  * return -1.
@@ -2154,19 +2179,27 @@ static void 
nbd_extent_array_convert_to_be(NBDExtentArray *ea)
  * would result in an incorrect range reported to the client)
  */
 static int nbd_extent_array_add(NBDExtentArray *ea,
-uint32_t length, uint32_t flags)
+uint64_t length, uint32_t flags)
 {
 assert(ea->can_add);

 if (!length) {
 return 0;
 }
+if (!ea->extended) {
+assert(length <= UINT32_MAX);
+}

 /* Extend previous extent if flags are the same */
 if (ea->count > 0 && flags == ea->extents[ea->count - 1].flags) {
-uint64_t sum = (uint64_t)length + ea->extents[ea->count - 1].length;
+uint64_t sum = length + ea->extents[ea->count - 1].length;

-if (sum <= UINT32_MAX) {
+/*
+ * sum

[PULL 03/15] mailmap: Fix BALATON Zoltan author email

2023-10-05 Thread Eric Blake

This fixes authorship of commits 5cbd51a5 and friends, where the
qemu-ppc mailing list rewrote the "From:" field in the corresponding
patches.  See commit 3bd2608db7 ("maint: Add .mailmap entries for
patches claiming list authorship") for explanation.

Signed-off-by: Eric Blake 
Message-ID: <20230927143815.3397386-8-ebl...@redhat.com>
---
 .mailmap | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.mailmap b/.mailmap
index fadf6e74506..d2149592887 100644
--- a/.mailmap
+++ b/.mailmap
@@ -59,6 +59,7 @@ Julia Suvorova  Julia Suvorova via Qemu-devel 
 Justin Terry (VM) via Qemu-devel 

 Stefan Weil  Stefan Weil via 
 Andrey Drobyshev  Andrey Drobyshev via 

+BALATON Zoltan  BALATON Zoltan via 

 # Next, replace old addresses by a more recent one.
 Aleksandar Markovic  

-- 
2.41.0

[PULL 00/15] NBD patches through 2023-10-05

2023-10-05 Thread Eric Blake

The following changes since commit 2f3913f4b2ad74baeb5a6f1d36efbd9ecdf1057d:

  Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu 
into staging (2023-10-05 09:01:01 -0400)

are available in the Git repository at:

  https://repo.or.cz/qemu/ericb.git tags/pull-nbd-2023-10-05

for you to fetch changes up to 2dcbb11b399ada51f734229b612e4f561a2aae0a:

  nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS (2023-10-05 11:02:08 
-0500)


NBD patches for 2023-10-05

- various: mailmap cleanups
- Eric Blake: enable use of NBD 64-bit extended headers


Andrey Drobyshev (1):
  mailmap: Fix Andrey Drobyshev author email

Eric Blake (14):
  maint: Tweak comments in mailmap regarding SPF
  mailmap: Fix BALATON Zoltan author email
  nbd/server: Support a request payload
  nbd/server: Prepare to receive extended header requests
  nbd/server: Prepare to send extended header replies
  nbd/server: Support 64-bit block status
  nbd/server: Enable initial support for extended headers
  nbd/client: Plumb errp through nbd_receive_replies
  nbd/client: Initial support for extended headers
  nbd/client: Accept 64-bit block status chunks
  nbd/client: Request extended headers during negotiation
  nbd/server: Refactor list of negotiated meta contexts
  nbd/server: Prepare for per-request filtering of BLOCK_STATUS
  nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS

 docs/interop/nbd.txt |   1 +
 include/block/nbd.h  |   5 +-
 nbd/nbd-internal.h   |   5 +-
 block/nbd.c  |  67 ++--
 nbd/client-connection.c  |   2 +-
 nbd/client.c | 124 ---
 nbd/server.c | 426 ++-
 qemu-nbd.c   |   4 +
 .mailmap |  16 +-
 block/trace-events   |   1 +
 nbd/trace-events |   5 +-
 tests/qemu-iotests/223.out   |  18 +-
 tests/qemu-iotests/233.out   |   4 +
 tests/qemu-iotests/241.out   |   3 +
 tests/qemu-iotests/307.out   |  15 +-
 tests/qemu-iotests/tests/nbd-qemu-allocation.out |   3 +-
 16 files changed, 538 insertions(+), 161 deletions(-)

-- 
2.41.0

[PULL 02/15] maint: Tweak comments in mailmap regarding SPF

2023-10-05 Thread Eric Blake

Documenting that we should not add new lines to work around SPF
rewrites sounds foreboding; the intent is instead that new lines here
are okay, but indicate a second problem elsewhere in our build process
that we should also consider fixing at the same time, to keep the
section from growing without bounds.  While we have been doing that
for qemu-devel for a while, we jut recently fixed that for qemu-block:
https://git.linaro.org/people/pmaydell/misc-scripts.git/commit/?id=f9a317392

Mentioning DMARC alongside SPF may also help people grep for this
scenario, as well as documenting the 'git config' workaround that can
be used by submitters to avoid the munging issue in the first place.

Note the subtlety: 'git commit' sets authorship information based on
user.name and user.email (where name is usually unquoted); while 'git
send-email' includes a body 'From:' line only when sendemail.from is
present but differs from authorship information.  Hence the use of
quotes in sendemail.from (not a semantic change to email, but enough
of a difference to add the body 'From:').

Fixes: 3bd2608d ("maint: Add .mailmap entries for patches claiming list 
authorship")
CC: Andrey Drobyshev 
Cc: Peter Maydell 
Signed-off-by: Eric Blake 
Message-ID: <20230927143815.3397386-7-ebl...@redhat.com>
Reviewed-by: Peter Maydell 
---
 .mailmap | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/.mailmap b/.mailmap
index 04a7feb005b..fadf6e74506 100644
--- a/.mailmap
+++ b/.mailmap
@@ -40,7 +40,19 @@ Nick Hudson  hn...@vmware.com 

 # for the cvs2svn initialization commit e63c3dc74bf.

 # Next, translate a few commits where mailman rewrote the From: line due
-# to strict SPF, although we prefer to avoid adding more entries like that.
+# to strict SPF and DMARC.  Usually, our build process should be flagging
+# commits like these before maintainer merges; if you find the need to add
+# a line here, please also report a bug against the part of the build
+# process that let the mis-attribution slip through in the first place.
+#
+# If the mailing list munges your emails, use:
+#   git config sendemail.from '"Your Name" '
+# the use of "" in that line will differ from the typically unquoted
+# 'git config user.name', which in turn is sufficient for 'git send-email'
+# to add an extra From: line in the body of your email that takes
+# precedence over any munged From: in the mail's headers.
+# See https://lists.openembedded.org/g/openembedded-core/message/166515
+# and https://lists.gnu.org/archive/html/qemu-devel/2023-09/msg06784.html
 Ed Swierk  Ed Swierk via Qemu-devel 

 Ian McKellar  Ian McKellar via Qemu-devel 

 Julia Suvorova  Julia Suvorova via Qemu-devel 

-- 
2.41.0

[PULL 04/15] nbd/server: Support a request payload

2023-10-05 Thread Eric Blake

Upcoming additions to support NBD 64-bit effect lengths allow for the
possibility to distinguish between payload length (capped at 32M) and
effect length (64 bits, although we generally assume 63 bits because
of off_t limitations).  Without that extension, only the NBD_CMD_WRITE
request has a payload; but with the extension, it makes sense to allow
at least NBD_CMD_BLOCK_STATUS to have both a payload and effect length
in a future patch (where the payload is a limited-size struct that in
turn gives the real effect length as well as a subset of known ids for
which status is requested).  Other future NBD commands may also have a
request payload, so the 64-bit extension introduces a new
NBD_CMD_FLAG_PAYLOAD_LEN that distinguishes between whether the header
length is a payload length or an effect length, rather than
hard-coding the decision based on the command.

According to the spec, a client should never send a command with a
payload without the negotiation phase proving such extension is
available.  So in the unlikely event the bit is set or cleared
incorrectly, the client is already at fault; if the client then
provides the payload, we can gracefully consume it off the wire and
fail the command with NBD_EINVAL (subsequent checks for magic numbers
ensure we are still in sync), while if the client fails to send
payload we block waiting for it (basically deadlocking our connection
to the bad client, but not negatively impacting our ability to service
other clients, so not a security risk).  Note that we do not support
the payload version of BLOCK_STATUS yet.

This patch also fixes a latent bug introduced in b2578459: once
request->len can be 64 bits, assigning it to a 32-bit payload_len can
cause wraparound to 0 which then sets req->complete prematurely;
thankfully, the bug was not possible back then (it takes this and
later patches to even allow request->len larger than 32 bits; and
since previously the only 'payload_len = request->len' assignment was
in NBD_CMD_WRITE which also sets check_length, which in turn rejects
lengths larger than 32M before relying on any possibly-truncated value
stored in payload_len).

Signed-off-by: Eric Blake 
Message-ID: <20230925192229.3186470-15-ebl...@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy 
[eblake: enhance comment on handling client error, fix type bug]
Signed-off-by: Eric Blake 
---
 nbd/server.c | 42 +-
 nbd/trace-events |  1 +
 2 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 7a6f95071f8..56e9b41828e 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2322,10 +2322,12 @@ static int coroutine_fn 
nbd_co_receive_request(NBDRequestData *req,
Error **errp)
 {
 NBDClient *client = req->client;
+bool extended_with_payload;
 bool check_length = false;
 bool check_rofs = false;
 bool allocate_buffer = false;
-unsigned payload_len = 0;
+bool payload_okay = false;
+uint64_t payload_len = 0;
 int valid_flags = NBD_CMD_FLAG_FUA;
 int ret;

@@ -2338,6 +2340,13 @@ static int coroutine_fn 
nbd_co_receive_request(NBDRequestData *req,

 trace_nbd_co_receive_request_decode_type(request->cookie, request->type,
  nbd_cmd_lookup(request->type));
+extended_with_payload = client->mode >= NBD_MODE_EXTENDED &&
+request->flags & NBD_CMD_FLAG_PAYLOAD_LEN;
+if (extended_with_payload) {
+payload_len = request->len;
+check_length = true;
+}
+
 switch (request->type) {
 case NBD_CMD_DISC:
 /* Special case: we're going to disconnect without a reply,
@@ -2354,6 +2363,15 @@ static int coroutine_fn 
nbd_co_receive_request(NBDRequestData *req,
 break;

 case NBD_CMD_WRITE:
+if (client->mode >= NBD_MODE_EXTENDED) {
+if (!extended_with_payload) {
+/* The client is noncompliant. Trace it, but proceed. */
+trace_nbd_co_receive_ext_payload_compliance(request->from,
+request->len);
+}
+valid_flags |= NBD_CMD_FLAG_PAYLOAD_LEN;
+}
+payload_okay = true;
 payload_len = request->len;
 check_length = true;
 allocate_buffer = true;
@@ -2395,6 +2413,16 @@ static int coroutine_fn 
nbd_co_receive_request(NBDRequestData *req,
request->len, NBD_MAX_BUFFER_SIZE);
 return -EINVAL;
 }
+if (payload_len && !payload_okay) {
+/*
+ * For now, we don't support payloads on other commands; but
+ * we can keep the connection alive by ignoring the payload.
+ * We will fail the command later with NBD_EINVAL for the use
+ * of an unsupported flag (and not for access beyond bounds).
+ */
+assert(request->type != NBD_CMD_WRITE);
+request->len = 0;
+}
 if

[PULL 05/15] nbd/server: Prepare to receive extended header requests

2023-10-05 Thread Eric Blake

Although extended mode is not yet enabled, once we do turn it on, we
need to accept extended requests for all messages.  Previous patches
have already taken care of supporting 64-bit lengths, now we just need
to read it off the wire.

Note that this implementation will block indefinitely on a buggy
client that sends a non-extended payload (that is, we try to read a
full packet before we ever check the magic number, but a client that
mistakenly sends a simple request after negotiating extended headers
doesn't send us enough bytes), but it's no different from any other
client that stops talking to us partway through a packet and thus not
worth coding around.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Message-ID: <20230925192229.3186470-16-ebl...@redhat.com>
---
 nbd/nbd-internal.h |  5 -
 nbd/server.c   | 43 ++-
 2 files changed, 34 insertions(+), 14 deletions(-)

diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 133b1d94b50..dfa02f77ee4 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -34,8 +34,11 @@
  * https://github.com/yoe/nbd/blob/master/doc/proto.md
  */

-/* Size of all NBD_OPT_*, without payload */
+/* Size of all compact NBD_CMD_*, without payload */
 #define NBD_REQUEST_SIZE(4 + 2 + 2 + 8 + 8 + 4)
+/* Size of all extended NBD_CMD_*, without payload */
+#define NBD_EXTENDED_REQUEST_SIZE   (4 + 2 + 2 + 8 + 8 + 8)
+
 /* Size of all NBD_REP_* sent in answer to most NBD_OPT_*, without payload */
 #define NBD_REPLY_SIZE  (4 + 4 + 8)
 /* Size of reply to NBD_OPT_EXPORT_NAME */
diff --git a/nbd/server.c b/nbd/server.c
index 56e9b41828e..501749d62b5 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1411,11 +1411,13 @@ nbd_read_eof(NBDClient *client, void *buffer, size_t 
size, Error **errp)
 static int coroutine_fn nbd_receive_request(NBDClient *client, NBDRequest 
*request,
 Error **errp)
 {
-uint8_t buf[NBD_REQUEST_SIZE];
-uint32_t magic;
+uint8_t buf[NBD_EXTENDED_REQUEST_SIZE];
+uint32_t magic, expect;
 int ret;
+size_t size = client->mode >= NBD_MODE_EXTENDED ?
+NBD_EXTENDED_REQUEST_SIZE : NBD_REQUEST_SIZE;

-ret = nbd_read_eof(client, buf, sizeof(buf), errp);
+ret = nbd_read_eof(client, buf, size, errp);
 if (ret < 0) {
 return ret;
 }
@@ -1423,13 +1425,21 @@ static int coroutine_fn nbd_receive_request(NBDClient 
*client, NBDRequest *reque
 return -EIO;
 }

-/* Request
-   [ 0 ..  3]   magic   (NBD_REQUEST_MAGIC)
-   [ 4 ..  5]   flags   (NBD_CMD_FLAG_FUA, ...)
-   [ 6 ..  7]   type(NBD_CMD_READ, ...)
-   [ 8 .. 15]   cookie
-   [16 .. 23]   from
-   [24 .. 27]   len
+/*
+ * Compact request
+ *  [ 0 ..  3]   magic   (NBD_REQUEST_MAGIC)
+ *  [ 4 ..  5]   flags   (NBD_CMD_FLAG_FUA, ...)
+ *  [ 6 ..  7]   type(NBD_CMD_READ, ...)
+ *  [ 8 .. 15]   cookie
+ *  [16 .. 23]   from
+ *  [24 .. 27]   len
+ * Extended request
+ *  [ 0 ..  3]   magic   (NBD_EXTENDED_REQUEST_MAGIC)
+ *  [ 4 ..  5]   flags   (NBD_CMD_FLAG_FUA, NBD_CMD_FLAG_PAYLOAD_LEN, ...)
+ *  [ 6 ..  7]   type(NBD_CMD_READ, ...)
+ *  [ 8 .. 15]   cookie
+ *  [16 .. 23]   from
+ *  [24 .. 31]   len
  */

 magic = ldl_be_p(buf);
@@ -1437,13 +1447,20 @@ static int coroutine_fn nbd_receive_request(NBDClient 
*client, NBDRequest *reque
 request->type   = lduw_be_p(buf + 6);
 request->cookie = ldq_be_p(buf + 8);
 request->from   = ldq_be_p(buf + 16);
-request->len= (uint32_t)ldl_be_p(buf + 24); /* widen 32 to 64 bits */
+if (client->mode >= NBD_MODE_EXTENDED) {
+request->len = ldq_be_p(buf + 24);
+expect = NBD_EXTENDED_REQUEST_MAGIC;
+} else {
+request->len = (uint32_t)ldl_be_p(buf + 24); /* widen 32 to 64 bits */
+expect = NBD_REQUEST_MAGIC;
+}

 trace_nbd_receive_request(magic, request->flags, request->type,
   request->from, request->len);

-if (magic != NBD_REQUEST_MAGIC) {
-error_setg(errp, "invalid magic (got 0x%" PRIx32 ")", magic);
+if (magic != expect) {
+error_setg(errp, "invalid magic (got 0x%" PRIx32 ", expected 0x%"
+   PRIx32 ")", magic, expect);
 return -EINVAL;
 }
 return 0;
-- 
2.41.0

[PULL 14/15] nbd/server: Prepare for per-request filtering of BLOCK_STATUS

2023-10-05 Thread Eric Blake

The next commit will add support for the optional extension
NBD_CMD_FLAG_PAYLOAD during NBD_CMD_BLOCK_STATUS, where the client can
request that the server only return a subset of negotiated contexts,
rather than all contexts.  To make that task easier, this patch
populates the list of contexts to return on a per-command basis (for
now, identical to the full set of negotiated contexts).

Signed-off-by: Eric Blake 
Message-ID: <20230925192229.3186470-25-ebl...@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/nbd.h |  1 +
 nbd/server.c| 22 +-
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 2006497f987..4e7bd6342f9 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -77,6 +77,7 @@ typedef struct NBDRequest {
 uint16_t flags; /* NBD_CMD_FLAG_* */
 uint16_t type;  /* NBD_CMD_* */
 NBDMode mode;   /* Determines which network representation to use */
+NBDMetaContexts *contexts; /* Used by NBD_CMD_BLOCK_STATUS */
 } NBDRequest;

 typedef struct NBDSimpleReply {
diff --git a/nbd/server.c b/nbd/server.c
index 2719992db7b..2dce9c3ad6a 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2505,6 +2505,7 @@ static int coroutine_fn 
nbd_co_receive_request(NBDRequestData *req,
 break;

 case NBD_CMD_BLOCK_STATUS:
+request->contexts = >contexts;
 valid_flags |= NBD_CMD_FLAG_REQ_ONE;
 break;

@@ -2748,17 +2749,18 @@ static coroutine_fn int nbd_handle_request(NBDClient 
*client,
   "discard failed", errp);

 case NBD_CMD_BLOCK_STATUS:
+assert(request->contexts);
 if (!request->len) {
 return nbd_send_generic_reply(client, request, -EINVAL,
   "need non-zero length", errp);
 }
 assert(client->mode >= NBD_MODE_EXTENDED ||
request->len <= UINT32_MAX);
-if (client->contexts.count) {
+if (request->contexts->count) {
 bool dont_fragment = request->flags & NBD_CMD_FLAG_REQ_ONE;
-int contexts_remaining = client->contexts.count;
+int contexts_remaining = request->contexts->count;

-if (client->contexts.base_allocation) {
+if (request->contexts->base_allocation) {
 ret = nbd_co_send_block_status(client, request,
exp->common.blk,
request->from,
@@ -2771,7 +2773,7 @@ static coroutine_fn int nbd_handle_request(NBDClient 
*client,
 }
 }

-if (client->contexts.allocation_depth) {
+if (request->contexts->allocation_depth) {
 ret = nbd_co_send_block_status(client, request,
exp->common.blk,
request->from, request->len,
@@ -2784,8 +2786,9 @@ static coroutine_fn int nbd_handle_request(NBDClient 
*client,
 }
 }

+assert(request->contexts->exp == client->exp);
 for (i = 0; i < client->exp->nr_export_bitmaps; i++) {
-if (!client->contexts.bitmaps[i]) {
+if (!request->contexts->bitmaps[i]) {
 continue;
 }
 ret = nbd_co_send_bitmap(client, request,
@@ -2801,6 +2804,10 @@ static coroutine_fn int nbd_handle_request(NBDClient 
*client,
 assert(!contexts_remaining);

 return 0;
+} else if (client->contexts.count) {
+return nbd_send_generic_reply(client, request, -EINVAL,
+  "CMD_BLOCK_STATUS payload not valid",
+  errp);
 } else {
 return nbd_send_generic_reply(client, request, -EINVAL,
   "CMD_BLOCK_STATUS not negotiated",
@@ -2879,6 +2886,11 @@ static coroutine_fn void nbd_trip(void *opaque)
 } else {
 ret = nbd_handle_request(client, , req->data, _err);
 }
+if (request.contexts && request.contexts != >contexts) {
+assert(request.type == NBD_CMD_BLOCK_STATUS);
+g_free(request.contexts->bitmaps);
+g_free(request.contexts);
+}
 if (ret < 0) {
 error_prepend(_err, "Failed to send reply: ");
 goto disconnect;
-- 
2.41.0

[PULL 10/15] nbd/client: Initial support for extended headers

2023-10-05 Thread Eric Blake

Update the client code to be able to send an extended request, and
parse an extended header from the server.  Note that since we reject
any structured reply with a too-large payload, we can always normalize
a valid header back into the compact form, so that the caller need not
deal with two branches of a union.  Still, until a later patch lets
the client negotiate extended headers, the code added here should not
be reached.  Note that because of the different magic numbers, it is
just as easy to trace and then tolerate a non-compliant server sending
the wrong header reply as it would be to insist that the server is
compliant.

Signed-off-by: Eric Blake 
Message-ID: <20230925192229.3186470-21-ebl...@redhat.com>
[eblake: fix trace format]
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/nbd.h |   3 +-
 block/nbd.c |   2 +-
 nbd/client.c| 104 +---
 nbd/trace-events|   3 +-
 4 files changed, 74 insertions(+), 38 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 8a765e78dfb..ba8724f5336 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -389,7 +389,8 @@ int nbd_init(int fd, QIOChannelSocket *sioc, NBDExportInfo 
*info,
  Error **errp);
 int nbd_send_request(QIOChannel *ioc, NBDRequest *request);
 int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
-   NBDReply *reply, Error **errp);
+   NBDReply *reply, NBDMode mode,
+   Error **errp);
 int nbd_client(int fd);
 int nbd_disconnect(int fd);
 int nbd_errno_to_system_errno(int err);
diff --git a/block/nbd.c b/block/nbd.c
index 22d3cb11ac8..76461430411 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -458,7 +458,7 @@ static coroutine_fn int nbd_receive_replies(BDRVNBDState 
*s, uint64_t cookie,

 /* We are under mutex and cookie is 0. We have to do the dirty work. */
 assert(s->reply.cookie == 0);
-ret = nbd_receive_reply(s->bs, s->ioc, >reply, errp);
+ret = nbd_receive_reply(s->bs, s->ioc, >reply, s->info.mode, errp);
 if (ret == 0) {
 ret = -EIO;
 error_setg(errp, "server dropped connection");
diff --git a/nbd/client.c b/nbd/client.c
index cecb0f04377..a2f253062aa 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1346,22 +1346,29 @@ int nbd_disconnect(int fd)

 int nbd_send_request(QIOChannel *ioc, NBDRequest *request)
 {
-uint8_t buf[NBD_REQUEST_SIZE];
+uint8_t buf[NBD_EXTENDED_REQUEST_SIZE];
+size_t len;

-assert(request->mode <= NBD_MODE_STRUCTURED); /* TODO handle extended */
-assert(request->len <= UINT32_MAX);
 trace_nbd_send_request(request->from, request->len, request->cookie,
request->flags, request->type,
nbd_cmd_lookup(request->type));

-stl_be_p(buf, NBD_REQUEST_MAGIC);
 stw_be_p(buf + 4, request->flags);
 stw_be_p(buf + 6, request->type);
 stq_be_p(buf + 8, request->cookie);
 stq_be_p(buf + 16, request->from);
-stl_be_p(buf + 24, request->len);
+if (request->mode >= NBD_MODE_EXTENDED) {
+stl_be_p(buf, NBD_EXTENDED_REQUEST_MAGIC);
+stq_be_p(buf + 24, request->len);
+len = NBD_EXTENDED_REQUEST_SIZE;
+} else {
+assert(request->len <= UINT32_MAX);
+stl_be_p(buf, NBD_REQUEST_MAGIC);
+stl_be_p(buf + 24, request->len);
+len = NBD_REQUEST_SIZE;
+}

-return nbd_write(ioc, buf, sizeof(buf), NULL);
+return nbd_write(ioc, buf, len, NULL);
 }

 /* nbd_receive_simple_reply
@@ -1388,30 +1395,36 @@ static int nbd_receive_simple_reply(QIOChannel *ioc, 
NBDSimpleReply *reply,
 return 0;
 }

-/* nbd_receive_structured_reply_chunk
+/* nbd_receive_reply_chunk_header
  * Read structured reply chunk except magic field (which should be already
- * read).
+ * read).  Normalize into the compact form.
  * Payload is not read.
  */
-static int nbd_receive_structured_reply_chunk(QIOChannel *ioc,
-  NBDStructuredReplyChunk *chunk,
-  Error **errp)
+static int nbd_receive_reply_chunk_header(QIOChannel *ioc, NBDReply *chunk,
+  Error **errp)
 {
 int ret;
+size_t len;
+uint64_t payload_len;

-assert(chunk->magic == NBD_STRUCTURED_REPLY_MAGIC);
+if (chunk->magic == NBD_STRUCTURED_REPLY_MAGIC) {
+len = sizeof(chunk->structured);
+} else {
+assert(chunk->magic == NBD_EXTENDED_REPLY_MAGIC);
+len = sizeof(chunk->extended);
+}

 ret = nbd_read(ioc, (uint8_t *)chunk + sizeof(chunk->magic),
-   sizeof(*chunk) - sizeof(chunk->magic), "structured chunk",
+   len - sizeof(chunk->magic), "structured chunk",
errp);
 if (ret < 0) {
 return ret;
 }

-chunk->flags =

[PULL 15/15] nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS

2023-10-05 Thread Eric Blake

Allow a client to request a subset of negotiated meta contexts.  For
example, a client may ask to use a single connection to learn about
both block status and dirty bitmaps, but where the dirty bitmap
queries only need to be performed on a subset of the disk; forcing the
server to compute that information on block status queries in the rest
of the disk is wasted effort (both at the server, and on the amount of
traffic sent over the wire to be parsed and ignored by the client).

Qemu as an NBD client never requests to use more than one meta
context, so it has no need to use block status payloads.  Testing this
instead requires support from libnbd, which CAN access multiple meta
contexts in parallel from a single NBD connection; an interop test
submitted to the libnbd project at the same time as this patch
demonstrates the feature working, as well as testing some corner cases
(for example, when the payload length is longer than the export
length), although other corner cases (like passing the same id
duplicated) requires a protocol fuzzer because libnbd is not wired up
to break the protocol that badly.

This also includes tweaks to 'qemu-nbd --list' to show when a server
is advertising the capability, and to the testsuite to reflect the
addition to that output.

Of note: qemu will always advertise the new feature bit during
NBD_OPT_INFO if extended headers have alreay been negotiated
(regardless of whether any NBD_OPT_SET_META_CONTEXT negotiation has
occurred); but for NBD_OPT_GO, qemu only advertises the feature if
block status is also enabled (that is, if the client does not
negotiate any contexts, then NBD_CMD_BLOCK_STATUS cannot be used, so
the feature is not advertised).

Signed-off-by: Eric Blake 
Message-ID: <20230925192229.3186470-26-ebl...@redhat.com>
[eblake: fix logic to reject unnegotiated contexts]
Signed-off-by: Eric Blake 
---
 docs/interop/nbd.txt  |   2 +-
 nbd/server.c  | 117 --
 qemu-nbd.c|   1 +
 nbd/trace-events  |   1 +
 tests/qemu-iotests/223.out|  12 +-
 tests/qemu-iotests/307.out|  10 +-
 .../tests/nbd-qemu-allocation.out |   2 +-
 7 files changed, 125 insertions(+), 20 deletions(-)

diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
index 9aae5e1f294..18efb251de9 100644
--- a/docs/interop/nbd.txt
+++ b/docs/interop/nbd.txt
@@ -69,4 +69,4 @@ NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
 NBD_CMD_FLAG_FAST_ZERO
 * 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
 * 7.1: NBD_FLAG_CAN_MULTI_CONN for shareable writable exports
-* 8.2: NBD_OPT_EXTENDED_HEADERS
+* 8.2: NBD_OPT_EXTENDED_HEADERS, NBD_FLAG_BLOCK_STATUS_PAYLOAD
diff --git a/nbd/server.c b/nbd/server.c
index 2dce9c3ad6a..859c163d19f 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -512,6 +512,9 @@ static int nbd_negotiate_handle_export_name(NBDClient 
*client, bool no_zeroes,
 if (client->mode >= NBD_MODE_STRUCTURED) {
 myflags |= NBD_FLAG_SEND_DF;
 }
+if (client->mode >= NBD_MODE_EXTENDED && client->contexts.count) {
+myflags |= NBD_FLAG_BLOCK_STAT_PAYLOAD;
+}
 trace_nbd_negotiate_new_style_size_flags(client->exp->size, myflags);
 stq_be_p(buf, client->exp->size);
 stw_be_p(buf + 8, myflags);
@@ -699,6 +702,10 @@ static int nbd_negotiate_handle_info(NBDClient *client, 
Error **errp)
 if (client->mode >= NBD_MODE_STRUCTURED) {
 myflags |= NBD_FLAG_SEND_DF;
 }
+if (client->mode >= NBD_MODE_EXTENDED &&
+(client->contexts.count || client->opt == NBD_OPT_INFO)) {
+myflags |= NBD_FLAG_BLOCK_STAT_PAYLOAD;
+}
 trace_nbd_negotiate_new_style_size_flags(exp->size, myflags);
 stq_be_p(buf, exp->size);
 stw_be_p(buf + 8, myflags);
@@ -2420,6 +2427,90 @@ static int coroutine_fn nbd_co_send_bitmap(NBDClient 
*client,
 return nbd_co_send_extents(client, request, ea, last, context_id, errp);
 }

+/*
+ * nbd_co_block_status_payload_read
+ * Called when a client wants a subset of negotiated contexts via a
+ * BLOCK_STATUS payload.  Check the payload for valid length and
+ * contents.  On success, return 0 with request updated to effective
+ * length.  If request was invalid but all payload consumed, return 0
+ * with request->len and request->contexts->count set to 0 (which will
+ * trigger an appropriate NBD_EINVAL response later on).  Return
+ * negative errno if the payload was not fully consumed.
+ */
+static int
+nbd_co_block_status_payload_read(NBDClient *client, NBDRequest *request,
+ Error **errp)
+{
+uint64_t payload_len = request->len;
+g_autofree char *buf = NULL;
+size_t count, i, nr_bitmaps;
+uint32_t id;
+
+if (payload_len > NBD_MAX_BUFFER_SIZE) {
+error_setg(errp, "len (%" PRIu64 ") is larger than max len (%u)",
+   request->len,

[PULL 08/15] nbd/server: Enable initial support for extended headers

2023-10-05 Thread Eric Blake

Time to start supporting clients that request extended headers.  Now
we can finally reach the code added across several previous patches.

Even though the NBD spec has been altered to allow us to accept
NBD_CMD_READ larger than the max payload size (provided our response
is a hole or broken up over more than one data chunk), we are not
planning to take advantage of that, and continue to cap NBD_CMD_READ
to 32M regardless of header size.

For NBD_CMD_WRITE_ZEROES and NBD_CMD_TRIM, the block layer already
supports 64-bit operations without any effort on our part.  For
NBD_CMD_BLOCK_STATUS, the client's length is a hint, and the previous
patch took care of implementing the required
NBD_REPLY_TYPE_BLOCK_STATUS_EXT.

We do not yet support clients that want to do request payload
filtering of NBD_CMD_BLOCK_STATUS; that will be added in later
patches, but is not essential for qemu as a client since qemu only
requests the single context base:allocation.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Message-ID: <20230925192229.3186470-19-ebl...@redhat.com>
---
 docs/interop/nbd.txt |  1 +
 nbd/server.c | 21 +
 2 files changed, 22 insertions(+)

diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
index f5ca25174a6..9aae5e1f294 100644
--- a/docs/interop/nbd.txt
+++ b/docs/interop/nbd.txt
@@ -69,3 +69,4 @@ NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
 NBD_CMD_FLAG_FAST_ZERO
 * 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
 * 7.1: NBD_FLAG_CAN_MULTI_CONN for shareable writable exports
+* 8.2: NBD_OPT_EXTENDED_HEADERS
diff --git a/nbd/server.c b/nbd/server.c
index 183efe27bf8..d3eed6535bf 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -482,6 +482,10 @@ static int nbd_negotiate_handle_export_name(NBDClient 
*client, bool no_zeroes,
 [10 .. 133]   reserved (0) [unless no_zeroes]
  */
 trace_nbd_negotiate_handle_export_name();
+if (client->mode >= NBD_MODE_EXTENDED) {
+error_setg(errp, "Extended headers already negotiated");
+return -EINVAL;
+}
 if (client->optlen > NBD_MAX_STRING_SIZE) {
 error_setg(errp, "Bad length received");
 return -EINVAL;
@@ -1264,6 +1268,10 @@ static int nbd_negotiate_options(NBDClient *client, 
Error **errp)
 case NBD_OPT_STRUCTURED_REPLY:
 if (length) {
 ret = nbd_reject_length(client, false, errp);
+} else if (client->mode >= NBD_MODE_EXTENDED) {
+ret = nbd_negotiate_send_rep_err(
+client, NBD_REP_ERR_EXT_HEADER_REQD, errp,
+"extended headers already negotiated");
 } else if (client->mode >= NBD_MODE_STRUCTURED) {
 ret = nbd_negotiate_send_rep_err(
 client, NBD_REP_ERR_INVALID, errp,
@@ -1280,6 +1288,19 @@ static int nbd_negotiate_options(NBDClient *client, 
Error **errp)
  errp);
 break;

+case NBD_OPT_EXTENDED_HEADERS:
+if (length) {
+ret = nbd_reject_length(client, false, errp);
+} else if (client->mode >= NBD_MODE_EXTENDED) {
+ret = nbd_negotiate_send_rep_err(
+client, NBD_REP_ERR_INVALID, errp,
+"extended headers already negotiated");
+} else {
+ret = nbd_negotiate_send_rep(client, NBD_REP_ACK, errp);
+client->mode = NBD_MODE_EXTENDED;
+}
+break;
+
 default:
 ret = nbd_opt_drop(client, NBD_REP_ERR_UNSUP, errp,
"Unsupported option %" PRIu32 " (%s)",
-- 
2.41.0

[PULL 01/15] mailmap: Fix Andrey Drobyshev author email

2023-10-05 Thread Eric Blake

From: Andrey Drobyshev 

This fixes authorship of commits 2848289168, 52b10c9c0c as the mailing
list rewrote the "From:" field in the corresponding patches.  See commit
3bd2608db7 ("maint: Add .mailmap entries for patches claiming list
authorship") for explanation.

Signed-off-by: Andrey Drobyshev 
Message-ID: <20230926102801.512107-1-andrey.drobys...@virtuozzo.com>
Reviewed-by: Eric Blake 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Eric Blake 
---
 .mailmap | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.mailmap b/.mailmap
index 64ef9f4de6f..04a7feb005b 100644
--- a/.mailmap
+++ b/.mailmap
@@ -46,6 +46,7 @@ Ian McKellar  Ian McKellar via Qemu-devel 
 Julia Suvorova via Qemu-devel 

 Justin Terry (VM)  Justin Terry (VM) via Qemu-devel 

 Stefan Weil  Stefan Weil via 
+Andrey Drobyshev  Andrey Drobyshev via 


 # Next, replace old addresses by a more recent one.
 Aleksandar Markovic  

-- 
2.41.0

Re: [PATCH] MAINTANERS: Split vt82c686 out of fuloong2e

2023-10-05 Thread Philippe Mathieu-Daudé


On 5/10/23 20:18, BALATON Zoltan wrote:

It is used by other machines not just fuloong2e so put it in a
separate section and add myself as reviewer.

Signed-off-by: BALATON Zoltan 
---
By the way, PIIX4 already has a section just above where I've added
this but some files are still listed in Malta. You may want to have a
look at that.

  MAINTAINERS | 11 ---
  1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index ea91f9e804..7f0e20fde6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1307,10 +1307,7 @@ M: Philippe Mathieu-Daudé 
  R: Jiaxun Yang 
  S: Odd Fixes
  F: hw/mips/fuloong2e.c
-F: hw/isa/vt82c686.c
  F: hw/pci-host/bonito.c
-F: hw/usb/vt82c686-uhci-pci.c
-F: include/hw/isa/vt82c686.h
  F: include/hw/pci-host/bonito.h
  F: tests/avocado/machine_mips_fuloong2e.py
  
@@ -2462,6 +2459,14 @@ S: Maintained

  F: hw/isa/piix4.c
  F: include/hw/southbridge/piix.h
  
+VIA south bridges (VT82C686B, VT8231)

+M: Philippe Mathieu-Daudé 
+R: Jiaxun Yang 
+R: BALATON Zoltan 


Feel free to add yourself as maintainer if you rather.

Reviewed-by: Philippe Mathieu-Daudé 


+F: hw/isa/vt82c686.c
+F: hw/usb/vt82c686-uhci-pci.c
+F: include/hw/isa/vt82c686.h
+
  Firmware configuration (fw_cfg)
  M: Philippe Mathieu-Daudé 
  R: Gerd Hoffmann

[QEMU][PATCH v1 0/7] Xen: support grant mappings.

2023-10-05 Thread Vikram Garhwal

Hi,
This patch series add support for grant mappings as a pseudo RAM region for Xen.

Enabling grant mappings patches(first 6) are written by Juergen in 2021.

QEMU Virtio device provides an emulated backends for Virtio frontned devices
in Xen.
Please set "iommu_platform=on" option when invoking QEMU. As this will set
VIRTIO_F_ACCESS_PLATFORM feature which will be used by virtio frontend in Xen
to know whether backend supports grants or not.

Regards,
Vikram

Juergen Gross (6):
  xen: when unplugging emulated devices skip virtio devices
  xen: add pseudo RAM region for grant mappings
  softmmu: let qemu_map_ram_ptr() use qemu_ram_ptr_length()
  xen: let xen_ram_addr_from_mapcache() return -1 in case of not found
entry
  memory: add MemoryRegion map and unmap callbacks
  xen: add map and unmap callbacks for grant region

Vikram Garhwal (1):
  hw: arm: Add grant mapping.

 hw/arm/xen_arm.c|   3 +
 hw/i386/xen/xen-hvm.c   |   3 +
 hw/i386/xen/xen_platform.c  |   8 +-
 hw/xen/xen-hvm-common.c |   4 +-
 hw/xen/xen-mapcache.c   | 206 ++--
 include/exec/memory.h   |  21 
 include/exec/ram_addr.h |   1 +
 include/hw/xen/xen-hvm-common.h |   2 +
 include/hw/xen/xen_pvdev.h  |   3 +
 include/sysemu/xen-mapcache.h   |   3 +
 softmmu/physmem.c   | 181 +---
 11 files changed, 349 insertions(+), 86 deletions(-)

-- 
2.17.1

[QEMU][PATCH v1 1/7] xen: when unplugging emulated devices skip virtio devices

2023-10-05 Thread Vikram Garhwal

From: Juergen Gross 

Virtio devices should never be unplugged at boot time, as they are
similar to pci passthrough devices.

Signed-off-by: Juergen Gross 
Signed-off-by: Vikram Garhwal 
---
 hw/i386/xen/xen_platform.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/i386/xen/xen_platform.c b/hw/i386/xen/xen_platform.c
index 17457ff3de..3560eaf8c8 100644
--- a/hw/i386/xen/xen_platform.c
+++ b/hw/i386/xen/xen_platform.c
@@ -28,6 +28,7 @@
 #include "hw/ide/pci.h"
 #include "hw/pci/pci.h"
 #include "migration/vmstate.h"
+#include "hw/virtio/virtio-bus.h"
 #include "net/net.h"
 #include "trace.h"
 #include "sysemu/xen.h"
@@ -132,7 +133,8 @@ static void unplug_nic(PCIBus *b, PCIDevice *d, void *o)
 /* We have to ignore passthrough devices */
 if (pci_get_word(d->config + PCI_CLASS_DEVICE) ==
 PCI_CLASS_NETWORK_ETHERNET
-&& !pci_device_is_passthrough(d)) {
+&& !pci_device_is_passthrough(d)
+&& !qdev_get_child_bus(>qdev, TYPE_VIRTIO_BUS)) {
 object_unparent(OBJECT(d));
 }
 }
@@ -208,6 +210,10 @@ static void unplug_disks(PCIBus *b, PCIDevice *d, void 
*opaque)
 /* We have to ignore passthrough devices */
 if (pci_device_is_passthrough(d))
 return;
+/* Ignore virtio devices */
+if (qdev_get_child_bus(>qdev, TYPE_VIRTIO_BUS)) {
+return;
+}
 
 switch (pci_get_word(d->config + PCI_CLASS_DEVICE)) {
 case PCI_CLASS_STORAGE_IDE:
-- 
2.17.1

Re: [PATCH 5/7] audio: do not use first -audiodev as default audio device

2023-10-05 Thread BALATON Zoltan


On Thu, 5 Oct 2023, Paolo Bonzini wrote:

Il gio 5 ott 2023, 17:40 BALATON Zoltan  ha scritto:


Much better, thanks. Maybe some more small clarifications as below:


===
Using ``-audiodev`` to define the default audio backend (removed in 8.2)

If no audiodev property is specified, previous versions would use the
first ``-audiodev`` command line option as a fallback.  Starting with
version 8.2, audio backends created with ``-audiodev`` will only be
used by clients (sound cards, machines with embedded sound hardware, VNC)


machines with embedded sound hardware that can be set with the audiodev
machine property



-M audiodev needs to be documented in the release notes, not in removed
features.


The more places it's documented the better, peopla don't read docs anyway.


I'm still not sure users will get it without additional explanation

somewhere explicitly saying that if you now get an error with -audiodev
driver then you may now need to use -audio driver instead (hopefully the
error will say that)



Currently the error says to add audiodev=, I can change that to propose
both.


That would help as likely the error will be the only thing the users will 
come across so if it tells them what to do that's the least annoyance.


Regards,
BALATON Zoltan

[PATCH] MAINTANERS: Split vt82c686 out of fuloong2e

2023-10-05 Thread BALATON Zoltan

It is used by other machines not just fuloong2e so put it in a
separate section and add myself as reviewer.

Signed-off-by: BALATON Zoltan 
---
By the way, PIIX4 already has a section just above where I've added
this but some files are still listed in Malta. You may want to have a
look at that.

 MAINTAINERS | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index ea91f9e804..7f0e20fde6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1307,10 +1307,7 @@ M: Philippe Mathieu-DaudÃ© 
 R: Jiaxun Yang 
 S: Odd Fixes
 F: hw/mips/fuloong2e.c
-F: hw/isa/vt82c686.c
 F: hw/pci-host/bonito.c
-F: hw/usb/vt82c686-uhci-pci.c
-F: include/hw/isa/vt82c686.h
 F: include/hw/pci-host/bonito.h
 F: tests/avocado/machine_mips_fuloong2e.py
 
@@ -2462,6 +2459,14 @@ S: Maintained
 F: hw/isa/piix4.c
 F: include/hw/southbridge/piix.h
 
+VIA south bridges (VT82C686B, VT8231)
+M: Philippe Mathieu-DaudÃ© 
+R: Jiaxun Yang 
+R: BALATON Zoltan 
+F: hw/isa/vt82c686.c
+F: hw/usb/vt82c686-uhci-pci.c
+F: include/hw/isa/vt82c686.h
+
 Firmware configuration (fw_cfg)
 M: Philippe Mathieu-DaudÃ© 
 R: Gerd Hoffmann 
-- 
2.30.9

[PATCH v4] hw/isa/vt82c686: Respect SCI interrupt assignment

2023-10-05 Thread BALATON Zoltan

According to the datasheet, SCI interrupts of the power management
function aren't routed through the PCI pins but rather directly to the
integrated PIC. The routing is configurable through the ACPI interrupt
select register at offset 0x42 in the PCI configuration space of the
power management function.

Signed-off-by: BALATON Zoltan 
---
Alternative proposal to Bernhard's patch to remove FIXME about SCI.

Apply this on top of reverting commit 4e5a20b6da9 which could also be
squashed in this patch but I let you decide if you want that as
separete commit or part of this. I did not test this beyond compiling
but I think this is all there is to it. (Overusing QOM within a chip
model does not make sense as QOM is meant to allow users to introspect
and configure device models and combine different models into new
machines eventually without modifying QEMU but this is not applicable
here within a single device model that can be considered as friend
classes so don't go oveboard with it when not needed.)

 hw/isa/vt82c686.c | 30 +-
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c
index 8016c71315..2eb769c7fa 100644
--- a/hw/isa/vt82c686.c
+++ b/hw/isa/vt82c686.c
@@ -140,25 +140,21 @@ static const MemoryRegionOps pm_io_ops = {
 
 static void pm_update_sci(ViaPMState *s)
 {
-int sci_level, pmsts;
+int sci_irq, pmsts;
 
 pmsts = acpi_pm1_evt_get_sts(>ar);
-sci_level = (((pmsts & s->ar.pm1.evt.en) &
-  (ACPI_BITMASK_RT_CLOCK_ENABLE |
-   ACPI_BITMASK_POWER_BUTTON_ENABLE |
-   ACPI_BITMASK_GLOBAL_LOCK_ENABLE |
-   ACPI_BITMASK_TIMER_ENABLE)) != 0);
-if (pci_get_byte(s->dev.config + PCI_INTERRUPT_PIN)) {
-/*
- * FIXME:
- * Fix device model that realizes this PM device and remove
- * this work around.
- * The device model should wire SCI and setup
- * PCI_INTERRUPT_PIN properly.
- * If PIN# = 0(interrupt pin isn't used), don't raise SCI as
- * work around.
- */
-pci_set_irq(>dev, sci_level);
+sci_irq = pci_get_byte(s->dev.config + 0x42) & 0xf;
+if (sci_irq) {
+int sci_level = (((pmsts & s->ar.pm1.evt.en) &
+  (ACPI_BITMASK_RT_CLOCK_ENABLE |
+  ACPI_BITMASK_POWER_BUTTON_ENABLE |
+  ACPI_BITMASK_GLOBAL_LOCK_ENABLE |
+  ACPI_BITMASK_TIMER_ENABLE)) != 0);
+
+if (sci_irq == 2) {
+qemu_log_mask(LOG_GUEST_ERROR, "IRQ 2 for VIA PM SCI is reserved");
+}
+via_isa_set_irq(pci_get_function_0(>dev), sci_irq, sci_level);
 }
 /* schedule a timer interruption if needed */
 acpi_pm_tmr_update(>ar, (s->ar.pm1.evt.en & ACPI_BITMASK_TIMER_ENABLE) 
&&
-- 
2.30.9

[QEMU][PATCH v1 3/7] softmmu: let qemu_map_ram_ptr() use qemu_ram_ptr_length()

2023-10-05 Thread Vikram Garhwal

From: Juergen Gross 

qemu_map_ram_ptr() and qemu_ram_ptr_length() share quite some code, so
modify qemu_ram_ptr_length() a little bit and use it for
qemu_map_ram_ptr(), too.

Signed-off-by: Juergen Gross 
Signed-off-by: Vikram Garhwal 
---
 softmmu/physmem.c | 58 +++
 1 file changed, 23 insertions(+), 35 deletions(-)

diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index e182a2fa07..6e5e379dd0 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -2163,38 +2163,8 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
 }
 #endif /* !_WIN32 */
 
-/* Return a host pointer to ram allocated with qemu_ram_alloc.
- * This should not be used for general purpose DMA.  Use address_space_map
- * or address_space_rw instead. For local memory (e.g. video ram) that the
- * device owns, use memory_region_get_ram_ptr.
- *
- * Called within RCU critical section.
- */
-void *qemu_map_ram_ptr(RAMBlock *ram_block, ram_addr_t addr)
-{
-RAMBlock *block = ram_block;
-
-if (block == NULL) {
-block = qemu_get_ram_block(addr);
-addr -= block->offset;
-}
-
-if (xen_enabled() && block->host == NULL) {
-/* We need to check if the requested address is in the RAM
- * because we don't want to map the entire memory in QEMU.
- * In that case just map until the end of the page.
- */
-if (block->offset == 0) {
-return xen_map_cache(addr, 0, 0, false);
-}
-
-block->host = xen_map_cache(block->offset, block->max_length, 1, 
false);
-}
-return ramblock_ptr(block, addr);
-}
-
-/* Return a host pointer to guest's ram. Similar to qemu_map_ram_ptr
- * but takes a size argument.
+/*
+ * Return a host pointer to guest's ram.
  *
  * Called within RCU critical section.
  */
@@ -2202,7 +2172,9 @@ static void *qemu_ram_ptr_length(RAMBlock *ram_block, 
ram_addr_t addr,
  hwaddr *size, bool lock)
 {
 RAMBlock *block = ram_block;
-if (*size == 0) {
+hwaddr len = 0;
+
+if (size && *size == 0) {
 return NULL;
 }
 
@@ -2210,7 +2182,10 @@ static void *qemu_ram_ptr_length(RAMBlock *ram_block, 
ram_addr_t addr,
 block = qemu_get_ram_block(addr);
 addr -= block->offset;
 }
-*size = MIN(*size, block->max_length - addr);
+if (size) {
+*size = MIN(*size, block->max_length - addr);
+len = *size;
+}
 
 if (xen_enabled() && block->host == NULL) {
 /* We need to check if the requested address is in the RAM
@@ -2218,7 +2193,7 @@ static void *qemu_ram_ptr_length(RAMBlock *ram_block, 
ram_addr_t addr,
  * In that case just map the requested area.
  */
 if (block->offset == 0) {
-return xen_map_cache(addr, *size, lock, lock);
+return xen_map_cache(addr, len, lock, lock);
 }
 
 block->host = xen_map_cache(block->offset, block->max_length, 1, lock);
@@ -2227,6 +2202,19 @@ static void *qemu_ram_ptr_length(RAMBlock *ram_block, 
ram_addr_t addr,
 return ramblock_ptr(block, addr);
 }
 
+/*
+ * Return a host pointer to ram allocated with qemu_ram_alloc.
+ * This should not be used for general purpose DMA.  Use address_space_map
+ * or address_space_rw instead. For local memory (e.g. video ram) that the
+ * device owns, use memory_region_get_ram_ptr.
+ *
+ * Called within RCU critical section.
+ */
+void *qemu_map_ram_ptr(RAMBlock *ram_block, ram_addr_t addr)
+{
+return qemu_ram_ptr_length(ram_block, addr, NULL, false);
+}
+
 /* Return the offset of a hostpointer within a ramblock */
 ram_addr_t qemu_ram_block_host_offset(RAMBlock *rb, void *host)
 {
-- 
2.17.1

[QEMU][PATCH v1 7/7] hw: arm: Add grant mapping.

2023-10-05 Thread Vikram Garhwal

Enable grant ram mapping support for Xenpvh machine on ARM.

Signed-off-by: Vikram Garhwal 
---
 hw/arm/xen_arm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/arm/xen_arm.c b/hw/arm/xen_arm.c
index f83b983ec5..553c289720 100644
--- a/hw/arm/xen_arm.c
+++ b/hw/arm/xen_arm.c
@@ -125,6 +125,9 @@ static void xen_init_ram(MachineState *machine)
 DPRINTF("Initialized region xen.ram.hi: base 0x%llx size 0x%lx\n",
 GUEST_RAM1_BASE, ram_size[1]);
 }
+
+DPRINTF("init grant ram mapping for XEN\n");
+ram_grants = *xen_init_grant_ram();
 }
 
 void arch_handle_ioreq(XenIOState *state, ioreq_t *req)
-- 
2.17.1

[QEMU][PATCH v1 2/7] xen: add pseudo RAM region for grant mappings

2023-10-05 Thread Vikram Garhwal

From: Juergen Gross 

Add a memory region which can be used to automatically map granted
memory. It is starting at 0x8000ULL in order to be able to
distinguish it from normal RAM.

For this reason the xen.ram memory region is expanded, which has no
further impact as it is used just as a container of the real RAM
regions and now the grant region.

Signed-off-by: Juergen Gross 
Signed-off-by: Vikram Garhwal 
---
 hw/i386/xen/xen-hvm.c   |  3 ++
 hw/xen/xen-hvm-common.c |  4 +--
 hw/xen/xen-mapcache.c   | 27 ++
 include/exec/ram_addr.h |  1 +
 include/hw/xen/xen-hvm-common.h |  2 ++
 include/hw/xen/xen_pvdev.h  |  3 ++
 include/sysemu/xen-mapcache.h   |  3 ++
 softmmu/physmem.c   | 62 +
 8 files changed, 80 insertions(+), 25 deletions(-)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index f42621e674..67a8a6 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -172,6 +172,9 @@ static void xen_ram_init(PCMachineState *pcms,
  x86ms->above_4g_mem_size);
 memory_region_add_subregion(sysmem, 0x1ULL, _hi);
 }
+
+/* Add grant mappings as a pseudo RAM region. */
+ram_grants = *xen_init_grant_ram();
 }
 
 static XenPhysmap *get_physmapping(hwaddr start_addr, ram_addr_t size)
diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
index 565dc39c8f..b7255977a5 100644
--- a/hw/xen/xen-hvm-common.c
+++ b/hw/xen/xen-hvm-common.c
@@ -9,7 +9,7 @@
 #include "hw/boards.h"
 #include "hw/xen/arch_hvm.h"
 
-MemoryRegion ram_memory;
+MemoryRegion ram_memory, ram_grants;
 
 void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr,
Error **errp)
@@ -26,7 +26,7 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, 
MemoryRegion *mr,
 return;
 }
 
-if (mr == _memory) {
+if (mr == _memory || mr == _grants) {
 return;
 }
 
diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c
index f7d974677d..8115c44c00 100644
--- a/hw/xen/xen-mapcache.c
+++ b/hw/xen/xen-mapcache.c
@@ -14,7 +14,9 @@
 
 #include 
 
+#include "hw/xen/xen-hvm-common.h"
 #include "hw/xen/xen_native.h"
+#include "hw/xen/xen_pvdev.h"
 #include "qemu/bitmap.h"
 
 #include "sysemu/runstate.h"
@@ -597,3 +599,28 @@ uint8_t *xen_replace_cache_entry(hwaddr old_phys_addr,
 mapcache_unlock();
 return p;
 }
+
+MemoryRegion *xen_init_grant_ram(void)
+{
+RAMBlock *block;
+
+memory_region_init(_grants, NULL, "xen.grants",
+   XEN_MAX_VIRTIO_GRANTS * XC_PAGE_SIZE);
+block = g_malloc0(sizeof(*block));
+block->mr = _grants;
+block->used_length = XEN_MAX_VIRTIO_GRANTS * XC_PAGE_SIZE;
+block->max_length = XEN_MAX_VIRTIO_GRANTS * XC_PAGE_SIZE;
+block->fd = -1;
+block->page_size = XC_PAGE_SIZE;
+block->host = (void *)XEN_GRANT_ADDR_OFF;
+block->offset = XEN_GRANT_ADDR_OFF;
+block->flags = RAM_PREALLOC;
+ram_grants.ram_block = block;
+ram_grants.ram = true;
+ram_grants.terminates = true;
+ram_block_add_list(block);
+memory_region_add_subregion(get_system_memory(), XEN_GRANT_ADDR_OFF,
+_grants);
+
+return _grants;
+}
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 90676093f5..c0b5f9a7d0 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -139,6 +139,7 @@ void qemu_ram_free(RAMBlock *block);
 int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp);
 
 void qemu_ram_msync(RAMBlock *block, ram_addr_t start, ram_addr_t length);
+void ram_block_add_list(RAMBlock *new_block);
 
 /* Clear whole block of mem */
 static inline void qemu_ram_block_writeback(RAMBlock *block)
diff --git a/include/hw/xen/xen-hvm-common.h b/include/hw/xen/xen-hvm-common.h
index 4e9904f1a6..0d300ba898 100644
--- a/include/hw/xen/xen-hvm-common.h
+++ b/include/hw/xen/xen-hvm-common.h
@@ -17,6 +17,8 @@
 #include 
 
 extern MemoryRegion ram_memory;
+
+extern MemoryRegion ram_grants;
 extern MemoryListener xen_io_listener;
 extern DeviceListener xen_device_listener;
 
diff --git a/include/hw/xen/xen_pvdev.h b/include/hw/xen/xen_pvdev.h
index ddad4b9f36..0f1b5edfa9 100644
--- a/include/hw/xen/xen_pvdev.h
+++ b/include/hw/xen/xen_pvdev.h
@@ -80,4 +80,7 @@ int xen_pv_send_notify(struct XenLegacyDevice *xendev);
 void xen_pv_printf(struct XenLegacyDevice *xendev, int msg_level,
const char *fmt, ...)  G_GNUC_PRINTF(3, 4);
 
+#define XEN_GRANT_ADDR_OFF0x8000ULL
+#define XEN_MAX_VIRTIO_GRANTS 65536
+
 #endif /* QEMU_HW_XEN_PVDEV_H */
diff --git a/include/sysemu/xen-mapcache.h b/include/sysemu/xen-mapcache.h
index c8e7c2f6cf..f4bedb1c11 100644
--- a/include/sysemu/xen-mapcache.h
+++ b/include/sysemu/xen-mapcache.h
@@ -10,6 +10,7 @@
 #define XEN_MAPCACHE_H
 
 #include "exec/cpu-common.h"
+#include "exec/ram_addr.h"
 
 typedef hwaddr

[QEMU][PATCH v1 6/7] xen: add map and unmap callbacks for grant region

2023-10-05 Thread Vikram Garhwal

From: Juergen Gross 

Add the callbacks for mapping/unmapping guest memory via grants to the
special grant memory region.

Signed-off-by: Juergen Gross 
Signed-off-by: Vikram Garhwal 
---
 hw/xen/xen-mapcache.c | 167 +-
 softmmu/physmem.c |  11 ++-
 2 files changed, 173 insertions(+), 5 deletions(-)

diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c
index 8a61c7dde6..52844a6a9d 100644
--- a/hw/xen/xen-mapcache.c
+++ b/hw/xen/xen-mapcache.c
@@ -9,6 +9,8 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/queue.h"
+#include "qemu/thread.h"
 #include "qemu/units.h"
 #include "qemu/error-report.h"
 
@@ -23,6 +25,8 @@
 #include "sysemu/xen-mapcache.h"
 #include "trace.h"
 
+#include 
+#include 
 
 //#define MAPCACHE_DEBUG
 
@@ -385,7 +389,7 @@ uint8_t *xen_map_cache(hwaddr phys_addr, hwaddr size,
 return p;
 }
 
-ram_addr_t xen_ram_addr_from_mapcache(void *ptr)
+static ram_addr_t xen_ram_addr_from_mapcache_try(void *ptr)
 {
 MapCacheEntry *entry = NULL;
 MapCacheRev *reventry;
@@ -594,10 +598,170 @@ uint8_t *xen_replace_cache_entry(hwaddr old_phys_addr,
 return p;
 }
 
+struct XENMappedGrantRegion {
+void *addr;
+unsigned int pages;
+unsigned int refs;
+unsigned int prot;
+uint32_t idx;
+QLIST_ENTRY(XENMappedGrantRegion) list;
+};
+
+static xengnttab_handle *xen_region_gnttabdev;
+static QLIST_HEAD(GrantRegionList, XENMappedGrantRegion) xen_grant_mappings =
+QLIST_HEAD_INITIALIZER(xen_grant_mappings);
+static QemuMutex xen_map_mutex;
+
+static void *xen_map_grant_dyn(MemoryRegion **mr, hwaddr addr, hwaddr *plen,
+   bool is_write, MemTxAttrs attrs)
+{
+unsigned int page_off = addr & (XC_PAGE_SIZE - 1);
+unsigned int i;
+unsigned int nrefs = (page_off + *plen + XC_PAGE_SIZE - 1) >> 
XC_PAGE_SHIFT;
+uint32_t ref = (addr - XEN_GRANT_ADDR_OFF) >> XC_PAGE_SHIFT;
+uint32_t *refs = NULL;
+unsigned int prot = PROT_READ;
+struct XENMappedGrantRegion *mgr = NULL;
+
+if (is_write) {
+prot |= PROT_WRITE;
+}
+
+qemu_mutex_lock(_map_mutex);
+
+QLIST_FOREACH(mgr, _grant_mappings, list) {
+if (mgr->idx == ref &&
+mgr->pages == nrefs &&
+(mgr->prot & prot) == prot) {
+break;
+}
+}
+if (!mgr) {
+mgr = g_new(struct XENMappedGrantRegion, 1);
+
+if (nrefs == 1) {
+refs = 
+} else {
+refs = g_new(uint32_t, nrefs);
+for (i = 0; i < nrefs; i++) {
+refs[i] = ref + i;
+}
+}
+mgr->addr = xengnttab_map_domain_grant_refs(xen_region_gnttabdev, 
nrefs,
+xen_domid, refs, prot);
+if (mgr->addr) {
+mgr->pages = nrefs;
+mgr->refs = 1;
+mgr->prot = prot;
+mgr->idx = ref;
+
+QLIST_INSERT_HEAD(_grant_mappings, mgr, list);
+} else {
+g_free(mgr);
+mgr = NULL;
+}
+} else {
+mgr->refs++;
+}
+
+qemu_mutex_unlock(_map_mutex);
+
+if (nrefs > 1) {
+g_free(refs);
+}
+
+return mgr ? mgr->addr + page_off : NULL;
+}
+
+static void xen_unmap_grant_dyn(MemoryRegion *mr, void *buffer, ram_addr_t 
addr,
+hwaddr len, bool is_write, hwaddr access_len)
+{
+unsigned int page_off = (unsigned long)buffer & (XC_PAGE_SIZE - 1);
+unsigned int nrefs = (page_off + len + XC_PAGE_SIZE - 1) >> XC_PAGE_SHIFT;
+unsigned int prot = PROT_READ;
+struct XENMappedGrantRegion *mgr = NULL;
+
+if (is_write) {
+prot |= PROT_WRITE;
+}
+
+qemu_mutex_lock(_map_mutex);
+
+QLIST_FOREACH(mgr, _grant_mappings, list) {
+if (mgr->addr == buffer - page_off &&
+mgr->pages == nrefs &&
+(mgr->prot & prot) == prot) {
+break;
+}
+}
+if (mgr) {
+mgr->refs--;
+if (!mgr->refs) {
+xengnttab_unmap(xen_region_gnttabdev, mgr->addr, nrefs);
+
+QLIST_REMOVE(mgr, list);
+g_free(mgr);
+}
+} else {
+error_report("xen_unmap_grant_dyn() trying to unmap unknown buffer");
+}
+
+qemu_mutex_unlock(_map_mutex);
+}
+
+static ram_addr_t xen_ram_addr_from_grant_cache(void *ptr)
+{
+unsigned int page_off = (unsigned long)ptr & (XC_PAGE_SIZE - 1);
+struct XENMappedGrantRegion *mgr = NULL;
+ram_addr_t raddr = RAM_ADDR_INVALID;
+
+qemu_mutex_lock(_map_mutex);
+
+QLIST_FOREACH(mgr, _grant_mappings, list) {
+if (mgr->addr == ptr - page_off) {
+break;
+}
+}
+
+if (mgr) {
+raddr = (mgr->idx << XC_PAGE_SHIFT) + page_off + XEN_GRANT_ADDR_OFF;
+}
+
+qemu_mutex_unlock(_map_mutex);
+
+return raddr;
+}
+
+ram_addr_t xen_ram_addr_from_mapcache(void *ptr)
+{
+ram_addr_t raddr;
+
+raddr =

[QEMU][PATCH v1 4/7] xen: let xen_ram_addr_from_mapcache() return -1 in case of not found entry

2023-10-05 Thread Vikram Garhwal

From: Juergen Gross 

Today xen_ram_addr_from_mapcache() will either abort() or return 0 in
case it can't find a matching entry for a pointer value. Both cases
are bad, so change that to return an invalid address instead.

Signed-off-by: Juergen Gross 
---
 hw/xen/xen-mapcache.c | 12 +++-
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c
index 8115c44c00..8a61c7dde6 100644
--- a/hw/xen/xen-mapcache.c
+++ b/hw/xen/xen-mapcache.c
@@ -404,13 +404,8 @@ ram_addr_t xen_ram_addr_from_mapcache(void *ptr)
 }
 }
 if (!found) {
-fprintf(stderr, "%s, could not find %p\n", __func__, ptr);
-QTAILQ_FOREACH(reventry, >locked_entries, next) {
-DPRINTF("   "HWADDR_FMT_plx" -> %p is present\n", 
reventry->paddr_index,
-reventry->vaddr_req);
-}
-abort();
-return 0;
+mapcache_unlock();
+return RAM_ADDR_INVALID;
 }
 
 entry = >entry[paddr_index % mapcache->nr_buckets];
@@ -418,8 +413,7 @@ ram_addr_t xen_ram_addr_from_mapcache(void *ptr)
 entry = entry->next;
 }
 if (!entry) {
-DPRINTF("Trying to find address %p that is not in the mapcache!\n", 
ptr);
-raddr = 0;
+raddr = RAM_ADDR_INVALID;
 } else {
 raddr = (reventry->paddr_index << MCACHE_BUCKET_SHIFT) +
  ((unsigned long) ptr - (unsigned long) entry->vaddr_base);
-- 
2.17.1

[QEMU][PATCH v1 5/7] memory: add MemoryRegion map and unmap callbacks

2023-10-05 Thread Vikram Garhwal

From: Juergen Gross 

In order to support mapping and unmapping guest memory dynamically to
and from qemu during address_space_[un]map() operations add the map()
and unmap() callbacks to MemoryRegionOps.

Those will be used e.g. for Xen grant mappings when performing guest
I/Os.

Signed-off-by: Juergen Gross 
Signed-off-by: Vikram Garhwal 
---
 include/exec/memory.h | 21 ++
 softmmu/physmem.c | 50 +--
 2 files changed, 60 insertions(+), 11 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index c99842d2fc..f3c62d2883 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -280,6 +280,27 @@ struct MemoryRegionOps {
 unsigned size,
 MemTxAttrs attrs);
 
+/*
+ * Dynamically create mapping. @addr is the guest address to map; @plen
+ * is the pointer to the usable length of the buffer.
+ * @mr contents can be changed in case a new memory region is created for
+ * the mapping.
+ * Returns the buffer address for accessing the data.
+ */
+void *(*map)(MemoryRegion **mr,
+ hwaddr addr,
+ hwaddr *plen,
+ bool is_write,
+ MemTxAttrs attrs);
+
+/* Unmap an area obtained via map() before. */
+void (*unmap)(MemoryRegion *mr,
+  void *buffer,
+  ram_addr_t addr,
+  hwaddr len,
+  bool is_write,
+  hwaddr access_len);
+
 enum device_endian endianness;
 /* Guest-visible constraints: */
 struct {
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 6e5e379dd0..5f425bea1c 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -3135,6 +3135,7 @@ void *address_space_map(AddressSpace *as,
 hwaddr len = *plen;
 hwaddr l, xlat;
 MemoryRegion *mr;
+void *ptr = NULL;
 FlatView *fv;
 
 if (len == 0) {
@@ -3168,12 +3169,20 @@ void *address_space_map(AddressSpace *as,
 return bounce.buffer;
 }
 
-
 memory_region_ref(mr);
+
+if (mr->ops && mr->ops->map) {
+ptr = mr->ops->map(, addr, plen, is_write, attrs);
+}
+
 *plen = flatview_extend_translation(fv, addr, len, mr, xlat,
 l, is_write, attrs);
 fuzz_dma_read_cb(addr, *plen, mr);
-return qemu_ram_ptr_length(mr->ram_block, xlat, plen, true);
+if (ptr == NULL) {
+ptr = qemu_ram_ptr_length(mr->ram_block, xlat, plen, true);
+}
+
+return ptr;
 }
 
 /* Unmaps a memory region previously mapped by address_space_map().
@@ -3189,11 +3198,16 @@ void address_space_unmap(AddressSpace *as, void 
*buffer, hwaddr len,
 
 mr = memory_region_from_host(buffer, );
 assert(mr != NULL);
-if (is_write) {
-invalidate_and_set_dirty(mr, addr1, access_len);
-}
-if (xen_enabled()) {
-xen_invalidate_map_cache_entry(buffer);
+
+if (mr->ops && mr->ops->unmap) {
+mr->ops->unmap(mr, buffer, addr1, len, is_write, access_len);
+} else {
+if (is_write) {
+invalidate_and_set_dirty(mr, addr1, access_len);
+}
+if (xen_enabled()) {
+xen_invalidate_map_cache_entry(buffer);
+}
 }
 memory_region_unref(mr);
 return;
@@ -3266,10 +3280,18 @@ int64_t address_space_cache_init(MemoryRegionCache 
*cache,
  * doing this if we found actual RAM, which behaves the same
  * regardless of attributes; so UNSPECIFIED is fine.
  */
+if (mr->ops && mr->ops->map) {
+cache->ptr = mr->ops->map(, addr, , is_write,
+  MEMTXATTRS_UNSPECIFIED);
+}
+
 l = flatview_extend_translation(cache->fv, addr, len, mr,
 cache->xlat, l, is_write,
 MEMTXATTRS_UNSPECIFIED);
-cache->ptr = qemu_ram_ptr_length(mr->ram_block, cache->xlat, , true);
+if (!cache->ptr) {
+cache->ptr = qemu_ram_ptr_length(mr->ram_block, cache->xlat, ,
+ true);
+}
 } else {
 cache->ptr = NULL;
 }
@@ -3291,14 +3313,20 @@ void address_space_cache_invalidate(MemoryRegionCache 
*cache,
 
 void address_space_cache_destroy(MemoryRegionCache *cache)
 {
-if (!cache->mrs.mr) {
+MemoryRegion *mr = cache->mrs.mr;
+
+if (!mr) {
 return;
 }
 
-if (xen_enabled()) {
+if (mr->ops && mr->ops->unmap) {
+mr->ops->unmap(mr, cache->ptr, cache->xlat, cache->len,
+   cache->is_write, cache->len);
+} else if (xen_enabled()) {
 xen_invalidate_map_cache_entry(cache->ptr);
 }
-memory_region_unref(cache->mrs.mr);
+
+memory_region_unref(mr);
 flatview_unref(cache->fv);

Re: [PATCH v4 0/8] vhost-user: Back-end state migration

2023-10-05 Thread Stefan Hajnoczi

On Wed, Oct 04, 2023 at 02:58:56PM +0200, Hanna Czenczek wrote:
> RFC:
> https://lists.nongnu.org/archive/html/qemu-devel/2023-03/msg04263.html
> 
> v1:
> https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg01575.html
> 
> v2:
> https://lists.nongnu.org/archive/html/qemu-devel/2023-07/msg02604.html
> 
> v3:
> https://lists.nongnu.org/archive/html/qemu-devel/2023-09/msg03750.html
> 
> 
> Based-on: <20231004014532.1228637-1-stefa...@redhat.com>
>   ([PATCH v2 0/3] vhost: clean up device reset)
> 
> 
> Hi,
> 
> This v4 includes largely unchanged patches from v3.  The main
> addition/change is what came out of the discussion between Stefan and me
> around how to proceed without SUSPEND/RESUME, which is that this series
> is now based on his reset fix, and it includes more documentation
> changes.

This looks good. I posted some minor comments on the new patches.

Stefan

> 
> Changes in detail:
> 
> - Patch 1: Fall-out from the reset fix: Currently, the status byte is
>   effectively unused (qemu only uses it for resetting, which all
>   back-ends ignore; DPDK uses it to announce potential feature
>   negotiation failure, which qemu ignores).  It is also not defined what
>   exactly front-end or back-end should do with this byte, except
>   pointing at the virtio spec, which however naturally does not say how
>   this integrates with vhost-user’s RESET_DEVICE or [GS]ET_FEATURES.
>   Furthermore, there does not seem to be a use for this; we have
>   RESET_DEVICE for resetting, and we have [GS]ET_FEATURES (and
>   REPLY_ACK, which can be used on SET_FEATURES) for feature
>   negotation.
>   Therefore, deprecate the status byte, pointing to those other commands
>   instead.
> 
> - Patch 2: Patch 4 defines a suspended state for the whole back-end if
>   all vrings are stopped.  I think this should be mentioned in
>   GET_VRING_BASE, but upon trying to add it, I found that it does not
>   even mention that it stops the vring (mentioned only in the Ring
>   States section), and remembered that the whole description of both
>   GET_VRING_BASE and SET_VRING_BASE really was not helpful when trying
>   to implement a vhost-user back-end.  Took the opportunity to overhaul
>   both.
> 
> - Patch 3: This one’s from v3, but quite heavily modified.  Stefan
>   suggested consistently defining the started/stopped and
>   enabled/disabled states to be independent, and indeed doing so
>   simplifies a whole lot of stuff.  Specifically, it makes the magic
>   “enabled/disabled when started” go away.  Basically, I found this
>   change alone is enough to remove the confusion I had with the existing
>   documentation.
> 
> - Patch 4: As suggested by Stefan, just define a suspended state without
>   introducing SUSPEND.  vDPA needs SUSPEND because its GET_VRING_BASE
>   does not stop the vring, but vhost-user’s does, so we can define the
>   suspended state to be when all vrings are stopped.
> 
> - Patch 5: Reference the suspended state.
> 
> - Patches 6 through 8: Unmodified, except for them being rebase on
>   Stefan’s series.
> 
> 
> Hanna Czenczek (8):
>   vhost-user.rst: Deprecate [GS]ET_STATUS
>   vhost-user.rst: Improve [GS]ET_VRING_BASE doc
>   vhost-user.rst: Clarify enabling/disabling vrings
>   vhost-user.rst: Introduce suspended state
>   vhost-user.rst: Migrating back-end-internal state
>   vhost-user: Interface for migration state transfer
>   vhost: Add high-level state save/load functions
>   vhost-user-fs: Implement internal migration
> 
>  docs/interop/vhost-user.rst   | 318 +++---
>  include/hw/virtio/vhost-backend.h |  24 +++
>  include/hw/virtio/vhost.h | 113 +++
>  hw/virtio/vhost-user-fs.c | 101 +-
>  hw/virtio/vhost-user.c| 148 ++
>  hw/virtio/vhost.c | 241 ++
>  6 files changed, 917 insertions(+), 28 deletions(-)
> 
> -- 
> 2.41.0
> 


signature.asc
Description: PGP signature

Re: [PATCH v4 6/8] vhost-user: Interface for migration state transfer

2023-10-05 Thread Stefan Hajnoczi

On Wed, Oct 04, 2023 at 02:59:02PM +0200, Hanna Czenczek wrote:
> Add the interface for transferring the back-end's state during migration
> as defined previously in vhost-user.rst.
> 
> Reviewed-by: Stefan Hajnoczi 
> Signed-off-by: Hanna Czenczek 
> ---
>  include/hw/virtio/vhost-backend.h |  24 +
>  include/hw/virtio/vhost.h |  78 
>  hw/virtio/vhost-user.c| 148 ++
>  hw/virtio/vhost.c |  37 
>  4 files changed, 287 insertions(+)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature

Re: [PATCH v4 8/8] vhost-user-fs: Implement internal migration

2023-10-05 Thread Stefan Hajnoczi

On Wed, Oct 04, 2023 at 02:59:04PM +0200, Hanna Czenczek wrote:
> A virtio-fs device's VM state consists of:
> - the virtio device (vring) state (VMSTATE_VIRTIO_DEVICE)
> - the back-end's (virtiofsd's) internal state
> 
> We get/set the latter via the new vhost operations to transfer migratory
> state.  It is its own dedicated subsection, so that for external
> migration, it can be disabled.
> 
> Reviewed-by: Stefan Hajnoczi 
> Signed-off-by: Hanna Czenczek 
> ---
>  hw/virtio/vhost-user-fs.c | 101 +-
>  1 file changed, 100 insertions(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature

Re: [PATCH v4 7/8] vhost: Add high-level state save/load functions

2023-10-05 Thread Stefan Hajnoczi

On Wed, Oct 04, 2023 at 02:59:03PM +0200, Hanna Czenczek wrote:
> vhost_save_backend_state() and vhost_load_backend_state() can be used by
> vhost front-ends to easily save and load the back-end's state to/from
> the migration stream.
> 
> Because we do not know the full state size ahead of time,
> vhost_save_backend_state() simply reads the data in 1 MB chunks, and
> writes each chunk consecutively into the migration stream, prefixed by
> its length.  EOF is indicated by a 0-length chunk.
> 
> Reviewed-by: Stefan Hajnoczi 
> Signed-off-by: Hanna Czenczek 
> ---
>  include/hw/virtio/vhost.h |  35 +++
>  hw/virtio/vhost.c | 204 ++
>  2 files changed, 239 insertions(+)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature

Re: [PATCH v4 5/8] vhost-user.rst: Migrating back-end-internal state

2023-10-05 Thread Stefan Hajnoczi

On Wed, Oct 04, 2023 at 02:59:01PM +0200, Hanna Czenczek wrote:
> For vhost-user devices, qemu can migrate the virtio state, but not the
> back-end's internal state.  To do so, we need to be able to transfer
> this internal state between front-end (qemu) and back-end.
> 
> At this point, this new feature is added for the purpose of virtio-fs
> migration.  Because virtiofsd's internal state will not be too large, we
> believe it is best to transfer it as a single binary blob after the
> streaming phase.
> 
> These are the additions to the protocol:
> - New vhost-user protocol feature VHOST_USER_PROTOCOL_F_DEVICE_STATE
> - SET_DEVICE_STATE_FD function: Front-end and back-end negotiate a file
>   descriptor over which to transfer the state.
> - CHECK_DEVICE_STATE: After the state has been transferred through the
>   file descriptor, the front-end invokes this function to verify
>   success.  There is no in-band way (through the file descriptor) to
>   indicate failure, so we need to check explicitly.
> 
> Once the transfer FD has been established via SET_DEVICE_STATE_FD
> (which includes establishing the direction of transfer and migration
> phase), the sending side writes its data into it, and the reading side
> reads it until it sees an EOF.  Then, the front-end will check for
> success via CHECK_DEVICE_STATE, which on the destination side includes
> checking for integrity (i.e. errors during deserialization).
> 
> Signed-off-by: Hanna Czenczek 
> ---
>  docs/interop/vhost-user.rst | 172 
>  1 file changed, 172 insertions(+)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature

Re: [PATCH v4 4/8] vhost-user.rst: Introduce suspended state

2023-10-05 Thread Stefan Hajnoczi

On Wed, Oct 04, 2023 at 02:59:00PM +0200, Hanna Czenczek wrote:
> In vDPA, GET_VRING_BASE does not stop the queried vring, which is why
> SUSPEND was introduced so that the returned index would be stable.  In
> vhost-user, it does stop the vring, so under the same reasoning, it can
> get away without SUSPEND.
> 
> Still, we do want to clarify that if the device is completely stopped,
> i.e. all vrings are stopped, the back-end should cease to modify any
> state relating to the guest.  Do this by calling it "suspended".
> 
> Suggested-by: Stefan Hajnoczi 
> Signed-off-by: Hanna Czenczek 
> ---
>  docs/interop/vhost-user.rst | 20 +++-
>  1 file changed, 19 insertions(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature

Re: [PATCH v4 3/8] vhost-user.rst: Clarify enabling/disabling vrings

2023-10-05 Thread Stefan Hajnoczi

On Wed, Oct 04, 2023 at 02:58:59PM +0200, Hanna Czenczek wrote:
> Currently, the vhost-user documentation says that rings are to be
> initialized in a disabled state when VHOST_USER_F_PROTOCOL_FEATURES is
> negotiated.  However, by the time of feature negotiation, all rings have
> already been initialized, so it is not entirely clear what this means.
> 
> At least the vhost-user-backend Rust crate's implementation interpreted
> it to mean that whenever this feature is negotiated, all rings are to
> put into a disabled state, which means that every SET_FEATURES call
> would disable all rings, effectively halting the device.  This is
> problematic because the VHOST_F_LOG_ALL feature is also set or cleared
> this way, which happens during migration.  Doing so should not halt the
> device.
> 
> Other implementations have interpreted this to mean that the device is
> to be initialized with all rings disabled, and a subsequent SET_FEATURES
> call that does not set VHOST_USER_F_PROTOCOL_FEATURES will enable all of
> them.  Here, SET_FEATURES will never disable any ring.
> 
> This interpretation does not suffer the problem of unintentionally
> halting the device whenever features are set or cleared, so it seems
> better and more reasonable.
> 
> We can clarify this in the documentation by making it explicit that the
> enabled/disabled state is tracked even while the vring is stopped.
> Every vring is initialized in a disabled state, and SET_FEATURES without
> VHOST_USER_F_PROTOCOL_FEATURES simply becomes one way to enable all
> vrings.
> 
> Signed-off-by: Hanna Czenczek 
> ---
>  docs/interop/vhost-user.rst | 32 +---
>  1 file changed, 17 insertions(+), 15 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature

[PATCH 0/2] topic: meson: add more compiler hardening flags

2023-10-05 Thread Daniel P . Berrangé

This brings more compiler hardening flags to the default QEMU
build process. The proposed flags have already been adopted by
default in the kernel build process. At some point it is hoped
that distros might enable them globally, as they've done in
the past with things like _FORTIFY_SOURCE. Meanwhile they are
easy things to enable in QEMU which have negligible cost and
clear benefits to hardening. Considering QEMU shows no signs
of stoppping the flow of guest triggerable CVEs, investing in
hardening is worthwhile. See the respective commit messages
for details

I also tested enabling -ftrapv, to change signed integer
overflow from wrapping, to trapping instead. This exposed a
bug in the string-input-visitor which overflows when parsing
ranges, and exposed the test-int128 code as (harmlessly)
overflowing during its testing. Both can be fixed, but I'm
not entirely sure whether -ftrapv is viable or not. I was
wondering about TCG and whether it has a need to intentionally
allow integer overflow for any of its instruction emulation
requirements ?  "make check" passes qtest but that's not
sufficiently broad to make me comfortable. Thus I've not
included an -ftrapv patch here.

Daniel P. Berrangé (2):
  meson: mitigate against ROP exploits with -fzero-call-used-regs
  meson: mitigate against use of uninitialize stack for exploits

 meson.build | 16 
 1 file changed, 16 insertions(+)

-- 
2.41.0

Re: [PATCH v4 2/8] vhost-user.rst: Improve [GS]ET_VRING_BASE doc

2023-10-05 Thread Stefan Hajnoczi

On Wed, Oct 04, 2023 at 02:58:58PM +0200, Hanna Czenczek wrote:
> GET_VRING_BASE does not mention that it stops the respective ring.  Fix
> that.
> 
> Furthermore, it is not fully clear what the "base offset" these
> commands' documentation refers to is; an offset could be many things.
> Be more precise and verbose about it, especially given that these
> commands use different payload structures depending on whether the vring
> is split or packed.
> 
> Signed-off-by: Hanna Czenczek 
> ---
>  docs/interop/vhost-user.rst | 66 ++---
>  1 file changed, 62 insertions(+), 4 deletions(-)
> 
> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> index 2f68e67a1a..50f5acebe5 100644
> --- a/docs/interop/vhost-user.rst
> +++ b/docs/interop/vhost-user.rst
> @@ -108,6 +108,37 @@ A vring state description
>  
>  :num: a 32-bit number
>  
> +A vring descriptor index for split virtqueues
> +^
> +
> ++-+-+
> +| vring index | index in avail ring |
> ++-+-+
> +
> +:vring index: 32-bit index of the respective virtqueue
> +
> +:index in avail ring: 32-bit value, of which currently only the lower 16
> +  bits are used:
> +
> +  - Bits 0–15: Next descriptor index in the *Available Ring*

I think we need to say more to make this implementable just by reading
the spec:

  Index of the next *Available Ring* descriptor that the back-end will
  process. This is a free-running index that is not wrapped by the ring
  size.

Feel free to rephrase.

> +  - Bits 16–31: Reserved (set to zero)
> +
> +Vring descriptor indices for packed virtqueues
> +^^
> +
> ++-++
> +| vring index | descriptor indices |
> ++-++
> +
> +:vring index: 32-bit index of the respective virtqueue
> +
> +:descriptor indices: 32-bit value:
> +
> +  - Bits 0–14: Index in the *Available Ring*

Same here.

> +  - Bit 15: Driver (Available) Ring Wrap Counter
> +  - Bits 16–30: Index in the *Used Ring*

Same here.

> +  - Bit 31: Device (Used) Ring Wrap Counter
> +
>  A vring address description
>  ^^^
>  
> @@ -1031,18 +1062,45 @@ Front-end message types
>  ``VHOST_USER_SET_VRING_BASE``
>:id: 10
>:equivalent ioctl: ``VHOST_SET_VRING_BASE``
> -  :request payload: vring state description
> +  :request payload: vring descriptor index/indices
>:reply payload: N/A
>  
> -  Sets the base offset in the available vring.
> +  Sets the next index to use for descriptors in this vring:
> +
> +  * For a split virtqueue, sets only the next descriptor index in the
> +*Available Ring*.  The device is supposed to read the next index in
> +the *Used Ring* from the respective vring structure in guest memory.
> +
> +  * For a packed virtqueue, both indices are supplied, as they are not
> +explicitly available in memory.
> +
> +  Consequently, the payload type is specific to the type of virt queue
> +  (*a vring descriptor index for split virtqueues* vs. *vring descriptor
> +  indices for packed virtqueues*).
>  
>  ``VHOST_USER_GET_VRING_BASE``
>:id: 11
>:equivalent ioctl: ``VHOST_USER_GET_VRING_BASE``
>:request payload: vring state description
> -  :reply payload: vring state description
> +  :reply payload: vring descriptor index/indices
> +
> +  Stops the vring and returns the current descriptor index or indices:
> +
> +* For a split virtqueue, returns only the 16-bit next descriptor
> +  index in the *Available Ring*.  The index in the *Used Ring* is
> +  controlled by the guest driver and can be read from the vring

I find "is controlled by the guest driver" confusing. The device writes
the Used Ring index. The driver only reads it. The device is the active
party here.

The sentence can be shortened to omit the "controlled by the guest
driver" part.

> +  structure in memory, so is not covered.
> +
> +* For a packed virtqueue, neither index is explicitly available to
> +  read from memory, so both indices (as maintained by the device) are
> +  returned.
> +
> +  Consequently, the payload type is specific to the type of virt queue
> +  (*a vring descriptor index for split virtqueues* vs. *vring descriptor
> +  indices for packed virtqueues*).
>  
> -  Get the available vring base offset.
> +  The request payload’s *num* field is currently reserved and must be
> +  set to 0.
>  
>  ``VHOST_USER_SET_VRING_KICK``
>:id: 12
> -- 
> 2.41.0
> 


signature.asc
Description: PGP signature

[PATCH 1/2] meson: mitigate against ROP exploits with -fzero-call-used-regs

2023-10-05 Thread Daniel P . Berrangé

To quote wikipedia:

  "Return-oriented programming (ROP) is a computer security exploit
   technique that allows an attacker to execute code in the presence
   of security defenses such as executable space protection and code
   signing.

   In this technique, an attacker gains control of the call stack to
   hijack program control flow and then executes carefully chosen
   machine instruction sequences that are already present in the
   machine's memory, called "gadgets". Each gadget typically ends in
   a return instruction and is located in a subroutine within the
   existing program and/or shared library code. Chained together,
   these gadgets allow an attacker to perform arbitrary operations
   on a machine employing defenses that thwart simpler attacks."

QEMU is by no means perfect with an ever growing set of CVEs from
flawed hardware device emulation, which could potentially be
exploited using ROP techniques.

Since GCC 11 there has been a compiler option that can mitigate
against this exploit technique:

-fzero-call-user-regs

To understand it refer to these two resources:

   https://www.jerkeby.se/newsletter/posts/rop-reduction-zero-call-user-regs/
   https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552262.html

I used two programs to scan qemu-system-x86_64 for ROP gadgets:

  https://github.com/0vercl0k/rp
  https://github.com/JonathanSalwan/ROPgadget

When asked to find 8 byte gadgets, the 'rp' tool reports:

  A total of 440278 gadgets found.
  You decided to keep only the unique ones, 156143 unique gadgets found.

While the ROPgadget tool reports:

  Unique gadgets found: 353122

With the --ropchain argument, the latter attempts to use the found
gadgets to product a chain that can execute arbitrary syscalls. With
current QEMU it succeeds in this task, which is an undesirable
situation.

With QEMU modified to use -fzero-call-user-regs=used-gpr the 'rp' tool
reports

  A total of 528991 gadgets found.
  You decided to keep only the unique ones, 121128 unique gadgets found.

This is 22% fewer unique gadgets

While the ROPgadget tool reports:

  Unique gadgets found: 328605

This is 7% fewer unique gadgets. Crucially though, despite this more
modest reduction, the ROPgadget tool is no longer able to identify a
chain of gadgets for executing arbitrary syscalls. It fails at the
very first step, unable to find gadgets for populating registers for
a future syscall. Having said that, more advanced tools do still
manage to put together a viable ROP chain.

Also this only takes into account QEMU code. QEMU links to many 3rd
party shared libraries and ideally all of them would be compiled with
this same hardening. That becomes a distro policy question though.

In terms of performance impact, TCG was used as an evaluation test
case. We're not interested in protecting TCG since it isn't designed
to provide a security barrier, but it is performance sensitive code,
so useful as a guide to how other areas of QEMU might be impacted.
With the -fzero-call-user-regs=used-gpr argument present, using the
real world test of booting a linux kernel and having init immediately
poweroff, there is a ~1% slow down in performance under TCG. The QEMU
binary size also grows by approximately 1%.

By comparison, using the more aggressive -fzero-call-user-regs=all,
results in a slowdown of over 25% in TCG, which is clearly not an
acceptable impact, and a binary size increase of 5%.

Considering that 'used-gpr' succesfully stopped ROPgadget assembling
a chain, this more targetted protection is a justifiable hardening
/ performance tradeoff.

Signed-off-by: Daniel P. Berrangé 
---
 meson.build | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/meson.build b/meson.build
index 20ceeb8158..2003ca1ba4 100644
--- a/meson.build
+++ b/meson.build
@@ -435,6 +435,17 @@ if get_option('fuzzing')
   endif
 endif
 
+# Check further flags that make QEMU more robust against malicious parties
+
+hardening_flags = [
+# Zero out registers used during a function call
+# upon its return. This makes it harder to assemble
+# ROP gadgets into something usable
+'-fzero-call-used-regs=used-gpr',
+]
+
+qemu_common_flags += cc.get_supported_arguments(hardening_flags)
+
 add_global_arguments(qemu_common_flags, native: false, language: all_languages)
 add_global_link_arguments(qemu_ldflags, native: false, language: all_languages)
 
-- 
2.41.0

[PATCH 2/2] meson: mitigate against use of uninitialize stack for exploits

2023-10-05 Thread Daniel P . Berrangé

When variables are used without being initialized, there is potential
to take advantage of data that was pre-existing on the stack from an
earlier call, to drive an exploit.

It is good practice to always initialize variables, and the compiler
can warn about flaws when -Wuninitialized is present. This warning,
however, is by no means foolproof with its output varying depending
on compiler version and which optimizations are enabled.

The -ftrivial-auto-var-init option can be used to tell the compiler
to always initialize all variables. This increases the security and
predictability of the program, closing off certain attack vectors,
reducing the risk of unsafe memory disclosure.

While the option takes several possible values, using 'zero' is
considered to be the  option that is likely to lead to semantically
correct or safe behaviour[1]. eg sizes/indexes are not likely to
lead to out-of-bounds accesses when initialized to zero. Pointers
are less likely to point something useful if initialized to zero.

Even with -ftrivial-auto-var-init=zero set, GCC will still issue
warnings with -Wuninitialized if it discovers a problem, so we are
not loosing diagnostics for developers, just hardening runtime
behaviour and making QEMU behave more predictably in case of hitting
bad codepaths.

[1] https://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html
Signed-off-by: Daniel P. Berrangé 
---
 meson.build | 5 +
 1 file changed, 5 insertions(+)

diff --git a/meson.build b/meson.build
index 2003ca1ba4..19faea8d30 100644
--- a/meson.build
+++ b/meson.build
@@ -442,6 +442,11 @@ hardening_flags = [
 # upon its return. This makes it harder to assemble
 # ROP gadgets into something usable
 '-fzero-call-used-regs=used-gpr',
+
+# Initialize all stack variables to zero. This makes
+# it harder to take advantage of uninitialized stack
+# data to drive exploits
+'-ftrivial-var-auto-init=zero',
 ]
 
 qemu_common_flags += cc.get_supported_arguments(hardening_flags)
-- 
2.41.0

[PATCH v25 19/21] tests/avocado: s390x cpu topology test socket full

2023-10-05 Thread Nina Schoetterl-Glausch

From: Pierre Morel 

This test verifies that QMP set-cpu-topology does not accept
to overload a socket.

Signed-off-by: Pierre Morel 
Reviewed-by: Thomas Huth 
---
 tests/avocado/s390_topology.py | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/tests/avocado/s390_topology.py b/tests/avocado/s390_topology.py
index 3661048f4c..a63c2b2923 100644
--- a/tests/avocado/s390_topology.py
+++ b/tests/avocado/s390_topology.py
@@ -338,3 +338,29 @@ def test_dedicated(self):
 self.guest_set_dispatching('0');
 self.check_topology(0, 0, 0, 0, 'high', True)
 self.check_polarization("horizontal")
+
+
+def test_socket_full(self):
+"""
+This test verifies that QEMU does not accept to overload a socket.
+The socket-id 0 on book-id 0 already contains CPUs 0 and 1 and can
+not accept any new CPU while socket-id 0 on book-id 1 is free.
+
+:avocado: tags=arch:s390x
+:avocado: tags=machine:s390-ccw-virtio
+"""
+self.kernel_init()
+self.vm.add_args('-smp',
+ '3,drawers=2,books=2,sockets=3,cores=2,maxcpus=24')
+self.vm.launch()
+self.wait_until_booted()
+
+self.system_init()
+
+res = self.vm.qmp('set-cpu-topology',
+  {'core-id': 2, 'socket-id': 0, 'book-id': 0})
+self.assertEqual(res['error']['class'], 'GenericError')
+
+res = self.vm.qmp('set-cpu-topology',
+  {'core-id': 2, 'socket-id': 0, 'book-id': 1})
+self.assertEqual(res['return'], {})
-- 
2.39.2

[PATCH v25 20/21] tests/avocado: s390x cpu topology dedicated errors

2023-10-05 Thread Nina Schoetterl-Glausch

From: Pierre Morel 

Let's test that QEMU refuses to setup a dedicated CPU with
low or medium entitlement.

Signed-off-by: Pierre Morel 
Reviewed-by: Thomas Huth 
---
 tests/avocado/s390_topology.py | 48 ++
 1 file changed, 48 insertions(+)

diff --git a/tests/avocado/s390_topology.py b/tests/avocado/s390_topology.py
index a63c2b2923..d3e6556c0f 100644
--- a/tests/avocado/s390_topology.py
+++ b/tests/avocado/s390_topology.py
@@ -364,3 +364,51 @@ def test_socket_full(self):
 res = self.vm.qmp('set-cpu-topology',
   {'core-id': 2, 'socket-id': 0, 'book-id': 1})
 self.assertEqual(res['return'], {})
+
+def test_dedicated_error(self):
+"""
+This test verifies that QEMU refuses to lower the entitlement
+of a dedicated CPU
+
+:avocado: tags=arch:s390x
+:avocado: tags=machine:s390-ccw-virtio
+"""
+self.kernel_init()
+self.vm.launch()
+self.wait_until_booted()
+
+self.system_init()
+
+res = self.vm.qmp('set-cpu-topology',
+  {'core-id': 0, 'dedicated': True})
+self.assertEqual(res['return'], {})
+
+self.check_topology(0, 0, 0, 0, 'high', True)
+
+self.guest_set_dispatching('1');
+
+self.check_topology(0, 0, 0, 0, 'high', True)
+
+res = self.vm.qmp('set-cpu-topology',
+  {'core-id': 0, 'entitlement': 'low', 'dedicated': 
True})
+self.assertEqual(res['error']['class'], 'GenericError')
+
+res = self.vm.qmp('set-cpu-topology',
+  {'core-id': 0, 'entitlement': 'low'})
+self.assertEqual(res['error']['class'], 'GenericError')
+
+res = self.vm.qmp('set-cpu-topology',
+  {'core-id': 0, 'entitlement': 'medium', 'dedicated': 
True})
+self.assertEqual(res['error']['class'], 'GenericError')
+
+res = self.vm.qmp('set-cpu-topology',
+  {'core-id': 0, 'entitlement': 'medium'})
+self.assertEqual(res['error']['class'], 'GenericError')
+
+res = self.vm.qmp('set-cpu-topology',
+  {'core-id': 0, 'entitlement': 'low', 'dedicated': 
False})
+self.assertEqual(res['return'], {})
+
+res = self.vm.qmp('set-cpu-topology',
+  {'core-id': 0, 'entitlement': 'medium', 'dedicated': 
False})
+self.assertEqual(res['return'], {})
-- 
2.39.2

1 2 3 >

1 - 100 of 283 matches

Mail list logo