Re: [PATCH v2 2/3] target/hexagon: fix some occurrences of -Wshadow=local
On 6/10/23 00:22, Brian Cain wrote: Of the changes in this commit, the changes in `HELPER(commit_hvx_stores)()` are less obvious. They are required because of some macro invocations like SCATTER_OP_WRITE_TO_MEM(). e.g.: In file included from ../target/hexagon/op_helper.c:31: ../target/hexagon/mmvec/macros.h:205:18: error: declaration of ‘i’ shadows a previous local [-Werror=shadow=compatible-local] 205 | for (int i = 0; i < sizeof(MMVector); i += sizeof(TYPE)) { \ | ^ ../target/hexagon/op_helper.c:157:17: note: in expansion of macro ‘SCATTER_OP_WRITE_TO_MEM’ 157 | SCATTER_OP_WRITE_TO_MEM(uint16_t); | ^~~ ../target/hexagon/op_helper.c:135:9: note: shadowed declaration is here 135 | int i; | ^ In file included from ../target/hexagon/op_helper.c:31: ../target/hexagon/mmvec/macros.h:204:19: error: declaration of ‘ra’ shadows a previous local [-Werror=shadow=compatible-local] 204 | uintptr_t ra = GETPC(); \ | ^~ ../target/hexagon/op_helper.c:160:17: note: in expansion of macro ‘SCATTER_OP_WRITE_TO_MEM’ 160 | SCATTER_OP_WRITE_TO_MEM(uint32_t); | ^~~ ../target/hexagon/op_helper.c:134:15: note: shadowed declaration is here 134 | uintptr_t ra = GETPC(); | ^~ Reviewed-by: Matheus Tavares Bernardino Signed-off-by: Brian Cain --- target/hexagon/imported/alu.idef | 6 +++--- target/hexagon/mmvec/macros.h| 6 +++--- target/hexagon/op_helper.c | 9 +++-- target/hexagon/translate.c | 9 - 4 files changed, 13 insertions(+), 17 deletions(-) Reviewed-by: Philippe Mathieu-Daudé
[PATCH] target/ppc: Clean up local variable shadowing in kvm_arch_*_registers()
Remove extra 'i' variable to fix this warning : ../target/ppc/kvm.c: In function ‘kvm_arch_put_registers’: ../target/ppc/kvm.c:963:13: warning: declaration of ‘i’ shadows a previous local [-Wshadow=compatible-local] 963 | int i; | ^ ../target/ppc/kvm.c:906:9: note: shadowed declaration is here 906 | int i; | ^ ../target/ppc/kvm.c: In function ‘kvm_arch_get_registers’: ../target/ppc/kvm.c:1265:13: warning: declaration of ‘i’ shadows a previous local [-Wshadow=compatible-local] 1265 | int i; | ^ ../target/ppc/kvm.c:1212:9: note: shadowed declaration is here 1212 | int i, ret; | ^ Signed-off-by: Cédric Le Goater --- target/ppc/kvm.c | 4 1 file changed, 4 deletions(-) diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c index 51112bd3670d..d0e2dcdc7734 100644 --- a/target/ppc/kvm.c +++ b/target/ppc/kvm.c @@ -960,8 +960,6 @@ int kvm_arch_put_registers(CPUState *cs, int level) } if (cap_one_reg) { -int i; - /* * We deliberately ignore errors here, for kernels which have * the ONE_REG calls, but don't support the specific @@ -1262,8 +1260,6 @@ int kvm_arch_get_registers(CPUState *cs) } if (cap_one_reg) { -int i; - /* * We deliberately ignore errors here, for kernels which have * the ONE_REG calls, but don't support the specific -- 2.41.0
Re: [PATCH qemu] timer/i8254: Fix one shot PIT mode
On Fri, Oct 06, 2023 at 02:36:52AM +, Damien Zammit wrote: > >From: Michael Tokarev > >26.02.2023 04:58, Damien Zammit wrote: > >> Currently, the one-shot (mode 1) PIT expires far too quickly, > >> due to the output being set under the wrong logic. > >> This change fixes the one-shot PIT mode to behave similarly to mode 0. > >> > >> TESTED: using the one-shot PIT mode to calibrate a local apic timer. > > > >Has this been forgotten, or is it not needed anymore? > > This is still required but nobody uses the > PIT one-shot mode (probably because it *is* currently broken). > > Can it be merged? > > Thanks, > Damien OK, I tagged it. thanks!
[PATCH] tap-win32: Remove unnecessary stubs
Some of them are only necessary for POSIX systems. The others are assigned to function pointers in NetClientInfo that can actually be NULL. Signed-off-by: Akihiko Odaki --- net/tap-win32.c | 54 - 1 file changed, 54 deletions(-) diff --git a/net/tap-win32.c b/net/tap-win32.c index f327d62ab0..7edbd71633 100644 --- a/net/tap-win32.c +++ b/net/tap-win32.c @@ -707,70 +707,16 @@ static void tap_win32_send(void *opaque) } } -static bool tap_has_ufo(NetClientState *nc) -{ -return false; -} - -static bool tap_has_vnet_hdr(NetClientState *nc) -{ -return false; -} - -int tap_probe_vnet_hdr_len(int fd, int len) -{ -return 0; -} - -void tap_fd_set_vnet_hdr_len(int fd, int len) -{ -} - -int tap_fd_set_vnet_le(int fd, int is_le) -{ -return -EINVAL; -} - -int tap_fd_set_vnet_be(int fd, int is_be) -{ -return -EINVAL; -} - -static void tap_using_vnet_hdr(NetClientState *nc, bool using_vnet_hdr) -{ -} - -static void tap_set_offload(NetClientState *nc, int csum, int tso4, - int tso6, int ecn, int ufo) -{ -} - struct vhost_net *tap_get_vhost_net(NetClientState *nc) { return NULL; } -static bool tap_has_vnet_hdr_len(NetClientState *nc, int len) -{ -return false; -} - -static void tap_set_vnet_hdr_len(NetClientState *nc, int len) -{ -abort(); -} - static NetClientInfo net_tap_win32_info = { .type = NET_CLIENT_DRIVER_TAP, .size = sizeof(TAPState), .receive = tap_receive, .cleanup = tap_cleanup, -.has_ufo = tap_has_ufo, -.has_vnet_hdr = tap_has_vnet_hdr, -.has_vnet_hdr_len = tap_has_vnet_hdr_len, -.using_vnet_hdr = tap_using_vnet_hdr, -.set_offload = tap_set_offload, -.set_vnet_hdr_len = tap_set_vnet_hdr_len, }; static int tap_win32_init(NetClientState *peer, const char *model, -- 2.42.0
Re: [PATCH] scripts/xml-preprocess: Make sure this script is invoked via the right Python
On Fri, Oct 6, 2023 at 8:53 AM Thomas Huth wrote: > > If a script is executable and has a shebang line, Meson treats it as > a normal executable, so that this script here is run via the "python3" > binary in the $PATH. However, "python3" might not be in the $PATH at > all, or it might be a wrong version, so we should make sure to run > this script via the Python version that has been chosen for the QEMU > build process. The best way to do this is to remove the executable bit > from the access mode bits. (See also commit 4b424c757188f7a4) > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1918 > Signed-off-by: Thomas Huth Reviewed-by: Marc-André Lureau thanks > --- > scripts/xml-preprocess.py | 0 > 1 file changed, 0 insertions(+), 0 deletions(-) > mode change 100755 => 100644 scripts/xml-preprocess.py > > diff --git a/scripts/xml-preprocess.py b/scripts/xml-preprocess.py > old mode 100755 > new mode 100644 > -- > 2.41.0 >
[PATCH] scripts/xml-preprocess: Make sure this script is invoked via the right Python
If a script is executable and has a shebang line, Meson treats it as a normal executable, so that this script here is run via the "python3" binary in the $PATH. However, "python3" might not be in the $PATH at all, or it might be a wrong version, so we should make sure to run this script via the Python version that has been chosen for the QEMU build process. The best way to do this is to remove the executable bit from the access mode bits. (See also commit 4b424c757188f7a4) Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1918 Signed-off-by: Thomas Huth --- scripts/xml-preprocess.py | 0 1 file changed, 0 insertions(+), 0 deletions(-) mode change 100755 => 100644 scripts/xml-preprocess.py diff --git a/scripts/xml-preprocess.py b/scripts/xml-preprocess.py old mode 100755 new mode 100644 -- 2.41.0
Re: [PATCH v4 3/3] hw/vfio: add ramfb migration support
On 10/5/23 18:34, Cédric Le Goater wrote: > On 10/5/23 13:30, marcandre.lur...@redhat.com wrote: >> From: Marc-André Lureau >> >> Add a "VFIODisplay" subsection whenever "x-ramfb-migrate" is turned on. >> >> Turn it off by default on machines <= 8.1 for compatibility reasons. >> >> Signed-off-by: Marc-André Lureau >> --- >> hw/vfio/pci.h | 3 +++ >> hw/core/machine.c | 1 + >> hw/vfio/display.c | 20 >> hw/vfio/pci.c | 44 >> stubs/ramfb.c | 2 ++ >> 5 files changed, 70 insertions(+) >> >> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h >> index 2d836093a8..fd06695542 100644 >> --- a/hw/vfio/pci.h >> +++ b/hw/vfio/pci.h >> @@ -173,6 +173,7 @@ struct VFIOPCIDevice { >> bool no_kvm_ioeventfd; >> bool no_vfio_ioeventfd; >> bool enable_ramfb; >> + OnOffAuto ramfb_migrate; >> bool defer_kvm_irq_routing; >> bool clear_parent_atomics_on_exit; >> VFIODisplay *dpy; >> @@ -226,4 +227,6 @@ void vfio_display_reset(VFIOPCIDevice *vdev); >> int vfio_display_probe(VFIOPCIDevice *vdev, Error **errp); >> void vfio_display_finalize(VFIOPCIDevice *vdev); >> +extern const VMStateDescription vfio_display_vmstate; >> + >> #endif /* HW_VFIO_VFIO_PCI_H */ >> diff --git a/hw/core/machine.c b/hw/core/machine.c >> index e4361e3d48..f2c59a293c 100644 >> --- a/hw/core/machine.c >> +++ b/hw/core/machine.c >> @@ -33,6 +33,7 @@ >> GlobalProperty hw_compat_8_1[] = { >> { "ramfb", "x-migrate", "off" }, >> + { "vfio-pci-nohotplug", "x-ramfb-migrate", "off" } >> }; >> const size_t hw_compat_8_1_len = G_N_ELEMENTS(hw_compat_8_1); >> diff --git a/hw/vfio/display.c b/hw/vfio/display.c >> index bec864f482..0bdb807642 100644 >> --- a/hw/vfio/display.c >> +++ b/hw/vfio/display.c >> @@ -542,3 +542,23 @@ void vfio_display_finalize(VFIOPCIDevice *vdev) >> vfio_display_edid_exit(vdev->dpy); >> g_free(vdev->dpy); >> } >> + >> +static bool migrate_needed(void *opaque) >> +{ >> + VFIODisplay *dpy = opaque; >> + bool ramfb_exists = dpy->ramfb != NULL; >> + >> + /* see vfio_display_migration_needed() */ >> + assert(ramfb_exists); >> + return ramfb_exists; >> +} >> + >> +const VMStateDescription vfio_display_vmstate = { >> + .name = "VFIODisplay", >> + .version_id = 1, >> + .minimum_version_id = 1, >> + .needed = migrate_needed, >> + .fields = (VMStateField[]) { >> + VMSTATE_STRUCT_POINTER(ramfb, VFIODisplay, ramfb_vmstate, >> RAMFBState), >> + } >> +}; >> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c >> index 3b2ca3c24c..d2ede2f1a2 100644 >> --- a/hw/vfio/pci.c >> +++ b/hw/vfio/pci.c >> @@ -2608,6 +2608,32 @@ static bool vfio_msix_present(void *opaque, int >> version_id) >> return msix_present(pdev); >> } >> +static bool vfio_display_migration_needed(void *opaque) >> +{ >> + VFIOPCIDevice *vdev = opaque; >> + >> + /* >> + * We need to migrate the VFIODisplay object if ramfb *migration* >> was >> + * explicitly requested (in which case we enforced both ramfb=on and >> + * display=on), or ramfb migration was left at the default "auto" >> + * setting, and *ramfb* was explicitly requested (in which case we >> + * enforced display=on). >> + */ >> + return vdev->ramfb_migrate == ON_OFF_AUTO_ON || >> + (vdev->ramfb_migrate == ON_OFF_AUTO_AUTO && >> vdev->enable_ramfb);> +} >> + >> +const VMStateDescription vmstate_vfio_display = { >> + .name = "VFIOPCIDevice/VFIODisplay", >> + .version_id = 1, >> + .minimum_version_id = 1, >> + .needed = vfio_display_migration_needed, >> + .fields = (VMStateField[]){ >> + VMSTATE_STRUCT_POINTER(dpy, VFIOPCIDevice, >> vfio_display_vmstate, VFIODisplay), >> + VMSTATE_END_OF_LIST() >> + } >> +}; >> + >> const VMStateDescription vmstate_vfio_pci_config = { >> .name = "VFIOPCIDevice", >> .version_id = 1, >> @@ -2616,6 +2642,10 @@ const VMStateDescription >> vmstate_vfio_pci_config = { >> VMSTATE_PCI_DEVICE(pdev, VFIOPCIDevice), >> VMSTATE_MSIX_TEST(pdev, VFIOPCIDevice, vfio_msix_present), >> VMSTATE_END_OF_LIST() >> + }, >> + .subsections = (const VMStateDescription*[]) { >> + _vfio_display, >> + NULL >> } >> }; >> @@ -3271,6 +3301,19 @@ static void vfio_realize(PCIDevice *pdev, >> Error **errp) >> } >> } >> + if (vdev->ramfb_migrate == ON_OFF_AUTO_ON && >> !vdev->enable_ramfb) { >> + error_setg(errp, "x-ramfb-migrate requires ramfb=on"); > > This is a case where QEMU would be run from the command line. Could we > ouput a warning and force "ramfb_migrate" to OFF in that case ? since > the machine would still run. Sounds like a valid idea to me: - consistency between x-ramfb-migrate and ramfb would be maintained - x-* properties are not meant as user interface, so QEMU doesn't guarantee anything about them, AIUI Laszlo > > Thanks, > > C. > > >> +
[PATCH] hw/ide: Add command to sync cache on ATAPI
Bumping this thread[1] as it seems to have gotten lost: [1] https://lists.gnu.org/archive/html/qemu-block/2022-03/msg00586.html Damien
Re: [PATCH qemu] timer/i8254: Fix one shot PIT mode
>From: Michael Tokarev >26.02.2023 04:58, Damien Zammit wrote: >> Currently, the one-shot (mode 1) PIT expires far too quickly, >> due to the output being set under the wrong logic. >> This change fixes the one-shot PIT mode to behave similarly to mode 0. >> >> TESTED: using the one-shot PIT mode to calibrate a local apic timer. > >Has this been forgotten, or is it not needed anymore? This is still required but nobody uses the PIT one-shot mode (probably because it *is* currently broken). Can it be merged? Thanks, Damien
RE: [PATCH v4] target/riscv: update checks on writing pmpcfg for Smepmp to version 1.0
> On Mon, Sep 25, 2023 at 2:11 AM Alvin Chang > wrote: > > > > Current checks on writing pmpcfg for Smepmp follows Smepmp version > > 0.9.1. However, Smepmp specification has already been ratified, and > > there are some differences between version 0.9.1 and 1.0. In this > > commit we update the checks of writing pmpcfg to follow Smepmp version > > 1.0. > > > > When mseccfg.MML is set, the constraints to modify PMP rules are: > > 1. Locked rules cannot be removed or modified until a PMP reset, unless > >mseccfg.RLB is set. > > 2. From Smepmp specification version 1.0, chapter 2 section 4b: > >Adding a rule with executable privileges that either is M-mode-only > >or a locked Shared-Region is not possible and such pmpcfg writes are > >ignored, leaving pmpcfg unchanged. > > > > The commit transfers the value of pmpcfg into the index of the Smepmp > > truth table, and checks the rules by aforementioned specification > > changes. > > > > Signed-off-by: Alvin Chang > > --- > > Changes from v3: Modify "epmp_operation" to "smepmp_operation". > > This has the same issue as all the previous versions. > > QEMU is currently not shipping with Smepmp support. So updating some of the > code to support Smepmp is confusing. > > As I pointed out for the v3, we currently only support ePMP 0.9.3. So that is > what the code must work with. > > In order for this change to go in, we also need to upgrade QEMU to support > Smepmp 1.0. > > This patch is an attempt to do that: > https://www.mail-archive.com/qemu-devel@nongnu.org/msg967676.html > > You basically need to combine the changes from > https://www.mail-archive.com/qemu-devel@nongnu.org/msg967676.html into > this patch. So there is a single patch that updates to Smepmp. > > Alistair > Hi Alistair, Mayuresh has sent that patch again recently: https://www.mail-archive.com/qemu-devel@nongnu.org/msg991428.html Since the author is from Ventana, I would like to send my patch separately. I think our patches can be merged/applied together. Thanks! Alvin > > > > Changes from v2: Adopt switch case ranges and numerical order. > > > > Changes from v1: Convert ePMP over to Smepmp. > > > > target/riscv/pmp.c | 40 > > 1 file changed, 32 insertions(+), 8 deletions(-) > > > > diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c index > > a08cd95658..2ebf18c941 100644 > > --- a/target/riscv/pmp.c > > +++ b/target/riscv/pmp.c > > @@ -99,16 +99,40 @@ static void pmp_write_cfg(CPURISCVState *env, > uint32_t pmp_index, uint8_t val) > > locked = false; > > } > > > > -/* mseccfg.MML is set */ > > -if (MSECCFG_MML_ISSET(env)) { > > -/* not adding execute bit */ > > -if ((val & PMP_LOCK) != 0 && (val & PMP_EXEC) != > PMP_EXEC) { > > +/* > > + * mseccfg.MML is set. Locked rules cannot be removed or > modified > > + * until a PMP reset. Besides, from Smepmp specification > version 1.0 > > + * , chapter 2 section 4b says: > > + * Adding a rule with executable privileges that either is > > + * M-mode-only or a locked Shared-Region is not possible > and such > > + * pmpcfg writes are ignored, leaving pmpcfg unchanged. > > + */ > > +if (MSECCFG_MML_ISSET(env) && !pmp_is_locked(env, > pmp_index)) { > > +/* > > + * Convert the PMP permissions to match the truth > table in the > > + * Smepmp spec. > > + */ > > +const uint8_t smepmp_operation = > > +((val & PMP_LOCK) >> 4) | ((val & PMP_READ) << > 2) | > > +(val & PMP_WRITE) | ((val & PMP_EXEC) >> 2); > > + > > +switch (smepmp_operation) { > > +case 0 ... 8: > > locked = false; > > -} > > -/* shared region and not adding X bit */ > > -if ((val & PMP_LOCK) != PMP_LOCK && > > -(val & 0x7) != (PMP_WRITE | PMP_EXEC)) { > > +break; > > +case 9 ... 11: > > +break; > > +case 12: > > +locked = false; > > +break; > > +case 13: > > +break; > > +case 14: > > +case 15: > > locked = false; > > +break; > > +default: > > +g_assert_not_reached(); > > } > > } > > } else { > > -- > > 2.34.1 > > > >
[PATCH v17 4/9] virtio-gpu: blob prep
From: Antonio Caggiano This adds preparatory functions needed to: - decode blob cmds - tracking iovecs Signed-off-by: Antonio Caggiano Signed-off-by: Dmitry Osipenko Signed-off-by: Gurchetan Singh Tested-by: Alyssa Ross Tested-by: Emmanouil Pitsidianakis Tested-by: Akihiko Odaki Tested-by: Huang Rui Acked-by: Huang Rui Reviewed-by: Emmanouil Pitsidianakis Reviewed-by: Akihiko Odaki --- hw/display/virtio-gpu.c | 10 +++--- include/hw/virtio/virtio-gpu-bswap.h | 15 +++ include/hw/virtio/virtio-gpu.h | 5 + 3 files changed, 23 insertions(+), 7 deletions(-) diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c index 5585558855..be16efbd38 100644 --- a/hw/display/virtio-gpu.c +++ b/hw/display/virtio-gpu.c @@ -33,15 +33,11 @@ #define VIRTIO_GPU_VM_VERSION 1 -static struct virtio_gpu_simple_resource* -virtio_gpu_find_resource(VirtIOGPU *g, uint32_t resource_id); static struct virtio_gpu_simple_resource * virtio_gpu_find_check_resource(VirtIOGPU *g, uint32_t resource_id, bool require_backing, const char *caller, uint32_t *error); -static void virtio_gpu_cleanup_mapping(VirtIOGPU *g, - struct virtio_gpu_simple_resource *res); static void virtio_gpu_reset_bh(void *opaque); void virtio_gpu_update_cursor_data(VirtIOGPU *g, @@ -116,7 +112,7 @@ static void update_cursor(VirtIOGPU *g, struct virtio_gpu_update_cursor *cursor) cursor->resource_id ? 1 : 0); } -static struct virtio_gpu_simple_resource * +struct virtio_gpu_simple_resource * virtio_gpu_find_resource(VirtIOGPU *g, uint32_t resource_id) { struct virtio_gpu_simple_resource *res; @@ -904,8 +900,8 @@ void virtio_gpu_cleanup_mapping_iov(VirtIOGPU *g, g_free(iov); } -static void virtio_gpu_cleanup_mapping(VirtIOGPU *g, - struct virtio_gpu_simple_resource *res) +void virtio_gpu_cleanup_mapping(VirtIOGPU *g, +struct virtio_gpu_simple_resource *res) { virtio_gpu_cleanup_mapping_iov(g, res->iov, res->iov_cnt); res->iov = NULL; diff --git a/include/hw/virtio/virtio-gpu-bswap.h b/include/hw/virtio/virtio-gpu-bswap.h index 637a0585d0..dd1975e2d4 100644 --- a/include/hw/virtio/virtio-gpu-bswap.h +++ b/include/hw/virtio/virtio-gpu-bswap.h @@ -70,6 +70,21 @@ virtio_gpu_create_blob_bswap(struct virtio_gpu_resource_create_blob *cblob) le64_to_cpus(>size); } +static inline void +virtio_gpu_map_blob_bswap(struct virtio_gpu_resource_map_blob *mblob) +{ +virtio_gpu_ctrl_hdr_bswap(>hdr); +le32_to_cpus(>resource_id); +le64_to_cpus(>offset); +} + +static inline void +virtio_gpu_unmap_blob_bswap(struct virtio_gpu_resource_unmap_blob *ublob) +{ +virtio_gpu_ctrl_hdr_bswap(>hdr); +le32_to_cpus(>resource_id); +} + static inline void virtio_gpu_scanout_blob_bswap(struct virtio_gpu_set_scanout_blob *ssb) { diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h index de4f624e94..55973e112f 100644 --- a/include/hw/virtio/virtio-gpu.h +++ b/include/hw/virtio/virtio-gpu.h @@ -257,6 +257,9 @@ void virtio_gpu_base_fill_display_info(VirtIOGPUBase *g, void virtio_gpu_base_generate_edid(VirtIOGPUBase *g, int scanout, struct virtio_gpu_resp_edid *edid); /* virtio-gpu.c */ +struct virtio_gpu_simple_resource * +virtio_gpu_find_resource(VirtIOGPU *g, uint32_t resource_id); + void virtio_gpu_ctrl_response(VirtIOGPU *g, struct virtio_gpu_ctrl_command *cmd, struct virtio_gpu_ctrl_hdr *resp, @@ -275,6 +278,8 @@ int virtio_gpu_create_mapping_iov(VirtIOGPU *g, uint32_t *niov); void virtio_gpu_cleanup_mapping_iov(VirtIOGPU *g, struct iovec *iov, uint32_t count); +void virtio_gpu_cleanup_mapping(VirtIOGPU *g, +struct virtio_gpu_simple_resource *res); void virtio_gpu_process_cmdq(VirtIOGPU *g); void virtio_gpu_device_realize(DeviceState *qdev, Error **errp); void virtio_gpu_reset(VirtIODevice *vdev); -- 2.42.0.609.gbb76f46606-goog
[PATCH v17 3/9] virtio-gpu: hostmem
From: Gerd Hoffmann Use VIRTIO_GPU_SHM_ID_HOST_VISIBLE as id for virtio-gpu. Signed-off-by: Antonio Caggiano Tested-by: Alyssa Ross Tested-by: Akihiko Odaki Tested-by: Huang Rui Acked-by: Huang Rui Acked-by: Michael S. Tsirkin Reviewed-by: Akihiko Odaki --- hw/display/virtio-gpu-pci.c| 14 ++ hw/display/virtio-gpu.c| 1 + hw/display/virtio-vga.c| 33 - include/hw/virtio/virtio-gpu.h | 5 + 4 files changed, 44 insertions(+), 9 deletions(-) diff --git a/hw/display/virtio-gpu-pci.c b/hw/display/virtio-gpu-pci.c index 93f214ff58..da6a99f038 100644 --- a/hw/display/virtio-gpu-pci.c +++ b/hw/display/virtio-gpu-pci.c @@ -33,6 +33,20 @@ static void virtio_gpu_pci_base_realize(VirtIOPCIProxy *vpci_dev, Error **errp) DeviceState *vdev = DEVICE(g); int i; +if (virtio_gpu_hostmem_enabled(g->conf)) { +vpci_dev->msix_bar_idx = 1; +vpci_dev->modern_mem_bar_idx = 2; +memory_region_init(>hostmem, OBJECT(g), "virtio-gpu-hostmem", + g->conf.hostmem); +pci_register_bar(_dev->pci_dev, 4, + PCI_BASE_ADDRESS_SPACE_MEMORY | + PCI_BASE_ADDRESS_MEM_PREFETCH | + PCI_BASE_ADDRESS_MEM_TYPE_64, + >hostmem); +virtio_pci_add_shm_cap(vpci_dev, 4, 0, g->conf.hostmem, + VIRTIO_GPU_SHM_ID_HOST_VISIBLE); +} + virtio_pci_force_virtio_1(vpci_dev); if (!qdev_realize(vdev, BUS(_dev->bus), errp)) { return; diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c index 93857ad523..5585558855 100644 --- a/hw/display/virtio-gpu.c +++ b/hw/display/virtio-gpu.c @@ -1511,6 +1511,7 @@ static Property virtio_gpu_properties[] = { 256 * MiB), DEFINE_PROP_BIT("blob", VirtIOGPU, parent_obj.conf.flags, VIRTIO_GPU_FLAG_BLOB_ENABLED, false), +DEFINE_PROP_SIZE("hostmem", VirtIOGPU, parent_obj.conf.hostmem, 0), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/display/virtio-vga.c b/hw/display/virtio-vga.c index e6fb0aa876..c8552ff760 100644 --- a/hw/display/virtio-vga.c +++ b/hw/display/virtio-vga.c @@ -115,17 +115,32 @@ static void virtio_vga_base_realize(VirtIOPCIProxy *vpci_dev, Error **errp) pci_register_bar(_dev->pci_dev, 0, PCI_BASE_ADDRESS_MEM_PREFETCH, >vram); -/* - * Configure virtio bar and regions - * - * We use bar #2 for the mmio regions, to be compatible with stdvga. - * virtio regions are moved to the end of bar #2, to make room for - * the stdvga mmio registers at the start of bar #2. - */ -vpci_dev->modern_mem_bar_idx = 2; -vpci_dev->msix_bar_idx = 4; vpci_dev->modern_io_bar_idx = 5; +if (!virtio_gpu_hostmem_enabled(g->conf)) { +/* + * Configure virtio bar and regions + * + * We use bar #2 for the mmio regions, to be compatible with stdvga. + * virtio regions are moved to the end of bar #2, to make room for + * the stdvga mmio registers at the start of bar #2. + */ +vpci_dev->modern_mem_bar_idx = 2; +vpci_dev->msix_bar_idx = 4; +} else { +vpci_dev->msix_bar_idx = 1; +vpci_dev->modern_mem_bar_idx = 2; +memory_region_init(>hostmem, OBJECT(g), "virtio-gpu-hostmem", + g->conf.hostmem); +pci_register_bar(_dev->pci_dev, 4, + PCI_BASE_ADDRESS_SPACE_MEMORY | + PCI_BASE_ADDRESS_MEM_PREFETCH | + PCI_BASE_ADDRESS_MEM_TYPE_64, + >hostmem); +virtio_pci_add_shm_cap(vpci_dev, 4, 0, g->conf.hostmem, + VIRTIO_GPU_SHM_ID_HOST_VISIBLE); +} + if (!(vpci_dev->flags & VIRTIO_PCI_FLAG_PAGE_PER_VQ)) { /* * with page-per-vq=off there is no padding space we can use diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h index 8377c365ef..de4f624e94 100644 --- a/include/hw/virtio/virtio-gpu.h +++ b/include/hw/virtio/virtio-gpu.h @@ -108,12 +108,15 @@ enum virtio_gpu_base_conf_flags { (_cfg.flags & (1 << VIRTIO_GPU_FLAG_BLOB_ENABLED)) #define virtio_gpu_context_init_enabled(_cfg) \ (_cfg.flags & (1 << VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED)) +#define virtio_gpu_hostmem_enabled(_cfg) \ +(_cfg.hostmem > 0) struct virtio_gpu_base_conf { uint32_t max_outputs; uint32_t flags; uint32_t xres; uint32_t yres; +uint64_t hostmem; }; struct virtio_gpu_ctrl_command { @@ -137,6 +140,8 @@ struct VirtIOGPUBase { int renderer_blocked; int enable; +MemoryRegion hostmem; + struct virtio_gpu_scanout scanout[VIRTIO_GPU_MAX_SCANOUTS]; int enabled_output_bitmask; -- 2.42.0.609.gbb76f46606-goog
[PATCH v17 8/9] gfxstream + rutabaga: enable rutabaga
This change enables rutabaga to receive virtio-gpu-3d hypercalls when it is active. Signed-off-by: Gurchetan Singh Tested-by: Alyssa Ross Tested-by: Emmanouil Pitsidianakis Tested-by: Akihiko Odaki Reviewed-by: Antonio Caggiano Reviewed-by: Emmanouil Pitsidianakis Reviewed-by: Akihiko Odaki --- hw/display/virtio-gpu-base.c | 3 ++- hw/display/virtio-gpu.c | 5 +++-- softmmu/qdev-monitor.c | 3 +++ softmmu/vl.c | 1 + 4 files changed, 9 insertions(+), 3 deletions(-) diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c index 4f2b0ba1f3..50c5373b65 100644 --- a/hw/display/virtio-gpu-base.c +++ b/hw/display/virtio-gpu-base.c @@ -223,7 +223,8 @@ virtio_gpu_base_get_features(VirtIODevice *vdev, uint64_t features, { VirtIOGPUBase *g = VIRTIO_GPU_BASE(vdev); -if (virtio_gpu_virgl_enabled(g->conf)) { +if (virtio_gpu_virgl_enabled(g->conf) || +virtio_gpu_rutabaga_enabled(g->conf)) { features |= (1 << VIRTIO_GPU_F_VIRGL); } if (virtio_gpu_edid_enabled(g->conf)) { diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c index be16efbd38..6efd15b6ae 100644 --- a/hw/display/virtio-gpu.c +++ b/hw/display/virtio-gpu.c @@ -1363,8 +1363,9 @@ void virtio_gpu_device_realize(DeviceState *qdev, Error **errp) VirtIOGPU *g = VIRTIO_GPU(qdev); if (virtio_gpu_blob_enabled(g->parent_obj.conf)) { -if (!virtio_gpu_have_udmabuf()) { -error_setg(errp, "cannot enable blob resources without udmabuf"); +if (!virtio_gpu_rutabaga_enabled(g->parent_obj.conf) && +!virtio_gpu_have_udmabuf()) { +error_setg(errp, "need rutabaga or udmabuf for blob resources"); return; } diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c index 74f4e41338..1b8005ae55 100644 --- a/softmmu/qdev-monitor.c +++ b/softmmu/qdev-monitor.c @@ -86,6 +86,9 @@ static const QDevAlias qdev_alias_table[] = { { "virtio-gpu-pci", "virtio-gpu", QEMU_ARCH_VIRTIO_PCI }, { "virtio-gpu-gl-device", "virtio-gpu-gl", QEMU_ARCH_VIRTIO_MMIO }, { "virtio-gpu-gl-pci", "virtio-gpu-gl", QEMU_ARCH_VIRTIO_PCI }, +{ "virtio-gpu-rutabaga-device", "virtio-gpu-rutabaga", + QEMU_ARCH_VIRTIO_MMIO }, +{ "virtio-gpu-rutabaga-pci", "virtio-gpu-rutabaga", QEMU_ARCH_VIRTIO_PCI }, { "virtio-input-host-device", "virtio-input-host", QEMU_ARCH_VIRTIO_MMIO }, { "virtio-input-host-ccw", "virtio-input-host", QEMU_ARCH_VIRTIO_CCW }, { "virtio-input-host-pci", "virtio-input-host", QEMU_ARCH_VIRTIO_PCI }, diff --git a/softmmu/vl.c b/softmmu/vl.c index 98e071e63b..dd82c6eb13 100644 --- a/softmmu/vl.c +++ b/softmmu/vl.c @@ -215,6 +215,7 @@ static struct { { .driver = "ati-vga", .flag = _vga }, { .driver = "vhost-user-vga", .flag = _vga }, { .driver = "virtio-vga-gl",.flag = _vga }, +{ .driver = "virtio-vga-rutabaga", .flag = _vga }, }; static QemuOptsList qemu_rtc_opts = { -- 2.42.0.609.gbb76f46606-goog
[PATCH v17 5/9] gfxstream + rutabaga prep: added need defintions, fields, and options
This modifies the common virtio-gpu.h file have the fields and defintions needed by gfxstream/rutabaga, by VirtioGpuRutabaga. Signed-off-by: Gurchetan Singh Tested-by: Alyssa Ross Tested-by: Emmanouil Pitsidianakis Tested-by: Akihiko Odaki Reviewed-by: Emmanouil Pitsidianakis Reviewed-by: Antonio Caggiano Reviewed-by: Akihiko Odaki --- include/hw/virtio/virtio-gpu.h | 27 +++ 1 file changed, 27 insertions(+) diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h index 55973e112f..39018377d2 100644 --- a/include/hw/virtio/virtio-gpu.h +++ b/include/hw/virtio/virtio-gpu.h @@ -38,6 +38,9 @@ OBJECT_DECLARE_SIMPLE_TYPE(VirtIOGPUGL, VIRTIO_GPU_GL) #define TYPE_VHOST_USER_GPU "vhost-user-gpu" OBJECT_DECLARE_SIMPLE_TYPE(VhostUserGPU, VHOST_USER_GPU) +#define TYPE_VIRTIO_GPU_RUTABAGA "virtio-gpu-rutabaga-device" +OBJECT_DECLARE_SIMPLE_TYPE(VirtIOGPURutabaga, VIRTIO_GPU_RUTABAGA) + struct virtio_gpu_simple_resource { uint32_t resource_id; uint32_t width; @@ -94,6 +97,7 @@ enum virtio_gpu_base_conf_flags { VIRTIO_GPU_FLAG_DMABUF_ENABLED, VIRTIO_GPU_FLAG_BLOB_ENABLED, VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED, +VIRTIO_GPU_FLAG_RUTABAGA_ENABLED, }; #define virtio_gpu_virgl_enabled(_cfg) \ @@ -108,6 +112,8 @@ enum virtio_gpu_base_conf_flags { (_cfg.flags & (1 << VIRTIO_GPU_FLAG_BLOB_ENABLED)) #define virtio_gpu_context_init_enabled(_cfg) \ (_cfg.flags & (1 << VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED)) +#define virtio_gpu_rutabaga_enabled(_cfg) \ +(_cfg.flags & (1 << VIRTIO_GPU_FLAG_RUTABAGA_ENABLED)) #define virtio_gpu_hostmem_enabled(_cfg) \ (_cfg.hostmem > 0) @@ -232,6 +238,27 @@ struct VhostUserGPU { bool backend_blocked; }; +#define MAX_SLOTS 4096 + +struct MemoryRegionInfo { +int used; +MemoryRegion mr; +uint32_t resource_id; +}; + +struct rutabaga; + +struct VirtIOGPURutabaga { +VirtIOGPU parent_obj; +struct MemoryRegionInfo memory_regions[MAX_SLOTS]; +uint64_t capset_mask; +char *wayland_socket_path; +char *wsi; +bool headless; +uint32_t num_capsets; +struct rutabaga *rutabaga; +}; + #define VIRTIO_GPU_FILL_CMD(out) do { \ size_t s; \ s = iov_to_buf(cmd->elem.out_sg, cmd->elem.out_num, 0, \ -- 2.42.0.609.gbb76f46606-goog
[PATCH v17 2/9] virtio-gpu: CONTEXT_INIT feature
From: Antonio Caggiano The feature can be enabled when a backend wants it. Signed-off-by: Antonio Caggiano Signed-off-by: Gurchetan Singh Tested-by: Alyssa Ross Tested-by: Akihiko Odaki Tested-by: Huang Rui Acked-by: Huang Rui Reviewed-by: Marc-André Lureau Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Akihiko Odaki --- hw/display/virtio-gpu-base.c | 3 +++ include/hw/virtio/virtio-gpu.h | 3 +++ 2 files changed, 6 insertions(+) diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c index ca1fb7b16f..4f2b0ba1f3 100644 --- a/hw/display/virtio-gpu-base.c +++ b/hw/display/virtio-gpu-base.c @@ -232,6 +232,9 @@ virtio_gpu_base_get_features(VirtIODevice *vdev, uint64_t features, if (virtio_gpu_blob_enabled(g->conf)) { features |= (1 << VIRTIO_GPU_F_RESOURCE_BLOB); } +if (virtio_gpu_context_init_enabled(g->conf)) { +features |= (1 << VIRTIO_GPU_F_CONTEXT_INIT); +} return features; } diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h index 390c4642b8..8377c365ef 100644 --- a/include/hw/virtio/virtio-gpu.h +++ b/include/hw/virtio/virtio-gpu.h @@ -93,6 +93,7 @@ enum virtio_gpu_base_conf_flags { VIRTIO_GPU_FLAG_EDID_ENABLED, VIRTIO_GPU_FLAG_DMABUF_ENABLED, VIRTIO_GPU_FLAG_BLOB_ENABLED, +VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED, }; #define virtio_gpu_virgl_enabled(_cfg) \ @@ -105,6 +106,8 @@ enum virtio_gpu_base_conf_flags { (_cfg.flags & (1 << VIRTIO_GPU_FLAG_DMABUF_ENABLED)) #define virtio_gpu_blob_enabled(_cfg) \ (_cfg.flags & (1 << VIRTIO_GPU_FLAG_BLOB_ENABLED)) +#define virtio_gpu_context_init_enabled(_cfg) \ +(_cfg.flags & (1 << VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED)) struct virtio_gpu_base_conf { uint32_t max_outputs; -- 2.42.0.609.gbb76f46606-goog
[PATCH v17 1/9] virtio: Add shared memory capability
From: "Dr. David Alan Gilbert" Define a new capability type 'VIRTIO_PCI_CAP_SHARED_MEMORY_CFG' to allow defining shared memory regions with sizes and offsets of 2^32 and more. Multiple instances of the capability are allowed and distinguished by a device-specific 'id'. Signed-off-by: Dr. David Alan Gilbert Signed-off-by: Antonio Caggiano Signed-off-by: Gurchetan Singh Tested-by: Alyssa Ross Tested-by: Huang Rui Tested-by: Akihiko Odaki Acked-by: Huang Rui Reviewed-by: Gurchetan Singh Reviewed-by: Akihiko Odaki --- hw/virtio/virtio-pci.c | 18 ++ include/hw/virtio/virtio-pci.h | 4 2 files changed, 22 insertions(+) diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c index edbc0daa18..da8c9ea12d 100644 --- a/hw/virtio/virtio-pci.c +++ b/hw/virtio/virtio-pci.c @@ -1435,6 +1435,24 @@ static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy, return offset; } +int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy, + uint8_t bar, uint64_t offset, uint64_t length, + uint8_t id) +{ +struct virtio_pci_cap64 cap = { +.cap.cap_len = sizeof cap, +.cap.cfg_type = VIRTIO_PCI_CAP_SHARED_MEMORY_CFG, +}; + +cap.cap.bar = bar; +cap.cap.length = cpu_to_le32(length); +cap.length_hi = cpu_to_le32(length >> 32); +cap.cap.offset = cpu_to_le32(offset); +cap.offset_hi = cpu_to_le32(offset >> 32); +cap.cap.id = id; +return virtio_pci_add_mem_cap(proxy, ); +} + static uint64_t virtio_pci_common_read(void *opaque, hwaddr addr, unsigned size) { diff --git a/include/hw/virtio/virtio-pci.h b/include/hw/virtio/virtio-pci.h index ab2051b64b..5a3f182f99 100644 --- a/include/hw/virtio/virtio-pci.h +++ b/include/hw/virtio/virtio-pci.h @@ -264,4 +264,8 @@ unsigned virtio_pci_optimal_num_queues(unsigned fixed_queues); void virtio_pci_set_guest_notifier_fd_handler(VirtIODevice *vdev, VirtQueue *vq, int n, bool assign, bool with_irqfd); + +int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy, uint8_t bar, uint64_t offset, + uint64_t length, uint8_t id); + #endif -- 2.42.0.609.gbb76f46606-goog
[PATCH v17 7/9] gfxstream + rutabaga: meson support
- Add meson detection of rutabaga_gfx - Build virtio-gpu-rutabaga.c + associated vga/pci files when present Signed-off-by: Gurchetan Singh Tested-by: Alyssa Ross Tested-by: Emmanouil Pitsidianakis Tested-by: Akihiko Odaki Reviewed-by: Emmanouil Pitsidianakis Reviewed-by: Antonio Caggiano Reviewed-by: Akihiko Odaki --- hw/display/meson.build| 22 ++ meson.build | 7 +++ meson_options.txt | 2 ++ scripts/meson-buildoptions.sh | 3 +++ 4 files changed, 34 insertions(+) diff --git a/hw/display/meson.build b/hw/display/meson.build index 05619c6968..2b64fd9f9d 100644 --- a/hw/display/meson.build +++ b/hw/display/meson.build @@ -80,6 +80,13 @@ if config_all_devices.has_key('CONFIG_VIRTIO_GPU') if_true: [files('virtio-gpu-gl.c', 'virtio-gpu-virgl.c'), pixman, virgl]) hw_display_modules += {'virtio-gpu-gl': virtio_gpu_gl_ss} endif + + if rutabaga.found() +virtio_gpu_rutabaga_ss = ss.source_set() +virtio_gpu_rutabaga_ss.add(when: ['CONFIG_VIRTIO_GPU', rutabaga], + if_true: [files('virtio-gpu-rutabaga.c'), pixman]) +hw_display_modules += {'virtio-gpu-rutabaga': virtio_gpu_rutabaga_ss} + endif endif if config_all_devices.has_key('CONFIG_VIRTIO_PCI') @@ -96,6 +103,12 @@ if config_all_devices.has_key('CONFIG_VIRTIO_PCI') if_true: [files('virtio-gpu-pci-gl.c'), pixman]) hw_display_modules += {'virtio-gpu-pci-gl': virtio_gpu_pci_gl_ss} endif + if rutabaga.found() +virtio_gpu_pci_rutabaga_ss = ss.source_set() +virtio_gpu_pci_rutabaga_ss.add(when: ['CONFIG_VIRTIO_GPU', 'CONFIG_VIRTIO_PCI', rutabaga], + if_true: [files('virtio-gpu-pci-rutabaga.c'), pixman]) +hw_display_modules += {'virtio-gpu-pci-rutabaga': virtio_gpu_pci_rutabaga_ss} + endif endif if config_all_devices.has_key('CONFIG_VIRTIO_VGA') @@ -114,6 +127,15 @@ if config_all_devices.has_key('CONFIG_VIRTIO_VGA') virtio_vga_gl_ss.add(when: 'CONFIG_ACPI', if_true: files('acpi-vga.c'), if_false: files('acpi-vga-stub.c')) hw_display_modules += {'virtio-vga-gl': virtio_vga_gl_ss} + + if rutabaga.found() +virtio_vga_rutabaga_ss = ss.source_set() +virtio_vga_rutabaga_ss.add(when: ['CONFIG_VIRTIO_VGA', rutabaga], + if_true: [files('virtio-vga-rutabaga.c'), pixman]) +virtio_vga_rutabaga_ss.add(when: 'CONFIG_ACPI', if_true: files('acpi-vga.c'), +if_false: files('acpi-vga-stub.c')) +hw_display_modules += {'virtio-vga-rutabaga': virtio_vga_rutabaga_ss} + endif endif system_ss.add(when: 'CONFIG_OMAP', if_true: files('omap_lcdc.c')) diff --git a/meson.build b/meson.build index 3bb64b536c..65848a662a 100644 --- a/meson.build +++ b/meson.build @@ -1046,6 +1046,12 @@ if not get_option('virglrenderer').auto() or have_system or have_vhost_user_gpu dependencies: virgl)) endif endif +rutabaga = not_found +if not get_option('rutabaga_gfx').auto() or have_system or have_vhost_user_gpu + rutabaga = dependency('rutabaga_gfx_ffi', + method: 'pkg-config', + required: get_option('rutabaga_gfx')) +endif blkio = not_found if not get_option('blkio').auto() or have_block blkio = dependency('blkio', @@ -4277,6 +4283,7 @@ summary_info += {'libtasn1': tasn1} summary_info += {'PAM': pam} summary_info += {'iconv support': iconv} summary_info += {'virgl support': virgl} +summary_info += {'rutabaga support': rutabaga} summary_info += {'blkio support': blkio} summary_info += {'curl support': curl} summary_info += {'Multipath support': mpathpersist} diff --git a/meson_options.txt b/meson_options.txt index 6a17b90968..e49309dd78 100644 --- a/meson_options.txt +++ b/meson_options.txt @@ -230,6 +230,8 @@ option('vmnet', type : 'feature', value : 'auto', description: 'vmnet.framework network backend support') option('virglrenderer', type : 'feature', value : 'auto', description: 'virgl rendering support') +option('rutabaga_gfx', type : 'feature', value : 'auto', + description: 'rutabaga_gfx support') option('png', type : 'feature', value : 'auto', description: 'PNG support with libpng') option('vnc', type : 'feature', value : 'auto', diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh index 2a74b0275b..a28ccbcaf6 100644 --- a/scripts/meson-buildoptions.sh +++ b/scripts/meson-buildoptions.sh @@ -156,6 +156,7 @@ meson_options_help() { printf "%s\n" ' rbd Ceph block device driver' printf "%s\n" ' rdmaEnable RDMA-based migration' printf "%s\n" ' replication replication support' + printf "%s\n" ' rutabaga-gfxrutabaga_gfx support' printf "%s\n" ' sdl SDL
[PATCH v17 9/9] docs/system: add basic virtio-gpu documentation
This adds basic documentation for virtio-gpu. Suggested-by: Akihiko Odaki Signed-off-by: Gurchetan Singh Tested-by: Alyssa Ross Tested-by: Emmanouil Pitsidianakis Tested-by: Akihiko Odaki Reviewed-by: Emmanouil Pitsidianakis Reviewed-by: Antonio Caggiano Reviewed-by: Akihiko Odaki --- docs/system/device-emulation.rst | 1 + docs/system/devices/virtio-gpu.rst | 112 + 2 files changed, 113 insertions(+) create mode 100644 docs/system/devices/virtio-gpu.rst diff --git a/docs/system/device-emulation.rst b/docs/system/device-emulation.rst index 4491c4cbf7..1167f3a9f2 100644 --- a/docs/system/device-emulation.rst +++ b/docs/system/device-emulation.rst @@ -91,6 +91,7 @@ Emulated Devices devices/nvme.rst devices/usb.rst devices/vhost-user.rst + devices/virtio-gpu.rst devices/virtio-pmem.rst devices/vhost-user-rng.rst devices/canokey.rst diff --git a/docs/system/devices/virtio-gpu.rst b/docs/system/devices/virtio-gpu.rst new file mode 100644 index 00..cb73dd7998 --- /dev/null +++ b/docs/system/devices/virtio-gpu.rst @@ -0,0 +1,112 @@ +.. + SPDX-License-Identifier: GPL-2.0-or-later + +virtio-gpu +== + +This document explains the setup and usage of the virtio-gpu device. +The virtio-gpu device paravirtualizes the GPU and display controller. + +Linux kernel support + + +virtio-gpu requires a guest Linux kernel built with the +``CONFIG_DRM_VIRTIO_GPU`` option. + +QEMU virtio-gpu variants + + +QEMU virtio-gpu device variants come in the following form: + + * ``virtio-vga[-BACKEND]`` + * ``virtio-gpu[-BACKEND][-INTERFACE]`` + * ``vhost-user-vga`` + * ``vhost-user-pci`` + +**Backends:** QEMU provides a 2D virtio-gpu backend, and two accelerated +backends: virglrenderer ('gl' device label) and rutabaga_gfx ('rutabaga' +device label). There is a vhost-user backend that runs the graphics stack +in a separate process for improved isolation. + +**Interfaces:** QEMU further categorizes virtio-gpu device variants based +on the interface exposed to the guest. The interfaces can be classified +into VGA and non-VGA variants. The VGA ones are prefixed with virtio-vga +or vhost-user-vga while the non-VGA ones are prefixed with virtio-gpu or +vhost-user-gpu. + +The VGA ones always use the PCI interface, but for the non-VGA ones, the +user can further pick between MMIO or PCI. For MMIO, the user can suffix +the device name with -device, though vhost-user-gpu does not support MMIO. +For PCI, the user can suffix it with -pci. Without these suffixes, the +platform default will be chosen. + +virtio-gpu 2d +- + +The default 2D backend only performs 2D operations. The guest needs to +employ a software renderer for 3D graphics. + +Typically, the software renderer is provided by `Mesa`_ or `SwiftShader`_. +Mesa's implementations (LLVMpipe, Lavapipe and virgl below) work out of box +on typical modern Linux distributions. + +.. parsed-literal:: +-device virtio-gpu + +.. _Mesa: https://www.mesa3d.org/ +.. _SwiftShader: https://github.com/google/swiftshader + +virtio-gpu virglrenderer + + +When using virgl accelerated graphics mode in the guest, OpenGL API calls +are translated into an intermediate representation (see `Gallium3D`_). The +intermediate representation is communicated to the host and the +`virglrenderer`_ library on the host translates the intermediate +representation back to OpenGL API calls. + +.. parsed-literal:: +-device virtio-gpu-gl + +.. _Gallium3D: https://www.freedesktop.org/wiki/Software/gallium/ +.. _virglrenderer: https://gitlab.freedesktop.org/virgl/virglrenderer/ + +virtio-gpu rutabaga +--- + +virtio-gpu can also leverage rutabaga_gfx to provide `gfxstream`_ +rendering and `Wayland display passthrough`_. With the gfxstream rendering +mode, GLES and Vulkan calls are forwarded to the host with minimal +modification. + +The crosvm book provides directions on how to build a `gfxstream-enabled +rutabaga`_ and launch a `guest Wayland proxy`_. + +This device does require host blob support (``hostmem`` field below). The +``hostmem`` field specifies the size of virtio-gpu host memory window. +This is typically between 256M and 8G. + +At least one virtio-gpu capability set ("capset") must be specified when +starting the device. The currently capsets supported are ``gfxstream-vulkan`` +and ``cross-domain`` for Linux guests. For Android guests, the experimental +``x-gfxstream-gles`` and ``x-gfxstream-composer`` capsets are also supported. + +The device will try to auto-detect the wayland socket path if the +``cross-domain`` capset name is set. The user may optionally specify +``wayland-socket-path`` for non-standard paths. + +The ``wsi`` option can be set to ``surfaceless`` or ``headless``. +Surfaceless doesn't create a native window surface, but does copy from the +render target to the Pixman buffer if a virtio-gpu 2D hypercall is
[PATCH v17 0/9] gfxstream + rutabaga_gfx
From: Gurchetan Singh Branch containing changes: https://gitlab.com/gurchetansingh/qemu/-/commits/qemu-gfxstream-v17 Changes since v16: - Fixed typo mentioned here: https://lists.gnu.org/archive/html/qemu-devel/2023-10/msg01407.html Antonio Caggiano (2): virtio-gpu: CONTEXT_INIT feature virtio-gpu: blob prep Dr. David Alan Gilbert (1): virtio: Add shared memory capability Gerd Hoffmann (1): virtio-gpu: hostmem Gurchetan Singh (5): gfxstream + rutabaga prep: added need defintions, fields, and options gfxstream + rutabaga: add initial support for gfxstream gfxstream + rutabaga: meson support gfxstream + rutabaga: enable rutabaga docs/system: add basic virtio-gpu documentation docs/system/device-emulation.rst |1 + docs/system/devices/virtio-gpu.rst | 112 +++ hw/display/meson.build | 22 + hw/display/virtio-gpu-base.c |6 +- hw/display/virtio-gpu-pci-rutabaga.c | 47 ++ hw/display/virtio-gpu-pci.c | 14 + hw/display/virtio-gpu-rutabaga.c | 1113 ++ hw/display/virtio-gpu.c | 16 +- hw/display/virtio-vga-rutabaga.c | 50 ++ hw/display/virtio-vga.c | 33 +- hw/virtio/virtio-pci.c | 18 + include/hw/virtio/virtio-gpu-bswap.h | 15 + include/hw/virtio/virtio-gpu.h | 40 + include/hw/virtio/virtio-pci.h |4 + meson.build |7 + meson_options.txt|2 + scripts/meson-buildoptions.sh|3 + softmmu/qdev-monitor.c |3 + softmmu/vl.c |1 + 19 files changed, 1488 insertions(+), 19 deletions(-) create mode 100644 docs/system/devices/virtio-gpu.rst create mode 100644 hw/display/virtio-gpu-pci-rutabaga.c create mode 100644 hw/display/virtio-gpu-rutabaga.c create mode 100644 hw/display/virtio-vga-rutabaga.c -- 2.42.0.609.gbb76f46606-goog
[PATCH v17 6/9] gfxstream + rutabaga: add initial support for gfxstream
This adds initial support for gfxstream and cross-domain. Both features rely on virtio-gpu blob resources and context types, which are also implemented in this patch. gfxstream has a long and illustrious history in Android graphics paravirtualization. It has been powering graphics in the Android Studio Emulator for more than a decade, which is the main developer platform. Originally conceived by Jesse Hall, it was first known as "EmuGL" [a]. The key design characteristic was a 1:1 threading model and auto-generation, which fit nicely with the OpenGLES spec. It also allowed easy layering with ANGLE on the host, which provides the GLES implementations on Windows or MacOS enviroments. gfxstream has traditionally been maintained by a single engineer, and between 2015 to 2021, the goldfish throne passed to Frank Yang. Historians often remark this glorious reign ("pax gfxstreama" is the academic term) was comparable to that of Augustus and both Queen Elizabeths. Just to name a few accomplishments in a resplendent panoply: higher versions of GLES, address space graphics, snapshot support and CTS compliant Vulkan [b]. One major drawback was the use of out-of-tree goldfish drivers. Android engineers didn't know much about DRM/KMS and especially TTM so a simple guest to host pipe was conceived. Luckily, virtio-gpu 3D started to emerge in 2016 due to the work of the Mesa/virglrenderer communities. In 2018, the initial virtio-gpu port of gfxstream was done by Cuttlefish enthusiast Alistair Delva. It was a symbol compatible replacement of virglrenderer [c] and named "AVDVirglrenderer". This implementation forms the basis of the current gfxstream host implementation still in use today. cross-domain support follows a similar arc. Originally conceived by Wayland aficionado David Reveman and crosvm enjoyer Zach Reizner in 2018, it initially relied on the downstream "virtio-wl" device. In 2020 and 2021, virtio-gpu was extended to include blob resources and multiple timelines by yours truly, features gfxstream/cross-domain both require to function correctly. Right now, we stand at the precipice of a truly fantastic possibility: the Android Emulator powered by upstream QEMU and upstream Linux kernel. gfxstream will then be packaged properfully, and app developers can even fix gfxstream bugs on their own if they encounter them. It's been quite the ride, my friends. Where will gfxstream head next, nobody really knows. I wouldn't be surprised if it's around for another decade, maintained by a new generation of Android graphics enthusiasts. Technical details: - Very simple initial display integration: just used Pixman - Largely, 1:1 mapping of virtio-gpu hypercalls to rutabaga function calls Next steps for Android VMs: - The next step would be improving display integration and UI interfaces with the goal of the QEMU upstream graphics being in an emulator release [d]. Next steps for Linux VMs for display virtualization: - For widespread distribution, someone needs to package Sommelier or the wayland-proxy-virtwl [e] ideally into Debian main. In addition, newer versions of the Linux kernel come with DRM_VIRTIO_GPU_KMS option, which allows disabling KMS hypercalls. If anyone cares enough, it'll probably be possible to build a custom VM variant that uses this display virtualization strategy. [a] https://android-review.googlesource.com/c/platform/development/+/34470 [b] https://android-review.googlesource.com/q/topic:%22vulkan-hostconnection-start%22 [c] https://android-review.googlesource.com/c/device/generic/goldfish-opengl/+/761927 [d] https://developer.android.com/studio/releases/emulator [e] https://github.com/talex5/wayland-proxy-virtwl Signed-off-by: Gurchetan Singh Tested-by: Alyssa Ross Tested-by: Emmanouil Pitsidianakis Tested-by: Akihiko Odaki Reviewed-by: Emmanouil Pitsidianakis Reviewed-by: Antonio Caggiano Reviewed-by: Akihiko Odaki --- hw/display/virtio-gpu-pci-rutabaga.c | 47 ++ hw/display/virtio-gpu-rutabaga.c | 1113 ++ hw/display/virtio-vga-rutabaga.c | 50 ++ 3 files changed, 1210 insertions(+) create mode 100644 hw/display/virtio-gpu-pci-rutabaga.c create mode 100644 hw/display/virtio-gpu-rutabaga.c create mode 100644 hw/display/virtio-vga-rutabaga.c diff --git a/hw/display/virtio-gpu-pci-rutabaga.c b/hw/display/virtio-gpu-pci-rutabaga.c new file mode 100644 index 00..c96729e198 --- /dev/null +++ b/hw/display/virtio-gpu-pci-rutabaga.c @@ -0,0 +1,47 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ + +#include "qemu/osdep.h" +#include "qapi/error.h" +#include "qemu/module.h" +#include "hw/pci/pci.h" +#include "hw/qdev-properties.h" +#include "hw/virtio/virtio.h" +#include "hw/virtio/virtio-bus.h" +#include "hw/virtio/virtio-gpu-pci.h" +#include "qom/object.h" + +#define TYPE_VIRTIO_GPU_RUTABAGA_PCI "virtio-gpu-rutabaga-pci" +OBJECT_DECLARE_SIMPLE_TYPE(VirtIOGPURutabagaPCI,
Re: [RFC PATCH v2 1/9] Add Rust SEV library as subproject
On 10/5/23 2:03 AM, Philippe Mathieu-Daudé wrote: Hi Tyler, On 4/10/23 22:34, Tyler Fanelli wrote: The Rust sev library provides a C API for the AMD SEV launch ioctls, as well as the ability to build with meson. Add the Rust sev library as a QEMU subproject with the goal of outsourcing all SEV launch ioctls to C APIs provided by it. Signed-off-by: Tyler Fanelli --- meson.build | 8 meson_options.txt | 2 ++ scripts/meson-buildoptions.sh | 3 +++ subprojects/sev.wrap | 6 ++ target/i386/meson.build | 2 +- 5 files changed, 20 insertions(+), 1 deletion(-) create mode 100644 subprojects/sev.wrap diff --git a/subprojects/sev.wrap b/subprojects/sev.wrap new file mode 100644 index 00..5be1faccf6 --- /dev/null +++ b/subprojects/sev.wrap @@ -0,0 +1,6 @@ +[wrap-git] +url = https://github.com/tylerfanelli/sev +revision = b81b1da5df50055600a5b0349b0c4afda677cccb Why use your tree instead of the mainstream one? Before this gets merged we need to mirror the subproject on our GitLab namespace, then use the mirror URL here. The required meson changes for the sev library are still in review, so I'm still working on a personal branch. Those patches are a blocker for this series right now. This is moreso another RFC to get feedback on building Rust libraries as QEMU subprojects (and if this is the proper way to do so). Tyler
Re: [PATCH v2 5/5] hw/intc/apic: Pass CPU using QOM link property
On 10/3/23 10:27, Philippe Mathieu-Daudé wrote: -/* TODO: convert to link<> */ -apic = APIC_COMMON(cpu->apic_state); -apic->cpu = cpu; -apic->apicbase = APIC_DEFAULT_ADDRESS | MSR_IA32_APICBASE_ENABLE; +qdev_prop_set_uint32(cpu->apic_state, "base-addr", + APIC_DEFAULT_ADDRESS | MSR_IA32_APIC For this to use a link, it's missing the corresponding object_unref(apic->cpu) + apic->cpu = NULL assignment somewhere. For example you can add it in apic_common_unrealize (called by device_unparent - which is called in turn by x86_cpu_unrealizefn). Paolo
Re: [PATCH v2 4/5] hw/intc/apic: Rename x86_cpu_apic_create() -> x86_cpu_apic_new()
On 10/3/23 10:27, Philippe Mathieu-Daudé wrote: -x86_cpu_apic_create(cpu, _err); -if (local_err != NULL) { -goto out; -} +x86_cpu_apic_new(cpu); I don't like this, "*_new" is generally for functions that return what they create. Patch 2 is scary with the newly-introduced possible failure, but I suppose it's safer if you reason that any problem will occur at startup, not at hotplug time for example. Paolo
Re: [PATCH v2] coverity: physmem: use simple assertions instead of modelling
On Thu, Oct 5, 2023 at 4:04 PM Vladimir Sementsov-Ogievskiy wrote: > +/* > + * Assure Coverity (and ourselves) that we are not going to > OVERRUN > + * the buffer by following ldn_he_p(). > + */ > +assert((l == 1 && len >= 1) || > + (l == 2 && len >= 2) || > + (l == 4 && len >= 4) || > + (l == 8 && len >= 8)); I'll queue it shortly, but perhaps you can try if assert(l <= len) is enough? Alternatively I can try applying the patch on top of the tree that we test with, and see how things go. Paolo > val = ldn_he_p(buf, l); > result |= memory_region_dispatch_write(mr, addr1, val, > size_memop(l), attrs); > @@ -2784,6 +2793,15 @@ MemTxResult flatview_read_continue(FlatView *fv, > hwaddr addr, > l = memory_access_size(mr, l, addr1); > result |= memory_region_dispatch_read(mr, addr1, , >size_memop(l), attrs); > + > +/* > + * Assure Coverity (and ourselves) that we are not going to > OVERRUN > + * the buffer by following stn_he_p(). > + */ > +assert((l == 1 && len >= 1) || > + (l == 2 && len >= 2) || > + (l == 4 && len >= 4) || > + (l == 8 && len >= 8)); > stn_he_p(buf, l, val); > } else { > /* RAM case */ > -- > 2.34.1 >
[PATCH v2 2/3] target/hexagon: fix some occurrences of -Wshadow=local
Of the changes in this commit, the changes in `HELPER(commit_hvx_stores)()` are less obvious. They are required because of some macro invocations like SCATTER_OP_WRITE_TO_MEM(). e.g.: In file included from ../target/hexagon/op_helper.c:31: ../target/hexagon/mmvec/macros.h:205:18: error: declaration of ‘i’ shadows a previous local [-Werror=shadow=compatible-local] 205 | for (int i = 0; i < sizeof(MMVector); i += sizeof(TYPE)) { \ | ^ ../target/hexagon/op_helper.c:157:17: note: in expansion of macro ‘SCATTER_OP_WRITE_TO_MEM’ 157 | SCATTER_OP_WRITE_TO_MEM(uint16_t); | ^~~ ../target/hexagon/op_helper.c:135:9: note: shadowed declaration is here 135 | int i; | ^ In file included from ../target/hexagon/op_helper.c:31: ../target/hexagon/mmvec/macros.h:204:19: error: declaration of ‘ra’ shadows a previous local [-Werror=shadow=compatible-local] 204 | uintptr_t ra = GETPC(); \ | ^~ ../target/hexagon/op_helper.c:160:17: note: in expansion of macro ‘SCATTER_OP_WRITE_TO_MEM’ 160 | SCATTER_OP_WRITE_TO_MEM(uint32_t); | ^~~ ../target/hexagon/op_helper.c:134:15: note: shadowed declaration is here 134 | uintptr_t ra = GETPC(); | ^~ Reviewed-by: Matheus Tavares Bernardino Signed-off-by: Brian Cain --- target/hexagon/imported/alu.idef | 6 +++--- target/hexagon/mmvec/macros.h| 6 +++--- target/hexagon/op_helper.c | 9 +++-- target/hexagon/translate.c | 9 - 4 files changed, 13 insertions(+), 17 deletions(-) diff --git a/target/hexagon/imported/alu.idef b/target/hexagon/imported/alu.idef index 12d2aac5d4..b855676989 100644 --- a/target/hexagon/imported/alu.idef +++ b/target/hexagon/imported/alu.idef @@ -1142,9 +1142,9 @@ Q6INSN(A4_cround_rr,"Rd32=cround(Rs32,Rt32)",ATTRIBS(),"Convergent Round", {RdV tmp128 = fSHIFTR128(tmp128, SHIFT);\ DST = fCAST16S_8S(tmp128);\ } else {\ -size16s_t rndbit_128 = fCAST8S_16S((1LL << (SHIFT - 1))); \ -size16s_t src_128 = fCAST8S_16S(SRC); \ -size16s_t tmp128 = fADD128(src_128, rndbit_128);\ +rndbit_128 = fCAST8S_16S((1LL << (SHIFT - 1))); \ +src_128 = fCAST8S_16S(SRC); \ +tmp128 = fADD128(src_128, rndbit_128);\ tmp128 = fSHIFTR128(tmp128, SHIFT);\ DST = fCAST16S_8S(tmp128);\ } diff --git a/target/hexagon/mmvec/macros.h b/target/hexagon/mmvec/macros.h index a655634fd1..728a63d35f 100644 --- a/target/hexagon/mmvec/macros.h +++ b/target/hexagon/mmvec/macros.h @@ -201,14 +201,14 @@ } while (0) #define SCATTER_OP_WRITE_TO_MEM(TYPE) \ do { \ -uintptr_t ra = GETPC(); \ +uintptr_t ra_ = GETPC(); \ for (int i = 0; i < sizeof(MMVector); i += sizeof(TYPE)) { \ if (test_bit(i, env->vtcm_log.mask)) { \ TYPE dst = 0; \ TYPE inc = 0; \ for (int j = 0; j < sizeof(TYPE); j++) { \ uint8_t val; \ -val = cpu_ldub_data_ra(env, env->vtcm_log.va[i + j], ra); \ +val = cpu_ldub_data_ra(env, env->vtcm_log.va[i + j], ra_); \ dst |= val << (8 * j); \ inc |= env->vtcm_log.data.ub[j + i] << (8 * j); \ clear_bit(j + i, env->vtcm_log.mask); \ @@ -217,7 +217,7 @@ dst += inc; \ for (int j = 0; j < sizeof(TYPE); j++) { \ cpu_stb_data_ra(env, env->vtcm_log.va[i + j], \ -(dst >> (8 * j)) & 0xFF, ra); \ +(dst >> (8 * j)) & 0xFF, ra_); \ } \ } \ } \ diff --git a/target/hexagon/op_helper.c b/target/hexagon/op_helper.c index 8ca3976a65..da10ac5847 100644 --- a/target/hexagon/op_helper.c +++ b/target/hexagon/op_helper.c @@ -132,10 +132,9 @@ void HELPER(gather_store)(CPUHexagonState *env, uint32_t addr, int slot) void HELPER(commit_hvx_stores)(CPUHexagonState *env) { uintptr_t ra = GETPC(); -int i; /* Normal (possibly masked) vector store */ -for (i = 0; i < VSTORES_MAX; i++) { +for (int i = 0; i < VSTORES_MAX; i++) { if (env->vstore_pending[i]) { env->vstore_pending[i] = 0; target_ulong va = env->vstore[i].va; @@ -162,7 +161,7 @@ void HELPER(commit_hvx_stores)(CPUHexagonState *env) g_assert_not_reached(); } } else { -for (i = 0; i < sizeof(MMVector); i++) { +for (int i = 0; i < sizeof(MMVector); i++) { if (test_bit(i, env->vtcm_log.mask)) { cpu_stb_data_ra(env, env->vtcm_log.va[i],
[PATCH v2 1/3] target/hexagon: move GETPC() calls to top level helpers
From: Matheus Tavares Bernardino As docs/devel/loads-stores.rst states: ``GETPC()`` should be used with great care: calling it in other functions that are *not* the top level ``HELPER(foo)`` will cause unexpected behavior. Instead, the value of ``GETPC()`` should be read from the helper and passed if needed to the functions that the helper calls. Let's fix the GETPC() usage in Hexagon, making sure it's always called from top level helpers and passed down to the places where it's needed. There are a few snippets where that is not currently the case: - probe_store(), which is only called from two helpers, so it's easy to move GETPC() up. - mem_load*() functions, which are also called directly from helpers, but through the MEM_LOAD*() set of macros. Note that this are only used when compiling with --disable-hexagon-idef-parser. In this case, we also take this opportunity to simplify the code, unifying the mem_load*() functions. - HELPER(probe_hvx_stores), when called from another helper, ends up using its own GETPC() expansion instead of the top level caller. Signed-off-by: Matheus Tavares Bernardino Reviewed-by: Taylor Simpson Message-Id: <2c74c3696946edba7cc5b2942cf296a5af532052.1689070412.git.quic_mathb...@quicinc.com>-ne Reviewed-by: Brian Cain Signed-off-by: Brian Cain --- target/hexagon/macros.h| 19 +- target/hexagon/op_helper.c | 75 +++--- target/hexagon/op_helper.h | 9 - 3 files changed, 38 insertions(+), 65 deletions(-) diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h index 5451b061ee..dafa0df6ed 100644 --- a/target/hexagon/macros.h +++ b/target/hexagon/macros.h @@ -173,15 +173,6 @@ #define MEM_STORE8(VA, DATA, SLOT) \ MEM_STORE8_FUNC(DATA)(cpu_env, VA, DATA, SLOT) #else -#define MEM_LOAD1s(VA) ((int8_t)mem_load1(env, pkt_has_store_s1, slot, VA)) -#define MEM_LOAD1u(VA) ((uint8_t)mem_load1(env, pkt_has_store_s1, slot, VA)) -#define MEM_LOAD2s(VA) ((int16_t)mem_load2(env, pkt_has_store_s1, slot, VA)) -#define MEM_LOAD2u(VA) ((uint16_t)mem_load2(env, pkt_has_store_s1, slot, VA)) -#define MEM_LOAD4s(VA) ((int32_t)mem_load4(env, pkt_has_store_s1, slot, VA)) -#define MEM_LOAD4u(VA) ((uint32_t)mem_load4(env, pkt_has_store_s1, slot, VA)) -#define MEM_LOAD8s(VA) ((int64_t)mem_load8(env, pkt_has_store_s1, slot, VA)) -#define MEM_LOAD8u(VA) ((uint64_t)mem_load8(env, pkt_has_store_s1, slot, VA)) - #define MEM_STORE1(VA, DATA, SLOT) log_store32(env, VA, DATA, 1, SLOT) #define MEM_STORE2(VA, DATA, SLOT) log_store32(env, VA, DATA, 2, SLOT) #define MEM_STORE4(VA, DATA, SLOT) log_store32(env, VA, DATA, 4, SLOT) @@ -530,8 +521,16 @@ static inline TCGv gen_read_ireg(TCGv result, TCGv val, int shift) #ifdef QEMU_GENERATE #define fLOAD(NUM, SIZE, SIGN, EA, DST) MEM_LOAD##SIZE##SIGN(DST, EA) #else +#define MEM_LOAD1 cpu_ldub_data_ra +#define MEM_LOAD2 cpu_lduw_data_ra +#define MEM_LOAD4 cpu_ldl_data_ra +#define MEM_LOAD8 cpu_ldq_data_ra + #define fLOAD(NUM, SIZE, SIGN, EA, DST) \ -DST = (size##SIZE##SIGN##_t)MEM_LOAD##SIZE##SIGN(EA) +do { \ +check_noshuf(env, pkt_has_store_s1, slot, EA, SIZE, GETPC()); \ +DST = (size##SIZE##SIGN##_t)MEM_LOAD##SIZE(env, EA, GETPC()); \ +} while (0) #endif #define fMEMOP(NUM, SIZE, SIGN, EA, FNTYPE, VALUE) diff --git a/target/hexagon/op_helper.c b/target/hexagon/op_helper.c index 12967ac21e..8ca3976a65 100644 --- a/target/hexagon/op_helper.c +++ b/target/hexagon/op_helper.c @@ -95,9 +95,8 @@ void HELPER(debug_check_store_width)(CPUHexagonState *env, int slot, int check) } } -void HELPER(commit_store)(CPUHexagonState *env, int slot_num) +static void commit_store(CPUHexagonState *env, int slot_num, uintptr_t ra) { -uintptr_t ra = GETPC(); uint8_t width = env->mem_log_stores[slot_num].width; target_ulong va = env->mem_log_stores[slot_num].va; @@ -119,6 +118,12 @@ void HELPER(commit_store)(CPUHexagonState *env, int slot_num) } } +void HELPER(commit_store)(CPUHexagonState *env, int slot_num) +{ +uintptr_t ra = GETPC(); +commit_store(env, slot_num, ra); +} + void HELPER(gather_store)(CPUHexagonState *env, uint32_t addr, int slot) { mem_gather_store(env, addr, slot); @@ -467,13 +472,12 @@ int32_t HELPER(cabacdecbin_pred)(int64_t RssV, int64_t RttV) } static void probe_store(CPUHexagonState *env, int slot, int mmu_idx, -bool is_predicated) +bool is_predicated, uintptr_t retaddr) { if (!is_predicated || !(env->slot_cancelled & (1 << slot))) { size1u_t width = env->mem_log_stores[slot].width; target_ulong va = env->mem_log_stores[slot].va; -uintptr_t ra = GETPC(); -probe_write(env, va, width, mmu_idx, ra); +probe_write(env, va, width, mmu_idx, retaddr); } } @@ -494,12 +498,13 @@ void HELPER(probe_pkt_scalar_store_s0)(CPUHexagonState *env, int args) int mmu_idx = FIELD_EX32(args,
[PATCH v2 3/3] target/hexagon: avoid shadowing globals
The typedef `vaddr` is shadowed by `vaddr` identifiers, so we rename the identifiers to avoid shadowing the type name. The global `cpu_env` is shadowed by local `cpu_env` arguments, so we rename the function arguments to avoid shadowing the global. Signed-off-by: Brian Cain --- target/hexagon/genptr.c | 56 - target/hexagon/genptr.h | 18 target/hexagon/mmvec/system_ext_mmvec.c | 4 +- target/hexagon/mmvec/system_ext_mmvec.h | 2 +- target/hexagon/op_helper.c | 4 +- 5 files changed, 42 insertions(+), 42 deletions(-) diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c index 217bc7bb5a..11377ac92b 100644 --- a/target/hexagon/genptr.c +++ b/target/hexagon/genptr.c @@ -334,28 +334,28 @@ void gen_set_byte_i64(int N, TCGv_i64 result, TCGv src) tcg_gen_deposit_i64(result, result, src64, N * 8, 8); } -static inline void gen_load_locked4u(TCGv dest, TCGv vaddr, int mem_index) +static inline void gen_load_locked4u(TCGv dest, TCGv v_addr, int mem_index) { -tcg_gen_qemu_ld_tl(dest, vaddr, mem_index, MO_TEUL); -tcg_gen_mov_tl(hex_llsc_addr, vaddr); +tcg_gen_qemu_ld_tl(dest, v_addr, mem_index, MO_TEUL); +tcg_gen_mov_tl(hex_llsc_addr, v_addr); tcg_gen_mov_tl(hex_llsc_val, dest); } -static inline void gen_load_locked8u(TCGv_i64 dest, TCGv vaddr, int mem_index) +static inline void gen_load_locked8u(TCGv_i64 dest, TCGv v_addr, int mem_index) { -tcg_gen_qemu_ld_i64(dest, vaddr, mem_index, MO_TEUQ); -tcg_gen_mov_tl(hex_llsc_addr, vaddr); +tcg_gen_qemu_ld_i64(dest, v_addr, mem_index, MO_TEUQ); +tcg_gen_mov_tl(hex_llsc_addr, v_addr); tcg_gen_mov_i64(hex_llsc_val_i64, dest); } static inline void gen_store_conditional4(DisasContext *ctx, - TCGv pred, TCGv vaddr, TCGv src) + TCGv pred, TCGv v_addr, TCGv src) { TCGLabel *fail = gen_new_label(); TCGLabel *done = gen_new_label(); TCGv one, zero, tmp; -tcg_gen_brcond_tl(TCG_COND_NE, vaddr, hex_llsc_addr, fail); +tcg_gen_brcond_tl(TCG_COND_NE, v_addr, hex_llsc_addr, fail); one = tcg_constant_tl(0xff); zero = tcg_constant_tl(0); @@ -374,13 +374,13 @@ static inline void gen_store_conditional4(DisasContext *ctx, } static inline void gen_store_conditional8(DisasContext *ctx, - TCGv pred, TCGv vaddr, TCGv_i64 src) + TCGv pred, TCGv v_addr, TCGv_i64 src) { TCGLabel *fail = gen_new_label(); TCGLabel *done = gen_new_label(); TCGv_i64 one, zero, tmp; -tcg_gen_brcond_tl(TCG_COND_NE, vaddr, hex_llsc_addr, fail); +tcg_gen_brcond_tl(TCG_COND_NE, v_addr, hex_llsc_addr, fail); one = tcg_constant_i64(0xff); zero = tcg_constant_i64(0); @@ -407,57 +407,57 @@ static TCGv gen_slotval(DisasContext *ctx) } #endif -void gen_store32(TCGv vaddr, TCGv src, int width, uint32_t slot) +void gen_store32(TCGv v_addr, TCGv src, int width, uint32_t slot) { -tcg_gen_mov_tl(hex_store_addr[slot], vaddr); +tcg_gen_mov_tl(hex_store_addr[slot], v_addr); tcg_gen_movi_tl(hex_store_width[slot], width); tcg_gen_mov_tl(hex_store_val32[slot], src); } -void gen_store1(TCGv_env cpu_env, TCGv vaddr, TCGv src, uint32_t slot) +void gen_store1(TCGv_env cpu_env_, TCGv v_addr, TCGv src, uint32_t slot) { -gen_store32(vaddr, src, 1, slot); +gen_store32(v_addr, src, 1, slot); } -void gen_store1i(TCGv_env cpu_env, TCGv vaddr, int32_t src, uint32_t slot) +void gen_store1i(TCGv_env cpu_env_, TCGv v_addr, int32_t src, uint32_t slot) { TCGv tmp = tcg_constant_tl(src); -gen_store1(cpu_env, vaddr, tmp, slot); +gen_store1(cpu_env_, v_addr, tmp, slot); } -void gen_store2(TCGv_env cpu_env, TCGv vaddr, TCGv src, uint32_t slot) +void gen_store2(TCGv_env cpu_env_, TCGv v_addr, TCGv src, uint32_t slot) { -gen_store32(vaddr, src, 2, slot); +gen_store32(v_addr, src, 2, slot); } -void gen_store2i(TCGv_env cpu_env, TCGv vaddr, int32_t src, uint32_t slot) +void gen_store2i(TCGv_env cpu_env_, TCGv v_addr, int32_t src, uint32_t slot) { TCGv tmp = tcg_constant_tl(src); -gen_store2(cpu_env, vaddr, tmp, slot); +gen_store2(cpu_env_, v_addr, tmp, slot); } -void gen_store4(TCGv_env cpu_env, TCGv vaddr, TCGv src, uint32_t slot) +void gen_store4(TCGv_env cpu_env_, TCGv v_addr, TCGv src, uint32_t slot) { -gen_store32(vaddr, src, 4, slot); +gen_store32(v_addr, src, 4, slot); } -void gen_store4i(TCGv_env cpu_env, TCGv vaddr, int32_t src, uint32_t slot) +void gen_store4i(TCGv_env cpu_env_, TCGv v_addr, int32_t src, uint32_t slot) { TCGv tmp = tcg_constant_tl(src); -gen_store4(cpu_env, vaddr, tmp, slot); +gen_store4(cpu_env_, v_addr, tmp, slot); } -void gen_store8(TCGv_env cpu_env, TCGv vaddr, TCGv_i64 src, uint32_t slot) +void gen_store8(TCGv_env cpu_env_, TCGv
[PATCH v2 0/3] hexagon: GETPC() and shadowing fixes
In v2: reworked with suggestions from Philippe and added a new patch to cover -Wshadow=global. Brian Cain (2): target/hexagon: fix some occurrences of -Wshadow=local target/hexagon: avoid shadowing globals Matheus Tavares Bernardino (1): target/hexagon: move GETPC() calls to top level helpers target/hexagon/genptr.c | 56 - target/hexagon/genptr.h | 18 +++--- target/hexagon/imported/alu.idef| 6 +- target/hexagon/macros.h | 19 +++--- target/hexagon/mmvec/macros.h | 6 +- target/hexagon/mmvec/system_ext_mmvec.c | 4 +- target/hexagon/mmvec/system_ext_mmvec.h | 2 +- target/hexagon/op_helper.c | 84 ++--- target/hexagon/op_helper.h | 9 --- target/hexagon/translate.c | 9 ++- 10 files changed, 91 insertions(+), 122 deletions(-) -- 2.25.1
[PATCH 3/3] hw/ppc: Add emulation of AmigaOne XE board
The AmigaOne is a rebranded MAI Teron board that uses U-Boot firmware with patches to support AmigaOS and is very similar to pegasos2 so can be easily emulated sharing most code with pegasos2. The reason to emulate it is that AmigaOS comes in different versions for AmigaOne and PegasosII which only have drivers for one machine and firmware so these only run on the specific machine. Adding this board allows another AmigaOS version to be used reusing already existing peagasos2 emulation. (The AmigaOne was the first of these boards so likely most widespread which then inspired Pegasos that was later replaced with PegasosII due to problems with Articia S, so these have a lot of similarity. Pegasos mainly ran MorphOS while the PegasosII version of AmigaOS was added later and therefore less common than the AmigaOne version.) Signed-off-by: BALATON Zoltan --- MAINTAINERS | 8 ++ configs/devices/ppc-softmmu/default.mak | 1 + hw/ppc/Kconfig | 7 + hw/ppc/amigaone.c | 164 hw/ppc/meson.build | 2 + 5 files changed, 182 insertions(+) create mode 100644 hw/ppc/amigaone.c diff --git a/MAINTAINERS b/MAINTAINERS index 7f0e20fde6..03f908c153 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1490,6 +1490,14 @@ F: hw/pci-host/mv64361.c F: hw/pci-host/mv643xx.h F: include/hw/pci-host/mv64361.h +amigaone +M: BALATON Zoltan +L: qemu-...@nongnu.org +S: Maintained +F: hw/ppc/amigaone.c +F: hw/pci-host/articia.c +F: include/hw/pci-host/articia.h + Virtual Open Firmware (VOF) M: Alexey Kardashevskiy R: David Gibson diff --git a/configs/devices/ppc-softmmu/default.mak b/configs/devices/ppc-softmmu/default.mak index a887f5438b..b85fd2bcd7 100644 --- a/configs/devices/ppc-softmmu/default.mak +++ b/configs/devices/ppc-softmmu/default.mak @@ -14,6 +14,7 @@ CONFIG_SAM460EX=y CONFIG_MAC_OLDWORLD=y CONFIG_MAC_NEWWORLD=y +CONFIG_AMIGAONE=y CONFIG_PEGASOS2=y # For PReP diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig index 5dfbf47ef5..56f0475a8e 100644 --- a/hw/ppc/Kconfig +++ b/hw/ppc/Kconfig @@ -69,6 +69,13 @@ config SAM460EX select USB_OHCI select FDT_PPC +config AMIGAONE +bool +imply ATI_VGA +select ARTICIA +select VT82C686 +select SMBUS_EEPROM + config PEGASOS2 bool imply ATI_VGA diff --git a/hw/ppc/amigaone.c b/hw/ppc/amigaone.c new file mode 100644 index 00..3589924c8a --- /dev/null +++ b/hw/ppc/amigaone.c @@ -0,0 +1,164 @@ +/* + * QEMU Eyetech AmigaOne/Mai Logic Teron emulation + * + * Copyright (c) 2023 BALATON Zoltan + * + * This work is licensed under the GNU GPL license version 2 or later. + * + */ + +#include "qemu/osdep.h" +#include "qemu/units.h" +#include "qemu/datadir.h" +#include "qemu/log.h" +#include "qemu/error-report.h" +#include "qapi/error.h" +#include "hw/ppc/ppc.h" +#include "hw/boards.h" +#include "hw/loader.h" +#include "hw/pci-host/articia.h" +#include "hw/isa/vt82c686.h" +#include "hw/ide/pci.h" +#include "hw/i2c/smbus_eeprom.h" +#include "hw/ppc/ppc.h" +#include "sysemu/reset.h" +#include "kvm_ppc.h" + +#define BUS_FREQ_HZ 1 + +/* + * Firmware binary available at + * https://www.hyperion-entertainment.com/index.php/downloads?view=files=28 + * then "tail -c 524288 updater.image >u-boot-amigaone.bin" + * + * BIOS emulator in firmware cannot run QEMU vgabios and hangs on it, use + * -device VGA,romfile=VGABIOS-lgpl-latest.bin + * from http://www.nongnu.org/vgabios/ instead. + */ +#define PROM_FILENAME "u-boot-amigaone.bin" +#define PROM_ADDR 0xfff0 +#define PROM_SIZE (512 * KiB) + +static void amigaone_cpu_reset(void *opaque) +{ +PowerPCCPU *cpu = opaque; + +cpu_reset(CPU(cpu)); +cpu_ppc_tb_reset(>env); +} + +static void fix_spd_data(uint8_t *spd) +{ +uint32_t bank_size = 4 * MiB * spd[31]; +uint32_t rows = bank_size / spd[13] / spd[17]; +spd[3] = ctz32(rows) - spd[4]; +} + +static void amigaone_init(MachineState *machine) +{ +PowerPCCPU *cpu; +CPUPPCState *env; +MemoryRegion *rom, *pci_mem, *mr; +const char *fwname = machine->firmware ?: PROM_FILENAME; +char *filename; +ssize_t sz; +PCIBus *pci_bus; +Object *via; +DeviceState *dev; +I2CBus *i2c_bus; +uint8_t *spd_data; +int i; + +/* init CPU */ +cpu = POWERPC_CPU(cpu_create(machine->cpu_type)); +env = >env; +if (PPC_INPUT(env) != PPC_FLAGS_INPUT_6xx) { +error_report("Incompatible CPU, only 6xx bus supported"); +exit(1); +} +cpu_ppc_tb_init(env, BUS_FREQ_HZ / 4); +qemu_register_reset(amigaone_cpu_reset, cpu); + +/* RAM */ +if (machine->ram_size > 2 * GiB) { +error_report("RAM size more than 2 GiB is not supported"); +exit(1); +} +memory_region_add_subregion(get_system_memory(), 0, machine->ram); +if (machine->ram_size < 1 * GiB + 32 * KiB) { +/* Firmware uses this area for startup */ +mr =
[PATCH 0/3] Add emulation of AmigaOne XE board
This small series adds amigaone PPC machine which can be emulated mostly reusing existing models used by pegasos2 as these machines are very similar. The reason to add another board is that AmigaOS has different versions for different machines that only run on that machine and the AmigaOne version is more common than the PegasosII one. Also it's useful for debugging to be able to test with both machines as problems in emulation can be identified if they occur on both machines as opposed to problems specific only to one version. Since this only needs a very minimal north bridge emulation and a simple board code with everything else shared with pegasos2 it is not a big maintenance burden. The board uses a modofied U-Boot that is needed to boot AmigaOS which is freely available and distributable under GPL (see comment in hw/ppc/amigaone.c added in patch 3 for details on how to get firmware binary) but the sources for it could not be recovered so I could not reproduce it from source. Ideally this firmware should be included here for convenience which should be OK due to its GPL licence but since there's only a binary I did not include it here. Let me know if that would be OK to include but even without that firmware adding the machine is useful as users can get the rom binary separately which they are used to as pegasos2 also needed a firmware image to boot AmigaOS until recently. I hope this could be merged for 8.2 with this series being the minimum set of patches needed to add this machine. This was tested by a Few people already and can run AmigaOS and Linux distro for AmigaOne well. Regards, BALATON Zoltan BALATON Zoltan (3): via-ide: Fix legacy mode emulation hw/pci-host: Add emulation of Mai Logic Articia S hw/ppc: Add emulation of AmigaOne XE board MAINTAINERS | 8 + configs/devices/ppc-softmmu/default.mak | 1 + hw/ide/via.c| 35 +++- hw/pci-host/Kconfig | 5 + hw/pci-host/articia.c | 266 hw/pci-host/meson.build | 2 + hw/ppc/Kconfig | 7 + hw/ppc/amigaone.c | 164 +++ hw/ppc/meson.build | 2 + include/hw/pci-host/articia.h | 17 ++ 10 files changed, 502 insertions(+), 5 deletions(-) create mode 100644 hw/pci-host/articia.c create mode 100644 hw/ppc/amigaone.c create mode 100644 include/hw/pci-host/articia.h -- 2.30.9
[PATCH 2/3] hw/pci-host: Add emulation of Mai Logic Articia S
The Articia S is a generic chipset supporting several different CPUs that were used on some PPC boards. This is a minimal emulation of the parts needed for emulating the AmigaOne board. Signed-off-by: BALATON Zoltan --- hw/pci-host/Kconfig | 5 + hw/pci-host/articia.c | 266 ++ hw/pci-host/meson.build | 2 + include/hw/pci-host/articia.h | 17 +++ 4 files changed, 290 insertions(+) create mode 100644 hw/pci-host/articia.c create mode 100644 include/hw/pci-host/articia.h diff --git a/hw/pci-host/Kconfig b/hw/pci-host/Kconfig index a07070eddf..33014c80a4 100644 --- a/hw/pci-host/Kconfig +++ b/hw/pci-host/Kconfig @@ -73,6 +73,11 @@ config SH_PCI bool select PCI +config ARTICIA +bool +select PCI +select I8259 + config MV64361 bool select PCI diff --git a/hw/pci-host/articia.c b/hw/pci-host/articia.c new file mode 100644 index 00..80558e1c47 --- /dev/null +++ b/hw/pci-host/articia.c @@ -0,0 +1,266 @@ +/* + * Mai Logic Articia S emulation + * + * Copyright (c) 2023 BALATON Zoltan + * + * This work is licensed under the GNU GPL license version 2 or later. + * + */ + +#include "qemu/osdep.h" +#include "qemu/log.h" +#include "qapi/error.h" +#include "hw/pci/pci_device.h" +#include "hw/pci/pci_host.h" +#include "hw/irq.h" +#include "hw/i2c/bitbang_i2c.h" +#include "hw/intc/i8259.h" +#include "hw/pci-host/articia.h" + +OBJECT_DECLARE_SIMPLE_TYPE(ArticiaState, ARTICIA) + +OBJECT_DECLARE_SIMPLE_TYPE(ArticiaHostState, ARTICIA_PCI_HOST) +struct ArticiaHostState { +PCIDevice parent_obj; + +ArticiaState *as; +}; + +/* TYPE_ARTICIA */ + +struct ArticiaState { +PCIHostState parent_obj; + +qemu_irq irq[PCI_NUM_PINS]; +MemoryRegion io; +MemoryRegion mem; +MemoryRegion reg; + +bitbang_i2c_interface smbus; +uint32_t gpio; /* bits 0-7 in, 8-15 out, 16-23 direction (0 in, 1 out) */ +hwaddr gpio_base; +MemoryRegion gpio_reg; +}; + +static uint64_t articia_gpio_read(void *opaque, hwaddr addr, unsigned int size) +{ +ArticiaState *s = opaque; + +return (s->gpio >> (addr * 8)) & 0xff; +} + +static void articia_gpio_write(void *opaque, hwaddr addr, uint64_t val, + unsigned int size) +{ +ArticiaState *s = opaque; +uint32_t sh = addr * 8; + +if (addr == 0) { +/* in bits read only? */ +return; +} + +if ((s->gpio & (0xff << sh)) != (val & 0xff) << sh) { +s->gpio &= ~(0xff << sh | 0xff); +s->gpio |= (val & 0xff) << sh; +s->gpio |= bitbang_i2c_set(>smbus, BITBANG_I2C_SDA, + s->gpio & BIT(16) ? + !!(s->gpio & BIT(8)) : 1); +if ((s->gpio & BIT(17))) { +s->gpio &= ~BIT(0); +s->gpio |= bitbang_i2c_set(>smbus, BITBANG_I2C_SCL, + !!(s->gpio & BIT(9))); +} +} +} + +static const MemoryRegionOps articia_gpio_ops = { +.read = articia_gpio_read, +.write = articia_gpio_write, +.valid.min_access_size = 1, +.valid.max_access_size = 1, +.endianness = DEVICE_LITTLE_ENDIAN, +}; + +static uint64_t articia_reg_read(void *opaque, hwaddr addr, unsigned int size) +{ +ArticiaState *s = opaque; +uint64_t ret = UINT_MAX; + +switch (addr) { +case 0xc00cf8: +ret = pci_host_conf_le_ops.read(PCI_HOST_BRIDGE(s), 0, size); +break; +case 0xe00cfc ... 0xe00cff: +ret = pci_host_data_le_ops.read(PCI_HOST_BRIDGE(s), addr - 0xe00cfc, size); +break; +case 0xf0: +ret = pic_read_irq(isa_pic); +break; +default: +qemu_log_mask(LOG_UNIMP, "%s: Unimplemented register read 0x%" + HWADDR_PRIx " %d\n", __func__, addr, size); +break; +} +return ret; +} + +static void articia_reg_write(void *opaque, hwaddr addr, uint64_t val, + unsigned int size) +{ +ArticiaState *s = opaque; + +switch (addr) { +case 0xc00cf8: +pci_host_conf_le_ops.write(PCI_HOST_BRIDGE(s), 0, val, size); +break; +case 0xe00cfc ... 0xe00cff: +pci_host_data_le_ops.write(PCI_HOST_BRIDGE(s), addr, val, size); +break; +default: +qemu_log_mask(LOG_UNIMP, "%s: Unimplemented register write 0x%" + HWADDR_PRIx " %d <- %"PRIx64"\n", __func__, addr, size, val); +break; +} +} + +static const MemoryRegionOps articia_reg_ops = { +.read = articia_reg_read, +.write = articia_reg_write, +.valid.min_access_size = 1, +.valid.max_access_size = 4, +.endianness = DEVICE_LITTLE_ENDIAN, +}; + +static void articia_pcihost_set_irq(void *opaque, int n, int level) +{ +ArticiaState *s = opaque; +qemu_set_irq(s->irq[n], level); +} + +static void articia_realize(DeviceState *dev, Error **errp) +{ +ArticiaState *s = ARTICIA(dev); +PCIHostState *h =
[PATCH 1/3] via-ide: Fix legacy mode emulation
The initial value for BARs were set in reset method for emulating legacy mode at start but this does not work because PCI code resets BARs after calling device reset method. Additionally the values written to BARs were also wrong. Move setting the BARs to a callback on writing the PCI config regsiter that sets the compatibility mode (which firmwares needing this mode seem to do) and fix their values to program it to use legacy port numbers. As noted in a comment, we only do this when the BARs were unset before, because logs from real machine show this is how real chip works, even if it contradicts the data sheet which is not very clear about this. Signed-off-by: BALATON Zoltan --- hw/ide/via.c | 35 ++- 1 file changed, 30 insertions(+), 5 deletions(-) diff --git a/hw/ide/via.c b/hw/ide/via.c index fff23803a6..8186190207 100644 --- a/hw/ide/via.c +++ b/hw/ide/via.c @@ -132,11 +132,6 @@ static void via_ide_reset(DeviceState *dev) pci_set_word(pci_conf + PCI_STATUS, PCI_STATUS_FAST_BACK | PCI_STATUS_DEVSEL_MEDIUM); -pci_set_long(pci_conf + PCI_BASE_ADDRESS_0, 0x01f0); -pci_set_long(pci_conf + PCI_BASE_ADDRESS_1, 0x03f4); -pci_set_long(pci_conf + PCI_BASE_ADDRESS_2, 0x0170); -pci_set_long(pci_conf + PCI_BASE_ADDRESS_3, 0x0374); -pci_set_long(pci_conf + PCI_BASE_ADDRESS_4, 0xcc01); /* BMIBA: 20-23h */ pci_set_long(pci_conf + PCI_INTERRUPT_LINE, 0x010e); /* IDE chip enable, IDE configuration 1/2, IDE FIFO Configuration*/ @@ -159,6 +154,35 @@ static void via_ide_reset(DeviceState *dev) pci_set_long(pci_conf + 0xc0, 0x00020001); } +static void via_ide_cfg_write(PCIDevice *pd, uint32_t addr, + uint32_t val, int len) +{ +pci_default_write_config(pd, addr, val, len); +/* + * Only set BARs if they are unset. Logs from real hardware show that + * writing class_prog to enable compatibility mode after BARs were set + * (possibly by firmware) it will use ports set by BARs not ISA ports + * (e.g. pegasos2 Linux does this and calls it non-100% native mode). + * But if 0x8a is written after reset without setting BARs then we want + * legacy ports (this is done by AmigaOne firmware for example). + */ +if (addr == PCI_CLASS_PROG && val == 0x8a && +pci_get_long(pd->config + PCI_BASE_ADDRESS_0) == +PCI_BASE_ADDRESS_SPACE_IO) { +pci_set_long(pd->config + PCI_BASE_ADDRESS_0, 0x1f0 + | PCI_BASE_ADDRESS_SPACE_IO); +pci_set_long(pd->config + PCI_BASE_ADDRESS_1, 0x3f6 + | PCI_BASE_ADDRESS_SPACE_IO); +pci_set_long(pd->config + PCI_BASE_ADDRESS_2, 0x170 + | PCI_BASE_ADDRESS_SPACE_IO); +pci_set_long(pd->config + PCI_BASE_ADDRESS_3, 0x376 + | PCI_BASE_ADDRESS_SPACE_IO); +/* BMIBA: 20-23h */ +pci_set_long(pd->config + PCI_BASE_ADDRESS_4, 0xcc00 + | PCI_BASE_ADDRESS_SPACE_IO); +} +} + static void via_ide_realize(PCIDevice *dev, Error **errp) { PCIIDEState *d = PCI_IDE(dev); @@ -221,6 +245,7 @@ static void via_ide_class_init(ObjectClass *klass, void *data) /* Reason: only works as function of VIA southbridge */ dc->user_creatable = false; +k->config_write = via_ide_cfg_write; k->realize = via_ide_realize; k->exit = via_ide_exitfn; k->vendor_id = PCI_VENDOR_ID_VIA; -- 2.30.9
Re: [PATCH v3 10/10] tests/migration-test: Add a test for postcopy hangs during RECOVER
Peter Xu writes: > On Thu, Oct 05, 2023 at 06:10:20PM -0300, Fabiano Rosas wrote: >> Peter Xu writes: >> >> > On Thu, Oct 05, 2023 at 10:37:56AM -0300, Fabiano Rosas wrote: >> >> >> +/* >> >> >> + * Make sure both QEMU instances will go into RECOVER stage, then >> >> >> test >> >> >> + * kicking them out using migrate-pause. >> >> >> + */ >> >> >> +wait_for_postcopy_status(from, "postcopy-recover"); >> >> >> +wait_for_postcopy_status(to, "postcopy-recover"); >> >> > >> >> > Is this wait out of place? I think we're trying to resume too fast after >> >> > migrate_recover(): >> >> > >> >> > # { >> >> > # "error": { >> >> > # "class": "GenericError", >> >> > # "desc": "Cannot resume if there is no paused migration" >> >> > # } >> >> > # } >> >> > >> >> >> >> Ugh, sorry about the long lines: >> >> >> >> { >> >> "error": { >> >> "class": "GenericError", >> >> "desc": "Cannot resume if there is no paused migration" >> >> } >> >> } >> > >> > Sorry I didn't get you here. Could you elaborate your question? >> > >> >> The test is sometimes failing with the above message. >> >> But indeed my question doesn't make sense. I forgot migrate_recover >> happens on the destination. Nevermind. >> >> The bug is still present nonetheless. We're going into migrate_prepare >> in some state other than POSTCOPY_PAUSED. > > Oh I see. Interestingly I cannot reproduce on my host, just like last > time.. > > What is your setup for running the test? Anything special? Here's my > cmdline: The crudest oneliner: for i in $(seq 1 ); do echo "$i ="; \ QTEST_QEMU_BINARY=./qemu-system-x86_64 \ ./tests/qtest/migration-test -r /x86_64/migration/postcopy/recovery || break ; done I suspect my system has something specific to it that affects the timing of the tests. But I have no idea what it could be. $ lscpu Architecture:x86_64 CPU op-mode(s):32-bit, 64-bit Address sizes: 39 bits physical, 48 bits virtual Byte Order:Little Endian CPU(s): 16 On-line CPU(s) list: 0-15 Vendor ID: GenuineIntel Model name:11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz CPU family: 6 Model: 141 Thread(s) per core: 2 Core(s) per socket: 8 Socket(s): 1 Stepping:1 CPU max MHz: 4800. CPU min MHz: 800. BogoMIPS:4992.00 > > $ cat reproduce.sh > index=$1 > loop=0 > > while :; do > echo "Starting loop=$loop..." > QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test > -p /x86_64/migration/postcopy/recovery/double-failures > if [[ $? != 0 ]]; then > echo "index $index REPRODUCED (loop=$loop) !" > break > fi > loop=$(( loop + 1 )) > done > > Survives 200+ loops and kept going. > > However I think I saw what's wrong here, could you help try below fixup? > Sure. I won't get to it until tomorrow though.
Re: [PATCH v3 10/10] tests/migration-test: Add a test for postcopy hangs during RECOVER
On Thu, Oct 05, 2023 at 06:10:20PM -0300, Fabiano Rosas wrote: > Peter Xu writes: > > > On Thu, Oct 05, 2023 at 10:37:56AM -0300, Fabiano Rosas wrote: > >> >> +/* > >> >> + * Make sure both QEMU instances will go into RECOVER stage, then > >> >> test > >> >> + * kicking them out using migrate-pause. > >> >> + */ > >> >> +wait_for_postcopy_status(from, "postcopy-recover"); > >> >> +wait_for_postcopy_status(to, "postcopy-recover"); > >> > > >> > Is this wait out of place? I think we're trying to resume too fast after > >> > migrate_recover(): > >> > > >> > # { > >> > # "error": { > >> > # "class": "GenericError", > >> > # "desc": "Cannot resume if there is no paused migration" > >> > # } > >> > # } > >> > > >> > >> Ugh, sorry about the long lines: > >> > >> { > >> "error": { > >> "class": "GenericError", > >> "desc": "Cannot resume if there is no paused migration" > >> } > >> } > > > > Sorry I didn't get you here. Could you elaborate your question? > > > > The test is sometimes failing with the above message. > > But indeed my question doesn't make sense. I forgot migrate_recover > happens on the destination. Nevermind. > > The bug is still present nonetheless. We're going into migrate_prepare > in some state other than POSTCOPY_PAUSED. Oh I see. Interestingly I cannot reproduce on my host, just like last time.. What is your setup for running the test? Anything special? Here's my cmdline: $ cat reproduce.sh index=$1 loop=0 while :; do echo "Starting loop=$loop..." QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test -p /x86_64/migration/postcopy/recovery/double-failures if [[ $? != 0 ]]; then echo "index $index REPRODUCED (loop=$loop) !" break fi loop=$(( loop + 1 )) done Survives 200+ loops and kept going. However I think I saw what's wrong here, could you help try below fixup? Thanks, ===8<=== >From 52bd2cd5ddf472e0bb99789dba3660a626382630 Mon Sep 17 00:00:00 2001 From: Peter Xu Date: Thu, 5 Oct 2023 17:38:42 -0400 Subject: [PATCH] fixup! tests/migration-test: Add a test for postcopy hangs during RECOVER Signed-off-by: Peter Xu --- tests/qtest/migration-test.c | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index fb7a3765e4..1bdae0a579 100644 --- a/tests/qtest/migration-test.c +++ b/tests/qtest/migration-test.c @@ -1489,9 +1489,8 @@ static void test_postcopy_recovery_common(MigrateCommon *args) * migrate-recover command can only succeed if destination machine * is in the paused state */ -wait_for_migration_status(to, "postcopy-paused", - (const char * []) { "failed", "active", - "completed", NULL }); +wait_for_postcopy_status(to, "postcopy-paused"); +wait_for_postcopy_status(from, "postcopy-paused"); if (args->postcopy_recovery_test_fail) { /* @@ -1514,9 +1513,6 @@ static void test_postcopy_recovery_common(MigrateCommon *args) * Try to rebuild the migration channel using the resume flag and * the newly created channel */ -wait_for_migration_status(from, "postcopy-paused", - (const char * []) { "failed", "active", - "completed", NULL }); migrate_qmp(from, uri, "{'resume': true}"); /* Restore the postcopy bandwidth to unlimited */ -- 2.41.0 -- Peter Xu
Re: [PATCH] qtest/migration: Add a test for the analyze-migration script
Peter Xu writes: > On Wed, Sep 27, 2023 at 06:47:56PM -0300, Fabiano Rosas wrote: >> I know this adds a python dependency to qtests and I'm not sure how >> much we care about this script, but on the other hand it would be nice >> to catch these errors early on. >> >> This would also help with future work that touches the migration >> stream (moving multifd out of ram.c and fixed-ram). >> >> Let me know what you think. > > The test is good, but I think it'll be definitely less burden and cleaner > if it can be written just in shell scripts.. that can even be put in a > single line.. > > $ (echo "migrate exec:cat>$IMAGE"; echo "quit") | $QEMU_BIN -monitor stdio; > $DIR/scripts/analyze-migration.py -f $IMAGE > > Any chance to hook that in? I tried but it's way worse. We need to add the script to meson, pass the Python and QEMU binaries in, make it be invoked for each architecture, but skip s390x and ppc when using TCG only. Then we need to add code to map target vs. machine type because arm has no default machine. We also need to produce TAP output for meson. And the migration script outputs a ton of lines so Python barfs due to EAGAIN so we need to handle that as well. All of that is already there in migration-test. I'll send a v2 of this patch with the improvements Thomas suggested and try to dig out a fix for the script on ppc64 I made about 3 years ago that seemed to have not been merged.
[PATCH v2] misc/pca9552: Fix for pca9552 not getting reset
Testing of the pca9552 device on the powernv platform showed that the reset method was not being called when an instance of the device was realized. This was causing the INPUT0/INPUT1 POR values to be incorrect. Fixed by overriding the parent pca955x_realize method with a new pca9552_realize method which first calls the parent pca955x_realize method followed by the pca9552_reset function. Signed-off-by: Glenn Miles --- hw/misc/pca9552.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/hw/misc/pca9552.c b/hw/misc/pca9552.c index fff19e369a..bc12dced7f 100644 --- a/hw/misc/pca9552.c +++ b/hw/misc/pca9552.c @@ -384,6 +384,12 @@ static void pca955x_realize(DeviceState *dev, Error **errp) qdev_init_gpio_out(dev, s->gpio, k->pin_count); } +static void pca9552_realize(DeviceState *dev, Error **errp) +{ +pca955x_realize(dev, errp); +pca9552_reset(dev); +} + static Property pca955x_properties[] = { DEFINE_PROP_STRING("description", PCA955xState, description), DEFINE_PROP_END_OF_LIST(), @@ -416,7 +422,7 @@ static void pca9552_class_init(ObjectClass *oc, void *data) DeviceClass *dc = DEVICE_CLASS(oc); PCA955xClass *pc = PCA955X_CLASS(oc); -dc->reset = pca9552_reset; +dc->realize = pca9552_realize; dc->vmsd = _vmstate; pc->max_reg = PCA9552_LS3; pc->pin_count = 16; -- 2.31.1
Re: [PATCH v3 10/10] tests/migration-test: Add a test for postcopy hangs during RECOVER
Peter Xu writes: > On Thu, Oct 05, 2023 at 10:37:56AM -0300, Fabiano Rosas wrote: >> >> +/* >> >> + * Make sure both QEMU instances will go into RECOVER stage, then >> >> test >> >> + * kicking them out using migrate-pause. >> >> + */ >> >> +wait_for_postcopy_status(from, "postcopy-recover"); >> >> +wait_for_postcopy_status(to, "postcopy-recover"); >> > >> > Is this wait out of place? I think we're trying to resume too fast after >> > migrate_recover(): >> > >> > # { >> > # "error": { >> > # "class": "GenericError", >> > # "desc": "Cannot resume if there is no paused migration" >> > # } >> > # } >> > >> >> Ugh, sorry about the long lines: >> >> { >> "error": { >> "class": "GenericError", >> "desc": "Cannot resume if there is no paused migration" >> } >> } > > Sorry I didn't get you here. Could you elaborate your question? > The test is sometimes failing with the above message. But indeed my question doesn't make sense. I forgot migrate_recover happens on the destination. Nevermind. The bug is still present nonetheless. We're going into migrate_prepare in some state other than POSTCOPY_PAUSED.
Re: [PATCH v3 10/10] tests/migration-test: Add a test for postcopy hangs during RECOVER
On Thu, Oct 05, 2023 at 10:37:56AM -0300, Fabiano Rosas wrote: > >> +/* > >> + * Make sure both QEMU instances will go into RECOVER stage, then test > >> + * kicking them out using migrate-pause. > >> + */ > >> +wait_for_postcopy_status(from, "postcopy-recover"); > >> +wait_for_postcopy_status(to, "postcopy-recover"); > > > > Is this wait out of place? I think we're trying to resume too fast after > > migrate_recover(): > > > > # { > > # "error": { > > # "class": "GenericError", > > # "desc": "Cannot resume if there is no paused migration" > > # } > > # } > > > > Ugh, sorry about the long lines: > > { > "error": { > "class": "GenericError", > "desc": "Cannot resume if there is no paused migration" > } > } Sorry I didn't get you here. Could you elaborate your question? Here we wait on both sides and make sure they'll all be in RECOVER stage, and we should know that they won't proceed further because the pipes contain mostly nothing so they'll just block at the pipes. What did I miss? Thanks, -- Peter Xu
Re: [PATCH v3 07/10] migration: Add migration_rp_wait|kick()
On Thu, Oct 05, 2023 at 09:49:25AM +0200, Juan Quintela wrote: > Peter Xu wrote: > > It's just a simple wrapper for rp_sem on either wait() or kick(), make it > > even clearer on how it is used. Prepared to be used even for other things. > > > > Reviewed-by: Fabiano Rosas > > Signed-off-by: Peter Xu > > Reviewed-by: Juan Quintela > > I agree with the idea, but I think that the problem is the name of the > semaphore. > > > +void migration_rp_wait(MigrationState *s) > > +{ > > +qemu_sem_wait(>rp_state.rp_sem); > > I am not sure if it would be better to have the wrappers or just rename > > If we rename the remaphore to migration_thread, this becomes: > > qemu_sem_wait(>rp_state.return_path_ready); > > qemu_sem_post(>rp_state.return_path_ready); > > Or something similar? I'd prefer keeping a pair of helpers, but I'm open to other suggestions, e.g. I can rename the sem at the same time, or have a better name just for the helpers. Thanks, -- Peter Xu
[PATCH] misc/pca9552: Fix for pca9552 not getting reset
Testing of the pca9552 device on the powernv platform showed that the reset method was not being called when an instance of the device was realized. This was causing the INPUT0/INPUT1 POR values to be incorrect. Fixed by calling pca9552_reset from within the pca9552_realize method. Signed-off-by: Glenn Miles --- hw/misc/pca9552.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/hw/misc/pca9552.c b/hw/misc/pca9552.c index f28b5ecd7e..4e198af137 100644 --- a/hw/misc/pca9552.c +++ b/hw/misc/pca9552.c @@ -418,6 +418,12 @@ static void pca955x_realize(DeviceState *dev, Error **errp) qdev_init_gpio_in(dev, pca955x_gpio_in_handler, k->pin_count); } +static void pca9552_realize(DeviceState *dev, Error **errp) +{ +pca955x_realize(dev, errp); +pca9552_reset(dev); +} + static Property pca955x_properties[] = { DEFINE_PROP_STRING("description", PCA955xState, description), DEFINE_PROP_END_OF_LIST(), @@ -450,7 +456,7 @@ static void pca9552_class_init(ObjectClass *oc, void *data) DeviceClass *dc = DEVICE_CLASS(oc); PCA955xClass *pc = PCA955X_CLASS(oc); -dc->reset = pca9552_reset; +dc->realize = pca9552_realize; dc->vmsd = _vmstate; pc->max_reg = PCA9552_LS3; pc->pin_count = 16; -- 2.31.1
[PATCH v2] misc/pca9552: Let external devices set pca9552 inputs
Allow external devices to drive pca9552 input pins by adding input GPIO's to the model. This allows a device to connect its output GPIO's to the pca9552 input GPIO's. In order for an external device to set the state of a pca9552 pin, the pin must first be configured for high impedance (LED is off). If the pca9552 pin is configured to drive the pin low (LED is on), then external input will be ignored. Signed-off-by: Glenn Miles --- Based-on: <20230927203221.3286895-1-mil...@linux.vnet.ibm.com> ([PATCH] misc/pca9552: Fix inverted input status) hw/misc/pca9552.c | 39 ++- include/hw/misc/pca9552.h | 3 ++- 2 files changed, 36 insertions(+), 6 deletions(-) diff --git a/hw/misc/pca9552.c b/hw/misc/pca9552.c index ad811fb249..f28b5ecd7e 100644 --- a/hw/misc/pca9552.c +++ b/hw/misc/pca9552.c @@ -113,16 +113,22 @@ static void pca955x_update_pin_input(PCA955xState *s) switch (config) { case PCA9552_LED_ON: /* Pin is set to 0V to turn on LED */ -qemu_set_irq(s->gpio[i], 0); +qemu_set_irq(s->gpio_out[i], 0); s->regs[input_reg] &= ~(1 << input_shift); break; case PCA9552_LED_OFF: /* * Pin is set to Hi-Z to turn off LED and - * pullup sets it to a logical 1. + * pullup sets it to a logical 1 unless + * external device drives it low. */ -qemu_set_irq(s->gpio[i], 1); -s->regs[input_reg] |= 1 << input_shift; +if (s->ext_state[i] == 0) { +qemu_set_irq(s->gpio_out[i], 0); +s->regs[input_reg] &= ~(1 << input_shift); +} else { +qemu_set_irq(s->gpio_out[i], 1); +s->regs[input_reg] |= 1 << input_shift; +} break; case PCA9552_LED_PWM0: case PCA9552_LED_PWM1: @@ -337,6 +343,7 @@ static const VMStateDescription pca9552_vmstate = { VMSTATE_UINT8(len, PCA955xState), VMSTATE_UINT8(pointer, PCA955xState), VMSTATE_UINT8_ARRAY(regs, PCA955xState, PCA955X_NR_REGS), +VMSTATE_UINT8_ARRAY(ext_state, PCA955xState, PCA955X_PIN_COUNT_MAX), VMSTATE_I2C_SLAVE(i2c, PCA955xState), VMSTATE_END_OF_LIST() } @@ -355,6 +362,7 @@ static void pca9552_reset(DeviceState *dev) s->regs[PCA9552_LS2] = 0x55; s->regs[PCA9552_LS3] = 0x55; +memset(s->ext_state, 1, PCA955X_PIN_COUNT_MAX); pca955x_update_pin_input(s); s->pointer = 0xFF; @@ -377,6 +385,26 @@ static void pca955x_initfn(Object *obj) } } +static void pca955x_set_ext_state(PCA955xState *s, int pin, int level) +{ +if (s->ext_state[pin] != level) { +uint16_t pins_status = pca955x_pins_get_status(s); +s->ext_state[pin] = level; +pca955x_update_pin_input(s); +pca955x_display_pins_status(s, pins_status); +} +} + +static void pca955x_gpio_in_handler(void *opaque, int pin, int level) +{ + +PCA955xState *s = PCA955X(opaque); +PCA955xClass *k = PCA955X_GET_CLASS(s); + +assert((pin >= 0) && (pin < k->pin_count)); +pca955x_set_ext_state(s, pin, level); +} + static void pca955x_realize(DeviceState *dev, Error **errp) { PCA955xClass *k = PCA955X_GET_CLASS(dev); @@ -386,7 +414,8 @@ static void pca955x_realize(DeviceState *dev, Error **errp) s->description = g_strdup("pca-unspecified"); } -qdev_init_gpio_out(dev, s->gpio, k->pin_count); +qdev_init_gpio_out(dev, s->gpio_out, k->pin_count); +qdev_init_gpio_in(dev, pca955x_gpio_in_handler, k->pin_count); } static Property pca955x_properties[] = { diff --git a/include/hw/misc/pca9552.h b/include/hw/misc/pca9552.h index b6f4e264fe..c36525f0c3 100644 --- a/include/hw/misc/pca9552.h +++ b/include/hw/misc/pca9552.h @@ -30,7 +30,8 @@ struct PCA955xState { uint8_t pointer; uint8_t regs[PCA955X_NR_REGS]; -qemu_irq gpio[PCA955X_PIN_COUNT_MAX]; +qemu_irq gpio_out[PCA955X_PIN_COUNT_MAX]; +uint8_t ext_state[PCA955X_PIN_COUNT_MAX]; char *description; /* For debugging purpose only */ }; -- 2.31.1
Re: [PATCH 0/3] hvf x86 correctness and efficiency improvements
Ping - let me know if there's anything particularly controversial, unclear, etc. about these patches or if I can do anything to make reviewing easier. Thanks! On Fri, 22 Sept 2023 at 16:09, Phil Dennis-Jordan wrote: > > This is a series of semi-related patches for the x86 macOS > Hypervisor.framework > (hvf) accelerator backend. The intention is to make VMs run slightly more > efficiently on macOS host machines. They have been subject to some months of > CI workloads with macOS guest VMs without issues and they seem to give a few > percent performance improvement. (Though this varies greatly with the type of > workload.) > > Patch 1 enables the INVTSC CPUID bit when running with hvf. This can enable > some optimisations in the guest OS, and I've not found any reason it shouldn't > be allowed for hvf based hosts. > > Patch 2 fixes hvf_kick_vcpu_thread so it actually forces a VM exit instead of > doing nothing. I guess this previously didn't cause any huge issues because > hvf's hv_vcpu_run() would exit so extremely frequently on its own accord. The > temp variable is needed because the pointer expected by the > hv_vcpu_interrupt() > call doesn't match the fd field's type in the hvf accel's struct > AccelCPUState. > I'm unsure if it would be better to change that struct field to the relevant > architecture's handle types, hv_vcpuid_t (x86, unsigned int) and hv_vcpu_t > (aarch64, uint64_t), perhaps via an intermediate typedef? > > Patch 3, which replaces the call to hv_vcpu_run() with the more modern > hv_vcpu_run_until() for running the guest vCPU. The newer API is available > from macOS 10.15 host systems onwards. This call causes significantly fewer > VM exits, which also means we really need that exit-forcing interrupt from > patch 2. The reduction in VM exits means less overhead from exits and less > contention on the BQL. Using hv_vcpu_run_until() is also a prerequisite for > using certain newer hvf features, though this patchset doesn't use any. > > Patches 2 & 3 must therefore be applied in that order, patch 1 is independent. > > This work has been sponsored by Sauce Labs Inc. > > Phil Dennis-Jordan (3): > i386: hvf: Adds support for INVTSC cpuid bit > i386: hvf: In kick_vcpu use hv_vcpu_interrupt to force exit > i386: hvf: Updates API usage to use modern vCPU run function > > target/i386/hvf/hvf.c | 26 +- > target/i386/hvf/x86_cpuid.c | 4 > 2 files changed, 29 insertions(+), 1 deletion(-) > > -- > 2.36.1 >
Re: [PATCH v6 14/14] python: use vm.cmd() instead of vm.qmp() where appropriate
On Thu, Oct 05, 2023 at 04:55:50PM +0300, Vladimir Sementsov-Ogievskiy wrote: > In many cases we just want an effect of qmp command and want to raise on > failure. Use vm.cmd() method which does exactly this. > > Signed-off-by: Vladimir Sementsov-Ogievskiy > --- > tests/avocado/vnc.py | 16 +- > tests/qemu-iotests/030| 168 +++--- > tests/qemu-iotests/040| 172 +++ > tests/qemu-iotests/041| 483 -- > tests/qemu-iotests/045| 15 +- > tests/qemu-iotests/055| 62 +-- > tests/qemu-iotests/056| 77 ++- > tests/qemu-iotests/093| 42 +- > tests/qemu-iotests/118| 225 > tests/qemu-iotests/124| 102 ++-- > tests/qemu-iotests/129| 14 +- > tests/qemu-iotests/132| 5 +- > tests/qemu-iotests/139| 45 +- > tests/qemu-iotests/147| 31 +- > tests/qemu-iotests/151| 57 +-- > tests/qemu-iotests/152| 10 +- > tests/qemu-iotests/155| 55 +- > tests/qemu-iotests/165| 8 +- > tests/qemu-iotests/196| 3 +- > tests/qemu-iotests/205| 6 +- > tests/qemu-iotests/218| 105 ++-- > tests/qemu-iotests/245| 245 - > tests/qemu-iotests/264| 31 +- > tests/qemu-iotests/281| 21 +- > tests/qemu-iotests/295| 15 +- > tests/qemu-iotests/296| 15 +- > tests/qemu-iotests/298| 13 +- > tests/qemu-iotests/300| 54 +- > tests/qemu-iotests/iotests.py | 9 +- > .../tests/export-incoming-iothread| 6 +- > .../qemu-iotests/tests/graph-changes-while-io | 6 +- > tests/qemu-iotests/tests/image-fleecing | 3 +- > .../tests/migrate-bitmaps-postcopy-test | 31 +- > tests/qemu-iotests/tests/migrate-bitmaps-test | 47 +- > .../qemu-iotests/tests/migrate-during-backup | 41 +- > .../qemu-iotests/tests/migration-permissions | 9 +- > .../tests/mirror-ready-cancel-error | 74 ++- > tests/qemu-iotests/tests/mirror-top-perms | 16 +- > tests/qemu-iotests/tests/nbd-multiconn| 12 +- > tests/qemu-iotests/tests/reopen-file | 3 +- > .../qemu-iotests/tests/stream-error-on-reset | 6 +- > 41 files changed, 951 insertions(+), 1407 deletions(-) Big but mechanical. It would be worth amending the commit message to describe how you found all these spots (in case someone backporting this patch has to redo the work over a different subset of files based on what has changed since the two trees diverged). > > diff --git a/tests/avocado/vnc.py b/tests/avocado/vnc.py > index aeeefc70be..862c8996a8 100644 > --- a/tests/avocado/vnc.py > +++ b/tests/avocado/vnc.py > @@ -88,9 +88,8 @@ def test_change_password(self): > self.vm.add_args('-nodefaults', '-S', '-vnc', ':0,password=on') > self.vm.launch() > self.assertTrue(self.vm.qmp('query-vnc')['return']['enabled']) > -set_password_response = self.vm.qmp('change-vnc-password', > -password='new_password') > -self.assertEqual(set_password_response['return'], {}) > +self.vm.cmd('change-vnc-password', > +password='new_password') Indeed a nicer idiom, where you are able to use it (whether by self.assertEqual or by self.assert_qmp). Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org
Re: [PATCH v6 13/14] tests/vm/basevm.py: use cmd() instead of qmp()
On Thu, Oct 05, 2023 at 04:55:49PM +0300, Vladimir Sementsov-Ogievskiy wrote: > We don't expect failure here and need 'result' object. cmd() is better > in this case. > > Signed-off-by: Vladimir Sementsov-Ogievskiy > --- > tests/vm/basevm.py | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/tests/vm/basevm.py b/tests/vm/basevm.py > index a97e23b0ce..8aef4cff96 100644 > --- a/tests/vm/basevm.py > +++ b/tests/vm/basevm.py > @@ -312,8 +312,8 @@ def boot(self, img, extra_args=[]): > self._guest = guest > # Init console so we can start consuming the chars. > self.console_init() > -usernet_info = guest.qmp("human-monitor-command", > - command_line="info usernet").get("return") > +usernet_info = guest.cmd("human-monitor-command", > + command_line="info usernet") > self.ssh_port = get_info_usernet_hostfwd_port(usernet_info) Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org
Re: [PATCH v6 12/14] iotests.py: pause_job(): drop return value
On Thu, Oct 05, 2023 at 04:55:48PM +0300, Vladimir Sementsov-Ogievskiy wrote: > The returned value is unused. It's simple to check by command > > git grep -B 3 '\.pause_job(' > > Signed-off-by: Vladimir Sementsov-Ogievskiy > --- > tests/qemu-iotests/iotests.py | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org
Re: [PATCH v6 11/14] iotests: drop some extra ** in qmp() call
On Thu, Oct 05, 2023 at 04:55:47PM +0300, Vladimir Sementsov-Ogievskiy wrote: > qmp() method supports passing dict (if it doesn't need substituting > '_' to '-' in keys). So, drop some extra '**' operators. > > Signed-off-by: Vladimir Sementsov-Ogievskiy > --- > tests/qemu-iotests/040| 4 +- > tests/qemu-iotests/041| 14 +++--- > tests/qemu-iotests/129| 2 +- > tests/qemu-iotests/147| 2 +- > tests/qemu-iotests/155| 2 +- > tests/qemu-iotests/264| 12 ++--- > tests/qemu-iotests/295| 5 +- > tests/qemu-iotests/296| 15 +++--- > tests/qemu-iotests/tests/migrate-bitmaps-test | 4 +- > .../tests/mirror-ready-cancel-error | 50 +-- > 10 files changed, 54 insertions(+), 56 deletions(-) > Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org
Re: [PATCH v6 10/14] iotests: drop some occasional semicolons
On Thu, Oct 05, 2023 at 04:55:46PM +0300, Vladimir Sementsov-Ogievskiy wrote: > Signed-off-by: Vladimir Sementsov-Ogievskiy > --- > tests/qemu-iotests/041 | 2 +- > tests/qemu-iotests/196 | 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) Maybe in the subject s/occasional/extra/ Reviewed-by: Eric Blake > > diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041 > index 4d7a829b65..550e4dc391 100755 > --- a/tests/qemu-iotests/041 > +++ b/tests/qemu-iotests/041 > @@ -1086,7 +1086,7 @@ class TestRepairQuorum(iotests.QMPTestCase): > def test_after_a_quorum_snapshot(self): > result = self.vm.qmp('blockdev-snapshot-sync', node_name='img1', > snapshot_file=quorum_snapshot_file, > - snapshot_node_name="snap1"); > + snapshot_node_name="snap1") > self.assert_qmp(result, 'return', {}) I'm a bit surprised the linters don't pick up on this. -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org
Re: [PATCH v6 09/14] iotests: refactor some common qmp result checks into generic pattern
On Thu, Oct 05, 2023 at 04:55:45PM +0300, Vladimir Sementsov-Ogievskiy wrote: > To simplify further conversion. > > Signed-off-by: Vladimir Sementsov-Ogievskiy > --- > tests/qemu-iotests/040 | 3 ++- > tests/qemu-iotests/147 | 3 ++- > tests/qemu-iotests/155 | 4 ++-- > tests/qemu-iotests/218 | 4 ++-- > tests/qemu-iotests/296 | 3 ++- > 5 files changed, 10 insertions(+), 7 deletions(-) > Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org
Re: [PATCH v6 08/14] iotests: add some missed checks of qmp result
On Thu, Oct 05, 2023 at 04:55:44PM +0300, Vladimir Sementsov-Ogievskiy wrote: > Signed-off-by: Vladimir Sementsov-Ogievskiy > --- > tests/qemu-iotests/041| 1 + > tests/qemu-iotests/151| 1 + > tests/qemu-iotests/152| 2 ++ > tests/qemu-iotests/tests/migrate-bitmaps-test | 2 ++ > 4 files changed, 6 insertions(+) > Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org
Re: [PATCH v2 03/21] preallocate: Don't poll during permission updates
On 11.09.23 12:46, Kevin Wolf wrote: When the permission related BlockDriver callbacks are called, we are in the middle of an operation traversing the block graph. Polling in such a place is a very bad idea because the graph could change in unexpected ways. In the future, callers will also hold the graph lock, which is likely to turn polling into a deadlock. So we need to get rid of calls to functions like bdrv_getlength() or bdrv_truncate() there as these functions poll internally. They are currently used so that when no parent has write/resize permissions on the image any more, the preallocate filter drops the extra preallocated area in the image file and gives up write/resize permissions itself. In order to achieve this without polling in .bdrv_check_perm, don't immediately truncate the image, but only schedule a BH to do so. The filter keeps the write/resize permissions a bit longer now until the BH has executed. There is one case in which delaying doesn't work: Reopening the image read-only. In this case, bs->file will likely be reopened read-only, too, so keeping write permissions a bit longer on it doesn't work. But we can already cover this case in preallocate_reopen_prepare() and not rely on the permission updates for it. Hmm, now I found one more "future" case. I now try to rebase my "[PATCH v7 0/7] blockdev-replace" https://patchew.org/QEMU/20230421114102.884457-1-vsement...@yandex-team.ru/ And it breaks after this commit. By accident, blockdev-replace series uses exactly "preallocate" filter to test insertion/removing of filters. And removing is broken now. Removing is done as follows: 1. We have filter inserted: disk0 -file-> filter -file-> file0 2. blockdev-replace, replaces file child of disk0, so we should get the picture*: disk0 -file-> file0 <-file- filter 3. blockdev-del filter But step [2] fails, as now preallocate filter doesn't drop permissions during the operation (postponing this for a while) and the picture* is impossible. Permission check fails. Hmmm... Any idea how blockdev-replace and preallocate filter should work :) ? Maybe, doing truncation in .drain_begin() will help? Will try Signed-off-by: Kevin Wolf Reviewed-by: Eric Blake Reviewed-by: Stefan Hajnoczi --- -- Best regards, Vladimir
Re: [PATCH v6 07/14] iotests: QemuStorageDaemon: add cmd() method like in QEMUMachine.
On Thu, Oct 05, 2023 at 04:55:43PM +0300, Vladimir Sementsov-Ogievskiy wrote: > Add similar method for consistency. > > Signed-off-by: Vladimir Sementsov-Ogievskiy > --- > tests/qemu-iotests/iotests.py | 7 ++- > 1 file changed, 6 insertions(+), 1 deletion(-) > Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org
Re: [PATCH v6 06/14] python/machine.py: upgrade vm.cmd() method
On Thu, Oct 05, 2023 at 04:55:42PM +0300, Vladimir Sementsov-Ogievskiy wrote: > The method is not popular in iotests, we prefer use vm.qmp() and then > check success by hand.. But that's not optimal. To simplify movement to extra '.' > vm.cmd() let's support same interface improvements like in vm.qmp(). > > Signed-off-by: Vladimir Sementsov-Ogievskiy > --- > python/qemu/machine/machine.py | 12 +++- > 1 file changed, 11 insertions(+), 1 deletion(-) Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org
Re: [PATCH v3 03/10] migration: Refactor error handling in source return path
On Thu, Oct 05, 2023 at 10:22:52AM +0200, Juan Quintela wrote: > Peter Xu wrote: > > rp_state.error was a boolean used to show error happened in return path > > thread. That's not only duplicating error reporting (migrate_set_error), > > but also not good enough in that we only do error_report() and set it to > > true, we never can keep a history of the exact error and show it in > > query-migrate. > > > > To make this better, a few things done: > > > > - Use error_setg() rather than error_report() across the whole lifecycle > > of return path thread, keeping the error in an Error*. > > Good. > > > - Use migrate_set_error() to apply that captured error to the global > > migration object when error occured in this thread. > > Good. > > > - With above, no need to have mark_source_rp_bad(), remove it, alongside > > with rp_state.error itself. > > Good. > > > uint64_t ram_pagesize_summary(void); > > -int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t > > len); > > +int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t > > len, > > + Error **errp); > > > good. > > > @@ -1793,37 +1782,36 @@ static void > > migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname, > > */ > > if (!QEMU_IS_ALIGNED(start, our_host_ps) || > > !QEMU_IS_ALIGNED(len, our_host_ps)) { > > -error_report("%s: Misaligned page request, start: " RAM_ADDR_FMT > > - " len: %zd", __func__, start, len); > > -mark_source_rp_bad(ms); > > +error_setg(errp, "MIG_RP_MSG_REQ_PAGES: Misaligned page request, > > start:" > > + RAM_ADDR_FMT " len: %zd", start, len); > > return; > > } > > > > -if (ram_save_queue_pages(rbname, start, len)) { > > -mark_source_rp_bad(ms); > > -} > > +ram_save_queue_pages(rbname, start, len, errp); > > ram_save_queue_pages() returns an int. > I think this function should return an int. Phil suggested something similar for the other patch, instead of also let this function return int, I'll add one more patch to let it return boolean to show whether there's an error, keeping the real error in errp. > > Next is independent of this patch: > > > -static int migrate_handle_rp_recv_bitmap(MigrationState *s, char > > *block_name) > > +static int migrate_handle_rp_recv_bitmap(MigrationState *s, char > > *block_name, > > + Error **errp) > > { > > RAMBlock *block = qemu_ram_block_by_name(block_name); > > > > if (!block) { > > -error_report("%s: invalid block name '%s'", __func__, block_name); > > +error_setg(errp, "MIG_RP_MSG_RECV_BITMAP has invalid block name > > '%s'", > > + block_name); > > return -EINVAL; > > We sent -EINVAL. > > > } > > > > /* Fetch the received bitmap and refresh the dirty bitmap */ > > -return ram_dirty_bitmap_reload(s, block); > > +return ram_dirty_bitmap_reload(s, block, errp); > > } > > > > -static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value) > > +static int migrate_handle_rp_resume_ack(MigrationState *s, > > +uint32_t value, Error **errp) > > { > > trace_source_return_path_thread_resume_ack(value); > > > > if (value != MIGRATION_RESUME_ACK_VALUE) { > > -error_report("%s: illegal resume_ack value %"PRIu32, > > - __func__, value); > > +error_setg(errp, "illegal resume_ack value %"PRIu32, value); > > return -1; > > And here -1. > > On both callers we just check if it is different from zero. We never > use the return value as errno, so I think we should move to -1, if there > is an error, that is what errp is for. Right. I'll switch all rp-return thread paths to use boolean as return, as long as there's errp. > > > > -/* Returns 0 if the RP was ok, otherwise there was an error on the RP */ > > -static int await_return_path_close_on_source(MigrationState *ms) > > +static void await_return_path_close_on_source(MigrationState *ms) > > { > > -int ret; > > - > > if (!ms->rp_state.rp_thread_created) { > > -return 0; > > +return; > > } > > > > trace_migration_return_path_end_before(); > > @@ -2060,18 +2050,10 @@ static int > > await_return_path_close_on_source(MigrationState *ms) > > } > > } > > > > -trace_await_return_path_close_on_source_joining(); > > qemu_thread_join(>rp_state.rp_thread); > > ms->rp_state.rp_thread_created = false; > > -trace_await_return_path_close_on_source_close(); > > - > > -ret = ms->rp_state.error; > > -ms->rp_state.error = false; > > - > > migration_release_dst_files(ms); > > - > > -trace_migration_return_path_end_after(ret); > > -return ret; > > +trace_migration_return_path_end_after(); > > } > > > > static inline
Re: [PATCH v3 03/10] migration: Refactor error handling in source return path
On Thu, Oct 05, 2023 at 09:57:58AM -0300, Fabiano Rosas wrote: > > @@ -2008,9 +1996,14 @@ static void *source_return_path_thread(void *opaque) > > } > > > > out: > > -if (qemu_file_get_error(rp)) { > > +if (err) { > > Need to keep both checks here. The next patch did that. Let me squash that into this.. Thanks, -- Peter Xu
Re: [PATCH v8 2/2] tpm: add backend for mssim
On Thu, 2023-10-05 at 18:11 +0200, Philippe Mathieu-Daudé wrote: > On 5/10/23 15:57, James Bottomley wrote: > > On Thu, 2023-10-05 at 08:49 +0200, Philippe Mathieu-Daudé wrote: > > > On 4/10/23 20:42, James Bottomley wrote: > > > > From: James Bottomley [...] > > > > +.. code-block:: console > > > > + > > > > + qemu-system-x86_64 \ > > > > + -tpmdev > > > > "{'type':'mssim','id':'tpm0','command':{'type':'inet','host':'r > > > > emot > > > > e','port':'2321'},'control':{'type':'inet','host':'remote','por > > > > t':' > > > > 2322'}}" \ > > > > + -device tpm-crb,tpmdev=tpm0 > > > > > > Did you test running this command line on a big-endian host? > > > > Well no, big endian machines are rather rare nowadays. However, > > since the QIOChannelSocket abstraction is based on SocketAddress, > > which is a qapi wrapper around strings, what makes you think the > > endianness would matter? > > You use ntoh/hton in tpm_mssim_handle_request(), so I wonder about > the 'uint32_t cmd' in tpm_send_ctrl(). tpm_send_ctrl has a htonl for the send control command as well (The TPM server is always network ordered, i.e. big endian). The reason it doesn't have one for the receive is that it only checks against zero which is endian invariant, if that's what you're asking? James
Re: [PATCH] MAINTANERS: Split vt82c686 out of fuloong2e
On Thu, 5 Oct 2023, Philippe Mathieu-Daudé wrote: On 5/10/23 20:18, BALATON Zoltan wrote: It is used by other machines not just fuloong2e so put it in a separate section and add myself as reviewer. Signed-off-by: BALATON Zoltan --- By the way, PIIX4 already has a section just above where I've added this but some files are still listed in Malta. You may want to have a look at that. MAINTAINERS | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index ea91f9e804..7f0e20fde6 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1307,10 +1307,7 @@ M: Philippe Mathieu-Daudé R: Jiaxun Yang S: Odd Fixes F: hw/mips/fuloong2e.c -F: hw/isa/vt82c686.c F: hw/pci-host/bonito.c -F: hw/usb/vt82c686-uhci-pci.c -F: include/hw/isa/vt82c686.h F: include/hw/pci-host/bonito.h F: tests/avocado/machine_mips_fuloong2e.py @@ -2462,6 +2459,14 @@ S: Maintained F: hw/isa/piix4.c F: include/hw/southbridge/piix.h +VIA south bridges (VT82C686B, VT8231) +M: Philippe Mathieu-Daudé +R: Jiaxun Yang +R: BALATON Zoltan Feel free to add yourself as maintainer if you rather. I'm happy with you staying maintainer if you don't mind :-) (after all we'll still need you to send pull requests anyway). Just wanted to get cc'd on this to make sure my machines won't break. Regards, BALATON Zoltan Reviewed-by: Philippe Mathieu-Daudé +F: hw/isa/vt82c686.c +F: hw/usb/vt82c686-uhci-pci.c +F: include/hw/isa/vt82c686.h + Firmware configuration (fw_cfg) M: Philippe Mathieu-Daudé R: Gerd Hoffmann
Re: [PATCH v6 05/14] python/qemu: rename command() to cmd()
On Thu, Oct 05, 2023 at 04:55:41PM +0300, Vladimir Sementsov-Ogievskiy wrote: > Use a shorter name. We are going to move in iotests from qmp() to > command() where possible. But command() is longer than qmp() and don't > look better. Let's rename. > > You can simply grep for '\.command(' and for 'def command(' to check > that everything is updated (command() in tests/docker/docker.py is > unrelated). > > Signed-off-by: Vladimir Sementsov-Ogievskiy > Reviewed-by: Daniel P. Berrangé > --- > docs/devel/testing.rst| 10 +- There's a risk that this will need rebasing on top of any other test changes being made in parallel; hopefully we can get this series in sooner rather than later. Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org
Re: [PATCH v6 04/14] python: rename QEMUMonitorProtocol.cmd() to cmd_raw()
On Thu, Oct 05, 2023 at 04:55:40PM +0300, Vladimir Sementsov-Ogievskiy wrote: > Having cmd() and command() methods in one class doesn't look good. > Rename cmd() to cmd_raw(), to show its meaning better. > > We also want to rename command() to cmd() in future, so this commit is > a necessary step. > > Signed-off-by: Vladimir Sementsov-Ogievskiy > --- > python/qemu/machine/machine.py | 2 +- > python/qemu/qmp/legacy.py | 4 ++-- > tests/qemu-iotests/iotests.py | 2 +- > 3 files changed, 4 insertions(+), 4 deletions(-) Reviewed-by: Eric Blake I was surprised how few users there are, but agree that this opens up the door to let our common usage be the least typing. -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org
Re: [PATCH v6 03/14] scripts/cpu-x86-uarch-abi.py: use .command() instead of .cmd()
On Thu, Oct 05, 2023 at 04:55:39PM +0300, Vladimir Sementsov-Ogievskiy wrote: > Here we don't expect a failure. In case on failure we'll crash on s/case on/case of/ > trying to access ['return']. Let's better use .command() that clearly > raise on failure. Maybe: s/Let's better /Better is to/; s/raise/raises/ > > Signed-off-by: Vladimir Sementsov-Ogievskiy > --- > scripts/cpu-x86-uarch-abi.py | 12 ++-- > 1 file changed, 6 insertions(+), 6 deletions(-) > Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org
[PULL 13/15] nbd/server: Refactor list of negotiated meta contexts
Peform several minor refactorings of how the list of negotiated meta contexts is managed, to make upcoming patches easier: Promote the internal type NBDExportMetaContexts to the public opaque type NBDMetaContexts, and mark exp const. Use a shorter member name in NBDClient. Hoist calls to nbd_check_meta_context() earlier in their callers, as the number of negotiated contexts may impact the flags exposed in regards to an export, which in turn requires a new parameter. Drop a redundant parameter to nbd_negotiate_meta_queries. No semantic change intended on the success path; on the failure path, dropping context in nbd_check_meta_export even when reporting an error is safer. Signed-off-by: Eric Blake Reviewed-by: Vladimir Sementsov-Ogievskiy Message-ID: <20230925192229.3186470-24-ebl...@redhat.com> --- include/block/nbd.h | 1 + nbd/server.c| 55 - 2 files changed, 31 insertions(+), 25 deletions(-) diff --git a/include/block/nbd.h b/include/block/nbd.h index ba8724f5336..2006497f987 100644 --- a/include/block/nbd.h +++ b/include/block/nbd.h @@ -29,6 +29,7 @@ typedef struct NBDExport NBDExport; typedef struct NBDClient NBDClient; typedef struct NBDClientConnection NBDClientConnection; +typedef struct NBDMetaContexts NBDMetaContexts; extern const BlockExportDriver blk_exp_nbd; diff --git a/nbd/server.c b/nbd/server.c index d3eed6535bf..2719992db7b 100644 --- a/nbd/server.c +++ b/nbd/server.c @@ -105,11 +105,13 @@ struct NBDExport { static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports); -/* NBDExportMetaContexts represents a list of contexts to be exported, +/* + * NBDMetaContexts represents a list of meta contexts in use, * as selected by NBD_OPT_SET_META_CONTEXT. Also used for - * NBD_OPT_LIST_META_CONTEXT. */ -typedef struct NBDExportMetaContexts { -NBDExport *exp; + * NBD_OPT_LIST_META_CONTEXT. + */ +struct NBDMetaContexts { +const NBDExport *exp; /* associated export */ size_t count; /* number of negotiated contexts */ bool base_allocation; /* export base:allocation context (block status) */ bool allocation_depth; /* export qemu:allocation-depth */ @@ -117,7 +119,7 @@ typedef struct NBDExportMetaContexts { * export qemu:dirty-bitmap:, * sized by exp->nr_export_bitmaps */ -} NBDExportMetaContexts; +}; struct NBDClient { int refcount; @@ -144,7 +146,7 @@ struct NBDClient { uint32_t check_align; /* If non-zero, check for aligned client requests */ NBDMode mode; -NBDExportMetaContexts export_meta; +NBDMetaContexts contexts; /* Negotiated meta contexts */ uint32_t opt; /* Current option being negotiated */ uint32_t optlen; /* remaining length of data in ioc for the option being @@ -455,10 +457,10 @@ static int nbd_negotiate_handle_list(NBDClient *client, Error **errp) return nbd_negotiate_send_rep(client, NBD_REP_ACK, errp); } -static void nbd_check_meta_export(NBDClient *client) +static void nbd_check_meta_export(NBDClient *client, NBDExport *exp) { -if (client->exp != client->export_meta.exp) { -client->export_meta.count = 0; +if (exp != client->contexts.exp) { +client->contexts.count = 0; } } @@ -504,6 +506,7 @@ static int nbd_negotiate_handle_export_name(NBDClient *client, bool no_zeroes, error_setg(errp, "export not found"); return -EINVAL; } +nbd_check_meta_export(client, client->exp); myflags = client->exp->nbdflags; if (client->mode >= NBD_MODE_STRUCTURED) { @@ -521,7 +524,6 @@ static int nbd_negotiate_handle_export_name(NBDClient *client, bool no_zeroes, QTAILQ_INSERT_TAIL(>exp->clients, client, next); blk_exp_ref(>exp->common); -nbd_check_meta_export(client); return 0; } @@ -641,6 +643,9 @@ static int nbd_negotiate_handle_info(NBDClient *client, Error **errp) errp, "export '%s' not present", sane_name); } +if (client->opt == NBD_OPT_GO) { +nbd_check_meta_export(client, exp); +} /* Don't bother sending NBD_INFO_NAME unless client requested it */ if (sendname) { @@ -729,7 +734,6 @@ static int nbd_negotiate_handle_info(NBDClient *client, Error **errp) client->check_align = check_align; QTAILQ_INSERT_TAIL(>exp->clients, client, next); blk_exp_ref(>exp->common); -nbd_check_meta_export(client); rc = 1; } return rc; @@ -852,7 +856,7 @@ static bool nbd_strshift(const char **str, const char *prefix) * Handle queries to 'base' namespace. For now, only the base:allocation * context is available. Return true if @query has been handled. */ -static bool nbd_meta_base_query(NBDClient *client, NBDExportMetaContexts *meta, +static bool nbd_meta_base_query(NBDClient *client, NBDMetaContexts *meta,
[PULL 11/15] nbd/client: Accept 64-bit block status chunks
Once extended mode is enabled, we need to accept 64-bit status replies (even for replies that don't exceed a 32-bit length). It is easier to normalize narrow replies into wide format so that the rest of our code only has to handle one width. Although a server is non-compliant if it sends a 64-bit reply in compact mode, or a 32-bit reply in extended mode, it is still easy enough to tolerate these mismatches. In normal execution, we are only requesting "base:allocation" which never exceeds 32 bits for flag values. But during testing with x-dirty-bitmap, we can force qemu to connect to some other context that might have 64-bit status bit; however, we ignore those upper bits (other than mapping qemu:allocation-depth into something that 'qemu-img map --output=json' can expose), and since that only affects testing, we really don't bother with checking whether more than the two least-significant bits are set. Signed-off-by: Eric Blake Reviewed-by: Vladimir Sementsov-Ogievskiy Message-ID: <20230925192229.3186470-22-ebl...@redhat.com> --- block/nbd.c| 49 -- block/trace-events | 1 + 2 files changed, 35 insertions(+), 15 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index 76461430411..52ebc8b2f53 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -615,13 +615,17 @@ static int nbd_parse_offset_hole_payload(BDRVNBDState *s, */ static int nbd_parse_blockstatus_payload(BDRVNBDState *s, NBDStructuredReplyChunk *chunk, - uint8_t *payload, uint64_t orig_length, - NBDExtent32 *extent, Error **errp) + uint8_t *payload, bool wide, + uint64_t orig_length, + NBDExtent64 *extent, Error **errp) { uint32_t context_id; +uint32_t count; +size_t ext_len = wide ? sizeof(*extent) : sizeof(NBDExtent32); +size_t pay_len = sizeof(context_id) + wide * sizeof(count) + ext_len; /* The server succeeded, so it must have sent [at least] one extent */ -if (chunk->length < sizeof(context_id) + sizeof(*extent)) { +if (chunk->length < pay_len) { error_setg(errp, "Protocol error: invalid payload for " "NBD_REPLY_TYPE_BLOCK_STATUS"); return -EINVAL; @@ -636,8 +640,15 @@ static int nbd_parse_blockstatus_payload(BDRVNBDState *s, return -EINVAL; } -extent->length = payload_advance32(); -extent->flags = payload_advance32(); +if (wide) { +count = payload_advance32(); +extent->length = payload_advance64(); +extent->flags = payload_advance64(); +} else { +count = 0; +extent->length = payload_advance32(); +extent->flags = payload_advance32(); +} if (extent->length == 0) { error_setg(errp, "Protocol error: server sent status chunk with " @@ -658,7 +669,7 @@ static int nbd_parse_blockstatus_payload(BDRVNBDState *s, * (always a safe status, even if it loses information). */ if (s->info.min_block && !QEMU_IS_ALIGNED(extent->length, - s->info.min_block)) { + s->info.min_block)) { trace_nbd_parse_blockstatus_compliance("extent length is unaligned"); if (extent->length > s->info.min_block) { extent->length = QEMU_ALIGN_DOWN(extent->length, @@ -672,13 +683,15 @@ static int nbd_parse_blockstatus_payload(BDRVNBDState *s, /* * We used NBD_CMD_FLAG_REQ_ONE, so the server should not have * sent us any more than one extent, nor should it have included - * status beyond our request in that extent. However, it's easy - * enough to ignore the server's noncompliance without killing the + * status beyond our request in that extent. Furthermore, a wide + * server should have replied with an accurate count (we left + * count at 0 for a narrow server). However, it's easy enough to + * ignore the server's noncompliance without killing the * connection; just ignore trailing extents, and clamp things to * the length of our request. */ -if (chunk->length > sizeof(context_id) + sizeof(*extent)) { -trace_nbd_parse_blockstatus_compliance("more than one extent"); +if (count != wide || chunk->length > pay_len) { +trace_nbd_parse_blockstatus_compliance("unexpected extent count"); } if (extent->length > orig_length) { extent->length = orig_length; @@ -1124,7 +1137,7 @@ nbd_co_receive_cmdread_reply(BDRVNBDState *s, uint64_t cookie, static int coroutine_fn nbd_co_receive_blockstatus_reply(BDRVNBDState *s, uint64_t cookie, - uint64_t length, NBDExtent32 *extent, + uint64_t length, NBDExtent64 *extent,
[PULL 12/15] nbd/client: Request extended headers during negotiation
All the pieces are in place for a client to finally request extended headers. Note that we must not request extended headers when qemu-nbd is used to connect to the kernel module (as nbd.ko does not expect them, but expects us to do the negotiation in userspace before handing the socket over to the kernel), but there is no harm in all other clients requesting them. Extended headers are not essential to the information collected during 'qemu-nbd --list', but probing for it gives us one more piece of information in that output. Update the iotests affected by the new line of output. Signed-off-by: Eric Blake Reviewed-by: Vladimir Sementsov-Ogievskiy Message-ID: <20230925192229.3186470-23-ebl...@redhat.com> --- nbd/client-connection.c | 2 +- nbd/client.c | 20 ++- qemu-nbd.c| 3 +++ tests/qemu-iotests/223.out| 6 ++ tests/qemu-iotests/233.out| 4 tests/qemu-iotests/241.out| 3 +++ tests/qemu-iotests/307.out| 5 + .../tests/nbd-qemu-allocation.out | 1 + 8 files changed, 38 insertions(+), 6 deletions(-) diff --git a/nbd/client-connection.c b/nbd/client-connection.c index aa0201b7107..f9da67c87e3 100644 --- a/nbd/client-connection.c +++ b/nbd/client-connection.c @@ -93,7 +93,7 @@ NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr, .do_negotiation = do_negotiation, .initial_info.request_sizes = true, -.initial_info.mode = NBD_MODE_STRUCTURED, +.initial_info.mode = NBD_MODE_EXTENDED, .initial_info.base_allocation = true, .initial_info.x_dirty_bitmap = g_strdup(x_dirty_bitmap), .initial_info.name = g_strdup(export_name ?: "") diff --git a/nbd/client.c b/nbd/client.c index a2f253062aa..29ffc609a4b 100644 --- a/nbd/client.c +++ b/nbd/client.c @@ -953,15 +953,23 @@ static int nbd_start_negotiate(QIOChannel *ioc, QCryptoTLSCreds *tlscreds, if (fixedNewStyle) { int result = 0; +if (max_mode >= NBD_MODE_EXTENDED) { +result = nbd_request_simple_option(ioc, + NBD_OPT_EXTENDED_HEADERS, + false, errp); +if (result) { +return result < 0 ? -EINVAL : NBD_MODE_EXTENDED; +} +} if (max_mode >= NBD_MODE_STRUCTURED) { result = nbd_request_simple_option(ioc, NBD_OPT_STRUCTURED_REPLY, false, errp); -if (result < 0) { -return -EINVAL; +if (result) { +return result < 0 ? -EINVAL : NBD_MODE_STRUCTURED; } } -return result ? NBD_MODE_STRUCTURED : NBD_MODE_SIMPLE; +return NBD_MODE_SIMPLE; } else { return NBD_MODE_EXPORT_NAME; } @@ -1034,6 +1042,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, QCryptoTLSCreds *tlscreds, } switch (info->mode) { +case NBD_MODE_EXTENDED: case NBD_MODE_STRUCTURED: if (base_allocation) { result = nbd_negotiate_simple_meta_context(ioc, info, errp); @@ -1144,7 +1153,7 @@ int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds, *info = NULL; result = nbd_start_negotiate(ioc, tlscreds, hostname, , - NBD_MODE_STRUCTURED, NULL, errp); + NBD_MODE_EXTENDED, NULL, errp); if (tlscreds && sioc) { ioc = sioc; } @@ -1155,6 +1164,7 @@ int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds, switch ((NBDMode)result) { case NBD_MODE_SIMPLE: case NBD_MODE_STRUCTURED: +case NBD_MODE_EXTENDED: /* newstyle - use NBD_OPT_LIST to populate array, then try * NBD_OPT_INFO on each array member. If structured replies * are enabled, also try NBD_OPT_LIST_META_CONTEXT. */ @@ -1191,7 +1201,7 @@ int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds, break; } -if (result == NBD_MODE_STRUCTURED && +if (result >= NBD_MODE_STRUCTURED && nbd_list_meta_contexts(ioc, [i], errp) < 0) { goto out; } diff --git a/qemu-nbd.c b/qemu-nbd.c index 54faa87a0c0..1a39bb8fac0 100644 --- a/qemu-nbd.c +++ b/qemu-nbd.c @@ -235,6 +235,9 @@ static int qemu_nbd_client_list(SocketAddress *saddr, QCryptoTLSCreds *tls, printf(" opt block: %u\n", list[i].opt_block); printf(" max block: %u\n", list[i].max_block); } +printf(" transaction size: %s\n", + list[i].mode >=
[PULL 06/15] nbd/server: Prepare to send extended header replies
Although extended mode is not yet enabled, once we do turn it on, we need to reply with extended headers to all messages. Update the low level entry points necessary so that all other callers automatically get the right header based on the current mode. Signed-off-by: Eric Blake Reviewed-by: Vladimir Sementsov-Ogievskiy Message-ID: <20230925192229.3186470-17-ebl...@redhat.com> --- nbd/server.c | 30 ++ 1 file changed, 22 insertions(+), 8 deletions(-) diff --git a/nbd/server.c b/nbd/server.c index 501749d62b5..910c48c6467 100644 --- a/nbd/server.c +++ b/nbd/server.c @@ -1938,8 +1938,6 @@ static inline void set_be_chunk(NBDClient *client, struct iovec *iov, size_t niov, uint16_t flags, uint16_t type, NBDRequest *request) { -/* TODO - handle structured vs. extended replies */ -NBDStructuredReplyChunk *chunk = iov->iov_base; size_t i, length = 0; for (i = 1; i < niov; i++) { @@ -1947,12 +1945,26 @@ static inline void set_be_chunk(NBDClient *client, struct iovec *iov, } assert(length <= NBD_MAX_BUFFER_SIZE + sizeof(NBDStructuredReadData)); -iov[0].iov_len = sizeof(*chunk); -stl_be_p(>magic, NBD_STRUCTURED_REPLY_MAGIC); -stw_be_p(>flags, flags); -stw_be_p(>type, type); -stq_be_p(>cookie, request->cookie); -stl_be_p(>length, length); +if (client->mode >= NBD_MODE_EXTENDED) { +NBDExtendedReplyChunk *chunk = iov->iov_base; + +iov[0].iov_len = sizeof(*chunk); +stl_be_p(>magic, NBD_EXTENDED_REPLY_MAGIC); +stw_be_p(>flags, flags); +stw_be_p(>type, type); +stq_be_p(>cookie, request->cookie); +stq_be_p(>offset, request->from); +stq_be_p(>length, length); +} else { +NBDStructuredReplyChunk *chunk = iov->iov_base; + +iov[0].iov_len = sizeof(*chunk); +stl_be_p(>magic, NBD_STRUCTURED_REPLY_MAGIC); +stw_be_p(>flags, flags); +stw_be_p(>type, type); +stq_be_p(>cookie, request->cookie); +stl_be_p(>length, length); +} } static int coroutine_fn nbd_co_send_chunk_done(NBDClient *client, @@ -2512,6 +2524,8 @@ static coroutine_fn int nbd_send_generic_reply(NBDClient *client, { if (client->mode >= NBD_MODE_STRUCTURED && ret < 0) { return nbd_co_send_chunk_error(client, request, -ret, error_msg, errp); +} else if (client->mode >= NBD_MODE_EXTENDED) { +return nbd_co_send_chunk_done(client, request, errp); } else { return nbd_co_send_simple_reply(client, request, ret < 0 ? -ret : 0, NULL, 0, errp); -- 2.41.0
[PULL 09/15] nbd/client: Plumb errp through nbd_receive_replies
Instead of ignoring the low-level error just to refabricate our own message to pass to the caller, we can just plumb the caller's errp down to the low level. Signed-off-by: Eric Blake Message-ID: <20230925192229.3186470-20-ebl...@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy --- block/nbd.c | 18 -- 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index 4a7f37da1c6..22d3cb11ac8 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -416,7 +416,8 @@ static void coroutine_fn GRAPH_RDLOCK nbd_reconnect_attempt(BDRVNBDState *s) reconnect_delay_timer_del(s); } -static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t cookie) +static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t cookie, +Error **errp) { int ret; uint64_t ind = COOKIE_TO_INDEX(cookie), ind2; @@ -457,20 +458,25 @@ static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t cookie) /* We are under mutex and cookie is 0. We have to do the dirty work. */ assert(s->reply.cookie == 0); -ret = nbd_receive_reply(s->bs, s->ioc, >reply, NULL); -if (ret <= 0) { -ret = ret ? ret : -EIO; +ret = nbd_receive_reply(s->bs, s->ioc, >reply, errp); +if (ret == 0) { +ret = -EIO; +error_setg(errp, "server dropped connection"); +} +if (ret < 0) { nbd_channel_error(s, ret); return ret; } if (nbd_reply_is_structured(>reply) && s->info.mode < NBD_MODE_STRUCTURED) { nbd_channel_error(s, -EINVAL); +error_setg(errp, "unexpected structured reply"); return -EINVAL; } ind2 = COOKIE_TO_INDEX(s->reply.cookie); if (ind2 >= MAX_NBD_REQUESTS || !s->requests[ind2].coroutine) { nbd_channel_error(s, -EINVAL); +error_setg(errp, "unexpected cookie value"); return -EINVAL; } if (s->reply.cookie == cookie) { @@ -842,9 +848,9 @@ static coroutine_fn int nbd_co_do_receive_one_chunk( } *request_ret = 0; -ret = nbd_receive_replies(s, cookie); +ret = nbd_receive_replies(s, cookie, errp); if (ret < 0) { -error_setg(errp, "Connection closed"); +error_prepend(errp, "Connection closed: "); return -EIO; } assert(s->ioc); -- 2.41.0
[PULL 07/15] nbd/server: Support 64-bit block status
The NBD spec states that if the client negotiates extended headers, the server must avoid NBD_REPLY_TYPE_BLOCK_STATUS and instead use NBD_REPLY_TYPE_BLOCK_STATUS_EXT which supports 64-bit lengths, even if the reply does not need more than 32 bits. As of this patch, client->mode is still never NBD_MODE_EXTENDED, so the code added here does not take effect until the next patch enables negotiation. For now, all metacontexts that we know how to export never populate more than 32 bits of information, so we don't have to worry about NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we always send all zeroes for the upper 32 bits of status during NBD_CMD_BLOCK_STATUS. Note that we previously had some interesting size-juggling on call chains, such as: nbd_co_send_block_status(uint32_t length) -> blockstatus_to_extents(uint32_t bytes) -> bdrv_block_status_above(bytes, _t num) -> nbd_extent_array_add(uint64_t num) -> store num in 32-bit length But we were lucky that it never overflowed: bdrv_block_status_above never sets num larger than bytes, and we had previously been capping 'bytes' at 32 bits (since the protocol does not allow sending a larger request without extended headers). This patch adds some assertions that ensure we continue to avoid overflowing 32 bits for a narrow client, while fully utilizing 64-bits all the way through when the client understands that. Even in 64-bit math, overflow is not an issue, because all lengths are coming from the block layer, and we know that the block layer does not support images larger than off_t (if lengths were coming from the network, the story would be different). Signed-off-by: Eric Blake Reviewed-by: Vladimir Sementsov-Ogievskiy Message-ID: <20230925192229.3186470-18-ebl...@redhat.com> --- nbd/server.c | 108 ++- 1 file changed, 82 insertions(+), 26 deletions(-) diff --git a/nbd/server.c b/nbd/server.c index 910c48c6467..183efe27bf8 100644 --- a/nbd/server.c +++ b/nbd/server.c @@ -2103,20 +2103,24 @@ static int coroutine_fn nbd_co_send_sparse_read(NBDClient *client, } typedef struct NBDExtentArray { -NBDExtent32 *extents; +NBDExtent64 *extents; unsigned int nb_alloc; unsigned int count; uint64_t total_length; +bool extended; bool can_add; bool converted_to_be; } NBDExtentArray; -static NBDExtentArray *nbd_extent_array_new(unsigned int nb_alloc) +static NBDExtentArray *nbd_extent_array_new(unsigned int nb_alloc, +NBDMode mode) { NBDExtentArray *ea = g_new0(NBDExtentArray, 1); +assert(mode >= NBD_MODE_STRUCTURED); ea->nb_alloc = nb_alloc; -ea->extents = g_new(NBDExtent32, nb_alloc); +ea->extents = g_new(NBDExtent64, nb_alloc); +ea->extended = mode >= NBD_MODE_EXTENDED; ea->can_add = true; return ea; @@ -2135,15 +2139,36 @@ static void nbd_extent_array_convert_to_be(NBDExtentArray *ea) int i; assert(!ea->converted_to_be); +assert(ea->extended); ea->can_add = false; ea->converted_to_be = true; for (i = 0; i < ea->count; i++) { -ea->extents[i].flags = cpu_to_be32(ea->extents[i].flags); -ea->extents[i].length = cpu_to_be32(ea->extents[i].length); +ea->extents[i].length = cpu_to_be64(ea->extents[i].length); +ea->extents[i].flags = cpu_to_be64(ea->extents[i].flags); } } +/* Further modifications of the array after conversion are abandoned */ +static NBDExtent32 *nbd_extent_array_convert_to_narrow(NBDExtentArray *ea) +{ +int i; +NBDExtent32 *extents = g_new(NBDExtent32, ea->count); + +assert(!ea->converted_to_be); +assert(!ea->extended); +ea->can_add = false; +ea->converted_to_be = true; + +for (i = 0; i < ea->count; i++) { +assert((ea->extents[i].length | ea->extents[i].flags) <= UINT32_MAX); +extents[i].length = cpu_to_be32(ea->extents[i].length); +extents[i].flags = cpu_to_be32(ea->extents[i].flags); +} + +return extents; +} + /* * Add extent to NBDExtentArray. If extent can't be added (no available space), * return -1. @@ -2154,19 +2179,27 @@ static void nbd_extent_array_convert_to_be(NBDExtentArray *ea) * would result in an incorrect range reported to the client) */ static int nbd_extent_array_add(NBDExtentArray *ea, -uint32_t length, uint32_t flags) +uint64_t length, uint32_t flags) { assert(ea->can_add); if (!length) { return 0; } +if (!ea->extended) { +assert(length <= UINT32_MAX); +} /* Extend previous extent if flags are the same */ if (ea->count > 0 && flags == ea->extents[ea->count - 1].flags) { -uint64_t sum = (uint64_t)length + ea->extents[ea->count - 1].length; +uint64_t sum = length + ea->extents[ea->count - 1].length; -if (sum <= UINT32_MAX) { +/* + * sum
[PULL 03/15] mailmap: Fix BALATON Zoltan author email
This fixes authorship of commits 5cbd51a5 and friends, where the qemu-ppc mailing list rewrote the "From:" field in the corresponding patches. See commit 3bd2608db7 ("maint: Add .mailmap entries for patches claiming list authorship") for explanation. Signed-off-by: Eric Blake Message-ID: <20230927143815.3397386-8-ebl...@redhat.com> --- .mailmap | 1 + 1 file changed, 1 insertion(+) diff --git a/.mailmap b/.mailmap index fadf6e74506..d2149592887 100644 --- a/.mailmap +++ b/.mailmap @@ -59,6 +59,7 @@ Julia Suvorova Julia Suvorova via Qemu-devel Justin Terry (VM) via Qemu-devel Stefan Weil Stefan Weil via Andrey Drobyshev Andrey Drobyshev via +BALATON Zoltan BALATON Zoltan via # Next, replace old addresses by a more recent one. Aleksandar Markovic -- 2.41.0
[PULL 00/15] NBD patches through 2023-10-05
The following changes since commit 2f3913f4b2ad74baeb5a6f1d36efbd9ecdf1057d: Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging (2023-10-05 09:01:01 -0400) are available in the Git repository at: https://repo.or.cz/qemu/ericb.git tags/pull-nbd-2023-10-05 for you to fetch changes up to 2dcbb11b399ada51f734229b612e4f561a2aae0a: nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS (2023-10-05 11:02:08 -0500) NBD patches for 2023-10-05 - various: mailmap cleanups - Eric Blake: enable use of NBD 64-bit extended headers Andrey Drobyshev (1): mailmap: Fix Andrey Drobyshev author email Eric Blake (14): maint: Tweak comments in mailmap regarding SPF mailmap: Fix BALATON Zoltan author email nbd/server: Support a request payload nbd/server: Prepare to receive extended header requests nbd/server: Prepare to send extended header replies nbd/server: Support 64-bit block status nbd/server: Enable initial support for extended headers nbd/client: Plumb errp through nbd_receive_replies nbd/client: Initial support for extended headers nbd/client: Accept 64-bit block status chunks nbd/client: Request extended headers during negotiation nbd/server: Refactor list of negotiated meta contexts nbd/server: Prepare for per-request filtering of BLOCK_STATUS nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS docs/interop/nbd.txt | 1 + include/block/nbd.h | 5 +- nbd/nbd-internal.h | 5 +- block/nbd.c | 67 ++-- nbd/client-connection.c | 2 +- nbd/client.c | 124 --- nbd/server.c | 426 ++- qemu-nbd.c | 4 + .mailmap | 16 +- block/trace-events | 1 + nbd/trace-events | 5 +- tests/qemu-iotests/223.out | 18 +- tests/qemu-iotests/233.out | 4 + tests/qemu-iotests/241.out | 3 + tests/qemu-iotests/307.out | 15 +- tests/qemu-iotests/tests/nbd-qemu-allocation.out | 3 +- 16 files changed, 538 insertions(+), 161 deletions(-) -- 2.41.0
[PULL 02/15] maint: Tweak comments in mailmap regarding SPF
Documenting that we should not add new lines to work around SPF rewrites sounds foreboding; the intent is instead that new lines here are okay, but indicate a second problem elsewhere in our build process that we should also consider fixing at the same time, to keep the section from growing without bounds. While we have been doing that for qemu-devel for a while, we jut recently fixed that for qemu-block: https://git.linaro.org/people/pmaydell/misc-scripts.git/commit/?id=f9a317392 Mentioning DMARC alongside SPF may also help people grep for this scenario, as well as documenting the 'git config' workaround that can be used by submitters to avoid the munging issue in the first place. Note the subtlety: 'git commit' sets authorship information based on user.name and user.email (where name is usually unquoted); while 'git send-email' includes a body 'From:' line only when sendemail.from is present but differs from authorship information. Hence the use of quotes in sendemail.from (not a semantic change to email, but enough of a difference to add the body 'From:'). Fixes: 3bd2608d ("maint: Add .mailmap entries for patches claiming list authorship") CC: Andrey Drobyshev Cc: Peter Maydell Signed-off-by: Eric Blake Message-ID: <20230927143815.3397386-7-ebl...@redhat.com> Reviewed-by: Peter Maydell --- .mailmap | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/.mailmap b/.mailmap index 04a7feb005b..fadf6e74506 100644 --- a/.mailmap +++ b/.mailmap @@ -40,7 +40,19 @@ Nick Hudson hn...@vmware.com # for the cvs2svn initialization commit e63c3dc74bf. # Next, translate a few commits where mailman rewrote the From: line due -# to strict SPF, although we prefer to avoid adding more entries like that. +# to strict SPF and DMARC. Usually, our build process should be flagging +# commits like these before maintainer merges; if you find the need to add +# a line here, please also report a bug against the part of the build +# process that let the mis-attribution slip through in the first place. +# +# If the mailing list munges your emails, use: +# git config sendemail.from '"Your Name" ' +# the use of "" in that line will differ from the typically unquoted +# 'git config user.name', which in turn is sufficient for 'git send-email' +# to add an extra From: line in the body of your email that takes +# precedence over any munged From: in the mail's headers. +# See https://lists.openembedded.org/g/openembedded-core/message/166515 +# and https://lists.gnu.org/archive/html/qemu-devel/2023-09/msg06784.html Ed Swierk Ed Swierk via Qemu-devel Ian McKellar Ian McKellar via Qemu-devel Julia Suvorova Julia Suvorova via Qemu-devel -- 2.41.0
[PULL 04/15] nbd/server: Support a request payload
Upcoming additions to support NBD 64-bit effect lengths allow for the possibility to distinguish between payload length (capped at 32M) and effect length (64 bits, although we generally assume 63 bits because of off_t limitations). Without that extension, only the NBD_CMD_WRITE request has a payload; but with the extension, it makes sense to allow at least NBD_CMD_BLOCK_STATUS to have both a payload and effect length in a future patch (where the payload is a limited-size struct that in turn gives the real effect length as well as a subset of known ids for which status is requested). Other future NBD commands may also have a request payload, so the 64-bit extension introduces a new NBD_CMD_FLAG_PAYLOAD_LEN that distinguishes between whether the header length is a payload length or an effect length, rather than hard-coding the decision based on the command. According to the spec, a client should never send a command with a payload without the negotiation phase proving such extension is available. So in the unlikely event the bit is set or cleared incorrectly, the client is already at fault; if the client then provides the payload, we can gracefully consume it off the wire and fail the command with NBD_EINVAL (subsequent checks for magic numbers ensure we are still in sync), while if the client fails to send payload we block waiting for it (basically deadlocking our connection to the bad client, but not negatively impacting our ability to service other clients, so not a security risk). Note that we do not support the payload version of BLOCK_STATUS yet. This patch also fixes a latent bug introduced in b2578459: once request->len can be 64 bits, assigning it to a 32-bit payload_len can cause wraparound to 0 which then sets req->complete prematurely; thankfully, the bug was not possible back then (it takes this and later patches to even allow request->len larger than 32 bits; and since previously the only 'payload_len = request->len' assignment was in NBD_CMD_WRITE which also sets check_length, which in turn rejects lengths larger than 32M before relying on any possibly-truncated value stored in payload_len). Signed-off-by: Eric Blake Message-ID: <20230925192229.3186470-15-ebl...@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy [eblake: enhance comment on handling client error, fix type bug] Signed-off-by: Eric Blake --- nbd/server.c | 42 +- nbd/trace-events | 1 + 2 files changed, 38 insertions(+), 5 deletions(-) diff --git a/nbd/server.c b/nbd/server.c index 7a6f95071f8..56e9b41828e 100644 --- a/nbd/server.c +++ b/nbd/server.c @@ -2322,10 +2322,12 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req, Error **errp) { NBDClient *client = req->client; +bool extended_with_payload; bool check_length = false; bool check_rofs = false; bool allocate_buffer = false; -unsigned payload_len = 0; +bool payload_okay = false; +uint64_t payload_len = 0; int valid_flags = NBD_CMD_FLAG_FUA; int ret; @@ -2338,6 +2340,13 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req, trace_nbd_co_receive_request_decode_type(request->cookie, request->type, nbd_cmd_lookup(request->type)); +extended_with_payload = client->mode >= NBD_MODE_EXTENDED && +request->flags & NBD_CMD_FLAG_PAYLOAD_LEN; +if (extended_with_payload) { +payload_len = request->len; +check_length = true; +} + switch (request->type) { case NBD_CMD_DISC: /* Special case: we're going to disconnect without a reply, @@ -2354,6 +2363,15 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req, break; case NBD_CMD_WRITE: +if (client->mode >= NBD_MODE_EXTENDED) { +if (!extended_with_payload) { +/* The client is noncompliant. Trace it, but proceed. */ +trace_nbd_co_receive_ext_payload_compliance(request->from, +request->len); +} +valid_flags |= NBD_CMD_FLAG_PAYLOAD_LEN; +} +payload_okay = true; payload_len = request->len; check_length = true; allocate_buffer = true; @@ -2395,6 +2413,16 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req, request->len, NBD_MAX_BUFFER_SIZE); return -EINVAL; } +if (payload_len && !payload_okay) { +/* + * For now, we don't support payloads on other commands; but + * we can keep the connection alive by ignoring the payload. + * We will fail the command later with NBD_EINVAL for the use + * of an unsupported flag (and not for access beyond bounds). + */ +assert(request->type != NBD_CMD_WRITE); +request->len = 0; +} if
[PULL 05/15] nbd/server: Prepare to receive extended header requests
Although extended mode is not yet enabled, once we do turn it on, we need to accept extended requests for all messages. Previous patches have already taken care of supporting 64-bit lengths, now we just need to read it off the wire. Note that this implementation will block indefinitely on a buggy client that sends a non-extended payload (that is, we try to read a full packet before we ever check the magic number, but a client that mistakenly sends a simple request after negotiating extended headers doesn't send us enough bytes), but it's no different from any other client that stops talking to us partway through a packet and thus not worth coding around. Signed-off-by: Eric Blake Reviewed-by: Vladimir Sementsov-Ogievskiy Message-ID: <20230925192229.3186470-16-ebl...@redhat.com> --- nbd/nbd-internal.h | 5 - nbd/server.c | 43 ++- 2 files changed, 34 insertions(+), 14 deletions(-) diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h index 133b1d94b50..dfa02f77ee4 100644 --- a/nbd/nbd-internal.h +++ b/nbd/nbd-internal.h @@ -34,8 +34,11 @@ * https://github.com/yoe/nbd/blob/master/doc/proto.md */ -/* Size of all NBD_OPT_*, without payload */ +/* Size of all compact NBD_CMD_*, without payload */ #define NBD_REQUEST_SIZE(4 + 2 + 2 + 8 + 8 + 4) +/* Size of all extended NBD_CMD_*, without payload */ +#define NBD_EXTENDED_REQUEST_SIZE (4 + 2 + 2 + 8 + 8 + 8) + /* Size of all NBD_REP_* sent in answer to most NBD_OPT_*, without payload */ #define NBD_REPLY_SIZE (4 + 4 + 8) /* Size of reply to NBD_OPT_EXPORT_NAME */ diff --git a/nbd/server.c b/nbd/server.c index 56e9b41828e..501749d62b5 100644 --- a/nbd/server.c +++ b/nbd/server.c @@ -1411,11 +1411,13 @@ nbd_read_eof(NBDClient *client, void *buffer, size_t size, Error **errp) static int coroutine_fn nbd_receive_request(NBDClient *client, NBDRequest *request, Error **errp) { -uint8_t buf[NBD_REQUEST_SIZE]; -uint32_t magic; +uint8_t buf[NBD_EXTENDED_REQUEST_SIZE]; +uint32_t magic, expect; int ret; +size_t size = client->mode >= NBD_MODE_EXTENDED ? +NBD_EXTENDED_REQUEST_SIZE : NBD_REQUEST_SIZE; -ret = nbd_read_eof(client, buf, sizeof(buf), errp); +ret = nbd_read_eof(client, buf, size, errp); if (ret < 0) { return ret; } @@ -1423,13 +1425,21 @@ static int coroutine_fn nbd_receive_request(NBDClient *client, NBDRequest *reque return -EIO; } -/* Request - [ 0 .. 3] magic (NBD_REQUEST_MAGIC) - [ 4 .. 5] flags (NBD_CMD_FLAG_FUA, ...) - [ 6 .. 7] type(NBD_CMD_READ, ...) - [ 8 .. 15] cookie - [16 .. 23] from - [24 .. 27] len +/* + * Compact request + * [ 0 .. 3] magic (NBD_REQUEST_MAGIC) + * [ 4 .. 5] flags (NBD_CMD_FLAG_FUA, ...) + * [ 6 .. 7] type(NBD_CMD_READ, ...) + * [ 8 .. 15] cookie + * [16 .. 23] from + * [24 .. 27] len + * Extended request + * [ 0 .. 3] magic (NBD_EXTENDED_REQUEST_MAGIC) + * [ 4 .. 5] flags (NBD_CMD_FLAG_FUA, NBD_CMD_FLAG_PAYLOAD_LEN, ...) + * [ 6 .. 7] type(NBD_CMD_READ, ...) + * [ 8 .. 15] cookie + * [16 .. 23] from + * [24 .. 31] len */ magic = ldl_be_p(buf); @@ -1437,13 +1447,20 @@ static int coroutine_fn nbd_receive_request(NBDClient *client, NBDRequest *reque request->type = lduw_be_p(buf + 6); request->cookie = ldq_be_p(buf + 8); request->from = ldq_be_p(buf + 16); -request->len= (uint32_t)ldl_be_p(buf + 24); /* widen 32 to 64 bits */ +if (client->mode >= NBD_MODE_EXTENDED) { +request->len = ldq_be_p(buf + 24); +expect = NBD_EXTENDED_REQUEST_MAGIC; +} else { +request->len = (uint32_t)ldl_be_p(buf + 24); /* widen 32 to 64 bits */ +expect = NBD_REQUEST_MAGIC; +} trace_nbd_receive_request(magic, request->flags, request->type, request->from, request->len); -if (magic != NBD_REQUEST_MAGIC) { -error_setg(errp, "invalid magic (got 0x%" PRIx32 ")", magic); +if (magic != expect) { +error_setg(errp, "invalid magic (got 0x%" PRIx32 ", expected 0x%" + PRIx32 ")", magic, expect); return -EINVAL; } return 0; -- 2.41.0
[PULL 14/15] nbd/server: Prepare for per-request filtering of BLOCK_STATUS
The next commit will add support for the optional extension NBD_CMD_FLAG_PAYLOAD during NBD_CMD_BLOCK_STATUS, where the client can request that the server only return a subset of negotiated contexts, rather than all contexts. To make that task easier, this patch populates the list of contexts to return on a per-command basis (for now, identical to the full set of negotiated contexts). Signed-off-by: Eric Blake Message-ID: <20230925192229.3186470-25-ebl...@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy --- include/block/nbd.h | 1 + nbd/server.c| 22 +- 2 files changed, 18 insertions(+), 5 deletions(-) diff --git a/include/block/nbd.h b/include/block/nbd.h index 2006497f987..4e7bd6342f9 100644 --- a/include/block/nbd.h +++ b/include/block/nbd.h @@ -77,6 +77,7 @@ typedef struct NBDRequest { uint16_t flags; /* NBD_CMD_FLAG_* */ uint16_t type; /* NBD_CMD_* */ NBDMode mode; /* Determines which network representation to use */ +NBDMetaContexts *contexts; /* Used by NBD_CMD_BLOCK_STATUS */ } NBDRequest; typedef struct NBDSimpleReply { diff --git a/nbd/server.c b/nbd/server.c index 2719992db7b..2dce9c3ad6a 100644 --- a/nbd/server.c +++ b/nbd/server.c @@ -2505,6 +2505,7 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req, break; case NBD_CMD_BLOCK_STATUS: +request->contexts = >contexts; valid_flags |= NBD_CMD_FLAG_REQ_ONE; break; @@ -2748,17 +2749,18 @@ static coroutine_fn int nbd_handle_request(NBDClient *client, "discard failed", errp); case NBD_CMD_BLOCK_STATUS: +assert(request->contexts); if (!request->len) { return nbd_send_generic_reply(client, request, -EINVAL, "need non-zero length", errp); } assert(client->mode >= NBD_MODE_EXTENDED || request->len <= UINT32_MAX); -if (client->contexts.count) { +if (request->contexts->count) { bool dont_fragment = request->flags & NBD_CMD_FLAG_REQ_ONE; -int contexts_remaining = client->contexts.count; +int contexts_remaining = request->contexts->count; -if (client->contexts.base_allocation) { +if (request->contexts->base_allocation) { ret = nbd_co_send_block_status(client, request, exp->common.blk, request->from, @@ -2771,7 +2773,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client, } } -if (client->contexts.allocation_depth) { +if (request->contexts->allocation_depth) { ret = nbd_co_send_block_status(client, request, exp->common.blk, request->from, request->len, @@ -2784,8 +2786,9 @@ static coroutine_fn int nbd_handle_request(NBDClient *client, } } +assert(request->contexts->exp == client->exp); for (i = 0; i < client->exp->nr_export_bitmaps; i++) { -if (!client->contexts.bitmaps[i]) { +if (!request->contexts->bitmaps[i]) { continue; } ret = nbd_co_send_bitmap(client, request, @@ -2801,6 +2804,10 @@ static coroutine_fn int nbd_handle_request(NBDClient *client, assert(!contexts_remaining); return 0; +} else if (client->contexts.count) { +return nbd_send_generic_reply(client, request, -EINVAL, + "CMD_BLOCK_STATUS payload not valid", + errp); } else { return nbd_send_generic_reply(client, request, -EINVAL, "CMD_BLOCK_STATUS not negotiated", @@ -2879,6 +2886,11 @@ static coroutine_fn void nbd_trip(void *opaque) } else { ret = nbd_handle_request(client, , req->data, _err); } +if (request.contexts && request.contexts != >contexts) { +assert(request.type == NBD_CMD_BLOCK_STATUS); +g_free(request.contexts->bitmaps); +g_free(request.contexts); +} if (ret < 0) { error_prepend(_err, "Failed to send reply: "); goto disconnect; -- 2.41.0
[PULL 10/15] nbd/client: Initial support for extended headers
Update the client code to be able to send an extended request, and parse an extended header from the server. Note that since we reject any structured reply with a too-large payload, we can always normalize a valid header back into the compact form, so that the caller need not deal with two branches of a union. Still, until a later patch lets the client negotiate extended headers, the code added here should not be reached. Note that because of the different magic numbers, it is just as easy to trace and then tolerate a non-compliant server sending the wrong header reply as it would be to insist that the server is compliant. Signed-off-by: Eric Blake Message-ID: <20230925192229.3186470-21-ebl...@redhat.com> [eblake: fix trace format] Reviewed-by: Vladimir Sementsov-Ogievskiy --- include/block/nbd.h | 3 +- block/nbd.c | 2 +- nbd/client.c| 104 +--- nbd/trace-events| 3 +- 4 files changed, 74 insertions(+), 38 deletions(-) diff --git a/include/block/nbd.h b/include/block/nbd.h index 8a765e78dfb..ba8724f5336 100644 --- a/include/block/nbd.h +++ b/include/block/nbd.h @@ -389,7 +389,8 @@ int nbd_init(int fd, QIOChannelSocket *sioc, NBDExportInfo *info, Error **errp); int nbd_send_request(QIOChannel *ioc, NBDRequest *request); int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc, - NBDReply *reply, Error **errp); + NBDReply *reply, NBDMode mode, + Error **errp); int nbd_client(int fd); int nbd_disconnect(int fd); int nbd_errno_to_system_errno(int err); diff --git a/block/nbd.c b/block/nbd.c index 22d3cb11ac8..76461430411 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -458,7 +458,7 @@ static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t cookie, /* We are under mutex and cookie is 0. We have to do the dirty work. */ assert(s->reply.cookie == 0); -ret = nbd_receive_reply(s->bs, s->ioc, >reply, errp); +ret = nbd_receive_reply(s->bs, s->ioc, >reply, s->info.mode, errp); if (ret == 0) { ret = -EIO; error_setg(errp, "server dropped connection"); diff --git a/nbd/client.c b/nbd/client.c index cecb0f04377..a2f253062aa 100644 --- a/nbd/client.c +++ b/nbd/client.c @@ -1346,22 +1346,29 @@ int nbd_disconnect(int fd) int nbd_send_request(QIOChannel *ioc, NBDRequest *request) { -uint8_t buf[NBD_REQUEST_SIZE]; +uint8_t buf[NBD_EXTENDED_REQUEST_SIZE]; +size_t len; -assert(request->mode <= NBD_MODE_STRUCTURED); /* TODO handle extended */ -assert(request->len <= UINT32_MAX); trace_nbd_send_request(request->from, request->len, request->cookie, request->flags, request->type, nbd_cmd_lookup(request->type)); -stl_be_p(buf, NBD_REQUEST_MAGIC); stw_be_p(buf + 4, request->flags); stw_be_p(buf + 6, request->type); stq_be_p(buf + 8, request->cookie); stq_be_p(buf + 16, request->from); -stl_be_p(buf + 24, request->len); +if (request->mode >= NBD_MODE_EXTENDED) { +stl_be_p(buf, NBD_EXTENDED_REQUEST_MAGIC); +stq_be_p(buf + 24, request->len); +len = NBD_EXTENDED_REQUEST_SIZE; +} else { +assert(request->len <= UINT32_MAX); +stl_be_p(buf, NBD_REQUEST_MAGIC); +stl_be_p(buf + 24, request->len); +len = NBD_REQUEST_SIZE; +} -return nbd_write(ioc, buf, sizeof(buf), NULL); +return nbd_write(ioc, buf, len, NULL); } /* nbd_receive_simple_reply @@ -1388,30 +1395,36 @@ static int nbd_receive_simple_reply(QIOChannel *ioc, NBDSimpleReply *reply, return 0; } -/* nbd_receive_structured_reply_chunk +/* nbd_receive_reply_chunk_header * Read structured reply chunk except magic field (which should be already - * read). + * read). Normalize into the compact form. * Payload is not read. */ -static int nbd_receive_structured_reply_chunk(QIOChannel *ioc, - NBDStructuredReplyChunk *chunk, - Error **errp) +static int nbd_receive_reply_chunk_header(QIOChannel *ioc, NBDReply *chunk, + Error **errp) { int ret; +size_t len; +uint64_t payload_len; -assert(chunk->magic == NBD_STRUCTURED_REPLY_MAGIC); +if (chunk->magic == NBD_STRUCTURED_REPLY_MAGIC) { +len = sizeof(chunk->structured); +} else { +assert(chunk->magic == NBD_EXTENDED_REPLY_MAGIC); +len = sizeof(chunk->extended); +} ret = nbd_read(ioc, (uint8_t *)chunk + sizeof(chunk->magic), - sizeof(*chunk) - sizeof(chunk->magic), "structured chunk", + len - sizeof(chunk->magic), "structured chunk", errp); if (ret < 0) { return ret; } -chunk->flags =
[PULL 15/15] nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS
Allow a client to request a subset of negotiated meta contexts. For example, a client may ask to use a single connection to learn about both block status and dirty bitmaps, but where the dirty bitmap queries only need to be performed on a subset of the disk; forcing the server to compute that information on block status queries in the rest of the disk is wasted effort (both at the server, and on the amount of traffic sent over the wire to be parsed and ignored by the client). Qemu as an NBD client never requests to use more than one meta context, so it has no need to use block status payloads. Testing this instead requires support from libnbd, which CAN access multiple meta contexts in parallel from a single NBD connection; an interop test submitted to the libnbd project at the same time as this patch demonstrates the feature working, as well as testing some corner cases (for example, when the payload length is longer than the export length), although other corner cases (like passing the same id duplicated) requires a protocol fuzzer because libnbd is not wired up to break the protocol that badly. This also includes tweaks to 'qemu-nbd --list' to show when a server is advertising the capability, and to the testsuite to reflect the addition to that output. Of note: qemu will always advertise the new feature bit during NBD_OPT_INFO if extended headers have alreay been negotiated (regardless of whether any NBD_OPT_SET_META_CONTEXT negotiation has occurred); but for NBD_OPT_GO, qemu only advertises the feature if block status is also enabled (that is, if the client does not negotiate any contexts, then NBD_CMD_BLOCK_STATUS cannot be used, so the feature is not advertised). Signed-off-by: Eric Blake Message-ID: <20230925192229.3186470-26-ebl...@redhat.com> [eblake: fix logic to reject unnegotiated contexts] Signed-off-by: Eric Blake --- docs/interop/nbd.txt | 2 +- nbd/server.c | 117 -- qemu-nbd.c| 1 + nbd/trace-events | 1 + tests/qemu-iotests/223.out| 12 +- tests/qemu-iotests/307.out| 10 +- .../tests/nbd-qemu-allocation.out | 2 +- 7 files changed, 125 insertions(+), 20 deletions(-) diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt index 9aae5e1f294..18efb251de9 100644 --- a/docs/interop/nbd.txt +++ b/docs/interop/nbd.txt @@ -69,4 +69,4 @@ NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE NBD_CMD_FLAG_FAST_ZERO * 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth" * 7.1: NBD_FLAG_CAN_MULTI_CONN for shareable writable exports -* 8.2: NBD_OPT_EXTENDED_HEADERS +* 8.2: NBD_OPT_EXTENDED_HEADERS, NBD_FLAG_BLOCK_STATUS_PAYLOAD diff --git a/nbd/server.c b/nbd/server.c index 2dce9c3ad6a..859c163d19f 100644 --- a/nbd/server.c +++ b/nbd/server.c @@ -512,6 +512,9 @@ static int nbd_negotiate_handle_export_name(NBDClient *client, bool no_zeroes, if (client->mode >= NBD_MODE_STRUCTURED) { myflags |= NBD_FLAG_SEND_DF; } +if (client->mode >= NBD_MODE_EXTENDED && client->contexts.count) { +myflags |= NBD_FLAG_BLOCK_STAT_PAYLOAD; +} trace_nbd_negotiate_new_style_size_flags(client->exp->size, myflags); stq_be_p(buf, client->exp->size); stw_be_p(buf + 8, myflags); @@ -699,6 +702,10 @@ static int nbd_negotiate_handle_info(NBDClient *client, Error **errp) if (client->mode >= NBD_MODE_STRUCTURED) { myflags |= NBD_FLAG_SEND_DF; } +if (client->mode >= NBD_MODE_EXTENDED && +(client->contexts.count || client->opt == NBD_OPT_INFO)) { +myflags |= NBD_FLAG_BLOCK_STAT_PAYLOAD; +} trace_nbd_negotiate_new_style_size_flags(exp->size, myflags); stq_be_p(buf, exp->size); stw_be_p(buf + 8, myflags); @@ -2420,6 +2427,90 @@ static int coroutine_fn nbd_co_send_bitmap(NBDClient *client, return nbd_co_send_extents(client, request, ea, last, context_id, errp); } +/* + * nbd_co_block_status_payload_read + * Called when a client wants a subset of negotiated contexts via a + * BLOCK_STATUS payload. Check the payload for valid length and + * contents. On success, return 0 with request updated to effective + * length. If request was invalid but all payload consumed, return 0 + * with request->len and request->contexts->count set to 0 (which will + * trigger an appropriate NBD_EINVAL response later on). Return + * negative errno if the payload was not fully consumed. + */ +static int +nbd_co_block_status_payload_read(NBDClient *client, NBDRequest *request, + Error **errp) +{ +uint64_t payload_len = request->len; +g_autofree char *buf = NULL; +size_t count, i, nr_bitmaps; +uint32_t id; + +if (payload_len > NBD_MAX_BUFFER_SIZE) { +error_setg(errp, "len (%" PRIu64 ") is larger than max len (%u)", + request->len,
[PULL 08/15] nbd/server: Enable initial support for extended headers
Time to start supporting clients that request extended headers. Now we can finally reach the code added across several previous patches. Even though the NBD spec has been altered to allow us to accept NBD_CMD_READ larger than the max payload size (provided our response is a hole or broken up over more than one data chunk), we are not planning to take advantage of that, and continue to cap NBD_CMD_READ to 32M regardless of header size. For NBD_CMD_WRITE_ZEROES and NBD_CMD_TRIM, the block layer already supports 64-bit operations without any effort on our part. For NBD_CMD_BLOCK_STATUS, the client's length is a hint, and the previous patch took care of implementing the required NBD_REPLY_TYPE_BLOCK_STATUS_EXT. We do not yet support clients that want to do request payload filtering of NBD_CMD_BLOCK_STATUS; that will be added in later patches, but is not essential for qemu as a client since qemu only requests the single context base:allocation. Signed-off-by: Eric Blake Reviewed-by: Vladimir Sementsov-Ogievskiy Message-ID: <20230925192229.3186470-19-ebl...@redhat.com> --- docs/interop/nbd.txt | 1 + nbd/server.c | 21 + 2 files changed, 22 insertions(+) diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt index f5ca25174a6..9aae5e1f294 100644 --- a/docs/interop/nbd.txt +++ b/docs/interop/nbd.txt @@ -69,3 +69,4 @@ NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE NBD_CMD_FLAG_FAST_ZERO * 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth" * 7.1: NBD_FLAG_CAN_MULTI_CONN for shareable writable exports +* 8.2: NBD_OPT_EXTENDED_HEADERS diff --git a/nbd/server.c b/nbd/server.c index 183efe27bf8..d3eed6535bf 100644 --- a/nbd/server.c +++ b/nbd/server.c @@ -482,6 +482,10 @@ static int nbd_negotiate_handle_export_name(NBDClient *client, bool no_zeroes, [10 .. 133] reserved (0) [unless no_zeroes] */ trace_nbd_negotiate_handle_export_name(); +if (client->mode >= NBD_MODE_EXTENDED) { +error_setg(errp, "Extended headers already negotiated"); +return -EINVAL; +} if (client->optlen > NBD_MAX_STRING_SIZE) { error_setg(errp, "Bad length received"); return -EINVAL; @@ -1264,6 +1268,10 @@ static int nbd_negotiate_options(NBDClient *client, Error **errp) case NBD_OPT_STRUCTURED_REPLY: if (length) { ret = nbd_reject_length(client, false, errp); +} else if (client->mode >= NBD_MODE_EXTENDED) { +ret = nbd_negotiate_send_rep_err( +client, NBD_REP_ERR_EXT_HEADER_REQD, errp, +"extended headers already negotiated"); } else if (client->mode >= NBD_MODE_STRUCTURED) { ret = nbd_negotiate_send_rep_err( client, NBD_REP_ERR_INVALID, errp, @@ -1280,6 +1288,19 @@ static int nbd_negotiate_options(NBDClient *client, Error **errp) errp); break; +case NBD_OPT_EXTENDED_HEADERS: +if (length) { +ret = nbd_reject_length(client, false, errp); +} else if (client->mode >= NBD_MODE_EXTENDED) { +ret = nbd_negotiate_send_rep_err( +client, NBD_REP_ERR_INVALID, errp, +"extended headers already negotiated"); +} else { +ret = nbd_negotiate_send_rep(client, NBD_REP_ACK, errp); +client->mode = NBD_MODE_EXTENDED; +} +break; + default: ret = nbd_opt_drop(client, NBD_REP_ERR_UNSUP, errp, "Unsupported option %" PRIu32 " (%s)", -- 2.41.0
[PULL 01/15] mailmap: Fix Andrey Drobyshev author email
From: Andrey Drobyshev This fixes authorship of commits 2848289168, 52b10c9c0c as the mailing list rewrote the "From:" field in the corresponding patches. See commit 3bd2608db7 ("maint: Add .mailmap entries for patches claiming list authorship") for explanation. Signed-off-by: Andrey Drobyshev Message-ID: <20230926102801.512107-1-andrey.drobys...@virtuozzo.com> Reviewed-by: Eric Blake Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Eric Blake --- .mailmap | 1 + 1 file changed, 1 insertion(+) diff --git a/.mailmap b/.mailmap index 64ef9f4de6f..04a7feb005b 100644 --- a/.mailmap +++ b/.mailmap @@ -46,6 +46,7 @@ Ian McKellar Ian McKellar via Qemu-devel Julia Suvorova via Qemu-devel Justin Terry (VM) Justin Terry (VM) via Qemu-devel Stefan Weil Stefan Weil via +Andrey Drobyshev Andrey Drobyshev via # Next, replace old addresses by a more recent one. Aleksandar Markovic -- 2.41.0
Re: [PATCH] MAINTANERS: Split vt82c686 out of fuloong2e
On 5/10/23 20:18, BALATON Zoltan wrote: It is used by other machines not just fuloong2e so put it in a separate section and add myself as reviewer. Signed-off-by: BALATON Zoltan --- By the way, PIIX4 already has a section just above where I've added this but some files are still listed in Malta. You may want to have a look at that. MAINTAINERS | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index ea91f9e804..7f0e20fde6 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1307,10 +1307,7 @@ M: Philippe Mathieu-Daudé R: Jiaxun Yang S: Odd Fixes F: hw/mips/fuloong2e.c -F: hw/isa/vt82c686.c F: hw/pci-host/bonito.c -F: hw/usb/vt82c686-uhci-pci.c -F: include/hw/isa/vt82c686.h F: include/hw/pci-host/bonito.h F: tests/avocado/machine_mips_fuloong2e.py @@ -2462,6 +2459,14 @@ S: Maintained F: hw/isa/piix4.c F: include/hw/southbridge/piix.h +VIA south bridges (VT82C686B, VT8231) +M: Philippe Mathieu-Daudé +R: Jiaxun Yang +R: BALATON Zoltan Feel free to add yourself as maintainer if you rather. Reviewed-by: Philippe Mathieu-Daudé +F: hw/isa/vt82c686.c +F: hw/usb/vt82c686-uhci-pci.c +F: include/hw/isa/vt82c686.h + Firmware configuration (fw_cfg) M: Philippe Mathieu-Daudé R: Gerd Hoffmann
[QEMU][PATCH v1 0/7] Xen: support grant mappings.
Hi, This patch series add support for grant mappings as a pseudo RAM region for Xen. Enabling grant mappings patches(first 6) are written by Juergen in 2021. QEMU Virtio device provides an emulated backends for Virtio frontned devices in Xen. Please set "iommu_platform=on" option when invoking QEMU. As this will set VIRTIO_F_ACCESS_PLATFORM feature which will be used by virtio frontend in Xen to know whether backend supports grants or not. Regards, Vikram Juergen Gross (6): xen: when unplugging emulated devices skip virtio devices xen: add pseudo RAM region for grant mappings softmmu: let qemu_map_ram_ptr() use qemu_ram_ptr_length() xen: let xen_ram_addr_from_mapcache() return -1 in case of not found entry memory: add MemoryRegion map and unmap callbacks xen: add map and unmap callbacks for grant region Vikram Garhwal (1): hw: arm: Add grant mapping. hw/arm/xen_arm.c| 3 + hw/i386/xen/xen-hvm.c | 3 + hw/i386/xen/xen_platform.c | 8 +- hw/xen/xen-hvm-common.c | 4 +- hw/xen/xen-mapcache.c | 206 ++-- include/exec/memory.h | 21 include/exec/ram_addr.h | 1 + include/hw/xen/xen-hvm-common.h | 2 + include/hw/xen/xen_pvdev.h | 3 + include/sysemu/xen-mapcache.h | 3 + softmmu/physmem.c | 181 +--- 11 files changed, 349 insertions(+), 86 deletions(-) -- 2.17.1
[QEMU][PATCH v1 1/7] xen: when unplugging emulated devices skip virtio devices
From: Juergen Gross Virtio devices should never be unplugged at boot time, as they are similar to pci passthrough devices. Signed-off-by: Juergen Gross Signed-off-by: Vikram Garhwal --- hw/i386/xen/xen_platform.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/hw/i386/xen/xen_platform.c b/hw/i386/xen/xen_platform.c index 17457ff3de..3560eaf8c8 100644 --- a/hw/i386/xen/xen_platform.c +++ b/hw/i386/xen/xen_platform.c @@ -28,6 +28,7 @@ #include "hw/ide/pci.h" #include "hw/pci/pci.h" #include "migration/vmstate.h" +#include "hw/virtio/virtio-bus.h" #include "net/net.h" #include "trace.h" #include "sysemu/xen.h" @@ -132,7 +133,8 @@ static void unplug_nic(PCIBus *b, PCIDevice *d, void *o) /* We have to ignore passthrough devices */ if (pci_get_word(d->config + PCI_CLASS_DEVICE) == PCI_CLASS_NETWORK_ETHERNET -&& !pci_device_is_passthrough(d)) { +&& !pci_device_is_passthrough(d) +&& !qdev_get_child_bus(>qdev, TYPE_VIRTIO_BUS)) { object_unparent(OBJECT(d)); } } @@ -208,6 +210,10 @@ static void unplug_disks(PCIBus *b, PCIDevice *d, void *opaque) /* We have to ignore passthrough devices */ if (pci_device_is_passthrough(d)) return; +/* Ignore virtio devices */ +if (qdev_get_child_bus(>qdev, TYPE_VIRTIO_BUS)) { +return; +} switch (pci_get_word(d->config + PCI_CLASS_DEVICE)) { case PCI_CLASS_STORAGE_IDE: -- 2.17.1
Re: [PATCH 5/7] audio: do not use first -audiodev as default audio device
On Thu, 5 Oct 2023, Paolo Bonzini wrote: Il gio 5 ott 2023, 17:40 BALATON Zoltan ha scritto: Much better, thanks. Maybe some more small clarifications as below: === Using ``-audiodev`` to define the default audio backend (removed in 8.2) If no audiodev property is specified, previous versions would use the first ``-audiodev`` command line option as a fallback. Starting with version 8.2, audio backends created with ``-audiodev`` will only be used by clients (sound cards, machines with embedded sound hardware, VNC) machines with embedded sound hardware that can be set with the audiodev machine property -M audiodev needs to be documented in the release notes, not in removed features. The more places it's documented the better, peopla don't read docs anyway. I'm still not sure users will get it without additional explanation somewhere explicitly saying that if you now get an error with -audiodev driver then you may now need to use -audio driver instead (hopefully the error will say that) Currently the error says to add audiodev=, I can change that to propose both. That would help as likely the error will be the only thing the users will come across so if it tells them what to do that's the least annoyance. Regards, BALATON Zoltan
[PATCH] MAINTANERS: Split vt82c686 out of fuloong2e
It is used by other machines not just fuloong2e so put it in a separate section and add myself as reviewer. Signed-off-by: BALATON Zoltan --- By the way, PIIX4 already has a section just above where I've added this but some files are still listed in Malta. You may want to have a look at that. MAINTAINERS | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index ea91f9e804..7f0e20fde6 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1307,10 +1307,7 @@ M: Philippe Mathieu-Daudé R: Jiaxun Yang S: Odd Fixes F: hw/mips/fuloong2e.c -F: hw/isa/vt82c686.c F: hw/pci-host/bonito.c -F: hw/usb/vt82c686-uhci-pci.c -F: include/hw/isa/vt82c686.h F: include/hw/pci-host/bonito.h F: tests/avocado/machine_mips_fuloong2e.py @@ -2462,6 +2459,14 @@ S: Maintained F: hw/isa/piix4.c F: include/hw/southbridge/piix.h +VIA south bridges (VT82C686B, VT8231) +M: Philippe Mathieu-Daudé +R: Jiaxun Yang +R: BALATON Zoltan +F: hw/isa/vt82c686.c +F: hw/usb/vt82c686-uhci-pci.c +F: include/hw/isa/vt82c686.h + Firmware configuration (fw_cfg) M: Philippe Mathieu-Daudé R: Gerd Hoffmann -- 2.30.9
[PATCH v4] hw/isa/vt82c686: Respect SCI interrupt assignment
According to the datasheet, SCI interrupts of the power management function aren't routed through the PCI pins but rather directly to the integrated PIC. The routing is configurable through the ACPI interrupt select register at offset 0x42 in the PCI configuration space of the power management function. Signed-off-by: BALATON Zoltan --- Alternative proposal to Bernhard's patch to remove FIXME about SCI. Apply this on top of reverting commit 4e5a20b6da9 which could also be squashed in this patch but I let you decide if you want that as separete commit or part of this. I did not test this beyond compiling but I think this is all there is to it. (Overusing QOM within a chip model does not make sense as QOM is meant to allow users to introspect and configure device models and combine different models into new machines eventually without modifying QEMU but this is not applicable here within a single device model that can be considered as friend classes so don't go oveboard with it when not needed.) hw/isa/vt82c686.c | 30 +- 1 file changed, 13 insertions(+), 17 deletions(-) diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c index 8016c71315..2eb769c7fa 100644 --- a/hw/isa/vt82c686.c +++ b/hw/isa/vt82c686.c @@ -140,25 +140,21 @@ static const MemoryRegionOps pm_io_ops = { static void pm_update_sci(ViaPMState *s) { -int sci_level, pmsts; +int sci_irq, pmsts; pmsts = acpi_pm1_evt_get_sts(>ar); -sci_level = (((pmsts & s->ar.pm1.evt.en) & - (ACPI_BITMASK_RT_CLOCK_ENABLE | - ACPI_BITMASK_POWER_BUTTON_ENABLE | - ACPI_BITMASK_GLOBAL_LOCK_ENABLE | - ACPI_BITMASK_TIMER_ENABLE)) != 0); -if (pci_get_byte(s->dev.config + PCI_INTERRUPT_PIN)) { -/* - * FIXME: - * Fix device model that realizes this PM device and remove - * this work around. - * The device model should wire SCI and setup - * PCI_INTERRUPT_PIN properly. - * If PIN# = 0(interrupt pin isn't used), don't raise SCI as - * work around. - */ -pci_set_irq(>dev, sci_level); +sci_irq = pci_get_byte(s->dev.config + 0x42) & 0xf; +if (sci_irq) { +int sci_level = (((pmsts & s->ar.pm1.evt.en) & + (ACPI_BITMASK_RT_CLOCK_ENABLE | + ACPI_BITMASK_POWER_BUTTON_ENABLE | + ACPI_BITMASK_GLOBAL_LOCK_ENABLE | + ACPI_BITMASK_TIMER_ENABLE)) != 0); + +if (sci_irq == 2) { +qemu_log_mask(LOG_GUEST_ERROR, "IRQ 2 for VIA PM SCI is reserved"); +} +via_isa_set_irq(pci_get_function_0(>dev), sci_irq, sci_level); } /* schedule a timer interruption if needed */ acpi_pm_tmr_update(>ar, (s->ar.pm1.evt.en & ACPI_BITMASK_TIMER_ENABLE) && -- 2.30.9
[QEMU][PATCH v1 3/7] softmmu: let qemu_map_ram_ptr() use qemu_ram_ptr_length()
From: Juergen Gross qemu_map_ram_ptr() and qemu_ram_ptr_length() share quite some code, so modify qemu_ram_ptr_length() a little bit and use it for qemu_map_ram_ptr(), too. Signed-off-by: Juergen Gross Signed-off-by: Vikram Garhwal --- softmmu/physmem.c | 58 +++ 1 file changed, 23 insertions(+), 35 deletions(-) diff --git a/softmmu/physmem.c b/softmmu/physmem.c index e182a2fa07..6e5e379dd0 100644 --- a/softmmu/physmem.c +++ b/softmmu/physmem.c @@ -2163,38 +2163,8 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length) } #endif /* !_WIN32 */ -/* Return a host pointer to ram allocated with qemu_ram_alloc. - * This should not be used for general purpose DMA. Use address_space_map - * or address_space_rw instead. For local memory (e.g. video ram) that the - * device owns, use memory_region_get_ram_ptr. - * - * Called within RCU critical section. - */ -void *qemu_map_ram_ptr(RAMBlock *ram_block, ram_addr_t addr) -{ -RAMBlock *block = ram_block; - -if (block == NULL) { -block = qemu_get_ram_block(addr); -addr -= block->offset; -} - -if (xen_enabled() && block->host == NULL) { -/* We need to check if the requested address is in the RAM - * because we don't want to map the entire memory in QEMU. - * In that case just map until the end of the page. - */ -if (block->offset == 0) { -return xen_map_cache(addr, 0, 0, false); -} - -block->host = xen_map_cache(block->offset, block->max_length, 1, false); -} -return ramblock_ptr(block, addr); -} - -/* Return a host pointer to guest's ram. Similar to qemu_map_ram_ptr - * but takes a size argument. +/* + * Return a host pointer to guest's ram. * * Called within RCU critical section. */ @@ -2202,7 +2172,9 @@ static void *qemu_ram_ptr_length(RAMBlock *ram_block, ram_addr_t addr, hwaddr *size, bool lock) { RAMBlock *block = ram_block; -if (*size == 0) { +hwaddr len = 0; + +if (size && *size == 0) { return NULL; } @@ -2210,7 +2182,10 @@ static void *qemu_ram_ptr_length(RAMBlock *ram_block, ram_addr_t addr, block = qemu_get_ram_block(addr); addr -= block->offset; } -*size = MIN(*size, block->max_length - addr); +if (size) { +*size = MIN(*size, block->max_length - addr); +len = *size; +} if (xen_enabled() && block->host == NULL) { /* We need to check if the requested address is in the RAM @@ -2218,7 +2193,7 @@ static void *qemu_ram_ptr_length(RAMBlock *ram_block, ram_addr_t addr, * In that case just map the requested area. */ if (block->offset == 0) { -return xen_map_cache(addr, *size, lock, lock); +return xen_map_cache(addr, len, lock, lock); } block->host = xen_map_cache(block->offset, block->max_length, 1, lock); @@ -2227,6 +2202,19 @@ static void *qemu_ram_ptr_length(RAMBlock *ram_block, ram_addr_t addr, return ramblock_ptr(block, addr); } +/* + * Return a host pointer to ram allocated with qemu_ram_alloc. + * This should not be used for general purpose DMA. Use address_space_map + * or address_space_rw instead. For local memory (e.g. video ram) that the + * device owns, use memory_region_get_ram_ptr. + * + * Called within RCU critical section. + */ +void *qemu_map_ram_ptr(RAMBlock *ram_block, ram_addr_t addr) +{ +return qemu_ram_ptr_length(ram_block, addr, NULL, false); +} + /* Return the offset of a hostpointer within a ramblock */ ram_addr_t qemu_ram_block_host_offset(RAMBlock *rb, void *host) { -- 2.17.1
[QEMU][PATCH v1 7/7] hw: arm: Add grant mapping.
Enable grant ram mapping support for Xenpvh machine on ARM. Signed-off-by: Vikram Garhwal --- hw/arm/xen_arm.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/hw/arm/xen_arm.c b/hw/arm/xen_arm.c index f83b983ec5..553c289720 100644 --- a/hw/arm/xen_arm.c +++ b/hw/arm/xen_arm.c @@ -125,6 +125,9 @@ static void xen_init_ram(MachineState *machine) DPRINTF("Initialized region xen.ram.hi: base 0x%llx size 0x%lx\n", GUEST_RAM1_BASE, ram_size[1]); } + +DPRINTF("init grant ram mapping for XEN\n"); +ram_grants = *xen_init_grant_ram(); } void arch_handle_ioreq(XenIOState *state, ioreq_t *req) -- 2.17.1
[QEMU][PATCH v1 2/7] xen: add pseudo RAM region for grant mappings
From: Juergen Gross Add a memory region which can be used to automatically map granted memory. It is starting at 0x8000ULL in order to be able to distinguish it from normal RAM. For this reason the xen.ram memory region is expanded, which has no further impact as it is used just as a container of the real RAM regions and now the grant region. Signed-off-by: Juergen Gross Signed-off-by: Vikram Garhwal --- hw/i386/xen/xen-hvm.c | 3 ++ hw/xen/xen-hvm-common.c | 4 +-- hw/xen/xen-mapcache.c | 27 ++ include/exec/ram_addr.h | 1 + include/hw/xen/xen-hvm-common.h | 2 ++ include/hw/xen/xen_pvdev.h | 3 ++ include/sysemu/xen-mapcache.h | 3 ++ softmmu/physmem.c | 62 + 8 files changed, 80 insertions(+), 25 deletions(-) diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c index f42621e674..67a8a6 100644 --- a/hw/i386/xen/xen-hvm.c +++ b/hw/i386/xen/xen-hvm.c @@ -172,6 +172,9 @@ static void xen_ram_init(PCMachineState *pcms, x86ms->above_4g_mem_size); memory_region_add_subregion(sysmem, 0x1ULL, _hi); } + +/* Add grant mappings as a pseudo RAM region. */ +ram_grants = *xen_init_grant_ram(); } static XenPhysmap *get_physmapping(hwaddr start_addr, ram_addr_t size) diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c index 565dc39c8f..b7255977a5 100644 --- a/hw/xen/xen-hvm-common.c +++ b/hw/xen/xen-hvm-common.c @@ -9,7 +9,7 @@ #include "hw/boards.h" #include "hw/xen/arch_hvm.h" -MemoryRegion ram_memory; +MemoryRegion ram_memory, ram_grants; void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr, Error **errp) @@ -26,7 +26,7 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr, return; } -if (mr == _memory) { +if (mr == _memory || mr == _grants) { return; } diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c index f7d974677d..8115c44c00 100644 --- a/hw/xen/xen-mapcache.c +++ b/hw/xen/xen-mapcache.c @@ -14,7 +14,9 @@ #include +#include "hw/xen/xen-hvm-common.h" #include "hw/xen/xen_native.h" +#include "hw/xen/xen_pvdev.h" #include "qemu/bitmap.h" #include "sysemu/runstate.h" @@ -597,3 +599,28 @@ uint8_t *xen_replace_cache_entry(hwaddr old_phys_addr, mapcache_unlock(); return p; } + +MemoryRegion *xen_init_grant_ram(void) +{ +RAMBlock *block; + +memory_region_init(_grants, NULL, "xen.grants", + XEN_MAX_VIRTIO_GRANTS * XC_PAGE_SIZE); +block = g_malloc0(sizeof(*block)); +block->mr = _grants; +block->used_length = XEN_MAX_VIRTIO_GRANTS * XC_PAGE_SIZE; +block->max_length = XEN_MAX_VIRTIO_GRANTS * XC_PAGE_SIZE; +block->fd = -1; +block->page_size = XC_PAGE_SIZE; +block->host = (void *)XEN_GRANT_ADDR_OFF; +block->offset = XEN_GRANT_ADDR_OFF; +block->flags = RAM_PREALLOC; +ram_grants.ram_block = block; +ram_grants.ram = true; +ram_grants.terminates = true; +ram_block_add_list(block); +memory_region_add_subregion(get_system_memory(), XEN_GRANT_ADDR_OFF, +_grants); + +return _grants; +} diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h index 90676093f5..c0b5f9a7d0 100644 --- a/include/exec/ram_addr.h +++ b/include/exec/ram_addr.h @@ -139,6 +139,7 @@ void qemu_ram_free(RAMBlock *block); int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp); void qemu_ram_msync(RAMBlock *block, ram_addr_t start, ram_addr_t length); +void ram_block_add_list(RAMBlock *new_block); /* Clear whole block of mem */ static inline void qemu_ram_block_writeback(RAMBlock *block) diff --git a/include/hw/xen/xen-hvm-common.h b/include/hw/xen/xen-hvm-common.h index 4e9904f1a6..0d300ba898 100644 --- a/include/hw/xen/xen-hvm-common.h +++ b/include/hw/xen/xen-hvm-common.h @@ -17,6 +17,8 @@ #include extern MemoryRegion ram_memory; + +extern MemoryRegion ram_grants; extern MemoryListener xen_io_listener; extern DeviceListener xen_device_listener; diff --git a/include/hw/xen/xen_pvdev.h b/include/hw/xen/xen_pvdev.h index ddad4b9f36..0f1b5edfa9 100644 --- a/include/hw/xen/xen_pvdev.h +++ b/include/hw/xen/xen_pvdev.h @@ -80,4 +80,7 @@ int xen_pv_send_notify(struct XenLegacyDevice *xendev); void xen_pv_printf(struct XenLegacyDevice *xendev, int msg_level, const char *fmt, ...) G_GNUC_PRINTF(3, 4); +#define XEN_GRANT_ADDR_OFF0x8000ULL +#define XEN_MAX_VIRTIO_GRANTS 65536 + #endif /* QEMU_HW_XEN_PVDEV_H */ diff --git a/include/sysemu/xen-mapcache.h b/include/sysemu/xen-mapcache.h index c8e7c2f6cf..f4bedb1c11 100644 --- a/include/sysemu/xen-mapcache.h +++ b/include/sysemu/xen-mapcache.h @@ -10,6 +10,7 @@ #define XEN_MAPCACHE_H #include "exec/cpu-common.h" +#include "exec/ram_addr.h" typedef hwaddr
[QEMU][PATCH v1 6/7] xen: add map and unmap callbacks for grant region
From: Juergen Gross Add the callbacks for mapping/unmapping guest memory via grants to the special grant memory region. Signed-off-by: Juergen Gross Signed-off-by: Vikram Garhwal --- hw/xen/xen-mapcache.c | 167 +- softmmu/physmem.c | 11 ++- 2 files changed, 173 insertions(+), 5 deletions(-) diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c index 8a61c7dde6..52844a6a9d 100644 --- a/hw/xen/xen-mapcache.c +++ b/hw/xen/xen-mapcache.c @@ -9,6 +9,8 @@ */ #include "qemu/osdep.h" +#include "qemu/queue.h" +#include "qemu/thread.h" #include "qemu/units.h" #include "qemu/error-report.h" @@ -23,6 +25,8 @@ #include "sysemu/xen-mapcache.h" #include "trace.h" +#include +#include //#define MAPCACHE_DEBUG @@ -385,7 +389,7 @@ uint8_t *xen_map_cache(hwaddr phys_addr, hwaddr size, return p; } -ram_addr_t xen_ram_addr_from_mapcache(void *ptr) +static ram_addr_t xen_ram_addr_from_mapcache_try(void *ptr) { MapCacheEntry *entry = NULL; MapCacheRev *reventry; @@ -594,10 +598,170 @@ uint8_t *xen_replace_cache_entry(hwaddr old_phys_addr, return p; } +struct XENMappedGrantRegion { +void *addr; +unsigned int pages; +unsigned int refs; +unsigned int prot; +uint32_t idx; +QLIST_ENTRY(XENMappedGrantRegion) list; +}; + +static xengnttab_handle *xen_region_gnttabdev; +static QLIST_HEAD(GrantRegionList, XENMappedGrantRegion) xen_grant_mappings = +QLIST_HEAD_INITIALIZER(xen_grant_mappings); +static QemuMutex xen_map_mutex; + +static void *xen_map_grant_dyn(MemoryRegion **mr, hwaddr addr, hwaddr *plen, + bool is_write, MemTxAttrs attrs) +{ +unsigned int page_off = addr & (XC_PAGE_SIZE - 1); +unsigned int i; +unsigned int nrefs = (page_off + *plen + XC_PAGE_SIZE - 1) >> XC_PAGE_SHIFT; +uint32_t ref = (addr - XEN_GRANT_ADDR_OFF) >> XC_PAGE_SHIFT; +uint32_t *refs = NULL; +unsigned int prot = PROT_READ; +struct XENMappedGrantRegion *mgr = NULL; + +if (is_write) { +prot |= PROT_WRITE; +} + +qemu_mutex_lock(_map_mutex); + +QLIST_FOREACH(mgr, _grant_mappings, list) { +if (mgr->idx == ref && +mgr->pages == nrefs && +(mgr->prot & prot) == prot) { +break; +} +} +if (!mgr) { +mgr = g_new(struct XENMappedGrantRegion, 1); + +if (nrefs == 1) { +refs = +} else { +refs = g_new(uint32_t, nrefs); +for (i = 0; i < nrefs; i++) { +refs[i] = ref + i; +} +} +mgr->addr = xengnttab_map_domain_grant_refs(xen_region_gnttabdev, nrefs, +xen_domid, refs, prot); +if (mgr->addr) { +mgr->pages = nrefs; +mgr->refs = 1; +mgr->prot = prot; +mgr->idx = ref; + +QLIST_INSERT_HEAD(_grant_mappings, mgr, list); +} else { +g_free(mgr); +mgr = NULL; +} +} else { +mgr->refs++; +} + +qemu_mutex_unlock(_map_mutex); + +if (nrefs > 1) { +g_free(refs); +} + +return mgr ? mgr->addr + page_off : NULL; +} + +static void xen_unmap_grant_dyn(MemoryRegion *mr, void *buffer, ram_addr_t addr, +hwaddr len, bool is_write, hwaddr access_len) +{ +unsigned int page_off = (unsigned long)buffer & (XC_PAGE_SIZE - 1); +unsigned int nrefs = (page_off + len + XC_PAGE_SIZE - 1) >> XC_PAGE_SHIFT; +unsigned int prot = PROT_READ; +struct XENMappedGrantRegion *mgr = NULL; + +if (is_write) { +prot |= PROT_WRITE; +} + +qemu_mutex_lock(_map_mutex); + +QLIST_FOREACH(mgr, _grant_mappings, list) { +if (mgr->addr == buffer - page_off && +mgr->pages == nrefs && +(mgr->prot & prot) == prot) { +break; +} +} +if (mgr) { +mgr->refs--; +if (!mgr->refs) { +xengnttab_unmap(xen_region_gnttabdev, mgr->addr, nrefs); + +QLIST_REMOVE(mgr, list); +g_free(mgr); +} +} else { +error_report("xen_unmap_grant_dyn() trying to unmap unknown buffer"); +} + +qemu_mutex_unlock(_map_mutex); +} + +static ram_addr_t xen_ram_addr_from_grant_cache(void *ptr) +{ +unsigned int page_off = (unsigned long)ptr & (XC_PAGE_SIZE - 1); +struct XENMappedGrantRegion *mgr = NULL; +ram_addr_t raddr = RAM_ADDR_INVALID; + +qemu_mutex_lock(_map_mutex); + +QLIST_FOREACH(mgr, _grant_mappings, list) { +if (mgr->addr == ptr - page_off) { +break; +} +} + +if (mgr) { +raddr = (mgr->idx << XC_PAGE_SHIFT) + page_off + XEN_GRANT_ADDR_OFF; +} + +qemu_mutex_unlock(_map_mutex); + +return raddr; +} + +ram_addr_t xen_ram_addr_from_mapcache(void *ptr) +{ +ram_addr_t raddr; + +raddr =
[QEMU][PATCH v1 4/7] xen: let xen_ram_addr_from_mapcache() return -1 in case of not found entry
From: Juergen Gross Today xen_ram_addr_from_mapcache() will either abort() or return 0 in case it can't find a matching entry for a pointer value. Both cases are bad, so change that to return an invalid address instead. Signed-off-by: Juergen Gross --- hw/xen/xen-mapcache.c | 12 +++- 1 file changed, 3 insertions(+), 9 deletions(-) diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c index 8115c44c00..8a61c7dde6 100644 --- a/hw/xen/xen-mapcache.c +++ b/hw/xen/xen-mapcache.c @@ -404,13 +404,8 @@ ram_addr_t xen_ram_addr_from_mapcache(void *ptr) } } if (!found) { -fprintf(stderr, "%s, could not find %p\n", __func__, ptr); -QTAILQ_FOREACH(reventry, >locked_entries, next) { -DPRINTF(" "HWADDR_FMT_plx" -> %p is present\n", reventry->paddr_index, -reventry->vaddr_req); -} -abort(); -return 0; +mapcache_unlock(); +return RAM_ADDR_INVALID; } entry = >entry[paddr_index % mapcache->nr_buckets]; @@ -418,8 +413,7 @@ ram_addr_t xen_ram_addr_from_mapcache(void *ptr) entry = entry->next; } if (!entry) { -DPRINTF("Trying to find address %p that is not in the mapcache!\n", ptr); -raddr = 0; +raddr = RAM_ADDR_INVALID; } else { raddr = (reventry->paddr_index << MCACHE_BUCKET_SHIFT) + ((unsigned long) ptr - (unsigned long) entry->vaddr_base); -- 2.17.1
[QEMU][PATCH v1 5/7] memory: add MemoryRegion map and unmap callbacks
From: Juergen Gross In order to support mapping and unmapping guest memory dynamically to and from qemu during address_space_[un]map() operations add the map() and unmap() callbacks to MemoryRegionOps. Those will be used e.g. for Xen grant mappings when performing guest I/Os. Signed-off-by: Juergen Gross Signed-off-by: Vikram Garhwal --- include/exec/memory.h | 21 ++ softmmu/physmem.c | 50 +-- 2 files changed, 60 insertions(+), 11 deletions(-) diff --git a/include/exec/memory.h b/include/exec/memory.h index c99842d2fc..f3c62d2883 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -280,6 +280,27 @@ struct MemoryRegionOps { unsigned size, MemTxAttrs attrs); +/* + * Dynamically create mapping. @addr is the guest address to map; @plen + * is the pointer to the usable length of the buffer. + * @mr contents can be changed in case a new memory region is created for + * the mapping. + * Returns the buffer address for accessing the data. + */ +void *(*map)(MemoryRegion **mr, + hwaddr addr, + hwaddr *plen, + bool is_write, + MemTxAttrs attrs); + +/* Unmap an area obtained via map() before. */ +void (*unmap)(MemoryRegion *mr, + void *buffer, + ram_addr_t addr, + hwaddr len, + bool is_write, + hwaddr access_len); + enum device_endian endianness; /* Guest-visible constraints: */ struct { diff --git a/softmmu/physmem.c b/softmmu/physmem.c index 6e5e379dd0..5f425bea1c 100644 --- a/softmmu/physmem.c +++ b/softmmu/physmem.c @@ -3135,6 +3135,7 @@ void *address_space_map(AddressSpace *as, hwaddr len = *plen; hwaddr l, xlat; MemoryRegion *mr; +void *ptr = NULL; FlatView *fv; if (len == 0) { @@ -3168,12 +3169,20 @@ void *address_space_map(AddressSpace *as, return bounce.buffer; } - memory_region_ref(mr); + +if (mr->ops && mr->ops->map) { +ptr = mr->ops->map(, addr, plen, is_write, attrs); +} + *plen = flatview_extend_translation(fv, addr, len, mr, xlat, l, is_write, attrs); fuzz_dma_read_cb(addr, *plen, mr); -return qemu_ram_ptr_length(mr->ram_block, xlat, plen, true); +if (ptr == NULL) { +ptr = qemu_ram_ptr_length(mr->ram_block, xlat, plen, true); +} + +return ptr; } /* Unmaps a memory region previously mapped by address_space_map(). @@ -3189,11 +3198,16 @@ void address_space_unmap(AddressSpace *as, void *buffer, hwaddr len, mr = memory_region_from_host(buffer, ); assert(mr != NULL); -if (is_write) { -invalidate_and_set_dirty(mr, addr1, access_len); -} -if (xen_enabled()) { -xen_invalidate_map_cache_entry(buffer); + +if (mr->ops && mr->ops->unmap) { +mr->ops->unmap(mr, buffer, addr1, len, is_write, access_len); +} else { +if (is_write) { +invalidate_and_set_dirty(mr, addr1, access_len); +} +if (xen_enabled()) { +xen_invalidate_map_cache_entry(buffer); +} } memory_region_unref(mr); return; @@ -3266,10 +3280,18 @@ int64_t address_space_cache_init(MemoryRegionCache *cache, * doing this if we found actual RAM, which behaves the same * regardless of attributes; so UNSPECIFIED is fine. */ +if (mr->ops && mr->ops->map) { +cache->ptr = mr->ops->map(, addr, , is_write, + MEMTXATTRS_UNSPECIFIED); +} + l = flatview_extend_translation(cache->fv, addr, len, mr, cache->xlat, l, is_write, MEMTXATTRS_UNSPECIFIED); -cache->ptr = qemu_ram_ptr_length(mr->ram_block, cache->xlat, , true); +if (!cache->ptr) { +cache->ptr = qemu_ram_ptr_length(mr->ram_block, cache->xlat, , + true); +} } else { cache->ptr = NULL; } @@ -3291,14 +3313,20 @@ void address_space_cache_invalidate(MemoryRegionCache *cache, void address_space_cache_destroy(MemoryRegionCache *cache) { -if (!cache->mrs.mr) { +MemoryRegion *mr = cache->mrs.mr; + +if (!mr) { return; } -if (xen_enabled()) { +if (mr->ops && mr->ops->unmap) { +mr->ops->unmap(mr, cache->ptr, cache->xlat, cache->len, + cache->is_write, cache->len); +} else if (xen_enabled()) { xen_invalidate_map_cache_entry(cache->ptr); } -memory_region_unref(cache->mrs.mr); + +memory_region_unref(mr); flatview_unref(cache->fv);
Re: [PATCH v4 0/8] vhost-user: Back-end state migration
On Wed, Oct 04, 2023 at 02:58:56PM +0200, Hanna Czenczek wrote: > RFC: > https://lists.nongnu.org/archive/html/qemu-devel/2023-03/msg04263.html > > v1: > https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg01575.html > > v2: > https://lists.nongnu.org/archive/html/qemu-devel/2023-07/msg02604.html > > v3: > https://lists.nongnu.org/archive/html/qemu-devel/2023-09/msg03750.html > > > Based-on: <20231004014532.1228637-1-stefa...@redhat.com> > ([PATCH v2 0/3] vhost: clean up device reset) > > > Hi, > > This v4 includes largely unchanged patches from v3. The main > addition/change is what came out of the discussion between Stefan and me > around how to proceed without SUSPEND/RESUME, which is that this series > is now based on his reset fix, and it includes more documentation > changes. This looks good. I posted some minor comments on the new patches. Stefan > > Changes in detail: > > - Patch 1: Fall-out from the reset fix: Currently, the status byte is > effectively unused (qemu only uses it for resetting, which all > back-ends ignore; DPDK uses it to announce potential feature > negotiation failure, which qemu ignores). It is also not defined what > exactly front-end or back-end should do with this byte, except > pointing at the virtio spec, which however naturally does not say how > this integrates with vhost-user’s RESET_DEVICE or [GS]ET_FEATURES. > Furthermore, there does not seem to be a use for this; we have > RESET_DEVICE for resetting, and we have [GS]ET_FEATURES (and > REPLY_ACK, which can be used on SET_FEATURES) for feature > negotation. > Therefore, deprecate the status byte, pointing to those other commands > instead. > > - Patch 2: Patch 4 defines a suspended state for the whole back-end if > all vrings are stopped. I think this should be mentioned in > GET_VRING_BASE, but upon trying to add it, I found that it does not > even mention that it stops the vring (mentioned only in the Ring > States section), and remembered that the whole description of both > GET_VRING_BASE and SET_VRING_BASE really was not helpful when trying > to implement a vhost-user back-end. Took the opportunity to overhaul > both. > > - Patch 3: This one’s from v3, but quite heavily modified. Stefan > suggested consistently defining the started/stopped and > enabled/disabled states to be independent, and indeed doing so > simplifies a whole lot of stuff. Specifically, it makes the magic > “enabled/disabled when started” go away. Basically, I found this > change alone is enough to remove the confusion I had with the existing > documentation. > > - Patch 4: As suggested by Stefan, just define a suspended state without > introducing SUSPEND. vDPA needs SUSPEND because its GET_VRING_BASE > does not stop the vring, but vhost-user’s does, so we can define the > suspended state to be when all vrings are stopped. > > - Patch 5: Reference the suspended state. > > - Patches 6 through 8: Unmodified, except for them being rebase on > Stefan’s series. > > > Hanna Czenczek (8): > vhost-user.rst: Deprecate [GS]ET_STATUS > vhost-user.rst: Improve [GS]ET_VRING_BASE doc > vhost-user.rst: Clarify enabling/disabling vrings > vhost-user.rst: Introduce suspended state > vhost-user.rst: Migrating back-end-internal state > vhost-user: Interface for migration state transfer > vhost: Add high-level state save/load functions > vhost-user-fs: Implement internal migration > > docs/interop/vhost-user.rst | 318 +++--- > include/hw/virtio/vhost-backend.h | 24 +++ > include/hw/virtio/vhost.h | 113 +++ > hw/virtio/vhost-user-fs.c | 101 +- > hw/virtio/vhost-user.c| 148 ++ > hw/virtio/vhost.c | 241 ++ > 6 files changed, 917 insertions(+), 28 deletions(-) > > -- > 2.41.0 > signature.asc Description: PGP signature
Re: [PATCH v4 6/8] vhost-user: Interface for migration state transfer
On Wed, Oct 04, 2023 at 02:59:02PM +0200, Hanna Czenczek wrote: > Add the interface for transferring the back-end's state during migration > as defined previously in vhost-user.rst. > > Reviewed-by: Stefan Hajnoczi > Signed-off-by: Hanna Czenczek > --- > include/hw/virtio/vhost-backend.h | 24 + > include/hw/virtio/vhost.h | 78 > hw/virtio/vhost-user.c| 148 ++ > hw/virtio/vhost.c | 37 > 4 files changed, 287 insertions(+) Reviewed-by: Stefan Hajnoczi signature.asc Description: PGP signature
Re: [PATCH v4 8/8] vhost-user-fs: Implement internal migration
On Wed, Oct 04, 2023 at 02:59:04PM +0200, Hanna Czenczek wrote: > A virtio-fs device's VM state consists of: > - the virtio device (vring) state (VMSTATE_VIRTIO_DEVICE) > - the back-end's (virtiofsd's) internal state > > We get/set the latter via the new vhost operations to transfer migratory > state. It is its own dedicated subsection, so that for external > migration, it can be disabled. > > Reviewed-by: Stefan Hajnoczi > Signed-off-by: Hanna Czenczek > --- > hw/virtio/vhost-user-fs.c | 101 +- > 1 file changed, 100 insertions(+), 1 deletion(-) Reviewed-by: Stefan Hajnoczi signature.asc Description: PGP signature
Re: [PATCH v4 7/8] vhost: Add high-level state save/load functions
On Wed, Oct 04, 2023 at 02:59:03PM +0200, Hanna Czenczek wrote: > vhost_save_backend_state() and vhost_load_backend_state() can be used by > vhost front-ends to easily save and load the back-end's state to/from > the migration stream. > > Because we do not know the full state size ahead of time, > vhost_save_backend_state() simply reads the data in 1 MB chunks, and > writes each chunk consecutively into the migration stream, prefixed by > its length. EOF is indicated by a 0-length chunk. > > Reviewed-by: Stefan Hajnoczi > Signed-off-by: Hanna Czenczek > --- > include/hw/virtio/vhost.h | 35 +++ > hw/virtio/vhost.c | 204 ++ > 2 files changed, 239 insertions(+) Reviewed-by: Stefan Hajnoczi signature.asc Description: PGP signature
Re: [PATCH v4 5/8] vhost-user.rst: Migrating back-end-internal state
On Wed, Oct 04, 2023 at 02:59:01PM +0200, Hanna Czenczek wrote: > For vhost-user devices, qemu can migrate the virtio state, but not the > back-end's internal state. To do so, we need to be able to transfer > this internal state between front-end (qemu) and back-end. > > At this point, this new feature is added for the purpose of virtio-fs > migration. Because virtiofsd's internal state will not be too large, we > believe it is best to transfer it as a single binary blob after the > streaming phase. > > These are the additions to the protocol: > - New vhost-user protocol feature VHOST_USER_PROTOCOL_F_DEVICE_STATE > - SET_DEVICE_STATE_FD function: Front-end and back-end negotiate a file > descriptor over which to transfer the state. > - CHECK_DEVICE_STATE: After the state has been transferred through the > file descriptor, the front-end invokes this function to verify > success. There is no in-band way (through the file descriptor) to > indicate failure, so we need to check explicitly. > > Once the transfer FD has been established via SET_DEVICE_STATE_FD > (which includes establishing the direction of transfer and migration > phase), the sending side writes its data into it, and the reading side > reads it until it sees an EOF. Then, the front-end will check for > success via CHECK_DEVICE_STATE, which on the destination side includes > checking for integrity (i.e. errors during deserialization). > > Signed-off-by: Hanna Czenczek > --- > docs/interop/vhost-user.rst | 172 > 1 file changed, 172 insertions(+) Reviewed-by: Stefan Hajnoczi signature.asc Description: PGP signature
Re: [PATCH v4 4/8] vhost-user.rst: Introduce suspended state
On Wed, Oct 04, 2023 at 02:59:00PM +0200, Hanna Czenczek wrote: > In vDPA, GET_VRING_BASE does not stop the queried vring, which is why > SUSPEND was introduced so that the returned index would be stable. In > vhost-user, it does stop the vring, so under the same reasoning, it can > get away without SUSPEND. > > Still, we do want to clarify that if the device is completely stopped, > i.e. all vrings are stopped, the back-end should cease to modify any > state relating to the guest. Do this by calling it "suspended". > > Suggested-by: Stefan Hajnoczi > Signed-off-by: Hanna Czenczek > --- > docs/interop/vhost-user.rst | 20 +++- > 1 file changed, 19 insertions(+), 1 deletion(-) Reviewed-by: Stefan Hajnoczi signature.asc Description: PGP signature
Re: [PATCH v4 3/8] vhost-user.rst: Clarify enabling/disabling vrings
On Wed, Oct 04, 2023 at 02:58:59PM +0200, Hanna Czenczek wrote: > Currently, the vhost-user documentation says that rings are to be > initialized in a disabled state when VHOST_USER_F_PROTOCOL_FEATURES is > negotiated. However, by the time of feature negotiation, all rings have > already been initialized, so it is not entirely clear what this means. > > At least the vhost-user-backend Rust crate's implementation interpreted > it to mean that whenever this feature is negotiated, all rings are to > put into a disabled state, which means that every SET_FEATURES call > would disable all rings, effectively halting the device. This is > problematic because the VHOST_F_LOG_ALL feature is also set or cleared > this way, which happens during migration. Doing so should not halt the > device. > > Other implementations have interpreted this to mean that the device is > to be initialized with all rings disabled, and a subsequent SET_FEATURES > call that does not set VHOST_USER_F_PROTOCOL_FEATURES will enable all of > them. Here, SET_FEATURES will never disable any ring. > > This interpretation does not suffer the problem of unintentionally > halting the device whenever features are set or cleared, so it seems > better and more reasonable. > > We can clarify this in the documentation by making it explicit that the > enabled/disabled state is tracked even while the vring is stopped. > Every vring is initialized in a disabled state, and SET_FEATURES without > VHOST_USER_F_PROTOCOL_FEATURES simply becomes one way to enable all > vrings. > > Signed-off-by: Hanna Czenczek > --- > docs/interop/vhost-user.rst | 32 +--- > 1 file changed, 17 insertions(+), 15 deletions(-) Reviewed-by: Stefan Hajnoczi signature.asc Description: PGP signature
[PATCH 0/2] topic: meson: add more compiler hardening flags
This brings more compiler hardening flags to the default QEMU build process. The proposed flags have already been adopted by default in the kernel build process. At some point it is hoped that distros might enable them globally, as they've done in the past with things like _FORTIFY_SOURCE. Meanwhile they are easy things to enable in QEMU which have negligible cost and clear benefits to hardening. Considering QEMU shows no signs of stoppping the flow of guest triggerable CVEs, investing in hardening is worthwhile. See the respective commit messages for details I also tested enabling -ftrapv, to change signed integer overflow from wrapping, to trapping instead. This exposed a bug in the string-input-visitor which overflows when parsing ranges, and exposed the test-int128 code as (harmlessly) overflowing during its testing. Both can be fixed, but I'm not entirely sure whether -ftrapv is viable or not. I was wondering about TCG and whether it has a need to intentionally allow integer overflow for any of its instruction emulation requirements ? "make check" passes qtest but that's not sufficiently broad to make me comfortable. Thus I've not included an -ftrapv patch here. Daniel P. Berrangé (2): meson: mitigate against ROP exploits with -fzero-call-used-regs meson: mitigate against use of uninitialize stack for exploits meson.build | 16 1 file changed, 16 insertions(+) -- 2.41.0
Re: [PATCH v4 2/8] vhost-user.rst: Improve [GS]ET_VRING_BASE doc
On Wed, Oct 04, 2023 at 02:58:58PM +0200, Hanna Czenczek wrote: > GET_VRING_BASE does not mention that it stops the respective ring. Fix > that. > > Furthermore, it is not fully clear what the "base offset" these > commands' documentation refers to is; an offset could be many things. > Be more precise and verbose about it, especially given that these > commands use different payload structures depending on whether the vring > is split or packed. > > Signed-off-by: Hanna Czenczek > --- > docs/interop/vhost-user.rst | 66 ++--- > 1 file changed, 62 insertions(+), 4 deletions(-) > > diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst > index 2f68e67a1a..50f5acebe5 100644 > --- a/docs/interop/vhost-user.rst > +++ b/docs/interop/vhost-user.rst > @@ -108,6 +108,37 @@ A vring state description > > :num: a 32-bit number > > +A vring descriptor index for split virtqueues > +^ > + > ++-+-+ > +| vring index | index in avail ring | > ++-+-+ > + > +:vring index: 32-bit index of the respective virtqueue > + > +:index in avail ring: 32-bit value, of which currently only the lower 16 > + bits are used: > + > + - Bits 0–15: Next descriptor index in the *Available Ring* I think we need to say more to make this implementable just by reading the spec: Index of the next *Available Ring* descriptor that the back-end will process. This is a free-running index that is not wrapped by the ring size. Feel free to rephrase. > + - Bits 16–31: Reserved (set to zero) > + > +Vring descriptor indices for packed virtqueues > +^^ > + > ++-++ > +| vring index | descriptor indices | > ++-++ > + > +:vring index: 32-bit index of the respective virtqueue > + > +:descriptor indices: 32-bit value: > + > + - Bits 0–14: Index in the *Available Ring* Same here. > + - Bit 15: Driver (Available) Ring Wrap Counter > + - Bits 16–30: Index in the *Used Ring* Same here. > + - Bit 31: Device (Used) Ring Wrap Counter > + > A vring address description > ^^^ > > @@ -1031,18 +1062,45 @@ Front-end message types > ``VHOST_USER_SET_VRING_BASE`` >:id: 10 >:equivalent ioctl: ``VHOST_SET_VRING_BASE`` > - :request payload: vring state description > + :request payload: vring descriptor index/indices >:reply payload: N/A > > - Sets the base offset in the available vring. > + Sets the next index to use for descriptors in this vring: > + > + * For a split virtqueue, sets only the next descriptor index in the > +*Available Ring*. The device is supposed to read the next index in > +the *Used Ring* from the respective vring structure in guest memory. > + > + * For a packed virtqueue, both indices are supplied, as they are not > +explicitly available in memory. > + > + Consequently, the payload type is specific to the type of virt queue > + (*a vring descriptor index for split virtqueues* vs. *vring descriptor > + indices for packed virtqueues*). > > ``VHOST_USER_GET_VRING_BASE`` >:id: 11 >:equivalent ioctl: ``VHOST_USER_GET_VRING_BASE`` >:request payload: vring state description > - :reply payload: vring state description > + :reply payload: vring descriptor index/indices > + > + Stops the vring and returns the current descriptor index or indices: > + > +* For a split virtqueue, returns only the 16-bit next descriptor > + index in the *Available Ring*. The index in the *Used Ring* is > + controlled by the guest driver and can be read from the vring I find "is controlled by the guest driver" confusing. The device writes the Used Ring index. The driver only reads it. The device is the active party here. The sentence can be shortened to omit the "controlled by the guest driver" part. > + structure in memory, so is not covered. > + > +* For a packed virtqueue, neither index is explicitly available to > + read from memory, so both indices (as maintained by the device) are > + returned. > + > + Consequently, the payload type is specific to the type of virt queue > + (*a vring descriptor index for split virtqueues* vs. *vring descriptor > + indices for packed virtqueues*). > > - Get the available vring base offset. > + The request payload’s *num* field is currently reserved and must be > + set to 0. > > ``VHOST_USER_SET_VRING_KICK`` >:id: 12 > -- > 2.41.0 > signature.asc Description: PGP signature
[PATCH 1/2] meson: mitigate against ROP exploits with -fzero-call-used-regs
To quote wikipedia: "Return-oriented programming (ROP) is a computer security exploit technique that allows an attacker to execute code in the presence of security defenses such as executable space protection and code signing. In this technique, an attacker gains control of the call stack to hijack program control flow and then executes carefully chosen machine instruction sequences that are already present in the machine's memory, called "gadgets". Each gadget typically ends in a return instruction and is located in a subroutine within the existing program and/or shared library code. Chained together, these gadgets allow an attacker to perform arbitrary operations on a machine employing defenses that thwart simpler attacks." QEMU is by no means perfect with an ever growing set of CVEs from flawed hardware device emulation, which could potentially be exploited using ROP techniques. Since GCC 11 there has been a compiler option that can mitigate against this exploit technique: -fzero-call-user-regs To understand it refer to these two resources: https://www.jerkeby.se/newsletter/posts/rop-reduction-zero-call-user-regs/ https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552262.html I used two programs to scan qemu-system-x86_64 for ROP gadgets: https://github.com/0vercl0k/rp https://github.com/JonathanSalwan/ROPgadget When asked to find 8 byte gadgets, the 'rp' tool reports: A total of 440278 gadgets found. You decided to keep only the unique ones, 156143 unique gadgets found. While the ROPgadget tool reports: Unique gadgets found: 353122 With the --ropchain argument, the latter attempts to use the found gadgets to product a chain that can execute arbitrary syscalls. With current QEMU it succeeds in this task, which is an undesirable situation. With QEMU modified to use -fzero-call-user-regs=used-gpr the 'rp' tool reports A total of 528991 gadgets found. You decided to keep only the unique ones, 121128 unique gadgets found. This is 22% fewer unique gadgets While the ROPgadget tool reports: Unique gadgets found: 328605 This is 7% fewer unique gadgets. Crucially though, despite this more modest reduction, the ROPgadget tool is no longer able to identify a chain of gadgets for executing arbitrary syscalls. It fails at the very first step, unable to find gadgets for populating registers for a future syscall. Having said that, more advanced tools do still manage to put together a viable ROP chain. Also this only takes into account QEMU code. QEMU links to many 3rd party shared libraries and ideally all of them would be compiled with this same hardening. That becomes a distro policy question though. In terms of performance impact, TCG was used as an evaluation test case. We're not interested in protecting TCG since it isn't designed to provide a security barrier, but it is performance sensitive code, so useful as a guide to how other areas of QEMU might be impacted. With the -fzero-call-user-regs=used-gpr argument present, using the real world test of booting a linux kernel and having init immediately poweroff, there is a ~1% slow down in performance under TCG. The QEMU binary size also grows by approximately 1%. By comparison, using the more aggressive -fzero-call-user-regs=all, results in a slowdown of over 25% in TCG, which is clearly not an acceptable impact, and a binary size increase of 5%. Considering that 'used-gpr' succesfully stopped ROPgadget assembling a chain, this more targetted protection is a justifiable hardening / performance tradeoff. Signed-off-by: Daniel P. Berrangé --- meson.build | 11 +++ 1 file changed, 11 insertions(+) diff --git a/meson.build b/meson.build index 20ceeb8158..2003ca1ba4 100644 --- a/meson.build +++ b/meson.build @@ -435,6 +435,17 @@ if get_option('fuzzing') endif endif +# Check further flags that make QEMU more robust against malicious parties + +hardening_flags = [ +# Zero out registers used during a function call +# upon its return. This makes it harder to assemble +# ROP gadgets into something usable +'-fzero-call-used-regs=used-gpr', +] + +qemu_common_flags += cc.get_supported_arguments(hardening_flags) + add_global_arguments(qemu_common_flags, native: false, language: all_languages) add_global_link_arguments(qemu_ldflags, native: false, language: all_languages) -- 2.41.0
[PATCH 2/2] meson: mitigate against use of uninitialize stack for exploits
When variables are used without being initialized, there is potential to take advantage of data that was pre-existing on the stack from an earlier call, to drive an exploit. It is good practice to always initialize variables, and the compiler can warn about flaws when -Wuninitialized is present. This warning, however, is by no means foolproof with its output varying depending on compiler version and which optimizations are enabled. The -ftrivial-auto-var-init option can be used to tell the compiler to always initialize all variables. This increases the security and predictability of the program, closing off certain attack vectors, reducing the risk of unsafe memory disclosure. While the option takes several possible values, using 'zero' is considered to be the option that is likely to lead to semantically correct or safe behaviour[1]. eg sizes/indexes are not likely to lead to out-of-bounds accesses when initialized to zero. Pointers are less likely to point something useful if initialized to zero. Even with -ftrivial-auto-var-init=zero set, GCC will still issue warnings with -Wuninitialized if it discovers a problem, so we are not loosing diagnostics for developers, just hardening runtime behaviour and making QEMU behave more predictably in case of hitting bad codepaths. [1] https://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html Signed-off-by: Daniel P. Berrangé --- meson.build | 5 + 1 file changed, 5 insertions(+) diff --git a/meson.build b/meson.build index 2003ca1ba4..19faea8d30 100644 --- a/meson.build +++ b/meson.build @@ -442,6 +442,11 @@ hardening_flags = [ # upon its return. This makes it harder to assemble # ROP gadgets into something usable '-fzero-call-used-regs=used-gpr', + +# Initialize all stack variables to zero. This makes +# it harder to take advantage of uninitialized stack +# data to drive exploits +'-ftrivial-var-auto-init=zero', ] qemu_common_flags += cc.get_supported_arguments(hardening_flags) -- 2.41.0
[PATCH v25 19/21] tests/avocado: s390x cpu topology test socket full
From: Pierre Morel This test verifies that QMP set-cpu-topology does not accept to overload a socket. Signed-off-by: Pierre Morel Reviewed-by: Thomas Huth --- tests/avocado/s390_topology.py | 26 ++ 1 file changed, 26 insertions(+) diff --git a/tests/avocado/s390_topology.py b/tests/avocado/s390_topology.py index 3661048f4c..a63c2b2923 100644 --- a/tests/avocado/s390_topology.py +++ b/tests/avocado/s390_topology.py @@ -338,3 +338,29 @@ def test_dedicated(self): self.guest_set_dispatching('0'); self.check_topology(0, 0, 0, 0, 'high', True) self.check_polarization("horizontal") + + +def test_socket_full(self): +""" +This test verifies that QEMU does not accept to overload a socket. +The socket-id 0 on book-id 0 already contains CPUs 0 and 1 and can +not accept any new CPU while socket-id 0 on book-id 1 is free. + +:avocado: tags=arch:s390x +:avocado: tags=machine:s390-ccw-virtio +""" +self.kernel_init() +self.vm.add_args('-smp', + '3,drawers=2,books=2,sockets=3,cores=2,maxcpus=24') +self.vm.launch() +self.wait_until_booted() + +self.system_init() + +res = self.vm.qmp('set-cpu-topology', + {'core-id': 2, 'socket-id': 0, 'book-id': 0}) +self.assertEqual(res['error']['class'], 'GenericError') + +res = self.vm.qmp('set-cpu-topology', + {'core-id': 2, 'socket-id': 0, 'book-id': 1}) +self.assertEqual(res['return'], {}) -- 2.39.2
[PATCH v25 20/21] tests/avocado: s390x cpu topology dedicated errors
From: Pierre Morel Let's test that QEMU refuses to setup a dedicated CPU with low or medium entitlement. Signed-off-by: Pierre Morel Reviewed-by: Thomas Huth --- tests/avocado/s390_topology.py | 48 ++ 1 file changed, 48 insertions(+) diff --git a/tests/avocado/s390_topology.py b/tests/avocado/s390_topology.py index a63c2b2923..d3e6556c0f 100644 --- a/tests/avocado/s390_topology.py +++ b/tests/avocado/s390_topology.py @@ -364,3 +364,51 @@ def test_socket_full(self): res = self.vm.qmp('set-cpu-topology', {'core-id': 2, 'socket-id': 0, 'book-id': 1}) self.assertEqual(res['return'], {}) + +def test_dedicated_error(self): +""" +This test verifies that QEMU refuses to lower the entitlement +of a dedicated CPU + +:avocado: tags=arch:s390x +:avocado: tags=machine:s390-ccw-virtio +""" +self.kernel_init() +self.vm.launch() +self.wait_until_booted() + +self.system_init() + +res = self.vm.qmp('set-cpu-topology', + {'core-id': 0, 'dedicated': True}) +self.assertEqual(res['return'], {}) + +self.check_topology(0, 0, 0, 0, 'high', True) + +self.guest_set_dispatching('1'); + +self.check_topology(0, 0, 0, 0, 'high', True) + +res = self.vm.qmp('set-cpu-topology', + {'core-id': 0, 'entitlement': 'low', 'dedicated': True}) +self.assertEqual(res['error']['class'], 'GenericError') + +res = self.vm.qmp('set-cpu-topology', + {'core-id': 0, 'entitlement': 'low'}) +self.assertEqual(res['error']['class'], 'GenericError') + +res = self.vm.qmp('set-cpu-topology', + {'core-id': 0, 'entitlement': 'medium', 'dedicated': True}) +self.assertEqual(res['error']['class'], 'GenericError') + +res = self.vm.qmp('set-cpu-topology', + {'core-id': 0, 'entitlement': 'medium'}) +self.assertEqual(res['error']['class'], 'GenericError') + +res = self.vm.qmp('set-cpu-topology', + {'core-id': 0, 'entitlement': 'low', 'dedicated': False}) +self.assertEqual(res['return'], {}) + +res = self.vm.qmp('set-cpu-topology', + {'core-id': 0, 'entitlement': 'medium', 'dedicated': False}) +self.assertEqual(res['return'], {}) -- 2.39.2