date:20240513

Re: [PATCH 1/2] target/riscv: prioritize pmp errors in raise_mmu_exception()

2024-05-13 Thread Alistair Francis

On Sat, Apr 13, 2024 at 9:00 PM Alexei Filippov
 wrote:
>
> From: Daniel Henrique Barboza 
>
> raise_mmu_exception(), as it is today, prioritizes guest page faults by
> checking first if virt_enabled && !first_stage, and only then considering
> the regular inst/load/store faults.
>
> There's no mention in the spec of guest page faults having a higher
> priority than PMP faults. In fact, privileged spec section 3.7.1 says:
>
> "Attempting to fetch an instruction from a PMP region that does not have
> execute permissions raises an instruction access-fault exception.
> Attempting to execute a load or load-reserved instruction which accesses
> a physical address within a PMP region without read permissions raises a
> load access-fault exception. Attempting to execute a store,
> store-conditional, or AMO instruction which accesses a physical address
> within a PMP region without write permissions raises a store
> access-fault exception."
>
> So, in fact, we're doing it wrong - PMP faults should always be thrown,
> regardless of also being a first or second stage fault.
>
> The way riscv_cpu_tlb_fill() and get_physical_address() work is
> adequate: a TRANSLATE_PMP_FAIL error is immediately reported and
> reflected in the 'pmp_violation' flag. What we need is to change
> raise_mmu_exception() to prioritize it.
>
> Reported-by: Joseph Chan 
> Fixes: 82d53adfbb ("target/riscv/cpu_helper.c: Invalid exception on MMU translation stage")
> Signed-off-by: Daniel Henrique Barboza 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/cpu_helper.c | 22 ++++++++++++----------
>  1 file changed, 12 insertions(+), 10 deletions(-)
>
> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> index bc70ab5abc..196166f8dd 100644
> --- a/target/riscv/cpu_helper.c
> +++ b/target/riscv/cpu_helper.c
> @@ -1203,28 +1203,30 @@ static void raise_mmu_exception(CPURISCVState *env, target_ulong address,
>
>      switch (access_type) {
>      case MMU_INST_FETCH:
> -        if (env->virt_enabled && !first_stage) {
> +        if (pmp_violation) {
> +            cs->exception_index = RISCV_EXCP_INST_ACCESS_FAULT;
> +        } else if (env->virt_enabled && !first_stage) {
>              cs->exception_index = RISCV_EXCP_INST_GUEST_PAGE_FAULT;
>          } else {
> -            cs->exception_index = pmp_violation ?
> -                RISCV_EXCP_INST_ACCESS_FAULT : RISCV_EXCP_INST_PAGE_FAULT;
> +            cs->exception_index = RISCV_EXCP_INST_PAGE_FAULT;
>          }
>          break;
>      case MMU_DATA_LOAD:
> -        if (two_stage && !first_stage) {
> +        if (pmp_violation) {
> +            cs->exception_index = RISCV_EXCP_LOAD_ACCESS_FAULT;
> +        } else if (two_stage && !first_stage) {
>              cs->exception_index = RISCV_EXCP_LOAD_GUEST_ACCESS_FAULT;
>          } else {
> -            cs->exception_index = pmp_violation ?
> -                RISCV_EXCP_LOAD_ACCESS_FAULT : RISCV_EXCP_LOAD_PAGE_FAULT;
> +            cs->exception_index = RISCV_EXCP_LOAD_PAGE_FAULT;
>          }
>          break;
>      case MMU_DATA_STORE:
> -        if (two_stage && !first_stage) {
> +        if (pmp_violation) {
> +            cs->exception_index = RISCV_EXCP_STORE_AMO_ACCESS_FAULT;
> +        } else if (two_stage && !first_stage) {
>              cs->exception_index = RISCV_EXCP_STORE_GUEST_AMO_ACCESS_FAULT;
>          } else {
> -            cs->exception_index = pmp_violation ?
> -                RISCV_EXCP_STORE_AMO_ACCESS_FAULT :
> -                RISCV_EXCP_STORE_PAGE_FAULT;
> +            cs->exception_index = RISCV_EXCP_STORE_PAGE_FAULT;
>          }
>          break;
>      default:
> --
> 2.34.1
>
>
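A minimal standalone sketch of the ordering this patch establishes, assuming simplified stand-ins for QEMU's access types and RISCV_EXCP_* codes (the enum and function names below are hypothetical, chosen to mirror the ones in the diff). The `guest_stage` flag stands in for `env->virt_enabled && !first_stage` on fetches and `two_stage && !first_stage` on loads and stores.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for MMUAccessType and the RISCV_EXCP_* values. */
typedef enum { ACC_FETCH, ACC_LOAD, ACC_STORE } AccessType;

typedef enum {
    EXC_INST_ACCESS_FAULT,
    EXC_INST_GUEST_PAGE_FAULT,
    EXC_INST_PAGE_FAULT,
    EXC_LOAD_ACCESS_FAULT,
    EXC_LOAD_GUEST_ACCESS_FAULT,
    EXC_LOAD_PAGE_FAULT,
    EXC_STORE_AMO_ACCESS_FAULT,
    EXC_STORE_GUEST_AMO_ACCESS_FAULT,
    EXC_STORE_PAGE_FAULT,
} Exception;

/*
 * Patched priority: a PMP violation wins, then the second-stage (guest)
 * fault, then the ordinary page fault.
 */
static Exception pick_exception(AccessType type, bool pmp_violation,
                                bool guest_stage)
{
    switch (type) {
    case ACC_FETCH:
        if (pmp_violation) {
            return EXC_INST_ACCESS_FAULT;
        }
        return guest_stage ? EXC_INST_GUEST_PAGE_FAULT : EXC_INST_PAGE_FAULT;
    case ACC_LOAD:
        if (pmp_violation) {
            return EXC_LOAD_ACCESS_FAULT;
        }
        return guest_stage ? EXC_LOAD_GUEST_ACCESS_FAULT : EXC_LOAD_PAGE_FAULT;
    case ACC_STORE:
    default:
        if (pmp_violation) {
            return EXC_STORE_AMO_ACCESS_FAULT;
        }
        return guest_stage ? EXC_STORE_GUEST_AMO_ACCESS_FAULT
                           : EXC_STORE_PAGE_FAULT;
    }
}
```

With `pmp_violation` and `guest_stage` both true, a fetch now yields the access fault; before the patch, the same inputs selected the guest page fault because the second-stage check ran first.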

RE: [PATCH intel_iommu 0/7] FLTS for VT-d

2024-05-13 Thread Duan, Zhenzhong

Hi Clement,

I'll learn and try to give comments this week.

Thanks
Zhenzhong


Re: [PATCH v2 0/4] Fix fp16 checking in vector fp widen/narrow instructions

2024-05-13 Thread Alistair Francis

On Fri, Mar 22, 2024 at 7:33 PM Max Chou  wrote:
>
> When SEW is 16, we need to check whether Zvfhmin is enabled for the
> single-width operator of the vector floating-point widen/narrow
> instructions.
>
> The commits in this patchset fix the single-width operator checking and
> remove the redundant SEW checking for the vector floating-point
> widen/narrow instructions.
>
> v2:
>   Group patchset and rebase to the riscv-to-apply.next branch (commit 385e575)
>
>
> Thanks to those who have already reviewed:
>
> Daniel Henrique Barboza dbarb...@ventanamicro.com
> [PATCH] target/riscv: rvv: Fix Zvfhmin checking for vfwcvt.f.f.v and vfncvt.f.f.w instructions
> [PATCH] target/riscv: rvv: Check single width operator for vector fp widen instructions
> [PATCH] target/riscv: rvv: Check single width operator for vfncvt.rod.f.f.w
> [PATCH] target/riscv: rvv: Remove redudant SEW checking for vector fp narrow/widen instructions
>
>
> Max Chou (4):
>   target/riscv: rvv: Fix Zvfhmin checking for vfwcvt.f.f.v and
> vfncvt.f.f.w instructions
>   target/riscv: rvv: Check single width operator for vector fp widen
> instructions
>   target/riscv: rvv: Check single width operator for vfncvt.rod.f.f.w
>   target/riscv: rvv: Remove redudant SEW checking for vector fp
> narrow/widen instructions

Thanks!

Applied to riscv-to-apply.next

Alistair

>
>  target/riscv/insn_trans/trans_rvv.c.inc | 42 -
>  1 file changed, 28 insertions(+), 14 deletions(-)
>
> --
> 2.34.1
>
>

Re: riscv disassembler error with pmpcfg0

2024-05-13 Thread Alistair Francis

On Thu, Apr 4, 2024 at 5:02 AM Eric DeVolder  wrote:
>
> I've been using QEMU8 to collect instruction information on U-Boot + OpenSBI.
>
> I'm running QEMU in this fashion to collect the information:
>
> # qemu-system-riscv64 -plugin file=qemu/build/contrib/plugins/libexeclog.so -singlestep -d plugin,nochain -D execlog.txt ...
>
> When examining the instruction trace in execlog, I've noticed that the
> disassembly for pmpcfg0 is erroneous, for example:
>
> 0, 0x5456, 0x3a002573, "csrrs   a0,pmpcfg3,zero"
>
> The CSR encoded in the instruction above is 0x3a0, which is pmpcfg0 (which
> also matches the code I'm examining).
>
> For the Uboot+OpenSBI code I'm examining, pmpcfg0/3 is the only one that
> appears to have a problem.
>
> I also checked QEMU9 and it behaves as described above as well.
>
> I'm willing to provide a fix if I can get some advice/pointers on how this
> disassembly statement is generated...I did take a quick look but it didn't
> appear obvious how...

Thanks for pointing this out. This should fix the issue for you:
https://patchew.org/QEMU/20240514051615.330979-1-alistair.fran...@wdc.com/

Alistair

>
> Thanks,
> eric
>
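As a cross-check of the report, a sketch that splits the traced word 0x3a002573 into the standard RISC-V I-type fields used by the CSR instructions (opcode in bits 6..0, rd in 11..7, funct3 in 14..12, rs1 in 19..15, and the 12-bit CSR number in 31..20). The struct and function names are ours, not QEMU's disassembler.

```c
#include <stdint.h>

/* Fields of a SYSTEM-opcode (CSR) instruction in the I-type layout. */
typedef struct {
    uint32_t opcode; /* bits 6..0, 0x73 for SYSTEM */
    uint32_t rd;     /* bits 11..7 */
    uint32_t funct3; /* bits 14..12, 2 = csrrs */
    uint32_t rs1;    /* bits 19..15 */
    uint32_t csr;    /* bits 31..20, the 12-bit CSR number */
} CsrInsn;

static CsrInsn decode_csr_insn(uint32_t insn)
{
    CsrInsn d;

    d.opcode = insn & 0x7f;
    d.rd = (insn >> 7) & 0x1f;
    d.funct3 = (insn >> 12) & 0x7;
    d.rs1 = (insn >> 15) & 0x1f;
    d.csr = insn >> 20;
    return d;
}
```

For 0x3a002573 this gives csr = 0x3a0, rd = x10 (a0), funct3 = 2 (csrrs) and rs1 = x0 (zero), matching the `csrrs a0,...,zero` text in the trace, so only the CSR name lookup was wrong.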

[PATCH] disas/riscv: Decode all of the pmpcfg and pmpaddr CSRs

2024-05-13 Thread Alistair Francis

Previously we only listed a single pmpcfg CSR, which was misnamed (0x3a0
was printed as pmpcfg3), and the first 16 pmpaddr CSRs. This patch fixes
this by listing all 16 pmpcfg and all 64 pmpaddr CSRs as part of the
disassembly.

Reported-by: Eric DeVolder 
Signed-off-by: Alistair Francis 
---
 disas/riscv.c | 65 ++-
 1 file changed, 64 insertions(+), 1 deletion(-)

diff --git a/disas/riscv.c b/disas/riscv.c
index e236c8b5b7..297cfa2f63 100644
--- a/disas/riscv.c
+++ b/disas/riscv.c
@@ -2190,7 +2190,22 @@ static const char *csr_name(int csrno)
     case 0x0383: return "mibound";
     case 0x0384: return "mdbase";
     case 0x0385: return "mdbound";
-    case 0x03a0: return "pmpcfg3";
+    case 0x03a0: return "pmpcfg0";
+    case 0x03a1: return "pmpcfg1";
+    case 0x03a2: return "pmpcfg2";
+    case 0x03a3: return "pmpcfg3";
+    case 0x03a4: return "pmpcfg4";
+    case 0x03a5: return "pmpcfg5";
+    case 0x03a6: return "pmpcfg6";
+    case 0x03a7: return "pmpcfg7";
+    case 0x03a8: return "pmpcfg8";
+    case 0x03a9: return "pmpcfg9";
+    case 0x03aa: return "pmpcfg10";
+    case 0x03ab: return "pmpcfg11";
+    case 0x03ac: return "pmpcfg12";
+    case 0x03ad: return "pmpcfg13";
+    case 0x03ae: return "pmpcfg14";
+    case 0x03af: return "pmpcfg15";
     case 0x03b0: return "pmpaddr0";
     case 0x03b1: return "pmpaddr1";
     case 0x03b2: return "pmpaddr2";
@@ -2207,6 +2222,54 @@ static const char *csr_name(int csrno)
     case 0x03bd: return "pmpaddr13";
     case 0x03be: return "pmpaddr14";
     case 0x03bf: return "pmpaddr15";
+    case 0x03c0: return "pmpaddr16";
+    case 0x03c1: return "pmpaddr17";
+    case 0x03c2: return "pmpaddr18";
+    case 0x03c3: return "pmpaddr19";
+    case 0x03c4: return "pmpaddr20";
+    case 0x03c5: return "pmpaddr21";
+    case 0x03c6: return "pmpaddr22";
+    case 0x03c7: return "pmpaddr23";
+    case 0x03c8: return "pmpaddr24";
+    case 0x03c9: return "pmpaddr25";
+    case 0x03ca: return "pmpaddr26";
+    case 0x03cb: return "pmpaddr27";
+    case 0x03cc: return "pmpaddr28";
+    case 0x03cd: return "pmpaddr29";
+    case 0x03ce: return "pmpaddr30";
+    case 0x03cf: return "pmpaddr31";
+    case 0x03d0: return "pmpaddr32";
+    case 0x03d1: return "pmpaddr33";
+    case 0x03d2: return "pmpaddr34";
+    case 0x03d3: return "pmpaddr35";
+    case 0x03d4: return "pmpaddr36";
+    case 0x03d5: return "pmpaddr37";
+    case 0x03d6: return "pmpaddr38";
+    case 0x03d7: return "pmpaddr39";
+    case 0x03d8: return "pmpaddr40";
+    case 0x03d9: return "pmpaddr41";
+    case 0x03da: return "pmpaddr42";
+    case 0x03db: return "pmpaddr43";
+    case 0x03dc: return "pmpaddr44";
+    case 0x03dd: return "pmpaddr45";
+    case 0x03de: return "pmpaddr46";
+    case 0x03df: return "pmpaddr47";
+    case 0x03e0: return "pmpaddr48";
+    case 0x03e1: return "pmpaddr49";
+    case 0x03e2: return "pmpaddr50";
+    case 0x03e3: return "pmpaddr51";
+    case 0x03e4: return "pmpaddr52";
+    case 0x03e5: return "pmpaddr53";
+    case 0x03e6: return "pmpaddr54";
+    case 0x03e7: return "pmpaddr55";
+    case 0x03e8: return "pmpaddr56";
+    case 0x03e9: return "pmpaddr57";
+    case 0x03ea: return "pmpaddr58";
+    case 0x03eb: return "pmpaddr59";
+    case 0x03ec: return "pmpaddr60";
+    case 0x03ed: return "pmpaddr61";
+    case 0x03ee: return "pmpaddr62";
+    case 0x03ef: return "pmpaddr63";
     case 0x0780: return "mtohost";
     case 0x0781: return "mfromhost";
     case 0x0782: return "mreset";
-- 
2.45.0
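Since pmpcfg0-15 occupy CSR numbers 0x3a0-0x3af and pmpaddr0-63 occupy 0x3b0-0x3ef in the privileged spec's CSR map, the names could also be computed from the ranges. A sketch with a hypothetical helper `pmp_csr_name()` (not part of disas/riscv.c), which writes into a caller-supplied buffer:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: names the PMP CSRs by range instead of one case
 * per register. Returns buf on success, NULL if csrno is not a PMP CSR.
 */
static const char *pmp_csr_name(int csrno, char *buf, size_t len)
{
    if (csrno >= 0x3a0 && csrno <= 0x3af) {
        snprintf(buf, len, "pmpcfg%d", csrno - 0x3a0);
        return buf;
    }
    if (csrno >= 0x3b0 && csrno <= 0x3ef) {
        snprintf(buf, len, "pmpaddr%d", csrno - 0x3b0);
        return buf;
    }
    return NULL;
}
```

The explicit case list in the patch keeps csr_name() returning string literals with no storage to manage; a computed name needs caller-provided storage, which is the main cost of the shorter form.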

Re: [PATCH intel_iommu 0/7] FLTS for VT-d

2024-05-13 Thread CLEMENT MATHIEU--DRIF

Hi Zhenzhong

Have you had time to review the ATS series rebased onto your FLTS patches?

Thanks
 >cmd


On 06/05/2024 03:38, Duan, Zhenzhong wrote:
> Hi Clement,
>
> Sorry for late response, just back from vacation.
> I saw your rebased version and thanks for your work.
> I'll schedule a timeslot to review them.
>
> Thanks
> Zhenzhong
>
>> -Original Message-
>> From: CLEMENT MATHIEU--DRIF 
>> Subject: Re: [PATCH intel_iommu 0/7] FLTS for VT-d
>>
>> Hi Zhenzhong,
>>
>> I will rebase,
>>
>> thanks
>>
>> On 01/05/2024 14:40, Duan, Zhenzhong wrote:
>>> Ah, this is a duplicate effort on stage-1 translation.
>>>
>>> Hi Clement,
>>>
>>> We have already sent an RFCv1 series "intel_iommu: Enable stage-1 translation"
>>> for both emulated and passthrough devices, link:
>>> https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg02740.html
>>> which has now evolved into RFCv2, link:
>>> https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_nesting_rfcv2/
>>> It addresses recent community comments, as well as the comments on the
>>> older series:
>>> https://patchwork.kernel.org/project/kvm/cover/20210302203827.437645-1-yi.l@intel.com/
>>> Would you mind rebasing your remaining parts, i.e., ATS, PRI emulation,
>>> etc., onto our RFCv2?
>>> Thanks
>>> Zhenzhong
>>>
>>>> -Original Message-
>>>> From: Cédric Le Goater
>>>> Subject: Re: [PATCH intel_iommu 0/7] FLTS for VT-d
>>>>
>>>> Hello,
>>>>
>>>> Adding a few people in Cc: who are familiar with the Intel IOMMU.
>>>>
>>>> Thanks,
>>>>
>>>> C.
>>>>
>>>> On 4/22/24 17:52, CLEMENT MATHIEU--DRIF wrote:
>>>>> This series is the first of a list that adds support for SVM in the Intel
>>>>> IOMMU.
>>>>> Here, we implement support for first-stage translation in VT-d.
>>>>> The PASID-based IOTLB invalidation is also added in this series as it is a
>>>>> requirement of FLTS.
>>>>>
>>>>> The last patch introduces the 'flts' option to enable the feature from
>>>>> the command line.
>>>>> Once enabled, several drivers of the Linux kernel use this feature.
>>>>>
>>>>> This work is based on the VT-d specification version 4.1 (March 2023).
>>>>>
>>>>> Here is a link to a GitHub repository where you can find the following
>>>>> elements:
>>>>> - Qemu with all the patches for SVM
>>>>> - ATS
>>>>> - PRI
>>>>> - PASID based IOTLB invalidation
>>>>> - Device IOTLB invalidations
>>>>> - First-stage translations
>>>>> - Requests with already translated addresses
>>>>> - A demo device
>>>>> - A simple driver for the demo device
>>>>> - A userspace program (for testing and demonstration purposes)
>>>>>
>>>>> https://github.com/BullSequana/Qemu-in-guest-SVM-demo
>>>>>
>>>>> Clément Mathieu--Drif (7):
>>>>>   intel_iommu: fix FRCD construction macro.
>>>>>   intel_iommu: rename slpte to pte before adding FLTS
>>>>>   intel_iommu: make types match
>>>>>   intel_iommu: add support for first-stage translation
>>>>>   intel_iommu: extract device IOTLB invalidation logic
>>>>>   intel_iommu: add PASID-based IOTLB invalidation
>>>>>   intel_iommu: add a CLI option to enable FLTS
>>>>>
>>>>>  hw/i386/intel_iommu.c          | 655 ++---
>>>>>  hw/i386/intel_iommu_internal.h | 114 --
>>>>>  include/hw/i386/intel_iommu.h  |   3 +-
>>>>>  3 files changed, 609 insertions(+), 163 deletions(-)
>>>>>

Re: [PATCH v2 3/4] virtio-gpu: add x-vmstate-version

2024-05-13 Thread Peter Xu

Hey, Marc-Andre,

On Mon, May 13, 2024 at 11:19:04AM +0400, marcandre.lur...@redhat.com wrote:
> diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
> index ae831b6b3e..7f9fb5eacc 100644
> --- a/hw/display/virtio-gpu.c
> +++ b/hw/display/virtio-gpu.c
> @@ -1234,7 +1234,8 @@ static int virtio_gpu_save(QEMUFile *f, void *opaque, size_t size,
>      }
>      qemu_put_be32(f, 0); /* end of list */
>  
> -    return vmstate_save_state(f, &vmstate_virtio_gpu_scanouts, g, NULL);
> +    return vmstate_save_state_v(f, &vmstate_virtio_gpu_scanouts, g,
> +                                NULL, g->vmstate_version, NULL);
>  }
>  
>  static bool virtio_gpu_load_restore_mapping(VirtIOGPU *g,
> @@ -1339,7 +1340,7 @@ static int virtio_gpu_load(QEMUFile *f, void *opaque, size_t size,
>      }
>  
>      /* load & apply scanout state */
> -    vmstate_load_state(f, &vmstate_virtio_gpu_scanouts, g, 1);
> +    vmstate_load_state(f, &vmstate_virtio_gpu_scanouts, g, g->vmstate_version);

[sorry for a late response; attending a conf, and will reply to the v1
 thread later for the other discussions..]

These two changes shouldn't be needed if we go with the .field_exists()
approach, am I right?  IIUC in that case we can keep version 1 here and
not bump anything, because we rely on the machine versions.

IIUC this might be the reason why we found 9.0 machines are broken on
migration.  E.g., IIUC my original patch should work for 9.0<->9.0 too.

Thanks,

>  
>      return 0;
>  }
> @@ -1659,6 +1660,7 @@ static Property virtio_gpu_properties[] = {
>      DEFINE_PROP_BIT("blob", VirtIOGPU, parent_obj.conf.flags,
>                      VIRTIO_GPU_FLAG_BLOB_ENABLED, false),
>      DEFINE_PROP_SIZE("hostmem", VirtIOGPU, parent_obj.conf.hostmem, 0),
> +    DEFINE_PROP_UINT8("x-vmstate-version", VirtIOGPU, vmstate_version, 1),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> -- 
> 2.41.0.28.gd7d8841f67
> 

-- 
Peter Xu
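Peter's suggestion can be illustrated with a toy model. All names below are hypothetical; in QEMU the real hook is the `field_exists` callback of a VMState field. The section version stays at 1, and whether the newer field is written is decided by a per-device flag which, in practice, would be set from a machine-type compat property, so source and destination running the same machine type make the same decision.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy device: the flag would come from a machine-type compat property. */
typedef struct {
    bool send_new_field; /* e.g. false on older machine types */
    int legacy;
    int extra;
} ToyDevice;

typedef struct {
    const char *name;
    bool (*exists)(const ToyDevice *d); /* NULL means "always present" */
} ToyField;

static bool extra_exists(const ToyDevice *d)
{
    return d->send_new_field;
}

static const ToyField toy_fields[] = {
    { "legacy", NULL },
    { "extra", extra_exists },
};

/* The section version is never bumped in this scheme. */
enum { TOY_SECTION_VERSION = 1 };

/* Counts the fields that would be written to the stream for this device. */
static size_t toy_count_saved_fields(const ToyDevice *d)
{
    size_t n = 0;

    for (size_t i = 0; i < sizeof(toy_fields) / sizeof(toy_fields[0]); i++) {
        if (!toy_fields[i].exists || toy_fields[i].exists(d)) {
            n++;
        }
    }
    return n;
}
```

As we understand it, QEMU evaluates the same predicate on the load side too, so a machine type with the flag off both produces and expects the one-field layout without any version bump.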

Re: [PATCH v2 2/4] migration: fix a typo

2024-05-13 Thread Peter Xu

On Mon, May 13, 2024 at 11:19:03AM +0400, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> Signed-off-by: Marc-André Lureau 

Reviewed-by: Peter Xu 

-- 
Peter Xu

Re: [RFC 0/2] Identify aliased maps in vdpa SVQ iova_tree

2024-05-13 Thread Jason Wang

On Mon, May 13, 2024 at 5:58 PM Eugenio Perez Martin
 wrote:
>
> On Mon, May 13, 2024 at 10:28 AM Jason Wang  wrote:
> >
> > On Mon, May 13, 2024 at 2:28 PM Eugenio Perez Martin
> >  wrote:
> > >
> > > On Sat, May 11, 2024 at 6:07 AM Jason Wang  wrote:
> > > >
> > > > On Fri, May 10, 2024 at 3:16 PM Eugenio Perez Martin
> > > >  wrote:
> > > > >
> > > > > On Fri, May 10, 2024 at 6:29 AM Jason Wang  
> > > > > wrote:
> > > > > >
> > > > > > On Thu, May 9, 2024 at 3:10 PM Eugenio Perez Martin 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Thu, May 9, 2024 at 8:27 AM Jason Wang  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Thu, May 9, 2024 at 1:16 AM Eugenio Perez Martin 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Wed, May 8, 2024 at 4:29 AM Jason Wang 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, May 7, 2024 at 6:57 PM Eugenio Perez Martin 
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Tue, May 7, 2024 at 9:29 AM Jason Wang 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Apr 12, 2024 at 3:56 PM Eugenio Perez Martin
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Apr 12, 2024 at 8:47 AM Jason Wang 
> > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Apr 10, 2024 at 6:03 PM Eugenio Pérez 
> > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The guest may have overlapped memory regions, 
> > > > > > > > > > > > > > > where different GPA leads
> > > > > > > > > > > > > > > to the same HVA.  This causes a problem when 
> > > > > > > > > > > > > > > overlapped regions
> > > > > > > > > > > > > > > (different GPA but same translated HVA) exists in 
> > > > > > > > > > > > > > > the tree, as looking
> > > > > > > > > > > > > > > them by HVA will return them twice.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I think I don't understand if there's any side 
> > > > > > > > > > > > > > effect for shadow virtqueue?
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > My bad, I totally forgot to put a reference to where 
> > > > > > > > > > > > > this comes from.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Si-Wei found that during initialization this 
> > > > > > > > > > > > > sequences of maps /
> > > > > > > > > > > > > unmaps happens [1]:
> > > > > > > > > > > > >
> > > > > > > > > > > > > HVAGPAIOVA
> > > > > > > > > > > > > -
> > > > > > > > > > > > > Map
> > > > > > > > > > > > > [0x7f7903e0, 0x7f7983e0)[0x0, 0x8000) 
> > > > > > > > > > > > > [0x1000, 0x8000)
> > > > > > > > > > > > > [0x7f7983e0, 0x7f9903e0)[0x1, 
> > > > > > > > > > > > > 0x208000)
> > > > > > > > > > > > > [0x80001000, 0x201000)
> > > > > > > > > > > > > [0x7f7903ea, 0x7f7903ec)[0xfeda, 
> > > > > > > > > > > > > 0xfedc)
> > > > > > > > > > > > > [0x201000, 0x221000)
> > > > > > > > > > > > >
> > > > > > > > > > > > > Unmap
> > > > > > > > > > > > > [0x7f7903ea, 0x7f7903ec)[0xfeda, 
> > > > > > > > > > > > > 0xfedc) [0x1000,
> > > > > > > > > > > > > 0x2) ???
> > > > > > > > > > > > >
> > > > > > > > > > > > > The third HVA range is contained in the first one, 
> > > > > > > > > > > > > but exposed under a
> > > > > > > > > > > > > different GVA (aliased). This is not "flattened" by 
> > > > > > > > > > > > > QEMU, as GPA does
> > > > > > > > > > > > > not overlap, only HVA.
> > > > > > > > > > > > >
> > > > > > > > > > > > > At the third chunk unmap, the current algorithm finds 
> > > > > > > > > > > > > the first chunk,
> > > > > > > > > > > > > not the second one. This series is the way to tell 
> > > > > > > > > > > > > the difference at
> > > > > > > > > > > > > unmap time.
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1] 
> > > > > > > > > > > > > https://lists.nongnu.org/archive/html/qemu-devel/2024-04/msg00079.html
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks!
> > > > > > > > > > > >
> > > > > > > > > > > > Ok, I was wondering if we need to store GPA(GIOVA) to 
> > > > > > > > > > > > HVA mappings in
> > > > > > > > > > > > the iova tree to solve this issue completely. Then 
> > > > > > > > > > > > there won't be
> > > > > > > > > > > > aliasing issues.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I'm ok to explore that route but this has another 
> > > > > > > > > > > problem. Both SVQ
> > > > > > > > > > > vrings and CVQ buffers also need to be addressable by 
> > > > > > > > > > > VhostIOVATree,
> > > > > > > > > > > and they do not have GPA.
> > > > > > > > > > >
> > > > > > > > > > > At this moment vhost_svq_translate_addr is

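The aliasing problem Si-Wei found can be reproduced in a few lines. The sketch below uses a flat array instead of QEMU's IOVA tree and made-up addresses (not the ones from the trace): a lookup keyed only on HVA returns the large containing map rather than the aliased one, while also matching on GPA, along the lines Jason suggests, picks the right entry. As Eugenio notes, SVQ vrings and CVQ buffers have no GPA, so this toy does not cover them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat model of one map entry. */
typedef struct {
    uint64_t hva, gpa, iova, size;
} ToyMap;

static int hva_in(const ToyMap *m, uint64_t hva)
{
    return hva >= m->hva && hva - m->hva < m->size;
}

/* HVA-only lookup: returns the first entry whose HVA range contains hva. */
static const ToyMap *find_by_hva(const ToyMap *maps, size_t n, uint64_t hva)
{
    for (size_t i = 0; i < n; i++) {
        if (hva_in(&maps[i], hva)) {
            return &maps[i];
        }
    }
    return NULL;
}

/* HVA + GPA lookup: the GPA offset must equal the HVA offset. */
static const ToyMap *find_by_hva_gpa(const ToyMap *maps, size_t n,
                                     uint64_t hva, uint64_t gpa)
{
    for (size_t i = 0; i < n; i++) {
        if (hva_in(&maps[i], hva) && gpa >= maps[i].gpa &&
            gpa - maps[i].gpa == hva - maps[i].hva) {
            return &maps[i];
        }
    }
    return NULL;
}
```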
RE: [PATCH v2 00/11] VFIO: misc cleanups

2024-05-13 Thread Duan, Zhenzhong

Hi All,

When I looked into more functions that take an 'Error **errp' parameter,
I saw that many of them have the form "int testfunc(..., Error **errp)",
which confused me a bit.

The qapi/error.h suggests:

 * - Whenever practical, also return a value that indicates success /
 *   failure.  This can make the error checking more concise, and can
 *   avoid useless error object creation and destruction.  Note that
 *   we still have many functions returning void.  We recommend
 *   • bool-valued functions return true on success / false on failure,
 *   • pointer-valued functions return non-null / null pointer, and
 *   • integer-valued functions return non-negative / negative.

There are some functions like:

int testfunc(..., Error **errp)
{
    if (succeed) {
        return 0;
    } else {
        return -EINVAL;
    }
}

Does testfunc() follow the 'integer-valued functions' rule above, or should
it be changed to a bool-valued function?

Is there a clear rule for when to change 'int testfunc(..., Error **errp)'
to 'bool testfunc(..., Error **errp)'?
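Read literally, the quoted guidance accepts both forms. A sketch of the two conventions side by side, using a made-up ToyError type so it compiles outside the QEMU tree (it is not the qapi/error.h API). Our reading, not a rule stated in error.h: the bool form suits functions whose only failure information is in *errp, while the int form is worth keeping when callers act on the specific negative errno.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Minimal stand-in for QEMU's Error object; NOT the qapi/error.h API. */
typedef struct ToyError {
    char msg[64];
} ToyError;

static void toy_error_set(ToyError **errp, const char *msg)
{
    if (errp) {
        *errp = calloc(1, sizeof(**errp));
        if (*errp) {
            snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
        }
    }
}

/* bool-valued: true on success, false (with *errp set) on failure. */
static bool toy_setup_bool(int arg, ToyError **errp)
{
    if (arg < 0) {
        toy_error_set(errp, "negative argument");
        return false;
    }
    return true;
}

/* int-valued: the negative errno carries extra information for callers. */
static int toy_setup_int(int arg, ToyError **errp)
{
    if (arg < 0) {
        toy_error_set(errp, "negative argument");
        return -EINVAL;
    }
    return 0;
}
```

Callers then read as `if (!toy_setup_bool(x, &err))` versus `if (toy_setup_int(x, &err) < 0)`; if no caller ever inspects the errno value, the bool form is the more concise choice.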

Thanks
Zhenzhong

>-Original Message-
>From: Duan, Zhenzhong 
>
>Subject: [PATCH v2 00/11] VFIO: misc cleanups
>
>Hi
>
>This is a cleanup series to change functions in hw/vfio/ to return bool
>when the error is passed through errp parameter, also some cleanup
>with g_autofree.
>
>See discussion at https://lists.gnu.org/archive/html/qemu-devel/2024-04/msg04782.html
>
>This series processes the following files:
>hw/vfio/container.c
>hw/vfio/iommufd.c
>hw/vfio/cpr.c
>backends/iommufd.c
>
>These files are now clean; other files in hw/vfio still need processing.
>
>Tests done on an x86 platform:
>vfio device hotplug/unplug with different backends
>reboot
>
>Thanks
>Zhenzhong
>
>Changelog:
>v2:
>- split out g_autofree code as a patch (Cédric)
>- add processing for more files
>
>Zhenzhong Duan (11):
>  vfio/pci: Use g_autofree in vfio_realize
>  vfio/pci: Use g_autofree in iommufd_cdev_get_info_iova_range()
>  vfio: Make VFIOIOMMUClass::attach_device() and its wrapper return bool
>  vfio: Make VFIOIOMMUClass::setup() return bool
>  vfio: Make VFIOIOMMUClass::add_window() and its wrapper return bool
>  vfio/container: Make vfio_connect_container() return bool
>  vfio/container: Make vfio_set_iommu() return bool
>  vfio/container: Make vfio_get_device() return bool
>  vfio/iommufd: Make iommufd_cdev_*() return bool
>  vfio/cpr: Make vfio_cpr_register_container() return bool
>  backends/iommufd: Make iommufd_backend_*() return bool
>
> include/hw/vfio/vfio-common.h |   6 +-
> include/hw/vfio/vfio-container-base.h |  18 ++---
> include/sysemu/iommufd.h  |   6 +-
> backends/iommufd.c|  29 +++
> hw/vfio/ap.c  |   6 +-
> hw/vfio/ccw.c |   6 +-
> hw/vfio/common.c  |   6 +-
> hw/vfio/container-base.c  |   8 +-
> hw/vfio/container.c   |  81 +--
> hw/vfio/cpr.c |   4 +-
> hw/vfio/iommufd.c | 109 +++---
> hw/vfio/pci.c |  12 ++-
> hw/vfio/platform.c|   7 +-
> hw/vfio/spapr.c   |  28 +++
> backends/trace-events |   4 +-
> 15 files changed, 147 insertions(+), 183 deletions(-)
>
>--
>2.34.1

Re: [PATCH 1/2] hw/core: allow parameter=1 for SMP topology on any machine

2024-05-13 Thread Zhao Liu

> I'm failing to see what real world technical problems QEMU faces
> with a parameter being set to '1' by a mgmt app, when QEMU itself
> treats all omitted values as being '1' anyway.
> 
> If we're trying to faithfully model the real world, then restricting
> the topology against machine types though still looks inherently wrong.
> The valid topology ought to be constrained based on the named CPU model.
> eg it doesn't make sense to allow 'dies=4' with a Skylake CPU model,
> only an EPYC CPU model, especially if we want to model cache info in
> a way that matches the real world silicon better.

Thanks for pointing this out. This issue is related to the Intel CPU
cache model: currently, the Intel code defaults to sharing L3 at the die
level. This could be resolved by defining an accurate default cache
topology level for each CPU model and making Intel CPU models share L3 at
the package level, with Cascadelake as the only exception.

Then the user could define other topology levels (die/module) for
Icelake without changing the cache topology, unless the user adds
more sockets or further customizes the cache topology in another way [1].
Do you agree with this solution?
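The effect described above can be counted directly. A sketch with a hypothetical Topo struct and two sharing levels (not QEMU's cache model code) that counts how many L3 instances a guest would see: with die-level sharing, dies=4 on one socket yields four L3s; with package-level sharing, the count follows only the number of sockets.

```c
/* Hypothetical model: the topology level at which one L3 is shared. */
typedef enum { SHARE_DIE, SHARE_PACKAGE } L3Level;

typedef struct {
    unsigned sockets, dies, modules, cores, threads;
} Topo;

/* Number of distinct L3 instances exposed to the guest. */
static unsigned l3_instances(const Topo *t, L3Level level)
{
    return level == SHARE_DIE ? t->sockets * t->dies : t->sockets;
}
```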

[1]: https://lore.kernel.org/qemu-devel/20240220092504.726064-1-zhao1@linux.intel.com/

[snip]

> As above, I think that restrictions based on machine type, while nice and
> simple, are incorrect long term. If we did impose restrictions based on
> CPU model, then we could trivially expose this info to mgmt apps via the
> existing mechanism for querying supported CPU models. Limiting based on
> CPU model, however, has potentially greater back compat issues, though
> it would be strictly more faithful to hardware.

I think as long as the default cache topology model is clearly defined,
users can further customize the CPU topology and adjust the cache
topology based on it. After all, topology is architectural, not CPU
model-specific (Linux's topology support does not take specific CPU
models into account).

For example, on x86, for simplicity, can we assume that all x86 CPU models
support all x86 topology levels (thread/core/module/die/package) without
making distinctions based on specific CPU models?

That way, as long as the user doesn't change the default topology, the
guest's cache and other topology information won't be "corrupted".

And there's one more question: does this rollback mean that the -smp
parameters must have compatible default values for all architectures?

This is related to my SMP cache proposal above [1]: should I provide
default entries (e.g. "default") that are compatible with all
architectures, even those that don't support custom cache topology? Like
the following:

-smp 32,sockets=2,dies=2,modules=2,cores=2,threads=2,maxcpus=32,\
 l1d-cache=default,l1i-cache=default,l2-cache=default,l3-cache=default
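In the example line above, the topology levels multiply out to maxcpus: 2 × 2 × 2 × 2 × 2 = 32. A sketch of that consistency check, using our own SmpConfig struct (QEMU performs a comparable product check when parsing -smp, but this is not its code); omitted levels are taken as 1, as discussed earlier in the thread.

```c
#include <stdbool.h>

typedef struct {
    unsigned cpus, sockets, dies, modules, cores, threads, maxcpus;
} SmpConfig;

/*
 * The product of the topology levels must equal maxcpus, and the
 * initial CPU count must not exceed maxcpus.
 */
static bool smp_config_consistent(const SmpConfig *c)
{
    unsigned long long product = (unsigned long long)c->sockets * c->dies *
                                 c->modules * c->cores * c->threads;

    return product == c->maxcpus && c->cpus <= c->maxcpus;
}
```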

Thanks,
Zhao

Re: [PATCH] target/ppc: handle vcpu hotplug failure gracefully

2024-05-13 Thread Nicholas Piggin

On Tue Apr 23, 2024 at 4:30 PM AEST, Harsh Prateek Bora wrote:
> + qemu-devel
>
> On 4/23/24 11:40, Harsh Prateek Bora wrote:
> > On ppc64, the PowerVM hypervisor runs with limited memory and a VCPU
> > creation during hotplug may fail during kvm_ioctl for KVM_CREATE_VCPU,
> > leading to termination of the guest since errp is set to &error_fatal
> > while calling kvm_init_vcpu. This unexpected behaviour can be avoided by
> > pre-creating the vcpu and parking it on success, or returning an error
> > otherwise. This enables graceful error delivery for any vcpu hotplug
> > failures while the guest keeps running.

So this puts it on the park list, so that when kvm_init_vcpu() later runs
it will just take it off the park list instead of issuing another
KVM_CREATE_VCPU ioctl.

And kvm_init_vcpu() runs in the vcpu thread function, which does not
have a good way to indicate failure to the caller.

I don't know a lot about this part of qemu, but it seems like a good
idea to move fail-able initialisation out of the vcpu thread in that
case. So the general idea seems good to me.
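The flow described here can be modelled in a few lines. Everything below is hypothetical (a fixed-size park list and a fake create call that fails with -ENOMEM past a limit, echoing the -12 in the test log); it is not QEMU's kvm_create_vcpu()/kvm_park_vcpu() code, only a sketch of the create-early, park, and reuse-later pattern.

```c
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_PARKED 8

/* Hypothetical parked-vCPU list: ids with their already-created fds. */
typedef struct {
    unsigned long ids[TOY_MAX_PARKED];
    int fds[TOY_MAX_PARKED];
    size_t count;
} ToyParkList;

/* Stand-in for KVM_CREATE_VCPU: fails for ids at or above 'limit'. */
static int toy_create_vcpu(unsigned long id, unsigned long limit)
{
    return id < limit ? (int)(100 + id) : -ENOMEM;
}

/* Realize path: create up front so a failure reaches the caller. */
static int toy_realize(ToyParkList *p, unsigned long id, unsigned long limit)
{
    int fd = toy_create_vcpu(id, limit);

    if (fd < 0) {
        return fd; /* hotplug refused, guest keeps running */
    }
    if (p->count == TOY_MAX_PARKED) {
        return -ENOSPC;
    }
    p->ids[p->count] = id;
    p->fds[p->count] = fd;
    p->count++;
    return 0;
}

/* vCPU-thread init: take the parked fd instead of creating again. */
static int toy_init_vcpu(ToyParkList *p, unsigned long id)
{
    for (size_t i = 0; i < p->count; i++) {
        if (p->ids[i] == id) {
            int fd = p->fds[i];

            p->ids[i] = p->ids[p->count - 1];
            p->fds[i] = p->fds[p->count - 1];
            p->count--;
            return fd;
        }
    }
    return -1; /* not parked: real code would create it here */
}
```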

> > 
> > Based on api refactoring to create/park vcpus introduced in 1/8 of patch series:
> > https://lore.kernel.org/qemu-devel/2024031202.12992-2-salil.me...@huawei.com/

So from this series AFAIKS you're just using kvm_create / kvm_park
routines? You could easily pull that patch 1 out ahead of that larger
series if progress is slow on it, it's a decent cleanup by itself by
the looks.

> > 
> > Tested OK by repeatedly doing a hotplug/unplug of vcpus as below:
> > 
> >   #virsh setvcpus hotplug 40
> >   #virsh setvcpus hotplug 70
> > error: internal error: unable to execute QEMU command 'device_add':
> > kvmppc_cpu_realize: vcpu hotplug failed with -12
> > 
> > Reported-by: Anushree Mathur 
> > Suggested-by: Shivaprasad G Bhat 
> > Suggested-by: Vaibhav Jain 
> > Signed-off-by: Harsh Prateek Bora 
> > ---
> > ---
> >   target/ppc/kvm.c | 42 ++
> >   1 file changed, 42 insertions(+)
> > 
> > diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> > index 8231feb2d4..c887f6dfa0 100644
> > --- a/target/ppc/kvm.c
> > +++ b/target/ppc/kvm.c
> > @@ -48,6 +48,8 @@
> >   #include "qemu/mmap-alloc.h"
> >   #include "elf.h"
> >   #include "sysemu/kvm_int.h"
> > +#include "sysemu/kvm.h"
> > +#include "hw/core/accel-cpu.h"
> >   
> >   #define PROC_DEVTREE_CPU  "/proc/device-tree/cpus/"
> >   
> > @@ -2339,6 +2341,43 @@ static void alter_insns(uint64_t *word, uint64_t flags, bool on)
> >   }
> >   }
> >   
> > +static int max_cpu_index = 0;
> > +
> > +static bool kvmppc_cpu_realize(CPUState *cs, Error **errp)
> > +{
> > +    int ret;
> > +
> > +    cs->cpu_index = max_cpu_index++;
> > +
> > +    POWERPC_CPU(cs)->vcpu_id = cs->cpu_index;

So you're overriding the cpu_get_free_index() allocator here.
And you need to because vcpu_id needs to be assigned before
the KVM create, I guess.

I guess it works. I would add a comment like s390x has.

> > +
> > +    if (cs->parent_obj.hotplugged) {

Can _all_ kvm cpu creation go via this path? Why just limit it to
hotplugged?

> > +        /* create and park to fail gracefully in case vcpu hotplug fails */
> > +        ret = kvm_create_vcpu(cs);
> > +        if (!ret) {
> > +            kvm_park_vcpu(cs);

Seems like a small thing, but I would add a new core kvm function
that creates and parks the vcpu, so the target code doesn't have
to know about the parking internals, just that it needs to be
called.

Unless I'm missing something, we could get all targets to move their kvm
create to here and remove it from kvm_init_vcpu(), which would then
just expect it to be on the parked list. But that could be done
incrementally.

> > +        } else {
> > +            max_cpu_index--;
> > +            error_setg(errp, "%s: vcpu hotplug failed with %d",
> > +                       __func__, ret);
> > +            return false;
> > +        }
> > +    }
> > +    return true;
> > +}
> > +
> > +static void kvmppc_cpu_unrealize(CPUState *cpu)
> > +{
> > +    if (POWERPC_CPU(cpu)->vcpu_id == (max_cpu_index - 1)) {
> > +        /* only reclaim vcpuid if its the last one assigned
> > +         * as reclaiming random vcpuid for parked vcpus may lead
> > +         * to unexpected behaviour due to an existing kernel bug
> > +         * when drc_index doesnt get reclaimed as expected.
> > +         */
> > +        max_cpu_index--;
> > +    }

This looks like a fairly lossy allocator. Using cpu_get_free_index()
would be the way to go I think. I would export that and call it here,
and then you don't need this. Just have to take care of the assert,
something like this:

diff --git a/cpu-common.c b/cpu-common.c
index ce78273af5..9f90c8ec9b 100644
--- a/cpu-common.c
+++ b/cpu-common.c
@@ -57,14 +57,11 @@ void cpu_list_unlock(void)
     qemu_mutex_unlock(&qemu_cpu_list_lock);
 }

-static bool cpu_index_auto_assigned;
-
-static int cpu_get_free_index(void)
+int cpu_get_free_index(void)
 {
     CPUState *some_cpu;
     int

[PATCH] hw/loongarch: Add VM mode in IOCSR feature register in kvm mode

2024-05-13 Thread Bibo Mao

If the VM runs in KVM mode, the VM feature bit is set in the IOCSR feature
register, so the guest can detect the KVM hypervisor type and enable
possible PV functions.

Signed-off-by: Bibo Mao 
---
 hw/loongarch/virt.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index d87d9be576..44bcf25aee 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -10,6 +10,7 @@
 #include "qapi/error.h"
 #include "hw/boards.h"
 #include "hw/char/serial.h"
+#include "sysemu/kvm.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/qtest.h"
 #include "sysemu/runstate.h"
@@ -840,18 +841,23 @@ static void virt_iocsr_misc_write(void *opaque, hwaddr addr,
 
 static uint64_t virt_iocsr_misc_read(void *opaque, hwaddr addr, unsigned size)
 {
+    uint64_t ret;
+
     switch (addr) {
     case VERSION_REG:
         return 0x11ULL;
     case FEATURE_REG:
-        return 1ULL << IOCSRF_MSI | 1ULL << IOCSRF_EXTIOI |
-               1ULL << IOCSRF_CSRIPI;
+        ret = BIT(IOCSRF_MSI) | BIT(IOCSRF_EXTIOI) | BIT(IOCSRF_CSRIPI);
+        if (kvm_enabled()) {
+            ret |= BIT(IOCSRF_VM);
+        }
+        return ret;
     case VENDOR_REG:
         return 0x6e6f73676e6f6f4cULL; /* "Loongson" */
     case CPUNAME_REG:
         return 0x303030354133ULL; /* "3A5000" */
     case MISC_FUNC_REG:
-        return 1ULL << IOCSRM_EXTIOI_EN;
+        return BIT_ULL(IOCSRM_EXTIOI_EN);
     }
     return 0ULL;
 }
-- 
2.39.3

[PATCH] target/riscv: rvzicbo: Fixup CBO extension register calculation

2024-05-13 Thread Alistair Francis

When running the instruction

```
cbo.flush 0(x0)
```

QEMU would segfault.

The issue was the use of cpu_gpr[a->rs1]: QEMU does not allocate
cpu_gpr[0], so it is not valid when rs1 is x0.

In order to fix this let's use the existing get_address()
helper. This also has the benefit of performing pointer mask
calculations on the address specified in rs1.

The pointer masking specification specifically states:

"""
Cache Management Operations: All instructions in Zicbom, Zicbop and Zicboz
"""

So this is the correct behaviour, and previously we were incorrectly
not masking the address.

Signed-off-by: Alistair Francis 
Reported-by: Fabian Thomas 
Fixes: e05da09b7cfd ("target/riscv: implement Zicbom extension")
---
 target/riscv/insn_trans/trans_rvzicbo.c.inc | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvzicbo.c.inc b/target/riscv/insn_trans/trans_rvzicbo.c.inc
index d5d7095903..15711c3140 100644
--- a/target/riscv/insn_trans/trans_rvzicbo.c.inc
+++ b/target/riscv/insn_trans/trans_rvzicbo.c.inc
@@ -31,27 +31,35 @@
 static bool trans_cbo_clean(DisasContext *ctx, arg_cbo_clean *a)
 {
     REQUIRE_ZICBOM(ctx);
-    gen_helper_cbo_clean_flush(tcg_env, cpu_gpr[a->rs1]);
+    TCGv src = get_address(ctx, a->rs1, 0);
+
+    gen_helper_cbo_clean_flush(tcg_env, src);
     return true;
 }
 
 static bool trans_cbo_flush(DisasContext *ctx, arg_cbo_flush *a)
 {
     REQUIRE_ZICBOM(ctx);
-    gen_helper_cbo_clean_flush(tcg_env, cpu_gpr[a->rs1]);
+    TCGv src = get_address(ctx, a->rs1, 0);
+
+    gen_helper_cbo_clean_flush(tcg_env, src);
     return true;
 }
 
 static bool trans_cbo_inval(DisasContext *ctx, arg_cbo_inval *a)
 {
     REQUIRE_ZICBOM(ctx);
-    gen_helper_cbo_inval(tcg_env, cpu_gpr[a->rs1]);
+    TCGv src = get_address(ctx, a->rs1, 0);
+
+    gen_helper_cbo_inval(tcg_env, src);
     return true;
 }
 
 static bool trans_cbo_zero(DisasContext *ctx, arg_cbo_zero *a)
 {
     REQUIRE_ZICBOZ(ctx);
-    gen_helper_cbo_zero(tcg_env, cpu_gpr[a->rs1]);
+    TCGv src = get_address(ctx, a->rs1, 0);
+
+    gen_helper_cbo_zero(tcg_env, src);
     return true;
 }
-- 
2.45.0

Re: [PATCH v7 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents

2024-05-13 Thread Zhijian Li (Fujitsu)



On 19/04/2024 07:11, nifan@gmail.com wrote:
> +} else if (type == DC_EVENT_ADD_CAPACITY) {
> +if (cxl_extents_overlaps_dpa_range(>dc.extents, dpa, len)) {
> +error_setg(errp,
> +   "cannot add DPA already accessible  to the same LD");
> +return;
> +}


Double *space* before 'to'

Re: [PATCH v7 00/12] Enabling DCD emulation support in Qemu

2024-05-13 Thread Zhijian Li (Fujitsu)

Hi Fan


Do you have newer instructions for playing with DCD? It seems that
the instructions in RFC[0] don't work with the current code.

[0] https://lore.kernel.org/all/20230511175609.2091136-1-fan...@samsung.com/



On 19/04/2024 07:10, nifan@gmail.com wrote:
> A git tree of this series can be found here (with one extra commit on top
> for printing out accepted/pending extent list):
> https://github.com/moking/qemu/tree/dcd-v7
> 
> v6->v7:
> 
> 1. Fixed the dvsec range register issue mentioned in the the cover letter in v6.
> Only relevant bits are set to mark the device ready (Patch 6). (Jonathan)
> 2. Moved the if statement in cxl_setup_memory from Patch 6 to Patch 4. (Jonathan)
> 3. Used MIN instead of if statement to get record_count in Patch 7. (Jonathan)
> 4. Added "Reviewed-by" tag to Patch 7.
> 5. Modified cxl_dc_extent_release_dry_run so the updated extent list can be
> reused in cmd_dcd_release_dyn_cap to simplify the process in Patch 8. (Jørgen)
> 6. Added comments to indicate further "TODO" items in cmd_dcd_add_dyn_cap_rsp.
>  (Jonathan)
> 7. Avoided irrelevant code reformat in Patch 8. (Jonathan)
> 8. Modified QMP interfaces for adding/releasing DC extents to allow passing
> tags, selection policy, flags in the interface. (Jonathan, Gregory)
> 9. Redesigned the pending list so extents in the same requests are grouped
>  together. A new data structure is introduced to represent "extent group"
>  in pending list.  (Jonathan)
> 10. Added support in QMP interface for "More" flag.
> 11. Check "Forced removal" flag for release request and not let it pass through.
> 12. Removed the dynamic capacity log type from CxlEventLog definition in cxl.json
> to avoid the side effect it may introduce to inject error to DC event log.
> (Jonathan)
> 13. Hard coded the event log type to dynamic capacity event log in QMP
>  interfaces. (Jonathan)
> 14. Adding space in between "-1]". (Jonathan)
> 15. Some minor comment fixes.
> 
> The code is tested with similar setup and has passed similar tests as listed
> in the cover letter of v5[1] and v6[2].
> Also, the code is tested with the latest DCD kernel patchset[3].
> 
> [1] Qemu DCD patchset v5: 
> https://lore.kernel.org/linux-cxl/20240304194331.1586191-1-nifan@gmail.com/T/#t
> [2] Qemu DCD patchset v6: 
> https://lore.kernel.org/linux-cxl/20240325190339.696686-1-nifan@gmail.com/T/#t
> [3] DCD kernel patches: 
> https://lore.kernel.org/linux-cxl/20240324-dcd-type2-upstream-v1-0-b7b00d623...@intel.com/T/#m11c571e21c4fe17c7d04ec5c2c7bc7cbf2cd07e3
> 
> 
> Fan Ni (12):
>hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output
>  payload of identify memory device command
>hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative
>  and mailbox command support
>include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for
>  type3 memory devices
>hw/mem/cxl_type3: Add support to create DC regions to type3 memory
>  devices
>hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr
>  size instead of mr as argument
>hw/mem/cxl_type3: Add host backend and address space handling for DC
>  regions
>hw/mem/cxl_type3: Add DC extent list representative and get DC extent
>  list mailbox support
>hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release
>  dynamic capacity response
>hw/cxl/events: Add qmp interfaces to add/release dynamic capacity
>  extents
>hw/mem/cxl_type3: Add DPA range validation for accesses to DC regions
>hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support
>hw/mem/cxl_type3: Allow to release extent superset in QMP interface
> 
>   hw/cxl/cxl-mailbox-utils.c  | 620 ++-
>   hw/mem/cxl_type3.c  | 633 +---
>   hw/mem/cxl_type3_stubs.c|  20 ++
>   include/hw/cxl/cxl_device.h |  81 -
>   include/hw/cxl/cxl_events.h |  18 +
>   qapi/cxl.json   |  69 
>   6 files changed, 1396 insertions(+), 45 deletions(-)
>

Re: CPR/liveupdate: test results using prior bug fix

2024-05-13 Thread Michael Galaxy


Hi Steve,

Thanks for the response.

It looks like literally *just today* 8.2.4 was released. I'll go check
it out.


- Michael

On 5/13/24 15:10, Steven Sistare wrote:

Hi Michael,
  No surprise here; I did see some of the same failure messages, and they
prompted me to submit the fix.  They are all symptoms of "the possibility of
ram and device state being out of sync" as mentioned in the commit.

I am not familiar with the process for maintaining old releases for qemu.
Perhaps someone on this list can comment on 8.2.3.

- Steve

On 5/13/2024 2:22 PM, Michael Galaxy wrote:

Hi Steve,

We found that this particular change ("migration: stop vm for cpr")
fixes a bug that we've identified while testing back-to-back live
updates in a lab environment.


More specifically, running *without* this change (which is not available in
8.2.2, but *is* available in 9.0.0) causes the metadata save file to
be corrupted when doing live updates one after another. Typically we
see a corrupted save file somewhere between 20 and 30 live updates,
and while doing a git bisect, we found that this change makes the
problem go away.


Were you aware? Is there any plan in place to cherry-pick this for
8.2.3, or perhaps a plan to release 8.2.3 at some point?


Here are some examples of how the bug manifests in different 
locations of the QEMU metadata save file:


2024-04-26T13:28:53Z qemu-system-x86_64: Failed to load mtrr_var:base
2024-04-26T13:28:53Z qemu-system-x86_64: Failed to load cpu:env.mtrr_var
2024-04-26T13:28:53Z qemu-system-x86_64: error while loading state for instance 0x1b of device 'cpu'
2024-04-26T13:28:53Z qemu-system-x86_64: load of migration failed: Input/output error


And another:

2024-04-17T16:09:47Z qemu-system-x86_64: check_section_footer: Read section footer failed: -5
2024-04-17T16:09:47Z qemu-system-x86_64: load of migration failed: Invalid argument


And another:

2024-04-30T21:53:29Z qemu-system-x86_64: Unable to read ID string for section 163
2024-04-30T21:53:29Z qemu-system-x86_64: load of migration failed: Invalid argument


And another:

2024-05-01T16:01:44Z qemu-system-x86_64: Unable to read ID string for section 164
2024-05-01T16:01:44Z qemu-system-x86_64: load of migration failed: Invalid argument


As you can see, they occur quite randomly, but generally it takes at 
least 20-30+ live updates before the problem occurs.


- Michael

On 2/27/24 23:13, pet...@redhat.com wrote:

From: Steve Sistare

When migration for cpr is initiated, stop the vm and set state
RUN_STATE_FINISH_MIGRATE before ram is saved.  This eliminates the
possibility of ram and device state being out of sync, and guarantees
that a guest in the suspended state remains suspended, because qmp_cont
rejects a cont command in the RUN_STATE_FINISH_MIGRATE state.

Signed-off-by: Steve Sistare
Reviewed-by: Peter Xu
Link: https://lore.kernel.org/r/1708622920-68779-11-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu

---
  include/migration/misc.h |  1 +
  migration/migration.h    |  2 --
  migration/migration.c    | 51
  3 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index e4933b815b..5d1aa593ed 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -60,6 +60,7 @@ void migration_object_init(void);
  void migration_shutdown(void);
  bool migration_is_idle(void);
  bool migration_is_active(MigrationState *);
+bool migrate_mode_is_cpr(MigrationState *);
    typedef enum MigrationEventType {
  MIG_EVENT_PRECOPY_SETUP,
diff --git a/migration/migration.h b/migration/migration.h
index aef8afbe1f..65c0b61cbd 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -541,6 +541,4 @@ int migration_rp_wait(MigrationState *s);
   */
  void migration_rp_kick(MigrationState *s);
  
-int migration_stop_vm(RunState state);
-
  #endif
diff --git a/migration/migration.c b/migration/migration.c
index 37c836b0b0..90a90947fb 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -167,11 +167,19 @@ static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
  return (a > b) - (a < b);
  }
  
-int migration_stop_vm(RunState state)
+static int migration_stop_vm(MigrationState *s, RunState state)
  {
-    int ret = vm_stop_force_state(state);
+    int ret;
+
+    migration_downtime_start(s);
+
+    s->vm_old_state = runstate_get();
+    global_state_store();
+
+    ret = vm_stop_force_state(state);
    trace_vmstate_downtime_checkpoint("src-vm-stopped");
+    trace_migration_completion_vm_stop(ret);
    return ret;
  }
@@ -1602,6 +1610,11 @@ bool migration_is_active(MigrationState *s)
  s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
  }
  
+bool migrate_mode_is_cpr(MigrationState *s)
+{
+    return s->parameters.mode ==

[PATCH] physmem: allow debug writes to MMIO regions

2024-05-13 Thread Perry Hung

Writes from GDB to memory-mapped IO regions are currently silently
dropped. cpu_memory_rw_debug() calls address_space_write_rom(), which
calls address_space_write_rom_internal(), which ignores all non-ram/rom
regions.

Add a check for MMIO regions and direct those to address_space_rw()
instead.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/213
Signed-off-by: Perry Hung 
---
 system/physmem.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/system/physmem.c b/system/physmem.c
index 342b7a8fd4..013cdd2ab1 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -3508,7 +3508,10 @@ int cpu_memory_rw_debug(CPUState *cpu, vaddr addr,
         if (l > len)
             l = len;
         phys_addr += (addr & ~TARGET_PAGE_MASK);
-        if (is_write) {
+        if (cpu_physical_memory_is_io(phys_addr)) {
+            res = address_space_rw(cpu->cpu_ases[asidx].as, phys_addr, attrs,
+                                   buf, l, is_write);
+        } else if (is_write) {
             res = address_space_write_rom(cpu->cpu_ases[asidx].as, phys_addr,
                                           attrs, buf, l);
         } else {
-- 
2.45.0

Re: [PATCH v2 2/4] migration: fix a typo

2024-05-13 Thread Fabiano Rosas

marcandre.lur...@redhat.com writes:

> From: Marc-André Lureau 
>
> Signed-off-by: Marc-André Lureau 
> ---
>  migration/vmstate.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/migration/vmstate.c b/migration/vmstate.c
> index b51212a75b..ff5d589a6d 100644
> --- a/migration/vmstate.c
> +++ b/migration/vmstate.c
> @@ -479,7 +479,7 @@ static int vmstate_subsection_load(QEMUFile *f, const VMStateDescription *vmsd,
>  
>          len = qemu_peek_byte(f, 1);
>          if (len < strlen(vmsd->name) + 1) {
> -            /* subsection name has be be "section_name/a" */
> +            /* subsection name has to be "section_name/a" */
>              trace_vmstate_subsection_load_bad(vmsd->name, "(short)", "");
>              return 0;
>          }

Reviewed-by: Fabiano Rosas

Re: [PATCH v5 03/10] vfio: Extend migration_file_set_error() with Error** argument

2024-05-13 Thread Fabiano Rosas

Cédric Le Goater  writes:

> Use it to update the current error of the migration stream if
> available and if not, simply print out the error. Next changes will
> update with an error to report.
>
> Signed-off-by: Cédric Le Goater 

Acked-by: Fabiano Rosas

Re: Unmapping KVM Guest Memory from Host Kernel

2024-05-13 Thread Manwaring, Derek

On 2024-05-13 13:36-0700, Sean Christopherson wrote:
> Hmm, a slightly crazy idea (ok, maybe wildly crazy) would be to support mapping
> all of guest_memfd into kernel address space, but as USER=1 mappings.  I.e. don't
> require a carve-out from userspace, but do require CLAC/STAC when access guest
> memory from the kernel.  I think/hope that would provide the speculative execution
> mitigation properties you're looking for?

This is interesting. I'm hesitant to rely on SMAP since it can be
enforced too late by the microarchitecture. But Canella, et al. [1] did
say in 2019 that the kernel->user access route seemed to be free of any
"Meltdown" effects. LASS sounds like it will be even stronger, though
it's not clear to me from Intel's programming reference that speculative
scenarios are in scope [2]. AMD does list SMAP specifically as a
feature that can control speculation [3].

I don't see an equivalent read-access control on ARM. It has PXN for
execute. Read access can probably also be controlled?  But I think for
the non-CoCo case we should favor solutions that are less dependent on
hardware-specific protections.

Derek

[1] https://www.usenix.org/system/files/sec19-canella.pdf
[2] https://cdrdv2.intel.com/v1/dl/getContent/671368
[3] https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/tuning-guides/software-techniques-for-managing-speculation.pdf

Re: [PATCH V1 26/26] migration: only-migratable-modes

2024-05-13 Thread Fabiano Rosas

Steven Sistare  writes:

> On 5/9/2024 3:14 PM, Fabiano Rosas wrote:
>> Steve Sistare  writes:
>> 
>>> Add the only-migratable-modes option as a generalization of only-migratable.
>>> Only devices that support all requested modes are allowed.
>>>
>>> Signed-off-by: Steve Sistare 
>>> ---
>>>   include/migration/misc.h   |  3 +++
>>>   include/sysemu/sysemu.h|  1 -
>>>   migration/migration-hmp-cmds.c | 26 +-
>>>   migration/migration.c  | 22 +-
>>>   migration/savevm.c |  2 +-
>>>   qemu-options.hx| 16 ++--
>>>   system/globals.c   |  1 -
>>>   system/vl.c| 13 -
>>>   target/s390x/cpu_models.c  |  4 +++-
>>>   9 files changed, 75 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/include/migration/misc.h b/include/migration/misc.h
>>> index 5b963ba..3ad2cd9 100644
>>> --- a/include/migration/misc.h
>>> +++ b/include/migration/misc.h
>>> @@ -119,6 +119,9 @@ bool migration_incoming_postcopy_advised(void);
>>>   /* True if background snapshot is active */
>>>   bool migration_in_bg_snapshot(void);
>>>   
>>> +void migration_set_required_mode(MigMode mode);
>>> +bool migration_mode_required(MigMode mode);
>>> +
>>>   /* migration/block-dirty-bitmap.c */
>>>   void dirty_bitmap_mig_init(void);
>>>   
>>> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
>>> index 5b4397e..0a9c4b4 100644
>>> --- a/include/sysemu/sysemu.h
>>> +++ b/include/sysemu/sysemu.h
>>> @@ -8,7 +8,6 @@
>>>   
>>>   /* vl.c */
>>>   
>>> -extern int only_migratable;
>>>   extern const char *qemu_name;
>>>   extern QemuUUID qemu_uuid;
>>>   extern bool qemu_uuid_set;
>>> diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
>>> index 414c7e8..ca913b7 100644
>>> --- a/migration/migration-hmp-cmds.c
>>> +++ b/migration/migration-hmp-cmds.c
>>> @@ -16,6 +16,7 @@
>>>   #include "qemu/osdep.h"
>>>   #include "block/qapi.h"
>>>   #include "migration/snapshot.h"
>>> +#include "migration/misc.h"
>>>   #include "monitor/hmp.h"
>>>   #include "monitor/monitor.h"
>>>   #include "qapi/error.h"
>>> @@ -33,6 +34,28 @@
>>>   #include "options.h"
>>>   #include "migration.h"
>>>   
>>> +static void migration_dump_modes(Monitor *mon)
>>> +{
>>> +    int mode, n = 0;
>>> +
>>> +    monitor_printf(mon, "only-migratable-modes: ");
>>> +
>>> +    for (mode = 0; mode < MIG_MODE__MAX; mode++) {
>>> +        if (migration_mode_required(mode)) {
>>> +            if (n++) {
>>> +                monitor_printf(mon, ",");
>>> +            }
>>> +            monitor_printf(mon, "%s", MigMode_str(mode));
>>> +        }
>>> +    }
>>> +
>>> +    if (!n) {
>>> +        monitor_printf(mon, "none\n");
>>> +    } else {
>>> +        monitor_printf(mon, "\n");
>>> +    }
>>> +}
>>> +
>>>   static void migration_global_dump(Monitor *mon)
>>>   {
>>>       MigrationState *ms = migrate_get_current();
>>> @@ -41,7 +64,7 @@ static void migration_global_dump(Monitor *mon)
>>>       monitor_printf(mon, "store-global-state: %s\n",
>>>                      ms->store_global_state ? "on" : "off");
>>>       monitor_printf(mon, "only-migratable: %s\n",
>>> -                   only_migratable ? "on" : "off");
>>> +                   migration_mode_required(MIG_MODE_NORMAL) ? "on" : "off");
>>>       monitor_printf(mon, "send-configuration: %s\n",
>>>                      ms->send_configuration ? "on" : "off");
>>>       monitor_printf(mon, "send-section-footer: %s\n",
>>> @@ -50,6 +73,7 @@ static void migration_global_dump(Monitor *mon)
>>>                      ms->decompress_error_check ? "on" : "off");
>>>       monitor_printf(mon, "clear-bitmap-shift: %u\n",
>>>                      ms->clear_bitmap_shift);
>>> +    migration_dump_modes(mon);
>>>   }
>>>   
>>>   void hmp_info_migrate(Monitor *mon, const QDict *qdict)
>>> diff --git a/migration/migration.c b/migration/migration.c
>>> index 4984dee..5535b84 100644
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -1719,17 +1719,29 @@ static bool is_busy(Error **reasonp, Error **errp)
>>>       return false;
>>>   }
>>>   
>>> -static bool is_only_migratable(Error **reasonp, Error **errp, int modes)
>>> +static int migration_modes_required;
>>> +
>>> +void migration_set_required_mode(MigMode mode)
>>> +{
>>> +    migration_modes_required |= BIT(mode);
>>> +}
>>> +
>>> +bool migration_mode_required(MigMode mode)
>>> +{
>>> +    return !!(migration_modes_required & BIT(mode));
>>> +}
>>> +
>>> +static bool modes_are_required(Error **reasonp, Error **errp, int modes)
>>>   {
>>>       ERRP_GUARD();
>>>   
>>> -    if (only_migratable && (modes & BIT(MIG_MODE_NORMAL))) {
>>> +    if (migration_modes_required & modes) {
>>>           error_propagate_prepend(errp, *reasonp,
>>> -                                "disallowing migration blocker "
>>> -                                "(--only-migratable) for: ");
>>> +

Re: [PATCH V1 06/26] migration: precreate vmstate for exec

2024-05-13 Thread Fabiano Rosas

Steven Sistare  writes:

> On 5/6/2024 7:34 PM, Fabiano Rosas wrote:
>> Steve Sistare  writes:
>> 
>>> Provide migration_precreate_save for saving precreate vmstate across exec.
>>> Create a memfd, save its value in the environment, and serialize state
>>> to it.  Reverse the process in migration_precreate_load.
>>>
>>> Signed-off-by: Steve Sistare 
>>> ---
>>>   include/migration/misc.h |   5 ++
>>>   migration/meson.build|   1 +
>>>   migration/precreate.c| 139 +++
>>>   3 files changed, 145 insertions(+)
>>>   create mode 100644 migration/precreate.c
>>>
>>> diff --git a/include/migration/misc.h b/include/migration/misc.h
>>> index c9e200f..cf30351 100644
>>> --- a/include/migration/misc.h
>>> +++ b/include/migration/misc.h
>>> @@ -56,6 +56,11 @@ AnnounceParameters *migrate_announce_params(void);
>>>   
>>>   void dump_vmstate_json_to_file(FILE *out_fp);
>>>   
>>> +/* migration/precreate.c */
>>> +int migration_precreate_save(Error **errp);
>>> +void migration_precreate_unsave(void);
>>> +int migration_precreate_load(Error **errp);
>>> +
>>>   /* migration/migration.c */
>>>   void migration_object_init(void);
>>>   void migration_shutdown(void);
>>> diff --git a/migration/meson.build b/migration/meson.build
>>> index f76b1ba..50e7cb2 100644
>>> --- a/migration/meson.build
>>> +++ b/migration/meson.build
>>> @@ -26,6 +26,7 @@ system_ss.add(files(
>>>     'ram-compress.c',
>>>     'options.c',
>>>     'postcopy-ram.c',
>>> +  'precreate.c',
>>>     'savevm.c',
>>>     'socket.c',
>>>     'tls.c',
>>> diff --git a/migration/precreate.c b/migration/precreate.c
>>> new file mode 100644
>>> index 000..0bf5e1f
>>> --- /dev/null
>>> +++ b/migration/precreate.c
>>> @@ -0,0 +1,139 @@
>>> +/*
>>> + * Copyright (c) 2022, 2024 Oracle and/or its affiliates.
>>> + *
>>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>>> + * See the COPYING file in the top-level directory.
>>> + */
>>> +
>>> +#include "qemu/osdep.h"
>>> +#include "qemu/cutils.h"
>>> +#include "qemu/memfd.h"
>>> +#include "qapi/error.h"
>>> +#include "io/channel-file.h"
>>> +#include "migration/misc.h"
>>> +#include "migration/qemu-file.h"
>>> +#include "migration/savevm.h"
>>> +
>>> +#define PRECREATE_STATE_NAME "QEMU_PRECREATE_STATE"
>>> +
>>> +static QEMUFile *qemu_file_new_fd_input(int fd, const char *name)
>>> +{
>>> +    g_autoptr(QIOChannelFile) fioc = qio_channel_file_new_fd(fd);
>>> +    QIOChannel *ioc = QIO_CHANNEL(fioc);
>>> +    qio_channel_set_name(ioc, name);
>>> +    return qemu_file_new_input(ioc);
>>> +}
>>> +
>>> +static QEMUFile *qemu_file_new_fd_output(int fd, const char *name)
>>> +{
>>> +    g_autoptr(QIOChannelFile) fioc = qio_channel_file_new_fd(fd);
>>> +    QIOChannel *ioc = QIO_CHANNEL(fioc);
>>> +    qio_channel_set_name(ioc, name);
>>> +    return qemu_file_new_output(ioc);
>>> +}
>>> +
>>> +static int memfd_create_named(const char *name, Error **errp)
>>> +{
>>> +    int mfd;
>>> +    char val[16];
>>> +
>>> +    mfd = memfd_create(name, 0);
>>> +    if (mfd < 0) {
>>> +        error_setg_errno(errp, errno, "memfd_create failed");
>>> +        return -1;
>>> +    }
>>> +
>>> +    /* Remember mfd in environment for post-exec load */
>>> +    qemu_clear_cloexec(mfd);
>>> +    snprintf(val, sizeof(val), "%d", mfd);
>>> +    g_setenv(name, val, 1);
>>> +
>>> +    return mfd;
>>> +}
>>> +
>>> +static int memfd_find_named(const char *name, int *mfd_p, Error **errp)
>>> +{
>>> +    const char *val = g_getenv(name);
>>> +
>>> +    if (!val) {
>>> +        *mfd_p = -1;
>>> +        return 0;   /* No memfd was created, not an error */
>>> +    }
>>> +    g_unsetenv(name);
>>> +    if (qemu_strtoi(val, NULL, 10, mfd_p)) {
>>> +        error_setg(errp, "Bad %s env value %s", PRECREATE_STATE_NAME, val);
>>> +        return -1;
>>> +    }
>>> +    lseek(*mfd_p, 0, SEEK_SET);
>>> +    return 0;
>>> +}
>>> +
>>> +static void memfd_delete_named(const char *name)
>>> +{
>>> +    int mfd;
>>> +    const char *val = g_getenv(name);
>>> +
>>> +    if (val) {
>>> +        g_unsetenv(name);
>>> +        if (!qemu_strtoi(val, NULL, 10, &mfd)) {
>>> +            close(mfd);
>>> +        }
>>> +    }
>>> +}
>>> +
>>> +static QEMUFile *qemu_file_new_memfd_output(const char *name, Error **errp)
>>> +{
>>> +    int mfd = memfd_create_named(name, errp);
>>> +
>>> +    if (mfd < 0) {
>>> +        return NULL;
>>> +    }
>>> +
>>> +    return qemu_file_new_fd_output(mfd, name);
>>> +}
>>> +
>>> +static QEMUFile *qemu_file_new_memfd_input(const char *name, Error **errp)
>>> +{
>>> +    int ret, mfd;
>>> +
>>> +    ret = memfd_find_named(name, &mfd, errp);
>>> +    if (ret || mfd < 0) {
>>> +        return NULL;
>>> +    }
>>> +
>>> +    return qemu_file_new_fd_input(mfd, name);
>>> +}
>>> +
>>> +int migration_precreate_save(Error **errp)
>>> +{
>>> +    QEMUFile *f = qemu_file_new_memfd_output(PRECREATE_STATE_NAME, errp);
>>> +
>>> +    if (!f) {
>>> +

[ANNOUNCE] QEMU 8.2.4 Stable released

2024-05-13 Thread Michael Tokarev

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi everyone,

The QEMU v8.2.4 stable release is now available.

You can grab the tarball from our download page here:

  https://www.qemu.org/download/#source

  https://download.qemu.org/qemu-8.2.4.tar.xz
  https://download.qemu.org/qemu-8.2.4.tar.xz.sig (signature)

v8.2.4 is now tagged in the official qemu.git repository, and the
stable-8.2 branch has been updated accordingly:

  https://gitlab.com/qemu-project/qemu/-/commits/stable-8.2

There are 16 changes since the previous v8.2.3 release.

Thank you everyone who has been involved and helped with the stable series!

/mjt

Changelog (stable-8.2-hash master-hash Author Name: Commit-Subject):

1332b8dd43 Michael Tokarev:
 Update version for 8.2.4 release
07d46408cb e88a856efd Philippe Mathieu-Daudé:
 target/sh4: Fix SUBV opcode
dc5390a0ca c365e6b070 Philippe Mathieu-Daudé:
 target/sh4: Fix ADDV opcode
7b4804c965 eb656a60fd Philippe Mathieu-Daudé:
 hw/arm/npcm7xx: Store derivative OTP fuse key in little endian
dfcbb9ef24 4b00855f0e Alexandra Diupina:
 hw/dmax/xlnx_dpdma: fix handling of address_extension descriptor fields
d5cf8bed29 f2c8aeb1af Jeuk Kim:
 hw/ufs: Fix buffer overflow bug
5479d911bc a88a04906b Thomas Huth:
 .gitlab-ci.d/cirrus.yml: Shorten the runtime of the macOS and FreeBSD jobs
5b5655fdb7 dcc5c018c7 Peter Maydell:
 tests/avocado: update sunxi kernel from armbian to 6.6.16
7e5f59326d 0cbb322f70 Michael Tokarev:
 target/loongarch/cpu.c: typo fix: expection
f6abce29cc 06479dbf3d Li Zhijian:
 backends/cryptodev-builtin: Fix local_error leaks
37751067b1 4fa333e08d Eric Blake:
 nbd/server: Mark negotiation functions as coroutine_fn
cb4c222add ae6d91a7e9 Zhu Yangyang:
 nbd/server: do not poll within a coroutine context
6fee9efc2e 04f6fb897a Michael Tokarev:
 linux-user: do_setsockopt: fix SOL_ALG.ALG_SET_KEY
55b88e61ed 2cc637f1ea Li Zhijian:
 migration/colo: Fix bdrv_graph_rdlock_main_loop: Assertion `!qemu_in_coroutine()' failed.
cbae108098 10f86d1b84 Daniel Henrique Barboza:
 target/riscv/kvm: change timer regs size to u64
125b95d79e 450bd6618f Daniel Henrique Barboza:
 target/riscv/kvm: change KVM_REG_RISCV_FP_D to u64
bbdcc89678 49c211ffca Daniel Henrique Barboza:
 target/riscv/kvm: change KVM_REG_RISCV_FP_F to u32

-----BEGIN PGP SIGNATURE-----

iQEzBAEBCAAdFiEEe3O61ovnosKJMUsicBtPaxppPlkFAmZCexIACgkQcBtPaxpp
PlkwiQgAinEkfIr7ShAXPx4L1GrE9S4HbuF4cZrtJqcbSB6XN7v+zSKeWW89iNhX
6/UDcP57ORtincZyhlqzj/MEoOFiUgpEz9pAlJn12QlDZDOFGOD7yISibCKSZVsL
OKPOOH7HB6/koUmKKXij2JAc73G95ZkGrsvPS/ThiQbh89R1wGuarmvO447lgLZx
a4tlGa70hmu3+GGPYRUT4W+TNMvUP/jLj3BHq6PlMSz0cpr/REAsG93h5Bq1axwL
8bDSw2HSX09wE4yp4AalT+ymnphZ7oh3kDniLn/DDjRXmlatSuLJADzK0Q0ksoWm
rPY9ZLDOYaNAd1z29V6k8z2gG1rKtA==
=iFlq
-----END PGP SIGNATURE-----

Re: Unmapping KVM Guest Memory from Host Kernel

2024-05-13 Thread Sean Christopherson

On Mon, May 13, 2024, James Gowans wrote:
> On Mon, 2024-05-13 at 10:09 -0700, Sean Christopherson wrote:
> > On Mon, May 13, 2024, James Gowans wrote:
> > > On Mon, 2024-05-13 at 08:39 -0700, Sean Christopherson wrote:
> > > > > Sean, you mentioned that you envision guest_memfd also supporting non-CoCo VMs.
> > > > > Do you have some thoughts about how to make the above cases work in the
> > > > > guest_memfd context?
> > > > 
> > > > Yes.  The hand-wavy plan is to allow selectively mmap()ing guest_memfd().  There
> > > > is a long thread[*] discussing how exactly we want to do that.  The TL;DR is that
> > > > the basic functionality is also straightforward; the bulk of the discussion is
> > > > around gup(), reclaim, page migration, etc.
> > > 
> > > I still need to read this long thread, but just a thought on the word
> > > "restricted" here: for MMIO the instruction can be anywhere and
> > > similarly the load/store MMIO data can be anywhere. Does this mean that
> > > for running unmodified non-CoCo VMs with guest_memfd backend that we'll
> > > always need to have the whole of guest memory mmapped?
> > 
> > Not necessarily, e.g. KVM could re-establish the direct map or mremap() on-demand.
> > There are variations on that, e.g. if ASI[*] were to ever make its way upstream,
> > which is a huge if, then we could have guest_memfd mapped into a KVM-only CR3.
> 
> Yes, on-demand mapping in of guest RAM pages is definitely an option. It
> sounds quite challenging to need to always go via interfaces which
> demand map/fault memory, and also potentially quite slow needing to
> unmap and flush afterwards. 
> 
> Not too sure what you have in mind with "guest_memfd mapped into KVM-
> only CR3" - could you expand?

Remove guest_memfd from the kernel's direct map, e.g. so that the kernel at-large
can't touch guest memory, but have a separate set of page tables that have the
direct map, userspace page tables, _and_ kernel mappings for guest_memfd.  On
KVM_RUN (or vcpu_load()?), switch to KVM's CR3 so that KVM's map/unmap operations
are always free (literal nops).

That's an imperfect solution as IRQs and NMIs will run kernel code with KVM's
page tables, i.e. guest memory would still be exposed to the host kernel.  And
of course we'd need to get buy-in from multiple architectures and maintainers,
etc.

> > > I guess the idea is that this use case will still be subject to the
> > > normal restriction rules, but for a non-CoCo non-pKVM VM there will be
> > > no restriction in practice, and userspace will need to mmap everything
> > > always?
> > > 
> > > It really seems yucky to need to have all of guest RAM mmapped all the
> > > time just for MMIO to work... But I suppose there is no way around that
> > > for Intel x86.
> > 
> > It's not just MMIO.  Nested virtualization, and more specifically shadowing nested
> > TDP, is also problematic (probably more so than MMIO).  And there are more cases,
> > i.e. we'll need a generic solution for this.  As above, there are a variety of
> > options, it's largely just a matter of doing the work.  I'm not saying it's a
> > trivial amount of work/effort, but it's far from an unsolvable problem.
> 
> I didn't even think of nested virt, but that will absolutely be an even
> bigger problem too. MMIO was just the first roadblock which illustrated
> the problem.
> Overall what I'm trying to figure out is whether there is any sane path
> here other than needing to mmap all guest RAM all the time. Trying to
> get nested virt and MMIO and whatever else needs access to guest RAM
> working by doing just-in-time (aka: on-demand) mappings and unmappings
> of guest RAM sounds like a painful game of whack-a-mole, potentially
> really bad for performance too.

It's a whack-a-mole game that KVM already plays, e.g. for dirty tracking, post-copy
demand paging, etc.  There is still plenty of room for improvement, e.g. to reduce
the number of touchpoints and thus the potential for missed cases.  But KVM more
or less needs to solve this basic problem no matter what, so I don't think that
guest_memfd adds much, if any, burden.

> Do you think we should look at doing this on-demand mapping, or, for
> now, simply require that all guest RAM is mmapped all the time and KVM
> be given a valid virtual addr for the memslots?

I don't think "map everything into userspace" is a viable approach, precisely
because it requires reflecting that back into KVM's memslots, which in turn
means guest_memfd needs to allow gup().  And I don't think we want to allow gup(),
because that opens a rather large can of worms (see the long thread I linked).

Hmm, a slightly crazy idea (ok, maybe wildly crazy) would be to support mapping
all of guest_memfd into kernel address space, but as USER=1 mappings.  I.e. don't
require a carve-out from userspace, but do require CLAC/STAC when accessing guest
memory from the kernel.  I think/hope that would provide the

Re: CPR/liveupdate: test results using prior bug fix

2024-05-13 Thread Steven Sistare


Hi Michael,
  No surprise here, I did see some of the same failure messages and they
prompted me to submit the fix.  They are all symptoms of "the possibility of
ram and device state being out of sync" as mentioned in the commit.

I am not familiar with the process for maintaining old releases for qemu.
Perhaps someone on this list can comment on 8.2.3.

- Steve

On 5/13/2024 2:22 PM, Michael Galaxy wrote:

Hi Steve,

We found that this specific change ("migration: stop vm for cpr") fixes a bug
that we identified while testing back-to-back live updates in a lab environment.

More specifically, running *without* this change (which is not in 8.2.2, but
*is* in 9.0.0) causes the metadata save file to become corrupted when doing
live updates one after another. We typically see a corrupted save file
somewhere between 20 and 30 live updates, and a git bisect showed that this
change makes the problem go away.

Were you aware? Is there any plan to cherry-pick this for 8.2.3, or a plan to
release 8.2.3 at some point?

Here are some examples of how the bug manifests in different locations of the
QEMU metadata save file:


2024-04-26T13:28:53Z qemu-system-x86_64: Failed to load mtrr_var:base
2024-04-26T13:28:53Z qemu-system-x86_64: Failed to load cpu:env.mtrr_var
2024-04-26T13:28:53Z qemu-system-x86_64: error while loading state for instance 0x1b of device 'cpu'
2024-04-26T13:28:53Z qemu-system-x86_64: load of migration failed: Input/output error

And another:

2024-04-17T16:09:47Z qemu-system-x86_64: check_section_footer: Read section footer failed: -5
2024-04-17T16:09:47Z qemu-system-x86_64: load of migration failed: Invalid argument

And another:

2024-04-30T21:53:29Z qemu-system-x86_64: Unable to read ID string for section 163
2024-04-30T21:53:29Z qemu-system-x86_64: load of migration failed: Invalid argument

And another:

2024-05-01T16:01:44Z qemu-system-x86_64: Unable to read ID string for section 164
2024-05-01T16:01:44Z qemu-system-x86_64: load of migration failed: Invalid argument

As you can see, the failures occur quite randomly, but it generally takes
20-30 or more live updates before the problem appears.


- Michael

On 2/27/24 23:13, pet...@redhat.com wrote:

From: Steve Sistare

When migration for cpr is initiated, stop the vm and set state
RUN_STATE_FINISH_MIGRATE before ram is saved.  This eliminates the
possibility of ram and device state being out of sync, and guarantees
that a guest in the suspended state remains suspended, because qmp_cont
rejects a cont command in the RUN_STATE_FINISH_MIGRATE state.

Signed-off-by: Steve Sistare
Reviewed-by: Peter Xu
Link: https://lore.kernel.org/r/1708622920-68779-11-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu

---
  include/migration/misc.h |  1 +
  migration/migration.h|  2 --
  migration/migration.c| 51 
  3 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index e4933b815b..5d1aa593ed 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -60,6 +60,7 @@ void migration_object_init(void);
  void migration_shutdown(void);
  bool migration_is_idle(void);
  bool migration_is_active(MigrationState *);
+bool migrate_mode_is_cpr(MigrationState *);
  
  typedef enum MigrationEventType {
  MIG_EVENT_PRECOPY_SETUP,
diff --git a/migration/migration.h b/migration/migration.h
index aef8afbe1f..65c0b61cbd 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -541,6 +541,4 @@ int migration_rp_wait(MigrationState *s);
   */
  void migration_rp_kick(MigrationState *s);
  
-int migration_stop_vm(RunState state);
-
  #endif
diff --git a/migration/migration.c b/migration/migration.c
index 37c836b0b0..90a90947fb 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -167,11 +167,19 @@ static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
  return (a > b) - (a < b);
  }
  
-int migration_stop_vm(RunState state)
+static int migration_stop_vm(MigrationState *s, RunState state)
  {
-int ret = vm_stop_force_state(state);
+int ret;
+
+migration_downtime_start(s);
+
+s->vm_old_state = runstate_get();
+global_state_store();
+
+ret = vm_stop_force_state(state);
  
  trace_vmstate_downtime_checkpoint("src-vm-stopped");
+trace_migration_completion_vm_stop(ret);
  
  return ret;
  }
@@ -1602,6 +1610,11 @@ bool migration_is_active(MigrationState *s)
  s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
  }
  
+bool migrate_mode_is_cpr(MigrationState *s)
+{
+return s->parameters.mode == MIG_MODE_CPR_REBOOT;
+}
+
  int migrate_init(MigrationState *s, Error **errp)
  {
  int ret;
@@ -2454,10 +2467,7 @@ static int postcopy_start(MigrationState *ms,

Re: [PATCH V1 26/26] migration: only-migratable-modes

2024-05-13 Thread Steven Sistare


On 5/9/2024 3:14 PM, Fabiano Rosas wrote:

Steve Sistare  writes:


Add the only-migratable-modes option as a generalization of only-migratable.
Only devices that support all requested modes are allowed.

Signed-off-by: Steve Sistare 
---
  include/migration/misc.h   |  3 +++
  include/sysemu/sysemu.h|  1 -
  migration/migration-hmp-cmds.c | 26 +-
  migration/migration.c  | 22 +-
  migration/savevm.c |  2 +-
  qemu-options.hx| 16 ++--
  system/globals.c   |  1 -
  system/vl.c| 13 -
  target/s390x/cpu_models.c  |  4 +++-
  9 files changed, 75 insertions(+), 13 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 5b963ba..3ad2cd9 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -119,6 +119,9 @@ bool migration_incoming_postcopy_advised(void);
  /* True if background snapshot is active */
  bool migration_in_bg_snapshot(void);
  
+void migration_set_required_mode(MigMode mode);
+bool migration_mode_required(MigMode mode);
+
  /* migration/block-dirty-bitmap.c */
  void dirty_bitmap_mig_init(void);
  
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 5b4397e..0a9c4b4 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -8,7 +8,6 @@
  
  /* vl.c */
  
-extern int only_migratable;
  extern const char *qemu_name;
  extern QemuUUID qemu_uuid;
  extern bool qemu_uuid_set;
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 414c7e8..ca913b7 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -16,6 +16,7 @@
  #include "qemu/osdep.h"
  #include "block/qapi.h"
  #include "migration/snapshot.h"
+#include "migration/misc.h"
  #include "monitor/hmp.h"
  #include "monitor/monitor.h"
  #include "qapi/error.h"
@@ -33,6 +34,28 @@
  #include "options.h"
  #include "migration.h"
  
+static void migration_dump_modes(Monitor *mon)
+{
+int mode, n = 0;
+
+monitor_printf(mon, "only-migratable-modes: ");
+
+for (mode = 0; mode < MIG_MODE__MAX; mode++) {
+if (migration_mode_required(mode)) {
+if (n++) {
+monitor_printf(mon, ",");
+}
+monitor_printf(mon, "%s", MigMode_str(mode));
+}
+}
+
+if (!n) {
+monitor_printf(mon, "none\n");
+} else {
+monitor_printf(mon, "\n");
+}
+}
+
  static void migration_global_dump(Monitor *mon)
  {
  MigrationState *ms = migrate_get_current();
@@ -41,7 +64,7 @@ static void migration_global_dump(Monitor *mon)
  monitor_printf(mon, "store-global-state: %s\n",
 ms->store_global_state ? "on" : "off");
  monitor_printf(mon, "only-migratable: %s\n",
-   only_migratable ? "on" : "off");
+   migration_mode_required(MIG_MODE_NORMAL) ? "on" : "off");
  monitor_printf(mon, "send-configuration: %s\n",
 ms->send_configuration ? "on" : "off");
  monitor_printf(mon, "send-section-footer: %s\n",
@@ -50,6 +73,7 @@ static void migration_global_dump(Monitor *mon)
 ms->decompress_error_check ? "on" : "off");
  monitor_printf(mon, "clear-bitmap-shift: %u\n",
 ms->clear_bitmap_shift);
+migration_dump_modes(mon);
  }
  
  void hmp_info_migrate(Monitor *mon, const QDict *qdict)
diff --git a/migration/migration.c b/migration/migration.c
index 4984dee..5535b84 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1719,17 +1719,29 @@ static bool is_busy(Error **reasonp, Error **errp)
  return false;
  }
  
-static bool is_only_migratable(Error **reasonp, Error **errp, int modes)
+static int migration_modes_required;
+
+void migration_set_required_mode(MigMode mode)
+{
+migration_modes_required |= BIT(mode);
+}
+
+bool migration_mode_required(MigMode mode)
+{
+return !!(migration_modes_required & BIT(mode));
+}
+
+static bool modes_are_required(Error **reasonp, Error **errp, int modes)
  {
  ERRP_GUARD();
  
-if (only_migratable && (modes & BIT(MIG_MODE_NORMAL))) {
+if (migration_modes_required & modes) {
  error_propagate_prepend(errp, *reasonp,
-"disallowing migration blocker "
-"(--only-migratable) for: ");
+"-only-migratable{-modes}  specified, but: ");


extra space before 'specified'


Will fix, thanks.


  *reasonp = NULL;
  return true;
  }
+
  return false;
  }
  
@@ -1783,7 +1795,7 @@ int migrate_add_blocker_modes(Error **reasonp, Error **errp, MigMode mode, ...)
  modes = get_modes(mode, ap);
  va_end(ap);
  
-if (is_only_migratable(reasonp, errp, modes)) {
+if (modes_are_required(reasonp, errp, modes)) {
  return -EACCES;
  } else if (is_busy(reasonp, errp)) {

Re: Unmapping KVM Guest Memory from Host Kernel

2024-05-13 Thread Gowans, James

On Mon, 2024-05-13 at 10:09 -0700, Sean Christopherson wrote:
> On Mon, May 13, 2024, James Gowans wrote:
> > On Mon, 2024-05-13 at 08:39 -0700, Sean Christopherson wrote:
> > > > Sean, you mentioned that you envision guest_memfd also supporting non-CoCo VMs.
> > > > Do you have some thoughts about how to make the above cases work in the
> > > > guest_memfd context?
> > > 
> > > Yes.  The hand-wavy plan is to allow selectively mmap()ing guest_memfd().  There
> > > is a long thread[*] discussing how exactly we want to do that.  The TL;DR is that
> > > the basic functionality is also straightforward; the bulk of the discussion is
> > > around gup(), reclaim, page migration, etc.
> > 
> > I still need to read this long thread, but just a thought on the word
> > "restricted" here: for MMIO the instruction can be anywhere and
> > similarly the load/store MMIO data can be anywhere. Does this mean that
> > for running unmodified non-CoCo VMs with guest_memfd backend that we'll
> > always need to have the whole of guest memory mmapped?
> 
> Not necessarily, e.g. KVM could re-establish the direct map or mremap() on-demand.
> There are variations on that, e.g. if ASI[*] were to ever make its way upstream,
> which is a huge if, then we could have guest_memfd mapped into a KVM-only CR3.

Yes, on-demand mapping in of guest RAM pages is definitely an option. It
sounds quite challenging to need to always go via interfaces which
demand map/fault memory, and also potentially quite slow needing to
unmap and flush afterwards. 

Not too sure what you have in mind with "guest_memfd mapped into KVM-
only CR3" - could you expand?

> > I guess the idea is that this use case will still be subject to the
> > normal restriction rules, but for a non-CoCo non-pKVM VM there will be
> > no restriction in practice, and userspace will need to mmap everything
> > always?
> > 
> > It really seems yucky to need to have all of guest RAM mmapped all the
> > time just for MMIO to work... But I suppose there is no way around that
> > for Intel x86.
> 
> It's not just MMIO.  Nested virtualization, and more specifically shadowing nested
> TDP, is also problematic (probably more so than MMIO).  And there are more cases,
> i.e. we'll need a generic solution for this.  As above, there are a variety of
> options, it's largely just a matter of doing the work.  I'm not saying it's a
> trivial amount of work/effort, but it's far from an unsolvable problem.

I didn't even think of nested virt, but that will absolutely be an even
bigger problem too. MMIO was just the first roadblock which illustrated
the problem.
Overall what I'm trying to figure out is whether there is any sane path
here other than needing to mmap all guest RAM all the time. Trying to
get nested virt and MMIO and whatever else needs access to guest RAM
working by doing just-in-time (aka: on-demand) mappings and unmappings
of guest RAM sounds like a painful game of whack-a-mole, potentially
really bad for performance too.

Do you think we should look at doing this on-demand mapping, or, for
now, simply require that all guest RAM is mmapped all the time and KVM
be given a valid virtual addr for the memslots?
Note that I'm specifically referring to regular non-CoCo non-enlightened
VMs here. For CoCo we definitely need all the cooperative MMIO and
sharing. What we're trying to do here is to get guest RAM out of the
direct map using guest_memfd, and now tackling the knock-on problem of
whether or not to mmap all of guest RAM all the time in userspace.

JG

Re: [PATCH V1 13/26] physmem: ram_block_create

2024-05-13 Thread Steven Sistare


On 5/13/2024 2:37 PM, Fabiano Rosas wrote:

Steve Sistare  writes:


Create a common subroutine to allocate a RAMBlock, de-duping the code to
populate its common fields.  Add a trace point for good measure.
No functional change.

Signed-off-by: Steve Sistare 
---
  system/physmem.c| 47 ++-
  system/trace-events |  3 +++
  2 files changed, 29 insertions(+), 21 deletions(-)

diff --git a/system/physmem.c b/system/physmem.c
index c3d04ca..6216b14 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -52,6 +52,7 @@
  #include "sysemu/hw_accel.h"
  #include "sysemu/xen-mapcache.h"
  #include "trace/trace-root.h"
+#include "trace.h"
  
  #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
  #include 
@@ -1918,11 +1919,29 @@ out_free:
  }
  }
  
+static RAMBlock *ram_block_create(MemoryRegion *mr, ram_addr_t size,
+  ram_addr_t max_size, uint32_t ram_flags)
+{
+RAMBlock *rb = g_malloc0(sizeof(*rb));
+
+rb->used_length = size;
+rb->max_length = max_size;
+rb->fd = -1;
+rb->flags = ram_flags;
+rb->page_size = qemu_real_host_page_size();
+rb->mr = mr;
+rb->guest_memfd = -1;
+trace_ram_block_create(rb->idstr, rb->flags, rb->fd, rb->used_length,


There's no idstr at this point, is there? I think this needs to be
memory_region_name(mr).


Thanks, will fix. That is a bug in my patch factoring.  I add the call to
qemu_ram_set_idstr in patch "physmem: set ram block idstr earlier".

- Steve


+   rb->max_length, mr->align);
+return rb;
+}
+
  #ifdef CONFIG_POSIX
  RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
   uint32_t ram_flags, int fd, off_t offset,
   Error **errp)
  {
+void *host;
  RAMBlock *new_block;
  Error *local_err = NULL;
  int64_t file_size, file_align;
@@ -1962,19 +1981,14 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
  return NULL;
  }
  
-new_block = g_malloc0(sizeof(*new_block));
-new_block->mr = mr;
-new_block->used_length = size;
-new_block->max_length = size;
-new_block->flags = ram_flags;
-new_block->guest_memfd = -1;
-new_block->host = file_ram_alloc(new_block, size, fd, !file_size, offset,
- errp);
-if (!new_block->host) {
+new_block = ram_block_create(mr, size, size, ram_flags);
+host = file_ram_alloc(new_block, size, fd, !file_size, offset, errp);
+if (!host) {
  g_free(new_block);
  return NULL;
  }
  
+new_block->host = host;
  ram_block_add(new_block, &local_err);
  if (local_err) {
  g_free(new_block);
@@ -1982,7 +1996,6 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
  return NULL;
  }
  return new_block;
-
  }
  
  
@@ -2054,18 +2067,10 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
  align = MAX(align, TARGET_PAGE_SIZE);
  size = ROUND_UP(size, align);
  max_size = ROUND_UP(max_size, align);
-
-new_block = g_malloc0(sizeof(*new_block));
-new_block->mr = mr;
-new_block->resized = resized;
-new_block->used_length = size;
-new_block->max_length = max_size;
  assert(max_size >= size);
-new_block->fd = -1;
-new_block->guest_memfd = -1;
-new_block->page_size = qemu_real_host_page_size();
-new_block->host = host;
-new_block->flags = ram_flags;
+new_block = ram_block_create(mr, size, max_size, ram_flags);
+new_block->resized = resized;
+
  ram_block_add(new_block, &local_err);
  if (local_err) {
  g_free(new_block);
diff --git a/system/trace-events b/system/trace-events
index 69c9044..f0a80ba 100644
--- a/system/trace-events
+++ b/system/trace-events
@@ -38,3 +38,6 @@ dirtylimit_state_finalize(void)
  dirtylimit_throttle_pct(int cpu_index, uint64_t pct, int64_t time_us) "CPU[%d] throttle percent: %" PRIu64 ", throttle adjust time %"PRIi64 " us"
  dirtylimit_set_vcpu(int cpu_index, uint64_t quota) "CPU[%d] set dirty page rate limit %"PRIu64
  dirtylimit_vcpu_execute(int cpu_index, int64_t sleep_time_us) "CPU[%d] sleep %"PRIi64 " us"
+
+# physmem.c
+ram_block_create(const char *name, uint32_t flags, int fd, size_t used_length, size_t max_length, size_t align) "%s, flags %u, fd %d, len %lu, maxlen %lu, align %lu"

Re: [PATCH V1 24/26] seccomp: cpr-exec blocker

2024-05-13 Thread Steven Sistare


On 5/10/2024 3:54 AM, Daniel P. Berrangé wrote:

On Mon, Apr 29, 2024 at 08:55:33AM -0700, Steve Sistare wrote:

cpr-exec mode needs permission to exec.  Block it if permission is denied.

Signed-off-by: Steve Sistare 
---
  include/sysemu/seccomp.h |  1 +
  system/qemu-seccomp.c| 10 --
  system/vl.c  |  6 ++
  3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/sysemu/seccomp.h b/include/sysemu/seccomp.h
index fe85989..023c0a1 100644
--- a/include/sysemu/seccomp.h
+++ b/include/sysemu/seccomp.h
@@ -22,5 +22,6 @@
  #define QEMU_SECCOMP_SET_RESOURCECTL (1 << 4)
  
  int parse_sandbox(void *opaque, QemuOpts *opts, Error **errp);
+uint32_t qemu_seccomp_get_opts(void);
  
  #endif
diff --git a/system/qemu-seccomp.c b/system/qemu-seccomp.c
index 5c20ac0..0d2a561 100644
--- a/system/qemu-seccomp.c
+++ b/system/qemu-seccomp.c
@@ -360,12 +360,18 @@ static int seccomp_start(uint32_t seccomp_opts, Error **errp)
  return rc < 0 ? -1 : 0;
  }
  
+static uint32_t seccomp_opts;
+
+uint32_t qemu_seccomp_get_opts(void)
+{
+return seccomp_opts;
+}
+
  int parse_sandbox(void *opaque, QemuOpts *opts, Error **errp)
  {
  if (qemu_opt_get_bool(opts, "enable", false)) {
-uint32_t seccomp_opts = QEMU_SECCOMP_SET_DEFAULT
-| QEMU_SECCOMP_SET_OBSOLETE;
  const char *value = NULL;
+seccomp_opts = QEMU_SECCOMP_SET_DEFAULT | QEMU_SECCOMP_SET_OBSOLETE;
  
  value = qemu_opt_get(opts, "obsolete");
  if (value) {
diff --git a/system/vl.c b/system/vl.c
index 7252100..b76881e 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -76,6 +76,7 @@
  #include "hw/block/block.h"
  #include "hw/i386/x86.h"
  #include "hw/i386/pc.h"
+#include "migration/blocker.h"
  #include "migration/cpr.h"
  #include "migration/misc.h"
  #include "migration/snapshot.h"
@@ -2493,6 +2494,11 @@ static void qemu_process_early_options(void)
  QemuOptsList *olist = qemu_find_opts_err("sandbox", NULL);
  if (olist) {
  qemu_opts_foreach(olist, parse_sandbox, NULL, &error_fatal);
+if (qemu_seccomp_get_opts() & QEMU_SECCOMP_SET_SPAWN) {
+Error *blocker = NULL;
+error_setg(&blocker, "-sandbox denies exec for cpr-exec");
+migrate_add_blocker_mode(&blocker, MIG_MODE_CPR_EXEC, &error_fatal);
+}
  }
  #endi


There is a whole pile of features that get blocked when -sandbox is
used. I'm not convinced we should be adding code to check for specific
blocked features, as such a list will always be incomplete at best, and
incorrectly block things at worst.

I view this primarily as a documentation task for the cpr-exec command.


For cpr and live migration, we do our best to prevent breaking the guest
for cases we know will fail.  Independently, a clear error message here
will reduce error reports for this new cpr feature.

Would it be more palatable if I move this blocker's creation to cpr_mig_init?

- Steve

Re: [PATCH V1 22/26] migration: ram block cpr-exec blockers

2024-05-13 Thread Steven Sistare


On 5/9/2024 2:01 PM, Fabiano Rosas wrote:

Steve Sistare  writes:


Unlike cpr-reboot mode, cpr-exec mode cannot save volatile ram blocks in the
migration stream file and recreate them later, because the physical memory for
the blocks is pinned and registered for vfio.  Add an exec-mode blocker for
volatile ram blocks.

Also add a blocker for RAM_GUEST_MEMFD.  Preserving guest_memfd may be
sufficient for cpr-exec, but it has not been tested yet.

- Steve


extra text here


Will fix, thanks - steve


Signed-off-by: Steve Sistare 


Reviewed-by: Fabiano Rosas

Re: [PATCH V1 09/26] migration: vmstate_register_named

2024-05-13 Thread Steven Sistare


On 5/9/2024 10:32 AM, Fabiano Rosas wrote:

Fabiano Rosas  writes:


Steve Sistare  writes:


Define vmstate_register_named which takes the instance name as its first
parameter, instead of generating the name from VMStateIf of the Object.
This will be needed to register objects that are not Objects.  Pass the
new name parameter to vmstate_register_with_alias_id.

Signed-off-by: Steve Sistare 


Reviewed-by: Fabiano Rosas 


Actually, can't we define a wrapper type just for this purpose? For
example, looking at dbus-vmstate.c:


One would need to provide a separate wrapper for each struct to be registered
as vmstate.  This patch set only has RAMBlock, but there are more coming in
my next patch sets.  vmstate_register_named avoids adding such boilerplate,
and makes it easier to add more cpr state in the future.

- Steve


static void dbus_vmstate_class_init(ObjectClass *oc, void *data)
{
...
 VMStateIfClass *vc = VMSTATE_IF_CLASS(oc);

 vc->get_id = dbus_vmstate_get_id;
...
}

static const TypeInfo dbus_vmstate_info = {
 .name = TYPE_DBUS_VMSTATE,
 .parent = TYPE_OBJECT,
 .instance_size = sizeof(DBusVMState),
 .instance_finalize = dbus_vmstate_finalize,
 .class_init = dbus_vmstate_class_init,
 .interfaces = (InterfaceInfo[]) {
 { TYPE_USER_CREATABLE },   // without this one
 { TYPE_VMSTATE_IF },
 { }
 }
};

static void register_types(void)
{
 type_register_static(&dbus_vmstate_info);
}
type_init(register_types);

Re: [PATCH V1 05/26] migration: precreate vmstate

2024-05-13 Thread Steven Sistare


On 5/7/2024 5:02 PM, Fabiano Rosas wrote:

Steve Sistare  writes:


Provide the VMStateDescription precreate field to mark objects that must
be loaded on the incoming side before devices have been created, because
they provide properties that will be needed at creation time.  They will
be saved to and loaded from their own QEMUFile, via


It's not obvious to me what the reason is to have a separate
QEMUFile. Could you expand on this?


The migration stream is read in the calling sequence at B below, but precreate
state is needed at A before chardev and memory backends are created.

main()
  qemu_init()
A:
qemu_create_early_backends()
qemu_create_late_backends()
migration_object_init()
qmp_x_exit_preconfig()
  qmp_migrate_incoming()

  qemu_default_main()
qemu_main_loop()
  fd_accept_incoming_migration()
migration_channel_process_incoming()
  migration_ioc_process_incoming()
migration_incoming_process()
  process_incoming_migration_co()
B:
qemu_loadvm_state()

precreate objects could be emitted first in the existing migration stream and
read at A, but this requires untangling numerous ordering dependencies amongst
migration_object_init, qemu_create_machine, configure_accelerators, monitor
init, and the main loop.

- Steve

Re: [PATCH V1 06/26] migration: precreate vmstate for exec

2024-05-13 Thread Steven Sistare


On 5/6/2024 7:34 PM, Fabiano Rosas wrote:

Steve Sistare  writes:


Provide migration_precreate_save for saving precreate vmstate across exec.
Create a memfd, save its value in the environment, and serialize state
to it.  Reverse the process in migration_precreate_load.

Signed-off-by: Steve Sistare 
---
  include/migration/misc.h |   5 ++
  migration/meson.build|   1 +
  migration/precreate.c| 139 +++
  3 files changed, 145 insertions(+)
  create mode 100644 migration/precreate.c

diff --git a/include/migration/misc.h b/include/migration/misc.h
index c9e200f..cf30351 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -56,6 +56,11 @@ AnnounceParameters *migrate_announce_params(void);
  
  void dump_vmstate_json_to_file(FILE *out_fp);
  
+/* migration/precreate.c */
+int migration_precreate_save(Error **errp);
+void migration_precreate_unsave(void);
+int migration_precreate_load(Error **errp);
+
  /* migration/migration.c */
  void migration_object_init(void);
  void migration_shutdown(void);
diff --git a/migration/meson.build b/migration/meson.build
index f76b1ba..50e7cb2 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -26,6 +26,7 @@ system_ss.add(files(
'ram-compress.c',
'options.c',
'postcopy-ram.c',
+  'precreate.c',
'savevm.c',
'socket.c',
'tls.c',
diff --git a/migration/precreate.c b/migration/precreate.c
new file mode 100644
index 000..0bf5e1f
--- /dev/null
+++ b/migration/precreate.c
@@ -0,0 +1,139 @@
+/*
+ * Copyright (c) 2022, 2024 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/cutils.h"
+#include "qemu/memfd.h"
+#include "qapi/error.h"
+#include "io/channel-file.h"
+#include "migration/misc.h"
+#include "migration/qemu-file.h"
+#include "migration/savevm.h"
+
+#define PRECREATE_STATE_NAME "QEMU_PRECREATE_STATE"
+
+static QEMUFile *qemu_file_new_fd_input(int fd, const char *name)
+{
+g_autoptr(QIOChannelFile) fioc = qio_channel_file_new_fd(fd);
+QIOChannel *ioc = QIO_CHANNEL(fioc);
+qio_channel_set_name(ioc, name);
+return qemu_file_new_input(ioc);
+}
+
+static QEMUFile *qemu_file_new_fd_output(int fd, const char *name)
+{
+g_autoptr(QIOChannelFile) fioc = qio_channel_file_new_fd(fd);
+QIOChannel *ioc = QIO_CHANNEL(fioc);
+qio_channel_set_name(ioc, name);
+return qemu_file_new_output(ioc);
+}
+
+static int memfd_create_named(const char *name, Error **errp)
+{
+int mfd;
+char val[16];
+
+mfd = memfd_create(name, 0);
+if (mfd < 0) {
+error_setg_errno(errp, errno, "memfd_create failed");
+return -1;
+}
+
+/* Remember mfd in environment for post-exec load */
+qemu_clear_cloexec(mfd);
+snprintf(val, sizeof(val), "%d", mfd);
+g_setenv(name, val, 1);
+
+return mfd;
+}
+
+static int memfd_find_named(const char *name, int *mfd_p, Error **errp)
+{
+const char *val = g_getenv(name);
+
+if (!val) {
+*mfd_p = -1;
+return 0;   /* No memfd was created, not an error */
+}
+g_unsetenv(name);
+if (qemu_strtoi(val, NULL, 10, mfd_p)) {
+error_setg(errp, "Bad %s env value %s", PRECREATE_STATE_NAME, val);
+return -1;
+}
+lseek(*mfd_p, 0, SEEK_SET);
+return 0;
+}
+
+static void memfd_delete_named(const char *name)
+{
+int mfd;
+const char *val = g_getenv(name);
+
+if (val) {
+g_unsetenv(name);
+if (!qemu_strtoi(val, NULL, 10, &mfd)) {
+close(mfd);
+}
+}
+}
+
+static QEMUFile *qemu_file_new_memfd_output(const char *name, Error **errp)
+{
+int mfd = memfd_create_named(name, errp);
+
+if (mfd < 0) {
+return NULL;
+}
+
+return qemu_file_new_fd_output(mfd, name);
+}
+
+static QEMUFile *qemu_file_new_memfd_input(const char *name, Error **errp)
+{
+int ret, mfd;
+
+ret = memfd_find_named(name, &mfd, errp);
+if (ret || mfd < 0) {
+return NULL;
+}
+
+return qemu_file_new_fd_input(mfd, name);
+}
+
+int migration_precreate_save(Error **errp)
+{
+    QEMUFile *f = qemu_file_new_memfd_output(PRECREATE_STATE_NAME, errp);
+
+    if (!f) {
+        return -1;
+    } else if (qemu_savevm_precreate_save(f, errp)) {
+        memfd_delete_named(PRECREATE_STATE_NAME);
+        return -1;
+    } else {
+        /* Do not close f, as mfd must remain open. */
+        return 0;
+    }
+}
+
+void migration_precreate_unsave(void)
+{
+    memfd_delete_named(PRECREATE_STATE_NAME);
+}
+
+int migration_precreate_load(Error **errp)
+{
+    int ret;
+    QEMUFile *f = qemu_file_new_memfd_input(PRECREATE_STATE_NAME, errp);


Can we avoid the QEMUFile? I don't see it being exported from this file.


It is not exported, but within this file, it is the basis for all read and
write operations, via the existing functions

Re: [PATCH V1 03/26] migration: SAVEVM_FOREACH

2024-05-13 Thread Steven Sistare


On 5/6/2024 7:17 PM, Fabiano Rosas wrote:

Steve Sistare  writes:


Define an abstraction SAVEVM_FOREACH to loop over all savevm state
handlers, and replace QTAILQ_FOREACH.  Define variants for ALL so
we can loop over all handlers vs a subset of handlers in a subsequent
patch, but at this time there is no distinction between the two.
No functional change.

Signed-off-by: Steve Sistare 
---
  migration/savevm.c | 55 +++---
  1 file changed, 32 insertions(+), 23 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 4509482..6829ba3 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -237,6 +237,15 @@ static SaveState savevm_state = {
  .global_section_id = 0,
  };
  
+#define SAVEVM_FOREACH(se, entry)\
+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry)\
+
+#define SAVEVM_FOREACH_ALL(se, entry)\
+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry)


This feels worse than SAVEVM_FOREACH_NOT_PRECREATED. We'll have to keep
coming back to the definition to figure out which FOREACH is the real
deal.


I take your point, but the majority of the loops do not care about precreated
objects, so it seems backwards to make them more verbose with 
SAVEVM_FOREACH_NOT_PRECREATE.  I can go either way, but we need Peter's opinion also.


+
+#define SAVEVM_FOREACH_SAFE_ALL(se, entry, new_se)   \
+    QTAILQ_FOREACH_SAFE(se, &savevm_state.handlers, entry, new_se)
+
  static SaveStateEntry *find_se(const char *idstr, uint32_t instance_id);
  
  static bool should_validate_capability(int capability)
@@ -674,7 +683,7 @@ static uint32_t calculate_new_instance_id(const char *idstr)
  SaveStateEntry *se;
  uint32_t instance_id = 0;
  
-QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+SAVEVM_FOREACH_ALL(se, entry) {


In this patch we can't have both instances...


  if (strcmp(idstr, se->idstr) == 0
  && instance_id <= se->instance_id) {
  instance_id = se->instance_id + 1;
@@ -690,7 +699,7 @@ static int calculate_compat_instance_id(const char *idstr)
  SaveStateEntry *se;
  int instance_id = 0;
  
-QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+SAVEVM_FOREACH(se, entry) {


...otherwise one of the two changes will go undocumented because the
actual reason for it will only be described in the next patch.


Sure, I'll move this to the precreate patch.

- Steve


  if (!se->compat) {
  continue;
  }
@@ -816,7 +825,7 @@ void unregister_savevm(VMStateIf *obj, const char *idstr, void *opaque)
  }
  pstrcat(id, sizeof(id), idstr);
  
-QTAILQ_FOREACH_SAFE(se, &savevm_state.handlers, entry, new_se) {
+SAVEVM_FOREACH_SAFE_ALL(se, entry, new_se) {
  if (strcmp(se->idstr, id) == 0 && se->opaque == opaque) {
  savevm_state_handler_remove(se);
  g_free(se->compat);
@@ -939,7 +948,7 @@ void vmstate_unregister(VMStateIf *obj, const VMStateDescription *vmsd,
  {
  SaveStateEntry *se, *new_se;
  
-QTAILQ_FOREACH_SAFE(se, &savevm_state.handlers, entry, new_se) {
+SAVEVM_FOREACH_SAFE_ALL(se, entry, new_se) {
  if (se->vmsd == vmsd && se->opaque == opaque) {
  savevm_state_handler_remove(se);
  g_free(se->compat);
@@ -1223,7 +1232,7 @@ bool qemu_savevm_state_blocked(Error **errp)
  {
  SaveStateEntry *se;
  
-QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+SAVEVM_FOREACH(se, entry) {
  if (se->vmsd && se->vmsd->unmigratable) {
  error_setg(errp, "State blocked by non-migratable device '%s'",
 se->idstr);
@@ -1237,7 +1246,7 @@ void qemu_savevm_non_migratable_list(strList **reasons)
  {
  SaveStateEntry *se;
  
-QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+SAVEVM_FOREACH(se, entry) {
  if (se->vmsd && se->vmsd->unmigratable) {
  QAPI_LIST_PREPEND(*reasons,
g_strdup_printf("non-migratable device: %s",
@@ -1276,7 +1285,7 @@ bool qemu_savevm_state_guest_unplug_pending(void)
  {
  SaveStateEntry *se;
  
-QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+SAVEVM_FOREACH(se, entry) {
  if (se->vmsd && se->vmsd->dev_unplug_pending &&
  se->vmsd->dev_unplug_pending(se->opaque)) {
  return true;
@@ -1291,7 +1300,7 @@ int qemu_savevm_state_prepare(Error **errp)
  SaveStateEntry *se;
  int ret;
  
-QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+SAVEVM_FOREACH(se, entry) {
  if (!se->ops || !se->ops->save_prepare) {
  continue;
  }
@@ -1321,7 +1330,7 @@ int qemu_savevm_state_setup(QEMUFile *f, Error **errp)
  json_writer_start_array(ms->vmdesc, "devices");
  
  trace_savevm_state_setup();
-QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+SAVEVM_FOREACH(se, entry) {
  if (se->vmsd && se->vmsd->early_setup) {
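One way to give a SAVEVM_FOREACH-style macro a filtered view while SAVEVM_FOREACH_ALL keeps visiting everything is the empty-if/else idiom on top of an ordinary tail queue. This is a sketch using the BSD <sys/queue.h> TAILQ macros rather than QEMU's QTAILQ; the Handler type, its precreate flag and the HANDLERS_FOREACH* names are hypothetical and only stand in for the subset the later patch introduces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical handler entry; `precreate` marks state saved early. */
typedef struct Handler {
    const char *idstr;
    bool precreate;
    TAILQ_ENTRY(Handler) entry;
} Handler;

static TAILQ_HEAD(, Handler) handlers = TAILQ_HEAD_INITIALIZER(handlers);

/* Visit every registered handler. */
#define HANDLERS_FOREACH_ALL(h) \
    TAILQ_FOREACH(h, &handlers, entry)

/* Visit only handlers not marked precreate. The caller's body becomes the
 * else branch, so braces, `continue` and `break` act on the for loop. */
#define HANDLERS_FOREACH(h) \
    HANDLERS_FOREACH_ALL(h) if ((h)->precreate) { } else

static size_t count_all(void)
{
    Handler *h;
    size_t n = 0;
    HANDLERS_FOREACH_ALL(h) {
        n++;
    }
    return n;
}

static size_t count_regular(void)
{
    Handler *h;
    size_t n = 0;
    HANDLERS_FOREACH(h) {
        n++;
    }
    return n;
}
```

The naming concern raised in review still applies with this idiom: the call sites look identical, so a reader has to know which of the two macros filters.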

Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-05-13 Thread Michael Galaxy

One thing to keep in mind here (despite my not having any hardware to
test) is that one of the original goals of the RDMA implementation was
not simply raw throughput or raw latency, but low CPU utilization in
kernel space thanks to the offload. While it is entirely possible that
newer hardware with TCP might compete, the significant reductions in CPU
usage in the TCP/IP stack were a big win at the time.

Just something to consider while you're doing the testing.

- Michael

On 5/9/24 03:58, Zheng Chuan wrote:

Hi, Peter, Lei, Jinpu.

On 2024/5/8 0:28, Peter Xu wrote:

On Tue, May 07, 2024 at 01:50:43AM +, Gonglei (Arei) wrote:

Hello,


-Original Message-
From: Peter Xu [mailto:pet...@redhat.com]
Sent: Monday, May 6, 2024 11:18 PM
To: Gonglei (Arei) 
Cc: Daniel P. Berrangé ; Markus Armbruster
; Michael Galaxy ; Yu Zhang
; Zhijian Li (Fujitsu) ; Jinpu Wang
; Elmar Gerdes ;
qemu-devel@nongnu.org; Yuval Shaia ; Kevin Wolf
; Prasanna Kumar Kalever
; Cornelia Huck ;
Michael Roth ; Prasanna Kumar Kalever
; integrat...@gluster.org; Paolo Bonzini
; qemu-bl...@nongnu.org; de...@lists.libvirt.org;
Hanna Reitz ; Michael S. Tsirkin ;
Thomas Huth ; Eric Blake ; Song
Gao ; Marc-André Lureau
; Alex Bennée ;
Wainer dos Santos Moschetta ; Beraldo Leal
; Pannengyuan ;
Xiexiangyou 
Subject: Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

On Mon, May 06, 2024 at 02:06:28AM +, Gonglei (Arei) wrote:

Hi, Peter

Hey, Lei,

Happy to see you around again after years.


Haha, me too.


RDMA features high bandwidth, low latency (in a non-blocking lossless
network), and direct remote memory access that bypasses the CPU, which
TCP does not have. (As you know, CPU resources are expensive for cloud
vendors, which is one of the reasons why we introduced offload cards.)

Isn't using offload cards just another cost, vs. provisioning more CPU resources?


A converged software and hardware offload architecture is the way to go for
all cloud vendors (with comprehensive benefits in terms of performance, cost,
security, and innovation speed); it's not just a matter of adding a DPU card
as a resource.


RDMA is very useful in scenarios where fast live migration is needed
(extremely short interruption and migration durations). To this end, we
have also developed RDMA support for multifd.

Will any of you upstream that work?  I'm curious how intrusive it would be
to add it to multifd; if it can keep to only 5 exported functions like
rdma.h does right now, that'll be pretty nice.  We also want to make sure it
works with arbitrarily sized loads and buffers, e.g. vfio is considering
adding IO loads to multifd channels too.


In fact, we sent the patchset to the community in 2021. Please see:
https://lore.kernel.org/all/20210203185906.GT2950@work-vm/T/

Yes, I sent the patchset adding multifd support for RDMA migration, taking
over from my colleague, and I'm sorry for not continuing this work at that
time for various reasons. I also strongly agree with Lei that the RDMA
protocol has some special advantages over TCP in some scenarios, and we are
indeed using it in our product.


I wasn't aware of that in the past.

Multifd has changed quite a bit in the 9.0 release, so that series may not
apply anymore.  One thing to mention: please look at Dan's comment on
possible use of rsocket.h:

https://lore.kernel.org/all/zjjm6rcqs5eho...@redhat.com/

And Jinpu did help provide an initial test result over the library:

https://lore.kernel.org/qemu-devel/camgffek8wiknqmouyxcathgtiem2dwocf_w7t0vmcd-i30t...@mail.gmail.com/

It looks like we have a chance to apply that in QEMU.




One thing to note is that the question here is not only about a pure
performance comparison between RDMA and NICs.  It's about helping us decide
whether to drop RDMA; IOW, even if RDMA performs well, the community still
has the right to drop it if nobody can actively work on and maintain it.
It's just that if NICs can perform as well, that's more of a reason to drop
it, unless companies can help provide good support and work together.


We are happy to provide the necessary review and maintenance work for RDMA
if the community needs it.

CC'ing Chuan Zheng.

I'm not sure whether you and Jinpu's team would like to work together and
provide a final solution for rdma over multifd.  It could be much simpler
than the original 2021 proposal if the rsocket API will work out.

Thanks,


It's good news to see the socket abstraction for RDMA!
When I was developing the series above, the biggest pain was that RDMA
migration has no QIOChannel

Re: [PATCH 2/3] hw/timer/imx_gpt: Convert DPRINTF to trace events

2024-05-13 Thread Bernhard Beschow




On 13 May 2024 11:30:04 UTC, "Philippe Mathieu-Daudé" wrote:
>On 13/5/24 12:11, Bernhard Beschow wrote:
>> Signed-off-by: Bernhard Beschow 
>> ---
>>   hw/timer/imx_gpt.c| 18 +-
>>   hw/timer/trace-events |  6 ++
>>   2 files changed, 11 insertions(+), 13 deletions(-)
>
>
>> @@ -317,7 +310,7 @@ static uint64_t imx_gpt_read(void *opaque, hwaddr offset, unsigned size)
>>   break;
>>   }
>>   -DPRINTF("(%s) = 0x%08x\n", imx_gpt_reg_name(offset >> 2), reg_value);
>> +trace_imx_gpt_read(imx_gpt_reg_name(offset >> 2), reg_value);
>> return reg_value;
>>   }
>> @@ -384,8 +377,7 @@ static void imx_gpt_write(void *opaque, hwaddr offset, uint64_t value,
>>   IMXGPTState *s = IMX_GPT(opaque);
>>   uint32_t oldreg;
>>   -DPRINTF("(%s, value = 0x%08x)\n", imx_gpt_reg_name(offset >> 2),
>> -(uint32_t)value);
>> +trace_imx_gpt_write(imx_gpt_reg_name(offset >> 2), (uint32_t)value);
>
>
>> @@ -49,6 +49,12 @@ cmsdk_apb_dualtimer_read(uint64_t offset, uint64_t data, unsigned size) "CMSDK A
>>   cmsdk_apb_dualtimer_write(uint64_t offset, uint64_t data, unsigned size) "CMSDK APB dualtimer write: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %u"
>>   cmsdk_apb_dualtimer_reset(void) "CMSDK APB dualtimer: reset"
>>   +# imx_gpt.c
>> +imx_gpt_set_freq(uint32_t clksrc, uint32_t freq) "Setting clksrc %d to %d Hz"
>
>'%d' is signed, for unsigned you want '%u'.

I'll respin.

Thanks,
Bernhard

>
>> +imx_gpt_read(const char *name, uint32_t value) "%s -> 0x%08x"
>> +imx_gpt_write(const char *name, uint32_t value) "%s <- 0x%08x"
>
>I'd avoid the cast and use uint64_t/PRIx64 here to KISS, regardless:
>Reviewed-by: Philippe Mathieu-Daudé 
>
>> +imx_gpt_timeout(void) ""
>> +
>
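Trace-event format strings follow printf conventions, so both review points can be checked with ordinary snprintf(): uint32_t arguments want %u (or PRIu32) rather than the signed %d, and keeping the 64-bit MMIO value as uint64_t with PRIx64 removes the need for the (uint32_t) cast. The helper names below are hypothetical; this only illustrates the format specifiers, not QEMU's tracing backend.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* uint32_t arguments: %u via PRIu32, never the signed %d. */
static int fmt_set_freq(char *buf, size_t len, uint32_t clksrc, uint32_t freq)
{
    return snprintf(buf, len, "Setting clksrc %" PRIu32 " to %" PRIu32 " Hz",
                    clksrc, freq);
}

/* Keep the MMIO value 64 bits wide and print it with PRIx64,
 * so the call site needs no narrowing cast. */
static int fmt_write(char *buf, size_t len, const char *name, uint64_t value)
{
    return snprintf(buf, len, "%s <- 0x%08" PRIx64, name, value);
}
```

With a frequency above INT32_MAX, such as 3000000000, only the unsigned specifier prints the intended value.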

Re: [PATCH V1 13/26] physmem: ram_block_create

2024-05-13 Thread Fabiano Rosas

Steve Sistare  writes:

> Create a common subroutine to allocate a RAMBlock, de-duping the code to
> populate its common fields.  Add a trace point for good measure.
> No functional change.
>
> Signed-off-by: Steve Sistare 
> ---
>  system/physmem.c| 47 ++-
>  system/trace-events |  3 +++
>  2 files changed, 29 insertions(+), 21 deletions(-)
>
> diff --git a/system/physmem.c b/system/physmem.c
> index c3d04ca..6216b14 100644
> --- a/system/physmem.c
> +++ b/system/physmem.c
> @@ -52,6 +52,7 @@
>  #include "sysemu/hw_accel.h"
>  #include "sysemu/xen-mapcache.h"
>  #include "trace/trace-root.h"
> +#include "trace.h"
>  
>  #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
>  #include 
> @@ -1918,11 +1919,29 @@ out_free:
>  }
>  }
>  
> +static RAMBlock *ram_block_create(MemoryRegion *mr, ram_addr_t size,
> +  ram_addr_t max_size, uint32_t ram_flags)
> +{
> +RAMBlock *rb = g_malloc0(sizeof(*rb));
> +
> +rb->used_length = size;
> +rb->max_length = max_size;
> +rb->fd = -1;
> +rb->flags = ram_flags;
> +rb->page_size = qemu_real_host_page_size();
> +rb->mr = mr;
> +rb->guest_memfd = -1;
> +trace_ram_block_create(rb->idstr, rb->flags, rb->fd, rb->used_length,

There's no idstr at this point, is there? I think this needs to be
memory_region_name(mr).

> +   rb->max_length, mr->align);
> +return rb;
> +}
> +
>  #ifdef CONFIG_POSIX
>  RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
>   uint32_t ram_flags, int fd, off_t offset,
>   Error **errp)
>  {
> +void *host;
>  RAMBlock *new_block;
>  Error *local_err = NULL;
>  int64_t file_size, file_align;
> @@ -1962,19 +1981,14 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
>  return NULL;
>  }
>  
> -new_block = g_malloc0(sizeof(*new_block));
> -new_block->mr = mr;
> -new_block->used_length = size;
> -new_block->max_length = size;
> -new_block->flags = ram_flags;
> -new_block->guest_memfd = -1;
> -new_block->host = file_ram_alloc(new_block, size, fd, !file_size, offset,
> - errp);
> -if (!new_block->host) {
> +new_block = ram_block_create(mr, size, size, ram_flags);
> +host = file_ram_alloc(new_block, size, fd, !file_size, offset, errp);
> +if (!host) {
>  g_free(new_block);
>  return NULL;
>  }
>  
> +new_block->host = host;
>  ram_block_add(new_block, &local_err);
>  if (local_err) {
>  g_free(new_block);
> @@ -1982,7 +1996,6 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
>  return NULL;
>  }
>  return new_block;
> -
>  }
>  
>  
> @@ -2054,18 +2067,10 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
>  align = MAX(align, TARGET_PAGE_SIZE);
>  size = ROUND_UP(size, align);
>  max_size = ROUND_UP(max_size, align);
> -
> -new_block = g_malloc0(sizeof(*new_block));
> -new_block->mr = mr;
> -new_block->resized = resized;
> -new_block->used_length = size;
> -new_block->max_length = max_size;
>  assert(max_size >= size);
> -new_block->fd = -1;
> -new_block->guest_memfd = -1;
> -new_block->page_size = qemu_real_host_page_size();
> -new_block->host = host;
> -new_block->flags = ram_flags;
> +new_block = ram_block_create(mr, size, max_size, ram_flags);
> +new_block->resized = resized;
> +
>  ram_block_add(new_block, &local_err);
>  if (local_err) {
>  g_free(new_block);
> diff --git a/system/trace-events b/system/trace-events
> index 69c9044..f0a80ba 100644
> --- a/system/trace-events
> +++ b/system/trace-events
> @@ -38,3 +38,6 @@ dirtylimit_state_finalize(void)
>  dirtylimit_throttle_pct(int cpu_index, uint64_t pct, int64_t time_us) "CPU[%d] throttle percent: %" PRIu64 ", throttle adjust time %"PRIi64 " us"
>  dirtylimit_set_vcpu(int cpu_index, uint64_t quota) "CPU[%d] set dirty page rate limit %"PRIu64
>  dirtylimit_vcpu_execute(int cpu_index, int64_t sleep_time_us) "CPU[%d] sleep %"PRIi64 " us"
> +
> +# physmem.c
> +ram_block_create(const char *name, uint32_t flags, int fd, size_t used_length, size_t max_length, size_t align) "%s, flags %u, fd %d, len %lu, maxlen %lu, align %lu"

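The de-duplication in ram_block_create() follows a common constructor pattern: one function zero-allocates the object and sets the defaults every path shares, and each allocation path then fills in only what differs. Below is a minimal sketch with a hypothetical, trimmed-down Block type (not QEMU's RAMBlock). Because the block is zero-allocated and has not been named yet, idstr is still an empty string at creation time, which is the point raised above about tracing rb->idstr there.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-in for a RAM block. */
typedef struct Block {
    char idstr[256];        /* zeroed here; named later by the caller */
    uint64_t used_length;
    uint64_t max_length;
    uint32_t flags;
    int fd;                 /* -1: not file backed */
    int guest_memfd;        /* -1: none */
    void *host;
} Block;

/* Single place for the defaults every allocation path shares. */
static Block *block_create(uint64_t size, uint64_t max_size, uint32_t flags)
{
    assert(max_size >= size);
    Block *b = calloc(1, sizeof(*b));
    if (!b) {
        return NULL;
    }
    b->used_length = size;
    b->max_length = max_size;
    b->flags = flags;
    b->fd = -1;
    b->guest_memfd = -1;
    return b;
}

/* A path-specific caller only sets what differs from the defaults. */
static Block *block_create_with_host(uint64_t size, uint64_t max_size,
                                     uint32_t flags, void *host)
{
    Block *b = block_create(size, max_size, flags);
    if (b) {
        b->host = host;
    }
    return b;
}
```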
CPR/liveupdate: test results using prior bug fix

2024-05-13 Thread Michael Galaxy


Hi Steve,

We found that this specific change in particular ("migration: stop vm 
for cpr") fixes a bug that we've identified in testing back-to-back live 
updates in a lab environment.


More specifically, running *without* this change (which is not available in
8.2.2, but *is* available in 9.0.0) causes the metadata save file to be
corrupted when doing live updates one after another. Typically we see a
corrupted save file somewhere between 20 and 30 live updates, and while
doing a git bisect, we found that this change makes the problem go away.


Were you aware? Is there any plan to cherry-pick this for 8.2.3, or a plan
to release 8.2.3 at some point?


Here are some examples of how the bug manifests in different locations 
of the QEMU metadata save file:


2024-04-26T13:28:53Z qemu-system-x86_64: Failed to load mtrr_var:base
2024-04-26T13:28:53Z qemu-system-x86_64: Failed to load cpu:env.mtrr_var
2024-04-26T13:28:53Z qemu-system-x86_64: error while loading state for instance 0x1b of device 'cpu'
2024-04-26T13:28:53Z qemu-system-x86_64: load of migration failed: Input/output error

And another:

2024-04-17T16:09:47Z qemu-system-x86_64: check_section_footer: Read section footer failed: -5
2024-04-17T16:09:47Z qemu-system-x86_64: load of migration failed: Invalid argument

And another:

2024-04-30T21:53:29Z qemu-system-x86_64: Unable to read ID string for section 163
2024-04-30T21:53:29Z qemu-system-x86_64: load of migration failed: Invalid argument

And another:

2024-05-01T16:01:44Z qemu-system-x86_64: Unable to read ID string for section 164
2024-05-01T16:01:44Z qemu-system-x86_64: load of migration failed: Invalid argument
 

As you can see, they occur quite randomly, but generally it takes at 
least 20-30+ live updates before the problem occurs.


- Michael

On 2/27/24 23:13, pet...@redhat.com wrote:

From: Steve Sistare

When migration for cpr is initiated, stop the vm and set state
RUN_STATE_FINISH_MIGRATE before ram is saved.  This eliminates the
possibility of ram and device state being out of sync, and guarantees
that a guest in the suspended state remains suspended, because qmp_cont
rejects a cont command in the RUN_STATE_FINISH_MIGRATE state.

Signed-off-by: Steve Sistare
Reviewed-by: Peter Xu
Link: https://lore.kernel.org/r/1708622920-68779-11-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu

---
  include/migration/misc.h |  1 +
  migration/migration.h|  2 --
  migration/migration.c| 51 
  3 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index e4933b815b..5d1aa593ed 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -60,6 +60,7 @@ void migration_object_init(void);
  void migration_shutdown(void);
  bool migration_is_idle(void);
  bool migration_is_active(MigrationState *);
+bool migrate_mode_is_cpr(MigrationState *);
  
  typedef enum MigrationEventType {
  MIG_EVENT_PRECOPY_SETUP,
diff --git a/migration/migration.h b/migration/migration.h
index aef8afbe1f..65c0b61cbd 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -541,6 +541,4 @@ int migration_rp_wait(MigrationState *s);
   */
  void migration_rp_kick(MigrationState *s);
  
-int migration_stop_vm(RunState state);
-
  #endif
diff --git a/migration/migration.c b/migration/migration.c
index 37c836b0b0..90a90947fb 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -167,11 +167,19 @@ static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
  return (a > b) - (a < b);
  }
  
-int migration_stop_vm(RunState state)
+static int migration_stop_vm(MigrationState *s, RunState state)
  {
-int ret = vm_stop_force_state(state);
+int ret;
+
+migration_downtime_start(s);
+
+s->vm_old_state = runstate_get();
+global_state_store();
+
+ret = vm_stop_force_state(state);
  
  trace_vmstate_downtime_checkpoint("src-vm-stopped");
+trace_migration_completion_vm_stop(ret);
  
  return ret;
  }
@@ -1602,6 +1610,11 @@ bool migration_is_active(MigrationState *s)
  s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
  }
  
+bool migrate_mode_is_cpr(MigrationState *s)
+{
+return s->parameters.mode == MIG_MODE_CPR_REBOOT;
+}
+
  int migrate_init(MigrationState *s, Error **errp)
  {
  int ret;
@@ -2454,10 +2467,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
  bql_lock();
  trace_postcopy_start_set_run();
  
-migration_downtime_start(ms);
-
-global_state_store();
-ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE);
+ret = migration_stop_vm(ms, RUN_STATE_FINISH_MIGRATE);
  if (ret < 0) {
  goto fail;
  }
@@ -2652,15 +2662,12 @@ static int migration_completion_precopy(MigrationState *s,
  int ret;

Re: [PATCH v2 13/33] plugins: Use DisasContextBase for qemu_plugin_insn_haddr

2024-05-13 Thread Pierrick Bouvier


On 4/24/24 16:31, Richard Henderson wrote:

We can delay the computation of haddr until the plugin
actually requests it.

Signed-off-by: Richard Henderson 
---
  include/qemu/plugin.h  |  4 
  accel/tcg/plugin-gen.c | 20 
  plugins/api.c  | 25 -
  3 files changed, 24 insertions(+), 25 deletions(-)

diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
index 03081be543..3db0e75d16 100644
--- a/include/qemu/plugin.h
+++ b/include/qemu/plugin.h
@@ -98,7 +98,6 @@ struct qemu_plugin_dyn_cb {
  /* Internal context for instrumenting an instruction */
  struct qemu_plugin_insn {
  uint64_t vaddr;
-void *haddr;
  GArray *insn_cbs;
  GArray *mem_cbs;
  uint8_t len;
@@ -119,9 +118,6 @@ struct qemu_plugin_tb {
  GPtrArray *insns;
  size_t n;
  uint64_t vaddr;
-uint64_t vaddr2;
-void *haddr1;
-void *haddr2;
  
  /* if set, the TB calls helpers that might access guest memory */
  bool mem_helper;
diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index a4656859c6..b036773d3c 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -319,9 +319,6 @@ bool plugin_gen_tb_start(CPUState *cpu, const DisasContextBase *db)
  ret = true;
  
  ptb->vaddr = db->pc_first;
-ptb->vaddr2 = -1;
-ptb->haddr1 = db->host_addr[0];
-ptb->haddr2 = NULL;
  ptb->mem_helper = false;
  
  tcg_gen_plugin_cb(PLUGIN_GEN_FROM_TB);
@@ -363,23 +360,6 @@ void plugin_gen_insn_start(CPUState *cpu, const DisasContextBase *db)
  pc = db->pc_next;
  insn->vaddr = pc;
  
-/*
- * Detect page crossing to get the new host address.
- * Note that we skip this when haddr1 == NULL, e.g. when we're
- * fetching instructions from a region not backed by RAM.
- */
-if (ptb->haddr1 == NULL) {
-insn->haddr = NULL;
-} else if (is_same_page(db, db->pc_next)) {
-insn->haddr = ptb->haddr1 + pc - ptb->vaddr;
-} else {
-if (ptb->vaddr2 == -1) {
-ptb->vaddr2 = TARGET_PAGE_ALIGN(db->pc_first);
-get_page_addr_code_hostp(cpu_env(cpu), ptb->vaddr2, &ptb->haddr2);
-}
-insn->haddr = ptb->haddr2 + pc - ptb->vaddr2;
-}
-
  tcg_gen_plugin_cb(PLUGIN_GEN_FROM_INSN);
  }
  
diff --git a/plugins/api.c b/plugins/api.c
index 39895a1cb1..4b6690c7d6 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -242,7 +242,30 @@ uint64_t qemu_plugin_insn_vaddr(const struct qemu_plugin_insn *insn)
  
  void *qemu_plugin_insn_haddr(const struct qemu_plugin_insn *insn)
  {
-return insn->haddr;
+const DisasContextBase *db = tcg_ctx->plugin_db;
+vaddr page0_last = db->pc_first | ~TARGET_PAGE_MASK;
+
+if (db->fake_insn) {
+return NULL;
+}
+
+/*
+ * ??? The return value is not intended for use of host memory,
+ * but as a proxy for address space and physical address.
+ * Thus we are only interested in the first byte and do not
+ * care about spanning pages.
+ */
+if (insn->vaddr <= page0_last) {
+if (db->host_addr[0] == NULL) {
+return NULL;
+}
+return db->host_addr[0] + insn->vaddr - db->pc_first;
+} else {
+if (db->host_addr[1] == NULL) {
+return NULL;
+}
+return db->host_addr[1] + insn->vaddr - (page0_last + 1);
+}
  }
  
  char *qemu_plugin_insn_disas(const struct qemu_plugin_insn *insn)


Reviewed-by: Pierrick Bouvier
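The page-split arithmetic in the new qemu_plugin_insn_haddr() can be exercised on its own: an instruction on the TB's first page is offset from the host address of pc_first, and one on the following page is offset from the start of that page. This is a minimal sketch assuming a 4 KiB guest page; insn_haddr(), GUEST_PAGE_SIZE and the parameter names are hypothetical stand-ins for DisasContextBase's pc_first and host_addr[0]/host_addr[1].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GUEST_PAGE_SIZE 4096u                              /* assumed */
#define GUEST_PAGE_MASK (~(uint64_t)(GUEST_PAGE_SIZE - 1))

/*
 * host0 is the host address of the TB's first byte (pc_first); host1 is
 * the host address of the first byte of the following guest page. Either
 * may be NULL when that page is not backed by RAM.
 */
static void *insn_haddr(uint64_t pc_first, void *host0, void *host1,
                        uint64_t vaddr)
{
    /* Last guest address on the TB's first page. */
    uint64_t page0_last = pc_first | ~GUEST_PAGE_MASK;

    if (vaddr <= page0_last) {
        return host0 ? (char *)host0 + (vaddr - pc_first) : NULL;
    }
    return host1 ? (char *)host1 + (vaddr - (page0_last + 1)) : NULL;
}
```

For a TB starting at 0x1ff0, page0_last is 0x1fff, so 0x1ff4 maps into the first host page and 0x2002 maps two bytes into the second.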

Re: [PATCH 5/6] migration: Rephrase message on failure to save / load Xen device state

2024-05-13 Thread Fabiano Rosas

Markus Armbruster  writes:

> Functions that use an Error **errp parameter to return errors should
> not also report them to the user, because reporting is the caller's
> job.  When the caller does, the error is reported twice.  When it
> doesn't (because it recovered from the error), there is no error to
> report, i.e. the report is bogus.
>
> qmp_xen_save_devices_state() and qmp_xen_load_devices_state() violate
> this principle: they call qemu_save_device_state() and
> qemu_loadvm_state(), which call error_report_err().
>
> I wish I could clean this up now, but migration's error reporting is
> too complicated (confused?) for me to mess with it.
>
> Instead, I'm merely improving the error reported by
> qmp_xen_save_devices_state() and qmp_xen_load_devices_state() to the
> QMP core from
>
> An IO error has occurred
>
> to
>
> saving Xen device state failed
>
> and
>
> loading Xen device state failed
>
> respectively.
>
> Signed-off-by: Markus Armbruster 

Acked-by: Fabiano Rosas
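The rule at the top of the commit message (a function that takes an error out-parameter fills it in and returns failure, but leaves reporting to its caller) can be sketched with a miniature error type. This is not QEMU's Error API: Err, err_set(), save_device_state() and report_and_free() are hypothetical and exist only to show where the single report happens.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal error object. */
typedef struct Err {
    char msg[128];
} Err;

/* Record an error for the caller; never print anything here. */
static void err_set(Err **errp, const char *fmt, ...)
{
    if (!errp) {
        return;                    /* caller does not want details */
    }
    assert(*errp == NULL);         /* this sketch allows one error per call */
    Err *e = calloc(1, sizeof(*e));
    if (!e) {
        return;
    }
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(e->msg, sizeof(e->msg), fmt, ap);
    va_end(ap);
    *errp = e;
}

/* Callee: describes the failure through errp and returns -1. */
static int save_device_state(bool fail, Err **errp)
{
    if (fail) {
        err_set(errp, "saving Xen device state failed");
        return -1;
    }
    return 0;
}

/* Caller: the one place that decides to report, then frees the error. */
static void report_and_free(Err *err)
{
    if (err) {
        fprintf(stderr, "%s\n", err->msg);
        free(err);
    }
}
```

If save_device_state() also printed the message, a caller that reports it would print it twice, and a caller that recovers would leave a bogus report behind, which is exactly the problem described above.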

Re: Intention to work on GSoC project

2024-05-13 Thread Sahil

Hi,

On Monday, May 13, 2024 7:53:40 PM GMT+5:30 Eugenio Perez Martin wrote:
> [...]
> > I have started working on implementing packed virtqueue support in
> > vhost-shadow-virtqueue.c. The changes I have made so far are very
> > minimal. I have one confusion as well.
> > 
> > In "vhost_svq_add()" [1], a structure of type "VhostShadowVirtqueue"
> > is being used. My initial idea was to create a whole new structure (eg:
> > VhostShadowVirtqueuePacked). But I realized that "VhostShadowVirtqueue"
> > is being used in a lot of other places such as in "struct vhost_vdpa" [2]
> > (in "vhost-vdpa.h"). So maybe this isn't a good idea.
> > 
> > The problem is that "VhostShadowVirtqueue" has a member of type "struct
> > vring" [3] which represents a split virtqueue [4]. My idea now is to
> > instead wrap this member in a union so that the struct would look
> > something like this.
> > 
> > struct VhostShadowVirtqueue {
> >     union {
> >         struct vring vring;
> >         struct packed_vring vring_packed;
> >     };
> >     ...
> > };
> > 
> > I am not entirely sure if this is a good idea. It is similar to what's
> > been done in linux's "drivers/virtio/virtio_ring.c" ("struct
> > vring_virtqueue" [5]).
> > 
> > I thought I would ask this first before continuing further.
> 
> That's right, this second option makes perfect sense.
> 
> VhostShadowVirtqueue should abstract both split and packed. You'll see
> that some members are reused, while others are only used in one
> version so they are placed after a union. They should follow the same
> pattern, although it is not a problem if we need to deviate a little
> bit from the kernel's code.
> 

Understood, thank you for the reply.

Thanks,
Sahil
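A minimal sketch of the layout discussed above, with shared members first and layout-specific state in an anonymous union selected by a flag, in the spirit of the kernel's vring_virtqueue. All type and field names here are hypothetical and much simpler than the real VhostShadowVirtqueue and virtio ring structures; the anonymous union needs C11.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified ring views; not the real virtio layouts. */
struct split_ring {
    uint16_t num;
    void *desc, *avail, *used;
};

struct packed_ring {
    uint16_t num;
    void *desc, *driver_event, *device_event;
};

struct ShadowVq {
    /* Members used by both layouts. */
    bool is_packed;
    uint16_t num_free;
    uint16_t last_used_idx;

    /* Layout-specific state; only the member selected by is_packed is valid. */
    union {
        struct split_ring split;
        struct {
            struct packed_ring ring;
            bool avail_wrap_counter;
            bool used_wrap_counter;
        } packed;
    };
};

/* Callers go through small helpers instead of touching the union directly. */
static uint16_t svq_ring_size(const struct ShadowVq *svq)
{
    return svq->is_packed ? svq->packed.ring.num : svq->split.num;
}
```

Keeping the split/packed choice inside helpers like svq_ring_size() means existing users of the shared struct, such as struct vhost_vdpa, need not change.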

Re: Unmapping KVM Guest Memory from Host Kernel

2024-05-13 Thread Sean Christopherson

On Mon, May 13, 2024, James Gowans wrote:
> On Mon, 2024-05-13 at 08:39 -0700, Sean Christopherson wrote:
> > > Sean, you mentioned that you envision guest_memfd also supporting non-CoCo VMs.
> > > Do you have some thoughts about how to make the above cases work in the
> > > guest_memfd context?
> > 
> > Yes.  The hand-wavy plan is to allow selectively mmap()ing guest_memfd().
> > There is a long thread[*] discussing how exactly we want to do that.  The
> > TL;DR is that the basic functionality is also straightforward; the bulk of
> > the discussion is around gup(), reclaim, page migration, etc.
> 
> I still need to read this long thread, but just a thought on the word
> "restricted" here: for MMIO the instruction can be anywhere and
> similarly the load/store MMIO data can be anywhere. Does this mean that
> for running unmodified non-CoCo VMs with guest_memfd backend that we'll
> always need to have the whole of guest memory mmapped?

Not necessarily, e.g. KVM could re-establish the direct map or mremap()
on-demand.  There are variations on that, e.g. if ASI[*] were ever to make its
way upstream, which is a huge if, then we could have guest_memfd mapped into a
KVM-only CR3.

> I guess the idea is that this use case will still be subject to the
> normal restriction rules, but for a non-CoCo non-pKVM VM there will be 
> no restriction in practice, and userspace will need to mmap everything
> always?
> 
> It really seems yucky to need to have all of guest RAM mmapped all the
> time just for MMIO to work... But I suppose there is no way around that
> for Intel x86.

It's not just MMIO.  Nested virtualization, and more specifically shadowing
nested TDP, is also problematic (probably more so than MMIO).  And there are
more cases, i.e. we'll need a generic solution for this.  As above, there are
a variety of options, it's largely just a matter of doing the work.  I'm not
saying it's a trivial amount of work/effort, but it's far from an unsolvable
problem.

Re: [PATCH v5 03/10] vfio: Extend migration_file_set_error() with Error** argument

2024-05-13 Thread Cédric Le Goater


On 5/13/24 15:14, Avihai Horon wrote:


On 06/05/2024 12:20, Cédric Le Goater wrote:



Change commit title:

vfio: Extend migration_file_set_error() with Error** argument

to:

migration: Extend migration_file_set_error() with Error* argument

?


yes.
 

Other than that,
Reviewed-by: Avihai Horon 



Thanks,

C.

Use it to update the current error of the migration stream if
available and if not, simply print out the error. Next changes will
update with an error to report.

Signed-off-by: Cédric Le Goater 
---
  include/migration/misc.h | 2 +-
  hw/vfio/common.c | 2 +-
  hw/vfio/migration.c  | 4 ++--
  migration/migration.c    | 6 --
  4 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index c9e200f4eb8f8a8ab2c8b8d0e0dbf871817b94fc..8da2f6454d82046c449f034eb978e1247a9be682 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -103,7 +103,7 @@ void migration_add_notifier_mode(NotifierWithReturn *notify,

  void migration_remove_notifier(NotifierWithReturn *notify);
  bool migration_is_running(void);
-void migration_file_set_error(int err);
+void migration_file_set_error(int ret, Error *err);

  /* True if incoming migration entered POSTCOPY_INCOMING_DISCARD */
  bool migration_in_incoming_postcopy(void);
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index b5102f54a6474a50c6366e8fbce23812d55e384e..ed5ee6349ced78b3bde68d2ee506f78ba1a9dd9c 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -150,7 +150,7 @@ bool vfio_viommu_preset(VFIODevice *vbasedev)
  static void vfio_set_migration_error(int err)
  {
  if (migration_is_setup_or_active()) {
-    migration_file_set_error(err);
+    migration_file_set_error(err, NULL);
  }
  }

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 06ae40969b6c19037e190008e14f28be646278cd..bf2fd0759ba6e4fb103cc5c1a43edb180a3d0de4 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -726,7 +726,7 @@ static void vfio_vmstate_change_prepare(void *opaque, bool running,
   * Migration should be aborted in this case, but vm_state_notify()
   * currently does not support reporting failures.
   */
-    migration_file_set_error(ret);
+    migration_file_set_error(ret, NULL);
  }

  trace_vfio_vmstate_change_prepare(vbasedev->name, running,
@@ -756,7 +756,7 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state)
   * Migration should be aborted in this case, but vm_state_notify()
   * currently does not support reporting failures.
   */
-    migration_file_set_error(ret);
+    migration_file_set_error(ret, NULL);
  }

  trace_vfio_vmstate_change(vbasedev->name, running, RunState_str(state),
diff --git a/migration/migration.c b/migration/migration.c
index b5af6b5105d58f358f6d4d31694e21debd8eb81d..9c648f5ba1c0104088e37baf90d9f94fbdc21570 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3033,13 +3033,15 @@ static MigThrError postcopy_pause(MigrationState *s)
  }
  }

-void migration_file_set_error(int err)
+void migration_file_set_error(int ret, Error *err)
  {
  MigrationState *s = current_migration;

  WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
  if (s->to_dst_file) {
-    qemu_file_set_error(s->to_dst_file, err);
+    qemu_file_set_error_obj(s->to_dst_file, ret, err);
+    } else if (err) {
+    error_report_err(err);
  }
  }
  }
--
2.45.0
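After this change migration_file_set_error() has two outcomes: if an outgoing migration stream exists, the return code and the Error object are handed to qemu_file_set_error_obj(); otherwise a non-NULL error is passed to error_report_err() so it is reported rather than leaked. The following is a minimal self-contained model of that control flow. `Channel`, `Err`, and the helpers are hypothetical stand-ins for QEMUFile and Error, and the first-error-wins rule inside the channel branch is an assumption of this sketch, not a description of qemu_file_set_error_obj().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Error object. */
typedef struct Err {
    char msg[128];
} Err;

/* Hypothetical stand-in for the outgoing QEMUFile. */
typedef struct Channel {
    int last_ret;   /* first negative errno recorded, 0 if none */
    Err *last_err;  /* error owned by the channel, or NULL */
} Channel;

static Err *err_new(const char *msg)
{
    Err *e = calloc(1, sizeof(*e));

    assert(e != NULL);
    snprintf(e->msg, sizeof(e->msg), "%s", msg);
    return e;
}

/* Print and free, in the spirit of error_report_err(). */
static void err_report_and_free(Err *e)
{
    fprintf(stderr, "error: %s\n", e->msg);
    free(e);
}

/* ch == NULL models "no to_dst_file yet"; ownership of err always moves here. */
static void set_migration_error(Channel *ch, int ret, Err *err)
{
    if (ch) {
        if (ch->last_ret == 0 && ret) {
            ch->last_ret = ret;       /* assumption: first error wins */
            ch->last_err = err;       /* the channel now owns err */
        } else if (err) {
            err_report_and_free(err); /* assumption: later errors are only reported */
        }
    } else if (err) {
        err_report_and_free(err);     /* no stream: report instead of leaking */
    }
}
```

Passing NULL as the Error, as the VFIO callers in this patch still do, leaves the behaviour equivalent to the old single-argument form.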

Re: [PATCH v5 02/10] vfio: Add Error** argument to vfio_devices_dma_logging_start()

2024-05-13 Thread Cédric Le Goater


On 5/13/24 15:08, Avihai Horon wrote:


On 06/05/2024 12:20, Cédric Le Goater wrote:


This allows the Error argument of the VFIO log_global_start() handler
to be updated. Errors detected when device-level logging is started will be
propagated up to qemu_savevm_state_setup() when the ram save_setup()
handler is executed.


Errors for container based logging will also be propagated now.



The vfio_set_migration_error() call becomes redundant in
vfio_devices_dma_logging_start(). Remove it.


Becomes redundant in vfio_listener_log_global_start()?



Both sentences updated.



Other than that,
Reviewed-by: Avihai Horon 




Thanks,

C.




Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Cédric Le Goater 
---

  Changes in v5:

  - Used error_setg_errno() in vfio_devices_dma_logging_start()

  hw/vfio/common.c | 26 +++---
  1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 485e53916491f1164d29e739fb7106c0c77df737..b5102f54a6474a50c6366e8fbce23812d55e384e 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1027,7 +1027,8 @@ static void vfio_device_feature_dma_logging_start_destroy(
  g_free(feature);
  }

-static int vfio_devices_dma_logging_start(VFIOContainerBase *bcontainer)
+static int vfio_devices_dma_logging_start(VFIOContainerBase *bcontainer,
+  Error **errp)
  {
  struct vfio_device_feature *feature;
  VFIODirtyRanges ranges;
@@ -1038,6 +1039,7 @@ static int vfio_devices_dma_logging_start(VFIOContainerBase *bcontainer)
  feature = vfio_device_feature_dma_logging_start_create(bcontainer,
     &ranges);
  if (!feature) {
+    error_setg_errno(errp, errno, "Failed to prepare DMA logging");
  return -errno;
  }

@@ -1049,8 +1051,8 @@ static int vfio_devices_dma_logging_start(VFIOContainerBase *bcontainer)
  ret = ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature);
  if (ret) {
  ret = -errno;
-    error_report("%s: Failed to start DMA logging, err %d (%s)",
- vbasedev->name, ret, strerror(errno));
+    error_setg_errno(errp, errno, "%s: Failed to start DMA logging",
+ vbasedev->name);
  goto out;
  }
  vbasedev->dirty_tracking = true;
@@ -1069,20 +1071,19 @@ out:
  static bool vfio_listener_log_global_start(MemoryListener *listener,
 Error **errp)
  {
+    ERRP_GUARD();
  VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
   listener);
  int ret;

  if (vfio_devices_all_device_dirty_tracking(bcontainer)) {
-    ret = vfio_devices_dma_logging_start(bcontainer);
+    ret = vfio_devices_dma_logging_start(bcontainer, errp);
  } else {
-    ret = vfio_container_set_dirty_page_tracking(bcontainer, true, NULL);
+    ret = vfio_container_set_dirty_page_tracking(bcontainer, true, errp);
  }

  if (ret) {
-    error_report("vfio: Could not start dirty page tracking, err: %d (%s)",
- ret, strerror(-ret));
-    vfio_set_migration_error(ret);
+    error_prepend(errp, "vfio: Could not start dirty page tracking - ");
  }
  return !ret;
  }
@@ -1091,17 +1092,20 @@ static void vfio_listener_log_global_stop(MemoryListener *listener)
  {
  VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
   listener);
+    Error *local_err = NULL;
  int ret = 0;

  if (vfio_devices_all_device_dirty_tracking(bcontainer)) {
  vfio_devices_dma_logging_stop(bcontainer);
  } else {
-    ret = vfio_container_set_dirty_page_tracking(bcontainer, false, NULL);
+    ret = vfio_container_set_dirty_page_tracking(bcontainer, false,
 &local_err);
  }

  if (ret) {
-    error_report("vfio: Could not stop dirty page tracking, err: %d (%s)",
- ret, strerror(-ret));
+    error_prepend(&local_err,
+  "vfio: Could not stop dirty page tracking - ");
+    error_report_err(local_err);
  vfio_set_migration_error(ret);
  }
  }
--
2.45.0
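The propagation pattern this patch adopts can be modelled without QEMU: the lowest layer creates the error with the errno text attached, each caller above it adds context with a prefix instead of printing, and the error travels up to where it is finally reported. The sketch below is a simplified stand-in. `SketchError` and the `sketch_*` functions are hypothetical and only approximate error_setg_errno() and error_prepend(). In the real patch, ERRP_GUARD() is, as we understand it, what lets vfio_listener_log_global_start() call error_prepend() safely even when its caller passed a NULL errp; the sketch simply checks for NULL instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal error object; not QEMU's struct Error. */
typedef struct SketchError {
    char *msg;
} SketchError;

/* Allocate or abort, so the sketch needs no NULL checks below. */
static void *xmalloc(size_t n)
{
    void *p = malloc(n);
    if (!p) {
        abort();
    }
    return p;
}

/* Like error_setg_errno(): "<msg>: <strerror(os_errno)>", only if errp != NULL. */
static void sketch_setg_errno(SketchError **errp, int os_errno, const char *msg)
{
    const char *reason = strerror(os_errno);
    size_t n;
    SketchError *e;

    if (!errp) {
        return;                    /* caller does not want details */
    }
    assert(*errp == NULL);         /* never overwrite an existing error */
    n = strlen(msg) + strlen(": ") + strlen(reason) + 1;
    e = xmalloc(sizeof(*e));
    e->msg = xmalloc(n);
    snprintf(e->msg, n, "%s: %s", msg, reason);
    *errp = e;
}

/* Like error_prepend(): add context in front of an existing message. */
static void sketch_prepend(SketchError **errp, const char *prefix)
{
    size_t n;
    char *joined;

    if (!errp || !*errp) {
        return;                    /* the sketch's substitute for ERRP_GUARD() */
    }
    n = strlen(prefix) + strlen((*errp)->msg) + 1;
    joined = xmalloc(n);
    snprintf(joined, n, "%s%s", prefix, (*errp)->msg);
    free((*errp)->msg);
    (*errp)->msg = joined;
}

static void sketch_free(SketchError *e)
{
    if (e) {
        free(e->msg);
        free(e);
    }
}

/* Hypothetical device-level step that always fails with EBUSY. */
static int sketch_dma_logging_start(SketchError **errp)
{
    sketch_setg_errno(errp, EBUSY, "dev0: Failed to start DMA logging");
    return -EBUSY;
}

/* Listener-level caller: adds context instead of printing; false on failure. */
static bool sketch_log_global_start(SketchError **errp)
{
    int ret = sketch_dma_logging_start(errp);

    if (ret) {
        sketch_prepend(errp, "vfio: Could not start dirty page tracking - ");
    }
    return !ret;
}
```

The resulting message reads "vfio: Could not start dirty page tracking - dev0: Failed to start DMA logging: " followed by the strerror() text, which is the shape the patch aims for.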

RE: [PATCH v2] vhost-user-gpu: fix import of DMABUF

2024-05-13 Thread Kim, Dongwon

Hi Marc-André,

This commit looks good, but are you planning to merge it before "ui/console:
Private QemuDmaBuf struct"? It will cause some conflicts. Let me know if a
rebase on top of "ui/console: Private QemuDmaBuf struct" is needed.

> -Original Message-
> From: marcandre.lur...@redhat.com 
> Sent: Monday, May 13, 2024 4:13 AM
> To: qemu-devel@nongnu.org
> Cc: Kim, Dongwon ; Marc-André Lureau
> ; Gerd Hoffmann ;
> Michael S. Tsirkin 
> Subject: [PATCH v2] vhost-user-gpu: fix import of DMABUF
> 
> From: Marc-André Lureau 
> 
> When using vhost-user-gpu with GL, qemu -display gtk doesn't show output
> and prints: qemu: eglCreateImageKHR failed
> 
> Since commit 9ac06df8b ("virtio-gpu-udmabuf: correct naming of
> QemuDmaBuf size properties"), egl_dmabuf_import_texture() uses
> backing_{width,height} for the texture dimension.
> 
> Fixes: commit 9ac06df8b ("virtio-gpu-udmabuf: correct naming of
> QemuDmaBuf size properties")
> Signed-off-by: Marc-André Lureau 
> ---
>  hw/display/vhost-user-gpu.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/display/vhost-user-gpu.c b/hw/display/vhost-user-gpu.c
> index 709c8a02a1..96743aba8a 100644
> --- a/hw/display/vhost-user-gpu.c
> +++ b/hw/display/vhost-user-gpu.c
> @@ -273,8 +273,10 @@ vhost_user_gpu_handle_display(VhostUserGPU *g, VhostUserGpuMsg *msg)
>  }
>  *dmabuf = (QemuDmaBuf) {
>  .fd = fd,
> -.width = m->fd_width,
> -.height = m->fd_height,
> +.width = m->width,
> +.height = m->height,
> +.backing_width = m->fd_width,
> +.backing_height = m->fd_height,
>  .stride = m->fd_stride,
>  .fourcc = m->fd_drm_fourcc,
>  .y0_top = m->fd_flags & VIRTIO_GPU_RESOURCE_FLAG_Y_0_TOP,
> --
> 2.41.0.28.gd7d8841f67
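The fix keeps two sizes apart: `width`/`height` describe the visible scanout, while `backing_width`/`backing_height` describe the buffer behind the file descriptor. According to the commit message, the latter are what egl_dmabuf_import_texture() now uses for the texture. The sketch below models that split. `ScanoutMsg`, `DmaBufDesc`, and the consistency check are illustrative assumptions, not QEMU or vhost-user definitions; the check assumes a 4-bytes-per-pixel format and a visible area no larger than the backing buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor echoing the QemuDmaBuf fields touched by the fix. */
typedef struct DmaBufDesc {
    uint32_t width, height;                 /* visible scanout rectangle */
    uint32_t backing_width, backing_height; /* size of the allocated buffer */
    uint32_t stride;                        /* bytes per row of the buffer */
} DmaBufDesc;

/* Hypothetical message carrying both sets of sizes. */
typedef struct ScanoutMsg {
    uint32_t width, height;
    uint32_t fd_width, fd_height, fd_stride;
} ScanoutMsg;

/* Fill visible and backing sizes from their respective message fields. */
static DmaBufDesc desc_from_msg(const ScanoutMsg *m)
{
    return (DmaBufDesc) {
        .width = m->width,
        .height = m->height,
        .backing_width = m->fd_width,
        .backing_height = m->fd_height,
        .stride = m->fd_stride,
    };
}

/* Sanity check, assuming a 32-bit-per-pixel format. */
static bool desc_is_consistent(const DmaBufDesc *d)
{
    return d->width <= d->backing_width &&
           d->height <= d->backing_height &&
           (uint64_t)d->backing_width * 4 <= d->stride;
}
```

With the pre-fix assignment, `width`/`height` would have carried the fd sizes and the backing fields would have stayed zero, which is consistent with the import failure described above.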

Re: [PATCH v5 01/10] vfio: Add Error** argument to .set_dirty_page_tracking() handler

2024-05-13 Thread Cédric Le Goater


On 5/13/24 15:03, Avihai Horon wrote:

Hi Cedric,

On 06/05/2024 12:20, Cédric Le Goater wrote:


We will use the Error object to improve error reporting in the
.log_global*() handlers of VFIO. Add documentation while at it.


First of all, I think commit 3688fec8923 ("memory: Add Error** argument to
.log_global_start() handler") forgot to set errp in
vfio_listener_log_global_start() in case of error.


Yes. This is unfortunate. There have been a few respins, the series
was split, and I was hoping to upstream this part sooner. My bad.


This causes a null pointer dereference if DPT start fails.
Maybe add a fix for that at the beginning of this series, or as a stand-alone
fix?


Since it is fixed by patches 1 and 2 of this series, we should be fine?


Back to this patch: personally, I found the split between patches #1 and #2 a
bit confusing. Maybe consider squashing patches #1 and #2 so that container-based
and device-based DPT start/stop are changed in the same patch, like you did in
patch #8? Whatever you think is better.


ok. Let's see how v5 goes. I might just send a PR with it if
no major changes are requested.



In any case:
Reviewed-by: Avihai Horon 



Thanks,

C.






Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Cédric Le Goater 
---

  Changes in v5:

  - Fixed typo in set_dirty_page_tracking documentation

  include/hw/vfio/vfio-container-base.h | 18 --
  hw/vfio/common.c  |  4 ++--
  hw/vfio/container-base.c  |  4 ++--
  hw/vfio/container.c   |  6 +++---
  4 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index 3582d5f97a37877b2adfc0d0b06996c82403f8b7..326ceea52a2030eec9dad289a9845866c4a8c090 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -82,7 +82,7 @@ int vfio_container_add_section_window(VFIOContainerBase *bcontainer,
  void vfio_container_del_section_window(VFIOContainerBase *bcontainer,
 MemoryRegionSection *section);
  int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
-   bool start);
+   bool start, Error **errp);
  int vfio_container_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
    VFIOBitmap *vbmap,
    hwaddr iova, hwaddr size);
@@ -121,9 +121,23 @@ struct VFIOIOMMUClass {
  int (*attach_device)(const char *name, VFIODevice *vbasedev,
   AddressSpace *as, Error **errp);
  void (*detach_device)(VFIODevice *vbasedev);
+
  /* migration feature */
+
+    /**
+ * @set_dirty_page_tracking
+ *
+ * Start or stop dirty pages tracking on VFIO container
+ *
+ * @bcontainer: #VFIOContainerBase on which to de/activate dirty
+ *  page tracking
+ * @start: indicates whether to start or stop dirty pages tracking
+ * @errp: pointer to Error*, to store an error if it happens.
+ *
+ * Returns zero to indicate success and negative for error
+ */
  int (*set_dirty_page_tracking)(const VFIOContainerBase *bcontainer,
-   bool start);
+   bool start, Error **errp);
  int (*query_dirty_bitmap)(const VFIOContainerBase *bcontainer,
    VFIOBitmap *vbmap,
    hwaddr iova, hwaddr size);
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 8f9cbdc0264044ce587877a7d19d14b28527291b..485e53916491f1164d29e739fb7106c0c77df737 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1076,7 +1076,7 @@ static bool vfio_listener_log_global_start(MemoryListener *listener,
  if (vfio_devices_all_device_dirty_tracking(bcontainer)) {
  ret = vfio_devices_dma_logging_start(bcontainer);
  } else {
-    ret = vfio_container_set_dirty_page_tracking(bcontainer, true);
+    ret = vfio_container_set_dirty_page_tracking(bcontainer, true, NULL);
  }

  if (ret) {
@@ -1096,7 +1096,7 @@ static void vfio_listener_log_global_stop(MemoryListener *listener)
  if (vfio_devices_all_device_dirty_tracking(bcontainer)) {
  vfio_devices_dma_logging_stop(bcontainer);
  } else {
-    ret = vfio_container_set_dirty_page_tracking(bcontainer, false);
+    ret = vfio_container_set_dirty_page_tracking(bcontainer, false, NULL);
  }

  if (ret) {
diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
index 913ae49077c4f09b7b27517c1231cfbe4befb7fb..7c0764121d24b02b6c4e66e368d7dff78a6d65aa 100644
--- a/hw/vfio/container-base.c
+++ b/hw/vfio/container-base.c
@@ -53,14 +53,14 @@ void vfio_container_del_section_window(VFIOContainerBase *bcontainer,
  }

  int

Re: Unmapping KVM Guest Memory from Host Kernel

2024-05-13 Thread Gowans, James

On Mon, 2024-05-13 at 08:39 -0700, Sean Christopherson wrote:
> > Sean, you mentioned that you envision guest_memfd also supporting non-CoCo VMs.
> > Do you have some thoughts about how to make the above cases work in the
> > guest_memfd context?
> 
> Yes.  The hand-wavy plan is to allow selectively mmap()ing guest_memfd().  There
> is a long thread[*] discussing how exactly we want to do that.  The TL;DR is that
> the basic functionality is also straightforward; the bulk of the discussion is
> around gup(), reclaim, page migration, etc.

I still need to read this long thread, but just a thought on the word
"restricted" here: for MMIO the instruction can be anywhere and
similarly the load/store MMIO data can be anywhere. Does this mean that
for running unmodified non-CoCo VMs with guest_memfd backend that we'll
always need to have the whole of guest memory mmapped?

I guess the idea is that this use case will still be subject to the
normal restriction rules, but for a non-CoCo non-pKVM VM there will be 
no restriction in practice, and userspace will need to mmap everything
always?

It really seems yucky to need to have all of guest RAM mmapped all the
time just for MMIO to work... But I suppose there is no way around that
for Intel x86.

JG

> 
> [*] https://lore.kernel.org/all/zdfor3ncep3ht...@casper.infradead.org

Re: Unmapping KVM Guest Memory from Host Kernel

2024-05-13 Thread Sean Christopherson

On Mon, May 13, 2024, Patrick Roy wrote:

> For non-CoCo VMs, where memory is not encrypted, and the threat model assumes a
> trusted host userspace, we would like to avoid changing the VM model so
> completely. If we adopt CoCo’s approaches where KVM / Userspace touches guest
> memory we would get all the complexity, yet none of the encryption.
> Particularly the complexity on the MMIO path seems nasty, but x86 does not

Uber nit, modern AMD CPUs do provide the byte stream, though there is at least
one related erratum.  Intel CPUs don't provide the byte stream or pre-decode in
any way.

> pre-decode instructions on MMIO exits (which are just EPT_VIOLATIONs) like it
> does for PIO exits, so I also don’t really see a way around it in the
> guest_memfd model.

...

> Sean, you mentioned that you envision guest_memfd also supporting non-CoCo VMs.
> Do you have some thoughts about how to make the above cases work in the
> guest_memfd context?

Yes.  The hand-wavy plan is to allow selectively mmap()ing guest_memfd().  There
is a long thread[*] discussing how exactly we want to do that.  The TL;DR is that
the basic functionality is also straightforward; the bulk of the discussion is
around gup(), reclaim, page migration, etc.

[*] https://lore.kernel.org/all/zdfor3ncep3ht...@casper.infradead.org

Re: [PATCH v6 6/7] migration/multifd: implement qpl compression and decompression

2024-05-13 Thread Fabiano Rosas

Yuan Liu  writes:

> Each qpl job is used to (de)compress a normal page and can
> be processed independently by the IAA hardware. All qpl jobs
> are submitted to the hardware at once, and then we wait for all
> jobs to complete. If the hardware path (IAA) is not available,
> software is used for compression and decompression.
>
> Signed-off-by: Yuan Liu 
> Reviewed-by: Nanhai Zou 
> ---
>  migration/multifd-qpl.c | 284 +++-
>  1 file changed, 280 insertions(+), 4 deletions(-)
>
> diff --git a/migration/multifd-qpl.c b/migration/multifd-qpl.c
> index 89fa51091a..9a1fddbdd0 100644
> --- a/migration/multifd-qpl.c
> +++ b/migration/multifd-qpl.c
> @@ -13,6 +13,7 @@
>  #include "qemu/osdep.h"
>  #include "qemu/module.h"
>  #include "qapi/error.h"
> +#include "exec/ramblock.h"
>  #include "migration.h"
>  #include "multifd.h"
>  #include "qpl/qpl.h"
> @@ -204,6 +205,139 @@ static void multifd_qpl_send_cleanup(MultiFDSendParams *p, Error **errp)
>  p->iov = NULL;
>  }
>  
> +/**
> + * multifd_qpl_prepare_job: prepare a compression or decompression job
> + *
> + * Prepare a compression or decompression job and configure job attributes
> + * including job compression level and flags.
> + *
> + * @job: pointer to the QplData structure

qpl_job structure

> + * @is_compression: compression or decompression indication
> + * @input: pointer to the input data buffer
> + * @input_len: the length of the input data
> + * @output: pointer to the output data buffer
> + * @output_len: the size of the output data buffer
> + */
> +static void multifd_qpl_prepare_job(qpl_job *job, bool is_compression,
> +uint8_t *input, uint32_t input_len,
> +uint8_t *output, uint32_t output_len)
> +{
> +job->op = is_compression ? qpl_op_compress : qpl_op_decompress;
> +job->next_in_ptr = input;
> +job->next_out_ptr = output;
> +job->available_in = input_len;
> +job->available_out = output_len;
> +job->flags = QPL_FLAG_FIRST | QPL_FLAG_LAST | QPL_FLAG_OMIT_VERIFY;
> +/* only supports one compression level */
> +job->level = 1;
> +}
> +
> +/**
> + * multifd_qpl_build_packet: build a qpl compressed data packet
> + *
> + * The qpl compressed data packet consists of two parts: one part stores
> + * the compressed length of each page, and the other part is the compressed
> + * data of each page. The zbuf_hdr stores the compressed lengths of all pages,
> + * and a separate IOV is used to store the compressed data of each page.
> + *
> + * @qpl: pointer to the QplData structure
> + * @p: Params for the channel that we are using
> + * @idx: The index of the compressed length array
> + * @addr: pointer to the compressed data
> + * @len: The length of the compressed data
> + */
> +static void multifd_qpl_build_packet(QplData *qpl, MultiFDSendParams *p,
> + uint32_t idx, uint8_t *addr, uint32_t len)
> +{
> +qpl->zbuf_hdr[idx] = cpu_to_be32(len);
> +p->iov[p->iovs_num].iov_base = addr;
> +p->iov[p->iovs_num].iov_len = len;
> +p->iovs_num++;
> +p->next_packet_size += len;
> +}
> +
> +/**
> + * multifd_qpl_compress_pages: compress normal pages
> + *
> + * Each normal page will be compressed independently, and the compression jobs
> + * will be submitted to the IAA hardware in non-blocking mode, waiting for all
> + * jobs to be completed and filling the compressed length and data into the
> + * sending IOVs. If IAA device is not available, the software path is used.
> + *
> + * Returns 0 for success or -1 for error
> + *
> + * @p: Params for the channel that we are using
> + * @errp: pointer to an error
> + */
> +static int multifd_qpl_compress_pages(MultiFDSendParams *p, Error **errp)
> +{
> +qpl_status status;
> +QplData *qpl = p->compress_data;
> +MultiFDPages_t *pages = p->pages;
> +uint8_t *zbuf = qpl->zbuf;
> +uint8_t *host = pages->block->host;
> +uint32_t job_num = pages->normal_num;

A bit misleading because job_num is used in the previous patch as a
synonym for page_count. We could change the previous patch to:
multifd_qpl_init(uint32_t page_count, ...

> +qpl_job *job = NULL;
> +
> +assert(job_num <= qpl->total_job_num);
> +/* submit all compression jobs */
> +for (int i = 0; i < job_num; i++) {
> +job = qpl->job_array[i];
> +multifd_qpl_prepare_job(job, true, host + pages->offset[i],
> +p->page_size, zbuf, p->page_size - 1);

Isn't the output buffer size == page size, why the -1?

> +/* if hardware path(IAA) is unavailable, call the software path */

If we're doing the fallback automatically, isn't that what qpl_path_auto
does already? What's the difference between the two approaches?

> +if (!qpl->iaa_avail) {

This function got a bit convoluted; it's probably worth a check at the
start and a branch to a separate multifd_qpl_compress_pages_slow()
routine.
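The packet layout described in the multifd_qpl_build_packet() comment can be exercised in isolation: a header array holds each page's compressed length in big-endian order, plus one IOV per page of compressed data. In this sketch `SketchPacket`, `MAX_PAGES`, and both functions are hypothetical. htonl()/ntohl() stand in for QEMU's cpu_to_be32()/be32_to_cpu(), and only the header and IOV bookkeeping mirrors the quoted patch.

```c
#include <arpa/inet.h>  /* htonl(), ntohl() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>    /* struct iovec */

#define MAX_PAGES 8     /* hypothetical per-packet page limit */

typedef struct SketchPacket {
    uint32_t zbuf_hdr[MAX_PAGES];   /* compressed length of each page, big-endian */
    struct iovec iov[MAX_PAGES];    /* one entry per compressed page */
    int iovs_num;
    size_t next_packet_size;        /* total bytes of compressed data */
} SketchPacket;

/* Sender side: record page idx's compressed length and queue its data. */
static void sketch_build_packet(SketchPacket *pkt, uint32_t idx,
                                uint8_t *addr, uint32_t len)
{
    assert(idx < MAX_PAGES && pkt->iovs_num < MAX_PAGES);
    pkt->zbuf_hdr[idx] = htonl(len);
    pkt->iov[pkt->iovs_num].iov_base = addr;
    pkt->iov[pkt->iovs_num].iov_len = len;
    pkt->iovs_num++;
    pkt->next_packet_size += len;
}

/* Receiver side: decode the header to learn how many data bytes follow. */
static size_t sketch_payload_size(const uint32_t *hdr, uint32_t npages)
{
    size_t total = 0;

    for (uint32_t i = 0; i < npages; i++) {
        total += ntohl(hdr[i]);
    }
    return total;
}
```

Storing the lengths in a fixed byte order is what lets the receiver split the concatenated payload back into pages regardless of host endianness.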

Re: [PATCH 2/6] dump/win_dump: Improve error messages on write error

2024-05-13 Thread Markus Armbruster

Philippe Mathieu-Daudé  writes:

> On 13/5/24 16:16, Markus Armbruster wrote:
>> create_win_dump() and write_run() report qemu_write_full() failure to
>> their callers as
>>  An IO error has occurred
>> The errno set by qemu_write_full() is lost.
>> Improve this to
>>  win-dump: failed to write header: 
>> and
>>  win-dump: failed to save memory: 
>> This matches how dump.c reports similar errors.
>> Signed-off-by: Markus Armbruster 
>> ---
>>   dump/win_dump.c | 7 ---
>>   1 file changed, 4 insertions(+), 3 deletions(-)
>> diff --git a/dump/win_dump.c b/dump/win_dump.c
>> index b7bfaff379..0e4fe692ce 100644
>> --- a/dump/win_dump.c
>> +++ b/dump/win_dump.c
>> @@ -12,7 +12,6 @@
>>   #include "sysemu/dump.h"
>>   #include "qapi/error.h"
>>   #include "qemu/error-report.h"
>> -#include "qapi/qmp/qerror.h"
>>   #include "exec/cpu-defs.h"
>>   #include "hw/core/cpu.h"
>>   #include "qemu/win_dump_defs.h"
>> @@ -52,6 +51,7 @@ static size_t write_run(uint64_t base_page, uint64_t page_count,
>>   uint64_t addr = base_page << TARGET_PAGE_BITS;
>>   uint64_t size = page_count << TARGET_PAGE_BITS;
>>   uint64_t len, l;
>> +int eno;
>>   size_t total = 0;
>> while (size) {
>> @@ -65,9 +65,10 @@ static size_t write_run(uint64_t base_page, uint64_t page_count,
>>   }
>> l = qemu_write_full(fd, buf, len);
>> +eno = errno;
>
> Hmm, this shows the qemu_write_full() API isn't ideal.
> Maybe we could pass  as argument and return errno.
> There are only 20 calls.

qemu_write_full() is a drop-in replacement for write().

>>   cpu_physical_memory_unmap(buf, addr, false, len);
>>   if (l != len) {
>> -error_setg(errp, QERR_IO_ERROR);
>> +error_setg_errno(errp, eno, "win-dump: failed to save memory");
>>   return 0;
>>   }
>>   @@ -459,7 +460,7 @@ void create_win_dump(DumpState *s, Error **errp)
>> s->written_size = qemu_write_full(s->fd, h, hdr_size);
>>   if (s->written_size != hdr_size) {
>> -error_setg(errp, QERR_IO_ERROR);
>> +error_setg_errno(errp, errno, "win-dump: failed to write header");
>>   goto out_restore;
>>   }
>>
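Because qemu_write_full() keeps write()'s convention of reporting failure through errno, the value has to be copied immediately, before a later call such as cpu_physical_memory_unmap() in write_run() gets a chance to change it. That is what the new `eno = errno;` line does. A POSIX sketch of the same pattern follows. `write_full()`, `release_buffer()`, and `save_block()` are hypothetical, and release_buffer() merely simulates a cleanup step that clobbers errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until everything is written.
 * Returns the number of bytes written; on a short count, errno describes
 * the failure, as with write(). */
static size_t write_full(int fd, const void *buf, size_t count)
{
    const char *p = buf;
    size_t total = 0;

    while (count > 0) {
        ssize_t n = write(fd, p, count);
        if (n < 0 && errno == EINTR) {
            continue;               /* interrupted before writing: retry */
        }
        if (n <= 0) {
            break;                  /* error (errno set) or no progress */
        }
        p += n;
        count -= (size_t)n;
        total += (size_t)n;
    }
    return total;
}

/* Simulated cleanup that overwrites errno, as any intervening call may. */
static void release_buffer(void)
{
    errno = 0;
}

/* Returns 0 on success or the errno captured right after the failed write. */
static int save_block(int fd, const void *buf, size_t len,
                      char *msg, size_t msglen)
{
    size_t written = write_full(fd, buf, len);
    int eno = errno;                /* capture before cleanup can change it */

    release_buffer();
    if (written != len) {
        if (eno == 0) {
            eno = EIO;              /* assumption: no progress without errno */
        }
        snprintf(msg, msglen, "failed to save memory: %s", strerror(eno));
        return eno;
    }
    return 0;
}
```

Reading errno after release_buffer() instead would report "Success" (errno 0) for a real failure, which is the information loss the patch avoids.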

Re: [PATCH 4/6] cpus: Improve error messages on memsave, pmemsave write error

2024-05-13 Thread Philippe Mathieu-Daudé


On 13/5/24 16:45, Markus Armbruster wrote:

Philippe Mathieu-Daudé  writes:


On 13/5/24 16:17, Markus Armbruster wrote:

qmp_memsave() and qmp_pmemsave() report fwrite() error as
  An IO error has occurred
Improve this to
  writing memory to '' failed
Signed-off-by: Markus Armbruster 
---
   system/cpus.c | 6 --
   1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/system/cpus.c b/system/cpus.c
index 68d161d96b..f8fa78f33d 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -813,7 +813,8 @@ void qmp_memsave(int64_t addr, int64_t size, const char *filename,
   goto exit;
   }
   if (fwrite(buf, 1, l, f) != l) {
-error_setg(errp, QERR_IO_ERROR);
+error_setg(errp, "writing memory to '%s' failed",
+   filename);
   goto exit;
   }
   addr += l;
@@ -843,7 +844,8 @@ void qmp_pmemsave(int64_t addr, int64_t size, const char *filename,
   l = size;
   cpu_physical_memory_read(addr, buf, l);
   if (fwrite(buf, 1, l, f) != l) {
-error_setg(errp, QERR_IO_ERROR);
+error_setg(errp, "writing memory to '%s' failed",
+   filename);


What about including errno with error_setg_errno()?


Are you sure fwrite() reliably sets errno when it fails?  The manual page
doesn't mention it...


Indeed. I can see some uses in the code base:

qemu-io-cmds.c:409:if (ferror(f)) {
qemu-io-cmds.c-410-perror(file_name);

qga/commands-posix.c-632-write_count = fwrite(buf, 1, count, fh);
qga/commands-posix.c:633:if (ferror(fh)) {
qga/commands-posix.c-634-error_setg_errno(errp, errno, "failed to write to file");


util/qemu-config.c:152:if (ferror(fp)) {
util/qemu-config.c-153-loc_pop();
util/qemu-config.c-154-error_setg_errno(errp, errno, "Cannot read config file");
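On the errno question: ISO C does not require fwrite() to set errno. POSIX, as far as we can tell, does specify that errno is set on a write error, marked as an extension to ISO C, and the call sites quoted above test ferror() before using errno. Buffering adds a second wrinkle, since a failed write may only surface at fflush() or fclose(). The following hedged sketch is a helper that hands back errno only when the library actually provided one. The function name and the decision to flush inside the helper are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: write and flush, so buffered failures surface here.
 * Returns 0 on success or -1 on failure; *eno receives errno if the C
 * library set one and 0 otherwise, letting the caller choose between an
 * errno-decorated message and a plain one. */
static int sketch_write_stream(FILE *f, const void *buf, size_t len, int *eno)
{
    errno = 0;
    if (fwrite(buf, 1, len, f) != len || fflush(f) != 0 || ferror(f)) {
        *eno = errno;               /* may be 0 where errno is not guaranteed */
        return -1;
    }
    *eno = 0;
    return 0;
}
```

A caller would use error_setg_errno() when `*eno` is non-zero and fall back to plain error_setg() otherwise.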


Regardless,

Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 2/6] dump/win_dump: Improve error messages on write error

2024-05-13 Thread Philippe Mathieu-Daudé


On 13/5/24 16:48, Markus Armbruster wrote:

Philippe Mathieu-Daudé  writes:


On 13/5/24 16:16, Markus Armbruster wrote:

create_win_dump() and write_run() report qemu_write_full() failure to
their callers as
  An IO error has occurred
The errno set by qemu_write_full() is lost.
Improve this to
  win-dump: failed to write header: 
and
  win-dump: failed to save memory: 
This matches how dump.c reports similar errors.
Signed-off-by: Markus Armbruster 
---
   dump/win_dump.c | 7 ---
   1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/dump/win_dump.c b/dump/win_dump.c
index b7bfaff379..0e4fe692ce 100644
--- a/dump/win_dump.c
+++ b/dump/win_dump.c
@@ -12,7 +12,6 @@
   #include "sysemu/dump.h"
   #include "qapi/error.h"
   #include "qemu/error-report.h"
-#include "qapi/qmp/qerror.h"
   #include "exec/cpu-defs.h"
   #include "hw/core/cpu.h"
   #include "qemu/win_dump_defs.h"
@@ -52,6 +51,7 @@ static size_t write_run(uint64_t base_page, uint64_t page_count,
   uint64_t addr = base_page << TARGET_PAGE_BITS;
   uint64_t size = page_count << TARGET_PAGE_BITS;
   uint64_t len, l;
+int eno;
   size_t total = 0;
 while (size) {
@@ -65,9 +65,10 @@ static size_t write_run(uint64_t base_page, uint64_t page_count,
   }
 l = qemu_write_full(fd, buf, len);
+eno = errno;


Hmm, this shows the qemu_write_full() API isn't ideal.
Maybe we could pass  as argument and return errno.
There are only 20 calls.


qemu_write_full() is a drop-in replacement for write().


Fine.

Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v2 2/3] vfio/migration: Emit VFIO migration QAPI event

2024-05-13 Thread Avihai Horon




On 13/05/2024 17:43, Cédric Le Goater wrote:


On 5/13/24 16:34, Avihai Horon wrote:


On 13/05/2024 17:01, Cédric Le Goater wrote:


On 5/9/24 11:09, Avihai Horon wrote:
Emit VFIO migration QAPI event when a VFIO device changes its migration
state. This can be used by management applications to get updates on the
current state of the VFIO device for their own purposes.

A new per VFIO device capability, "migration-events", is added so events
can be enabled only for the required devices. It is disabled by default.


Signed-off-by: Avihai Horon 
---
  include/hw/vfio/vfio-common.h |  1 +
  hw/vfio/migration.c   | 56 +--
  hw/vfio/pci.c |  2 ++
  3 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index b9da6c08ef..3ec5f2425e 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -115,6 +115,7 @@ typedef struct VFIODevice {
  bool no_mmap;
  bool ram_block_discard_allowed;
  OnOffAuto enable_migration;
+    bool migration_events;
  VFIODeviceOps *ops;
  unsigned int num_irqs;
  unsigned int num_regions;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 06ae40969b..5a359c4c78 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -24,6 +24,7 @@
  #include "migration/register.h"
  #include "migration/blocker.h"
  #include "qapi/error.h"
+#include "qapi/qapi-events-vfio.h"
  #include "exec/ramlist.h"
  #include "exec/ram_addr.h"
  #include "pci.h"
@@ -80,6 +81,55 @@ static const char *mig_state_to_str(enum vfio_device_mig_state state)
  }
  }

+static VfioMigrationState
+mig_state_to_qapi_state(enum vfio_device_mig_state state)
+{
+    switch (state) {
+    case VFIO_DEVICE_STATE_STOP:
+    return QAPI_VFIO_MIGRATION_STATE_STOP;
+    case VFIO_DEVICE_STATE_RUNNING:
+    return QAPI_VFIO_MIGRATION_STATE_RUNNING;
+    case VFIO_DEVICE_STATE_STOP_COPY:
+    return QAPI_VFIO_MIGRATION_STATE_STOP_COPY;
+    case VFIO_DEVICE_STATE_RESUMING:
+    return QAPI_VFIO_MIGRATION_STATE_RESUMING;
+    case VFIO_DEVICE_STATE_RUNNING_P2P:
+    return QAPI_VFIO_MIGRATION_STATE_RUNNING_P2P;
+    case VFIO_DEVICE_STATE_PRE_COPY:
+    return QAPI_VFIO_MIGRATION_STATE_PRE_COPY;
+    case VFIO_DEVICE_STATE_PRE_COPY_P2P:
+    return QAPI_VFIO_MIGRATION_STATE_PRE_COPY_P2P;
+    default:
+    g_assert_not_reached();
+    }
+}
+
+static void vfio_migration_send_event(VFIODevice *vbasedev)
+{
+    VFIOMigration *migration = vbasedev->migration;
+    DeviceState *dev = vbasedev->dev;
+    g_autofree char *qom_path = NULL;
+    Object *obj;
+
+    if (!vbasedev->migration_events) {
+    return;
+    }


I would add an assert on vbasedev->ops->vfio_get_object


+    obj = vbasedev->ops->vfio_get_object(vbasedev);


and another assert on obj.


vfio_migration_init() already checks these:

 if (!vbasedev->ops->vfio_get_object) {
 return -EINVAL;
 }

 obj = vbasedev->ops->vfio_get_object(vbasedev);
 if (!obj) {
 return -EINVAL;
 }

Do you think these checks in migration init are enough?


I am sure they are today. These extra asserts are to avoid issues if
the code is moved around or if anyone finds inspiration by reading
vfio_migration_send_event().


Ah, I see your point.

I will add the asserts then.
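The requested asserts restate, on the event path, preconditions that vfio_migration_init() already enforces with -EINVAL. If the code is later moved or copied elsewhere, a broken invariant then aborts immediately instead of causing a NULL dereference. The following self-contained sketch shows that split between init-time validation and hot-path asserts. `Device`, `DeviceOps`, `get_object`, `send_event()`, and the object path are hypothetical stand-ins, not the VFIO types.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the QOM object and the VFIO device ops. */
typedef struct Object {
    const char *path;
} Object;

typedef struct Device Device;

typedef struct DeviceOps {
    Object *(*get_object)(Device *dev);
} DeviceOps;

struct Device {
    const DeviceOps *ops;
    int events_enabled;
    int events_sent;
    const char *last_path;
};

/* Init-time validation, like vfio_migration_init(): fail with an error code. */
static int device_init(Device *dev)
{
    if (!dev->ops->get_object || !dev->ops->get_object(dev)) {
        return -EINVAL;
    }
    return 0;
}

/* Event path: the same preconditions restated as asserts, so a caller that
 * bypasses device_init() aborts here instead of dereferencing NULL. */
static void send_event(Device *dev)
{
    Object *obj;

    if (!dev->events_enabled) {
        return;
    }
    assert(dev->ops->get_object);
    obj = dev->ops->get_object(dev);
    assert(obj);
    dev->last_path = obj->path;    /* a real device would emit an event here */
    dev->events_sent++;
}

/* Example callback used in the usage below. */
static Object demo_obj = { "/machine/peripheral/dev0" };

static Object *demo_get_object(Device *dev)
{
    (void)dev;
    return &demo_obj;
}
```

The init check returns an error the caller can handle, while the asserts document invariants that should never fail once init has succeeded.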

Thanks.


Thanks,

C.








+    qom_path = object_get_canonical_path(obj);
+
+    qapi_event_send_vfio_migration(
+    dev->id, qom_path, mig_state_to_qapi_state(migration->device_state));
+}
+
+static void set_state(VFIODevice *vbasedev, enum vfio_device_mig_state state)


To avoid a conflict with vfio_migration_set_state(), let's call it
vfio_migration_set_device_state()? We want a 'vfio_migration_' prefix.


Sure, I will rename to that.

Thanks.




Thanks,

C.





+{
+    VFIOMigration *migration = vbasedev->migration;
+
+    migration->device_state = state;
+    vfio_migration_send_event(vbasedev);
+}
+
  static int vfio_migration_set_state(VFIODevice *vbasedev,
  enum vfio_device_mig_state new_state,
  enum vfio_device_mig_state recover_state)
@@ -125,12 +175,12 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
  goto reset_device;
  }

-    migration->device_state = recover_state;
+    set_state(vbasedev, recover_state);

  return ret;
  }

-    migration->device_state = new_state;
+    set_state(vbasedev, new_state);
  if (mig_state->data_fd != -1) {
  if (migration->data_fd != -1) {
  /*
@@ -156,7 +206,7 @@ reset_device:
   strerror(errno));
  }

-    migration->device_state = VFIO_DEVICE_STATE_RUNNING;
+    set_state(vbasedev,

Re: [PATCH 4/6] cpus: Improve error messages on memsave, pmemsave write error

2024-05-13 Thread Markus Armbruster

Philippe Mathieu-Daudé  writes:

> On 13/5/24 16:17, Markus Armbruster wrote:
>> qmp_memsave() and qmp_pmemsave() report fwrite() error as
>>  An IO error has occurred
>> Improve this to
>>  writing memory to '' failed
>> Signed-off-by: Markus Armbruster 
>> ---
>>   system/cpus.c | 6 --
>>   1 file changed, 4 insertions(+), 2 deletions(-)
>> diff --git a/system/cpus.c b/system/cpus.c
>> index 68d161d96b..f8fa78f33d 100644
>> --- a/system/cpus.c
>> +++ b/system/cpus.c
>> @@ -813,7 +813,8 @@ void qmp_memsave(int64_t addr, int64_t size, const char *filename,
>>   goto exit;
>>   }
>>   if (fwrite(buf, 1, l, f) != l) {
>> -error_setg(errp, QERR_IO_ERROR);
>> +error_setg(errp, "writing memory to '%s' failed",
>> +   filename);
>>   goto exit;
>>   }
>>   addr += l;
>> @@ -843,7 +844,8 @@ void qmp_pmemsave(int64_t addr, int64_t size, const char *filename,
>>   l = size;
>>   cpu_physical_memory_read(addr, buf, l);
>>   if (fwrite(buf, 1, l, f) != l) {
>> -error_setg(errp, QERR_IO_ERROR);
>> +error_setg(errp, "writing memory to '%s' failed",
>> +   filename);
>
> What about including errno with error_setg_errno()?

Are you sure fwrite() reliably sets errno when it fails?  The manual page
doesn't mention it...


>>   goto exit;
>>   }
>>   addr += l;

Re: [PATCH v2 2/3] vfio/migration: Emit VFIO migration QAPI event

2024-05-13 Thread Cédric Le Goater


On 5/13/24 16:34, Avihai Horon wrote:


On 13/05/2024 17:01, Cédric Le Goater wrote:


On 5/9/24 11:09, Avihai Horon wrote:

Emit VFIO migration QAPI event when a VFIO device changes its migration
state. This can be used by management applications to get updates on the
current state of the VFIO device for their own purposes.

A new per VFIO device capability, "migration-events", is added so events
can be enabled only for the required devices. It is disabled by default.

Signed-off-by: Avihai Horon 
---
  include/hw/vfio/vfio-common.h |  1 +
  hw/vfio/migration.c   | 56 +--
  hw/vfio/pci.c |  2 ++
  3 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index b9da6c08ef..3ec5f2425e 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -115,6 +115,7 @@ typedef struct VFIODevice {
  bool no_mmap;
  bool ram_block_discard_allowed;
  OnOffAuto enable_migration;
+    bool migration_events;
  VFIODeviceOps *ops;
  unsigned int num_irqs;
  unsigned int num_regions;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 06ae40969b..5a359c4c78 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -24,6 +24,7 @@
  #include "migration/register.h"
  #include "migration/blocker.h"
  #include "qapi/error.h"
+#include "qapi/qapi-events-vfio.h"
  #include "exec/ramlist.h"
  #include "exec/ram_addr.h"
  #include "pci.h"
@@ -80,6 +81,55 @@ static const char *mig_state_to_str(enum vfio_device_mig_state state)
  }
  }

+static VfioMigrationState
+mig_state_to_qapi_state(enum vfio_device_mig_state state)
+{
+    switch (state) {
+    case VFIO_DEVICE_STATE_STOP:
+        return QAPI_VFIO_MIGRATION_STATE_STOP;
+    case VFIO_DEVICE_STATE_RUNNING:
+        return QAPI_VFIO_MIGRATION_STATE_RUNNING;
+    case VFIO_DEVICE_STATE_STOP_COPY:
+        return QAPI_VFIO_MIGRATION_STATE_STOP_COPY;
+    case VFIO_DEVICE_STATE_RESUMING:
+        return QAPI_VFIO_MIGRATION_STATE_RESUMING;
+    case VFIO_DEVICE_STATE_RUNNING_P2P:
+        return QAPI_VFIO_MIGRATION_STATE_RUNNING_P2P;
+    case VFIO_DEVICE_STATE_PRE_COPY:
+        return QAPI_VFIO_MIGRATION_STATE_PRE_COPY;
+    case VFIO_DEVICE_STATE_PRE_COPY_P2P:
+        return QAPI_VFIO_MIGRATION_STATE_PRE_COPY_P2P;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void vfio_migration_send_event(VFIODevice *vbasedev)
+{
+    VFIOMigration *migration = vbasedev->migration;
+    DeviceState *dev = vbasedev->dev;
+    g_autofree char *qom_path = NULL;
+    Object *obj;
+
+    if (!vbasedev->migration_events) {
+        return;
+    }


I would add an assert on vbasedev->ops->vfio_get_object


+    obj = vbasedev->ops->vfio_get_object(vbasedev);


and another assert on obj.


vfio_migration_init() already checks these:

    if (!vbasedev->ops->vfio_get_object) {
        return -EINVAL;
    }

    obj = vbasedev->ops->vfio_get_object(vbasedev);
    if (!obj) {
        return -EINVAL;
    }

Do you think these checks in migration init are enough?


I am sure they are today. These extra asserts are to avoid issues if
the code is moved around or if anyone finds inspiration by reading
vfio_migration_send_event().

Thanks,

C.
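The asserts Cédric asks for restate preconditions that vfio_migration_init() already checks. Putting them at the point of use means that code which is later moved or copied fails loudly instead of dereferencing NULL. Below is a self-contained sketch of the pattern. `Device`, `DeviceOps`, `get_object` and `send_event()` are hypothetical simplifications, not QEMU's VFIODevice/VFIODeviceOps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for VFIODevice and VFIODeviceOps. */
typedef struct Device Device;

typedef struct DeviceOps {
    const char *(*get_object)(Device *dev);   /* may be NULL */
} DeviceOps;

struct Device {
    const DeviceOps *ops;
    int events_enabled;
    const char *path;
};

static const char *dev_get_object(Device *dev)
{
    return dev->path;
}

/* Returns the object the event would refer to, or NULL if events are off. */
static const char *send_event(Device *dev)
{
    const char *obj;

    if (!dev->events_enabled) {
        return NULL;
    }
    /* Init already validated these; the asserts document and enforce it here. */
    assert(dev->ops->get_object);
    obj = dev->ops->get_object(dev);
    assert(obj);
    printf("event for %s\n", obj);
    return obj;
}
```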








+    qom_path = object_get_canonical_path(obj);
+
+    qapi_event_send_vfio_migration(
+        dev->id, qom_path, mig_state_to_qapi_state(migration->device_state));
+}
+
+static void set_state(VFIODevice *vbasedev, enum vfio_device_mig_state state)


to avoid the conflict with vfio_migration_set_state(), let's call it :
vfio_migration_set_device_state() ? We want a 'vfio_migration_' prefix.


Sure, I will rename to that.

Thanks.




Thanks,

C.





+{
+    VFIOMigration *migration = vbasedev->migration;
+
+    migration->device_state = state;
+    vfio_migration_send_event(vbasedev);
+}
+
  static int vfio_migration_set_state(VFIODevice *vbasedev,
  enum vfio_device_mig_state new_state,
  enum vfio_device_mig_state recover_state)
@@ -125,12 +175,12 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
  goto reset_device;
  }

-    migration->device_state = recover_state;
+    set_state(vbasedev, recover_state);

  return ret;
  }

-    migration->device_state = new_state;
+    set_state(vbasedev, new_state);
  if (mig_state->data_fd != -1) {
  if (migration->data_fd != -1) {
  /*
@@ -156,7 +206,7 @@ reset_device:
   strerror(errno));
  }

-    migration->device_state = VFIO_DEVICE_STATE_RUNNING;
+    set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING);

  return ret;
  }
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 64780d1b79..8840602c50 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3362,6 +3362,8 @@ static Property

Re: [PATCH 3/3] hw/watchdog/wdt_imx2: Remove redundant assignment

2024-05-13 Thread Guenter Roeck


On 5/13/24 03:11, Bernhard Beschow wrote:

The same statement is executed unconditionally right before the if statement.

Cc: Guenter Roeck 
Signed-off-by: Bernhard Beschow 

---

The duplicate line may indicate a bug. I'm not familiar with the code, so this
patch may go in the wrong direction. Please check!


Should be ok. Technically the function should not be called to start with
if the watchdog isn't running. If it is, it might be useful to trace the content
of wcr and try to determine why the timer isn't stopped if / when the watchdog
is disabled.

Reviewed-by: Guenter Roeck 

Thanks,
Guenter


---
  hw/watchdog/wdt_imx2.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/hw/watchdog/wdt_imx2.c b/hw/watchdog/wdt_imx2.c
index 6452fc4721..f9a7ea287f 100644
--- a/hw/watchdog/wdt_imx2.c
+++ b/hw/watchdog/wdt_imx2.c
@@ -39,7 +39,6 @@ static void imx2_wdt_expired(void *opaque)
  
  /* Perform watchdog action if watchdog is enabled */

  if (s->wcr & IMX2_WDT_WCR_WDE) {
-s->wrsr = IMX2_WDT_WRSR_TOUT;
  watchdog_perform_action();
  }
  }

Re: [PATCH 1/2] hw/core: allow parameter=1 for SMP topology on any machine

2024-05-13 Thread Daniel P . Berrangé

On Mon, May 13, 2024 at 10:22:22PM +0800, Zhao Liu wrote:
> Cc Paolo for x86 topology part
> 
> Hi Daniel,
> 
> On Mon, May 13, 2024 at 01:33:57PM +0100, Daniel P. Berrangé wrote:
> > Date: Mon, 13 May 2024 13:33:57 +0100
> > From: "Daniel P. Berrangé" 
> > Subject: [PATCH 1/2] hw/core: allow parameter=1 for SMP topology on any
> >  machine
> > 
> > This effectively reverts
> > 
> >   commit 54c4ea8f3ae614054079395842128a856a73dbf9
> >   Author: Zhao Liu 
> >   Date:   Sat Mar 9 00:01:37 2024 +0800
> > 
> >     hw/core/machine-smp: Deprecate unsupported "parameter=1" SMP configurations
> > 
> > but is not done as a 'git revert' since the part of the changes to the
> > file hw/core/machine-smp.c which add 'has_XXX' checks remain desirable.
> > Furthermore, we have to tweak the subsequently added unit test to
> > account for differing warning message.
> > 
> > The rationale for the original deprecation was:
> > 
> >   "Currently, it was allowed for users to specify the unsupported
> >topology parameter as "1". For example, x86 PC machine doesn't
> >support drawer/book/cluster topology levels, but user could specify
> >"-smp drawers=1,books=1,clusters=1".
> > 
> >This is meaningless and confusing, so that the support for this kind
> >of configurations is marked deprecated since 9.0."
> > 
> > There are varying POVs on the topic of 'unsupported' topology levels.
> > 
> > It is common to say that on a system without hyperthreading there
> > is always 1 thread. Likewise when new CPUs introduced a concept of
> > multiple 'dies', it was reasonable to say that all historical CPUs
> > before that implicitly had 1 'die'. Likewise for the more recently
> > introduced 'modules' and 'clusters' parameters. From this POV, it is
> > valid to set 'parameter=1' on the -smp command line for any machine;
> > only a value > 1 is strictly an error condition.
> 
> Currently it has become more and more difficult for QEMU to maintain a
> general topology hierarchy; there are two recent examples:
> 
> 1. as you mentioned, "module" vs. "cluster": one reason for introducing
> "module" is that it is difficult to define what "cluster" is for x86,
> the cluster in the device tree can be nested, then it can correspond to
> an x86 die, or it can correspond to an x86 module. Therefore, specifying
> "clusters=1" for x86 is ambiguous.
> 
> 2. s390 introduces book and drawer, which are above socket/package
> level, but for x86, the level above the package names "cluster" (yeah,
> "cluster" again :-(). So if user sets "books=1" or "drawers=1" for x86,
> then it's meaningless. Similarly, "clusters=1" is also very confusing for
> x86 machine.
> 
> I think that only thread/core/socket are architecturally general, the
> other topology levels are hard to define across architectures, then
> allowing unsupported topology=1 is always confusing...
> 
> Moreover, QEMU currently requires a clear topology containment
> relationship when defining a topology, after which it will become
> increasingly difficult to define a generic topology containment
> relationship when new topology levels are introduced in the future...

I'm failing to see what real world technical problems QEMU faces
with a parameter being set to '1' by a mgmt app, when QEMU itself
treats all omitted values as being '1' anyway.

If we're trying to faithfully model the real world, then restricting
the topology against machine types still looks inherently wrong.
The valid topology ought to be constrained based on the named CPU model;
e.g. it doesn't make sense to allow 'dies=4' with a Skylake CPU model,
only an EPYC CPU model, especially if we want to model cache info in
a way that matches the real world silicon better.
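The rule argued for here can be stated as a small check. An omitted level counts as 1, an explicit 1 is accepted on any machine, and only a value greater than 1 on a level the machine does not model is rejected. Below is a minimal sketch under that assumption. `struct smp_level` and `smp_level_value()` are hypothetical and do not reflect QEMU's machine-smp.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical description of one -smp topology level, e.g. "books". */
struct smp_level {
    const char *name;
    bool has_value;     /* was it given on the command line? */
    unsigned value;
    bool supported;     /* does this machine model the level at all? */
};

/*
 * Returns the effective value of the level, or 0 if the configuration
 * should be rejected.
 */
static unsigned smp_level_value(const struct smp_level *l)
{
    unsigned v = l->has_value ? l->value : 1;   /* omitted means 1 */

    if (!l->supported && v > 1) {
        fprintf(stderr, "%s=%u: level not supported by this machine\n",
                l->name, v);
        return 0;
    }
    return v;           /* an explicit 1 is accepted everywhere */
}
```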

> > It doesn't cause any functional difficulty for QEMU, because internally
> > the QEMU code is itself assuming that all "unsupported" parameters
> > implicitly have a value of '1'.
> > 
> > At the libvirt level, we've allowed applications to set 'parameter=1'
> > when configuring a guest, and pass that through to QEMU.
> > 
> > Deprecating this creates extra difficulty because there's no info
> > exposed from QEMU about which machine types "support" which parameters.
> > Thus, libvirt can't know whether it is valid to pass 'parameter=1' for
> > a given machine type, or whether it will trigger deprecation messages.
> 
> I understand that libvirt is having trouble because there is no interface
> to expose which topology levels the current machine supports. As a
> workaround to eliminate the difficulties at the libvirt level, it's
> ok for me.
> 
> But I believe deprecating the unsupported topology is necessary, so do
> you think it's acceptable to include an interface to expose the supported
> topology if it's going to be deprecated again later?

As above, I think that restrictions based on machine type, while nice and
simple, are incorrect long term. If we did impose restrictions based on
CPU model, then we could trivially expose this

Re: [PATCH v2 3/3] vfio/migration: Don't emit STOP_COPY VFIO migration QAPI event twice

2024-05-13 Thread Avihai Horon




On 13/05/2024 17:13, Cédric Le Goater wrote:

On 5/9/24 11:09, Avihai Horon wrote:

When migrating a VFIO device that supports pre-copy, it is transitioned
to STOP_COPY twice: once in vfio_vmstate_change() and second time in
vfio_save_complete_precopy().

The second transition is harmless, as it's a STOP_COPY->STOP_COPY no-op
transition. However, with the newly added VFIO migration QAPI event, the
STOP_COPY event is undesirably emitted twice.

Prevent this by returning early in vfio_migration_set_state() if
new_state is the same as current device state.

Note that the STOP_COPY transition in vfio_save_complete_precopy() is
essential for VFIO devices that don't support pre-copy, for migrating an
already stopped guest and for snapshots.

Signed-off-by: Avihai Horon 
---
  hw/vfio/migration.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 5a359c4c78..14ef9c924e 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -143,6 +143,10 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
  (struct vfio_device_feature_mig_state *)feature->data;
  int ret;



I wonder if we should improve the trace events a little to better track
the state transitions. Maybe move trace_vfio_migration_set_state()
to the beginning of vfio_migration_set_state() and introduce a new
event for the currently named routine set_state()?

This can come with followups.


Yes, this sounds good.

Thanks.



Reviewed-by: Cédric Le Goater 

Thanks,

C.



+    if (new_state == migration->device_state) {
+        return 0;
+    }
+
  feature->argsz = sizeof(buf);
  feature->flags =
  VFIO_DEVICE_FEATURE_SET | VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE;
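Taken together with patch 2/3, the fix reduces to a small pattern. Route every device-state update through one setter that emits the event. Then return early from the transition function when the requested state equals the current one, so the second STOP_COPY transition emits nothing. Below is a minimal sketch with hypothetical names: `set_device_state()`, `transition()`, and an event counter standing in for qapi_event_send_vfio_migration().

```c
#include <assert.h>

enum mig_state { ST_STOP, ST_RUNNING, ST_STOP_COPY };

struct dev {
    enum mig_state state;
    int events_sent;          /* stands in for the QAPI event */
};

/* Single place where the state changes, so every change emits one event. */
static void set_device_state(struct dev *d, enum mig_state s)
{
    d->state = s;
    d->events_sent++;
}

/* Mirrors the early return added to vfio_migration_set_state(). */
static int transition(struct dev *d, enum mig_state new_state)
{
    if (new_state == d->state) {
        return 0;             /* no-op transition: nothing to do, no event */
    }
    /* ... the real code issues the MIG_DEVICE_STATE feature request here ... */
    set_device_state(d, new_state);
    return 0;
}
```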

Re: [PATCH v2 2/3] vfio/migration: Emit VFIO migration QAPI event

2024-05-13 Thread Avihai Horon




On 13/05/2024 17:01, Cédric Le Goater wrote:

On 5/9/24 11:09, Avihai Horon wrote:

Emit VFIO migration QAPI event when a VFIO device changes its migration
state. This can be used by management applications to get updates on the
current state of the VFIO device for their own purposes.

A new per VFIO device capability, "migration-events", is added so events
can be enabled only for the required devices. It is disabled by default.

Signed-off-by: Avihai Horon 
---
  include/hw/vfio/vfio-common.h |  1 +
  hw/vfio/migration.c   | 56 +--
  hw/vfio/pci.c |  2 ++
  3 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index b9da6c08ef..3ec5f2425e 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -115,6 +115,7 @@ typedef struct VFIODevice {
  bool no_mmap;
  bool ram_block_discard_allowed;
  OnOffAuto enable_migration;
+    bool migration_events;
  VFIODeviceOps *ops;
  unsigned int num_irqs;
  unsigned int num_regions;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 06ae40969b..5a359c4c78 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -24,6 +24,7 @@
  #include "migration/register.h"
  #include "migration/blocker.h"
  #include "qapi/error.h"
+#include "qapi/qapi-events-vfio.h"
  #include "exec/ramlist.h"
  #include "exec/ram_addr.h"
  #include "pci.h"
@@ -80,6 +81,55 @@ static const char *mig_state_to_str(enum vfio_device_mig_state state)
  }
  }

+static VfioMigrationState
+mig_state_to_qapi_state(enum vfio_device_mig_state state)
+{
+    switch (state) {
+    case VFIO_DEVICE_STATE_STOP:
+        return QAPI_VFIO_MIGRATION_STATE_STOP;
+    case VFIO_DEVICE_STATE_RUNNING:
+        return QAPI_VFIO_MIGRATION_STATE_RUNNING;
+    case VFIO_DEVICE_STATE_STOP_COPY:
+        return QAPI_VFIO_MIGRATION_STATE_STOP_COPY;
+    case VFIO_DEVICE_STATE_RESUMING:
+        return QAPI_VFIO_MIGRATION_STATE_RESUMING;
+    case VFIO_DEVICE_STATE_RUNNING_P2P:
+        return QAPI_VFIO_MIGRATION_STATE_RUNNING_P2P;
+    case VFIO_DEVICE_STATE_PRE_COPY:
+        return QAPI_VFIO_MIGRATION_STATE_PRE_COPY;
+    case VFIO_DEVICE_STATE_PRE_COPY_P2P:
+        return QAPI_VFIO_MIGRATION_STATE_PRE_COPY_P2P;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void vfio_migration_send_event(VFIODevice *vbasedev)
+{
+    VFIOMigration *migration = vbasedev->migration;
+    DeviceState *dev = vbasedev->dev;
+    g_autofree char *qom_path = NULL;
+    Object *obj;
+
+    if (!vbasedev->migration_events) {
+        return;
+    }


I would add an assert on vbasedev->ops->vfio_get_object


+    obj = vbasedev->ops->vfio_get_object(vbasedev);


and another assert on obj.


vfio_migration_init() already checks these:

    if (!vbasedev->ops->vfio_get_object) {
        return -EINVAL;
    }

    obj = vbasedev->ops->vfio_get_object(vbasedev);
    if (!obj) {
        return -EINVAL;
    }

Do you think these checks in migration init are enough?




+    qom_path = object_get_canonical_path(obj);
+
+    qapi_event_send_vfio_migration(
+        dev->id, qom_path, mig_state_to_qapi_state(migration->device_state));
+}
+
+static void set_state(VFIODevice *vbasedev, enum vfio_device_mig_state state)

to avoid the conflict with vfio_migration_set_state(), let's call it :
vfio_migration_set_device_state() ? We want a 'vfio_migration_' prefix.


Sure, I will rename to that.

Thanks.




Thanks,

C.





+{
+    VFIOMigration *migration = vbasedev->migration;
+
+    migration->device_state = state;
+    vfio_migration_send_event(vbasedev);
+}
+
  static int vfio_migration_set_state(VFIODevice *vbasedev,
                                      enum vfio_device_mig_state new_state,
                                      enum vfio_device_mig_state recover_state)
@@ -125,12 +175,12 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
  goto reset_device;
  }

-    migration->device_state = recover_state;
+    set_state(vbasedev, recover_state);

  return ret;
  }

-    migration->device_state = new_state;
+    set_state(vbasedev, new_state);
  if (mig_state->data_fd != -1) {
  if (migration->data_fd != -1) {
  /*
@@ -156,7 +206,7 @@ reset_device:
   strerror(errno));
  }

-    migration->device_state = VFIO_DEVICE_STATE_RUNNING;
+    set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING);

  return ret;
  }
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 64780d1b79..8840602c50 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3362,6 +3362,8 @@ static Property vfio_pci_dev_properties[] = {
  VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT, false),
  DEFINE_PROP_ON_OFF_AUTO("enable-migration", VFIOPCIDevice,
  vbasedev.enable_migration,

Re: [PATCH 2/4] tests/lcitool: Remove g++ from the containers (except for the MinGW one)

2024-05-13 Thread Daniel P . Berrangé

On Mon, May 13, 2024 at 04:22:00PM +0200, Thomas Huth wrote:
> On 13/05/2024 14.11, Daniel P. Berrangé wrote:
> > On Mon, May 13, 2024 at 02:05:16PM +0200, Thomas Huth wrote:
> > > On 13/05/2024 12.28, Daniel P. Berrangé wrote:
> > > > A better way to handle this would be to define a separate project
> > > > 
> > > > 'tests/lcitool/projects/qemu-win-installer.yml'
> > > > 
> > > > With
> > > > 
> > > >   packages:
> > > >     - g++
> > > > 
> > > > Then enable the extra project for win64
> > > > 
> > > >   generate_dockerfile("fedora-win64-cross", "fedora-38",
> > > >                       project='qemu,qemu-win-installer',
> > > >                       cross="mingw64",
> > > >                       trailer=cross_build("x86_64-w64-mingw32-",
> > > >                                           "x86_64-softmmu"))
> > > > 
> > > > which should result in an identical container to what we have today
> > > > for win64, while letting us slim the other containers.
> > > 
> > > Ok, good idea! ... but then we need to teach lcitool about mingw-w64-tools
> > > first, otherwise that vss code won't get built due to the missing "widl"
> > > tool.
> > 
> > Why is that a prerequisite? What I've suggested will result in a
> > Dockerfile for win64 that is 100% identical to what we already have
> > in git today. So surely that will already succeed to the same extent
> > that CI succeeds today ?
> 
> If you want to have the same result, we can also simply remove g++
> everywhere, also for the mingw cross containers, since the vss code is
> currently not built at all due to the missing widl program.

Oh, I'm getting mixed up between the qemu-setup.exe and the qga installer
exe.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH 6/6] qerror: QERR_IO_ERROR is no longer used, drop

2024-05-13 Thread Philippe Mathieu-Daudé


On 13/5/24 16:17, Markus Armbruster wrote:

Signed-off-by: Markus Armbruster 
---
  include/qapi/qmp/qerror.h | 3 ---
  1 file changed, 3 deletions(-)


One less!

Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 4/6] cpus: Improve error messages on memsave, pmemsave write error

2024-05-13 Thread Philippe Mathieu-Daudé


On 13/5/24 16:17, Markus Armbruster wrote:

qmp_memsave() and qmp_pmemsave() report fwrite() error as

 An IO error has occurred

Improve this to

 writing memory to '' failed

Signed-off-by: Markus Armbruster 
---
  system/cpus.c | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/system/cpus.c b/system/cpus.c
index 68d161d96b..f8fa78f33d 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -813,7 +813,8 @@ void qmp_memsave(int64_t addr, int64_t size, const char *filename,
  goto exit;
  }
  if (fwrite(buf, 1, l, f) != l) {
-error_setg(errp, QERR_IO_ERROR);
+error_setg(errp, "writing memory to '%s' failed",
+   filename);
  goto exit;
  }
  addr += l;
@@ -843,7 +844,8 @@ void qmp_pmemsave(int64_t addr, int64_t size, const char *filename,
  l = size;
  cpu_physical_memory_read(addr, buf, l);
  if (fwrite(buf, 1, l, f) != l) {
-error_setg(errp, QERR_IO_ERROR);
+error_setg(errp, "writing memory to '%s' failed",
+   filename);


What about including errno with error_setg_errno()?


  goto exit;
  }
  addr += l;

Re: [PATCH v5 10/10] vfio: Extend vfio_set_migration_error() with Error* argument

2024-05-13 Thread Avihai Horon




On 06/05/2024 12:20, Cédric Le Goater wrote:

vfio_set_migration_error() sets the 'return' error on the migration
stream if a migration is in progress. To improve error reporting, add
a new Error* argument to also set the Error object on the migration
stream, if a migration is in progress.

Signed-off-by: Cédric Le Goater 
---

  Changes in v5:

  - Rebased on 20c64c8a51a4 ("migration: migration_file_set_error")

  hw/vfio/common.c | 37 ++---
  1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index c3d82a9d6e434e33f361e4b96157bf912d5c3a2f..4cf3e13a8439bd1b9a032e9d4e75df676eba457b 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -147,10 +147,10 @@ bool vfio_viommu_preset(VFIODevice *vbasedev)
  return vbasedev->bcontainer->space->as != _space_memory;
  }

-static void vfio_set_migration_error(int err)
+static void vfio_set_migration_error(int ret, Error *err)
  {
  if (migration_is_setup_or_active()) {
-migration_file_set_error(err, NULL);
+migration_file_set_error(ret, err);
  }
  }

@@ -295,9 +295,10 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
  iova, iova + iotlb->addr_mask);

  if (iotlb->target_as != _space_memory) {
-error_report("Wrong target AS \"%s\", only system memory is allowed",
- iotlb->target_as->name ? iotlb->target_as->name : "none");
-vfio_set_migration_error(-EINVAL);
+error_setg(_err,
+   "Wrong target AS \"%s\", only system memory is allowed",
+   iotlb->target_as->name ? iotlb->target_as->name : "none");
+vfio_set_migration_error(-EINVAL, local_err);
  return;
  }

@@ -330,11 +331,12 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
  ret = vfio_container_dma_unmap(bcontainer, iova,
 iotlb->addr_mask + 1, iotlb);
  if (ret) {
-error_report("vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
- "0x%"HWADDR_PRIx") = %d (%s)",
- bcontainer, iova,
- iotlb->addr_mask + 1, ret, strerror(-ret));
-vfio_set_migration_error(ret);
+error_setg(_err,
+   "vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
+   "0x%"HWADDR_PRIx") = %d (%s)",
+   bcontainer, iova,
+   iotlb->addr_mask + 1, ret, strerror(-ret));


Use error_setg_errno()?


+vfio_set_migration_error(ret, local_err);


Now DMA unmap errors (and also the error before it) are not reported if
they happen outside of a migration.


This makes me think, maybe vfio_set_migration_error() is redundant and 
can be replaced by migration_file_set_error()?


Thanks.
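Avihai's concern is that, once error_report_err() is removed from the callers, an Error passed to vfio_set_migration_error() is only consumed while a migration is set up or active. Below is a self-contained sketch of one way to handle both cases. It uses a hypothetical `Err` type and flag in place of QEMU's Error API and migration_is_setup_or_active(). It does not model the real ownership rules of migration_file_set_error().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for QEMU's Error object. */
typedef struct Err {
    char msg[128];
} Err;

static bool migration_active;   /* stands in for migration_is_setup_or_active() */
static Err *stream_err;         /* error recorded on the migration stream */

/*
 * Takes ownership of @err: it is either kept on the migration stream or
 * reported right away, and it is freed whenever it is not kept.
 */
static void set_migration_error(int ret, Err *err)
{
    if (migration_active && !stream_err) {
        stream_err = err;       /* first error wins (a choice made for this sketch) */
        return;
    }
    if (!migration_active) {
        fprintf(stderr, "error %d: %s\n", ret, err->msg);
    }
    free(err);
}
```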


  }
  }
  out:
@@ -1108,8 +1110,7 @@ static void vfio_listener_log_global_stop(MemoryListener *listener)
  if (ret) {
  error_prepend(_err,
"vfio: Could not stop dirty page tracking - ");
-error_report_err(local_err);
-vfio_set_migration_error(ret);
+vfio_set_migration_error(ret, local_err);
  }
  }

@@ -1226,14 +1227,14 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
  trace_vfio_iommu_map_dirty_notify(iova, iova + iotlb->addr_mask);

  if (iotlb->target_as != _space_memory) {
-error_report("Wrong target AS \"%s\", only system memory is allowed",
- iotlb->target_as->name ? iotlb->target_as->name : "none");
+error_setg(_err,
+   "Wrong target AS \"%s\", only system memory is allowed",
+   iotlb->target_as->name ? iotlb->target_as->name : "none");
  goto out;
  }

  rcu_read_lock();
  if (!vfio_get_xlat_addr(iotlb, NULL, _addr, NULL, _err)) {
-error_report_err(local_err);
  goto out_lock;
  }

@@ -1244,7 +1245,6 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
"vfio_iommu_map_dirty_notify(%p, 0x%"HWADDR_PRIx", "
"0x%"HWADDR_PRIx") failed - ", bcontainer, iova,
iotlb->addr_mask + 1);
-error_report_err(local_err);
  }

  out_lock:
@@ -1252,7 +1252,7 @@ out_lock:

  out:
  if (ret) {
-vfio_set_migration_error(ret);
+vfio_set_migration_error(ret, local_err);
  }
  }

@@ -1372,8 +1372,7 @@ static void vfio_listener_log_sync(MemoryListener *listener,
  if (vfio_devices_all_dirty_tracking(bcontainer)) {
  ret = vfio_sync_dirty_bitmap(bcontainer, section, _err);
  if (ret) {
-error_report_err(local_err);
-vfio_set_migration_error(ret);
+

Re: [PATCH 1/6] block: Improve error message when external snapshot can't flush

2024-05-13 Thread Philippe Mathieu-Daudé


On 13/5/24 16:16, Markus Armbruster wrote:

external_snapshot_action() reports bdrv_flush() failure to its caller as

 An IO error has occurred

The errno code returned by bdrv_flush() is lost.

Improve this to

 Write to node '' failed: 

Signed-off-by: Markus Armbruster 
---
  blockdev.c | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: Intention to work on GSoC project

2024-05-13 Thread Eugenio Perez Martin

On Mon, May 13, 2024 at 3:49 PM Sahil  wrote:
>
> Hi,
>
> On Wednesday, May 8, 2024 8:53:12 AM GMT+5:30 Sahil wrote:
> > Hi,
> >
> > On Tuesday, May 7, 2024 12:44:33 PM IST Eugenio Perez Martin wrote:
> > > [...]
> > >
> > > > Shall I start by implementing a mechanism to check if the feature bit
> > > > "VIRTIO_F_RING_PACKED" is set (using "virtio_vdev_has_feature")? And
> > > > if it's supported, "vhost_svq_add" should call "vhost_svq_add_packed".
> > > > Following this, I can then start implementing "vhost_svq_add_packed"
> > > > and progress from there.
> > > >
> > > > What are your thoughts on this?
> > >
> > > Yes, that's totally right.
> > >
> > > I recommend you also disable _F_EVENT_IDX to start with, so the first
> > > version is easier.
> > >
> > > Also, you can send as many incomplete RFCs as you want. For example,
> > > you can send a first version that only implements reading of the guest
> > > avail ring, so we know we're aligned on that. Then, we can send
> > > subsequent RFCs adding features on top.
> >
>
> I have started working on implementing packed virtqueue support in
> vhost-shadow-virtqueue.c. The changes I have made so far are very
> minimal. I have one confusion as well.
>
> In "vhost_svq_add()" [1], a structure of type "VhostShadowVirtqueue"
> is being used. My initial idea was to create a whole new structure (eg:
> VhostShadowVirtqueuePacked). But I realized that "VhostShadowVirtqueue"
> is being used in a lot of other places such as in "struct vhost_vdpa" [2]
> (in "vhost-vdpa.h"). So maybe this isn't a good idea.
>
> The problem is that "VhostShadowVirtqueue" has a member of type "struct
> vring" [3] which represents a split virtqueue [4]. My idea now is to instead
> wrap this member in a union so that the struct would look something like
> this.
>
> struct VhostShadowVirtqueue {
>     union {
>         struct vring vring;
>         struct packed_vring vring_packed;
>     };
>     ...
> }
>
> I am not entirely sure if this is a good idea. It is similar to what's been done
> in Linux's "drivers/virtio/virtio_ring.c" ("struct vring_virtqueue" [5]).
>
> I thought I would ask this first before continuing further.
>

That's right, this second option makes perfect sense.

VhostShadowVirtqueue should abstract both split and packed. You'll see
that some members are reused, while others are only used in one
version, so they are placed inside a union. They should follow the same
pattern, although it is not a problem if we need to diverge a little
from the kernel's code.

Thanks!

> Thanks,
> Sahil
>
> [1] https://gitlab.com/qemu-project/qemu/-/blob/master/hw/virtio/vhost-shadow-virtqueue.c#L249
> [2] https://gitlab.com/qemu-project/qemu/-/blob/master/include/hw/virtio/vhost-vdpa.h#L69
> [3] https://gitlab.com/qemu-project/qemu/-/blob/master/hw/virtio/vhost-shadow-virtqueue.h#L52
> [4] https://gitlab.com/qemu-project/qemu/-/blob/master/include/standard-headers/linux/virtio_ring.h#L156
> [5] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/virtio/virtio_ring.c#n199
>
>
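Below is a self-contained sketch of the layout discussed in this thread:

- Members shared by both ring formats stay at the top level.
- Format-specific state lives in an anonymous union.
- The add path dispatches on the negotiated VIRTIO_F_RING_PACKED bit.
- EVENT_IDX is masked out for a first version.

All struct and function names are hypothetical placeholders, not the real VhostShadowVirtqueue or vhost_svq_add(). Each add consumes exactly one slot, which is a simplification. Only the feature bit numbers (29 and 34) come from the virtio specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bit numbers from the virtio specification. */
#define VIRTIO_RING_F_EVENT_IDX 29
#define VIRTIO_F_RING_PACKED    34

/* Hypothetical placeholders for the split and packed ring bookkeeping. */
struct split_ring {
    uint16_t num;
    uint16_t avail_idx;
};

struct packed_ring {
    uint16_t num;
    uint16_t next_avail;
    bool wrap_counter;
};

struct shadow_vq {
    uint64_t features;          /* negotiated feature bits */
    unsigned num_free;          /* used by both layouts */
    union {                     /* layout-specific state (C11 anonymous union) */
        struct split_ring split;
        struct packed_ring packed;
    };
};

static bool has_feature(uint64_t features, unsigned bit)
{
    return (features >> bit) & 1;
}

static void svq_init(struct shadow_vq *svq, uint64_t features, uint16_t num)
{
    /* First version: mask out EVENT_IDX, as suggested in the thread. */
    svq->features = features & ~(UINT64_C(1) << VIRTIO_RING_F_EVENT_IDX);
    svq->num_free = num;
    if (has_feature(svq->features, VIRTIO_F_RING_PACKED)) {
        /* The packed ring's wrap counter starts at 1. */
        svq->packed = (struct packed_ring){ .num = num, .wrap_counter = true };
    } else {
        svq->split = (struct split_ring){ .num = num };
    }
}

static int svq_add_split(struct shadow_vq *svq)
{
    svq->split.avail_idx++;     /* free-running index, wraps at 2^16 */
    return 0;
}

static int svq_add_packed(struct shadow_vq *svq)
{
    if (++svq->packed.next_avail == svq->packed.num) {
        svq->packed.next_avail = 0;
        svq->packed.wrap_counter = !svq->packed.wrap_counter;
    }
    return 0;
}

static int svq_add(struct shadow_vq *svq)
{
    if (svq->num_free == 0) {
        return -1;              /* no free descriptors */
    }
    svq->num_free--;
    return has_feature(svq->features, VIRTIO_F_RING_PACKED)
               ? svq_add_packed(svq)
               : svq_add_split(svq);
}
```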

Re: [PATCH 3/6] block/vmdk: Improve error messages on extent write error

2024-05-13 Thread Philippe Mathieu-Daudé


On 13/5/24 16:17, Markus Armbruster wrote:

vmdk_init_extent() reports blk_co_pwrite() failure to its caller as

 An IO error has occurred

The errno code returned by blk_co_pwrite() is lost.

Improve this to

 failed to write VMDK : 

Signed-off-by: Markus Armbruster 
---
  block/vmdk.c | 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 5/6] migration: Rephrase message on failure to save / load Xen device state

2024-05-13 Thread Philippe Mathieu-Daudé


On 13/5/24 16:17, Markus Armbruster wrote:

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qmp_xen_save_devices_state() and qmp_xen_load_devices_state() violate
this principle: they call qemu_save_device_state() and
qemu_loadvm_state(), which call error_report_err().

I wish I could clean this up now, but migration's error reporting is
too complicated (confused?) for me to mess with it.

Instead, I'm merely improving the error reported by
qmp_xen_save_devices_state() and qmp_xen_load_devices_state() to the
QMP core from

 An IO error has occurred

to
 saving Xen device state failed

and

 loading Xen device state failed

respectively.

Signed-off-by: Markus Armbruster 
---
  migration/savevm.c | 5 ++---
  1 file changed, 2 insertions(+), 3 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé
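The convention in the first paragraph of the commit message is that a function returning errors through errp leaves reporting to its caller. It can be shown outside QEMU. Below is a minimal sketch with a hypothetical `Err` type and hypothetical functions. In QEMU, the equivalent roles are played by `Error **errp`, error_setg() and error_report_err().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical minimal error object; QEMU's real one is Error. */
typedef struct Err {
    char msg[128];
} Err;

/* Fill *errp (if the caller asked for it); never print anything here. */
static void err_set(Err **errp, const char *msg)
{
    if (errp) {
        *errp = calloc(1, sizeof(**errp));
        if (*errp) {
            snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
        }
    }
}

/* Callee: reports failure only through @errp. */
static bool save_state(bool fail, Err **errp)
{
    if (fail) {
        err_set(errp, "saving Xen device state failed");
        return false;
    }
    return true;
}

/* Caller that recovers: the first error is dropped, so nothing is printed. */
static int save_with_retry(Err **errp)
{
    Err *local = NULL;

    if (save_state(true, &local)) {
        return 0;
    }
    free(local);                /* recovered: no bogus report */
    return save_state(false, errp) ? 0 : -1;
}
```

A caller that cannot recover would report the error exactly once, which is what error_report_err() does in QEMU.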

Re: [PATCH 2/6] dump/win_dump: Improve error messages on write error

2024-05-13 Thread Philippe Mathieu-Daudé


On 13/5/24 16:16, Markus Armbruster wrote:

create_win_dump() and write_run() report qemu_write_full() failure to
their callers as

 An IO error has occurred

The errno set by qemu_write_full() is lost.

Improve this to

 win-dump: failed to write header: 

and

 win-dump: failed to save memory: 

This matches how dump.c reports similar errors.

Signed-off-by: Markus Armbruster 
---
  dump/win_dump.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/dump/win_dump.c b/dump/win_dump.c
index b7bfaff379..0e4fe692ce 100644
--- a/dump/win_dump.c
+++ b/dump/win_dump.c
@@ -12,7 +12,6 @@
  #include "sysemu/dump.h"
  #include "qapi/error.h"
  #include "qemu/error-report.h"
-#include "qapi/qmp/qerror.h"
  #include "exec/cpu-defs.h"
  #include "hw/core/cpu.h"
  #include "qemu/win_dump_defs.h"
@@ -52,6 +51,7 @@ static size_t write_run(uint64_t base_page, uint64_t page_count,
  uint64_t addr = base_page << TARGET_PAGE_BITS;
  uint64_t size = page_count << TARGET_PAGE_BITS;
  uint64_t len, l;
+int eno;
  size_t total = 0;
  
  while (size) {

@@ -65,9 +65,10 @@ static size_t write_run(uint64_t base_page, uint64_t page_count,
  }
  
  l = qemu_write_full(fd, buf, len);
+    eno = errno;


Hmm, this shows the qemu_write_full() API isn't ideal.
Maybe we could pass  as argument and return errno.
There are only 20 calls.


  cpu_physical_memory_unmap(buf, addr, false, len);
  if (l != len) {
-error_setg(errp, QERR_IO_ERROR);
+error_setg_errno(errp, eno, "win-dump: failed to save memory");
  return 0;
  }
  
@@ -459,7 +460,7 @@ void create_win_dump(DumpState *s, Error **errp)
  
  s->written_size = qemu_write_full(s->fd, h, hdr_size);

  if (s->written_size != hdr_size) {
-error_setg(errp, QERR_IO_ERROR);
+error_setg_errno(errp, errno, "win-dump: failed to write header");
  goto out_restore;
  }
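Philippe suggests changing the calling convention so that callers such as write_run() no longer have to copy errno into a local before cpu_physical_memory_unmap() can clobber it. Below is a minimal POSIX sketch of one possible shape. `write_full_errno()` and its signature are assumptions, not an agreed QEMU API: the byte count comes back through an out-parameter, and the return value is 0 or -errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical alternative to qemu_write_full(): write all @count bytes,
 * retrying on EINTR and short writes.  The byte count goes to *written and
 * the return value is 0 or -errno, so callers never have to read errno.
 */
static int write_full_errno(int fd, const void *buf, size_t count,
                            size_t *written)
{
    const char *p = buf;
    size_t total = 0;
    int ret = 0;

    while (total < count) {
        ssize_t n = write(fd, p + total, count - total);

        if (n < 0) {
            if (errno == EINTR) {
                continue;           /* interrupted before writing: retry */
            }
            ret = -errno;
            break;
        }
        if (n == 0) {
            ret = -EIO;             /* defensive: avoid looping forever */
            break;
        }
        total += (size_t)n;
    }
    *written = total;
    return ret;
}
```

With this shape, write_run() could unmap the buffer right after the write and pass `-ret` straight to error_setg_errno().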

Re: [PATCH 2/4] tests/lcitool: Remove g++ from the containers (except for the MinGW one)

2024-05-13 Thread Thomas Huth


On 13/05/2024 14.11, Daniel P. Berrangé wrote:

On Mon, May 13, 2024 at 02:05:16PM +0200, Thomas Huth wrote:

On 13/05/2024 12.28, Daniel P. Berrangé wrote:

On Mon, May 13, 2024 at 12:22:50PM +0200, Thomas Huth wrote:

We don't need C++ for the normal QEMU builds anymore, so installing
g++ in each and every container seems to be a waste of time and disk
space. The only container that still needs it is the Fedora MinGW
container that builds the only remaining C++ code in ./qga/vss-win32/
and we can install it here with an extra RUN statement instead.

This way we can also add the mingw-w64-tools package quite easily
which contains the x86_64-w64-mingw32-widl program that is required
for compiling the vss code of the guest agent (it was missing before
this change, so the VSS code was actually never compiled in the CI).

Signed-off-by: Thomas Huth 
---
   tests/lcitool/projects/qemu.yml |  1 -
   tests/lcitool/refresh   | 10 --
   2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/tests/lcitool/projects/qemu.yml b/tests/lcitool/projects/qemu.yml
index 9173d1e36e..b63b6bd850 100644
--- a/tests/lcitool/projects/qemu.yml
+++ b/tests/lcitool/projects/qemu.yml
@@ -22,7 +22,6 @@ packages:
- findutils
- flex
- fuse3
- - g++
- gcc
- gcc-native
- gcovr
diff --git a/tests/lcitool/refresh b/tests/lcitool/refresh
index 24a735a3f2..dda07ddcd1 100755
--- a/tests/lcitool/refresh
+++ b/tests/lcitool/refresh
@@ -109,6 +109,11 @@ debian12_extras = [
   "ENV QEMU_CONFIGURE_OPTS --enable-netmap\n"
   ]
+fedora_mingw_extras = [ "\n"
+"RUN nosync dnf install -y mingw64-gcc-c++ mingw-w64-tools && \\\n"
+"  ln -s /usr/bin/ccache /usr/libexec/ccache-wrappers/x86_64-w64-mingw32-c++ && \\\n"
+"  ln -s /usr/bin/ccache /usr/libexec/ccache-wrappers/x86_64-w64-mingw32-g++\n\n"
+]
   def cross_build(prefix, targets):
   conf = "ENV QEMU_CONFIGURE_OPTS --cross-prefix=%s\n" % (prefix)
@@ -193,8 +198,9 @@ try:
   generate_dockerfile("fedora-win64-cross", "fedora-38",
   cross="mingw64",
-trailer=cross_build("x86_64-w64-mingw32-",
-"x86_64-softmmu"))
+trailer="".join(fedora_mingw_extras)
++ cross_build("x86_64-w64-mingw32-",
+  "x86_64-softmmu"))
   #
   # Cirrus packages lists for GitLab


A better way to handle this would be to define a separate project

'tests/lcitool/projects/qemu-win-installer.yml'

With

  packages:
    - g++

Then enable the extra project for win64

  generate_dockerfile("fedora-win64-cross", "fedora-38",
  project='qemu,qemu-win-installer',
  cross="mingw64",
  trailer=cross_build("x86_64-w64-mingw32-",
  "x86_64-softmmu"))

which should result in an identical container to what we have today
for win64, while letting us slim the other containers.


Ok, good idea! ... but then we need to teach lcitool about mingw-w64-tools
first, otherwise that vss code won't get built due to the missing "widl"
tool.


Why is that a prerequisite? What I've suggested will result in a
Dockerfile for win64 that is 100% identical to what we already have
in git today. So surely that will already succeed to the same extent
that CI succeeds today ?


If you want to have the same result, we can also simply remove g++ 
everywhere, also for the mingw cross containers, since the vss code is 
currently not built at all due to the missing widl program.


 Thomas

Re: [PATCH v2 3/3] vfio/migration: Don't emit STOP_COPY VFIO migration QAPI event twice

2024-05-13 Thread Cédric Le Goater


On 5/9/24 11:09, Avihai Horon wrote:

When migrating a VFIO device that supports pre-copy, it is transitioned
to STOP_COPY twice: once in vfio_vmstate_change() and second time in
vfio_save_complete_precopy().

The second transition is harmless, as it's a STOP_COPY->STOP_COPY no-op
transition. However, with the newly added VFIO migration QAPI event, the
STOP_COPY event is undesirably emitted twice.

Prevent this by returning early in vfio_migration_set_state() if
new_state is the same as current device state.

Note that the STOP_COPY transition in vfio_save_complete_precopy() is
essential for VFIO devices that don't support pre-copy, for migrating an
already stopped guest and for snapshots.

Signed-off-by: Avihai Horon 
---
  hw/vfio/migration.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 5a359c4c78..14ef9c924e 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -143,6 +143,10 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
  (struct vfio_device_feature_mig_state *)feature->data;
  int ret;
  


I wonder if we should improve the trace events a little to better track
the state transitions. Maybe move trace_vfio_migration_set_state()
to the beginning of vfio_migration_set_state() and introduce a new
event for the routine currently named set_state()?

This can come with followups.


Reviewed-by: Cédric Le Goater 

Thanks,

C.



+if (new_state == migration->device_state) {
+return 0;
+}
+
  feature->argsz = sizeof(buf);
  feature->flags =
  VFIO_DEVICE_FEATURE_SET | VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE;
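The effect of the early return can be modelled without any VFIO code. In this sketch all names are hypothetical: `events_enabled` plays the role of the opt-in "migration-events" property, and `events_sent` stands in for the QAPI event emitted by the set_state() helper from patch 2/3. A repeated STOP_COPY request leaves the state unchanged and emits no second event.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of device migration states, for illustration. */
enum mig_state { STATE_RUNNING, STATE_STOP, STATE_STOP_COPY };

struct device {
    enum mig_state state;
    bool events_enabled;   /* models the opt-in "migration-events" flag */
    int events_sent;       /* counts emitted state-change events */
};

/* Records the new state and emits an event if events are enabled. */
static void set_device_state(struct device *dev, enum mig_state state)
{
    dev->state = state;
    if (dev->events_enabled) {
        dev->events_sent++;   /* stands in for sending the QAPI event */
    }
}

/* Returns 0 on success.  A transition to the current state is a no-op,
 * so return early and do not emit a duplicate event. */
static int migration_set_state(struct device *dev, enum mig_state new_state)
{
    if (new_state == dev->state) {
        return 0;
    }
    /* The real code would ask the device to change state here. */
    set_device_state(dev, new_state);
    return 0;
}
```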

[PATCH 4/6] cpus: Improve error messages on memsave, pmemsave write error

2024-05-13 Thread Markus Armbruster

qmp_memsave() and qmp_pmemsave() report fwrite() error as

An IO error has occurred

Improve this to

writing memory to '' failed

Signed-off-by: Markus Armbruster 
---
 system/cpus.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/system/cpus.c b/system/cpus.c
index 68d161d96b..f8fa78f33d 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -813,7 +813,8 @@ void qmp_memsave(int64_t addr, int64_t size, const char *filename,
 goto exit;
 }
 if (fwrite(buf, 1, l, f) != l) {
-error_setg(errp, QERR_IO_ERROR);
+error_setg(errp, "writing memory to '%s' failed",
+   filename);
 goto exit;
 }
 addr += l;
@@ -843,7 +844,8 @@ void qmp_pmemsave(int64_t addr, int64_t size, const char *filename,
 l = size;
 cpu_physical_memory_read(addr, buf, l);
 if (fwrite(buf, 1, l, f) != l) {
-error_setg(errp, QERR_IO_ERROR);
+error_setg(errp, "writing memory to '%s' failed",
+   filename);
 goto exit;
 }
 addr += l;
-- 
2.45.0

[PATCH 1/6] block: Improve error message when external snapshot can't flush

2024-05-13 Thread Markus Armbruster

external_snapshot_action() reports bdrv_flush() failure to its caller
as

An IO error has occurred

The errno code returned by bdrv_flush() is lost.

Improve this to

Write to node '' failed: 

Signed-off-by: Markus Armbruster 
---
 blockdev.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 08eccc9052..528db3452f 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1406,8 +1406,10 @@ static void external_snapshot_action(TransactionAction *action,
 }
 
 if (!bdrv_is_read_only(state->old_bs)) {
-if (bdrv_flush(state->old_bs)) {
-error_setg(errp, QERR_IO_ERROR);
+ret = bdrv_flush(state->old_bs);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Write to node '%s' failed",
+ bdrv_get_device_or_node_name(state->old_bs));
 return;
 }
 }
-- 
2.45.0
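bdrv_flush() and blk_co_pwrite() return 0 on success or a negative errno value on failure, which is why these patches pass `-ret` to error_setg_errno(). The sketch below shows the same sign convention with plain strerror(). `fake_flush()` and `flush_node()` are hypothetical, and the message only roughly imitates what error_setg_errno() produces (the message, a colon, then the errno string).

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flush routine following the same convention:
 * 0 on success, negative errno on failure. */
static int fake_flush(int fail_with)
{
    return fail_with ? -fail_with : 0;
}

/* On failure, formats "<message>: <strerror(errno)>".  The callee
 * returns -errno, so the caller negates it before calling strerror(). */
static int flush_node(const char *node, int fail_with, char *msg, size_t len)
{
    int ret = fake_flush(fail_with);

    if (ret < 0) {
        snprintf(msg, len, "Write to node '%s' failed: %s",
                 node, strerror(-ret));
    }
    return ret;
}
```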

Re: [PATCH v2 13/33] plugins: Use DisasContextBase for qemu_plugin_insn_haddr

2024-05-13 Thread Philippe Mathieu-Daudé


Cc'ing Pierrick & Alex on this last one :)

On 25/4/24 01:31, Richard Henderson wrote:

We can delay the computation of haddr until the plugin
actually requests it.

Signed-off-by: Richard Henderson 
---
  include/qemu/plugin.h  |  4 
  accel/tcg/plugin-gen.c | 20 
  plugins/api.c  | 25 -
  3 files changed, 24 insertions(+), 25 deletions(-)

diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
index 03081be543..3db0e75d16 100644
--- a/include/qemu/plugin.h
+++ b/include/qemu/plugin.h
@@ -98,7 +98,6 @@ struct qemu_plugin_dyn_cb {
  /* Internal context for instrumenting an instruction */
  struct qemu_plugin_insn {
  uint64_t vaddr;
-void *haddr;
  GArray *insn_cbs;
  GArray *mem_cbs;
  uint8_t len;
@@ -119,9 +118,6 @@ struct qemu_plugin_tb {
  GPtrArray *insns;
  size_t n;
  uint64_t vaddr;
-uint64_t vaddr2;
-void *haddr1;
-void *haddr2;
  
  /* if set, the TB calls helpers that might access guest memory */

  bool mem_helper;
diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index a4656859c6..b036773d3c 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -319,9 +319,6 @@ bool plugin_gen_tb_start(CPUState *cpu, const DisasContextBase *db)
  ret = true;
  
  ptb->vaddr = db->pc_first;

-ptb->vaddr2 = -1;
-ptb->haddr1 = db->host_addr[0];
-ptb->haddr2 = NULL;
  ptb->mem_helper = false;
  
  tcg_gen_plugin_cb(PLUGIN_GEN_FROM_TB);

@@ -363,23 +360,6 @@ void plugin_gen_insn_start(CPUState *cpu, const DisasContextBase *db)
  pc = db->pc_next;
  insn->vaddr = pc;
  
-/*
- * Detect page crossing to get the new host address.
- * Note that we skip this when haddr1 == NULL, e.g. when we're
- * fetching instructions from a region not backed by RAM.
- */
-if (ptb->haddr1 == NULL) {
-insn->haddr = NULL;
-} else if (is_same_page(db, db->pc_next)) {
-insn->haddr = ptb->haddr1 + pc - ptb->vaddr;
-} else {
-if (ptb->vaddr2 == -1) {
-ptb->vaddr2 = TARGET_PAGE_ALIGN(db->pc_first);
-get_page_addr_code_hostp(cpu_env(cpu), ptb->vaddr2, >haddr2);
-}
-insn->haddr = ptb->haddr2 + pc - ptb->vaddr2;
-}
-
  tcg_gen_plugin_cb(PLUGIN_GEN_FROM_INSN);
  }
  
diff --git a/plugins/api.c b/plugins/api.c

index 39895a1cb1..4b6690c7d6 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -242,7 +242,30 @@ uint64_t qemu_plugin_insn_vaddr(const struct qemu_plugin_insn *insn)
  
  void *qemu_plugin_insn_haddr(const struct qemu_plugin_insn *insn)

  {
-return insn->haddr;
+const DisasContextBase *db = tcg_ctx->plugin_db;
+vaddr page0_last = db->pc_first | ~TARGET_PAGE_MASK;
+
+if (db->fake_insn) {
+return NULL;
+}
+
+/*
+ * ??? The return value is not intended for use of host memory,
+ * but as a proxy for address space and physical address.
+ * Thus we are only interested in the first byte and do not
+ * care about spanning pages.
+ */
+if (insn->vaddr <= page0_last) {
+if (db->host_addr[0] == NULL) {
+return NULL;
+}
+return db->host_addr[0] + insn->vaddr - db->pc_first;
+} else {
+if (db->host_addr[1] == NULL) {
+return NULL;
+}
+return db->host_addr[1] + insn->vaddr - (page0_last + 1);
+}
  }
  
  char *qemu_plugin_insn_disas(const struct qemu_plugin_insn *insn)
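The new qemu_plugin_insn_haddr() picks a host mapping based on which guest page the instruction starts on: `page0_last` is the last byte of the TB's first page, `host_addr[0]` maps `pc_first`, and `host_addr[1]` maps the start of the second page. A standalone sketch of that arithmetic, assuming 4 KiB target pages (the `SKETCH_PAGE_BITS`/`SKETCH_PAGE_MASK` macros and `insn_haddr()` are hypothetical stand-ins for the QEMU definitions):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumes 4 KiB target pages; QEMU derives TARGET_PAGE_MASK from the
 * target configuration instead. */
#define SKETCH_PAGE_BITS 12
#define SKETCH_PAGE_MASK (~(((uint64_t)1 << SKETCH_PAGE_BITS) - 1))

/* Returns the host address of the first byte of the instruction at
 * vaddr.  host0 is the host address of pc_first; host1 is the host
 * address of the start of the second page (NULL if unmapped). */
static uint8_t *insn_haddr(uint64_t pc_first, uint64_t vaddr,
                           uint8_t *host0, uint8_t *host1)
{
    uint64_t page0_last = pc_first | ~SKETCH_PAGE_MASK;

    if (vaddr <= page0_last) {
        return host0 ? host0 + (vaddr - pc_first) : NULL;
    }
    return host1 ? host1 + (vaddr - (page0_last + 1)) : NULL;
}
```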

[PATCH 0/6] error: Eliminate QERR_IO_ERROR

2024-05-13 Thread Markus Armbruster

Markus Armbruster (6):
  block: Improve error message when external snapshot can't flush
  dump/win_dump: Improve error messages on write error
  block/vmdk: Improve error messages on extent write error
  cpus: Improve error messages on memsave, pmemsave write error
  migration: Rephrase message on failure to save / load Xen device state
  qerror: QERR_IO_ERROR is no longer used, drop

 include/qapi/qmp/qerror.h |  3 ---
 block/vmdk.c  | 10 +-
 blockdev.c|  6 --
 dump/win_dump.c   |  7 ---
 migration/savevm.c|  5 ++---
 system/cpus.c |  6 --
 6 files changed, 19 insertions(+), 18 deletions(-)

-- 
2.45.0

[PATCH 2/6] dump/win_dump: Improve error messages on write error

2024-05-13 Thread Markus Armbruster

create_win_dump() and write_run() report qemu_write_full() failure to
their callers as

An IO error has occurred

The errno set by qemu_write_full() is lost.

Improve this to

win-dump: failed to write header: 

and

win-dump: failed to save memory: 

This matches how dump.c reports similar errors.

Signed-off-by: Markus Armbruster 
---
 dump/win_dump.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/dump/win_dump.c b/dump/win_dump.c
index b7bfaff379..0e4fe692ce 100644
--- a/dump/win_dump.c
+++ b/dump/win_dump.c
@@ -12,7 +12,6 @@
 #include "sysemu/dump.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
-#include "qapi/qmp/qerror.h"
 #include "exec/cpu-defs.h"
 #include "hw/core/cpu.h"
 #include "qemu/win_dump_defs.h"
@@ -52,6 +51,7 @@ static size_t write_run(uint64_t base_page, uint64_t page_count,
 uint64_t addr = base_page << TARGET_PAGE_BITS;
 uint64_t size = page_count << TARGET_PAGE_BITS;
 uint64_t len, l;
+int eno;
 size_t total = 0;
 
 while (size) {
@@ -65,9 +65,10 @@ static size_t write_run(uint64_t base_page, uint64_t page_count,
 }
 
 l = qemu_write_full(fd, buf, len);
+eno = errno;
 cpu_physical_memory_unmap(buf, addr, false, len);
 if (l != len) {
-error_setg(errp, QERR_IO_ERROR);
+error_setg_errno(errp, eno, "win-dump: failed to save memory");
 return 0;
 }
 
@@ -459,7 +460,7 @@ void create_win_dump(DumpState *s, Error **errp)
 
 s->written_size = qemu_write_full(s->fd, h, hdr_size);
 if (s->written_size != hdr_size) {
-error_setg(errp, QERR_IO_ERROR);
+error_setg_errno(errp, errno, "win-dump: failed to write header");
 goto out_restore;
 }
 
-- 
2.45.0

[PATCH 5/6] migration: Rephrase message on failure to save / load Xen device state

2024-05-13 Thread Markus Armbruster

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qmp_xen_save_devices_state() and qmp_xen_load_devices_state() violate
this principle: they call qemu_save_device_state() and
qemu_loadvm_state(), which call error_report_err().

I wish I could clean this up now, but migration's error reporting is
too complicated (confused?) for me to mess with it.

Instead, I'm merely improving the error reported by
qmp_xen_save_devices_state() and qmp_xen_load_devices_state() to the
QMP core from

An IO error has occurred

to

saving Xen device state failed

and

loading Xen device state failed

respectively.

Signed-off-by: Markus Armbruster 
---
 migration/savevm.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 4509482ec4..a4a856982a 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -45,7 +45,6 @@
 #include "qapi/qapi-commands-migration.h"
 #include "qapi/clone-visitor.h"
 #include "qapi/qapi-builtin-visit.h"
-#include "qapi/qmp/qerror.h"
 #include "qemu/error-report.h"
 #include "sysemu/cpus.h"
 #include "exec/memory.h"
@@ -3208,7 +3207,7 @@ void qmp_xen_save_devices_state(const char *filename, bool has_live, bool live,
 object_unref(OBJECT(ioc));
 ret = qemu_save_device_state(f);
 if (ret < 0 || qemu_fclose(f) < 0) {
-error_setg(errp, QERR_IO_ERROR);
+error_setg(errp, "saving Xen device state failed");
 } else {
 /* libxl calls the QMP command "stop" before calling
  * "xen-save-devices-state" and in case of migration failure, libxl
@@ -3257,7 +3256,7 @@ void qmp_xen_load_devices_state(const char *filename, Error **errp)
 ret = qemu_loadvm_state(f);
 qemu_fclose(f);
 if (ret < 0) {
-error_setg(errp, QERR_IO_ERROR);
+error_setg(errp, "loading Xen device state failed");
 }
 migration_incoming_state_destroy();
 }
-- 
2.45.0
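The convention described in this commit message, where a function with an `Error **errp` parameter only fills it in and leaves reporting to its caller, can be sketched with a toy error type. `Error`, `set_error()`, `load_state()` and `try_load_with_fallback()` below are hypothetical simplifications, not QEMU's qapi/error.h API.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy error object; QEMU's real Error type carries more information. */
typedef struct Error {
    char msg[128];
} Error;

/* Stores an error in *errp if the caller asked for one and none is set
 * yet.  Nothing is printed here. */
static void set_error(Error **errp, const char *msg)
{
    if (errp && !*errp) {
        *errp = calloc(1, sizeof(Error));
        if (*errp) {
            snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
        }
    }
}

/* Callee: reports failure only through errp. */
static int load_state(int fail, Error **errp)
{
    if (fail) {
        set_error(errp, "loading device state failed");
        return -1;
    }
    return 0;
}

/* Caller: recovers from the first failure, so there is nothing to
 * report; it discards the error instead of printing it. */
static int try_load_with_fallback(void)
{
    Error *err = NULL;

    if (load_state(1, &err) < 0) {
        free(err);
        err = NULL;
        return load_state(0, &err);
    }
    return 0;
}
```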

[PATCH 3/6] block/vmdk: Improve error messages on extent write error

2024-05-13 Thread Markus Armbruster

vmdk_init_extent() reports blk_co_pwrite() failure to its caller as

An IO error has occurred

The errno code returned by blk_co_pwrite() is lost.

Improve this to

failed to write VMDK : 

Signed-off-by: Markus Armbruster 
---
 block/vmdk.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 3b82979fdf..78f6433607 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -28,7 +28,6 @@
 #include "block/block_int.h"
 #include "sysemu/block-backend.h"
 #include "qapi/qmp/qdict.h"
-#include "qapi/qmp/qerror.h"
 #include "qemu/error-report.h"
 #include "qemu/module.h"
 #include "qemu/option.h"
@@ -2278,12 +2277,12 @@ vmdk_init_extent(BlockBackend *blk, int64_t filesize, bool flat, bool compress,
 /* write all the data */
 ret = blk_co_pwrite(blk, 0, sizeof(magic), , 0);
 if (ret < 0) {
-error_setg(errp, QERR_IO_ERROR);
+error_setg_errno(errp, -ret, "failed to write VMDK magic");
 goto exit;
 }
 ret = blk_co_pwrite(blk, sizeof(magic), sizeof(header), , 0);
 if (ret < 0) {
-error_setg(errp, QERR_IO_ERROR);
+error_setg_errno(errp, -ret, "failed to write VMDK header");
 goto exit;
 }
 
@@ -2303,7 +2302,7 @@ vmdk_init_extent(BlockBackend *blk, int64_t filesize, bool flat, bool compress,
 ret = blk_co_pwrite(blk, le64_to_cpu(header.rgd_offset) * BDRV_SECTOR_SIZE,
 gd_buf_size, gd_buf, 0);
 if (ret < 0) {
-error_setg(errp, QERR_IO_ERROR);
+error_setg_errno(errp, -ret, "failed to write VMDK grain directory");
 goto exit;
 }
 
@@ -2315,7 +2314,8 @@ vmdk_init_extent(BlockBackend *blk, int64_t filesize, bool flat, bool compress,
 ret = blk_co_pwrite(blk, le64_to_cpu(header.gd_offset) * BDRV_SECTOR_SIZE,
 gd_buf_size, gd_buf, 0);
 if (ret < 0) {
-error_setg(errp, QERR_IO_ERROR);
+error_setg_errno(errp, -ret,
+ "failed to write VMDK backup grain directory");
 }
 
 ret = 0;
-- 
2.45.0

[PATCH 6/6] qerror: QERR_IO_ERROR is no longer used, drop

2024-05-13 Thread Markus Armbruster

Signed-off-by: Markus Armbruster 
---
 include/qapi/qmp/qerror.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/include/qapi/qmp/qerror.h b/include/qapi/qmp/qerror.h
index 00b18e9082..bc9116f76a 100644
--- a/include/qapi/qmp/qerror.h
+++ b/include/qapi/qmp/qerror.h
@@ -20,9 +20,6 @@
 #define QERR_INVALID_PARAMETER_VALUE \
 "Parameter '%s' expects %s"
 
-#define QERR_IO_ERROR \
-"An IO error has occurred"
-
 #define QERR_MISSING_PARAMETER \
 "Parameter '%s' is missing"
 
-- 
2.45.0

Re: [PATCH v2 15/45] target/hppa: Use umax in do_ibranch_priv

2024-05-13 Thread Philippe Mathieu-Daudé


On 13/5/24 15:23, Richard Henderson wrote:

On 5/13/24 13:18, Philippe Mathieu-Daudé wrote:

Hi Richard,

On 13/5/24 09:46, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---
  target/hppa/translate.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index ae66068123..22935f4645 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -1981,7 +1981,7 @@ static TCGv_i64 do_ibranch_priv(DisasContext *ctx, TCGv_i64 offset)

  dest = tcg_temp_new_i64();
  tcg_gen_andi_i64(dest, offset, -4);
  tcg_gen_ori_i64(dest, dest, ctx->privilege);
-    tcg_gen_movcond_i64(TCG_COND_GTU, dest, dest, offset, dest, offset);
+    tcg_gen_umax_i64(dest, dest, offset);


Isn't tcg_gen_umax_i64(dest, dest, offset) equal to:

 tcg_gen_movcond_i64(TCG_COND_GEU, dest, dest, offset, dest, offset);

?


Yes, but I think it is clearer to use max.


OK, maybe mention it in commit description to clear doubts?


At some point we might add min/max opcodes to tcg too.


Reviewed-by: Philippe Mathieu-Daudé
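To check the equivalence question above, note that movcond computes `ret = (c1 cond c2) ? v1 : v2`. With `v1 == c1` and `v2 == c2`, GTU and GEU only differ when the operands are equal, and then both choices yield the same value. So the original GTU form, the GEU form, and umax all compute the unsigned maximum. A small C model (the `cond` enum, `movcond()` and `umax()` are hypothetical helpers, not TCG functions):

```c
#include <assert.h>
#include <stdint.h>

enum cond { COND_GTU, COND_GEU };

/* Models movcond: ret = (c1 COND c2) ? v1 : v2, unsigned compare. */
static uint64_t movcond(enum cond cond, uint64_t c1, uint64_t c2,
                        uint64_t v1, uint64_t v2)
{
    int taken = (cond == COND_GTU) ? (c1 > c2) : (c1 >= c2);
    return taken ? v1 : v2;
}

/* Unsigned maximum, the operation tcg_gen_umax_i64 expresses. */
static uint64_t umax(uint64_t a, uint64_t b)
{
    return a > b ? a : b;
}
```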

Re: [PATCH v2 1/3] qapi/vfio: Add VFIO migration QAPI event

2024-05-13 Thread Cédric Le Goater


On 5/9/24 11:09, Avihai Horon wrote:

Add a new QAPI event for VFIO migration. This event will be emitted when
a VFIO device changes its migration state, for example, during migration
or when stopping/starting the guest.

This event can be used by management applications to get updates on the
current state of the VFIO device for their own purposes.

Note that this new event is introduced since VFIO devices have a unique
set of migration states which cannot be described as accurately by other
existing events such as run state or migration status.

Signed-off-by: Avihai Horon 


LGTM,

Reviewed-by: Cédric Le Goater 

Thanks,

C.



---
  MAINTAINERS   |  1 +
  qapi/qapi-schema.json |  1 +
  qapi/vfio.json| 67 +++
  qapi/meson.build  |  1 +
  4 files changed, 70 insertions(+)
  create mode 100644 qapi/vfio.json

diff --git a/MAINTAINERS b/MAINTAINERS
index 84391777db..b5f1de459e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2160,6 +2160,7 @@ F: hw/vfio/*
  F: include/hw/vfio/
  F: docs/igd-assign.txt
  F: docs/devel/migration/vfio.rst
+F: qapi/vfio.json
  
  vfio-ccw

  M: Eric Farman 
diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
index 5e33da7228..b1581988e4 100644
--- a/qapi/qapi-schema.json
+++ b/qapi/qapi-schema.json
@@ -78,5 +78,6 @@
  { 'include': 'pci.json' }
  { 'include': 'stats.json' }
  { 'include': 'virtio.json' }
+{ 'include': 'vfio.json' }
  { 'include': 'cryptodev.json' }
  { 'include': 'cxl.json' }
diff --git a/qapi/vfio.json b/qapi/vfio.json
new file mode 100644
index 00..a0e5013188
--- /dev/null
+++ b/qapi/vfio.json
@@ -0,0 +1,67 @@
+# -*- Mode: Python -*-
+# vim: filetype=python
+#
+
+##
+# = VFIO devices
+##
+
+##
+# @VfioMigrationState:
+#
+# An enumeration of the VFIO device migration states.
+#
+# @stop: The device is stopped.
+#
+# @running: The device is running.
+#
+# @stop-copy: The device is stopped and its internal state is available
+# for reading.
+#
+# @resuming: The device is stopped and its internal state is available
+# for writing.
+#
+# @running-p2p: The device is running in the P2P quiescent state.
+#
+# @pre-copy: The device is running, tracking its internal state and its
+# internal state is available for reading.
+#
+# @pre-copy-p2p: The device is running in the P2P quiescent state,
+# tracking its internal state and its internal state is available
+# for reading.
+#
+# Since: 9.1
+##
+{ 'enum': 'VfioMigrationState',
+  'data': [ 'stop', 'running', 'stop-copy', 'resuming', 'running-p2p',
+'pre-copy', 'pre-copy-p2p' ],
+  'prefix': 'QAPI_VFIO_MIGRATION_STATE' }
+
+##
+# @VFIO_MIGRATION:
+#
+# This event is emitted when a VFIO device migration state is changed.
+#
+# @device-id: The device's id, if it has one.
+#
+# @qom-path: The device's QOM path.
+#
+# @device-state: The new changed device migration state.
+#
+# Since: 9.1
+#
+# Example:
+#
+# <- { "timestamp": { "seconds": 1713771323, "microseconds": 212268 },
+#  "event": "VFIO_MIGRATION",
+#  "data": {
+#  "device-id": "vfio_dev1",
+#  "qom-path": "/machine/peripheral/vfio_dev1",
+#  "device-state": "stop" } }
+##
+{ 'event': 'VFIO_MIGRATION',
+  'data': {
+  'device-id': 'str',
+  'qom-path': 'str',
+  'device-state': 'VfioMigrationState'
+  } }
diff --git a/qapi/meson.build b/qapi/meson.build
index c92af6e063..e7bc54e5d0 100644
--- a/qapi/meson.build
+++ b/qapi/meson.build
@@ -52,6 +52,7 @@ qapi_all_modules = [
'stats',
'trace',
'transaction',
+  'vfio',
'virtio',
'yank',
  ]

Re: [PATCH 1/2] hw/core: allow parameter=1 for SMP topology on any machine

2024-05-13 Thread Zhao Liu

Cc Paolo for x86 topology part

Hi Daniel,

On Mon, May 13, 2024 at 01:33:57PM +0100, Daniel P. Berrangé wrote:
> Date: Mon, 13 May 2024 13:33:57 +0100
> From: "Daniel P. Berrangé" 
> Subject: [PATCH 1/2] hw/core: allow parameter=1 for SMP topology on any
>  machine
> 
> This effectively reverts
> 
>   commit 54c4ea8f3ae614054079395842128a856a73dbf9
>   Author: Zhao Liu 
>   Date:   Sat Mar 9 00:01:37 2024 +0800
> 
> hw/core/machine-smp: Deprecate unsupported "parameter=1" SMP 
> configurations
> 
> but is not done as a 'git revert' since the part of the changes to the
> file hw/core/machine-smp.c which add 'has_XXX' checks remain desirable.
> Furthermore, we have to tweak the subsequently added unit test to
> account for differing warning message.
> 
> The rationale for the original deprecation was:
> 
>   "Currently, it was allowed for users to specify the unsupported
>topology parameter as "1". For example, x86 PC machine doesn't
>support drawer/book/cluster topology levels, but user could specify
>"-smp drawers=1,books=1,clusters=1".
> 
>This is meaningless and confusing, so that the support for this kind
>of configurations is marked deprecated since 9.0."
> 
> There are varying POVs on the topic of 'unsupported' topology levels.
> 
> It is common to say that on a system without hyperthreading, that there
> is always 1 thread. Likewise when new CPUs introduced a concept of
> multiple 'dies', it was reasonable to say that all historical CPUs
> before that implicitly had 1 'die'. Likewise for the more recently
> introduced 'modules' and 'clusters' parameters. From this POV, it is
> valid to set 'parameter=1' on the -smp command line for any machine,
> only a value > 1 is strictly an error condition.

It has become more and more difficult for QEMU to maintain a general
topology hierarchy; there are two recent examples:

1. as you mentioned "module" v.s. "cluster", one reason for introducing
"module" is because it is difficult to define what "cluster" is for x86,
the cluster in the device tree can be nested, then it can correspond to
an x86 die, or it can correspond to an x86 module. Therefore, specifying
"clusters=1" for x86 is ambiguous.

2. s390 introduces book and drawer, which are above socket/package
level, but for x86, the level above the package is named "cluster" (yeah,
"cluster" again :-(). So if user sets "books=1" or "drawers=1" for x86,
then it's meaningless. Similarly, "clusters=1" is also very confusing for
x86 machine.

I think only thread/core/socket are architecturally general; the
other topology levels are hard to define across architectures, so
allowing unsupported topology=1 is always confusing...

Moreover, QEMU currently requires a clear topology containment
relationship when defining a topology, after which it will become
increasingly difficult to define a generic topology containment
relationship when new topology levels are introduced in the future...

> It doesn't cause any functional difficulty for QEMU, because internally
> the QEMU code is itself assuming that all "unsupported" parameters
> implicitly have a value of '1'.
> 
> At the libvirt level, we've allowed applications to set 'parameter=1'
> when configuring a guest, and pass that through to QEMU.
> 
> Deprecating this creates extra difficulty because there's no info
> exposed from QEMU about which machine types "support" which parameters.
> Thus, libvirt can't know whether it is valid to pass 'parameter=1' for
> a given machine type, or whether it will trigger deprecation messages.

I understand that libvirt is having trouble because there is no interface
to expose which topology levels the current machine supports. As a
workaround to eliminate the difficulties at the libvirt level, it's
ok for me.

But I believe deprecating the unsupported topology is necessary, so do
you think it's acceptable to include an interface to expose the supported
topology if it's going to be deprecated again later?

Regards,
Zhao
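The rule Daniel's patch argues for, where an unsupported topology level is implicitly 1 so that an explicit `=1` is harmless and only values above 1 are errors, can be sketched as follows. The `topo_level` struct, `check_level()` and the message text are hypothetical and do not reflect the actual code in hw/core/machine-smp.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical description of one optional topology level. */
struct topo_level {
    const char *name;   /* e.g. "clusters" */
    bool supported;     /* does this machine type model the level? */
    bool has_value;     /* was it given on the -smp command line? */
    unsigned value;
};

/* Returns true if the setting is acceptable.  An unsupported level is
 * treated as implicitly 1, so an explicit "=1" is accepted and only a
 * value greater than 1 is rejected. */
static bool check_level(const struct topo_level *lvl, char *err, size_t len)
{
    if (!lvl->supported && lvl->has_value && lvl->value > 1) {
        snprintf(err, len, "%s > 1 not supported by this machine",
                 lvl->name);
        return false;
    }
    return true;
}
```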

Re: [PATCH v2 2/3] vfio/migration: Emit VFIO migration QAPI event

2024-05-13 Thread Cédric Le Goater


On 5/9/24 11:09, Avihai Horon wrote:

Emit VFIO migration QAPI event when a VFIO device changes its migration
state. This can be used by management applications to get updates on the
current state of the VFIO device for their own purposes.

A new per VFIO device capability, "migration-events", is added so events
can be enabled only for the required devices. It is disabled by default.

Signed-off-by: Avihai Horon 
---
  include/hw/vfio/vfio-common.h |  1 +
  hw/vfio/migration.c   | 56 +--
  hw/vfio/pci.c |  2 ++
  3 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index b9da6c08ef..3ec5f2425e 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -115,6 +115,7 @@ typedef struct VFIODevice {
  bool no_mmap;
  bool ram_block_discard_allowed;
  OnOffAuto enable_migration;
+bool migration_events;
  VFIODeviceOps *ops;
  unsigned int num_irqs;
  unsigned int num_regions;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 06ae40969b..5a359c4c78 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -24,6 +24,7 @@
  #include "migration/register.h"
  #include "migration/blocker.h"
  #include "qapi/error.h"
+#include "qapi/qapi-events-vfio.h"
  #include "exec/ramlist.h"
  #include "exec/ram_addr.h"
  #include "pci.h"
@@ -80,6 +81,55 @@ static const char *mig_state_to_str(enum vfio_device_mig_state state)
  }
  }
  
+static VfioMigrationState
+mig_state_to_qapi_state(enum vfio_device_mig_state state)
+{
+switch (state) {
+case VFIO_DEVICE_STATE_STOP:
+return QAPI_VFIO_MIGRATION_STATE_STOP;
+case VFIO_DEVICE_STATE_RUNNING:
+return QAPI_VFIO_MIGRATION_STATE_RUNNING;
+case VFIO_DEVICE_STATE_STOP_COPY:
+return QAPI_VFIO_MIGRATION_STATE_STOP_COPY;
+case VFIO_DEVICE_STATE_RESUMING:
+return QAPI_VFIO_MIGRATION_STATE_RESUMING;
+case VFIO_DEVICE_STATE_RUNNING_P2P:
+return QAPI_VFIO_MIGRATION_STATE_RUNNING_P2P;
+case VFIO_DEVICE_STATE_PRE_COPY:
+return QAPI_VFIO_MIGRATION_STATE_PRE_COPY;
+case VFIO_DEVICE_STATE_PRE_COPY_P2P:
+return QAPI_VFIO_MIGRATION_STATE_PRE_COPY_P2P;
+default:
+g_assert_not_reached();
+}
+}
+
+static void vfio_migration_send_event(VFIODevice *vbasedev)
+{
+VFIOMigration *migration = vbasedev->migration;
+DeviceState *dev = vbasedev->dev;
+g_autofree char *qom_path = NULL;
+Object *obj;
+
+if (!vbasedev->migration_events) {
+return;
+}


I would add an assert on vbasedev->ops->vfio_get_object


+obj = vbasedev->ops->vfio_get_object(vbasedev);


and another assert on obj.


+qom_path = object_get_canonical_path(obj);
+
+qapi_event_send_vfio_migration(
+dev->id, qom_path, mig_state_to_qapi_state(migration->device_state));
+}
+
+static void set_state(VFIODevice *vbasedev, enum vfio_device_mig_state state)


To avoid the conflict with vfio_migration_set_state(), let's call it
vfio_migration_set_device_state()? We want a 'vfio_migration_' prefix.


Thanks,

C.





+{
+VFIOMigration *migration = vbasedev->migration;
+
+migration->device_state = state;
+vfio_migration_send_event(vbasedev);
+}
+
  static int vfio_migration_set_state(VFIODevice *vbasedev,
  enum vfio_device_mig_state new_state,
  enum vfio_device_mig_state recover_state)
@@ -125,12 +175,12 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
  goto reset_device;
  }
  
-migration->device_state = recover_state;

+set_state(vbasedev, recover_state);
  
  return ret;

  }
  
-migration->device_state = new_state;

+set_state(vbasedev, new_state);
  if (mig_state->data_fd != -1) {
  if (migration->data_fd != -1) {
  /*
@@ -156,7 +206,7 @@ reset_device:
   strerror(errno));
  }
  
-migration->device_state = VFIO_DEVICE_STATE_RUNNING;

+set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING);
  
  return ret;

  }
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 64780d1b79..8840602c50 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3362,6 +3362,8 @@ static Property vfio_pci_dev_properties[] = {
  VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT, false),
  DEFINE_PROP_ON_OFF_AUTO("enable-migration", VFIOPCIDevice,
  vbasedev.enable_migration, ON_OFF_AUTO_AUTO),
+DEFINE_PROP_BOOL("migration-events", VFIOPCIDevice,
+ vbasedev.migration_events, false),
  DEFINE_PROP_BOOL("x-no-mmap", VFIOPCIDevice, vbasedev.no_mmap, false),
  DEFINE_PROP_BOOL("x-balloon-allowed", VFIOPCIDevice,
   vbasedev.ram_block_discard_allowed, false),

Re: [RFC PATCH v3 00/18] SMMUv3 nested translation support

2024-05-13 Thread Julien Grall


Hi Mostafa,

On 29/04/2024 04:23, Mostafa Saleh wrote:

Future improvements:
====================
1) One small improvement, which I don’t think is worth the extra
complexity: on a stage-1 TLB miss for nested translation, we could
do the stage-1 walk and look up the stage-2 TLBs instead of doing
the full walk.

Testing

1) IOMMUFD + VFIO
Kernel: 
https://lore.kernel.org/all/cover.1683688960.git.nicol...@nvidia.com/
VMM: 
https://qemu-devel.nongnu.narkive.com/o815DqpI/rfc-v5-0-8-arm-smmuv3-emulation-support

By assigning
“virtio-net-pci,netdev=net0,disable-legacy=on,iommu_platform=on,ats=on”
to a guest VM (on top of a QEMU guest) with VFIO and IOMMUFD.

2) Work-in-progress prototype I am hacking on for nesting on KVM
(this is nowhere near complete and is missing many things, but it
doesn't require VMs/VFIO), also with virtio-net-pci, git cloning
a bunch of stuff and observing traces.

https://android-kvm.googlesource.com/linux/+log/refs/heads/smostafa/android15-6.6-smmu-nesting-wip

I also modified the Linux driver to test with mixed granules/levels.


We have tested the series as well:

Tested-by: Julien Grall 

Cheers,

--
Julien Grall

Re: [PATCH 0/2] hw/core: revert deprecation of 'parameter=1' for SMP topology

2024-05-13 Thread Ján Tomko


On a Monday in 2024, Daniel P. Berrangé wrote:

Since QEMU 9.0, users are complaining that deprecation messages are shown
for every VM libvirt starts. This is due to the newly introduced
deprecation of 'parameter=1' for -smp. This proposes reverting that, see
the 1st patch for further commentary.

Daniel P. Berrangé (2):
 hw/core: allow parameter=1 for CPU topology on any machine
 tests: add testing of parameter=1 for SMP topology

docs/about/deprecated.rst   | 14 ---
hw/core/machine-smp.c   | 82 -
tests/unit/test-smp-parse.c | 16 ++--
3 files changed, 38 insertions(+), 74 deletions(-)



Reviewed-by: Ján Tomko 

Jano



Re: [PATCH v5 09/10] vfio: Also trace event failures in vfio_save_complete_precopy()

2024-05-13 Thread Avihai Horon




On 06/05/2024 12:20, Cédric Le Goater wrote:



vfio_save_complete_precopy() currently returns early on error, before
emitting the trace event. Change that so failures are traced too.

Signed-off-by: Cédric Le Goater 


Reviewed-by: Avihai Horon 


---
  hw/vfio/migration.c | 3 ---
  1 file changed, 3 deletions(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 
87437490bd50321b3eb27770c932078597053746..88591695a7b61c1c623c707334c5c57f5e54c58a
 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -581,9 +581,6 @@ static int vfio_save_complete_precopy(QEMUFile *f, void 
*opaque)

  qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
  ret = qemu_file_get_error(f);
-if (ret) {
-return ret;
-}

  trace_vfio_save_complete_precopy(vbasedev->name, ret);

--
2.45.0

Re: [PATCH v5 08/10] vfio: Add Error** argument to .get_dirty_bitmap() handler

2024-05-13 Thread Avihai Horon




On 06/05/2024 12:20, Cédric Le Goater wrote:



Let the callers do the error reporting. Add documentation while at it.
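For readers new to the convention, here is a minimal, self-contained sketch of the Error** pattern the series adopts: the callee fills in an error object and returns failure, and the caller decides how to report it. The `Error` struct and the `demo_error_setg()`/`demo_error_free()` helpers are simplified stand-ins written for this example, not QEMU's `qapi/error.h` API, and `query_dirty_bitmap_demo()` is a hypothetical callee.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for QEMU's Error object (assumption, not the real type). */
typedef struct Error {
    char msg[256];
} Error;

/* Store a formatted message in *errp, but only if the caller asked for one. */
static void demo_error_setg(Error **errp, const char *fmt, ...)
{
    va_list ap;

    if (!errp) {
        return;                 /* caller passed NULL: details not wanted */
    }
    *errp = calloc(1, sizeof(Error));
    if (!*errp) {
        return;
    }
    va_start(ap, fmt);
    vsnprintf((*errp)->msg, sizeof((*errp)->msg), fmt, ap);
    va_end(ap);
}

static void demo_error_free(Error *err)
{
    free(err);
}

/* Hypothetical callee: fails for size 0 and leaves reporting to the caller. */
static int query_dirty_bitmap_demo(unsigned long size, Error **errp)
{
    if (size == 0) {
        demo_error_setg(errp, "invalid size 0x%lx", size);
        return -EINVAL;
    }
    return 0;
}
```

A caller then owns the reporting decision, e.g. `Error *local_err = NULL;` followed by printing `local_err->msg` and calling `demo_error_free(local_err)` when the call fails.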

Signed-off-by: Cédric Le Goater 
---

  Changes in v5:

  - Replaced error_setg() by error_setg_errno() in
vfio_devices_query_dirty_bitmap() and vfio_legacy_query_dirty_bitmap()
  - ':' -> '-' in vfio_iommu_map_dirty_notify()

  include/hw/vfio/vfio-common.h |  4 +-
  include/hw/vfio/vfio-container-base.h | 17 +++-
  hw/vfio/common.c  | 59 ++-
  hw/vfio/container-base.c  |  5 ++-
  hw/vfio/container.c   | 14 ---
  5 files changed, 68 insertions(+), 31 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 
46f88493634b5634a9c14a5caa33a463fbf2c50d..68911d36676667352e94a97895828aff4b194b57
 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -274,9 +274,9 @@ bool
  vfio_devices_all_device_dirty_tracking(const VFIOContainerBase *bcontainer);
  int vfio_devices_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
  VFIOBitmap *vbmap, hwaddr iova,
-hwaddr size);
+hwaddr size, Error **errp);


Nit: while at it, can we fix the line wrap here?


  int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t iova,
-  uint64_t size, ram_addr_t ram_addr);
+  uint64_t size, ram_addr_t ram_addr, Error **errp);

  /* Returns 0 on success, or a negative errno. */
  int vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
diff --git a/include/hw/vfio/vfio-container-base.h 
b/include/hw/vfio/vfio-container-base.h
index 
326ceea52a2030eec9dad289a9845866c4a8c090..48c92e186231c2c2b548abed08800faff3f430a7
 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -85,7 +85,7 @@ int vfio_container_set_dirty_page_tracking(VFIOContainerBase 
*bcontainer,
 bool start, Error **errp);
  int vfio_container_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
VFIOBitmap *vbmap,
-  hwaddr iova, hwaddr size);
+  hwaddr iova, hwaddr size, Error **errp);


Nit: while at it, can we fix the line wrap here?



  void vfio_container_init(VFIOContainerBase *bcontainer,
   VFIOAddressSpace *space,
@@ -138,9 +138,22 @@ struct VFIOIOMMUClass {
   */
  int (*set_dirty_page_tracking)(const VFIOContainerBase *bcontainer,
 bool start, Error **errp);
+/**
+ * @query_dirty_bitmap
+ *
+ * Get list of dirty pages from container


s/list/bitmap?


+ *
+ * @bcontainer: #VFIOContainerBase from which to get dirty pages
+ * @vbmap: #VFIOBitmap internal bitmap structure
+ * @iova: iova base address
+ * @size: size of iova range
+ * @errp: pointer to Error*, to store an error if it happens.
+ *
+ * Returns zero to indicate success and negative for error
+ */
  int (*query_dirty_bitmap)(const VFIOContainerBase *bcontainer,
VFIOBitmap *vbmap,
-  hwaddr iova, hwaddr size);
+  hwaddr iova, hwaddr size, Error **errp);
  /* PCI specific */
  int (*pci_hot_reset)(VFIODevice *vbasedev, bool single);

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 
da748563eb33843e93631a5240759964f33162f2..c3d82a9d6e434e33f361e4b96157bf912d5c3a2f
 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1141,7 +1141,7 @@ static int vfio_device_dma_logging_report(VFIODevice 
*vbasedev, hwaddr iova,

  int vfio_devices_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
  VFIOBitmap *vbmap, hwaddr iova,
-hwaddr size)
+hwaddr size, Error **errp)


Nit: while at it, can we fix the line wrap here?


  {
  VFIODevice *vbasedev;
  int ret;
@@ -1150,10 +1150,10 @@ int vfio_devices_query_dirty_bitmap(const 
VFIOContainerBase *bcontainer,
  ret = vfio_device_dma_logging_report(vbasedev, iova, size,
   vbmap->bitmap);
  if (ret) {
-error_report("%s: Failed to get DMA logging report, iova: "
- "0x%" HWADDR_PRIx ", size: 0x%" HWADDR_PRIx
- ", err: %d (%s)",
- vbasedev->name, iova, size, ret, strerror(-ret));
+error_setg_errno(errp, -ret,
+ "%s: Failed to get DMA logging report, iova: "
+ "0x%" HWADDR_PRIx ", size: 0x%" HWADDR_PRIx,
+

Re: [PATCH 3/3] hw/misc: Implement mailbox properties for customer OTP and device specific private keys

2024-05-13 Thread Philippe Mathieu-Daudé


On 10/5/24 16:10, Rayhan Faizel wrote:

Four mailbox properties are implemented as follows:
1. Customer OTP: GET_CUSTOMER_OTP and SET_CUSTOMER_OTP
2. Device-specific private key: GET_PRIVATE_KEY and
SET_PRIVATE_KEY.

The customer OTP is located in rows 36-43. The device-specific private key
is located in rows 56-63.


Better to define these instead of using magic values in the code,
i.e.:

  #define OTP_PRIVATE_KEY_OFFSET 56
  #define OTP_PRIVATE_KEY_LENGTH 8
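As a sketch of the reviewer's suggestion, the private-key read loop could use named constants instead of the literals 56 and 8. The `otp_rows` array and `private_key_read()` are hypothetical, written for illustration; the patch itself reads rows through `bcm2835_otp_read_row()` and writes the result to guest memory with `stl_le_phys()`. The row layout (rows 56-63) follows the patch description.

```c
#include <stddef.h>
#include <stdint.h>

#define OTP_ROW_COUNT           66   /* rows are numbered 1..66 */
#define OTP_PRIVATE_KEY_OFFSET  56   /* first private-key row */
#define OTP_PRIVATE_KEY_LENGTH  8    /* rows 56..63 */

/* Hypothetical OTP contents; index 0 is unused so row numbers map 1:1. */
static uint32_t otp_rows[OTP_ROW_COUNT + 1];

/*
 * Copy up to @number key words, starting at key word @start, into @out.
 * Words past the end of the key are skipped. Returns the number copied.
 */
static size_t private_key_read(uint32_t start, uint32_t number, uint32_t *out)
{
    size_t copied = 0;

    /* "n - start < number" cannot wrap the way "start + number" can. */
    for (uint32_t n = start;
         n < OTP_PRIVATE_KEY_LENGTH && n - start < number; n++) {
        out[n - start] = otp_rows[OTP_PRIVATE_KEY_OFFSET + n];
        copied++;
    }
    return copied;
}
```

The loop bound is written slightly differently from the patch (`n < start + number`) so that a guest-supplied `number` near `UINT32_MAX` cannot make the sum wrap; that is a small deviation, not something the reviewer asked for.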


The customer OTP can be locked with the magic numbers 0x 0xaffe
when running the SET_CUSTOMER_OTP mailbox command.

P.S. I am not sure whether the magic lock combo applies to the private key as well.

Signed-off-by: Rayhan Faizel 
---
  hw/arm/bcm2835_peripherals.c |  2 +
  hw/misc/bcm2835_property.c   | 71 
  include/hw/arm/raspberrypi-fw-defs.h |  2 +
  include/hw/misc/bcm2835_property.h   |  2 +
  4 files changed, 77 insertions(+)




+/* Device-specific private key */
+
+case RPI_FWREQ_GET_PRIVATE_KEY:
+start_num = ldl_le_phys(&s->dma_as, value + 12);
+number = ldl_le_phys(&s->dma_as, value + 16);
+
+resplen = 8 + 4 * number;
+
+for (n = start_num; n < start_num + number && n < 8; n++) {
+stl_le_phys(&s->dma_as,
+value + 20 + ((n - start_num) << 2),
+bcm2835_otp_read_row(s->otp, 56 + n));
+}
+break;
+case RPI_FWREQ_SET_PRIVATE_KEY:
+start_num = ldl_le_phys(&s->dma_as, value + 12);
+number = ldl_le_phys(&s->dma_as, value + 16);
+
+resplen = 4;
+
+for (n = start_num; n < start_num + number && n < 8; n++) {
+otp_row = ldl_le_phys(&s->dma_as,
+  value + 20 + ((n - start_num) << 2));
+bcm2835_otp_write_row(s->otp, 56 + n, otp_row);
+}
+break;

Re: Intention to work on GSoC project

2024-05-13 Thread Sahil

Hi,

On Wednesday, May 8, 2024 8:53:12 AM GMT+5:30 Sahil wrote:
> Hi,
> 
> On Tuesday, May 7, 2024 12:44:33 PM IST Eugenio Perez Martin wrote:
> > [...]
> > 
> > > Shall I start by implementing a mechanism to check if the feature bit
> > > "VIRTIO_F_RING_PACKED" is set (using "virtio_vdev_has_feature")? And
> > > if it's supported, "vhost_svq_add" should call "vhost_svq_add_packed".
> > > Following this, I can then start implementing "vhost_svq_add_packed"
> > > and progress from there.
> > > 
> > > What are your thoughts on this?
> > 
> > Yes, that's totally right.
> > 
> > I recommend that you also disable _F_EVENT_IDX to start, so the first
> > version is easier.
> > 
> > Also, you can send as many incomplete RFCs as you want. For example,
> > you can send a first version that only implements reading of the guest
> > avail ring, so we know we're aligned on that. Then, we can send
> > subsequent RFCs adding features on top.
>
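The plan discussed above (check whether VIRTIO_F_RING_PACKED was negotiated and route `vhost_svq_add()` to a packed-ring variant) might look roughly like the sketch below. `VIRTIO_F_RING_PACKED` is feature bit 34 in the VIRTIO 1.1 specification; everything else (`SvqDemo`, `svq_add_split()`, `svq_add_packed()`, and `has_feature()` standing in for `virtio_vdev_has_feature()`) is hypothetical and only illustrates the dispatch.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_F_RING_PACKED 34   /* feature bit number from the VIRTIO spec */

/* Hypothetical, heavily reduced shadow-virtqueue state. */
typedef struct SvqDemo {
    uint64_t features;      /* negotiated feature bits */
    unsigned split_adds;    /* counters so the dispatch can be observed */
    unsigned packed_adds;
} SvqDemo;

/* Stand-in for virtio_vdev_has_feature(): test one negotiated bit. */
static bool has_feature(const SvqDemo *svq, unsigned int fbit)
{
    return (svq->features & (UINT64_C(1) << fbit)) != 0;
}

static int svq_add_split(SvqDemo *svq)
{
    svq->split_adds++;      /* real code would fill a split-ring descriptor */
    return 0;
}

static int svq_add_packed(SvqDemo *svq)
{
    svq->packed_adds++;     /* real code would fill a packed-ring descriptor */
    return 0;
}

/* Single entry point; callers do not need to know the ring layout. */
static int svq_add(SvqDemo *svq)
{
    if (has_feature(svq, VIRTIO_F_RING_PACKED)) {
        return svq_add_packed(svq);
    }
    return svq_add_split(svq);
}
```

Keeping one entry point means existing callers would not need to change, which fits the incremental-RFC approach suggested above.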

I have started working on implementing packed virtqueue support in
vhost-shadow-virtqueue.c. The changes I have made so far are very
minimal. I have one point of confusion as well.

In "vhost_svq_add()" [1], a structure of type "VhostShadowVirtqueue"
is being used. My initial idea was to create a whole new structure (eg:
VhostShadowVirtqueuePacked). But I realized that "VhostShadowVirtqueue"
is being used in a lot of other places such as in "struct vhost_vdpa" [2]
(in "vhost-vdpa.h"). So maybe this isn't a good idea.

The problem is that "VhostShadowVirtqueue" has a member of type "struct
vring" [3] which represents a split virtqueue [4]. My idea now is to instead
wrap this member in a union so that the struct would look something like
this.

struct VhostShadowVirtqueue {
    union {
        struct vring vring;
        struct packed_vring vring_packed;
    };
    ...
};

I am not entirely sure if this is a good idea. It is similar to what's been done
in linux's "drivers/virtio/virtio_ring.c" ("struct vring_virtqueue" [5]).
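One way to make the union idea concrete is the sketch below, loosely modelled on the split/packed union in Linux's `struct vring_virtqueue`. The two union members need distinct names, and a flag has to record which member is live. The descriptor layouts follow the VIRTIO 1.1 specification (both are 16 bytes); the `demo_*` type names and the `packed_ring` flag are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Split-ring descriptor, as laid out in the VIRTIO spec. */
struct demo_split_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

/* Packed-ring descriptor, as laid out in the VIRTIO spec. */
struct demo_packed_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t id;
    uint16_t flags;
};

/* Hypothetical shadow virtqueue holding either ring layout. */
typedef struct DemoShadowVirtqueue {
    bool packed_ring;       /* selects the live union member */
    unsigned int num;       /* ring size, shared by both layouts */
    union {
        struct {
            struct demo_split_desc *desc;
            /* avail and used rings would follow here */
        } split;
        struct {
            struct demo_packed_desc *desc;
            bool avail_wrap_counter;   /* packed rings track a wrap counter */
        } packed;
    };
} DemoShadowVirtqueue;
```

Because only one member is valid at a time, every accessor has to check `packed_ring` first; that is the main cost of the union approach compared with a separate structure.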

I thought I would ask this first before continuing further.

Thanks,
Sahil

[1] 
https://gitlab.com/qemu-project/qemu/-/blob/master/hw/virtio/vhost-shadow-virtqueue.c#L249
[2] 
https://gitlab.com/qemu-project/qemu/-/blob/master/include/hw/virtio/vhost-vdpa.h#L69
[3] 
https://gitlab.com/qemu-project/qemu/-/blob/master/hw/virtio/vhost-shadow-virtqueue.h#L52
[4] 
https://gitlab.com/qemu-project/qemu/-/blob/master/include/standard-headers/linux/virtio_ring.h#L156
[5] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/virtio/virtio_ring.c#n199

Re: [PATCH 1/3] hw/nvram: Add BCM2835 OTP device

2024-05-13 Thread Philippe Mathieu-Daudé


On 10/5/24 16:10, Rayhan Faizel wrote:

The OTP device registers are currently stubbed. For now, the device
houses the OTP rows which will be accessed directly by other peripherals.

Signed-off-by: Rayhan Faizel 
---
  hw/nvram/bcm2835_otp.c | 187 +
  hw/nvram/meson.build   |   1 +
  include/hw/nvram/bcm2835_otp.h |  43 
  3 files changed, 231 insertions(+)
  create mode 100644 hw/nvram/bcm2835_otp.c
  create mode 100644 include/hw/nvram/bcm2835_otp.h




+/* OTP rows are 1-indexed */
+uint32_t bcm2835_otp_read_row(BCM2835OTPState *s, unsigned int row)
+{
+assert(row <= 66 && row >= 1);
+
+return s->otp_rows[row - 1];
+}
+
+void bcm2835_otp_write_row(BCM2835OTPState *s, unsigned int row,
+   uint32_t value)
+{
+assert(row <= 66 && row >= 1);
+
+/* Real OTP rows work as e-fuses */
+s->otp_rows[row - 1] |= value;


Maybe name get/set instead of read/write?


+}
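The `|=` in `bcm2835_otp_write_row()` models e-fuse behaviour: a write can set bits but never clear them. A self-contained sketch of that behaviour with the same 1-indexed row convention follows; `DemoOTP` and the get/set function names (following the naming suggested above) are hypothetical stand-ins for `BCM2835OTPState` and the patch's accessors.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_OTP_ROWS 66

typedef struct DemoOTP {
    uint32_t rows[DEMO_OTP_ROWS];   /* row N is stored at index N - 1 */
} DemoOTP;

static uint32_t demo_otp_get_row(const DemoOTP *s, unsigned int row)
{
    assert(row >= 1 && row <= DEMO_OTP_ROWS);
    return s->rows[row - 1];
}

/* Like a bank of e-fuses: bits can be blown (set), never restored. */
static void demo_otp_set_row(DemoOTP *s, unsigned int row, uint32_t value)
{
    assert(row >= 1 && row <= DEMO_OTP_ROWS);
    s->rows[row - 1] |= value;
}
```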

Re: [PATCH v5 07/10] memory: Add Error** argument to memory_get_xlat_addr()

2024-05-13 Thread Avihai Horon




On 06/05/2024 12:20, Cédric Le Goater wrote:



Let the callers do the reporting. This will be useful in
vfio_iommu_map_dirty_notify().

Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: David Hildenbrand 
Reviewed-by: Peter Xu 
Signed-off-by: Cédric Le Goater 
---
  include/exec/memory.h  | 15 ++-
  hw/vfio/common.c   | 13 +
  hw/virtio/vhost-vdpa.c |  5 -
  system/memory.c| 10 +-
  4 files changed, 32 insertions(+), 11 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 
dadb5cd65ab58b4868fcae06b4e301f0ecb0c1d2..2c45051b7b419c48b4e14c25f4d16a99ccd23996
 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -774,9 +774,22 @@ void 
ram_discard_manager_register_listener(RamDiscardManager *rdm,
  void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
   RamDiscardListener *rdl);

+/**
+ * memory_get_xlat_addr: Extract addresses from a TLB entry
+ *
+ * @iotlb: pointer to an #IOMMUTLBEntry
+ * @vaddr: virtual addressf


Nit: s/addressf/address

Thanks.


+ * @ram_addr: RAM address
+ * @read_only: indicates if writes are allowed
+ * @mr_has_discard_manager: indicates memory is controlled by a
+ *  RamDiscardManager
+ * @errp: pointer to Error*, to store an error if it happens.
+ *
+ * Return: true on success, else false setting @errp with error.
+ */
  bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
ram_addr_t *ram_addr, bool *read_only,
-  bool *mr_has_discard_manager);
+  bool *mr_has_discard_manager, Error **errp);

  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
  typedef struct MemoryRegionIoeventfd MemoryRegionIoeventfd;
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 
b929bb0b7ac60dcef34c0d5a098d5d91f75501dd..da748563eb33843e93631a5240759964f33162f2
 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -253,12 +253,13 @@ static bool 
vfio_listener_skipped_section(MemoryRegionSection *section)

  /* Called with rcu_read_lock held.  */
  static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
-   ram_addr_t *ram_addr, bool *read_only)
+   ram_addr_t *ram_addr, bool *read_only,
+   Error **errp)
  {
  bool ret, mr_has_discard_manager;

  ret = memory_get_xlat_addr(iotlb, vaddr, ram_addr, read_only,
-   &mr_has_discard_manager);
+   &mr_has_discard_manager, errp);
  if (ret && mr_has_discard_manager) {
  /*
   * Malicious VMs might trigger discarding of IOMMU-mapped memory. The
@@ -288,6 +289,7 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, 
IOMMUTLBEntry *iotlb)
  hwaddr iova = iotlb->iova + giommu->iommu_offset;
  void *vaddr;
  int ret;
+Error *local_err = NULL;

  trace_vfio_iommu_map_notify(iotlb->perm == IOMMU_NONE ? "UNMAP" : "MAP",
  iova, iova + iotlb->addr_mask);
@@ -304,7 +306,8 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, 
IOMMUTLBEntry *iotlb)
  if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
  bool read_only;

-if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only)) {
+if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only, &local_err)) {
+error_report_err(local_err);
  goto out;
  }
  /*
@@ -1213,6 +1216,7 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, 
IOMMUTLBEntry *iotlb)
  VFIOContainerBase *bcontainer = giommu->bcontainer;
  hwaddr iova = iotlb->iova + giommu->iommu_offset;
  ram_addr_t translated_addr;
+Error *local_err = NULL;
  int ret = -EINVAL;

  trace_vfio_iommu_map_dirty_notify(iova, iova + iotlb->addr_mask);
@@ -1224,7 +1228,8 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, 
IOMMUTLBEntry *iotlb)
  }

  rcu_read_lock();
-if (!vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL)) {
+if (!vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL, &local_err)) {
+error_report_err(local_err);
  goto out_lock;
  }

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 
e827b9175fc61f1ef419e48d90a440b00449312a..ed99ab87457d8f31b98ace960713f48d47b27102
 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -208,6 +208,7 @@ static void vhost_vdpa_iommu_map_notify(IOMMUNotifier *n, 
IOMMUTLBEntry *iotlb)
  void *vaddr;
  int ret;
  Int128 llend;
+Error *local_err = NULL;

  if (iotlb->target_as != &address_space_memory) {
  error_report("Wrong target AS \"%s\", only system memory is allowed",
@@ -227,7 +228,9 @@ static void vhost_vdpa_iommu_map_notify(IOMMUNotifier *n, 
IOMMUTLBEntry *iotlb)
  if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
  bool read_only;

-

Re: [PATCH v2 0/4] Fix "virtio-gpu: fix scanout migration post-load"

2024-05-13 Thread Fiona Ebner

Am 13.05.24 um 15:21 schrieb Marc-André Lureau:
> 
> Indeed, it needs:
> 
> diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
> index 5de90bb62f..3a88eb5e3a 100644
> --- a/hw/display/virtio-gpu.c
> +++ b/hw/display/virtio-gpu.c
> @@ -1201,7 +1201,7 @@ static const VMStateDescription
> vmstate_virtio_gpu_scanout = {
> 
>  static const VMStateDescription vmstate_virtio_gpu_scanouts = {
>  .name = "virtio-gpu-scanouts",
> -.version_id = 1,
> +.version_id = 2,
> 
> 

Thanks! With that on top:

Tested-by: Fiona Ebner 

Tested with an Ubuntu 23.10 VM:

Machine type pc-i440fx-8.2:
1. create snapshot with 8.2, load with patched 9.0
2. create snapshot with patched 9.0, load with patched 9.0 and with 8.2

Machine type pc-i440fx-9.0:
1. create snapshot with patched 9.0, load with patched 9.0

No crashes/failures, and I didn't notice any other issues.

Best Regards,
Fiona
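For context on why bumping `.version_id` matters, here is a simplified model of the rule VMState applies when loading: a section is accepted when its stream version lies between the description's `minimum_version_id` and `version_id`, and a field introduced in version N is only read from streams of version N or later. The `DemoVMSD`/`DemoField` types and functions are hypothetical; this is not QEMU's `vmstate_load_state()`, only an illustration of the version checks.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct DemoField {
    const char *name;
    int version_id;          /* first stream version that carries the field */
} DemoField;

typedef struct DemoVMSD {
    int version_id;          /* version this build writes */
    int minimum_version_id;  /* oldest version this build can read */
    const DemoField *fields;
    size_t nfields;
} DemoVMSD;

/* Can a stream written with @stream_version be loaded at all? */
static bool demo_version_ok(const DemoVMSD *vmsd, int stream_version)
{
    return stream_version >= vmsd->minimum_version_id &&
           stream_version <= vmsd->version_id;
}

/* Count the fields that would be read from such a stream. */
static size_t demo_fields_loaded(const DemoVMSD *vmsd, int stream_version)
{
    size_t n = 0;

    for (size_t i = 0; i < vmsd->nfields; i++) {
        if (vmsd->fields[i].version_id <= stream_version) {
            n++;
        }
    }
    return n;
}
```

Under this model, a field tagged with a version higher than the description's `.version_id` is never read, which is one reason a version bump has to accompany new versioned fields.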

Re: [PATCH v5 06/10] vfio: Reverse test on vfio_get_dirty_bitmap()

2024-05-13 Thread Avihai Horon




On 06/05/2024 12:20, Cédric Le Goater wrote:



Title should be: Reverse test on vfio_get_xlat_addr()?


It will simplify the changes coming after.

Signed-off-by: Cédric Le Goater 
---
  hw/vfio/common.c | 22 +-
  1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 
ed5ee6349ced78b3bde68d2ee506f78ba1a9dd9c..b929bb0b7ac60dcef34c0d5a098d5d91f75501dd
 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1224,16 +1224,20 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier 
*n, IOMMUTLBEntry *iotlb)
  }

  rcu_read_lock();
-if (vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL)) {
-ret = vfio_get_dirty_bitmap(bcontainer, iova, iotlb->addr_mask + 1,
-translated_addr);
-if (ret) {
-error_report("vfio_iommu_map_dirty_notify(%p, 0x%"HWADDR_PRIx", "
- "0x%"HWADDR_PRIx") = %d (%s)",
- bcontainer, iova, iotlb->addr_mask + 1, ret,
- strerror(-ret));
-}
+if (!vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL)) {
+goto out_lock;
  }
+
+ret = vfio_get_dirty_bitmap(bcontainer, iova, iotlb->addr_mask + 1,
+translated_addr);
+if (ret) {
+error_report("vfio_iommu_map_dirty_notify(%p, 0x%"HWADDR_PRIx", "
+ "0x%"HWADDR_PRIx") = %d (%s)",
+ bcontainer, iova, iotlb->addr_mask + 1, ret,
+ strerror(-ret));
+}
+
+out_lock:


s/out_lock/out_unlock?

With the above,

Reviewed-by: Avihai Horon 


  rcu_read_unlock();

  out:
--
2.45.0
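The reversed test turns the nested success path into an early exit that jumps to a single unlock label, which is also why `out_unlock` describes the label better than `out_lock`. A self-contained sketch of the same shape follows; a nesting counter stands in for `rcu_read_lock()`/`rcu_read_unlock()`, and `demo_lookup()`/`demo_do_work()` are hypothetical helpers.

```c
#include <stdbool.h>

/* Stand-in for the RCU read-side critical section: just a nesting counter. */
static int demo_lock_depth;

static void demo_read_lock(void)   { demo_lock_depth++; }
static void demo_read_unlock(void) { demo_lock_depth--; }

/* Hypothetical translation step: fails for odd addresses. */
static bool demo_lookup(unsigned long addr, unsigned long *out)
{
    if (addr & 1) {
        return false;
    }
    *out = addr >> 1;
    return true;
}

/* Hypothetical follow-up work on the translated address. */
static int demo_do_work(unsigned long translated)
{
    return translated == 0 ? -1 : 0;
}

static int demo_notify(unsigned long addr)
{
    unsigned long translated;
    int ret = -1;

    demo_read_lock();
    if (!demo_lookup(addr, &translated)) {
        goto out_unlock;        /* early exit: the main path stays unindented */
    }

    ret = demo_do_work(translated);

out_unlock:
    demo_read_unlock();         /* every path leaves through the same unlock */
    return ret;
}
```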

Re: [PATCH 2/3] hw/arm: Connect OTP device to BCM2835

2024-05-13 Thread Philippe Mathieu-Daudé


Hi Rayhan,

On 10/5/24 16:10, Rayhan Faizel wrote:

Signed-off-by: Rayhan Faizel 
---
  hw/arm/bcm2835_peripherals.c | 13 -
  include/hw/arm/bcm2835_peripherals.h |  3 ++-
  2 files changed, 14 insertions(+), 2 deletions(-)




@@ -500,7 +512,6 @@ void bcm_soc_peripherals_common_realize(DeviceState *dev, 
Error **errp)
  create_unimp(s, &s->i2s, "bcm2835-i2s", I2S_OFFSET, 0x100);
  create_unimp(s, &s->smi, "bcm2835-smi", SMI_OFFSET, 0x100);
  create_unimp(s, &s->bscsl, "bcm2835-spis", BSC_SL_OFFSET, 0x100);
-create_unimp(s, &s->otp, "bcm2835-otp", OTP_OFFSET, 0x80);


Maybe worth noting in the description that the stub previously covered a
range of 0x80 and the device now covers only 0x28, so a range of 0x58 that
used to be I/O now ends up in RAM. Maybe better to keep a region of 0x80 in
the previous patch?

Flatview diff:

(qemu) info mtree -f
FlatView #0
 AS "memory", root: system
 Root memory region: system
  -3f002fff (prio 0, ram): ram
  ...
- 3f20f000-3f20f07f (prio -1000, i/o): bcm2835-otp
- 3f20f080-3f211fff (prio 0, ram): ram @3f20f080
+ 3f20f000-3f20f027 (prio 0, i/o): bcm2835-otp
+ 3f20f028-3f211fff (prio 0, ram): ram @3f20f028

FlatView #3
 Root memory region: bcm2835-gpu
  -3fff (prio 0, ram): ram
  4000-7e002fff (prio 0, ram): ram
  ...
- 7e20f000-7e20f07f (prio -1000, i/o): bcm2835-otp
- 7e20f080-7e211fff (prio 0, ram): ram @3e20f080
+ 7e20f000-7e20f027 (prio 0, i/o): bcm2835-otp
+ 7e20f028-7e211fff (prio 0, ram): ram @3e20f028


  create_unimp(s, &s->dbus, "bcm2835-dbus", DBUS_OFFSET, 0x8000);
  create_unimp(s, &s->ave0, "bcm2835-ave0", AVE0_OFFSET, 0x8000);
  create_unimp(s, &s->v3d, "bcm2835-v3d", V3D_OFFSET, 0x1000);
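The sizes in the FlatView diff can be checked with a little arithmetic: an inclusive region end is `base + size - 1`, so the stub covered 0x3f20f000-0x3f20f07f and the new device covers 0x3f20f000-0x3f20f027, leaving 0x80 - 0x28 = 0x58 bytes that now resolve to RAM. A tiny helper reproducing these numbers (the name `region_last()` is ours, for illustration):

```c
#include <stdint.h>

/* Inclusive last address of a region, as printed by "info mtree". */
static uint64_t region_last(uint64_t base, uint64_t size)
{
    return base + size - 1;
}
```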

Re: [PATCH v5 05/10] vfio: Add Error** argument to .vfio_save_config() handler

2024-05-13 Thread Avihai Horon




On 06/05/2024 12:20, Cédric Le Goater wrote:



Use vmstate_save_state_with_err() to improve error reporting in the
callers and store a reported error under the migration stream. Add
documentation while at it.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Cédric Le Goater 
---
  include/hw/vfio/vfio-common.h | 25 -
  hw/vfio/migration.c   | 18 --
  hw/vfio/pci.c |  5 +++--
  3 files changed, 39 insertions(+), 9 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 
b9da6c08ef41174610eb92726c590309a53696a3..46f88493634b5634a9c14a5caa33a463fbf2c50d
 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -133,7 +133,30 @@ struct VFIODeviceOps {
  int (*vfio_hot_reset_multi)(VFIODevice *vdev);
  void (*vfio_eoi)(VFIODevice *vdev);
  Object *(*vfio_get_object)(VFIODevice *vdev);
-void (*vfio_save_config)(VFIODevice *vdev, QEMUFile *f);
+
+/**
+ * @vfio_save_config
+ *
+ * Save device config state
+ *
+ * @vdev: #VFIODevice for which to save the config
+ * @f: #QEMUFile where to send the data
+ * @errp: pointer to Error*, to store an error if it happens.
+ *
+ * Returns zero to indicate success and negative for error
+ */
+int (*vfio_save_config)(VFIODevice *vdev, QEMUFile *f, Error **errp);
+
+/**
+ * @vfio_load_config
+ *
+ * Load device config state
+ *
+ * @vdev: #VFIODevice for which to load the config
+ * @f: #QEMUFile where to get the data
+ *
+ * Returns zero to indicate success and negative for error
+ */
  int (*vfio_load_config)(VFIODevice *vdev, QEMUFile *f);
  };

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 
9b6375c949f7a8dca857ead2506855f63fa051e4..87437490bd50321b3eb27770c932078597053746
 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -189,14 +189,19 @@ static int vfio_load_buffer(QEMUFile *f, VFIODevice 
*vbasedev,
  return ret;
  }

-static int vfio_save_device_config_state(QEMUFile *f, void *opaque)
+static int vfio_save_device_config_state(QEMUFile *f, void *opaque,
+ Error **errp)
  {
  VFIODevice *vbasedev = opaque;
+int ret;

  qemu_put_be64(f, VFIO_MIG_FLAG_DEV_CONFIG_STATE);

  if (vbasedev->ops && vbasedev->ops->vfio_save_config) {
-vbasedev->ops->vfio_save_config(vbasedev, f);
+ret = vbasedev->ops->vfio_save_config(vbasedev, f, errp);
+if (ret) {
+return ret;
+}
  }

  qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);


Below we have:

return qemu_file_get_error(f);

Need to set errp in case of error.
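A minimal sketch of what is being asked for: when the final stream check fails, fill `errp` before returning, so callers always get a message along with the negative error code. `DemoError`, `demo_error_setg_errno()`, `demo_file_get_error()` and `demo_save_config_tail()` are simplified, hypothetical stand-ins for QEMU's `error_setg_errno()`, `qemu_file_get_error()` and the tail of `vfio_save_device_config_state()`.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct DemoError {
    char msg[256];
} DemoError;

/* Stand-in for error_setg_errno(): append strerror(os_errno) to the message. */
static void demo_error_setg_errno(DemoError **errp, int os_errno,
                                  const char *what)
{
    if (!errp) {
        return;
    }
    *errp = calloc(1, sizeof(DemoError));
    if (*errp) {
        snprintf((*errp)->msg, sizeof((*errp)->msg), "%s: %s",
                 what, strerror(os_errno));
    }
}

/* Stand-in for qemu_file_get_error(): 0 or a negative errno value. */
static int demo_file_get_error(int simulated_error)
{
    return simulated_error;
}

static int demo_save_config_tail(int simulated_error, DemoError **errp)
{
    int ret = demo_file_get_error(simulated_error);

    if (ret) {
        /* Report the failure instead of returning a bare error code. */
        demo_error_setg_errno(errp, -ret, "Failed to write config state");
    }
    return ret;
}
```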


@@ -588,13 +593,14 @@ static int vfio_save_complete_precopy(QEMUFile *f, void 
*opaque)
  static void vfio_save_state(QEMUFile *f, void *opaque)
  {
  VFIODevice *vbasedev = opaque;
+Error *local_err = NULL;
  int ret;

-ret = vfio_save_device_config_state(f, opaque);
+ret = vfio_save_device_config_state(f, opaque, &local_err);
  if (ret) {
-error_report("%s: Failed to save device config space",
- vbasedev->name);
-qemu_file_set_error(f, ret);
+error_prepend(&local_err, "%s: Failed to save device config space",


Add " - " ("... device config space - "), like in the other patches?

Thanks.


+  vbasedev->name);
+qemu_file_set_error_obj(f, ret, local_err);
  }
  }

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 
64780d1b793345c8e8996fe6b7987059ce831c11..fc6e54e871508bb0e2a3ac9079a195c086531f21
 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2586,11 +2586,12 @@ static const VMStateDescription vmstate_vfio_pci_config 
= {
  }
  };

-static void vfio_pci_save_config(VFIODevice *vbasedev, QEMUFile *f)
+static int vfio_pci_save_config(VFIODevice *vbasedev, QEMUFile *f, Error 
**errp)
  {
  VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);

-vmstate_save_state(f, &vmstate_vfio_pci_config, vdev, NULL);
+return vmstate_save_state_with_err(f, &vmstate_vfio_pci_config, vdev, NULL,
+   errp);
  }

  static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
--
2.45.0

Re: [PATCH 1/3] hw/nvram: Add BCM2835 OTP device

2024-05-13 Thread Philippe Mathieu-Daudé


Hi Rayhan,

On 10/5/24 16:10, Rayhan Faizel wrote:

The OTP device registers are currently stubbed. For now, the device
houses the OTP rows which will be accessed directly by other peripherals.

Signed-off-by: Rayhan Faizel 
---
  hw/nvram/bcm2835_otp.c | 187 +
  hw/nvram/meson.build   |   1 +
  include/hw/nvram/bcm2835_otp.h |  43 
  3 files changed, 231 insertions(+)
  create mode 100644 hw/nvram/bcm2835_otp.c
  create mode 100644 include/hw/nvram/bcm2835_otp.h




+static void bcm2835_otp_write(void *opaque, hwaddr addr,
+  uint64_t value, unsigned int size)
+{
+switch (addr) {
+case BCM2835_OTP_BOOTMODE_REG:
+qemu_log_mask(LOG_UNIMP,
+  "bcm2835_otp: BCM2835_OTP_BOOTMODE_REG\n");
+break;
+case BCM2835_OTP_CONFIG_REG:
+qemu_log_mask(LOG_UNIMP,
+  "bcm2835_otp: BCM2835_OTP_CONFIG_REG\n");
+break;
+case BCM2835_OTP_CTRL_LO_REG:
+qemu_log_mask(LOG_UNIMP,
+  "bcm2835_otp: BCM2835_OTP_CTRL_LO_REG\n");
+break;
+case BCM2835_OTP_CTRL_HI_REG:
+qemu_log_mask(LOG_UNIMP,
+  "bcm2835_otp: BCM2835_OTP_CTRL_HI_REG\n");
+break;
+case BCM2835_OTP_STATUS_REG:
+qemu_log_mask(LOG_UNIMP,
+  "bcm2835_otp: BCM2835_OTP_STATUS_REG\n");
+break;
+case BCM2835_OTP_BITSEL_REG:
+qemu_log_mask(LOG_UNIMP,
+  "bcm2835_otp: BCM2835_OTP_BITSEL_REG\n");
+break;
+case BCM2835_OTP_DATA_REG:
+qemu_log_mask(LOG_UNIMP,
+  "bcm2835_otp: BCM2835_OTP_DATA_REG\n");
+break;
+case BCM2835_OTP_ADDR_REG:
+qemu_log_mask(LOG_UNIMP,
+  "bcm2835_otp: BCM2835_OTP_ADDR_REG\n");
+break;
+case BCM2835_OTP_WRITE_DATA_READ_REG:
+qemu_log_mask(LOG_UNIMP,
+  "bcm2835_otp: BCM2835_OTP_WRITE_DATA_READ_REG\n");
+break;
+case BCM2835_OTP_INIT_STATUS_REG:
+qemu_log_mask(LOG_UNIMP,
+  "bcm2835_otp: BCM2835_OTP_INIT_STATUS_REG\n");
+break;
+default:
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s: Bad offset 0x%" HWADDR_PRIx "\n", __func__, addr);
+}
+}
+
+static const MemoryRegionOps bcm2835_otp_ops = {
+.read = bcm2835_otp_read,
+.write = bcm2835_otp_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
+.valid = {


s/valid/impl/ here, this is your implementation. It isn't illegal to
access these registers with a non 32-bit size.


+.min_access_size = 4,
+.max_access_size = 4,
+},
+};




+/* https://elinux.org/BCM2835_registers#OTP */
+#define BCM2835_OTP_BOOTMODE_REG0x00
+#define BCM2835_OTP_CONFIG_REG  0x04
+#define BCM2835_OTP_CTRL_LO_REG 0x08
+#define BCM2835_OTP_CTRL_HI_REG 0x0c
+#define BCM2835_OTP_STATUS_REG  0x10
+#define BCM2835_OTP_BITSEL_REG  0x14
+#define BCM2835_OTP_DATA_REG0x18
+#define BCM2835_OTP_ADDR_REG0x1c
+#define BCM2835_OTP_WRITE_DATA_READ_REG 0x20
+#define BCM2835_OTP_INIT_STATUS_REG 0x24

[Bug 2065579] Re: [UBUNTU 22.04] OS guest boot issues on 9p filesystem

2024-05-13 Thread Sergio Durigan Junior

Thank you for the report.

Given that this is an upstream regression and there is a related
upstream bug about it, I believe it's best to wait for their
input/feedback before moving forward.

** Also affects: qemu
   Importance: Undecided
   Status: New

** No longer affects: qemu

** Changed in: qemu (Ubuntu)
 Assignee: (unassigned) => Sergio Durigan Junior (sergiodj)

-- 
You received this bug notification because you are a member of qemu-devel-ml,
which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2065579

Title:
  [UBUNTU 22.04] OS guest boot issues on 9p filesystem

Status in Ubuntu on IBM z Systems:
  New
Status in qemu package in Ubuntu:
  New

Bug description:
  === Reported by  - 2024-05-13 03:53:01 ===

  ---Problem Description---
  OS guest boot issues on 9p filesystem due to unix domain sockets open failure
   
  Contact Information = d.herrendoer...@de.ibm.com 
   
  Machine Type = 3931-7G4 
   
  ---uname output---
  5.15.0-92-generic #102-Ubuntu SMP Wed Jan 10 09:35:24 UTC 2024 s390x s390x s390x GNU/Linux
   
  ---Steps to Reproduce---
   #!/bin/bash

  # Cleanup target dir
  [ -d ./target ] && rm -rf target
  mkdir target

  # Add configuration updates
  mkdir -p ./target/etc/initramfs-tools/
  echo 9p >> ./target/etc/initramfs-tools/modules
  echo 9pnet_virtio >> ./target/etc/initramfs-tools/modules

  # Add the test script
  cat > ./target/test_init << EOF
  #!/bin/bash

  echo "Test for unix domain sockets"

  nc -Ul /socket &
  sleep 1
  echo "Sockets work" | nc -UN /socket || echo "Sockets fail"

  echo o > /proc/sysrq-trigger
  sleep 999
  EOF
  chmod 700 ./target/test_init

  # Create an Ubuntu 23.10 around it
  echo "Creating Ubuntu target OS"
  debootstrap --variant=minbase \
  --include=udev,kmod,initramfs-tools,systemd,netcat-openbsd,linux-image-generic \
  --exclude=man,bash-completion \
  mantic ./target > /dev/null || exit 1

  # Run the test in 9p forwarded filesystem
  echo "Running OS in qemu"
  qemu-system-s390x \
-m 8192 \
-smp 4 \
-nodefaults -nographic -no-reboot -no-user-config \
-kernel ./target/boot/vmlinuz \
-initrd ./target/boot/initrd.img \
-append 'root=fsRoot rw rootfstype=9p rootflags=trans=virtio,version=9p2000.L,msize=512000,cache=mmap,posixacl console=ttysclp0 init=/test_init quiet' \
-fsdev local,security_model=passthrough,multidevs=remap,id=fsdev-fsRoot,path=./target \
-device virtio-9p-pci,id=fsRoot,fsdev=fsdev-fsRoot,mount_tag=fsRoot \
-device virtio-serial-ccw -device sclpconsole,chardev=console \
-chardev stdio,id=console,signal=off 

   
  ---Debugger---
  A debugger is not configured

  Userspace rpm: qemu-(current).deb 
   
  Userspace tool common name: qemu 

  Userspace tool obtained from project website:  na 
   
  The userspace tool has the following bit modes: both 
   
  *Additional Instructions for d.herrendoer...@de.ibm.com:
  -Attach ltrace and strace of userspace application.


Re: [PATCH 1/2] copy-before-write: allow specifying minimum cluster size

2024-05-13 Thread Fiona Ebner

Am 26.03.24 um 10:06 schrieb Markus Armbruster:
>> @@ -365,7 +368,13 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, 
>> BdrvChild *target,
>>  
>>  GLOBAL_STATE_CODE();
>>  
>> -cluster_size = block_copy_calculate_cluster_size(target->bs, errp);
>> +if (min_cluster_size && !is_power_of_2(min_cluster_size)) {
> 
> min_cluster_size is int64_t, is_power_of_2() takes uint64_t.  Bad if
> min_cluster_size is negative.  Could this happen?
> 

No, because it comes in as a uint32_t via the QAPI (the internal caller
added by patch 2/2 from the backup code also gets the value via QAPI and
there uint32_t is used too).
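Markus's question can be made concrete. Converting a negative `int64_t` to `uint64_t` wraps modulo 2^64, and one negative value, `INT64_MIN`, becomes 0x8000000000000000, which is a power of two. The helper below uses the common `x && !(x & (x - 1))` test (assumed here to match the semantics of QEMU's `is_power_of_2()`), so a bare power-of-two guard would accept that value if the field could ever be negative; as Fiona notes, the QAPI `uint32` type rules this out. `demo_min_cluster_size_ok()` is a hypothetical stricter guard.

```c
#include <stdbool.h>
#include <stdint.h>

/* Classic power-of-two test on an unsigned 64-bit value. */
static bool demo_is_power_of_2(uint64_t value)
{
    return value != 0 && (value & (value - 1)) == 0;
}

/* Hypothetical guard that stays correct even for a signed input. */
static bool demo_min_cluster_size_ok(int64_t min_cluster_size)
{
    return min_cluster_size == 0 ||
           (min_cluster_size > 0 &&
            demo_is_power_of_2((uint64_t)min_cluster_size));
}
```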

---snip---

>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index 0a72c590a8..85c8f88f6e 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -4625,12 +4625,18 @@
>>  # @on-cbw-error parameter will decide how this failure is handled.
>>  # Default 0. (Since 7.1)
>>  #
>> +# @min-cluster-size: Minimum size of blocks used by copy-before-write
>> +# operations.  Has to be a power of 2.  No effect if smaller than
>> +# the maximum of the target's cluster size and 64 KiB.  Default 0.
>> +# (Since 9.0)
>> +#
>>  # Since: 6.2
>>  ##
>>  { 'struct': 'BlockdevOptionsCbw',
>>'base': 'BlockdevOptionsGenericFormat',
>>'data': { 'target': 'BlockdevRef', '*bitmap': 'BlockDirtyBitmap',
>> -'*on-cbw-error': 'OnCbwError', '*cbw-timeout': 'uint32' } }
>> +'*on-cbw-error': 'OnCbwError', '*cbw-timeout': 'uint32',
>> +'*min-cluster-size': 'uint32' } }
> 
> Elsewhere in the schema, we use either 'int' or 'size' for cluster-size.
> Why the difference?
> 

The motivation was to disallow negative values up front and have it work
with block_copy_calculate_cluster_size(), whose result is an int64_t. If
I go with 'int', I'll have to add a check to disallow negative values.
If I go with 'size', I'll have to add a check to disallow values that are
too large.

Which approach should I go with?
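For comparison, the two extra checks mentioned above could look like the sketch below. QAPI 'int' maps to `int64_t` in C, so negatives must be refused; 'size' maps to `uint64_t`, so values that do not fit in the `int64_t` returned by `block_copy_calculate_cluster_size()` must be refused. The function names are hypothetical and error reporting is reduced to a boolean.

```c
#include <stdbool.h>
#include <stdint.h>

/* Variant 1: schema type 'int' (int64_t), so negatives must be refused. */
static bool demo_check_int(int64_t min_cluster_size)
{
    return min_cluster_size >= 0;
}

/* Variant 2: schema type 'size' (uint64_t), so values above INT64_MAX must be refused. */
static bool demo_check_size(uint64_t min_cluster_size)
{
    return min_cluster_size <= (uint64_t)INT64_MAX;
}
```

With 'uint32', as in the patch, neither check is needed because every uint32 value fits in an `int64_t`; the power-of-two check is still required in all three cases.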

Best Regards,
Fiona

Re: [PATCH v2 15/45] target/hppa: Use umax in do_ibranch_priv

2024-05-13 Thread Richard Henderson


On 5/13/24 13:18, Philippe Mathieu-Daudé wrote:

Hi Richard,

On 13/5/24 09:46, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---
  target/hppa/translate.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index ae66068123..22935f4645 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -1981,7 +1981,7 @@ static TCGv_i64 do_ibranch_priv(DisasContext *ctx, 
TCGv_i64 offset)
  dest = tcg_temp_new_i64();
  tcg_gen_andi_i64(dest, offset, -4);
  tcg_gen_ori_i64(dest, dest, ctx->privilege);
-    tcg_gen_movcond_i64(TCG_COND_GTU, dest, dest, offset, dest, offset);
+    tcg_gen_umax_i64(dest, dest, offset);


Isn't tcg_gen_umax_i64(dest, dest, offset) equal to:

     tcg_gen_movcond_i64(TCG_COND_GEU, dest, dest, offset, dest, offset);

?


Yes, but I think it is clearer to use max.
At some point we might add min/max opcodes to tcg too.


r~
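The equivalence discussed above is easy to check in plain C. With `movcond` selecting `dest` when the condition on (dest, offset) holds and `offset` otherwise, GTU and GEU differ only when the operands are equal, and then both choices yield the same value, so either form computes the unsigned maximum. The functions below model the TCG ops on host integers; they are illustrations, not TCG APIs.

```c
#include <stdint.h>

/* movcond(c1 = a, c2 = b, v1 = a, v2 = b) with TCG_COND_GTU. */
static uint64_t movcond_gtu(uint64_t a, uint64_t b)
{
    return a > b ? a : b;
}

/* Same with TCG_COND_GEU. */
static uint64_t movcond_geu(uint64_t a, uint64_t b)
{
    return a >= b ? a : b;
}

/* Reference unsigned maximum, i.e. the value tcg_gen_umax_i64() produces. */
static uint64_t umax64(uint64_t a, uint64_t b)
{
    return a < b ? b : a;
}

/* do_ibranch_priv-style computation: (offset & -4) | priv, then umax with offset. */
static uint64_t demo_ibranch_priv(uint64_t offset, uint64_t priv)
{
    uint64_t dest = (offset & ~UINT64_C(3)) | priv;

    return umax64(dest, offset);
}
```

Under this model, the low two bits of the result never end up lower (more privileged) than `priv`, which is presumably the point of taking the maximum.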

Re: [PATCH v5 04/10] vfio: Use new Error** argument in vfio_save_setup()

2024-05-13 Thread Avihai Horon




On 06/05/2024 12:20, Cédric Le Goater wrote:



Nit: change commit title prefix to vfio/migration (also in other patches 
that are closely related to vfio migration)


Plus, maybe change subject to "Add an Error** argument to 
vfio_migration_set_state() and adjust callers" as it's the main subject 
of the patch?




> Add an Error** argument to vfio_migration_set_state() and adjust
> callers, including vfio_save_setup(). The error will be propagated up
> to qemu_savevm_state_setup() where the save_setup() handler is
> executed.
>
> Modify vfio_vmstate_change_prepare() and vfio_vmstate_change() to
> store a reported error under the migration stream if a migration is in
> progress.
>
> Signed-off-by: Cédric Le Goater 
> ---
>
>   Changes in v5:
>
>   - Replaced error_setg() by error_setg_errno() in vfio_migration_set_state()
>   - Rebased on 20c64c8a51a4 ("migration: migration_file_set_error")
>
>   hw/vfio/migration.c | 76 +
>   1 file changed, 43 insertions(+), 33 deletions(-)
>
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index bf2fd0759ba6e4fb103cc5c1a43edb180a3d0de4..9b6375c949f7a8dca857ead2506855f63fa051e4 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -82,7 +82,8 @@ static const char *mig_state_to_str(enum vfio_device_mig_state state)
>
>   static int vfio_migration_set_state(VFIODevice *vbasedev,
>   enum vfio_device_mig_state new_state,
> -enum vfio_device_mig_state recover_state)
> +enum vfio_device_mig_state recover_state,
> +Error **errp)
>   {
>   VFIOMigration *migration = vbasedev->migration;
>   uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
> @@ -102,25 +103,26 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
>   ret = -errno;
>
>   if (recover_state == VFIO_DEVICE_STATE_ERROR) {
> -error_report("%s: Failed setting device state to %s, err: %s. "
> - "Recover state is ERROR. Resetting device",
> - vbasedev->name, mig_state_to_str(new_state),
> - strerror(errno));
> +error_setg_errno(errp, errno,
> + "%s: Failed setting device state to %s. "
> + "Recover state is ERROR. Resetting device",
> + vbasedev->name, mig_state_to_str(new_state));
>
>   goto reset_device;
>   }
>
> -error_report(
> -"%s: Failed setting device state to %s, err: %s. Setting device in recover state %s",
> - vbasedev->name, mig_state_to_str(new_state),
> - strerror(errno), mig_state_to_str(recover_state));
> +error_setg_errno(errp, errno,
> + "%s: Failed setting device state to %s. "
> + "Setting device in recover state %s",
> + vbasedev->name, mig_state_to_str(new_state),
> + mig_state_to_str(recover_state));
>
>   mig_state->device_state = recover_state;
>   if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
>   ret = -errno;
> -error_report(
> -"%s: Failed setting device in recover state, err: %s. Resetting device",
> - vbasedev->name, strerror(errno));
> +error_setg_errno(errp, errno,
> + "%s: Failed setting device in recover state. "
> + "Resetting device", vbasedev->name);
>

Here we set errp again when it's already set.
Maybe in this case just:

error_report_err(*errp);
*errp = NULL;
error_setg_errno(errp, errno,
 "%s: Failed setting device in recover state. "
 "Resetting device", vbasedev->name);
?
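
A self-contained model of the issue: QEMU's error_setg() family asserts that *errp is still NULL, so the first error must be consumed before a second one is set. The sketch below uses a hypothetical MiniError type and helpers rather than QEMU's Error API; only the ordering (report and free the first error, clear the pointer, then set the new one) mirrors the suggestion above. It also skips the dereference when errp is NULL, a case the snippet above would need to handle if any caller passes NULL.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Error object, for illustration only. */
typedef struct MiniError {
    char msg[256];
} MiniError;

/* Like error_setg(): the out-parameter must still be empty when set. */
static void mini_error_set(MiniError **errp, const char *msg)
{
    if (!errp) {
        return;                  /* caller does not want the error */
    }
    assert(*errp == NULL);       /* overwriting a set error is a bug */
    *errp = calloc(1, sizeof(**errp));
    if (!*errp) {
        abort();
    }
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
}

/* Like error_report_err(): print the error, then free it. */
static void mini_error_report_free(MiniError *err)
{
    fprintf(stderr, "%s\n", err->msg);
    free(err);
}

/* Report and drop a pending error, then set the new one. */
static void mini_error_replace(MiniError **errp, const char *msg)
{
    if (errp && *errp) {
        mini_error_report_free(*errp);
        *errp = NULL;
    }
    mini_error_set(errp, msg);
}
```

With this ordering the first failure is still visible in the log, and the caller receives only the most recent error.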



>   goto reset_device;
>   }
> @@ -137,7 +139,7 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
>    * This can happen if the device is asynchronously reset and
>    * terminates a data transfer.
>    */
> -error_report("%s: data_fd out of sync", vbasedev->name);
> +error_setg(errp, "%s: data_fd out of sync", vbasedev->name);
>   close(mig_state->data_fd);
>
>   return -EBADF;
> @@ -168,10 +170,11 @@ reset_device:
>    */
>   static int
>   vfio_migration_set_state_or_reset(VFIODevice *vbasedev,
> -  enum vfio_device_mig_state new_state)
> +  enum vfio_device_mig_state new_state,
> +  Error **errp)
>   {
>   return vfio_migration_set_state(vbasedev, new_state,
> -VFIO_DEVICE_STATE_ERROR);
> +VFIO_DEVICE_STATE_ERROR, errp);
>   }
>
>   static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
> @@ -399,10 +402,8 @@ static int
