date:20221003

Re: [PATCH v2] target/sh4: Fix TB_FLAG_UNALIGN

2022-10-03 Thread Yoshinori Sato

On Mon, 03 Oct 2022 02:23:51 +0900,
Richard Henderson wrote:
> 
> Ping, or should I create a PR myself?
> 
> r~

Sorry.
I can't work this week, so please submit a PR.

> 
> On 9/1/22 07:15, Yoshinori Sato wrote:
> > On Thu, 01 Sep 2022 19:15:09 +0900,
> > Richard Henderson wrote:
> >> 
> >> The value previously chosen overlaps GUSA_MASK.
> >> 
> >> Rename all DELAY_SLOT_* and GUSA_* defines to emphasize
> >> that they are included in TB_FLAGs.  Add aliases for the
> >> FPSCR and SR bits that are included in TB_FLAGS, so that
> >> we don't accidentally reassign those bits.
> >> 
> >> Fixes: 4da06fb3062 ("target/sh4: Implement prctl_unalign_sigbus")
> >> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/856
> >> Signed-off-by: Richard Henderson 
> >> ---
> >>   target/sh4/cpu.h| 56 +
> >>   linux-user/sh4/signal.c |  6 +--
> >>   target/sh4/cpu.c|  6 +--
> >>   target/sh4/helper.c |  6 +--
> >>   target/sh4/translate.c  | 90 ++---
> >>   5 files changed, 88 insertions(+), 76 deletions(-)
> >> 
> >> diff --git a/target/sh4/cpu.h b/target/sh4/cpu.h
> >> index 9f15ef913c..727b829598 100644
> >> --- a/target/sh4/cpu.h
> >> +++ b/target/sh4/cpu.h
> >> @@ -78,26 +78,33 @@
> >>   #define FPSCR_RM_NEAREST   (0 << 0)
> >>   #define FPSCR_RM_ZERO  (1 << 0)
> >>   -#define DELAY_SLOT_MASK0x7
> >> -#define DELAY_SLOT (1 << 0)
> >> -#define DELAY_SLOT_CONDITIONAL (1 << 1)
> >> -#define DELAY_SLOT_RTE (1 << 2)
> >> +#define TB_FLAG_DELAY_SLOT   (1 << 0)
> >> +#define TB_FLAG_DELAY_SLOT_COND  (1 << 1)
> >> +#define TB_FLAG_DELAY_SLOT_RTE   (1 << 2)
> >> +#define TB_FLAG_PENDING_MOVCA(1 << 3)
> >> +#define TB_FLAG_GUSA_SHIFT   4  /* [11:4] */
> >> +#define TB_FLAG_GUSA_EXCLUSIVE   (1 << 12)
> >> +#define TB_FLAG_UNALIGN  (1 << 13)
> >> +#define TB_FLAG_SR_FD(1 << SR_FD)   /* 15 */
> >> +#define TB_FLAG_FPSCR_PR FPSCR_PR   /* 19 */
> >> +#define TB_FLAG_FPSCR_SZ FPSCR_SZ   /* 20 */
> >> +#define TB_FLAG_FPSCR_FR FPSCR_FR   /* 21 */
> >> +#define TB_FLAG_SR_RB(1 << SR_RB)   /* 29 */
> >> +#define TB_FLAG_SR_MD(1 << SR_MD)   /* 30 */
> >>   -#define TB_FLAG_PENDING_MOVCA  (1 << 3)
> >> -#define TB_FLAG_UNALIGN(1 << 4)
> >> -
> >> -#define GUSA_SHIFT 4
> >> -#ifdef CONFIG_USER_ONLY
> >> -#define GUSA_EXCLUSIVE (1 << 12)
> >> -#define GUSA_MASK  ((0xff << GUSA_SHIFT) | GUSA_EXCLUSIVE)
> >> -#else
> >> -/* Provide dummy versions of the above to allow tests against tbflags
> >> -   to be elided while avoiding ifdefs.  */
> >> -#define GUSA_EXCLUSIVE 0
> >> -#define GUSA_MASK  0
> >> -#endif
> >> -
> >> -#define TB_FLAG_ENVFLAGS_MASK  (DELAY_SLOT_MASK | GUSA_MASK)
> >> +#define TB_FLAG_DELAY_SLOT_MASK  (TB_FLAG_DELAY_SLOT |   \
> >> +  TB_FLAG_DELAY_SLOT_COND |  \
> >> +  TB_FLAG_DELAY_SLOT_RTE)
> >> +#define TB_FLAG_GUSA_MASK((0xff << TB_FLAG_GUSA_SHIFT) | \
> >> +  TB_FLAG_GUSA_EXCLUSIVE)
> >> +#define TB_FLAG_FPSCR_MASK   (TB_FLAG_FPSCR_PR | \
> >> +  TB_FLAG_FPSCR_SZ | \
> >> +  TB_FLAG_FPSCR_FR)
> >> +#define TB_FLAG_SR_MASK  (TB_FLAG_SR_FD | \
> >> +  TB_FLAG_SR_RB | \
> >> +  TB_FLAG_SR_MD)
> >> +#define TB_FLAG_ENVFLAGS_MASK(TB_FLAG_DELAY_SLOT_MASK | \
> >> +  TB_FLAG_GUSA_MASK)
> >> typedef struct tlb_t {
> >>   uint32_t vpn;/* virtual page number */
> >> @@ -258,7 +265,7 @@ static inline int cpu_mmu_index (CPUSH4State *env, 
> >> bool ifetch)
> >>   {
> >>   /* The instruction in a RTE delay slot is fetched in privileged
> >>  mode, but executed in user mode.  */
> >> -if (ifetch && (env->flags & DELAY_SLOT_RTE)) {
> >> +if (ifetch && (env->flags & TB_FLAG_DELAY_SLOT_RTE)) {
> >>   return 0;
> >>   } else {
> >>   return (env->sr & (1u << SR_MD)) == 0 ? 1 : 0;
> >> @@ -366,11 +373,10 @@ static inline void cpu_get_tb_cpu_state(CPUSH4State 
> >> *env, target_ulong *pc,
> >>   {
> >>   *pc = env->pc;
> >>   /* For a gUSA region, notice the end of the region.  */
> >> -*cs_base = env->flags & GUSA_MASK ? env->gregs[0] : 0;
> >> -*flags = env->flags /* TB_FLAG_ENVFLAGS_MASK: bits 0-2, 4-12 */
> >> -| (env->fpscr & (FPSCR_FR | FPSCR_SZ | FPSCR_PR))  /* Bits 
> >> 19-21 */
> >> -| (env->sr & ((1u << SR_MD) | (1u << SR_RB)))  /* Bits 
> >> 29-30 */
> >> -| (env->sr & (1u << SR_FD))/* Bit 15 
> >> */
> >> +*cs_base = env->flags & TB_FLAG_GUSA_MASK ? env->gregs[0] :

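The overlap described in the commit message can be checked mechanically. What follows is a minimal sketch that uses only the values visible in the quoted diff; the `OLD_`/`NEW_` prefixes and the helper `masks_overlap` are made up for illustration and are not part of QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the removed (-) lines of the quoted diff. */
#define OLD_TB_FLAG_UNALIGN  (1u << 4)
#define OLD_GUSA_SHIFT       4
#define OLD_GUSA_EXCLUSIVE   (1u << 12)
#define OLD_GUSA_MASK        ((0xffu << OLD_GUSA_SHIFT) | OLD_GUSA_EXCLUSIVE)

/* Values copied from the added (+) lines of the quoted diff. */
#define NEW_TB_FLAG_UNALIGN  (1u << 13)
#define NEW_GUSA_MASK        ((0xffu << 4) | (1u << 12))

/* Hypothetical helper: true if the two flag masks share at least one bit. */
static bool masks_overlap(uint32_t a, uint32_t b)
{
    return (a & b) != 0;
}
```

With the old definitions, bit 4 is both TB_FLAG_UNALIGN and the lowest gUSA bit, so `masks_overlap()` reports a clash; moving the flag to bit 13 places it just above the gUSA field and removes it.
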
Re: [PATCH 1/1] qxl: add subsystem_vendor_id property

2022-10-03 Thread Denis V. Lunev

On 9/29/22 09:37, Gerd Hoffmann wrote:

On Wed, Sep 28, 2022 at 05:52:44PM +0200, Denis V. Lunev wrote:

This property is needed for WHQL/inboxing of Windows drivers. Drivers
need to be separated by hypervisor vendor, and that should be done
through the PCI subsystem vendor ID.

This patch adds PCI subsystem vendor ID to QXL device to match that
convention.

We have pci_default_sub_vendor_id + pci_default_sub_device_id in
hw/pci/pci.c. If you want another subsystem id for another vendor
there is a single place to change it for all devices.

Right now there is no runtime switch for them, so updating it requires
a two-liner patch for your vendor build. We can discuss changing that,
but that should best be coordinated with libvirt folks to make sure
the management stack actually allows setting the subsystem id without
needing hacks.

Yes. There is no runtime switch for it. I have also checked this.

The story here seems more complex. We are using in our
downstream the following patch from Ben Warren

https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg02128.html

and I mistakenly thought that it had been accepted into
mainline. Unfortunately, that did not happen.
As pointed out in the above thread,
the discussion moved to

https://patchwork.kernel.org/project/qemu-devel/patch/20171102133115.19195-1-lpro...@redhat.com/

Anyway, we need to support different PCI sub-vendor IDs
in order to be compliant with Microsoft WHQL rules. Though,
actually, in my opinion this requirement has nothing to do
with the libvirt people. The most convenient way
here would be to specify these properties within vendor
machine types, and that place is a perfect match, as any
respectable vendor has its own machine type.

I also think that the PCI level is not a good place for that:
we would not be able to apply this change blindly, because at
the PCI level it would be too global. The same
was initially noted by Michael Tsirkin here

https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg04384.html

Any thoughts?
What should we do with the original patch from Ben? We
still need an ability to expose vendor identity in QXL/virtio...

Den
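
For reference, the subsystem vendor ID and subsystem ID discussed here are two little-endian 16-bit fields in the type 0 PCI configuration header, at offsets 0x2C and 0x2E. The sketch below shows what overriding them amounts to, using a plain byte array in place of a real device's config space; `put_le16` and `set_subsystem_ids` are hypothetical names, not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_SUBSYSTEM_VENDOR_ID 0x2c  /* 16 bits, type 0 header */
#define PCI_SUBSYSTEM_ID        0x2e  /* 16 bits, type 0 header */

/* Store a 16-bit value in little-endian byte order. */
static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

/* Hypothetical helper: write both subsystem fields into a config space copy. */
static void set_subsystem_ids(uint8_t *config, uint16_t sub_vendor,
                              uint16_t sub_device)
{
    put_le16(config + PCI_SUBSYSTEM_VENDOR_ID, sub_vendor);
    put_le16(config + PCI_SUBSYSTEM_ID, sub_device);
}
```

Whether the values come from a global default, a per-device property, or a machine type is exactly the policy question raised in this thread; the sketch only shows where they end up.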

[PATCH] docs/nuvoton: Update URL for images

2022-10-03 Thread Joel Stanley

openpower.xyz was retired some time ago. The OpenBMC Jenkins is where
images can be found these days.

Signed-off-by: Joel Stanley 
---
 docs/system/arm/nuvoton.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/system/arm/nuvoton.rst b/docs/system/arm/nuvoton.rst
index ef2792076aa8..c38df32bde07 100644
--- a/docs/system/arm/nuvoton.rst
+++ b/docs/system/arm/nuvoton.rst
@@ -82,9 +82,9 @@ Boot options
 
 The Nuvoton machines can boot from an OpenBMC firmware image, or directly into
 a kernel using the ``-kernel`` option. OpenBMC images for ``quanta-gsj`` and
-possibly others can be downloaded from the OpenPOWER jenkins :
+possibly others can be downloaded from the OpenBMC jenkins :
 
-   https://openpower.xyz/
+   https://jenkins.openbmc.org/
 
 The firmware image should be attached as an MTD drive. Example :
 
-- 
2.35.1

Re: [PATCH v2 2/3] target/arm: Use ARMGranuleSize in ARMVAParameters

2022-10-03 Thread Richard Henderson


On 10/3/22 09:23, Peter Maydell wrote:

Now we have an enum for the granule size, use it in the
ARMVAParameters struct instead of the using16k/using64k bools.

Signed-off-by: Peter Maydell
---
  target/arm/internals.h | 23 +--
  target/arm/helper.c| 39 ---
  target/arm/ptw.c   |  8 +---
  3 files changed, 50 insertions(+), 20 deletions(-)


Reviewed-by: Richard Henderson 

r~

Re: x86, pflash, unassigned memory access

2022-10-03 Thread Alexey Kardashevskiy


Anyone, ping?

On 27/09/2022 12:35, Alexey Kardashevskiy wrote:

Hi!

I am trying qemu-system-x86_64 with OVMF with the q35 machine, the 
complete command line is below.


It works fine (including SEV on AMD EPYC), but these 2 parameters make 
me wonder if I am missing something:


-drive 
if=pflash,format=raw,unit=0,file=/home/aik/OVMF_CODE.fd,readonly=on,id=MYPF \

-d guest_errors

With this, I see a bunch of
===
Invalid access at addr 0xFFC00000, size 1, region '(null)', reason: rejected
Invalid access at addr 0xFFC00001, size 1, region '(null)', reason: rejected
Invalid access at addr 0xFFC00002, size 1, region '(null)', reason: rejected

...
Invalid access at addr 0xFFC00FFF, size 1, region '(null)', reason: 
rejected

QEMU Flash: Failed to find probe location
QEMU flash was not detected. Writable FVB is not being installed.
===

These indicate unassigned memory accesses, which in my past experience 
always meant a bug (that experience is on POWERPC, so not so relevant 
here, but nevertheless).


OVMF is probing the flash at 0xFFC00000 (hardcoded in OVMF) in
https://github.com/tianocore/edk2/blob/master/OvmfPkg/QemuFlashFvbServicesRuntimeDxe/QemuFlash.c#L65
but cannot succeed - "info mtree -f" says that there is never 
anything mapped at 0xFFC00000:


===
...
fed1c000-fed1 (prio 1, i/o): lpc-rcrb-mmio
fee0-feef (prio 4096, i/o): kvm-apic-msi
ffc84000- (prio 0, romd): system.flash0 KVM
0008-00080fff (prio 0, i/o): 
virtio-pci-common-virtio-net

...
===

hw/block/pflash_cfi01.c suggests QEMU implements this protocol via 
pflash_cfi01_ops but it is never called as:

- it is the same memory region as the OVMF code and
- it is mapped at 0xffc84000 (which is 4G - 
size("./Build/OvmfX64/DEBUG_GCC5/FV/OVMF_CODE.fd"), not where OVMF 
expects it) and
- it has romd==true, it is a KVM memory slot and IO is never emulated in 
QEMU.


Adding another IO memory region with pflash_cfi01_ops and mapping it at 
0xFFC00000 makes it loop in OVMF somewhere.


OVMF code is linked to hardcoded 0xffc84000 (FD_SIZE_IN_KB==4096).
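
The two addresses in play follow from simple arithmetic on the 4 GiB boundary. This is a minimal sketch, assuming that both the 4096 KiB flash device OVMF probes and the OVMF_CODE.fd image are placed so that they end exactly at 4 GiB; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TOP_4G 0x100000000ULL  /* 4 GiB, top of the 32-bit address space */

/* Base address of a flash image that ends exactly at 4 GiB. */
static uint64_t flash_base_below_4g(uint64_t image_size)
{
    return TOP_4G - image_size;
}

/* Image size implied by a mapping that runs from 'base' up to 4 GiB. */
static uint64_t flash_size_from_base(uint64_t base)
{
    return TOP_4G - base;
}
```

With FD_SIZE_IN_KB == 4096 the first function gives 0xFFC00000, the address OVMF probes. The mapping at 0xffc84000 corresponds to an image of 0x37c000 bytes (3568 KiB), so the probe address lies 0x84000 bytes below the start of the mapped region, which matches the "rejected" accesses in the log.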


So I wonder - are these illegal accesses a bug of some sort in QEMU or 
OVMF or command line? Thanks,





The complete command line is:

/home/aik/pbuild/qemu-snp-localhost-x86_64/qemu-system-x86_64 \
-enable-kvm \
-m 2G \
-smp 2 \
-netdev user,id=USER0,hostfwd=tcp::2223-:22 \
-device 
virtio-net-pci,id=vnet0,iommu_platform=on,disable-legacy=on,romfile=,netdev=USER0 \

-machine q35 \
-device 
virtio-scsi-pci,id=vscsi0,iommu_platform=on,disable-modern=off,disable-legacy=on \
-drive 
id=DRIVE0,if=none,file=img/u2204_128G_aikbook_sev.qcow2,format=qcow2 \

-device scsi-hd,id=scsi-hd0,drive=DRIVE0 \
-drive 
if=pflash,format=raw,unit=0,file=/home/aik/OVMF_CODE.fd,readonly=on,id=MYPF \

-nographic \
-chardev stdio,id=STDIO0,signal=off,mux=on \
-device isa-serial,id=isa-serial0,chardev=STDIO0 \
-mon id=MON0,chardev=STDIO0,mode=readline \
-kernel /boot/vmlinuz \
-append console=ttyS0,115200n1 earlyprintk root=/dev/sda3 \
-d guest_errors




--
Alexey

A few QEMU questions

2022-10-03 Thread a b

Hello, there,

I have a few newbie QEMU questions. I found that mmu_idx in aarch64-softmmu
takes the values 8, 10 and 12.

I need some help to understand what they are for.

I cannot find which macros correspond to mmu_idx 8, 10 and 12 in
target/arm/cpu.h. It looks like all the values in ARMMMUIdx are greater
than 0x10 (ARM_MMU_IDX_A). Am I looking in the wrong place, or am I
missing something about the different MMU modes in aarch64?

I'd appreciate your help.

Regards
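
One possible explanation, offered as an assumption to verify against the tree rather than a confirmed answer: the ARMMMUIdx values carry a type tag in the upper bits (0x10 for A-profile, as noted above), while the small integers seen at run time are the "core" index with that tag masked off. The mask value below is how I recall it from target/arm/cpu.h; the function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* ARM_MMU_IDX_A is quoted in the question; the mask is recalled, not verified. */
#define ARM_MMU_IDX_A            0x10u  /* type tag: A-profile */
#define ARM_MMU_IDX_COREIDX_MASK 0xfu   /* low bits: core TLB index */

/* Strip the type tag to get the small index used for TLB lookups. */
static int core_mmu_idx(uint32_t arm_mmu_idx)
{
    return (int)(arm_mmu_idx & ARM_MMU_IDX_COREIDX_MASK);
}

/* Rebuild an A-profile ARMMMUIdx-style value from a core index. */
static uint32_t a_profile_mmu_idx(int core_idx)
{
    return (uint32_t)core_idx | ARM_MMU_IDX_A;
}
```

Under that assumption, a run-time mmu_idx of 8, 10 or 12 would correspond to the ARMMMUIdx entries defined as `8 | ARM_MMU_IDX_A`, `10 | ARM_MMU_IDX_A` and `12 | ARM_MMU_IDX_A`. Which translation regimes those are depends on the QEMU version, so the enum in the checked-out tree is the place to look.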

[PATCH v4 5/6] hw/arm/virt: Improve high memory region address

2022-10-03 Thread Gavin Shan

There are three high memory regions, which are VIRT_HIGH_GIC_REDIST2,
VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
float above the highest RAM address. However, they can be disabled
in several cases.

(1) One specific high memory region is disabled by the developer by
toggling vms->highmem_{redists, ecam, mmio}.

(2) The VIRT_HIGH_PCIE_ECAM region is disabled on machine types
'virt-2.12' or earlier.

(3) The VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded
on a 32-bit system.

(4) One specific high memory region is disabled when it breaks the
PA space limit.

The current implementation of virt_set_memmap() isn't comprehensive
because the space for one specific high memory region is always
reserved from the PA space for cases (1), (2) and (3). In the code,
'base' and 'vms->highest_gpa' are always increased for those three
cases. It's unnecessary since the assigned space of the disabled
high memory region won't be used afterwards.

This improves the address assignment for those three high memory
regions by skipping the address assignment for one specific high
memory region if it has been disabled in cases (1), (2) and (3).
'vms->highmem_compact' is false for now, meaning that we don't have
any behavior changes until it becomes configurable through the property
'compact-highmem' in the next patch.

Signed-off-by: Gavin Shan 
---
 hw/arm/virt.c | 19 ---
 include/hw/arm/virt.h |  1 +
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 59de7b78b5..4164da49e9 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1715,9 +1715,6 @@ static void virt_set_high_memmap(VirtMachineState *vms,
 region_base = ROUND_UP(base, extended_memmap[i].size);
 region_size = extended_memmap[i].size;
 
-vms->memmap[i].base = region_base;
-vms->memmap[i].size = region_size;
-
 /*
  * Check each device to see if they fit in the PA space,
  * moving highest_gpa as we go.
@@ -1725,12 +1722,20 @@ static void virt_set_high_memmap(VirtMachineState *vms,
  * For each device that doesn't fit, disable it.
  */
 fits = (region_base + region_size) <= BIT_ULL(pa_bits);
-if (fits) {
+if (*region_enabled && fits) {
+vms->memmap[i].base = region_base;
+vms->memmap[i].size = region_size;
 vms->highest_gpa = region_base + region_size - 1;
+base = region_base + region_size;
+} else {
+*region_enabled = false;
+if (!vms->highmem_compact) {
+base = region_base + region_size;
+if (fits) {
+vms->highest_gpa = region_base + region_size - 1;
+}
+}
 }
-
-*region_enabled &= fits;
-base = region_base + region_size;
 }
 }
 
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 6ec479ca2b..709f623741 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -144,6 +144,7 @@ struct VirtMachineState {
 PFlashCFI01 *flash[2];
 bool secure;
 bool highmem;
+bool highmem_compact;
 bool highmem_ecam;
 bool highmem_mmio;
 bool highmem_redists;
-- 
2.23.0
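
The loop this patch changes can be exercised outside QEMU. Below is a standalone sketch of the new assignment logic, with stand-in types instead of VirtMachineState; the names `struct layout`, `set_high_memmap` and the `HIGH_*` enum are hypothetical, and the region sizes (64 MB, 256 MB, 512 GB) are taken from the layout table in patch 6/6 rather than from this diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MiB (1ULL << 20)
#define GiB (1ULL << 30)
#define ROUND_UP(n, d) ((((n) + (d) - 1) / (d)) * (d))

enum { HIGH_REDIST2, HIGH_ECAM, HIGH_MMIO, HIGH_NR };

/* Assumed region sizes: redistributors, ECAM, MMIO. */
static const uint64_t high_size[HIGH_NR] = { 64 * MiB, 256 * MiB, 512 * GiB };

struct layout {
    bool enabled[HIGH_NR];   /* stands in for vms->highmem_{redists,ecam,mmio} */
    uint64_t base[HIGH_NR];  /* assigned base, valid only if enabled */
    uint64_t highest_gpa;
};

static void set_high_memmap(struct layout *l, uint64_t base, int pa_bits,
                            bool compact)
{
    for (int i = 0; i < HIGH_NR; i++) {
        uint64_t region_size = high_size[i];
        uint64_t region_base = ROUND_UP(base, region_size);
        bool fits = (region_base + region_size) <= (1ULL << pa_bits);

        if (l->enabled[i] && fits) {
            /* Enabled and within the PA space: assign and move on. */
            l->base[i] = region_base;
            l->highest_gpa = region_base + region_size - 1;
            base = region_base + region_size;
        } else {
            l->enabled[i] = false;
            if (!compact) {
                /* Legacy layout: still reserve the hole. */
                base = region_base + region_size;
                if (fits) {
                    l->highest_gpa = region_base + region_size - 1;
                }
            }
        }
    }
}
```

Starting at a base of 512 GB with pa_bits = 40 and only the MMIO region enabled, the legacy path pushes MMIO up to 1 TB, where it no longer fits, while the compact path places it at 512 GB.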

[PATCH v4 6/6] hw/arm/virt: Add 'compact-highmem' property

2022-10-03 Thread Gavin Shan

After the improvement to high memory region address assignment is
applied, the memory layout can be changed, introducing possible
migration breakage. For example, the VIRT_HIGH_PCIE_MMIO memory region
is enabled when the optimization is applied and disabled when it is
not, with the following configuration.

  pa_bits  = 40;
  vms->highmem_redists = false;
  vms->highmem_ecam= false;
  vms->highmem_mmio= true;

  # qemu-system-aarch64 -accel kvm -cpu host\
-machine virt-7.2,compact-highmem={on, off} \
-m 4G,maxmem=511G -monitor stdio

  Region             compact-highmem=off         compact-highmem=on
  ------------------------------------------------------------------
  RAM                [1GB         512GB]         [1GB         512GB]
  HIGH_GIC_REDISTS   [512GB       512GB+64MB]    [disabled]
  HIGH_PCIE_ECAM     [512GB+256MB 512GB+512MB]   [disabled]
  HIGH_PCIE_MMIO     [disabled]                  [512GB       1TB]

In order to keep backwards compatibility, we need to disable the
optimization on machine types virt-7.1 and earlier. This
means the optimization is enabled by default from virt-7.2. Besides,
the 'compact-highmem' property is added so that the optimization can be
explicitly enabled or disabled on all machine types by users.

Signed-off-by: Gavin Shan 
---
 docs/system/arm/virt.rst |  4 
 hw/arm/virt.c| 47 
 include/hw/arm/virt.h|  1 +
 3 files changed, 52 insertions(+)

diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
index 20442ea2c1..75bf5a4994 100644
--- a/docs/system/arm/virt.rst
+++ b/docs/system/arm/virt.rst
@@ -94,6 +94,10 @@ highmem
   address space above 32 bits. The default is ``on`` for machine types
   later than ``virt-2.12``.
 
+compact-highmem
+  Set ``on``/``off`` to enable/disable compact space for high memory regions.
+  The default is ``on`` for machine types ``virt-7.2`` and later.
+
 gic-version
   Specify the version of the Generic Interrupt Controller (GIC) to provide.
   Valid values are:
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 4164da49e9..9fe65a2ae1 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -174,6 +174,27 @@ static const MemMapEntry base_memmap[] = {
  * Note the extended_memmap is sized so that it eventually also includes the
  * base_memmap entries (VIRT_HIGH_GIC_REDIST2 index is greater than the last
  * index of base_memmap).
+ *
+ * The addresses assigned to these regions are affected by 'compact-highmem'
+ * property, which is to enable or disable the compact space in the Highmem
+ * IO regions. For example, VIRT_HIGH_PCIE_MMIO can be disabled or enabled
+ * depending on the property in the following scenario.
+ *
+ * pa_bits  = 40;
+ * vms->highmem_redists = false;
+ * vms->highmem_ecam= false;
+ * vms->highmem_mmio= true;
+ *
+ * # qemu-system-aarch64 -accel kvm -cpu host\
+ *   -machine virt-7.2,compact-highmem={on, off} \
+ *   -m 4G,maxmem=511G -monitor stdio
+ *
+ * Region             compact-highmem=off         compact-highmem=on
+ * ------------------------------------------------------------------
+ * RAM                [1GB         512GB]         [1GB         512GB]
+ * HIGH_GIC_REDISTS   [512GB       512GB+64MB]    [disabled]
+ * HIGH_PCIE_ECAM     [512GB+256MB 512GB+512MB]   [disabled]
+ * HIGH_PCIE_MMIO     [disabled]                  [512GB       1TB]
  */
 static MemMapEntry extended_memmap[] = {
 /* Additional 64 MB redist region (can contain up to 512 redistributors) */
@@ -2349,6 +2370,20 @@ static void virt_set_highmem(Object *obj, bool value, 
Error **errp)
 vms->highmem = value;
 }
 
+static bool virt_get_compact_highmem(Object *obj, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+return vms->highmem_compact;
+}
+
+static void virt_set_compact_highmem(Object *obj, bool value, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+vms->highmem_compact = value;
+}
+
 static bool virt_get_its(Object *obj, Error **errp)
 {
 VirtMachineState *vms = VIRT_MACHINE(obj);
@@ -2967,6 +3002,13 @@ static void virt_machine_class_init(ObjectClass *oc, 
void *data)
   "Set on/off to enable/disable using "
   "physical address space above 32 
bits");
 
+object_class_property_add_bool(oc, "compact-highmem",
+   virt_get_compact_highmem,
+   virt_set_compact_highmem);
+object_class_property_set_description(oc, "compact-highmem",
+  "Set on/off to enable/disable 
compact "
+  "space for high memory regions");
+
 object_class_property_add_str(oc, "gic-version", virt_get_gic_version,
   virt_set_gic_version);
 object_class_property_set_description(oc, "gic-version",
@@ -3051,6 +3093,7 @@ static void virt_instance_init(Object

[PATCH v4 4/6] hw/arm/virt: Introduce virt_get_high_memmap_enabled() helper

2022-10-03 Thread Gavin Shan

This introduces virt_get_high_memmap_enabled() helper, which returns
the pointer to vms->highmem_{redists, ecam, mmio}. The pointer will
be used in the subsequent patches.

No functional change intended.

Signed-off-by: Gavin Shan 
---
 hw/arm/virt.c | 30 +-
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index b0b679d1f4..59de7b78b5 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1689,14 +1689,29 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
*vms, int idx)
 return arm_cpu_mp_affinity(idx, clustersz);
 }
 
+static inline bool *virt_get_high_memmap_enabled(VirtMachineState *vms,
+ int index)
+{
+bool *enabled_array[] = {
+&vms->highmem_redists,
+&vms->highmem_ecam,
+&vms->highmem_mmio,
+};
+
+assert(index - VIRT_LOWMEMMAP_LAST < ARRAY_SIZE(enabled_array));
+
+return enabled_array[index - VIRT_LOWMEMMAP_LAST];
+}
+
 static void virt_set_high_memmap(VirtMachineState *vms,
  hwaddr base, int pa_bits)
 {
 hwaddr region_base, region_size;
-bool fits;
+bool *region_enabled, fits;
 int i;
 
 for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
+region_enabled = virt_get_high_memmap_enabled(vms, i);
 region_base = ROUND_UP(base, extended_memmap[i].size);
 region_size = extended_memmap[i].size;
 
@@ -1714,18 +1729,7 @@ static void virt_set_high_memmap(VirtMachineState *vms,
 vms->highest_gpa = region_base + region_size - 1;
 }
 
-switch (i) {
-case VIRT_HIGH_GIC_REDIST2:
-vms->highmem_redists &= fits;
-break;
-case VIRT_HIGH_PCIE_ECAM:
-vms->highmem_ecam &= fits;
-break;
-case VIRT_HIGH_PCIE_MMIO:
-vms->highmem_mmio &= fits;
-break;
-}
-
+*region_enabled &= fits;
 base = region_base + region_size;
 }
 }
-- 
2.23.0
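
The pattern this helper uses, an array of pointers into a struct indexed by an enum offset, can be tried in isolation. This is a sketch with stand-in definitions: `struct machine`, the enum values and `get_high_enabled` are hypothetical and only mirror the shape of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Stand-in enum: the high regions start where the low map ends. */
enum {
    LOWMEMMAP_LAST = 3,
    HIGH_REDIST2 = LOWMEMMAP_LAST,
    HIGH_ECAM,
    HIGH_MMIO,
};

struct machine {
    bool highmem_redists;
    bool highmem_ecam;
    bool highmem_mmio;
};

/* Return a pointer to the enable flag that belongs to a high region index. */
static bool *get_high_enabled(struct machine *m, int index)
{
    bool *enabled_array[] = {
        &m->highmem_redists,
        &m->highmem_ecam,
        &m->highmem_mmio,
    };

    assert(index >= LOWMEMMAP_LAST);
    assert((size_t)(index - LOWMEMMAP_LAST) < ARRAY_SIZE(enabled_array));

    return enabled_array[index - LOWMEMMAP_LAST];
}
```

Returning a pointer lets the caller both read and clear the flag, as in `*region_enabled &= fits`, without a switch statement per region. The explicit lower-bound assert is an addition in this sketch; the patch relies on the unsigned comparison to catch a negative offset.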

[PATCH v4 0/6] hw/arm/virt: Improve address assignment for high memory regions

2022-10-03 Thread Gavin Shan

There are three high memory regions, which are VIRT_HIGH_GIC_REDIST2,
VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
float above the highest RAM address. However, they can be disabled
in several cases.

(1) One specific high memory region is disabled by the developer by
toggling vms->highmem_{redists, ecam, mmio}.

(2) The VIRT_HIGH_PCIE_ECAM region is disabled on machine types
'virt-2.12' or earlier.

(3) The VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded
on a 32-bit system.

(4) One specific high memory region is disabled when it breaks the
PA space limit.

The current implementation of virt_set_memmap() isn't comprehensive
because the space for one specific high memory region is always
reserved from the PA space for cases (1), (2) and (3). In the code,
'base' and 'vms->highest_gpa' are always increased for those three
cases. It's unnecessary since the assigned space of the disabled
high memory region won't be used afterwards.

The series intends to improve the address assignment for these
high memory regions.

PATCH[1-4] preparatory work for the improvement
PATCH[5]   improves high memory region address assignment
PATCH[6]   adds 'compact-highmem' to enable or disable the optimization

History
===
v3: https://lists.nongnu.org/archive/html/qemu-arm/2022-09/msg00258.html
v2: https://lore.kernel.org/all/20220815062958.100366-1-gs...@redhat.com/T/
v1: https://lists.nongnu.org/archive/html/qemu-arm/2022-08/msg00013.html

Changelog
==
v4:
  * Add virt_get_high_memmap_enabled() helper  (Eric)
  * Move 'vms->highmem_compact' and related logic from
PATCH[v4 6/6] to PATCH[v4 5/6] to avoid git-bisect
breakage   (Eric)
  * Document the legacy and optimized high memory region
layout in commit log and source code   (Eric)
v3:
  * Reorder the patches(Gavin)
  * Add 'highmem-compact' property for backwards compatibility (Eric)
v2:
  * Split the patches for easier review(Gavin)
  * Improved changelog (Marc)
  * Use 'bool fits' in virt_set_high_memmap()  (Eric)

Gavin Shan (6):
  hw/arm/virt: Introduce virt_set_high_memmap() helper
  hw/arm/virt: Rename variable size to region_size in
virt_set_high_memmap()
  hw/arm/virt: Introduce variable region_base in virt_set_high_memmap()
  hw/arm/virt: Introduce virt_get_high_memmap_enabled() helper
  hw/arm/virt: Improve high memory region address
  hw/arm/virt: Add 'compact-highmem' property

 docs/system/arm/virt.rst |   4 ++
 hw/arm/virt.c| 131 +--
 include/hw/arm/virt.h|   2 +
 3 files changed, 104 insertions(+), 33 deletions(-)

-- 
2.23.0

[PATCH v4 2/6] hw/arm/virt: Rename variable size to region_size in virt_set_high_memmap()

2022-10-03 Thread Gavin Shan

This renames variable 'size' to 'region_size' in virt_set_high_memmap().
Its counterpart ('region_base') will be introduced in next patch.

No functional change intended.

Signed-off-by: Gavin Shan 
Reviewed-by: Eric Auger 
---
 hw/arm/virt.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 4dab528b82..187b3ee0e2 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1692,15 +1692,16 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
*vms, int idx)
 static void virt_set_high_memmap(VirtMachineState *vms,
  hwaddr base, int pa_bits)
 {
+hwaddr region_size;
+bool fits;
 int i;
 
 for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
-hwaddr size = extended_memmap[i].size;
-bool fits;
+region_size = extended_memmap[i].size;
 
-base = ROUND_UP(base, size);
+base = ROUND_UP(base, region_size);
 vms->memmap[i].base = base;
-vms->memmap[i].size = size;
+vms->memmap[i].size = region_size;
 
 /*
  * Check each device to see if they fit in the PA space,
@@ -1708,9 +1709,9 @@ static void virt_set_high_memmap(VirtMachineState *vms,
  *
  * For each device that doesn't fit, disable it.
  */
-fits = (base + size) <= BIT_ULL(pa_bits);
+fits = (base + region_size) <= BIT_ULL(pa_bits);
 if (fits) {
-vms->highest_gpa = base + size - 1;
+vms->highest_gpa = base + region_size - 1;
 }
 
 switch (i) {
@@ -1725,7 +1726,7 @@ static void virt_set_high_memmap(VirtMachineState *vms,
 break;
 }
 
-base += size;
+base += region_size;
 }
 }
 
-- 
2.23.0

[PATCH v4 1/6] hw/arm/virt: Introduce virt_set_high_memmap() helper

2022-10-03 Thread Gavin Shan

This introduces virt_set_high_memmap() helper. The logic of high
memory region address assignment is moved to the helper. The intention
is to make the subsequent optimization for high memory region address
assignment easier.

No functional change intended.

Signed-off-by: Gavin Shan 
Reviewed-by: Eric Auger 
Reviewed-by: Cornelia Huck 
---
 hw/arm/virt.c | 74 ---
 1 file changed, 41 insertions(+), 33 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 0961e053e5..4dab528b82 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1689,6 +1689,46 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
*vms, int idx)
 return arm_cpu_mp_affinity(idx, clustersz);
 }
 
+static void virt_set_high_memmap(VirtMachineState *vms,
+ hwaddr base, int pa_bits)
+{
+int i;
+
+for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
+hwaddr size = extended_memmap[i].size;
+bool fits;
+
+base = ROUND_UP(base, size);
+vms->memmap[i].base = base;
+vms->memmap[i].size = size;
+
+/*
+ * Check each device to see if they fit in the PA space,
+ * moving highest_gpa as we go.
+ *
+ * For each device that doesn't fit, disable it.
+ */
+fits = (base + size) <= BIT_ULL(pa_bits);
+if (fits) {
+vms->highest_gpa = base + size - 1;
+}
+
+switch (i) {
+case VIRT_HIGH_GIC_REDIST2:
+vms->highmem_redists &= fits;
+break;
+case VIRT_HIGH_PCIE_ECAM:
+vms->highmem_ecam &= fits;
+break;
+case VIRT_HIGH_PCIE_MMIO:
+vms->highmem_mmio &= fits;
+break;
+}
+
+base += size;
+}
+}
+
 static void virt_set_memmap(VirtMachineState *vms, int pa_bits)
 {
 MachineState *ms = MACHINE(vms);
@@ -1744,39 +1784,7 @@ static void virt_set_memmap(VirtMachineState *vms, int 
pa_bits)
 /* We know for sure that at least the memory fits in the PA space */
 vms->highest_gpa = memtop - 1;
 
-for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
-hwaddr size = extended_memmap[i].size;
-bool fits;
-
-base = ROUND_UP(base, size);
-vms->memmap[i].base = base;
-vms->memmap[i].size = size;
-
-/*
- * Check each device to see if they fit in the PA space,
- * moving highest_gpa as we go.
- *
- * For each device that doesn't fit, disable it.
- */
-fits = (base + size) <= BIT_ULL(pa_bits);
-if (fits) {
-vms->highest_gpa = base + size - 1;
-}
-
-switch (i) {
-case VIRT_HIGH_GIC_REDIST2:
-vms->highmem_redists &= fits;
-break;
-case VIRT_HIGH_PCIE_ECAM:
-vms->highmem_ecam &= fits;
-break;
-case VIRT_HIGH_PCIE_MMIO:
-vms->highmem_mmio &= fits;
-break;
-}
-
-base += size;
-}
+virt_set_high_memmap(vms, base, pa_bits);
 
 if (device_memory_size > 0) {
 ms->device_memory = g_malloc0(sizeof(*ms->device_memory));
-- 
2.23.0

[PATCH v4 3/6] hw/arm/virt: Introduce variable region_base in virt_set_high_memmap()

2022-10-03 Thread Gavin Shan

This introduces variable 'region_base' for the base address of the
specific high memory region. It's the preparatory work to optimize
high memory region address assignment.

No functional change intended.

Signed-off-by: Gavin Shan 
Reviewed-by: Eric Auger 
---
 hw/arm/virt.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 187b3ee0e2..b0b679d1f4 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1692,15 +1692,15 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
*vms, int idx)
 static void virt_set_high_memmap(VirtMachineState *vms,
  hwaddr base, int pa_bits)
 {
-hwaddr region_size;
+hwaddr region_base, region_size;
 bool fits;
 int i;
 
 for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
+region_base = ROUND_UP(base, extended_memmap[i].size);
 region_size = extended_memmap[i].size;
 
-base = ROUND_UP(base, region_size);
-vms->memmap[i].base = base;
+vms->memmap[i].base = region_base;
 vms->memmap[i].size = region_size;
 
 /*
@@ -1709,9 +1709,9 @@ static void virt_set_high_memmap(VirtMachineState *vms,
  *
  * For each device that doesn't fit, disable it.
  */
-fits = (base + region_size) <= BIT_ULL(pa_bits);
+fits = (region_base + region_size) <= BIT_ULL(pa_bits);
 if (fits) {
-vms->highest_gpa = base + region_size - 1;
+vms->highest_gpa = region_base + region_size - 1;
 }
 
 switch (i) {
@@ -1726,7 +1726,7 @@ static void virt_set_high_memmap(VirtMachineState *vms,
 break;
 }
 
-base += region_size;
+base = region_base + region_size;
 }
 }
 
-- 
2.23.0

Re: ublk-qcow2: ublk-qcow2 is available

2022-10-03 Thread Denis V. Lunev


On 10/3/22 21:53, Stefan Hajnoczi wrote:

On Fri, Sep 30, 2022 at 05:24:11PM +0800, Ming Lei wrote:

ublk-qcow2 is available now.

Cool, thanks for sharing!

yep


So far it provides basic read/write function, and compression and snapshot
aren't supported yet. The target/backend implementation is completely
based on io_uring, and share the same io_uring with ublk IO command
handler, just like what ublk-loop does.

The main motivations of ublk-qcow2 are as follows:

- building one complicated target from scratch helps libublksrv APIs/functions
   become mature/stable more quickly, since qcow2 is complicated and places more
   requirements on libublksrv than the simple targets (loop, null)

- there have been several attempts at implementing a qcow2 driver in the kernel,
   such as ``qloop`` [2], ``dm-qcow2`` [3] and ``in kernel qcow2(ro)`` [4], so
   ublk-qcow2 might be useful for covering requirements in this field

There is one important thing to keep in mind about all partly-userspace
implementations though:
* any allocation made in the context of the userspace daemon
   can, through try_to_free_pages() in the kernel, trigger
   an operation that itself requires action from the
   userspace daemon, which is inside the kernel at that
   point.
* the probability of this is higher in an overcommitted
   environment

This was our main motivation for favoring the in-kernel
implementation.


- performance comparison with qemu-nbd, and it was my 1st thought to evaluate
   performance of ublk/io_uring backend by writing one ublk-qcow2 since ublksrv
   is started

- help to abstract common building block or design pattern for writing new ublk
   target/backend

So far it basically passes xfstest(XFS) test by using ublk-qcow2 block
device as TEST_DEV, and kernel building workload is verified too. Also
soft update approach is applied in meta flushing, and meta data
integrity is guaranteed, 'make test T=qcow2/040' covers this kind of
test, and only cluster leak is reported during this test.

The performance data looks much better compared with qemu-nbd, see
details in commit log[1], README[5] and STATUS[6]. And the test covers both
empty image and pre-allocated image, for example of pre-allocated qcow2
image(8GB):

- qemu-nbd (make test T=qcow2/002)

Single queue?


randwrite(4k): jobs 1, iops 24605
randread(4k): jobs 1, iops 30938
randrw(4k): jobs 1, iops read 13981 write 14001
rw(512k): jobs 1, iops read 724 write 728

Please try qemu-storage-daemon's VDUSE export type as well. The
command-line should be similar to this:

   # modprobe virtio_vdpa # attaches vDPA devices to host kernel
   # modprobe vduse
   # qemu-storage-daemon \
   --blockdev file,filename=test.qcow2,cache.direct=on|off,aio=native,node-name=file \
   --blockdev qcow2,file=file,node-name=qcow2 \
   --object iothread,id=iothread0 \
   --export vduse-blk,id=vduse0,name=vduse0,num-queues=$(nproc),node-name=qcow2,writable=on,iothread=iothread0
   # vdpa dev add name vduse0 mgmtdev vduse

A virtio-blk device should appear and xfstests can be run on it
(typically /dev/vda unless you already have other virtio-blk devices).

Afterwards you can destroy the device using:

   # vdpa dev del vduse0

but this would be anyway limited by a single thread doing AIO in
qemu-storage-daemon, I believe.



- ublk-qcow2 (make test T=qcow2/022)

There are a lot of other factors not directly related to NBD vs ublk. In
order to get an apples-to-apples comparison with qemu-* a ublk export
type is needed in qemu-storage-daemon. That way the only difference is
the ublk interface and the rest of the code path is identical, making it
possible to compare NBD, VDUSE, ublk, etc more precisely.

I think that comparison is interesting before comparing different qcow2
implementations because qcow2 sits on top of too much other code. It's
hard to know what should be accounted to configuration differences,
implementation differences, or fundamental differences that cannot be
overcome (this is the interesting part!).


randwrite(4k): jobs 1, iops 104481
randread(4k): jobs 1, iops 114937
randrw(4k): jobs 1, iops read 53630 write 53577
rw(512k): jobs 1, iops read 1412 write 1423

Also ublk-qcow2 aligns the queue's chunk_sectors limit with qcow2's cluster size,
which is 64KB by default. This simplifies backend IO handling, but the limit
could be increased to 512K or another more suitable size to improve sequential
IO performance; that just needs one coroutine to handle more than one IO.


[1] https://github.com/ming1/ubdsrv/commit/9faabbec3a92ca83ddae92335c66eabbeff654e7
[2] https://upcommons.upc.edu/bitstream/handle/2099.1/9619/65757.pdf?sequence=1=y
[3] https://lwn.net/Articles/889429/
[4] https://lab.ks.uni-freiburg.de/projects/kernel-qcow2/repository
[5] https://github.com/ming1/ubdsrv/blob/master/qcow2/README.rst
[6] https://github.com/ming1/ubdsrv/blob/master/qcow2/STATUS.rst


interesting...

Den

Re: [PATCH v3 5/5] hw/arm/virt: Add 'highmem-compact' property

2022-10-03 Thread Gavin Shan


Hi Eric,

On 10/3/22 4:49 PM, Eric Auger wrote:

On 9/29/22 01:49, Gavin Shan wrote:

On 9/28/22 10:22 PM, Eric Auger wrote:

On 9/22/22 01:13, Gavin Shan wrote:

After the improvement to high memory region address assignment is
applied, the memory layout is changed. For example, VIRT_HIGH_PCIE_MMIO

s/the memory layout is changed./the memory layout is changed,
introducing possible migration breakage.


Ok, much clearer.


memory region is enabled when the improvement is applied, but it's
disabled if the improvement isn't applied.

  pa_bits  = 40;
  vms->highmem_redists = false;
  vms->highmem_ecam    = false;
  vms->highmem_mmio    = true;

  # qemu-system-aarch64 -accel kvm -cpu host \
    -machine virt-7.2 -m 4G,maxmem=511G  \
    -monitor stdio

In order to keep backward compatibility, we need to disable the
optimization on machine types virt-7.1 and earlier. It
means the optimization is enabled by default from virt-7.2. Besides,
'highmem-compact' property is added so that the optimization can be

I would rather rename the property to compact-highmem, even if the vms
field is named highmem_compact to align with other highmem fields


Ok, but I would love to know why. Note that we already have
'highmem=on|off'. 'highmem_compact=on|off' seems consistent
to me.

To me the property name should rather sound 'English', with the adjective
before the noun ('high memory'), but I am not a native English speaker
either.


Ok. I agree 'compact-highmem' is better. The backing variable name will
still be 'highmem_compact', which is consistent with the existing ones.




explicitly enabled or disabled on all machine types by users.

Signed-off-by: Gavin Shan 
---
   docs/system/arm/virt.rst |  4 
   hw/arm/virt.c    | 33 +
   include/hw/arm/virt.h    |  2 ++
   3 files changed, 39 insertions(+)

diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
index 20442ea2c1..f05ec2253b 100644
--- a/docs/system/arm/virt.rst
+++ b/docs/system/arm/virt.rst
@@ -94,6 +94,10 @@ highmem
     address space above 32 bits. The default is ``on`` for machine
types
     later than ``virt-2.12``.
   +highmem-compact
+  Set ``on``/``off`` to enable/disable compact space for high
memory regions.
+  The default is ``on`` for machine types later than ``virt-7.2``

I think you should document what the compact layout is versus the legacy one,
both in the commit msg and maybe as a comment in the code, along with the
comment in hw/arm/virt.c starting with 'Highmem IO Regions: '


Ok, I will add this into the commit log in v4. I don't think it's necessary
to add a duplicate comment in the code. People can check the commit log for
details if needed.


+
   gic-version
     Specify the version of the Generic Interrupt Controller (GIC) to
provide.
     Valid values are:
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index b702f8f2b5..a4fbdaef91 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1734,6 +1734,13 @@ static void
virt_set_high_memmap(VirtMachineState *vms,
   base = region_base + region_size;
   } else {
   *region_enabled = false;
+
+    if (!vms->highmem_compact) {

this snippet should be already present in previous patch otherwise this
will break bisectability.



Hmm, nice catch! I think I need to swap PATCH[4] and PATCH[5] in next
revision. In that order, 'compact-highmem' is introduced in PATCH[4],
but not used yet. PATCH[5] has the optimization and 'compact-highmem'
is used.

No, in general you introduce the property at the very end, with the code
guarded by an unset vms->highmem_compact in the previous patch.



Yeah, what I need is to define 'vms->highmem_compact' in PATCH[v3 4/5],
with its value set to false. I also need to update @base and @vms->highest_gpa
when '!vms->highmem_compact' holds in PATCH[v3 4/5].
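
The difference between the legacy and the compact layout discussed above can be
sketched in plain C. This is only a simplified model, not the code from the
patch: struct high_region, assign_high_regions() and the starting base of
512 GiB are assumptions made for illustration, and the region sizes (64 MiB,
256 MiB, 512 GiB) are example values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ROUND_UP(n, d) ((((n) + (d) - 1) / (d)) * (d))

/* Hypothetical, simplified stand-in for one high memory region. */
struct high_region {
    uint64_t size;   /* region size; the base is aligned to it */
    bool enabled;    /* in: requested, out: cleared if it does not fit */
    uint64_t base;   /* out: candidate base address */
};

/*
 * Walk the regions in order starting at 'base' and return the highest
 * guest physical address that was reserved.
 *
 * compact == false: legacy layout, a disabled region still consumes space.
 * compact == true:  a disabled region is skipped and consumes nothing.
 */
static uint64_t assign_high_regions(struct high_region *regions, int count,
                                    uint64_t base, int pa_bits, bool compact)
{
    uint64_t highest_gpa = base - 1;

    assert(pa_bits > 0 && pa_bits < 64);

    for (int i = 0; i < count; i++) {
        uint64_t region_base = ROUND_UP(base, regions[i].size);
        uint64_t region_end = region_base + regions[i].size;
        bool fits = region_end <= (1ULL << pa_bits);

        regions[i].base = region_base;
        regions[i].enabled = regions[i].enabled && fits;

        if (compact && !regions[i].enabled) {
            continue;               /* do not reserve space for it */
        }

        base = region_end;
        if (fits) {
            highest_gpa = region_end - 1;
        }
    }

    return highest_gpa;
}
```

With pa_bits = 40, the first two regions disabled and the walk starting at
512 GiB, the legacy walk pushes the last 512 GiB region past the 1 TiB limit,
while the compact walk places it at 512 GiB, which is the behaviour change the
commit message describes.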




+    base = region_base + region_size;
+    if (fits) {
+    vms->highest_gpa = region_base + region_size - 1;
+    }
+    }
   }
   }
   }
@@ -2348,6 +2355,20 @@ static void virt_set_highmem(Object *obj,
bool value, Error **errp)
   vms->highmem = value;
   }
   +static bool virt_get_highmem_compact(Object *obj, Error **errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+
+    return vms->highmem_compact;
+}
+
+static void virt_set_highmem_compact(Object *obj, bool value, Error
**errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+
+    vms->highmem_compact = value;
+}
+
   static bool virt_get_its(Object *obj, Error **errp)
   {
   VirtMachineState *vms = VIRT_MACHINE(obj);
@@ -2966,6 +2987,13 @@ static void
virt_machine_class_init(ObjectClass *oc, void *data)
     "Set on/off to
enable/disable using "
     "physical address space
above 32 bits");
   +    object_class_property_add_bool(oc, "highmem-compact",
+

Re: [PATCH v3] virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events.

2022-10-03 Thread Venu Busireddy

On 2022-10-03 18:13:06 -0500, Venu Busireddy wrote:
> On 2022-09-30 18:25:48 +0200, Paolo Bonzini wrote:
> > On Fri, Sep 30, 2022 at 4:42 PM Venu Busireddy
> >  wrote:
> > > > > Immediately after a hotunplug event, qemu (without any action from
> > > > > the guest) processes a REPORT_LUNS command on the lun 0 of the device
> > > > > (haven't figured out what causes this).
> > > >
> > > > There is only one call to virtio_scsi_handle_cmd_req_prepare and it
> > > > takes the command from the guest, are you sure it is without any
> > > > action from the guest?
> > >
> > > I am sure, based on what I am observing. I am running the scsitrace
> > > (scsitrace -n vtioscsi -v) command on the Solaris guest, and I see no
> > > output there.
> > 
> > Do you have the sources to the driver and/or to the scsitrace dtrace
> 
> I do not have access to the source code. I am working on gaining access.
> 
> > script? Something must be putting the SCSI command in the queue.
> > Perhaps the driver is doing so when it sees an event? And if it is
> > bypassing the normal submission mechanism, the REPORT LUNS command is
> > hidden in scsitrace; that in turn returns a unit attention and steals
> 
> While SAM does say "if a REPORT LUNS command enters the enabled command
> state, the device server shall process the REPORT LUNS command and shall
> not report any unit attention condition;," it also says that the unit
> attention condition will not be cleared if the UA_INTLCK_CTRL is set to
> 10b or 11b in the "Control mode page."
> 
> It doesn't appear to me that virtio-scsi supports "Control mode pages."

Just to clarify, I am referring to the mode pages with page code 0x0a (and
any subpage codes).

> Does it? If it doesn't, should the expected handling of the REPORT LUNS command
> be the same as the case of UA_INTLCK_CTRL being set to 00b?
> 
> And while trying to understand this, and reading the code regarding
> the handling of UA_INTLCK_CTRL, I ran across the following comment in
> scsi_req_get_sense():
> 
> /*
>  * FIXME: clearing unit attention conditions upon autosense should be done
>  * only if the UA_INTLCK_CTRL field in the Control mode page is set to 00b
>  * (SAM-5, 5.14).
>  *
>  * We assume UA_INTLCK_CTRL to be 00b for HBAs that support autosense, and
>  * 10b for HBAs that do not support it (do not call scsi_req_get_sense).
>  * Here we handle unit attention clearing for UA_INTLCK_CTRL == 00b.
>  */
> 
> If virtio-scsi doesn't support "Control mode pages," why does the above
> comment even say "assume UA_INTLCK_CTRL to be 00b" or address the case
> of 10b? Also, other than the reference to it in the above comment,
> UA_INTLCK_CTRL is not used anywhere else in the code. This comment
> confused me. Is the comment just wrong, or am I missing something? I am
> just trying to understand this better so that I am better prepared when
> the client driver folks start asking me questions about the qemu support.
> 
> Venu
> 
> > it from the other commands such as TEST UNIT READY, but that's a guest
> > driver bug.
> > 
> > But QEMU cannot just return the unit attention twice. I would start
> > with the patch to use the bus unit attention mechanism. It would be
> > even better to have two unit tests that check the behavior prescribed
> > by the standard: 1) UNIT ATTENTION from TEST UNIT READY immediately
> > after a hotunplug notification; 2) no UNIT ATTENTION from REPORT LUNS
> > and also no UNIT ATTENTION from a subsequent TEST UNIT READY command.
> > Debugging the guest is a separate step.
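
The two checks suggested in the quoted text can be modelled with a toy logical
unit in C. This is a sketch under stated assumptions, not QEMU code: struct
toy_lun and the toy_* functions are hypothetical names, the model tracks only
a single pending "reported LUNs changed" unit attention, and it assumes the
UA_INTLCK_CTRL == 00b behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* SCSI status codes used by the model. */
enum scsi_status { GOOD = 0x00, CHECK_CONDITION = 0x02 };

/* Toy logical unit: tracks one pending unit attention condition only. */
struct toy_lun {
    bool luns_changed_ua;
};

/* A hotunplug on the bus establishes the unit attention condition. */
static void toy_hotunplug(struct toy_lun *lun)
{
    assert(lun);
    lun->luns_changed_ua = true;
}

/* TEST UNIT READY reports a pending unit attention once, then clears it. */
static enum scsi_status toy_test_unit_ready(struct toy_lun *lun)
{
    assert(lun);
    if (lun->luns_changed_ua) {
        lun->luns_changed_ua = false;
        return CHECK_CONDITION;
    }
    return GOOD;
}

/*
 * REPORT LUNS is processed without reporting the unit attention, and in
 * this model it also clears the pending "reported LUNs changed" condition.
 */
static enum scsi_status toy_report_luns(struct toy_lun *lun)
{
    assert(lun);
    lun->luns_changed_ua = false;
    return GOOD;
}
```

Case 1 is toy_hotunplug() followed by toy_test_unit_ready(), which should
yield CHECK_CONDITION. Case 2 is toy_hotunplug(), then toy_report_luns(), then
toy_test_unit_ready(), where both commands should return GOOD.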

Re: [PATCH v10 3/7] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls

2022-10-03 Thread Damien Le Moal

On 2022/10/04 2:47, Stefan Hajnoczi wrote:
> On Thu, Sep 29, 2022 at 04:36:27PM +0800, Sam Li wrote:
>> Add a new zoned_host_device BlockDriver. The zoned_host_device option
>> accepts only zoned host block devices. By adding zone management
>> operations in this new BlockDriver, users can use the new block
>> layer APIs including Report Zone and four zone management operations
>> (open, close, finish, reset).
>>
>> Qemu-io uses the new APIs to perform zoned storage commands of the device:
>> zone_report(zrp), zone_open(zo), zone_close(zc), zone_reset(zrs),
>> zone_finish(zf).
>>
>> For example, to test zone_report, use following command:
>> $ ./build/qemu-io --image-opts -n driver=zoned_host_device,filename=/dev/nullb0 -c "zrp offset nr_zones"
>>
>> Signed-off-by: Sam Li 
>> Reviewed-by: Hannes Reinecke 
>> ---
>>  block/block-backend.c | 146 +
>>  block/file-posix.c| 340 +-
>>  block/io.c|  41 
>>  include/block/block-common.h  |   4 +
>>  include/block/block-io.h  |   7 +
>>  include/block/block_int-common.h  |  24 +++
>>  include/block/raw-aio.h   |   6 +-
>>  include/sysemu/block-backend-io.h |  17 ++
>>  meson.build   |   4 +
>>  qapi/block-core.json  |   8 +-
>>  qemu-io-cmds.c| 148 +
>>  11 files changed, 741 insertions(+), 4 deletions(-)
>>
>> diff --git a/block/block-backend.c b/block/block-backend.c
>> index d4a5df2ac2..f7f7acd6f4 100644
>> --- a/block/block-backend.c
>> +++ b/block/block-backend.c
>> @@ -1431,6 +1431,15 @@ typedef struct BlkRwCo {
>>  void *iobuf;
>>  int ret;
>>  BdrvRequestFlags flags;
>> +union {
>> +struct {
>> +unsigned int *nr_zones;
>> +BlockZoneDescriptor *zones;
>> +} zone_report;
>> +struct {
>> +BlockZoneOp op;
>> +} zone_mgmt;
>> +};
>>  } BlkRwCo;
>>  
>>  int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags)
>> @@ -1775,6 +1784,143 @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
>>  return ret;
>>  }
>>  
>> +static void blk_aio_zone_report_entry(void *opaque) {
> 
> 
> The coroutine_fn annotation is missing:
> 
>   static void coroutine_fn blk_aio_zone_report_entry(void *opaque) {
> 
>> +BlkAioEmAIOCB *acb = opaque;
>> +BlkRwCo *rwco = &acb->rwco;
>> +
>> +rwco->ret = blk_co_zone_report(rwco->blk, rwco->offset,
>> +   rwco->zone_report.nr_zones,
>> +   rwco->zone_report.zones);
>> +blk_aio_complete(acb);
>> +}
>> +
>> +BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
>> +unsigned int *nr_zones,
>> +BlockZoneDescriptor  *zones,
>> +BlockCompletionFunc *cb, void *opaque)
>> +{
>> +BlkAioEmAIOCB *acb;
>> +Coroutine *co;
>> +IO_CODE();
>> +
>> +blk_inc_in_flight(blk);
>> +acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
>> +acb->rwco = (BlkRwCo) {
>> +.blk= blk,
>> +.offset = offset,
>> +.ret= NOT_DONE,
>> +.zone_report = {
>> +.zones = zones,
>> +.nr_zones = nr_zones,
>> +},
>> +};
>> +acb->has_returned = false;
>> +
>> +co = qemu_coroutine_create(blk_aio_zone_report_entry, acb);
>> +bdrv_coroutine_enter(blk_bs(blk), co);
>> +
>> +acb->has_returned = true;
>> +if (acb->rwco.ret != NOT_DONE) {
>> +replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
>> + blk_aio_complete_bh, acb);
>> +}
>> +
>> +return &acb->common;
>> +}
>> +
>> +static void blk_aio_zone_mgmt_entry(void *opaque) {
> 
> coroutine_fn is missing here.
> 
>> +BlkAioEmAIOCB *acb = opaque;
>> +BlkRwCo *rwco = &acb->rwco;
>> +
>> +rwco->ret = blk_co_zone_mgmt(rwco->blk, rwco->zone_mgmt.op,
>> + rwco->offset, acb->bytes);
>> +blk_aio_complete(acb);
>> +}
>> +
>> +BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
>> +  int64_t offset, int64_t len,
>> +  BlockCompletionFunc *cb, void *opaque) {
>> +BlkAioEmAIOCB *acb;
>> +Coroutine *co;
>> +IO_CODE();
>> +
>> +blk_inc_in_flight(blk);
>> +acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
>> +acb->rwco = (BlkRwCo) {
>> +.blk= blk,
>> +.offset = offset,
>> +.ret= NOT_DONE,
>> +.zone_mgmt = {
>> +.op = op,
>> +},
>> +};
>> +acb->bytes = len;
>> +acb->has_returned = false;
>> +
>> +co = qemu_coroutine_create(blk_aio_zone_mgmt_entry, acb);
>> +bdrv_coroutine_enter(blk_bs(blk), co);
>> +
>> +acb->has_returned = true;
>> +if (acb->rwco.ret != NOT_DONE) {
>> +

Re: [PATCH v2] mips/malta: pass RNG seed to to kernel via env var

2022-10-03 Thread Jason A. Donenfeld

Hi Philippe,

On Tue, Oct 4, 2022 at 12:36 AM Philippe Mathieu-Daudé  wrote:
> Send each new revision as a new top-level thread, rather than burying it
> in-reply-to an earlier revision, as many reviewers are not looking
> inside deep threads for new patches.

Will do.

> You seem to justify this commit by the kernel commit, which justifies
> itself mentioning hypervisor use... So the egg comes first before the
> chicken.

Oh, that's not really the intention. My goal is to provide sane
interfaces for preboot environments -- whether those are in a
hypervisor like QEMU or in firmware like CFE -- to pass a random seed
along to the kernel. To that end, I've been making sure there's both a
kernel side and a QEMU side, and submitting both to see what folks
think. The fact that you have some questions (below) is a good thing;
I'm glad to have your input on it.

> > +
> > +qemu_guest_getrandom_nofail(rng_seed, sizeof(rng_seed));
> > +for (size_t i = 0; i < sizeof(rng_seed); ++i) {
> > +sprintf(rng_seed_hex + i * 2, "%02x", rng_seed[i]);
> > +}
> > +prom_set(prom_buf, prom_index++, "rngseed");
> > +prom_set(prom_buf, prom_index++, "%s", rng_seed_hex);
>
> You use the firmware interface to pass rng data to a hypervisor...
>
> It looks to me like you are forcing one API to ease another one. From the
> FW PoV it is a lie, because the FW will only change this value if
> an operator is involved. Here PROM stands for "programmable read-only
> memory", rarely modified. Having the 'rngseed' updated on each
> reset is surprising.
>
> Do you have an example of firmware doing that? (So I can understand
> whether this is the best way to mimic this behavior here).
>
> Aren't there better APIs to have hypervisors pass data to a kernel?

So a firmware interface *is* the intended situation here. To answer
your last question first: the "standard" firmware interface for
passing these seeds is via device tree's "rng-seed" field. There's
also an EFI protocol for this. And on x86 it can be passed through the
setup_data field. And on m68k the bootinfo bootloader/firmware struct
has a BI_RNG_SEED type. There's plenty of ARM and x86 hardware that
uses device tree and EFI for this, where the firmware is involved in
generating the seeds, and in the device tree case, in mangling the
device tree to have the right values. So, to answer your first
question, yes I think this is indeed a firmware-style interface.

Right now this is obviously intended for QEMU (and other hypervisors)
to implement. Later I'm hoping that firmware environments like CFE
might gain support for setting this. (You could do so interactively
now with "setenv".) So it seems like the environment block here really
is the right way to pass this. If you have a MIPS/malta platform
alternative, I'd be happy to consider it with you, but in my look at
things so far, the fw env block seems like by far the best way of
doing this, especially so considering it's part of both real firmware
environments and QEMU, and is relatively straightforward to implement.
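
For reference, the encoding step in the quoted hunk (random bytes turned into a
hex string for the "rngseed" environment entry) can be sketched as a standalone
helper. seed_to_hex() is a hypothetical name used only for this illustration,
and the explicit buffer-size check is an addition that the quoted patch does
not have.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Encode 'seed_len' bytes as lowercase hex into 'hex'.
 * 'hex' must have room for two characters per byte plus the terminator.
 */
static void seed_to_hex(const uint8_t *seed, size_t seed_len,
                        char *hex, size_t hex_len)
{
    assert(hex_len >= seed_len * 2 + 1);

    for (size_t i = 0; i < seed_len; ++i) {
        /* Each call writes two hex digits and a terminating NUL. */
        snprintf(hex + i * 2, 3, "%02x", seed[i]);
    }
    hex[seed_len * 2] = '\0';

    assert(strlen(hex) == seed_len * 2);
}
```

A 32-byte seed therefore needs a 65-byte buffer, which matches the
sizeof(rng_seed) * 2 + 1 sizing used in the patch.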

Jason

Re: [PATCH v3] virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events.

2022-10-03 Thread Venu Busireddy

On 2022-09-30 18:25:48 +0200, Paolo Bonzini wrote:
> On Fri, Sep 30, 2022 at 4:42 PM Venu Busireddy
>  wrote:
> > > > Immediately after a hotunplug event, qemu (without any action from
> > > > the guest) processes a REPORT_LUNS command on the lun 0 of the device
> > > > (haven't figured out what causes this).
> > >
> > > There is only one call to virtio_scsi_handle_cmd_req_prepare and it
> > > takes the command from the guest, are you sure it is without any
> > > action from the guest?
> >
> > I am sure, based on what I am observing. I am running the scsitrace
> > (scsitrace -n vtioscsi -v) command on the Solaris guest, and I see no
> > output there.
> 
> Do you have the sources to the driver and/or to the scsitrace dtrace

I do not have access to the source code. I am working on gaining access.

> script? Something must be putting the SCSI command in the queue.
> Perhaps the driver is doing so when it sees an event? And if it is
> bypassing the normal submission mechanism, the REPORT LUNS command is
> hidden in scsitrace; that in turn returns a unit attention and steals

While SAM does say "if a REPORT LUNS command enters the enabled command
state, the device server shall process the REPORT LUNS command and shall
not report any unit attention condition;," it also says that the unit
attention condition will not be cleared if the UA_INTLCK_CTRL is set to
10b or 11b in the "Control mode page."

It doesn't appear to me that virtio-scsi supports "Control mode pages."
Does it? If it doesn't, should the expected handling of the REPORT LUNS command
be the same as the case of UA_INTLCK_CTRL being set to 00b?

And while trying to understand this, and reading the code regarding
the handling of UA_INTLCK_CTRL, I ran across the following comment in
scsi_req_get_sense():

/*
 * FIXME: clearing unit attention conditions upon autosense should be done
 * only if the UA_INTLCK_CTRL field in the Control mode page is set to 00b
 * (SAM-5, 5.14).
 *
 * We assume UA_INTLCK_CTRL to be 00b for HBAs that support autosense, and
 * 10b for HBAs that do not support it (do not call scsi_req_get_sense).
 * Here we handle unit attention clearing for UA_INTLCK_CTRL == 00b.
 */

If virtio-scsi doesn't support "Control mode pages," why does the above
comment even say "assume UA_INTLCK_CTRL to be 00b" or address the case
of 10b? Also, other than the reference to it in the above comment,
UA_INTLCK_CTRL is not used anywhere else in the code. This comment
confused me. Is the comment just wrong, or am I missing something? I am
just trying to understand this better so that I am better prepared when
the client driver folks start asking me questions about the qemu support.

Venu

> it from the other commands such as TEST UNIT READY, but that's a guest
> driver bug.
> 
> But QEMU cannot just return the unit attention twice. I would start
> with the patch to use the bus unit attention mechanism. It would be
> even better to have two unit tests that check the behavior prescribed
> by the standard: 1) UNIT ATTENTION from TEST UNIT READY immediately
> after a hotunplug notification; 2) no UNIT ATTENTION from REPORT LUNS
> and also no UNIT ATTENTION from a subsequent TEST UNIT READY command.
> Debugging the guest is a separate step.

Re: [PULL 00/18] Block layer patches

2022-10-03 Thread Stefan Hajnoczi

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.2 for any 
user-visible changes.


signature.asc
Description: PGP signature

Re: [PULL 00/10] target-arm queue

2022-10-03 Thread Stefan Hajnoczi

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.2 for any 
user-visible changes.



Re: [PULL 0/8] chardev patches

2022-10-03 Thread Stefan Hajnoczi

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.2 for any 
user-visible changes.



Re: [Virtio-fs] virtiofsd: Any reason why there's not an "openat2" sandbox mode?

2022-10-03 Thread Colin Walters

On Thu, Sep 29, 2022, at 1:03 PM, Vivek Goyal wrote:
> 
> So rust version of virtiofsd, already supports running unprivileged
> (inside a user namespace).

I know, but as I already said, the use case here is running inside an OpenShift 
unprivileged pod where *we are already in a container*.

> host$ podman unshare -- virtiofsd --socket-path=/tmp/vfsd.sock --shared-dir /mnt \
> --announce-submounts --sandbox chroot &

Yes, but in current OCP 4.11 our seccomp policy denies CLONE_NEWUSER:

```
$ unshare -m
unshare: unshare failed: Function not implemented
```

https://docs.openshift.com/container-platform/4.11/security/seccomp-profiles.html

> I think only privileged operation it needs is assigning a range of
> subuid/subgid to the uid you are using on host.

We also turn on NO_NEW_PRIVILEGES by default in OCP pods.  

Now, I *could* in general get elevated permissions where I need to today.  But 
it's also really important to me to have a long term goal of having operating 
system builds and tests work well as "just another workload" in our production 
container platform (now, one *does* want to bind in /dev/kvm, but that's 
generally safe, and even that strictly speaking is optional if one can stomach 
the ~10x perf hit).

> Can you give rust virtiofsd (unprivileged) a try.

I admit to not actually trying it in a pod, but I think we all agree it can't 
work, and the only thing that can today is openat2.

Re: [PATCH v2] mips/malta: pass RNG seed to to kernel via env var

2022-10-03 Thread Philippe Mathieu-Daudé via


Hi Jason,

Per https://www.qemu.org/docs/master/devel/submitting-a-patch.html#when-resending-patches-add-a-version-tag:


Send each new revision as a new top-level thread, rather than burying it 
in-reply-to an earlier revision, as many reviewers are not looking 
inside deep threads for new patches.


On 3/10/22 12:36, Jason A. Donenfeld wrote:

As of the kernel commit linked below, Linux ingests an RNG seed
passed from the hypervisor. So, pass this for the Malta platform, and
reinitialize it on reboot too, so that it's always fresh.


Cc: Philippe Mathieu-Daudé 
Cc: Jiaxun Yang 
Cc: Aurelien Jarno 
Link: https://git.kernel.org/mips/c/056a68cea01


You seem to justify this commit by the kernel commit, which justifies
itself mentioning hypervisor use... So the egg comes first before the
chicken.


Signed-off-by: Jason A. Donenfeld 
---
Changes v1->v2:
- Update commit message.
- No code changes.

  hw/mips/malta.c | 25 +
  1 file changed, 25 insertions(+)

diff --git a/hw/mips/malta.c b/hw/mips/malta.c
index 0e932988e0..9d793b3c17 100644
--- a/hw/mips/malta.c
+++ b/hw/mips/malta.c
@@ -26,6 +26,7 @@
  #include "qemu/units.h"
  #include "qemu/bitops.h"
  #include "qemu/datadir.h"
+#include "qemu/guest-random.h"
  #include "hw/clock.h"
  #include "hw/southbridge/piix.h"
  #include "hw/isa/superio.h"
@@ -1017,6 +1018,17 @@ static void G_GNUC_PRINTF(3, 4) prom_set(uint32_t 
*prom_buf, int index,
  va_end(ap);
  }
  
+static void reinitialize_rng_seed(void *opaque)

+{
+char *rng_seed_hex = opaque;
+uint8_t rng_seed[32];
+
+qemu_guest_getrandom_nofail(rng_seed, sizeof(rng_seed));
+for (size_t i = 0; i < sizeof(rng_seed); ++i) {
+sprintf(rng_seed_hex + i * 2, "%02x", rng_seed[i]);
+}
+}
+
  /* Kernel */
  static uint64_t load_kernel(void)
  {
@@ -1028,6 +1040,8 @@ static uint64_t load_kernel(void)
  long prom_size;
  int prom_index = 0;
  uint64_t (*xlate_to_kseg0) (void *opaque, uint64_t addr);
+uint8_t rng_seed[32];
+char rng_seed_hex[sizeof(rng_seed) * 2 + 1];
  
  #if TARGET_BIG_ENDIAN

  big_endian = 1;
@@ -1115,9 +1129,20 @@ static uint64_t load_kernel(void)
  
  prom_set(prom_buf, prom_index++, "modetty0");

  prom_set(prom_buf, prom_index++, "38400n8r");
+
+qemu_guest_getrandom_nofail(rng_seed, sizeof(rng_seed));
+for (size_t i = 0; i < sizeof(rng_seed); ++i) {
+sprintf(rng_seed_hex + i * 2, "%02x", rng_seed[i]);
+}
+prom_set(prom_buf, prom_index++, "rngseed");
+prom_set(prom_buf, prom_index++, "%s", rng_seed_hex);


You use the firmware interface to pass rng data to a hypervisor...

It looks to me like you are forcing one API to ease another one. From the
FW PoV it is a lie, because the FW will only change this value if
an operator is involved. Here PROM stands for "programmable read-only
memory", rarely modified. Having the 'rngseed' updated on each
reset is surprising.

Do you have an example of firmware doing that? (So I can understand
whether this is the best way to mimic this behavior here).

Aren't there better APIs to have hypervisors pass data to a kernel?

Regards,

Phil.


  prom_set(prom_buf, prom_index++, NULL);
  
  rom_add_blob_fixed("prom", prom_buf, prom_size, ENVP_PADDR);

+qemu_register_reset(reinitialize_rng_seed,
+memmem(rom_ptr(ENVP_PADDR, prom_size), prom_size,
+   rng_seed_hex, sizeof(rng_seed_hex)));
  
  g_free(prom_buf);

  return kernel_entry;

[PULL 1/8] hw/virtio/vhost-shadow-virtqueue: Silence GCC error "maybe-uninitialized"

2022-10-03 Thread Laurent Vivier

From: Bernhard Beschow 

GCC issues a false positive warning, resulting in build failure with -Werror:

  In file included from /usr/include/glib-2.0/glib.h:114,
   from src/include/glib-compat.h:32,
   from src/include/qemu/osdep.h:144,
   from ../src/hw/virtio/vhost-shadow-virtqueue.c:10:
  In function ‘g_autoptr_cleanup_generic_gfree’,
  inlined from ‘vhost_handle_guest_kick’ at 
../src/hw/virtio/vhost-shadow-virtqueue.c:292:42:
  /usr/include/glib-2.0/glib/glib-autocleanups.h:28:3: error: ‘elem’ may be 
used uninitialized [-Werror=maybe-uninitialized]
 28 |   g_free (*pp);
|   ^~~~
  ../src/hw/virtio/vhost-shadow-virtqueue.c: In function 
‘vhost_handle_guest_kick’:
  ../src/hw/virtio/vhost-shadow-virtqueue.c:292:42: note: ‘elem’ was declared 
here
292 | g_autofree VirtQueueElement *elem;
|  ^~~~
  cc1: all warnings being treated as errors

There is actually no problem since "elem" is initialized in both branches.
Silence the warning by initializing it with "NULL".
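
As a rough illustration of why the NULL initializer is harmless, here is a
standalone sketch using the GCC/Clang cleanup attribute directly instead of
glib. TOY_AUTOFREE, toy_autofree() and toy_handle() are hypothetical names for
this example only. The cleanup handler runs on every exit from the scope, and
free(NULL) is a no-op, so starting from NULL is always safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static int cleanup_calls;   /* counts invocations of the cleanup handler */

/* Cleanup handler: receives a pointer to the annotated variable. */
static void toy_autofree(void *p)
{
    assert(p);
    free(*(void **)p);      /* free(NULL) is a no-op */
    cleanup_calls++;
}

#define TOY_AUTOFREE __attribute__((cleanup(toy_autofree)))

/*
 * Both branches assign 'elem', mirroring the situation in the commit
 * message. The NULL initializer only gives the compiler a defined value
 * on every path it considers.
 */
static bool toy_handle(bool use_cached, char *cached)
{
    TOY_AUTOFREE char *elem = NULL;

    if (use_cached) {
        elem = cached;      /* take ownership of the caller's buffer */
    } else {
        elem = malloc(16);
    }

    return elem != NULL;    /* 'elem' is freed after this is evaluated */
}
```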

$ gcc --version
gcc (GCC) 12.2.0

Fixes: 9c2ab2f1ec333be8614cc12272d4b91960704dbe ("vhost: stop transfer elem 
ownership in vhost_handle_guest_kick")
Signed-off-by: Bernhard Beschow 
Reviewed-by: Richard Henderson 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20220910151117.6665-1-shen...@gmail.com>
Signed-off-by: Laurent Vivier 
---
 hw/virtio/vhost-shadow-virtqueue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index e8e5bbc368dd..596d4434d289 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -289,7 +289,7 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue 
*svq)
 virtio_queue_set_notification(svq->vq, false);
 
 while (true) {
-g_autofree VirtQueueElement *elem;
+g_autofree VirtQueueElement *elem = NULL;
 int r;
 
 if (svq->next_guest_avail_elem) {
-- 
2.37.3

Re: [PULL 0/8] Trivial branch for 7.2 patches

2022-10-03 Thread Laurent Vivier


Le 03/10/2022 à 21:04, Stefan Hajnoczi a écrit :

On Fri, 30 Sept 2022 at 16:22, Laurent Vivier  wrote:

Philippe Mathieu-Daudé via (1):
   block/qcow2-bitmap: Add missing cast to silent GCC error


Hi Laurent,
This commit uses a mailing list email, probably due to DKIM/SPF issues:
Author: Philippe Mathieu-Daudé via 

I think the policy is to reject such pull requests and fix the
authorship. Could you update your pull request and resend?



Thank you Stefan.

Normally I have a pre-publish-send-email to check that, but it didn't fail in 
the expected way...

I will re-send the PR.

Laurent

Re: [PATCH v3 4/5] hw/arm/virt: Improve high memory region address assignment

2022-10-03 Thread Gavin Shan

Hi Eric,

On 10/3/22 4:44 PM, Eric Auger wrote:

On 9/29/22 01:37, Gavin Shan wrote:

On 9/28/22 10:51 PM, Eric Auger wrote:

On 9/22/22 01:13, Gavin Shan wrote:

There are three high memory regions, which are VIRT_HIGH_REDIST2,
VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
are floating on highest RAM address. However, they can be disabled
in several cases.

(1) One specific high memory region is disabled by developer by
  toggling vms->highmem_{redists, ecam, mmio}.

(2) VIRT_HIGH_PCIE_ECAM region is disabled on machine types
  'virt-2.12' and earlier.

(3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded
  on a 32-bit system.

(4) One specific high memory region is disabled when it breaks the
  PA space limit.

The current implementation of virt_set_memmap() isn't comprehensive
because the space for one specific high memory region is always
reserved from the PA space for cases (1), (2) and (3). In the code,
'base' and 'vms->highest_gpa' are always increased for those three
cases. It's unnecessary since the assigned space of the disabled
high memory region won't be used afterwards.

This improves the address assignment for those three high memory
regions by skipping the address assignment for one specific high
memory region if it has been disabled in cases (1), (2) and (3).

Signed-off-by: Gavin Shan 
---
   hw/arm/virt.c | 44 ++--
   1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index b0b679d1f4..b702f8f2b5 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1693,15 +1693,31 @@ static void
virt_set_high_memmap(VirtMachineState *vms,
    hwaddr base, int pa_bits)
   {
   hwaddr region_base, region_size;
-    bool fits;
+    bool *region_enabled, fits;

Do you really need a pointer? If the region is unknown this is a bug in
virt code.

The pointer is needed so that we can disable the region by setting it
to 'false' at a later point. Yeah, I think you're correct that 'unknown region'
is a bug and we need to do assert(region_enabled), or something like
below.

Yeah I don't think using a pointer here is useful.

When the high memory region can't fit into the PA space, it is disabled
by toggling the corresponding flag (vms->highmem_{redists, ecam, mmio})
to false. It's part of the original implementation, as below. We either
need a 'switch ... case' or a pointer. A pointer is more convenient since
we need to check and possibly update the value.

    switch (i) {
    case VIRT_HIGH_GIC_REDIST2:
        vms->highmem_redists &= fits;
        break;
    case VIRT_HIGH_PCIE_ECAM:
        vms->highmem_ecam &= fits;
        break;
    case VIRT_HIGH_PCIE_MMIO:
        vms->highmem_mmio &= fits;
        break;
    }

   int i;
     for (i = VIRT_LOWMEMMAP_LAST; i <
ARRAY_SIZE(extended_memmap); i++) {
   region_base = ROUND_UP(base, extended_memmap[i].size);
   region_size = extended_memmap[i].size;
   -    vms->memmap[i].base = region_base;
-    vms->memmap[i].size = region_size;
+    switch (i) {
+    case VIRT_HIGH_GIC_REDIST2:
+    region_enabled = &vms->highmem_redists;
+    break;
+    case VIRT_HIGH_PCIE_ECAM:
+    region_enabled = &vms->highmem_ecam;
+    break;
+    case VIRT_HIGH_PCIE_MMIO:
+    region_enabled = &vms->highmem_mmio;
+    break;

While we are at it I would change the vms fields dealing with those
highmem regions and turn those fields into an array of bool indexed
using i - VIRT_LOWMEMMAP_LAST (using a macro or something similar). We
would then no longer need this switch, which is now duplicated.

It makes sense to me. How about having something like the below in v4?

static inline bool *virt_get_high_memmap_enabled(VirtMachineState
*vms, int index)
{
     bool *enabled_array[] = {
         &vms->highmem_redists,
         &vms->highmem_ecam,
         &vms->highmem_mmio,
     };

     assert(index - VIRT_LOWMEMMAP_LAST < ARRAY_SIZE(enabled_array));

     return enabled_array[index - VIRT_LOWMEMMAP_LAST];
}

I was rather thinking of directly using a vms->highmem_flags[] but your
proposal may work as well.

Ok. I will use my proposed change in the next revision.
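
For comparison, the vms->highmem_flags[] alternative mentioned above could look roughly like the sketch below. It is only an illustration: the Sketch* and SKETCH_* names are hypothetical stand-ins, the enum values are made up, and the layout (the first high region sharing the value of the "last low region" marker) is an assumption about how the real enum is arranged.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical region indices; the numeric values are illustrative only. */
enum {
    SKETCH_LOWMEMMAP_LAST = 10,
    SKETCH_HIGH_GIC_REDIST2 = SKETCH_LOWMEMMAP_LAST,
    SKETCH_HIGH_PCIE_ECAM,
    SKETCH_HIGH_PCIE_MMIO,
    SKETCH_HIGH_END
};

#define SKETCH_HIGH_COUNT (SKETCH_HIGH_END - SKETCH_LOWMEMMAP_LAST)

/* Hypothetical machine state: one flag per high memory region. */
typedef struct {
    bool highmem_flags[SKETCH_HIGH_COUNT];
} SketchMachineState;

/*
 * Return a pointer to the enable flag of the high region 'index'.
 * An index outside the high region range is treated as a bug.
 */
static bool *sketch_high_enabled(SketchMachineState *s, int index)
{
    assert(index >= SKETCH_LOWMEMMAP_LAST && index < SKETCH_HIGH_END);
    return &s->highmem_flags[index - SKETCH_LOWMEMMAP_LAST];
}
```

A caller would then write something like *sketch_high_enabled(&state, i) &= fits; inside the loop, with no switch at either site. The trade-off against the pointer-table helper is that the existing named fields would have to be converted to array elements everywhere they are used.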

+    default:
+    region_enabled = NULL;
+    }
+
+    /* Skip unknown region */
+    if (!region_enabled) {
+    continue;
+    }
     /*
    * Check each device to see if they fit in the PA space,
@@ -1710,23 +1726,15 @@ static void
virt_set_high_memmap(VirtMachineState *vms,
    * For each device that doesn't fit, disable it.
    */
   fits = (region_base + region_size) <= BIT_ULL(pa_bits);
-    if (fits) {
-    vms->highest_gpa = region_base + region_size - 1;
-    }
+    if (*region_enabled && fits) {
+    vms->memmap[i].base = region_base;
+
