from:"Harsh Prateek Bora"

Re: [PATCH v4 01/11] ppc: Add Power11 DD2.0 processor

2024-05-30 Thread Harsh Prateek Bora





On 5/30/24 12:18, Aditya Gupta wrote:

Hello Harsh,

On Thu, May 30, 2024 at 10:57:31AM GMT, Harsh Prateek Bora wrote:

Hi Aditya,

On 5/28/24 12:35, Aditya Gupta wrote:

Add CPU target code to add support for new Power11 Processor.

Power11 core is same as Power10, hence reuse functions defined for
Power10.

Cc: Cédric Le Goater 
Cc: Daniel Henrique Barboza 
Cc: Frédéric Barrat 
Cc: Mahesh J Salgaonkar 
Cc: Madhavan Srinivasan 
Cc: Nicholas Piggin 
Signed-off-by: Aditya Gupta 
---
   target/ppc/compat.c |   7 +++
   target/ppc/cpu-models.c |   3 ++
   target/ppc/cpu-models.h |   3 ++
   target/ppc/cpu_init.c   | 102 
   4 files changed, 115 insertions(+)

diff --git a/target/ppc/compat.c b/target/ppc/compat.c
index ebef2cccecf3..12dd8ae290ca 100644
--- a/target/ppc/compat.c
+++ b/target/ppc/compat.c
@@ -100,6 +100,13 @@ static const CompatInfo compat_table[] = {
   .pcr_level = PCR_COMPAT_3_10,
   .max_vthreads = 8,
   },
+{ /* POWER11, ISA3.10 */
+.name = "power11",
+.pvr = CPU_POWERPC_LOGICAL_3_10_PLUS,
+.pcr = PCR_COMPAT_3_10,
+.pcr_level = PCR_COMPAT_3_10,
+.max_vthreads = 8,
+},
   };
   static const CompatInfo *compat_by_pvr(uint32_t pvr)
diff --git a/target/ppc/cpu-models.c b/target/ppc/cpu-models.c
index f2301b43f78b..ece348178188 100644
--- a/target/ppc/cpu-models.c
+++ b/target/ppc/cpu-models.c
@@ -734,6 +734,8 @@
   "POWER9 v2.2")
   POWERPC_DEF("power10_v2.0",  CPU_POWERPC_POWER10_DD20,   POWER10,
   "POWER10 v2.0")
+POWERPC_DEF("power11_v2.0",  CPU_POWERPC_POWER11_DD20,   POWER11,
+"POWER11_v2.0")
   #endif /* defined (TARGET_PPC64) */
   /***/
@@ -909,6 +911,7 @@ PowerPCCPUAlias ppc_cpu_aliases[] = {
   { "power8nvl", "power8nvl_v1.0" },
   { "power9", "power9_v2.2" },
   { "power10", "power10_v2.0" },
+{ "power11", "power11_v2.0" },
   #endif
   /* Generic PowerPCs */
diff --git a/target/ppc/cpu-models.h b/target/ppc/cpu-models.h
index 0229ef3a9a5c..ef74e387b047 100644
--- a/target/ppc/cpu-models.h
+++ b/target/ppc/cpu-models.h
@@ -354,6 +354,8 @@ enum {
   CPU_POWERPC_POWER10_BASE   = 0x00800000,
   CPU_POWERPC_POWER10_DD1= 0x00801100,
   CPU_POWERPC_POWER10_DD20   = 0x00801200,
+CPU_POWERPC_POWER11_BASE   = 0x00820000,
+CPU_POWERPC_POWER11_DD20   = 0x00821200,
   CPU_POWERPC_970_v22= 0x00390202,
   CPU_POWERPC_970FX_v10  = 0x00391100,
   CPU_POWERPC_970FX_v20  = 0x003C0200,
@@ -391,6 +393,7 @@ enum {
   CPU_POWERPC_LOGICAL_2_07   = 0x0F000004,
   CPU_POWERPC_LOGICAL_3_00   = 0x0F000005,
   CPU_POWERPC_LOGICAL_3_10   = 0x0F000006,
+CPU_POWERPC_LOGICAL_3_10_PLUS  = 0x0F000007,
   };
   /* System version register (used on MPC 8xxx) */
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 01e358a4a5ac..82d700382cdd 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -6763,6 +6763,108 @@ POWERPC_FAMILY(POWER10)(ObjectClass *oc, void *data)
   pcc->l1_icache_size = 0x8000;
   }
+static bool ppc_pvr_match_power11(PowerPCCPUClass *pcc, uint32_t pvr, bool best)
+{
+uint32_t base = pvr & CPU_POWERPC_POWER_SERVER_MASK;
+uint32_t pcc_base = pcc->pvr & CPU_POWERPC_POWER_SERVER_MASK;
+
+if (!best && (base == CPU_POWERPC_POWER11_BASE)) {


Also, this helper is almost the same as the Power10 one, except for the base
check against the respective value. The entire logic can be shared by
passing the respective base value to a lower-level routine that takes it
as an argument. Let's avoid code duplication by sharing as much as
possible.
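
Something along these lines, as a rough sketch only. The name
ppc_pvr_match_power_server() and its family_base parameter are hypothetical,
and the struct and constants are minimal stand-ins (with assumed values) so
that the snippet compiles outside QEMU:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the QEMU definitions; the values are assumptions. */
#define CPU_POWERPC_POWER_SERVER_MASK 0xFFFF0000u
#define CPU_POWERPC_POWER10_BASE      0x00800000u
#define CPU_POWERPC_POWER11_BASE      0x00820000u

typedef struct PowerPCCPUClass {
    uint32_t pvr;
} PowerPCCPUClass;

/* Hypothetical shared routine: the caller passes in the family base. */
static bool ppc_pvr_match_power_server(PowerPCCPUClass *pcc, uint32_t pvr,
                                       bool best, uint32_t family_base)
{
    uint32_t base = pvr & CPU_POWERPC_POWER_SERVER_MASK;
    uint32_t pcc_base = pcc->pvr & CPU_POWERPC_POWER_SERVER_MASK;

    /* A non-"best" match only needs the family base to agree. */
    if (!best && (base == family_base)) {
        return true;
    }

    if (base != pcc_base) {
        return false;
    }

    /* Compare the 0x0f00 field of the PVR, as the quoted helper does. */
    return (pvr & 0x0f00) == (pcc->pvr & 0x0f00);
}

/* Thin per-family wrappers keep the existing pvr_match signature. */
static bool ppc_pvr_match_power10(PowerPCCPUClass *pcc, uint32_t pvr,
                                  bool best)
{
    return ppc_pvr_match_power_server(pcc, pvr, best,
                                      CPU_POWERPC_POWER10_BASE);
}

static bool ppc_pvr_match_power11(PowerPCCPUClass *pcc, uint32_t pvr,
                                  bool best)
{
    return ppc_pvr_match_power_server(pcc, pvr, best,
                                      CPU_POWERPC_POWER11_BASE);
}
```

The wrappers are kept so that pcc->pvr_match can still point at a function
with the existing three-argument signature.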


Thanks
Harsh


+return true;
+}
+
+if (base != pcc_base) {
+return false;
+}
+
+if ((pvr & 0x0f00) == (pcc->pvr & 0x0f00)) {
+return true;
+}
+
+return false;
+}
+
+POWERPC_FAMILY(POWER11)(ObjectClass *oc, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(oc);
+PowerPCCPUClass *pcc = POWERPC_CPU_CLASS(oc);
+
+dc->fw_name = "PowerPC,POWER11";
+dc->desc = "POWER11";
+pcc->pvr_match = ppc_pvr_match_power11;
+pcc->pcr_mask = PCR_COMPAT_2_05 | PCR_COMPAT_2_06 | PCR_COMPAT_2_07 |
+PCR_COMPAT_3_00;
+pcc->pcr_supported = PCR_COMPAT_3_10 | PCR_COMPAT_3_00 | PCR_COMPAT_2_07 |
+ PCR_COMPAT_2_06 | PCR_COMPAT_2_05;
+pcc->init_proc = init_proc_POWER10;
+pcc->check_pow = check_pow_nocheck;
+pcc->insns_flags = PPC_INSNS_BASE | PPC_ISEL | PPC_STRING | PPC_MFTB |

Re: [PATCH 1/2] hw/acpi: Remove the deprecated QAPI MEM_UNPLUG_ERROR event

2024-05-30 Thread Harsh Prateek Bora





On 5/30/24 12:45, Philippe Mathieu-Daudé wrote:

The MEM_UNPLUG_ERROR event is deprecated since commit d43f1670c7
("qapi/qdev.json: add DEVICE_UNPLUG_GUEST_ERROR QAPI event"),
time to remove it.

Signed-off-by: Philippe Mathieu-Daudé 
---
  docs/about/deprecated.rst   |  5 -
  docs/about/removed-features.rst |  9 +
  qapi/machine.json   | 28 
  hw/acpi/memory_hotplug.c|  8 
  hw/ppc/spapr.c  | 11 +--
  5 files changed, 10 insertions(+), 51 deletions(-)



For spapr:
Reviewed-by: Harsh Prateek Bora 


diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 40585ca7d5..4a61894db6 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -151,11 +151,6 @@ property types.
  QEMU Machine Protocol (QMP) events
  --
  
-``MEM_UNPLUG_ERROR`` (since 6.2)

-
-
-Use the more generic event ``DEVICE_UNPLUG_GUEST_ERROR`` instead.
-
  ``vcpu`` trace events (since 8.1)
  '
  
diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst

index fba0cfb0b0..f1e70263e2 100644
--- a/docs/about/removed-features.rst
+++ b/docs/about/removed-features.rst
@@ -671,6 +671,15 @@ Use ``multifd-channels`` instead.
  
  Use ``multifd-compression`` instead.
  
+QEMU Machine Protocol (QMP) events

+--
+
+``MEM_UNPLUG_ERROR`` (removed in 9.1)
+'
+
+MEM_UNPLUG_ERROR has been replaced by the more generic ``DEVICE_UNPLUG_GUEST_ERROR`` event.
+
+
  Human Monitor Protocol (HMP) commands
  -
  
diff --git a/qapi/machine.json b/qapi/machine.json

index bce6e1bbc4..453feb9347 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -1607,34 +1607,6 @@
  { 'event': 'MEMORY_DEVICE_SIZE_CHANGE',
'data': { '*id': 'str', 'size': 'size', 'qom-path' : 'str'} }
  
-##

-# @MEM_UNPLUG_ERROR:
-#
-# Emitted when memory hot unplug error occurs.
-#
-# @device: device name
-#
-# @msg: Informative message
-#
-# Features:
-#
-# @deprecated: This event is deprecated.  Use
-# @DEVICE_UNPLUG_GUEST_ERROR instead.
-#
-# Since: 2.4
-#
-# Example:
-#
-# <- { "event": "MEM_UNPLUG_ERROR",
-#  "data": { "device": "dimm1",
-#"msg": "acpi: device unplug for unsupported device"
-#  },
-#  "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
-##
-{ 'event': 'MEM_UNPLUG_ERROR',
-  'data': { 'device': 'str', 'msg': 'str' },
-  'features': ['deprecated'] }
-
  ##
  # @BootConfiguration:
  #
diff --git a/hw/acpi/memory_hotplug.c b/hw/acpi/memory_hotplug.c
index de6f974ebb..9b974b7274 100644
--- a/hw/acpi/memory_hotplug.c
+++ b/hw/acpi/memory_hotplug.c
@@ -178,14 +178,6 @@ static void acpi_memory_hotplug_write(void *opaque, hwaddr addr, uint64_t data,
  hotplug_handler_unplug(hotplug_ctrl, dev, &local_err);
  if (local_err) {
  trace_mhp_acpi_pc_dimm_delete_failed(mem_st->selector);
-
-/*
- * Send both MEM_UNPLUG_ERROR and DEVICE_UNPLUG_GUEST_ERROR
- * while the deprecation of MEM_UNPLUG_ERROR is
- * pending.
- */
-qapi_event_send_mem_unplug_error(dev->id ? : "",
- error_get_pretty(local_err));
  qapi_event_send_device_unplug_guest_error(dev->id,

dev->canonical_path);
  error_free(local_err);
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 4345764bce..81a187f126 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3786,7 +3786,6 @@ void spapr_memory_unplug_rollback(SpaprMachineState 
*spapr, DeviceState *dev)
  SpaprDrc *drc;
  uint32_t nr_lmbs;
  uint64_t size, addr_start, addr;
-g_autofree char *qapi_error = NULL;
  int i;
  
  if (!dev) {

@@ -3823,16 +3822,8 @@ void spapr_memory_unplug_rollback(SpaprMachineState 
*spapr, DeviceState *dev)
  
  /*

   * Tell QAPI that something happened and the memory
- * hotunplug wasn't successful. Keep sending
- * MEM_UNPLUG_ERROR even while sending
- * DEVICE_UNPLUG_GUEST_ERROR until the deprecation of
- * MEM_UNPLUG_ERROR is due.
+ * hotunplug wasn't successful.
   */
-qapi_error = g_strdup_printf("Memory hotunplug rejected by the guest "
- "for device %s", dev->id);
-
-qapi_event_send_mem_unplug_error(dev->id ? : "", qapi_error);
-
  qapi_event_send_device_unplug_guest_error(dev->id,
dev->canonical_path);
  }

Re: [PATCH v4 02/11] ppc/pseries: Add Power11 cpu type

2024-05-30 Thread Harsh Prateek Bora





On 5/28/24 12:35, Aditya Gupta wrote:

Add sPAPR CPU Core definition for Power11

Cc: David Gibson  (reviewer:sPAPR (pseries))
Cc: Harsh Prateek Bora  (reviewer:sPAPR (pseries))
Cc: Cédric Le Goater 
Cc: Daniel Henrique Barboza 
Cc: Frédéric Barrat 
Cc: Mahesh J Salgaonkar 
Cc: Madhavan Srinivasan 
Cc: Nicholas Piggin 
Signed-off-by: Aditya Gupta 
---
  docs/system/ppc/pseries.rst | 6 +++---
  hw/ppc/spapr_cpu_core.c | 1 +
  2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/system/ppc/pseries.rst b/docs/system/ppc/pseries.rst
index a876d897b6e4..3277564b34c2 100644
--- a/docs/system/ppc/pseries.rst
+++ b/docs/system/ppc/pseries.rst
@@ -15,9 +15,9 @@ Supported devices
  =
  
   * Multi processor support for many Power processors generations: POWER7,

-   POWER7+, POWER8, POWER8NVL, POWER9, and Power10. Support for POWER5+ exists,
-   but its state is unknown.
- * Interrupt Controller, XICS (POWER8) and XIVE (POWER9 and Power10)
+   POWER7+, POWER8, POWER8NVL, POWER9, Power10 and Power11. Support for POWER5+
+   exists, but its state is unknown.
+ * Interrupt Controller, XICS (POWER8) and XIVE (POWER9, Power10, Power11)


I think it would look cleaner rephrased as below:

 * Multi processor support for many Power processors generations:
   - POWER7, POWER7+
   - POWER8, POWER8NVL
   - POWER9
   - Power10
   - Power11
   - Support for POWER5+ exists, but its state is unknown.
 * Interrupt Controller
   - XICS (POWER8)
   - XIVE, supported by:
     - POWER9
     - Power10
     - Power11

That way, each new platform only needs to add one line for itself.

With that,
Reviewed-by: Harsh Prateek Bora 

Thanks
Harsh

   * vPHB PCIe Host bridge.
   * vscsi and vnet devices, compatible with the same devices available on a
 PowerVM hypervisor with VIOS managing LPARs.
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index e7c9edd033c8..62416b7e0a7e 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -401,6 +401,7 @@ static const TypeInfo spapr_cpu_core_type_infos[] = {
  DEFINE_SPAPR_CPU_CORE_TYPE("power9_v2.0"),
  DEFINE_SPAPR_CPU_CORE_TYPE("power9_v2.2"),
  DEFINE_SPAPR_CPU_CORE_TYPE("power10_v2.0"),
+DEFINE_SPAPR_CPU_CORE_TYPE("power11_v2.0"),
  #ifdef CONFIG_KVM
  DEFINE_SPAPR_CPU_CORE_TYPE("host"),
  #endif

Re: [PATCH v4 01/11] ppc: Add Power11 DD2.0 processor

2024-05-29 Thread Harsh Prateek Bora


Hi Aditya,

On 5/28/24 12:35, Aditya Gupta wrote:

Add CPU target code to add support for new Power11 Processor.

Power11 core is same as Power10, hence reuse functions defined for
Power10.

Cc: Cédric Le Goater 
Cc: Daniel Henrique Barboza 
Cc: Frédéric Barrat 
Cc: Mahesh J Salgaonkar 
Cc: Madhavan Srinivasan 
Cc: Nicholas Piggin 
Signed-off-by: Aditya Gupta 
---
  target/ppc/compat.c |   7 +++
  target/ppc/cpu-models.c |   3 ++
  target/ppc/cpu-models.h |   3 ++
  target/ppc/cpu_init.c   | 102 
  4 files changed, 115 insertions(+)

diff --git a/target/ppc/compat.c b/target/ppc/compat.c
index ebef2cccecf3..12dd8ae290ca 100644
--- a/target/ppc/compat.c
+++ b/target/ppc/compat.c
@@ -100,6 +100,13 @@ static const CompatInfo compat_table[] = {
  .pcr_level = PCR_COMPAT_3_10,
  .max_vthreads = 8,
  },
+{ /* POWER11, ISA3.10 */
+.name = "power11",
+.pvr = CPU_POWERPC_LOGICAL_3_10_PLUS,
+.pcr = PCR_COMPAT_3_10,
+.pcr_level = PCR_COMPAT_3_10,
+.max_vthreads = 8,
+},
  };
  
  static const CompatInfo *compat_by_pvr(uint32_t pvr)

diff --git a/target/ppc/cpu-models.c b/target/ppc/cpu-models.c
index f2301b43f78b..ece348178188 100644
--- a/target/ppc/cpu-models.c
+++ b/target/ppc/cpu-models.c
@@ -734,6 +734,8 @@
  "POWER9 v2.2")
  POWERPC_DEF("power10_v2.0",  CPU_POWERPC_POWER10_DD20,   POWER10,
  "POWER10 v2.0")
+POWERPC_DEF("power11_v2.0",  CPU_POWERPC_POWER11_DD20,   POWER11,
+"POWER11_v2.0")
  #endif /* defined (TARGET_PPC64) */
  
  /***/

@@ -909,6 +911,7 @@ PowerPCCPUAlias ppc_cpu_aliases[] = {
  { "power8nvl", "power8nvl_v1.0" },
  { "power9", "power9_v2.2" },
  { "power10", "power10_v2.0" },
+{ "power11", "power11_v2.0" },
  #endif
  
  /* Generic PowerPCs */

diff --git a/target/ppc/cpu-models.h b/target/ppc/cpu-models.h
index 0229ef3a9a5c..ef74e387b047 100644
--- a/target/ppc/cpu-models.h
+++ b/target/ppc/cpu-models.h
@@ -354,6 +354,8 @@ enum {
  CPU_POWERPC_POWER10_BASE   = 0x00800000,
  CPU_POWERPC_POWER10_DD1= 0x00801100,
  CPU_POWERPC_POWER10_DD20   = 0x00801200,
+CPU_POWERPC_POWER11_BASE   = 0x00820000,
+CPU_POWERPC_POWER11_DD20   = 0x00821200,
  CPU_POWERPC_970_v22= 0x00390202,
  CPU_POWERPC_970FX_v10  = 0x00391100,
  CPU_POWERPC_970FX_v20  = 0x003C0200,
@@ -391,6 +393,7 @@ enum {
  CPU_POWERPC_LOGICAL_2_07   = 0x0F000004,
  CPU_POWERPC_LOGICAL_3_00   = 0x0F000005,
  CPU_POWERPC_LOGICAL_3_10   = 0x0F000006,
+CPU_POWERPC_LOGICAL_3_10_PLUS  = 0x0F000007,
  };
  
  /* System version register (used on MPC 8xxx)*/

diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 01e358a4a5ac..82d700382cdd 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -6763,6 +6763,108 @@ POWERPC_FAMILY(POWER10)(ObjectClass *oc, void *data)
  pcc->l1_icache_size = 0x8000;
  }
  
+static bool ppc_pvr_match_power11(PowerPCCPUClass *pcc, uint32_t pvr, bool best)

+{
+uint32_t base = pvr & CPU_POWERPC_POWER_SERVER_MASK;
+uint32_t pcc_base = pcc->pvr & CPU_POWERPC_POWER_SERVER_MASK;
+
+if (!best && (base == CPU_POWERPC_POWER11_BASE)) {
+return true;
+}
+
+if (base != pcc_base) {
+return false;
+}
+
+if ((pvr & 0x0f00) == (pcc->pvr & 0x0f00)) {
+return true;
+}
+
+return false;
+}
+
+POWERPC_FAMILY(POWER11)(ObjectClass *oc, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(oc);
+PowerPCCPUClass *pcc = POWERPC_CPU_CLASS(oc);
+
+dc->fw_name = "PowerPC,POWER11";
+dc->desc = "POWER11";
+pcc->pvr_match = ppc_pvr_match_power11;
+pcc->pcr_mask = PCR_COMPAT_2_05 | PCR_COMPAT_2_06 | PCR_COMPAT_2_07 |
+PCR_COMPAT_3_00;
+pcc->pcr_supported = PCR_COMPAT_3_10 | PCR_COMPAT_3_00 | PCR_COMPAT_2_07 |
+ PCR_COMPAT_2_06 | PCR_COMPAT_2_05;
+pcc->init_proc = init_proc_POWER10;
+pcc->check_pow = check_pow_nocheck;
+pcc->insns_flags = PPC_INSNS_BASE | PPC_ISEL | PPC_STRING | PPC_MFTB |
+   PPC_FLOAT | PPC_FLOAT_FSEL | PPC_FLOAT_FRES |
+   PPC_FLOAT_FSQRT | PPC_FLOAT_FRSQRTE |
+   PPC_FLOAT_FRSQRTES |
+   PPC_FLOAT_STFIWX |
+   PPC_FLOAT_EXT |
+   PPC_CACHE | PPC_CACHE_ICBI | PPC_CACHE_DCBZ |
+   PPC_MEM_SYNC | PPC_MEM_EIEIO |
+   PPC_MEM_TLBIE | PPC_MEM_TLBSYNC |
+   PPC_64B | PPC_64H | PPC_64BX | PPC_ALTIVEC |
+   PPC_SEGMENT_64B | PPC_SLBI |
+   PPC_POPCNTB | PPC_POPCNTWD |
+   PPC_CILDST;
+pcc->insns_flags2 = PPC2_VSX | PPC2_VSX207 |

Re: [PATCH V12 1/8] accel/kvm: Extract common KVM vCPU {creation,parking} code

2024-05-29 Thread Harsh Prateek Bora





On 5/30/24 05:12, Salil Mehta wrote:

KVM vCPU creation is done once during the vCPU realization when Qemu vCPU thread
is spawned. This is common to all the architectures as of now.

Hot-unplug of vCPU results in destruction of the vCPU object in QOM but the
corresponding KVM vCPU object in the Host KVM is not destroyed as KVM doesn't
support vCPU removal. Therefore, its representative KVM vCPU object/context in
Qemu is parked.

Refactor architecture common logic so that some APIs could be reused by vCPU
Hotplug code of some architectures likes ARM, Loongson etc. Update new/old APIs
with trace events. No functional change is intended here.

Signed-off-by: Salil Mehta 
Reviewed-by: Gavin Shan 
Tested-by: Vishnu Pajjuri 
Reviewed-by: Jonathan Cameron 
Tested-by: Xianglai Li 
Tested-by: Miguel Luis 
Reviewed-by: Shaoqin Huang 
Reviewed-by: Vishnu Pajjuri 
Reviewed-by: Nicholas Piggin 
Tested-by: Zhao Liu 
Reviewed-by: Zhao Liu 
---
  accel/kvm/kvm-all.c| 95 --
  accel/kvm/kvm-cpus.h   | 23 ++
  accel/kvm/trace-events |  5 ++-
  3 files changed, 90 insertions(+), 33 deletions(-)



Since no functional changes are intended here and we have a
different patch series (ppc64 vcpu hotplug failure fixes) that depends on
this patch as well, it would be nice to see this patch merged
soon.

Reviewed-by: Harsh Prateek Bora 


diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index c0be9f5eed..8f9128bb92 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -340,14 +340,71 @@ err:
  return ret;
  }
  
+void kvm_park_vcpu(CPUState *cpu)

+{
+struct KVMParkedVcpu *vcpu;
+
+trace_kvm_park_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
+
+vcpu = g_malloc0(sizeof(*vcpu));
+vcpu->vcpu_id = kvm_arch_vcpu_id(cpu);
+vcpu->kvm_fd = cpu->kvm_fd;
+QLIST_INSERT_HEAD(&kvm_state->kvm_parked_vcpus, vcpu, node);
+}
+
+int kvm_unpark_vcpu(KVMState *s, unsigned long vcpu_id)
+{
+struct KVMParkedVcpu *cpu;
+int kvm_fd = -ENOENT;
+
+QLIST_FOREACH(cpu, &s->kvm_parked_vcpus, node) {
+if (cpu->vcpu_id == vcpu_id) {
+QLIST_REMOVE(cpu, node);
+kvm_fd = cpu->kvm_fd;
+g_free(cpu);
+}
+}
+
+trace_kvm_unpark_vcpu(vcpu_id, kvm_fd > 0 ? "unparked" : "not found parked");
+
+return kvm_fd;
+}
+
+int kvm_create_vcpu(CPUState *cpu)
+{
+unsigned long vcpu_id = kvm_arch_vcpu_id(cpu);
+KVMState *s = kvm_state;
+int kvm_fd;
+
+/* check if the KVM vCPU already exist but is parked */
+kvm_fd = kvm_unpark_vcpu(s, vcpu_id);
+if (kvm_fd < 0) {
+/* vCPU not parked: create a new KVM vCPU */
+kvm_fd = kvm_vm_ioctl(s, KVM_CREATE_VCPU, vcpu_id);
+if (kvm_fd < 0) {
+error_report("KVM_CREATE_VCPU IOCTL failed for vCPU %lu", vcpu_id);
+return kvm_fd;
+}
+}
+
+cpu->kvm_fd = kvm_fd;
+cpu->kvm_state = s;
+cpu->vcpu_dirty = true;
+cpu->dirty_pages = 0;
+cpu->throttle_us_per_full = 0;
+
+trace_kvm_create_vcpu(cpu->cpu_index, vcpu_id, kvm_fd);
+
+return 0;
+}
+
  static int do_kvm_destroy_vcpu(CPUState *cpu)
  {
  KVMState *s = kvm_state;
  long mmap_size;
-struct KVMParkedVcpu *vcpu = NULL;
  int ret = 0;
  
-trace_kvm_destroy_vcpu();

+trace_kvm_destroy_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
  
  ret = kvm_arch_destroy_vcpu(cpu);

  if (ret < 0) {
@@ -373,10 +430,7 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
  }
  }
  
-vcpu = g_malloc0(sizeof(*vcpu));

-vcpu->vcpu_id = kvm_arch_vcpu_id(cpu);
-vcpu->kvm_fd = cpu->kvm_fd;
-QLIST_INSERT_HEAD(&kvm_state->kvm_parked_vcpus, vcpu, node);
+kvm_park_vcpu(cpu);
  err:
  return ret;
  }
@@ -389,24 +443,6 @@ void kvm_destroy_vcpu(CPUState *cpu)
  }
  }
  
-static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id)

-{
-struct KVMParkedVcpu *cpu;
-
-QLIST_FOREACH(cpu, &s->kvm_parked_vcpus, node) {
-if (cpu->vcpu_id == vcpu_id) {
-int kvm_fd;
-
-QLIST_REMOVE(cpu, node);
-kvm_fd = cpu->kvm_fd;
-g_free(cpu);
-return kvm_fd;
-}
-}
-
-return kvm_vm_ioctl(s, KVM_CREATE_VCPU, (void *)vcpu_id);
-}
-
  int kvm_init_vcpu(CPUState *cpu, Error **errp)
  {
  KVMState *s = kvm_state;
@@ -415,19 +451,14 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
  
  trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
  
-ret = kvm_get_vcpu(s, kvm_arch_vcpu_id(cpu));

+ret = kvm_create_vcpu(cpu);
  if (ret < 0) {
-error_setg_errno(errp, -ret, "kvm_init_vcpu: kvm_get_vcpu failed (%lu)",
+error_setg_errno(errp, -ret,
+ "kvm_init_vcpu: kvm_create_vcpu failed (%lu)",
   kvm_arch_vcpu_id(cpu));

Re: [RFC PATCH 08/10] ppc/pnv: Invert the design for big-core machine modelling

2024-05-29 Thread Harsh Prateek Bora


Hi Nick,

On 5/26/24 17:56, Nicholas Piggin wrote:

POWER9 and POWER10 machines come in two variants, "big-core" and
"small-core".

Big core machines are SMT8 from the software point of view, but in the
low level platform topology ("xscom registers and pervasive
addressing"), these look more like a pair of small cores ganged
together.

Presently, the way this is modelled is to create an SMT8 PnvCore and
add special cases to xscom and pervasive for big-core mode. This is
becoming too complicated to manage as more of the machine is modelled.
The better approach looks like the inverse, which is creating 2xPnvCore
ganging them together to look like an SMT8 core in TCG. The TCG SMT code
is quite simple to do that, and then the xscom and pervasive modelling
does not need to differentiate big and small core modes for the most
part.

device-tree building does need a special case to only build one
CPU node for each big-core because that's what the firmware expects.
And so does a special case workaround in the ChipTOD model.

A big-core machine option is added for powernv9 and 10 machines.

Signed-off-by: Nicholas Piggin 
---
  include/hw/ppc/pnv.h |   3 +
  include/hw/ppc/pnv_core.h|   8 ++
  target/ppc/cpu.h |   4 +-
  hw/ppc/pnv.c | 183 ---
  hw/ppc/pnv_core.c|  20 +++-
  hw/ppc/spapr_cpu_core.c  |   6 +-
  target/ppc/misc_helper.c |   6 +-
  target/ppc/timebase_helper.c |   9 ++
  8 files changed, 197 insertions(+), 42 deletions(-)

diff --git a/include/hw/ppc/pnv.h b/include/hw/ppc/pnv.h
index 476b136146..93ecb062b4 100644
--- a/include/hw/ppc/pnv.h
+++ b/include/hw/ppc/pnv.h
@@ -100,6 +100,9 @@ struct PnvMachineState {
  PnvPnor  *pnor;
  
  hwaddr   fw_load_addr;

+
+bool big_core;
+bool big_core_tbst_quirk;
  };
  
  PnvChip *pnv_get_chip(PnvMachineState *pnv, uint32_t chip_id);

diff --git a/include/hw/ppc/pnv_core.h b/include/hw/ppc/pnv_core.h
index 21297262c1..39f8f33e6c 100644
--- a/include/hw/ppc/pnv_core.h
+++ b/include/hw/ppc/pnv_core.h
@@ -27,6 +27,13 @@
  
  /* ChipTOD and TimeBase State Machine */

  struct pnv_tod_tbst {
+/*
+ * POWER10 DD2.0 - big core TFMR drives the state machine on the even
+ * small core. Skiboot has a workaround that targets the even small core
+ * for CHIPTOD_TO_TB ops.
+ */
+bool big_core_quirk;
+
  int tb_ready_for_tod; /* core TB ready to receive TOD from chiptod */
  int tod_sent_to_tb;   /* chiptod sent TOD to the core TB */
  
@@ -49,6 +56,7 @@ struct PnvCore {
  
  /*< public >*/

  PowerPCCPU **threads;
+bool big_core;
  uint32_t pir;
  uint32_t hwid;
  uint64_t hrmor;
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 8fd6ade471..de15e38af8 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1248,6 +1248,7 @@ struct CPUArchState {
  int access_type;
  
  /* For SMT processors */

+int has_smt_siblings;


Should this be a bool?


  int core_index;
  
  #if !defined(CONFIG_USER_ONLY)

@@ -1276,7 +1277,6 @@ struct CPUArchState {
  uint32_t tlb_need_flush; /* Delayed flush needed */
  #define TLB_NEED_LOCAL_FLUSH   0x1
  #define TLB_NEED_GLOBAL_FLUSH  0x2
-
  #endif
  
  /* Other registers */

@@ -1407,7 +1407,7 @@ struct CPUArchState {
  };
  
  #define PPC_CPU_HAS_CORE_SIBLINGS(cs)   \

-(cs->nr_threads > 1)
+(POWERPC_CPU(cs)->env.has_smt_siblings)
  
  #define PPC_CPU_HAS_LPAR_SIBLINGS(cs)   \

  ((POWERPC_CPU(cs)->env.flags & POWERPC_FLAG_SMT_1LPAR) &&   \
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 7d062ec16c..5364c55bbb 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -142,7 +142,7 @@ static int pnv_dt_core(PnvChip *chip, PnvCore *pc, void *fdt)
  CPUPPCState *env = &cpu->env;
  PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cs);
  PnvChipClass *pnv_cc = PNV_CHIP_GET_CLASS(chip);
-g_autofree uint32_t *servers_prop = g_new(uint32_t, smt_threads);
+uint32_t *servers_prop;
  int i;
  uint32_t pir, tir;
  uint32_t segs[] = {cpu_to_be32(28), cpu_to_be32(40),
@@ -157,6 +157,14 @@ static int pnv_dt_core(PnvChip *chip, PnvCore *pc, void *fdt)
  
  pnv_cc->processor_id(chip, pc->hwid, 0, &pir, &tir);
  
+/* Only one DT node per (big) core */

+if (tir != 0) {
+g_assert(pc->big_core);
+g_assert(tir == 1);
+g_assert(pc->hwid & 1);
+return -1;
+}
+
  nodename = g_strdup_printf("%s@%x", dc->fw_name, pir);
  offset = fdt_add_subnode(fdt, cpus_offset, nodename);
  _FDT(offset);
@@ -236,12 +244,28 @@ static int pnv_dt_core(PnvChip *chip, PnvCore *pc, void 
*fdt)
  }
  
  /* Build interrupt servers properties */

-for (i = 0; i < smt_threads; i++) {
-pnv_cc->processor_id(chip, pc->hwid, i, &pir, &tir);
-servers_prop[i] = cpu_to_be32(pir);
+if (pc->big_core) {
+servers_prop = g_new(uint32_t,

Re: [RFC PATCH 07/10] target/ppc: Add helpers to check for SMT sibling threads

2024-05-28 Thread Harsh Prateek Bora





On 5/26/24 17:56, Nicholas Piggin wrote:

Add helpers for TCG code to determine if there are SMT siblings
sharing per-core and per-lpar registers. This simplifies the
callers and makes SMT register topology simpler to modify with
later changes.

Signed-off-by: Nicholas Piggin 
---
  target/ppc/cpu.h |  7 +++
  target/ppc/cpu_init.c|  2 +-
  target/ppc/excp_helper.c | 16 +++-
  target/ppc/misc_helper.c | 27 ++-
  target/ppc/timebase_helper.c | 20 +++-
  5 files changed, 28 insertions(+), 44 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 9a89083932..8fd6ade471 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1406,6 +1406,13 @@ struct CPUArchState {
  uint64_t pmu_base_time;
  };
  
+#define PPC_CPU_HAS_CORE_SIBLINGS(cs)   \

+(cs->nr_threads > 1)
+
+#define PPC_CPU_HAS_LPAR_SIBLINGS(cs)   \
+((POWERPC_CPU(cs)->env.flags & POWERPC_FLAG_SMT_1LPAR) &&   \
+ PPC_CPU_HAS_CORE_SIBLINGS(cs))
+
  #define _CORE_ID(cs)\
  (POWERPC_CPU(cs)->env.core_index)
  
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c

index ae483e20c4..e71ee008ed 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -6975,7 +6975,7 @@ static void ppc_cpu_realize(DeviceState *dev, Error 
**errp)
  
  pcc->parent_realize(dev, errp);
  
-if (env_cpu(env)->nr_threads > 1) {

+if (PPC_CPU_HAS_CORE_SIBLINGS(cs)) {
  env->flags |= POWERPC_FLAG_SMT;
  }
  
diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c

index 0cd542675f..fd45da0f2b 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -3029,7 +3029,7 @@ void helper_book3s_msgsnd(CPUPPCState *env, target_ulong 
rb)
  brdcast = true;
  }
  
-if (cs->nr_threads == 1 || !brdcast) {

+if (!PPC_CPU_HAS_CORE_SIBLINGS(cs) || !brdcast) {


Since this macro is also used in negated form in several places below,
we may want to introduce another macro, PPC_CPU_HAS_SINGLE_CORE,
which checks only for nr_threads == 1. Anyway,

Reviewed-by: Harsh Prateek Bora 
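
For illustration, the suggested macro could sit next to the one from the
patch, roughly as below. This is a sketch only: PPC_CPU_HAS_SINGLE_CORE is
the name proposed above and does not exist in the tree, CPUState is reduced
to the one field used, and msgsnd_targets_self_only() is a made-up caller:

```c
#include <stdbool.h>

/* Minimal stand-in for QEMU's CPUState; only the field used here. */
typedef struct CPUState {
    int nr_threads;
} CPUState;

/* As introduced by the quoted patch. */
#define PPC_CPU_HAS_CORE_SIBLINGS(cs) \
    ((cs)->nr_threads > 1)

/* Hypothetical complement suggested in this review. */
#define PPC_CPU_HAS_SINGLE_CORE(cs) \
    ((cs)->nr_threads == 1)

/*
 * Made-up example caller mirroring the msgsnd check: the doorbell stays
 * on the sending thread when there are no siblings or no broadcast.
 */
static bool msgsnd_targets_self_only(const CPUState *cs, bool brdcast)
{
    return PPC_CPU_HAS_SINGLE_CORE(cs) || !brdcast;
}
```

For nr_threads >= 1 the two macros are exact complements, so the new one
only saves the negation at the call sites.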



  ppc_set_irq(cpu, PPC_INTERRUPT_HDOORBELL, 1);
  return;
  }
@@ -3067,21 +3067,19 @@ void helper_book3s_msgsndp(CPUPPCState *env, 
target_ulong rb)
  CPUState *cs = env_cpu(env);
  PowerPCCPU *cpu = env_archcpu(env);
  CPUState *ccs;
-uint32_t nr_threads = cs->nr_threads;
  int ttir = rb & PPC_BITMASK(57, 63);
  
  helper_hfscr_facility_check(env, HFSCR_MSGP, "msgsndp", HFSCR_IC_MSGP);
  
-if (!(env->flags & POWERPC_FLAG_SMT_1LPAR)) {

-nr_threads = 1; /* msgsndp behaves as 1-thread in LPAR-per-thread mode*/
-}
-
-if (!dbell_type_server(rb) || ttir >= nr_threads) {
+if (!dbell_type_server(rb)) {
  return;
  }
  
-if (nr_threads == 1) {

-ppc_set_irq(cpu, PPC_INTERRUPT_DOORBELL, 1);
+/* msgsndp behaves as 1-thread in LPAR-per-thread mode*/
+if (!PPC_CPU_HAS_LPAR_SIBLINGS(cs)) {
+if (ttir == 0) {
+ppc_set_irq(cpu, PPC_INTERRUPT_DOORBELL, 1);
+}
  return;
  }
  
diff --git a/target/ppc/misc_helper.c b/target/ppc/misc_helper.c

index 46ba3a5584..598c956cdd 100644
--- a/target/ppc/misc_helper.c
+++ b/target/ppc/misc_helper.c
@@ -49,9 +49,8 @@ void helper_spr_core_write_generic(CPUPPCState *env, uint32_t 
sprn,
  {
  CPUState *cs = env_cpu(env);
  CPUState *ccs;
-uint32_t nr_threads = cs->nr_threads;
  
-if (nr_threads == 1) {

+if (!PPC_CPU_HAS_CORE_SIBLINGS(cs)) {
  env->spr[sprn] = val;
  return;
  }
@@ -196,7 +195,7 @@ void helper_store_ptcr(CPUPPCState *env, target_ulong val)
  return;
  }
  
-if (cs->nr_threads == 1 || !(env->flags & POWERPC_FLAG_SMT_1LPAR)) {

+if (!PPC_CPU_HAS_LPAR_SIBLINGS(cs)) {
  env->spr[SPR_PTCR] = val;
  tlb_flush(cs);
  } else {
@@ -243,16 +242,12 @@ target_ulong helper_load_dpdes(CPUPPCState *env)
  {
  CPUState *cs = env_cpu(env);
  CPUState *ccs;
-uint32_t nr_threads = cs->nr_threads;
  target_ulong dpdes = 0;
  
  helper_hfscr_facility_check(env, HFSCR_MSGP, "load DPDES", HFSCR_IC_MSGP);
  
-if (!(env->flags & POWERPC_FLAG_SMT_1LPAR)) {

-nr_threads = 1; /* DPDES behaves as 1-thread in LPAR-per-thread mode */
-}
-
-if (nr_threads == 1) {
+/* DPDES behaves as 1-thread in LPAR-per-thread mode */
+if (!PPC_CPU_HAS_LPAR_SIBLINGS(cs)) {
  if (env->pending_interrupts & PPC_INTERRUPT_DOORBELL) {
  dpdes = 1;
  }
@@ -279,21 +274,11 @@ void helper_store_dpdes(CPUPPCState *env, target_ulong 
val)
  PowerPCCPU *cpu = env_archcpu(env);
  CPUState *cs = env_cp

Re: [RFC PATCH 06/10] ppc: Add a core_index to CPUPPCState for SMT vCPUs

2024-05-28 Thread Harsh Prateek Bora


corrected typo, it's bitwise.

On 5/28/24 14:18, Harsh Prateek Bora wrote:
-    (POWERPC_CPU(cs)->env.spr_cb[SPR_PIR].default_value & 
~(cs->nr_threads - 1))

+    (POWERPC_CPU(cs)->env.core_index)


Don't we want to keep the bitwise & with ~(cs->nr_threads - 1)?
How is that taken care of?
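
To make the question concrete, here is a small standalone sketch of what the
old expression computed next to the new scheme. The helper names
core_id_from_pir() and same_core() are made up for illustration; whether
core_index ends up equal to the masked PIR on every machine is exactly what
is being asked:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Old scheme: clear the low thread bits of the PIR to get a core ID.
 * This only works when nr_threads is a power of two.
 */
static uint32_t core_id_from_pir(uint32_t pir, uint32_t nr_threads)
{
    return pir & ~(nr_threads - 1);
}

/* New scheme (per the patch): siblings simply carry the same core_index. */
static bool same_core(int core_index_a, int core_index_b)
{
    return core_index_a == core_index_b;
}
```

With nr_threads = 4, PIR values 0x0C through 0x0F all mask down to 0x0C, so
the four threads compare equal. With core_index, the realize code has to
store the same value in every sibling for the comparison to behave the same.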

Re: [RFC PATCH 06/10] ppc: Add a core_index to CPUPPCState for SMT vCPUs

2024-05-28 Thread Harsh Prateek Bora





On 5/26/24 17:56, Nicholas Piggin wrote:

The way SMT thread siblings are matched is clunky, using hard-coded
logic that checks the PIR SPR.

Change that to use a new core_index variable in the CPUPPCState,
where all siblings have the same core_index. CPU realize routines have
flexibility in setting core/sibling topology.

Signed-off-by: Nicholas Piggin 
---
  target/ppc/cpu.h| 5 -
  hw/ppc/pnv_core.c   | 2 ++
  hw/ppc/spapr_cpu_core.c | 3 +++
  3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index dac13d4dac..9a89083932 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1247,6 +1247,9 @@ struct CPUArchState {
  /* when a memory exception occurs, the access type is stored here */
  int access_type;
  
+/* For SMT processors */

+int core_index;
+
  #if !defined(CONFIG_USER_ONLY)
  /* MMU context, only relevant for full system emulation */
  #if defined(TARGET_PPC64)
@@ -1404,7 +1407,7 @@ struct CPUArchState {
  };
  
  #define _CORE_ID(cs)\

-(POWERPC_CPU(cs)->env.spr_cb[SPR_PIR].default_value & ~(cs->nr_threads - 1))
+(POWERPC_CPU(cs)->env.core_index)


Don't we want to keep the logical & with ~(cs->nr_threads - 1)?
How is that taken care of?

  
  #define THREAD_SIBLING_FOREACH(cs, cs_sibling)  \

  CPU_FOREACH(cs_sibling) \
diff --git a/hw/ppc/pnv_core.c b/hw/ppc/pnv_core.c
index 9b5edd9e48..0f61aabb77 100644
--- a/hw/ppc/pnv_core.c
+++ b/hw/ppc/pnv_core.c
@@ -252,6 +252,8 @@ static void pnv_core_cpu_realize(PnvCore *pc, PowerPCCPU 
*cpu, Error **errp,
  pir_spr->default_value = pir;
  tir_spr->default_value = tir;
  
+env->core_index = core_hwid;

+
  /* Set time-base frequency to 512 MHz */
  cpu_ppc_tb_init(env, PNV_TIMEBASE_FREQ);
  }
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index e7c9edd033..059d372c8a 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -300,16 +300,19 @@ static PowerPCCPU *spapr_create_vcpu(SpaprCpuCore *sc, 
int i, Error **errp)
  g_autofree char *id = NULL;
  CPUState *cs;
  PowerPCCPU *cpu;
+CPUPPCState *env;
  
  obj = object_new(scc->cpu_type);
  
  cs = CPU(obj);

  cpu = POWERPC_CPU(obj);
+env = &cpu->env;
  /*
   * All CPUs start halted. CPU0 is unhalted from the machine level reset code
   * and the rest are explicitly started up by the guest using an RTAS call.
   */
  qdev_prop_set_bit(DEVICE(obj), "start-powered-off", true);
+env->core_index = cc->core_id;


We could just do cpu->env.core_index and avoid creating local var env.
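
Roughly like this, as a sketch with cut-down stand-in types (the real
CPUPPCState and PowerPCCPU have many more fields, and set_core_index() is
only a made-up wrapper to keep the snippet standalone):

```c
/* Cut-down stand-ins for the QEMU types; only what the sketch needs. */
typedef struct CPUPPCState {
    int core_index;
} CPUPPCState;

typedef struct PowerPCCPU {
    CPUPPCState env;
} PowerPCCPU;

/* Made-up wrapper: assign through cpu->env directly, no local env. */
static void set_core_index(PowerPCCPU *cpu, int core_id)
{
    cpu->env.core_index = core_id;
}
```

In spapr_create_vcpu() that would amount to a single added line,
cpu->env.core_index = cc->core_id, with no new declaration.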

regards,
Harsh


  cs->cpu_index = cc->core_id + i;
  if (!spapr_set_vcpu_id(cpu, cs->cpu_index, errp)) {
  return NULL;

Re: [RFC PATCH 05/10] ppc/pnv: Extend chip_pir class method to TIR as well

2024-05-28 Thread Harsh Prateek Bora





On 5/26/24 17:56, Nicholas Piggin wrote:

The chip_pir chip class method allows the platform to set the PIR
processor identification register. Extend this to a more general
ID function which also allows the TIR to be set. This is in
preparation for "big core", which is a more complicated topology
of cores and threads.

Signed-off-by: Nicholas Piggin 
---
  include/hw/ppc/pnv_chip.h |  3 +-
  hw/ppc/pnv.c  | 61 ---
  hw/ppc/pnv_core.c | 10 ---
  3 files changed, 45 insertions(+), 29 deletions(-)

diff --git a/include/hw/ppc/pnv_chip.h b/include/hw/ppc/pnv_chip.h
index 8589f3291e..679723926a 100644
--- a/include/hw/ppc/pnv_chip.h
+++ b/include/hw/ppc/pnv_chip.h
@@ -147,7 +147,8 @@ struct PnvChipClass {
  
  DeviceRealize parent_realize;
  
-uint32_t (*chip_pir)(PnvChip *chip, uint32_t core_id, uint32_t thread_id);

+void (*processor_id)(PnvChip *chip, uint32_t core_id, uint32_t thread_id,
+ uint32_t *pir, uint32_t *tir);


Should it be named get_chip_core_thread_regs() ?


  void (*intc_create)(PnvChip *chip, PowerPCCPU *cpu, Error **errp);
  void (*intc_reset)(PnvChip *chip, PowerPCCPU *cpu);
  void (*intc_destroy)(PnvChip *chip, PowerPCCPU *cpu);
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index a706de2e36..7d062ec16c 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -144,7 +144,7 @@ static int pnv_dt_core(PnvChip *chip, PnvCore *pc, void *fdt)
  PnvChipClass *pnv_cc = PNV_CHIP_GET_CLASS(chip);
  g_autofree uint32_t *servers_prop = g_new(uint32_t, smt_threads);
  int i;
-uint32_t pir;
+uint32_t pir, tir;
  uint32_t segs[] = {cpu_to_be32(28), cpu_to_be32(40),
 0x, 0x};
  uint32_t tbfreq = PNV_TIMEBASE_FREQ;
@@ -155,7 +155,7 @@ static int pnv_dt_core(PnvChip *chip, PnvCore *pc, void *fdt)
  char *nodename;
  int cpus_offset = get_cpus_node(fdt);
  
-pir = pnv_cc->chip_pir(chip, pc->hwid, 0);

+pnv_cc->processor_id(chip, pc->hwid, 0, &pir, &tir);


As a generic and potentially extensible helper API, it should allow
passing NULL for registers whose values the caller does not need, so
that callers do not have to create unnecessary local variables.
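
A minimal sketch of what I have in mind, assuming the POWER8 layout quoted in this patch. It is standalone C rather than the actual hook: processor_id_p8() here is a hypothetical variant that takes chip_id directly instead of a PnvChip pointer.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical NULL-tolerant variant of the POWER8 ID helper: each
 * output is written only if the caller asked for it.
 */
static void processor_id_p8(uint32_t chip_id, uint32_t core_id,
                            uint32_t thread_id,
                            uint32_t *pir, uint32_t *tir)
{
    if (pir) {
        *pir = (chip_id << 7) | (core_id << 3) | thread_id;
    }
    if (tir) {
        *tir = thread_id;
    }
}
```

A caller that only needs the PIR could then pass NULL for tir and drop its unused local.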


regards,
Harsh
  
  nodename = g_strdup_printf("%s@%x", dc->fw_name, pir);

  offset = fdt_add_subnode(fdt, cpus_offset, nodename);
@@ -237,7 +237,8 @@ static int pnv_dt_core(PnvChip *chip, PnvCore *pc, void *fdt)
  
  /* Build interrupt servers properties */

  for (i = 0; i < smt_threads; i++) {
-servers_prop[i] = cpu_to_be32(pnv_cc->chip_pir(chip, pc->hwid, i));
+pnv_cc->processor_id(chip, pc->hwid, i, &pir, &tir);
+servers_prop[i] = cpu_to_be32(pir);
  }
  _FDT((fdt_setprop(fdt, offset, "ibm,ppc-interrupt-server#s",
 servers_prop, sizeof(*servers_prop) * smt_threads)));
@@ -249,14 +250,17 @@ static void pnv_dt_icp(PnvChip *chip, void *fdt, uint32_t hwid,
 uint32_t nr_threads)
  {
  PnvChipClass *pcc = PNV_CHIP_GET_CLASS(chip);
-uint32_t pir = pcc->chip_pir(chip, hwid, 0);
-uint64_t addr = PNV_ICP_BASE(chip) | (pir << 12);
+uint32_t pir, tir;
+uint64_t addr;
  char *name;
  const char compat[] = "IBM,power8-icp\0IBM,ppc-xicp";
  uint32_t irange[2], i, rsize;
  uint64_t *reg;
  int offset;
  
+pcc->processor_id(chip, hwid, 0, &pir, &tir);

+addr = PNV_ICP_BASE(chip) | (pir << 12);
+
  irange[0] = cpu_to_be32(pir);
  irange[1] = cpu_to_be32(nr_threads);
  
@@ -1104,10 +1108,12 @@ static void pnv_power10_init(MachineState *machine)

   *   25:28  Core number
   *   29:31  Thread ID
   */
-static uint32_t pnv_chip_pir_p8(PnvChip *chip, uint32_t core_id,
-uint32_t thread_id)
+static void pnv_processor_id_p8(PnvChip *chip,
+uint32_t core_id, uint32_t thread_id,
+uint32_t *pir, uint32_t *tir)
  {
-return (chip->chip_id << 7) | (core_id << 3) | thread_id;
+*pir = (chip->chip_id << 7) | (core_id << 3) | thread_id;
+*tir = thread_id;
  }
  
  static void pnv_chip_power8_intc_create(PnvChip *chip, PowerPCCPU *cpu,

@@ -1159,15 +1165,17 @@ static void pnv_chip_power8_intc_print_info(PnvChip *chip, PowerPCCPU *cpu,
   *
   * We only care about the lower bits. uint32_t is fine for the moment.
   */
-static uint32_t pnv_chip_pir_p9(PnvChip *chip, uint32_t core_id,
-uint32_t thread_id)
+static void pnv_processor_id_p9(PnvChip *chip,
+uint32_t core_id, uint32_t thread_id,
+uint32_t *pir, uint32_t *tir)
  {
  if (chip->nr_threads == 8) {
-return (chip->chip_id << 8) | ((thread_id & 1) << 2) | (core_id << 3) |
+*pir = (chip->chip_id << 8) | ((thread_id & 1) << 2) | (core_id << 3) |
 (thread_id >> 1);
  } else {
-

Re: [RFC PATCH 04/10] ppc/pnv: specialise init for powernv8/9/10 machines

2024-05-28 Thread Harsh Prateek Bora


Hi Nick,

On 5/26/24 17:56, Nicholas Piggin wrote:

This will allow different settings and checks for different
machine types with later changes.

Signed-off-by: Nicholas Piggin 
---
  hw/ppc/pnv.c | 35 ++-
  1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 6e3a5ccdec..a706de2e36 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -976,11 +976,6 @@ static void pnv_init(MachineState *machine)
  pnv->num_chips =
  machine->smp.max_cpus / (machine->smp.cores * machine->smp.threads);
  
-if (machine->smp.threads > 8) {

-error_report("Cannot support more than 8 threads/core "
- "on a powernv machine");
-exit(1);
-}
  if (!is_power_of_2(machine->smp.threads)) {
  error_report("Cannot support %d threads/core on a powernv"
   "machine because it must be a power of 2",
@@ -1076,6 +1071,33 @@ static void pnv_init(MachineState *machine)
  }
  }
  
+static void pnv_power8_init(MachineState *machine)

+{
+if (machine->smp.threads > 8) {
+error_report("Cannot support more than 8 threads/core "
+ "on a powernv POWER8 machine");


We could use mc->desc for machine name above, so that ..


+exit(1);
+}


with this patch, we can reuse p8 init for both p9 and p10 (and not just 
reuse p9 for p10 with hard coded string?).
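
Roughly like the following standalone sketch. It is not QEMU code: pnv_check_threads() is a hypothetical helper, desc stands in for mc->desc, and the message is written to a buffer instead of going through error_report().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical shared check: one thread-count test for all machine
 * types, with the machine description supplied by the caller.
 * Returns false and fills msg when the limit is exceeded.
 */
static bool pnv_check_threads(const char *desc, unsigned threads,
                              unsigned max_threads,
                              char *msg, size_t msg_len)
{
    if (threads > max_threads) {
        snprintf(msg, msg_len,
                 "Cannot support more than %u threads/core on %s",
                 max_threads, desc);
        return false;
    }
    return true;
}
```

Each machine type would then only differ in the description and limit it passes in.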


With that,
Reviewed-by: Harsh Prateek Bora 


+
+pnv_init(machine);
+}
+
+static void pnv_power9_init(MachineState *machine)
+{
+if (machine->smp.threads > 8) {
+error_report("Cannot support more than 8 threads/core "
+ "on a powernv9/10 machine");
+exit(1);
+}
+
+pnv_init(machine);
+}
+
+static void pnv_power10_init(MachineState *machine)
+{
+pnv_power9_init(machine);
+}
+
  /*
   *0:21  Reserved - Read as zeros
   *   22:24  Chip ID
@@ -2423,6 +2445,7 @@ static void pnv_machine_power8_class_init(ObjectClass *oc, void *data)
  };
  
  mc->desc = "IBM PowerNV (Non-Virtualized) POWER8";

+mc->init = pnv_power8_init;
  mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power8_v2.0");
  compat_props_add(mc->compat_props, phb_compat, G_N_ELEMENTS(phb_compat));
  
@@ -2449,6 +2472,7 @@ static void pnv_machine_power9_class_init(ObjectClass *oc, void *data)

  };
  
  mc->desc = "IBM PowerNV (Non-Virtualized) POWER9";

+mc->init = pnv_power9_init;
  mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power9_v2.2");
  compat_props_add(mc->compat_props, phb_compat, G_N_ELEMENTS(phb_compat));
  
@@ -2473,6 +2497,7 @@ static void pnv_machine_p10_common_class_init(ObjectClass *oc, void *data)

  { TYPE_PNV_PHB_ROOT_PORT, "version", "5" },
  };
  
+mc->init = pnv_power10_init;

  mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power10_v2.0");
  compat_props_add(mc->compat_props, phb_compat, G_N_ELEMENTS(phb_compat));

Re: [RFC PATCH 03/10] target/ppc: Improve SPR indirect registers

2024-05-28 Thread Harsh Prateek Bora




Hi Nick,

On 5/26/24 17:56, Nicholas Piggin wrote:

SPRC/SPRD were recently added to all BookS CPUs supported, but
they are only tested on POWER9 and POWER10, so restrict them to
those CPUs.



Do you mean to restrict these to P9/P10 for both spapr and pnv, or just pnv?


SPR indirect scratch registers are presently replicated per-CPU like
SMT SPRs, but the PnvCore is a better place for them since they
are restricted to P9/P10.

Also add SPR indirect read access to core thread state for POWER9
since skiboot accesses that when booting to check for big-core
mode.

Signed-off-by: Nicholas Piggin 
---
  include/hw/ppc/pnv_core.h |  1 +
  target/ppc/cpu.h  |  3 --
  target/ppc/cpu_init.c | 21 ++--
  target/ppc/misc_helper.c  | 67 ---
  4 files changed, 46 insertions(+), 46 deletions(-)

diff --git a/include/hw/ppc/pnv_core.h b/include/hw/ppc/pnv_core.h
index f434c71547..21297262c1 100644
--- a/include/hw/ppc/pnv_core.h
+++ b/include/hw/ppc/pnv_core.h
@@ -53,6 +53,7 @@ struct PnvCore {
  uint32_t hwid;
  uint64_t hrmor;
  
+target_ulong scratch[8]; /* SCRATCH registers */

  struct pnv_tod_tbst pnv_tod_tbst;
  
  PnvChip *chip;

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 1e86658da6..dac13d4dac 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1253,9 +1253,6 @@ struct CPUArchState {
  ppc_slb_t slb[MAX_SLB_ENTRIES]; /* PowerPC 64 SLB area */
  struct CPUBreakpoint *ciabr_breakpoint;
  struct CPUWatchpoint *dawr0_watchpoint;
-
-/* POWER CPU regs/state */
-target_ulong scratch[8]; /* SCRATCH registers (shared across core) */
  #endif
  target_ulong sr[32];   /* segment registers */
  uint32_t nb_BATs;  /* number of BATs */
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 01e358a4a5..ae483e20c4 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -5759,16 +5759,6 @@ static void register_power_common_book4_sprs(CPUPPCState *env)
   SPR_NOACCESS, SPR_NOACCESS,
   &spr_read_generic, &spr_core_write_generic,
   0x);
-spr_register_hv(env, SPR_POWER_SPRC, "SPRC",
- SPR_NOACCESS, SPR_NOACCESS,
- SPR_NOACCESS, SPR_NOACCESS,
- &spr_read_generic, &spr_write_sprc,
- 0x);
-spr_register_hv(env, SPR_POWER_SPRD, "SPRD",
- SPR_NOACCESS, SPR_NOACCESS,
- SPR_NOACCESS, SPR_NOACCESS,
- &spr_read_sprd, &spr_write_sprd,
- 0x);
  #endif
  }
  
@@ -5781,6 +5771,17 @@ static void register_power9_book4_sprs(CPUPPCState *env)

   SPR_NOACCESS, SPR_NOACCESS,
   &spr_read_generic, &spr_write_generic,
   KVM_REG_PPC_WORT, 0);
+/* SPRC/SPRD exist in earlier CPUs but only tested on POWER9/10 */
+spr_register_hv(env, SPR_POWER_SPRC, "SPRC",
+ SPR_NOACCESS, SPR_NOACCESS,
+ SPR_NOACCESS, SPR_NOACCESS,
+ &spr_read_generic, &spr_write_sprc,
+ 0x);
+spr_register_hv(env, SPR_POWER_SPRD, "SPRD",
+ SPR_NOACCESS, SPR_NOACCESS,
+ SPR_NOACCESS, SPR_NOACCESS,
+ &spr_read_sprd, &spr_write_sprd,
+ 0x);
  #endif
  }
  
diff --git a/target/ppc/misc_helper.c b/target/ppc/misc_helper.c

index fa47be2298..46ba3a5584 100644
--- a/target/ppc/misc_helper.c
+++ b/target/ppc/misc_helper.c
@@ -26,6 +26,7 @@
  #include "qemu/main-loop.h"
  #include "mmu-book3s-v3.h"
  #include "hw/ppc/ppc.h"
+#include "hw/ppc/pnv_core.h"
  
  #include "helper_regs.h"
  
@@ -321,11 +322,25 @@ void helper_store_sprc(CPUPPCState *env, target_ulong val)
  
  target_ulong helper_load_sprd(CPUPPCState *env)

  {
+PowerPCCPU *cpu = env_archcpu(env);
+PnvCore *pc = pnv_cpu_state(cpu)->core;


We may want to avoid creating local variable cpu here also like previous 
patches.


However, is this helper meant to be accessible for spapr as well ?


  target_ulong sprc = env->spr[SPR_POWER_SPRC];
  
-switch (sprc & 0x3c0) {

-case 0: /* SCRATCH0-7 */
-return env->scratch[(sprc >> 3) & 0x7];
+switch (sprc & 0x3e0) {
+case 0: /* SCRATCH0-3 */
+case 1: /* SCRATCH4-7 */
+return pc->scratch[(sprc >> 3) & 0x7];


If so, will pc be uninitialized in case of spapr ?


+case 0x1e0: /* core thread state */
+if (env->excp_model == POWERPC_EXCP_POWER9) {
+/*
+ * Only implement for POWER9 because skiboot uses it to check
+ * big-core mode. Other bits are unimplemented so we would
+ * prefer to get unimplemented message on POWER10 if it were
+ * used.
+ */
+return 0;
+}
+/* fallthru */
  default:
  qemu_log_mask(LOG_UNIMP, "mfSPRD: Unimplemented SPRC:0x"
TARGET_FMT_lx"\n", sprc);
@@ -334,41

Re: [RFC PATCH 02/10] ppc/pnv: Move timebase state into PnvCore

2024-05-28 Thread Harsh Prateek Bora

PnvCore *pc = pnv_cpu_state(cpu)->core;
+
+return &pc->pnv_tod_tbst;
+}
+
  static void tb_state_machine_step(CPUPPCState *env)
  {
+PowerPCCPU *cpu = env_archcpu(env);
+struct pnv_tod_tbst *pnv_tod_tbst = cpu_get_tbst(cpu);


Since cpu is not used anywhere later, we could just do 
cpu_get_tbst(env_archcpu(env)) ?



  uint64_t tfmr = env->spr[SPR_TFMR];
  unsigned int tbst = tfmr_get_tb_state(tfmr);
  
@@ -307,15 +317,15 @@ static void tb_state_machine_step(CPUPPCState *env)

  return;
  }
  
-if (env->pnv_tod_tbst.tb_sync_pulse_timer) {

-env->pnv_tod_tbst.tb_sync_pulse_timer--;
+if (pnv_tod_tbst->tb_sync_pulse_timer) {
+pnv_tod_tbst->tb_sync_pulse_timer--;
  } else {
  tfmr |= TFMR_TB_SYNC_OCCURED;
  write_tfmr(env, tfmr);
  }
  
-if (env->pnv_tod_tbst.tb_state_timer) {

-env->pnv_tod_tbst.tb_state_timer--;
+if (pnv_tod_tbst->tb_state_timer) {
+pnv_tod_tbst->tb_state_timer--;
  return;
  }
  
@@ -332,20 +342,20 @@ static void tb_state_machine_step(CPUPPCState *env)

  } else if (tfmr & TFMR_MOVE_CHIP_TOD_TO_TB) {
  if (tbst == TBST_SYNC_WAIT) {
  tfmr = tfmr_new_tb_state(tfmr, TBST_GET_TOD);
-env->pnv_tod_tbst.tb_state_timer = 3;
+pnv_tod_tbst->tb_state_timer = 3;
  } else if (tbst == TBST_GET_TOD) {
-if (env->pnv_tod_tbst.tod_sent_to_tb) {
+if (pnv_tod_tbst->tod_sent_to_tb) {
  tfmr = tfmr_new_tb_state(tfmr, TBST_TB_RUNNING);
  tfmr &= ~TFMR_MOVE_CHIP_TOD_TO_TB;
-env->pnv_tod_tbst.tb_ready_for_tod = 0;
-env->pnv_tod_tbst.tod_sent_to_tb = 0;
+pnv_tod_tbst->tb_ready_for_tod = 0;
+pnv_tod_tbst->tod_sent_to_tb = 0;
  }
  } else {
  qemu_log_mask(LOG_GUEST_ERROR, "TFMR error: MOVE_CHIP_TOD_TO_TB "
"state machine in invalid state 0x%x\n", tbst);
  tfmr = tfmr_new_tb_state(tfmr, TBST_TB_ERROR);
  tfmr |= TFMR_FIRMWARE_CONTROL_ERROR;
-env->pnv_tod_tbst.tb_ready_for_tod = 0;
+pnv_tod_tbst->tb_ready_for_tod = 0;
  }
  }
  
@@ -361,6 +371,8 @@ target_ulong helper_load_tfmr(CPUPPCState *env)
  
  void helper_store_tfmr(CPUPPCState *env, target_ulong val)

  {
+PowerPCCPU *cpu = env_archcpu(env);
+struct pnv_tod_tbst *pnv_tod_tbst = cpu_get_tbst(cpu);


... similarly here as well.

With suggested minor improvements,
Reviewed-by: Harsh Prateek Bora 


  uint64_t tfmr = env->spr[SPR_TFMR];
  uint64_t clear_on_write;
  unsigned int tbst = tfmr_get_tb_state(tfmr);
@@ -384,14 +396,7 @@ void helper_store_tfmr(CPUPPCState *env, target_ulong val)
   * after the second mfspr.
   */
  tfmr &= ~TFMR_TB_SYNC_OCCURED;
-env->pnv_tod_tbst.tb_sync_pulse_timer = 1;
-
-if (ppc_cpu_tir(env_archcpu(env)) != 0 &&
-(val & (TFMR_LOAD_TOD_MOD | TFMR_MOVE_CHIP_TOD_TO_TB))) {
-qemu_log_mask(LOG_UNIMP, "TFMR timebase state machine can only be "
- "driven by thread 0\n");
-goto out;
-}
+pnv_tod_tbst->tb_sync_pulse_timer = 1;
  
  if (((tfmr | val) & (TFMR_LOAD_TOD_MOD | TFMR_MOVE_CHIP_TOD_TO_TB)) ==

  (TFMR_LOAD_TOD_MOD | TFMR_MOVE_CHIP_TOD_TO_TB)) {
@@ -399,7 +404,7 @@ void helper_store_tfmr(CPUPPCState *env, target_ulong val)
 "MOVE_CHIP_TOD_TO_TB both set\n");
  tfmr = tfmr_new_tb_state(tfmr, TBST_TB_ERROR);
  tfmr |= TFMR_FIRMWARE_CONTROL_ERROR;
-env->pnv_tod_tbst.tb_ready_for_tod = 0;
+pnv_tod_tbst->tb_ready_for_tod = 0;
  goto out;
  }
  
@@ -413,8 +418,8 @@ void helper_store_tfmr(CPUPPCState *env, target_ulong val)

  tfmr &= ~TFMR_LOAD_TOD_MOD;
  tfmr &= ~TFMR_MOVE_CHIP_TOD_TO_TB;
  tfmr &= ~TFMR_FIRMWARE_CONTROL_ERROR; /* XXX: should this be cleared? */
-env->pnv_tod_tbst.tb_ready_for_tod = 0;
-env->pnv_tod_tbst.tod_sent_to_tb = 0;
+pnv_tod_tbst->tb_ready_for_tod = 0;
+pnv_tod_tbst->tod_sent_to_tb = 0;
  goto out;
  }
  
@@ -427,19 +432,19 @@ void helper_store_tfmr(CPUPPCState *env, target_ulong val)
  
  if (tfmr & TFMR_LOAD_TOD_MOD) {

  /* Wait for an arbitrary 3 mfspr until the next state transition. */
-env->pnv_tod_tbst.tb_state_timer = 3;
+pnv_tod_tbst->tb_state_timer = 3;
  } else if (tfmr & TFMR_MOVE_CHIP_TOD_TO_TB) {
  if (tbst == TBST_NOT_SET) {
  tfmr = tfmr_new_tb_state(tfmr, TBST_SYNC_WAIT);
-env->pnv_tod_tbst.tb_ready_for_tod = 1;
-env->pnv_tod_tbst.t

Re: [RFC PATCH 01/10] ppc/pnv: Add pointer from PnvCPUState to PnvCore

2024-05-28 Thread Harsh Prateek Bora





On 5/26/24 17:56, Nicholas Piggin wrote:

This helps move core state from CPU to core structures.

Signed-off-by: Nicholas Piggin 
---
  include/hw/ppc/pnv_core.h | 1 +
  hw/ppc/pnv_core.c | 3 +++
  2 files changed, 4 insertions(+)

diff --git a/include/hw/ppc/pnv_core.h b/include/hw/ppc/pnv_core.h
index c6d62fd145..30c1e5b1a3 100644
--- a/include/hw/ppc/pnv_core.h
+++ b/include/hw/ppc/pnv_core.h
@@ -54,6 +54,7 @@ struct PnvCoreClass {
  #define PNV_CORE_TYPE_NAME(cpu_model) cpu_model PNV_CORE_TYPE_SUFFIX
  
  typedef struct PnvCPUState {

+PnvCore *core;


Naming it *pc might be more intuitive, given most of its usage,
although I see a few uses of "pnv_core" as well.


Reviewed-by: Harsh Prateek Bora 


  Object *intc;
  } PnvCPUState;
  
diff --git a/hw/ppc/pnv_core.c b/hw/ppc/pnv_core.c

index f40ab721d6..7b0ea7812b 100644
--- a/hw/ppc/pnv_core.c
+++ b/hw/ppc/pnv_core.c
@@ -225,6 +225,7 @@ static const MemoryRegionOps pnv_core_power10_xscom_ops = {
  static void pnv_core_cpu_realize(PnvCore *pc, PowerPCCPU *cpu, Error **errp,
   int thread_index)
  {
+PnvCPUState *pnv_cpu = pnv_cpu_state(cpu);
  CPUPPCState *env = &cpu->env;
  int core_hwid;
  ppc_spr_t *pir = &env->spr_cb[SPR_PIR];
@@ -232,6 +233,8 @@ static void pnv_core_cpu_realize(PnvCore *pc, PowerPCCPU *cpu, Error **errp,
  Error *local_err = NULL;
  PnvChipClass *pcc = PNV_CHIP_GET_CLASS(pc->chip);
  
+pnv_cpu->core = pc;

+
  if (!qdev_realize(DEVICE(cpu), NULL, errp)) {
  return;
  }

[PATCH v3 2/3] cpu-common.c: export cpu_get_free_index to be reused later

2024-05-23 Thread Harsh Prateek Bora

This helper provides an easy way to identify the next free CPU index,
which can be used for vCPU creation. Until now it has only been called
at a much later stage; there is a need to call it earlier (for now, on
ppc64), hence the need to export it.
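
For illustration, a minimal standalone model of the lookup. It mirrors the loop in the diff of this patch, but next_free_index() and the array argument are stand-ins for the real CPU_FOREACH() walk over the CPU list.

```c
#include <stddef.h>

/*
 * Standalone model of the helper: the next free index is one past the
 * largest index currently in use. An array stands in for the CPU list.
 */
static int next_free_index(const int *cpu_indexes, size_t n)
{
    int max_cpu_index = 0;

    for (size_t i = 0; i < n; i++) {
        if (cpu_indexes[i] >= max_cpu_index) {
            max_cpu_index = cpu_indexes[i] + 1;
        }
    }
    return max_cpu_index;
}
```

Note that gaps are not reused: with indexes 0, 1 and 5 in use, the result is 6.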

Suggested-by: Nicholas Piggin 
Signed-off-by: Harsh Prateek Bora 
---
 include/exec/cpu-common.h | 2 ++
 cpu-common.c  | 7 ---
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 6d5318895a..0386f1ab29 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -29,6 +29,8 @@ void cpu_list_lock(void);
 void cpu_list_unlock(void);
 unsigned int cpu_list_generation_id_get(void);
 
+int cpu_get_free_index(void);
+
 void tcg_iommu_init_notifier_list(CPUState *cpu);
 void tcg_iommu_free_notifier_list(CPUState *cpu);
 
diff --git a/cpu-common.c b/cpu-common.c
index ce78273af5..82bd1b432d 100644
--- a/cpu-common.c
+++ b/cpu-common.c
@@ -57,14 +57,12 @@ void cpu_list_unlock(void)
 qemu_mutex_unlock(&qemu_cpu_list_lock);
 }
 
-static bool cpu_index_auto_assigned;
 
-static int cpu_get_free_index(void)
+int cpu_get_free_index(void)
 {
 CPUState *some_cpu;
 int max_cpu_index = 0;
 
-cpu_index_auto_assigned = true;
 CPU_FOREACH(some_cpu) {
 if (some_cpu->cpu_index >= max_cpu_index) {
 max_cpu_index = some_cpu->cpu_index + 1;
@@ -83,8 +81,11 @@ unsigned int cpu_list_generation_id_get(void)
 
 void cpu_list_add(CPUState *cpu)
 {
+static bool cpu_index_auto_assigned;
+
 QEMU_LOCK_GUARD(&qemu_cpu_list_lock);
 if (cpu->cpu_index == UNASSIGNED_CPU_INDEX) {
+cpu_index_auto_assigned = true;
 cpu->cpu_index = cpu_get_free_index();
 assert(cpu->cpu_index != UNASSIGNED_CPU_INDEX);
 } else {
-- 
2.39.3

[PATCH v3 1/3] accel/kvm: Introduce kvm_create_and_park_vcpu() helper

2024-05-23 Thread Harsh Prateek Bora

There are distinct helpers for creating and parking a KVM vCPU.
However, there can be cases where a platform needs to create and
immediately park the vCPU during early stages of vcpu init which
can later be reused when vcpu thread gets initialized. This would
help detect failures with kvm_create_vcpu at an early stage.
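
The intended control flow can be modelled in isolation as follows. This is a toy sketch, not the KVM API: the toy_* names and the simulate_oom flag are made up for illustration, and only the shape of toy_create_and_park_vcpu() matches the helper added here.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model of a vCPU slot that can be created and then parked. */
struct toy_vcpu {
    bool created;
    bool parked;
};

/* Stand-in for kvm_create_vcpu(): fails with -ENOMEM when asked to. */
static int toy_create_vcpu(struct toy_vcpu *v, bool simulate_oom)
{
    if (simulate_oom) {
        return -ENOMEM;
    }
    v->created = true;
    return 0;
}

/* Stand-in for kvm_park_vcpu(). */
static void toy_park_vcpu(struct toy_vcpu *v)
{
    v->parked = true;
}

/* Same shape as the new helper: park only if creation succeeded. */
static int toy_create_and_park_vcpu(struct toy_vcpu *v, bool simulate_oom)
{
    int ret = toy_create_vcpu(v, simulate_oom);

    if (!ret) {
        toy_park_vcpu(v);
    }
    return ret;
}
```

On failure nothing is parked and the negative errno is handed back to the caller, which can then decide how to report it.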

Based on api refactoring to create/park vcpus introduced in 1/8 of patch series:
https://lore.kernel.org/qemu-devel/2024052221.232114-1-salil.me...@huawei.com/

Suggested-by: Nicholas Piggin 
Signed-off-by: Harsh Prateek Bora 
---
 accel/kvm/kvm-cpus.h |  8 
 accel/kvm/kvm-all.c  | 12 
 2 files changed, 20 insertions(+)

diff --git a/accel/kvm/kvm-cpus.h b/accel/kvm/kvm-cpus.h
index 2e6bb38b5d..00e534b3b9 100644
--- a/accel/kvm/kvm-cpus.h
+++ b/accel/kvm/kvm-cpus.h
@@ -46,4 +46,12 @@ void kvm_park_vcpu(CPUState *cpu);
  * @returns: KVM fd
  */
 int kvm_unpark_vcpu(KVMState *s, unsigned long vcpu_id);
+
+/**
+ * kvm_create_and_park_vcpu - Create and park a KVM vCPU
+ * @cpu: QOM CPUState object for which KVM vCPU has to be created and parked.
+ *
+ * @returns: 0 when success, errno (<0) when failed.
+ */
+int kvm_create_and_park_vcpu(CPUState *cpu);
 #endif /* KVM_CPUS_H */
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index fc30e5d5b8..d70ca62ff5 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -398,6 +398,18 @@ int kvm_create_vcpu(CPUState *cpu)
 return 0;
 }
 
+int kvm_create_and_park_vcpu(CPUState *cpu)
+{
+int ret = 0;
+
+ret = kvm_create_vcpu(cpu);
+if (!ret) {
+kvm_park_vcpu(cpu);
+}
+
+return ret;
+}
+
 static int do_kvm_destroy_vcpu(CPUState *cpu)
 {
 KVMState *s = kvm_state;
-- 
2.39.3

[PATCH v3 0/3] target/ppc: vcpu hotplug failure handling fixes

2024-05-23 Thread Harsh Prateek Bora

On ppc64, the PowerVM hypervisor runs with limited memory, and VCPU
creation during hotplug may fail in the kvm_ioctl for KVM_CREATE_VCPU,
terminating the guest because errp is set to &error_fatal when
calling kvm_init_vcpu. This unexpected behaviour can be avoided by
pre-creating the vcpu and parking it on success, or returning an error
otherwise. This enables graceful error delivery for any vcpu hotplug
failure while the guest keeps running.

This series adds another helper to create and park a vcpu (based on the
patch by Salil referenced below), exports cpu_get_free_index to be reused
later, and adds ppc arch-specific handling for vcpu hotplug failure using
the kvm accel helper cpu_target_realize.

Based on api refactoring to create/park vcpus introduced in 1/8 of patch series:
https://lore.kernel.org/qemu-devel/2024052221.232114-1-salil.me...@huawei.com/

Changelog:
v3: Addressed review comments from Nick
v2: Addressed review comments from Nick
v1: Initial patch

Harsh Prateek Bora (3):
  accel/kvm: Introduce kvm_create_and_park_vcpu() helper
  cpu-common.c: export cpu_get_free_index to be reused later
  target/ppc: handle vcpu hotplug failure gracefully

 accel/kvm/kvm-cpus.h  |  8 
 include/exec/cpu-common.h |  2 ++
 accel/kvm/kvm-all.c   | 12 
 cpu-common.c  |  7 ---
 target/ppc/kvm.c  | 41 +++
 5 files changed, 67 insertions(+), 3 deletions(-)

-- 
2.39.3

[PATCH v3 3/3] target/ppc: handle vcpu hotplug failure gracefully

2024-05-23 Thread Harsh Prateek Bora

On ppc64, the PowerVM hypervisor runs with limited memory, and VCPU
creation during hotplug may fail in the kvm_ioctl for KVM_CREATE_VCPU,
terminating the guest because errp is set to &error_fatal when
calling kvm_init_vcpu. This unexpected behaviour can be avoided by
pre-creating the vcpu and parking it on success, or returning an error
otherwise. This enables graceful error delivery for any vcpu hotplug
failure while the guest keeps running.
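
As a rough standalone illustration of the error path (not the actual hook): toy_cpu_realize() is a hypothetical stand-in for the realize callback, with a plain character buffer in place of QEMU's Error object and the result of vCPU creation passed in as create_ret.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Toy stand-in for the realize hook: on failure, describe the problem
 * to the caller through err and return false instead of exiting.
 */
static bool toy_cpu_realize(int create_ret, bool hotplugged,
                            char *err, size_t err_len)
{
    const char *vcpu_str = hotplugged ? "hotplug" : "create";

    if (create_ret) {
        snprintf(err, err_len, "%s: vcpu %s failed with %d",
                 __func__, vcpu_str, create_ret);
        return false;
    }
    return true;
}
```

The caller sees a false return plus a message and can fail the device_add request, leaving the running guest untouched.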

Also introduce a KVM AccelCPUClass to set cpu_target_realize for kvm.

Tested OK by repeatedly doing a hotplug/unplug of vcpus as below:

 #virsh setvcpus hotplug 40
 #virsh setvcpus hotplug 70
error: internal error: unable to execute QEMU command 'device_add':
kvmppc_cpu_realize: vcpu hotplug failed with -12

Reported-by: Anushree Mathur 
Suggested-by: Shivaprasad G Bhat 
Suggested-by: Vaibhav Jain 
Signed-off-by: Harsh Prateek Bora
Tested-by: Anushree Mathur 
---
 target/ppc/kvm.c | 41 +
 1 file changed, 41 insertions(+)

diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 63930d4a77..8e5a7c3d2d 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -48,6 +48,8 @@
 #include "qemu/mmap-alloc.h"
 #include "elf.h"
 #include "sysemu/kvm_int.h"
+#include "accel/kvm/kvm-cpus.h"
+#include "hw/core/accel-cpu.h"
 
 #define PROC_DEVTREE_CPU  "/proc/device-tree/cpus/"
 
@@ -2339,6 +2341,25 @@ static void alter_insns(uint64_t *word, uint64_t flags, bool on)
 }
 }
 
+static bool kvmppc_cpu_realize(CPUState *cs, Error **errp)
+{
+int ret;
+const char *vcpu_str = (cs->parent_obj.hotplugged == true) ?
+   "hotplug" : "create";
+cs->cpu_index = cpu_get_free_index();
+
+POWERPC_CPU(cs)->vcpu_id = cs->cpu_index;
+
+/* create and park to fail gracefully in case vcpu hotplug fails */
+ret = kvm_create_and_park_vcpu(cs);
+if (ret) {
+error_setg(errp, "%s: vcpu %s failed with %d",
+ __func__, vcpu_str, ret);
+return false;
+}
+return true;
+}
+
 static void kvmppc_host_cpu_class_init(ObjectClass *oc, void *data)
 {
 PowerPCCPUClass *pcc = POWERPC_CPU_CLASS(oc);
@@ -2959,3 +2980,23 @@ void kvmppc_set_reg_tb_offset(PowerPCCPU *cpu, int64_t tb_offset)
 void kvm_arch_accel_class_init(ObjectClass *oc)
 {
 }
+
+static void kvm_cpu_accel_class_init(ObjectClass *oc, void *data)
+{
+AccelCPUClass *acc = ACCEL_CPU_CLASS(oc);
+
+acc->cpu_target_realize = kvmppc_cpu_realize;
+}
+
+static const TypeInfo kvm_cpu_accel_type_info = {
+.name = ACCEL_CPU_NAME("kvm"),
+
+.parent = TYPE_ACCEL_CPU,
+.class_init = kvm_cpu_accel_class_init,
+.abstract = true,
+};
+static void kvm_cpu_accel_register_types(void)
+{
+type_register_static(&kvm_cpu_accel_type_info);
+}
+type_init(kvm_cpu_accel_register_types);
-- 
2.39.3

Re: [PATCH V11 1/8] accel/kvm: Extract common KVM vCPU {creation,parking} code

2024-05-23 Thread Harsh Prateek Bora


Hi Salil,

On 5/23/24 02:41, Salil Mehta wrote:

+void kvm_park_vcpu(CPUState *cpu);
+
+/**
+ * kvm_unpark_vcpu - unpark QEMU KVM vCPU context
+ * @s: KVM State
+ * @cpu: Architecture vCPU ID of the parked vCPU


s/@cpu/@vcpuid ?

Thanks
Harsh

+ *
+ * @returns: KVM fd
+ */
+int kvm_unpark_vcpu(KVMState *s, unsigned long vcpu_id);
  #endif /* KVM_CPUS_H */

[PATCH v2 3/7] target/ppc: optimize hreg_compute_pmu_hflags_value

2024-05-22 Thread Harsh Prateek Bora

The second if-condition can be true only if the first one above is true.
Nest the latter inside the former to avoid an unnecessary check when the
first condition fails.
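
A small standalone sketch of why the two forms are equivalent: any value with a bit set in 0x1e is necessarily non-zero. The HFLAGS_* bit positions used here are placeholders for illustration, not the real values from QEMU.

```c
#include <stdint.h>

/* Placeholder bit positions for illustration; not QEMU's actual values. */
#define HFLAGS_INSN_CNT  0
#define HFLAGS_PMC_OTHER 1

/* Before: two independent checks. */
static uint32_t pmu_hflags_flat(uint32_t pmc_ins_cnt)
{
    uint32_t hflags = 0;

    if (pmc_ins_cnt) {
        hflags |= 1 << HFLAGS_INSN_CNT;
    }
    if (pmc_ins_cnt & 0x1e) {
        hflags |= 1 << HFLAGS_PMC_OTHER;
    }
    return hflags;
}

/* After: the second check runs only when the first one passed. */
static uint32_t pmu_hflags_nested(uint32_t pmc_ins_cnt)
{
    uint32_t hflags = 0;

    if (pmc_ins_cnt) {
        hflags |= 1 << HFLAGS_INSN_CNT;
        if (pmc_ins_cnt & 0x1e) {
            hflags |= 1 << HFLAGS_PMC_OTHER;
        }
    }
    return hflags;
}
```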

Signed-off-by: Harsh Prateek Bora 
Reviewed-by: BALATON Zoltan 
---
 target/ppc/helper_regs.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/ppc/helper_regs.c b/target/ppc/helper_regs.c
index d09dcacd5e..261a8ba79f 100644
--- a/target/ppc/helper_regs.c
+++ b/target/ppc/helper_regs.c
@@ -66,9 +66,9 @@ static uint32_t hreg_compute_pmu_hflags_value(CPUPPCState *env)
 #ifndef CONFIG_USER_ONLY
 if (env->pmc_ins_cnt) {
 hflags |= 1 << HFLAGS_INSN_CNT;
-}
-if (env->pmc_ins_cnt & 0x1e) {
-hflags |= 1 << HFLAGS_PMC_OTHER;
+if (env->pmc_ins_cnt & 0x1e) {
+hflags |= 1 << HFLAGS_PMC_OTHER;
+}
 }
 #endif
 #endif
-- 
2.39.3

Re: [PATCH 6/6] target/ppc: reduce code duplication across Power9/10 init code

2024-05-22 Thread Harsh Prateek Bora


Hi BALATON,

On 5/20/24 17:22, BALATON Zoltan wrote:

On Mon, 20 May 2024, Harsh Prateek Bora wrote:

The Power9/10 initialization code ORs together many flag bits supported
by the respective Power platform. Most of these are duplicated, with only
selected bits added or removed as support for each new platform is added.
Remove the duplicate code and share it through common macros.
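
The sharing pattern, reduced to a standalone sketch. The F_* bits and FAMILY_* names are placeholders for illustration, not the real PPC2_* values, and the macros are parenthesized here (unlike the ones in this patch) so that they compose safely inside larger expressions.

```c
#include <stdint.h>

/* Placeholder flag bits for illustration; not the real PPC2_* values. */
#define F_VSX     (1u << 0)
#define F_DFP     (1u << 1)
#define F_TM      (1u << 2)
#define F_ISA310  (1u << 3)

/* Shared base, then one extra bit per family. */
#define FAMILY_P9_FLAGS2_COMMON  (F_VSX | F_DFP)
#define FAMILY_P9_FLAGS2         (FAMILY_P9_FLAGS2_COMMON | F_TM)
#define FAMILY_P10_FLAGS2        (FAMILY_P9_FLAGS2_COMMON | F_ISA310)

/* Returns non-zero if the family flag word contains the given flag. */
static int family_has(uint32_t family_flags, uint32_t flag)
{
    return (family_flags & flag) != 0;
}
```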

Signed-off-by: Harsh Prateek Bora 
---
target/ppc/cpu_init.h |  79 +++
target/ppc/cpu_init.c | 123 ++
2 files changed, 94 insertions(+), 108 deletions(-)
create mode 100644 target/ppc/cpu_init.h

diff --git a/target/ppc/cpu_init.h b/target/ppc/cpu_init.h
new file mode 100644
index 00..29358bfdf6
--- /dev/null
+++ b/target/ppc/cpu_init.h
@@ -0,0 +1,79 @@
+#ifndef TARGET_PPC_CPU_INIT_H
+#define TARGET_PPC_CPU_INIT_H
+
+#define POWERPC_FAMILY_POWER9_INSNS_FLAGS   \
+    PPC_INSNS_BASE | PPC_ISEL | PPC_STRING | PPC_MFTB | \
+    PPC_FLOAT | PPC_FLOAT_FSEL | PPC_FLOAT_FRES |   \
+    PPC_FLOAT_FSQRT | PPC_FLOAT_FRSQRTE | PPC_FLOAT_FRSQRTES |  \
+    PPC_FLOAT_STFIWX | PPC_FLOAT_EXT |PPC_CACHE | PPC_CACHE_ICBI |  \
+    PPC_CACHE_DCBZ | PPC_MEM_SYNC | PPC_MEM_EIEIO | PPC_MEM_TLBIE | \
+    PPC_MEM_TLBSYNC | PPC_64B | PPC_64H | PPC_64BX | PPC_ALTIVEC |  \
+    PPC_SEGMENT_64B | PPC_SLBI | PPC_POPCNTB | PPC_POPCNTWD |   \
+    PPC_CILDST
+#define POWERPC_FAMILY_POWER10_INSNS_FLAGS \
+    POWERPC_FAMILY_POWER9_INSNS_FLAGS


It's a good idea to make the cpu inits more readable, but I'm not sure
about having two names for the same thing. If these are the same, could
POWER10 also just use POWERPC_FAMILY_POWER9_INSNS_FLAGS (or, if you
really want to, call it POWERPC_FAMILY_POWER9_10_INSNS_FLAGS or
similar)? I think reusing earlier features where they are unchanged in
newer CPU models would be OK and would show these are the same.


Thanks for your valuable review comments on this series. I have 
addressed them and posted in v2.


regards,
Harsh


Regards,
BALATON Zoltan


+
+#define POWERPC_FAMILY_POWER9_INSNS_FLAGS2_COMMON   \
+    PPC2_VSX | PPC2_VSX207 | PPC2_DFP | PPC2_DBRX | \
+    PPC2_PERM_ISA206 | PPC2_DIVE_ISA206 | PPC2_ATOMIC_ISA206 |  \
+    PPC2_FP_CVT_ISA206 | PPC2_FP_TST_ISA206 | PPC2_BCTAR_ISA207 |   \
+    PPC2_LSQ_ISA207 | PPC2_ALTIVEC_207 | PPC2_ISA205 |  \
+    PPC2_ISA207S | PPC2_FP_CVT_S64 | PPC2_ISA300 | PPC2_PRCNTL |    \
+    PPC2_MEM_LWSYNC | PPC2_BCDA_ISA206
+
+#define POWERPC_FAMILY_POWER9_INSNS_FLAGS2  \
+    POWERPC_FAMILY_POWER9_INSNS_FLAGS2_COMMON | PPC2_TM
+#define POWERPC_FAMILY_POWER10_INSNS_FLAGS2 \
+    POWERPC_FAMILY_POWER9_INSNS_FLAGS2_COMMON | PPC2_ISA310
+
+#define POWERPC_POWER9_COMMON_PCC_MSR_MASK \
+    (1ull << MSR_SF) | \
+    (1ull << MSR_HV) | \
+    (1ull << MSR_VR) | \
+    (1ull << MSR_VSX) |    \
+    (1ull << MSR_EE) | \
+    (1ull << MSR_PR) | \
+    (1ull << MSR_FP) | \
+    (1ull << MSR_ME) | \
+    (1ull << MSR_FE0) |    \
+    (1ull << MSR_SE) | \
+    (1ull << MSR_DE) | \
+    (1ull << MSR_FE1) |    \
+    (1ull << MSR_IR) | \
+    (1ull << MSR_DR) | \
+    (1ull << MSR_PMM) |    \
+    (1ull << MSR_RI) | \
+    (1ull << MSR_LE)
+
+#define POWERPC_POWER9_PCC_MSR_MASK \
+    POWERPC_POWER9_COMMON_PCC_MSR_MASK | (1ull << MSR_TM)
+#define POWERPC_POWER10_PCC_MSR_MASK \
+    POWERPC_POWER9_COMMON_PCC_MSR_MASK
+#define POWERPC_POWER9_PCC_PCR_MASK \
+    PCR_COMPAT_2_05 | PCR_COMPAT_2_06 | PCR_COMPAT_2_07
+#define POWERPC_POWER10_PCC_PCR_MASK \
+    POWERPC_POWER9_PCC_PCR_MASK | PCR_COMPAT_3_00
+#define POWERPC_POWER9_PCC_PCR_SUPPORTED \
+    PCR_COMPAT_3_00 | PCR_COMPAT_2_07 | PCR_COMPAT_2_06 | PCR_COMPAT_2_05
+#define POWERPC_POWER10_PCC_PCR_SUPPORTED \
+    POWERPC_POWER9_PCC_PCR_SUPPORTED | PCR_COMPAT_3_10
+#define POWERPC_POWER9_PCC_LPCR_MASK \
+    LPCR_VPM1 | LPCR_ISL | LPCR_KBV | LPCR_DPFD | \
+    (LPCR_PECE_U_MASK & LPCR_HVEE) | LPCR_ILE | LPCR_AIL | \
+    LPCR_UPRT | LPCR_EVIRT | LPCR_ONL | LPCR_HR | LPCR_LD | \
+    (LPCR_PECE_L_MASK & (LPCR_PDEE|LPCR_HDEE|LPCR_EEE|LPCR_DEE|LPCR_OEE)) | \
+    LPCR_MER | LPCR_GTSE | LPCR_TC | LPCR_HEIC | LPCR_LPES0 | LPCR_HVICE | \
+    LPCR_HDICE
+/* DD2 adds an extra HAIL bit */
+#define POWERPC_POWER10_PCC_LPCR_MASK \
+    POWERPC_POWER9_PCC_LPC

[PATCH v2 2/7] target/ppc: optimize hreg_compute_pmu_hflags_value

2024-05-22 Thread Harsh Prateek Bora

Cache env->spr[SPR_POWER_MMCR0] in a local variable, since it is used in
multiple conditions, to avoid repeated indirect accesses.

Signed-off-by: Harsh Prateek Bora 
---
 target/ppc/helper_regs.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/target/ppc/helper_regs.c b/target/ppc/helper_regs.c
index 945fa1a596..d09dcacd5e 100644
--- a/target/ppc/helper_regs.c
+++ b/target/ppc/helper_regs.c
@@ -50,15 +50,16 @@ void hreg_swap_gpr_tgpr(CPUPPCState *env)
 static uint32_t hreg_compute_pmu_hflags_value(CPUPPCState *env)
 {
 uint32_t hflags = 0;
-
 #if defined(TARGET_PPC64)
-if (env->spr[SPR_POWER_MMCR0] & MMCR0_PMCC0) {
+target_ulong mmcr0 = env->spr[SPR_POWER_MMCR0];
+
+if (mmcr0 & MMCR0_PMCC0) {
 hflags |= 1 << HFLAGS_PMCC0;
 }
-if (env->spr[SPR_POWER_MMCR0] & MMCR0_PMCC1) {
+if (mmcr0 & MMCR0_PMCC1) {
 hflags |= 1 << HFLAGS_PMCC1;
 }
-if (env->spr[SPR_POWER_MMCR0] & MMCR0_PMCjCE) {
+if (mmcr0 & MMCR0_PMCjCE) {
 hflags |= 1 << HFLAGS_PMCJCE;
 }
 
-- 
2.39.3

[PATCH v2 6/7] target/ppc: reduce duplicate code between init_proc_POWER{9, 10}

2024-05-22 Thread Harsh Prateek Bora

Historically, the registration of SPRs has been copied along with every
new Power arch support being added, leading to a lot of code
duplication. It's time to do the necessary cleanups now to avoid further
duplication as support for newer archs is added.

Signed-off-by: Harsh Prateek Bora 
---
 target/ppc/cpu_init.c | 43 +--
 1 file changed, 9 insertions(+), 34 deletions(-)

diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 6d82f24c87..5fb9a0583e 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -6307,7 +6307,7 @@ static struct ppc_radix_page_info POWER9_radix_page_info = {
 };
 #endif /* CONFIG_USER_ONLY */
 
-static void init_proc_POWER9(CPUPPCState *env)
+static void register_power9_common_sprs(CPUPPCState *env)
 {
 /* Common Registers */
 init_proc_book3s_common(env);
@@ -6326,7 +6326,6 @@ static void init_proc_POWER9(CPUPPCState *env)
 register_power5p_ear_sprs(env);
 register_power5p_tb_sprs(env);
 register_power6_common_sprs(env);
-register_HEIR32_spr(env);
 register_power6_dbg_sprs(env);
 register_power8_tce_address_control_sprs(env);
 register_power8_ids_sprs(env);
@@ -6342,6 +6341,12 @@ static void init_proc_POWER9(CPUPPCState *env)
 register_power9_book4_sprs(env);
 register_power8_rpr_sprs(env);
 register_power9_mmu_sprs(env);
+}
+
+static void init_proc_POWER9(CPUPPCState *env)
+{
+register_power9_common_sprs(env);
+register_HEIR32_spr(env);
 
 /* POWER9 Specific registers */
 spr_register_kvm(env, SPR_TIDR, "TIDR", NULL, NULL,
@@ -6499,39 +6504,9 @@ static struct ppc_radix_page_info POWER10_radix_page_info = {
 
 static void init_proc_POWER10(CPUPPCState *env)
 {
-/* Common Registers */
-init_proc_book3s_common(env);
-register_book3s_207_dbg_sprs(env);
-
-/* Common TCG PMU */
-init_tcg_pmu_power8(env);
-
-/* POWER8 Specific Registers */
-register_book3s_ids_sprs(env);
-register_amr_sprs(env);
-register_iamr_sprs(env);
-register_book3s_purr_sprs(env);
-register_power5p_common_sprs(env);
-register_power5p_lpar_sprs(env);
-register_power5p_ear_sprs(env);
-register_power5p_tb_sprs(env);
-register_power6_common_sprs(env);
+register_power9_common_sprs(env);
 register_HEIR64_spr(env);
-register_power6_dbg_sprs(env);
-register_power8_tce_address_control_sprs(env);
-register_power8_ids_sprs(env);
-register_power8_ebb_sprs(env);
-register_power8_fscr_sprs(env);
-register_power8_pmu_sup_sprs(env);
-register_power8_pmu_user_sprs(env);
-register_power8_tm_sprs(env);
-register_power8_pspb_sprs(env);
-register_power8_dpdes_sprs(env);
-register_vtb_sprs(env);
-register_power8_ic_sprs(env);
-register_power9_book4_sprs(env);
-register_power8_rpr_sprs(env);
-register_power9_mmu_sprs(env);
+
 register_power10_hash_sprs(env);
 register_power10_dexcr_sprs(env);
 register_power10_pmu_sup_sprs(env);
-- 
2.39.3
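
The refactoring in this patch follows a common shape: pull the shared registration calls into one helper and leave only the per-generation differences in each init function. A rough sketch of that shape, using hypothetical DemoCpu and DEMO_* names that do not exist in QEMU, might look like the block below. It models each registration as setting a bit, which assumes the order of registration does not matter; the patch itself moves the HEIR registration to after the common block, so that assumption would need checking against the real code.

```c
#include <stdint.h>

/* Hypothetical registration markers, one bit per "SPR group". */
enum {
    DEMO_SPR_COMMON_A   = 1u << 0,
    DEMO_SPR_COMMON_B   = 1u << 1,
    DEMO_SPR_HEIR32     = 1u << 2,
    DEMO_SPR_HEIR64     = 1u << 3,
    DEMO_SPR_GEN_B_ONLY = 1u << 4,
};

typedef struct DemoCpu {
    uint32_t registered;   /* bitmask of groups registered so far */
} DemoCpu;

/* Everything both generations share lives in one place. */
static void demo_register_common(DemoCpu *cpu)
{
    cpu->registered |= DEMO_SPR_COMMON_A | DEMO_SPR_COMMON_B;
}

/* Older generation: common set plus its 32-bit variant. */
static void demo_init_gen_a(DemoCpu *cpu)
{
    demo_register_common(cpu);
    cpu->registered |= DEMO_SPR_HEIR32;
}

/* Newer generation: common set, 64-bit variant, and its own extras. */
static void demo_init_gen_b(DemoCpu *cpu)
{
    demo_register_common(cpu);
    cpu->registered |= DEMO_SPR_HEIR64 | DEMO_SPR_GEN_B_ONLY;
}
```

With this layout, adding a further generation only requires listing what differs from the common set.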

[PATCH v2 7/7] target/ppc: reduce code duplication across Power9/10 init code

2024-05-22 Thread Harsh Prateek Bora

Power9/10 initialization code ORs together many flag bits supported by
the respective Power platform. Most of these are duplicated, with only
selected bits added or removed for each new platform. Remove the
duplicate code and share the common bits through macros.

Signed-off-by: Harsh Prateek Bora 
---
 target/ppc/cpu_init.h |  77 ++
 target/ppc/cpu_init.c | 123 ++
 2 files changed, 92 insertions(+), 108 deletions(-)
 create mode 100644 target/ppc/cpu_init.h

diff --git a/target/ppc/cpu_init.h b/target/ppc/cpu_init.h
new file mode 100644
index 00..53909987b0
--- /dev/null
+++ b/target/ppc/cpu_init.h
@@ -0,0 +1,77 @@
+#ifndef TARGET_PPC_CPU_INIT_H
+#define TARGET_PPC_CPU_INIT_H
+
+#define POWERPC_FAMILY_POWER9_INSNS_FLAGS   \
+PPC_INSNS_BASE | PPC_ISEL | PPC_STRING | PPC_MFTB | \
+PPC_FLOAT | PPC_FLOAT_FSEL | PPC_FLOAT_FRES |   \
+PPC_FLOAT_FSQRT | PPC_FLOAT_FRSQRTE | PPC_FLOAT_FRSQRTES |  \
+PPC_FLOAT_STFIWX | PPC_FLOAT_EXT |PPC_CACHE | PPC_CACHE_ICBI |  \
+PPC_CACHE_DCBZ | PPC_MEM_SYNC | PPC_MEM_EIEIO | PPC_MEM_TLBIE | \
+PPC_MEM_TLBSYNC | PPC_64B | PPC_64H | PPC_64BX | PPC_ALTIVEC |  \
+PPC_SEGMENT_64B | PPC_SLBI | PPC_POPCNTB | PPC_POPCNTWD |   \
+PPC_CILDST
+
+#define POWERPC_FAMILY_POWER9_INSNS_FLAGS2_COMMON   \
+PPC2_VSX | PPC2_VSX207 | PPC2_DFP | PPC2_DBRX | \
+PPC2_PERM_ISA206 | PPC2_DIVE_ISA206 | PPC2_ATOMIC_ISA206 |  \
+PPC2_FP_CVT_ISA206 | PPC2_FP_TST_ISA206 | PPC2_BCTAR_ISA207 |   \
+PPC2_LSQ_ISA207 | PPC2_ALTIVEC_207 | PPC2_ISA205 |  \
+PPC2_ISA207S | PPC2_FP_CVT_S64 | PPC2_ISA300 | PPC2_PRCNTL |\
+PPC2_MEM_LWSYNC | PPC2_BCDA_ISA206
+
+#define POWERPC_FAMILY_POWER9_INSNS_FLAGS2  \
+POWERPC_FAMILY_POWER9_INSNS_FLAGS2_COMMON | PPC2_TM
+#define POWERPC_FAMILY_POWER10_INSNS_FLAGS2 \
+POWERPC_FAMILY_POWER9_INSNS_FLAGS2_COMMON | PPC2_ISA310
+
+#define POWERPC_POWER9_COMMON_PCC_MSR_MASK \
+(1ull << MSR_SF) | \
+(1ull << MSR_HV) | \
+(1ull << MSR_VR) | \
+(1ull << MSR_VSX) |\
+(1ull << MSR_EE) | \
+(1ull << MSR_PR) | \
+(1ull << MSR_FP) | \
+(1ull << MSR_ME) | \
+(1ull << MSR_FE0) |\
+(1ull << MSR_SE) | \
+(1ull << MSR_DE) | \
+(1ull << MSR_FE1) |\
+(1ull << MSR_IR) | \
+(1ull << MSR_DR) | \
+(1ull << MSR_PMM) |\
+(1ull << MSR_RI) | \
+(1ull << MSR_LE)
+
+#define POWERPC_POWER9_PCC_MSR_MASK \
+POWERPC_POWER9_COMMON_PCC_MSR_MASK | (1ull << MSR_TM)
+#define POWERPC_POWER10_PCC_MSR_MASK \
+POWERPC_POWER9_COMMON_PCC_MSR_MASK
+#define POWERPC_POWER9_PCC_PCR_MASK \
+PCR_COMPAT_2_05 | PCR_COMPAT_2_06 | PCR_COMPAT_2_07
+#define POWERPC_POWER10_PCC_PCR_MASK \
+POWERPC_POWER9_PCC_PCR_MASK | PCR_COMPAT_3_00
+#define POWERPC_POWER9_PCC_PCR_SUPPORTED \
+PCR_COMPAT_3_00 | PCR_COMPAT_2_07 | PCR_COMPAT_2_06 | PCR_COMPAT_2_05
+#define POWERPC_POWER10_PCC_PCR_SUPPORTED \
+POWERPC_POWER9_PCC_PCR_SUPPORTED | PCR_COMPAT_3_10
+#define POWERPC_POWER9_PCC_LPCR_MASK\
+LPCR_VPM1 | LPCR_ISL | LPCR_KBV | LPCR_DPFD |   \
+(LPCR_PECE_U_MASK & LPCR_HVEE) | LPCR_ILE | LPCR_AIL |  \
+LPCR_UPRT | LPCR_EVIRT | LPCR_ONL | LPCR_HR | LPCR_LD | \
+(LPCR_PECE_L_MASK & (LPCR_PDEE|LPCR_HDEE|LPCR_EEE|LPCR_DEE|LPCR_OEE)) | \
+LPCR_MER | LPCR_GTSE | LPCR_TC | LPCR_HEIC | LPCR_LPES0 | LPCR_HVICE |  \
+LPCR_HDICE
+/* DD2 adds an extra HAIL bit */
+#define POWERPC_POWER10_PCC_LPCR_MASK \
+POWERPC_POWER9_PCC_LPCR_MASK | LPCR_HAIL
+#define POWERPC_POWER9_PCC_FLAGS_COMMON \
+POWERPC_FLAG_VRE | POWERPC_FLAG_SE | POWERPC_FLAG_BE |  \
+POWERPC_FLAG_PMM | POWERPC_FLAG_BUS_CLK | POWERPC_FLAG_CFAR |   \
+POWERPC_FLAG_VSX | POWERPC_FLAG_SCV
+
+#define POWERPC_POWER9_PCC_FLAGS  \
+POWERPC_POWER9_PCC_FLAGS_COMMON | POWERPC_FLAG_TM
+#define POWERPC_POWER10_PCC_FLAGS POWERPC_POWER9_PCC_FLAGS_COMMON
+
+#endif /* TARGET_PPC_CPU_INIT_H */
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 5fb9a0583e..e4f6ad2399 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -51,6 +51,7 @@
 #include "kvm_ppc.h"
 #endif
 
+#include "cpu_init.h"
 /* #define PPC_
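
The header in this patch builds each POWER10 mask by OR-ing extra bits onto the corresponding POWER9 macro. The sketch below shows that composition with made-up DEMO_* names and bit values (none of them are QEMU's). It also illustrates one general C caveat: a macro body that is a bare OR-expression behaves differently from a parenthesized one when it is used inside a larger expression. How the real macros are consumed in cpu_init.c is not visible in the truncated diff above; if they are only assigned to fields, the missing parentheses would make no difference.

```c
#include <stdint.h>

/* Hypothetical bits; not the real LPCR layout. */
#define DEMO_BIT_A     0x1u
#define DEMO_BIT_B     0x2u
#define DEMO_BIT_HAIL  0x4u

/* Unparenthesized, in the style of the header above. */
#define DEMO_GEN9_MASK     DEMO_BIT_A | DEMO_BIT_B
#define DEMO_GEN10_MASK    DEMO_GEN9_MASK | DEMO_BIT_HAIL

/* Parenthesized variants, safe inside any larger expression. */
#define DEMO_GEN9_MASK_P   (DEMO_BIT_A | DEMO_BIT_B)
#define DEMO_GEN10_MASK_P  (DEMO_GEN9_MASK_P | DEMO_BIT_HAIL)

/* Plain assignment: both spellings produce the same value. */
static uint32_t demo_gen10_mask(void)
{
    uint32_t mask = DEMO_GEN10_MASK;
    return mask;
}

/* '&' binds tighter than '|', so this is (value & A) | B | HAIL. */
static uint32_t demo_and_bare(uint32_t value)
{
    return value & DEMO_GEN10_MASK;
}

/* Here the whole mask is applied to value, as usually intended. */
static uint32_t demo_and_paren(uint32_t value)
{
    return value & DEMO_GEN10_MASK_P;
}
```

Compilers may warn about the bare form when warnings such as -Wparentheses are enabled; the two helper functions differ for most inputs, for instance when value is zero.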

[PATCH v2 0/7] target/ppc: misc ppc improvements/optimizations

2024-05-22 Thread Harsh Prateek Bora

This is a set of misc ppc arch-specific code improvements/optimizations.
There are similar opportunities for improvement in the legacy ppc code,
but those can be taken up later.

Changelog:
v2: addressed review comments from BALATON Zoltan
v1: Initial patch

Harsh Prateek Bora (7):
  target/ppc: use locally stored msr and avoid indirect access
  target/ppc: optimize hreg_compute_pmu_hflags_value
  target/ppc: optimize hreg_compute_pmu_hflags_value
  target/ppc: optimize p9 exception handling routines
  target/ppc: optimize p9 exception handling routines for lpcr
  target/ppc: reduce duplicate code between init_proc_POWER{9,10}
  target/ppc: reduce code duplication across Power9/10 init code

 target/ppc/cpu_init.h|  77 ++
 target/ppc/cpu_init.c| 166 ++-
 target/ppc/excp_helper.c |  72 +
 target/ppc/helper_regs.c |  19 ++---
 4 files changed, 150 insertions(+), 184 deletions(-)
 create mode 100644 target/ppc/cpu_init.h

-- 
2.39.3

[PATCH v2 4/7] target/ppc: optimize p9 exception handling routines

2024-05-22 Thread Harsh Prateek Bora

Currently, p9 exception handling has multiple if-condition checks where
it does an indirect access to pending_interrupts via env. Pass the
value during entry to avoid multiple indirect accesses.

Signed-off-by: Harsh Prateek Bora 
---
 target/ppc/excp_helper.c | 47 +---
 1 file changed, 25 insertions(+), 22 deletions(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 0712098cf7..704eddac63 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1842,10 +1842,12 @@ static int p8_next_unmasked_interrupt(CPUPPCState *env)
  PPC_INTERRUPT_WDT | PPC_INTERRUPT_CDOORBELL | PPC_INTERRUPT_FIT |  \
  PPC_INTERRUPT_PIT | PPC_INTERRUPT_THERM)
 
-static int p9_interrupt_powersave(CPUPPCState *env)
+static int p9_interrupt_powersave(CPUPPCState *env,
+  uint32_t pending_interrupts)
 {
+
 /* External Exception */
-if ((env->pending_interrupts & PPC_INTERRUPT_EXT) &&
+if ((pending_interrupts & PPC_INTERRUPT_EXT) &&
 (env->spr[SPR_LPCR] & LPCR_EEE)) {
 bool heic = !!(env->spr[SPR_LPCR] & LPCR_HEIC);
 if (!heic || !FIELD_EX64_HV(env->msr) ||
@@ -1854,48 +1856,49 @@ static int p9_interrupt_powersave(CPUPPCState *env)
 }
 }
 /* Decrementer Exception */
-if ((env->pending_interrupts & PPC_INTERRUPT_DECR) &&
+if ((pending_interrupts & PPC_INTERRUPT_DECR) &&
 (env->spr[SPR_LPCR] & LPCR_DEE)) {
 return PPC_INTERRUPT_DECR;
 }
 /* Machine Check or Hypervisor Maintenance Exception */
 if (env->spr[SPR_LPCR] & LPCR_OEE) {
-if (env->pending_interrupts & PPC_INTERRUPT_MCK) {
+if (pending_interrupts & PPC_INTERRUPT_MCK) {
 return PPC_INTERRUPT_MCK;
 }
-if (env->pending_interrupts & PPC_INTERRUPT_HMI) {
+if (pending_interrupts & PPC_INTERRUPT_HMI) {
 return PPC_INTERRUPT_HMI;
 }
 }
 /* Privileged Doorbell Exception */
-if ((env->pending_interrupts & PPC_INTERRUPT_DOORBELL) &&
+if ((pending_interrupts & PPC_INTERRUPT_DOORBELL) &&
 (env->spr[SPR_LPCR] & LPCR_PDEE)) {
 return PPC_INTERRUPT_DOORBELL;
 }
 /* Hypervisor Doorbell Exception */
-if ((env->pending_interrupts & PPC_INTERRUPT_HDOORBELL) &&
+if ((pending_interrupts & PPC_INTERRUPT_HDOORBELL) &&
 (env->spr[SPR_LPCR] & LPCR_HDEE)) {
 return PPC_INTERRUPT_HDOORBELL;
 }
 /* Hypervisor virtualization exception */
-if ((env->pending_interrupts & PPC_INTERRUPT_HVIRT) &&
+if ((pending_interrupts & PPC_INTERRUPT_HVIRT) &&
 (env->spr[SPR_LPCR] & LPCR_HVEE)) {
 return PPC_INTERRUPT_HVIRT;
 }
-if (env->pending_interrupts & PPC_INTERRUPT_RESET) {
+if (pending_interrupts & PPC_INTERRUPT_RESET) {
 return PPC_INTERRUPT_RESET;
 }
 return 0;
 }
 
-static int p9_next_unmasked_interrupt(CPUPPCState *env)
+static int p9_next_unmasked_interrupt(CPUPPCState *env,
+  uint32_t pending_interrupts)
 {
 CPUState *cs = env_cpu(env);
 
 /* Ignore MSR[EE] when coming out of some power management states */
 bool msr_ee = FIELD_EX64(env->msr, MSR, EE) || env->resume_as_sreset;
 
-assert((env->pending_interrupts & P9_UNUSED_INTERRUPTS) == 0);
+assert((pending_interrupts & P9_UNUSED_INTERRUPTS) == 0);
 
 if (cs->halted) {
 if (env->spr[SPR_PSSCR] & PSSCR_EC) {
@@ -1903,7 +1906,7 @@ static int p9_next_unmasked_interrupt(CPUPPCState *env)
  * When PSSCR[EC] is set, LPCR[PECE] controls which interrupts can
  * wakeup the processor
  */
-return p9_interrupt_powersave(env);
+return p9_interrupt_powersave(env, pending_interrupts);
 } else {
 /*
  * When it's clear, any system-caused exception exits power-saving
@@ -1914,12 +1917,12 @@ static int p9_next_unmasked_interrupt(CPUPPCState *env)
 }
 
 /* Machine check exception */
-if (env->pending_interrupts & PPC_INTERRUPT_MCK) {
+if (pending_interrupts & PPC_INTERRUPT_MCK) {
 return PPC_INTERRUPT_MCK;
 }
 
 /* Hypervisor decrementer exception */
-if (env->pending_interrupts & PPC_INTERRUPT_HDECR) {
+if (pending_interrupts & PPC_INTERRUPT_HDECR) {
 /* LPCR will be clear when not supported so this will work */
 bool hdice = !!(env->spr[SPR_LPCR] & LPCR_HDICE);
 if ((msr_ee || !FIELD_EX64_HV(env->msr)) && hdice) {
@@ -1929,7 +1932,7 @@ static int p9_next_unmasked_interrupt(CPUPPCState *env)
 }
 
 /* Hypervisor virtualization interrupt */
-if (env->pending_interrupt

[PATCH v2 5/7] target/ppc: optimize p9 exception handling routines for lpcr

2024-05-22 Thread Harsh Prateek Bora

Like pending_interrupts, env->spr[SPR_LPCR] is being used at multiple
places across p9 exception handlers. Pass the value during entry and
avoid multiple indirect accesses.

Signed-off-by: Harsh Prateek Bora 
---
 target/ppc/excp_helper.c | 33 ++---
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 704eddac63..d3db81e6ae 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1843,13 +1843,14 @@ static int p8_next_unmasked_interrupt(CPUPPCState *env)
  PPC_INTERRUPT_PIT | PPC_INTERRUPT_THERM)
 
 static int p9_interrupt_powersave(CPUPPCState *env,
-  uint32_t pending_interrupts)
+  uint32_t pending_interrupts,
+  target_ulong lpcr)
 {
 
 /* External Exception */
 if ((pending_interrupts & PPC_INTERRUPT_EXT) &&
-(env->spr[SPR_LPCR] & LPCR_EEE)) {
-bool heic = !!(env->spr[SPR_LPCR] & LPCR_HEIC);
+(lpcr & LPCR_EEE)) {
+bool heic = !!(lpcr & LPCR_HEIC);
 if (!heic || !FIELD_EX64_HV(env->msr) ||
 FIELD_EX64(env->msr, MSR, PR)) {
 return PPC_INTERRUPT_EXT;
@@ -1857,11 +1858,11 @@ static int p9_interrupt_powersave(CPUPPCState *env,
 }
 /* Decrementer Exception */
 if ((pending_interrupts & PPC_INTERRUPT_DECR) &&
-(env->spr[SPR_LPCR] & LPCR_DEE)) {
+(lpcr & LPCR_DEE)) {
 return PPC_INTERRUPT_DECR;
 }
 /* Machine Check or Hypervisor Maintenance Exception */
-if (env->spr[SPR_LPCR] & LPCR_OEE) {
+if (lpcr & LPCR_OEE) {
 if (pending_interrupts & PPC_INTERRUPT_MCK) {
 return PPC_INTERRUPT_MCK;
 }
@@ -1871,17 +1872,17 @@ static int p9_interrupt_powersave(CPUPPCState *env,
 }
 /* Privileged Doorbell Exception */
 if ((pending_interrupts & PPC_INTERRUPT_DOORBELL) &&
-(env->spr[SPR_LPCR] & LPCR_PDEE)) {
+(lpcr & LPCR_PDEE)) {
 return PPC_INTERRUPT_DOORBELL;
 }
 /* Hypervisor Doorbell Exception */
 if ((pending_interrupts & PPC_INTERRUPT_HDOORBELL) &&
-(env->spr[SPR_LPCR] & LPCR_HDEE)) {
+(lpcr & LPCR_HDEE)) {
 return PPC_INTERRUPT_HDOORBELL;
 }
 /* Hypervisor virtualization exception */
 if ((pending_interrupts & PPC_INTERRUPT_HVIRT) &&
-(env->spr[SPR_LPCR] & LPCR_HVEE)) {
+(lpcr & LPCR_HVEE)) {
 return PPC_INTERRUPT_HVIRT;
 }
 if (pending_interrupts & PPC_INTERRUPT_RESET) {
@@ -1891,7 +1892,8 @@ static int p9_interrupt_powersave(CPUPPCState *env,
 }
 
 static int p9_next_unmasked_interrupt(CPUPPCState *env,
-  uint32_t pending_interrupts)
+  uint32_t pending_interrupts,
+  target_ulong lpcr)
 {
 CPUState *cs = env_cpu(env);
 
@@ -1906,7 +1908,7 @@ static int p9_next_unmasked_interrupt(CPUPPCState *env,
  * When PSSCR[EC] is set, LPCR[PECE] controls which interrupts can
  * wakeup the processor
  */
-return p9_interrupt_powersave(env, pending_interrupts);
+return p9_interrupt_powersave(env, pending_interrupts, lpcr);
 } else {
 /*
  * When it's clear, any system-caused exception exits power-saving
@@ -1924,7 +1926,7 @@ static int p9_next_unmasked_interrupt(CPUPPCState *env,
 /* Hypervisor decrementer exception */
 if (pending_interrupts & PPC_INTERRUPT_HDECR) {
 /* LPCR will be clear when not supported so this will work */
-bool hdice = !!(env->spr[SPR_LPCR] & LPCR_HDICE);
+bool hdice = !!(lpcr & LPCR_HDICE);
 if ((msr_ee || !FIELD_EX64_HV(env->msr)) && hdice) {
 /* HDEC clears on delivery */
 return PPC_INTERRUPT_HDECR;
@@ -1934,7 +1936,7 @@ static int p9_next_unmasked_interrupt(CPUPPCState *env,
 /* Hypervisor virtualization interrupt */
 if (pending_interrupts & PPC_INTERRUPT_HVIRT) {
 /* LPCR will be clear when not supported so this will work */
-bool hvice = !!(env->spr[SPR_LPCR] & LPCR_HVICE);
+bool hvice = !!(lpcr & LPCR_HVICE);
 if ((msr_ee || !FIELD_EX64_HV(env->msr)) && hvice) {
 return PPC_INTERRUPT_HVIRT;
 }
@@ -1942,8 +1944,8 @@ static int p9_next_unmasked_interrupt(CPUPPCState *env,
 
 /* External interrupt can ignore MSR:EE under some circumstances */
 if (pending_interrupts & PPC_INTERRUPT_EXT) {
-bool lpes0 = !!(env->spr[SPR_LPCR] & LPCR_LPES0);
-bool heic = !!(env->spr[SPR_LPCR] & LPCR_HEIC);
+bool lpes0 = !!(lpcr & LPCR_LPES0);
+bo

[PATCH v2 1/7] target/ppc: use locally stored msr and avoid indirect access

2024-05-22 Thread Harsh Prateek Bora

hreg_compute_hflags_value already stores msr locally to be used in most
of the logic in the routine however some instances are still using
env->msr which is unnecessary. Use locally stored value as available.

Signed-off-by: Harsh Prateek Bora 
---
 target/ppc/helper_regs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/ppc/helper_regs.c b/target/ppc/helper_regs.c
index 25258986e3..945fa1a596 100644
--- a/target/ppc/helper_regs.c
+++ b/target/ppc/helper_regs.c
@@ -106,10 +106,10 @@ static uint32_t hreg_compute_hflags_value(CPUPPCState *env)
 
 if (ppc_flags & POWERPC_FLAG_DE) {
 target_ulong dbcr0 = env->spr[SPR_BOOKE_DBCR0];
-if ((dbcr0 & DBCR0_ICMP) && FIELD_EX64(env->msr, MSR, DE)) {
+if ((dbcr0 & DBCR0_ICMP) && FIELD_EX64(msr, MSR, DE)) {
 hflags |= 1 << HFLAGS_SE;
 }
-if ((dbcr0 & DBCR0_BRT) && FIELD_EX64(env->msr, MSR, DE)) {
+if ((dbcr0 & DBCR0_BRT) && FIELD_EX64(msr, MSR, DE)) {
 hflags |= 1 << HFLAGS_BE;
 }
 } else {
-- 
2.39.3

[PATCH 6/6] target/ppc: reduce code duplication across Power9/10 init code

2024-05-20 Thread Harsh Prateek Bora

Power9/10 initialization code ORs together many flag bits supported by
the respective Power platform. Most of these are duplicated, with only
selected bits added or removed for each new platform. Remove the
duplicate code and share the common bits through macros.

Signed-off-by: Harsh Prateek Bora 
---
 target/ppc/cpu_init.h |  79 +++
 target/ppc/cpu_init.c | 123 ++
 2 files changed, 94 insertions(+), 108 deletions(-)
 create mode 100644 target/ppc/cpu_init.h

diff --git a/target/ppc/cpu_init.h b/target/ppc/cpu_init.h
new file mode 100644
index 00..29358bfdf6
--- /dev/null
+++ b/target/ppc/cpu_init.h
@@ -0,0 +1,79 @@
+#ifndef TARGET_PPC_CPU_INIT_H
+#define TARGET_PPC_CPU_INIT_H
+
+#define POWERPC_FAMILY_POWER9_INSNS_FLAGS   \
+PPC_INSNS_BASE | PPC_ISEL | PPC_STRING | PPC_MFTB | \
+PPC_FLOAT | PPC_FLOAT_FSEL | PPC_FLOAT_FRES |   \
+PPC_FLOAT_FSQRT | PPC_FLOAT_FRSQRTE | PPC_FLOAT_FRSQRTES |  \
+PPC_FLOAT_STFIWX | PPC_FLOAT_EXT |PPC_CACHE | PPC_CACHE_ICBI |  \
+PPC_CACHE_DCBZ | PPC_MEM_SYNC | PPC_MEM_EIEIO | PPC_MEM_TLBIE | \
+PPC_MEM_TLBSYNC | PPC_64B | PPC_64H | PPC_64BX | PPC_ALTIVEC |  \
+PPC_SEGMENT_64B | PPC_SLBI | PPC_POPCNTB | PPC_POPCNTWD |   \
+PPC_CILDST
+#define POWERPC_FAMILY_POWER10_INSNS_FLAGS \
+POWERPC_FAMILY_POWER9_INSNS_FLAGS
+
+#define POWERPC_FAMILY_POWER9_INSNS_FLAGS2_COMMON   \
+PPC2_VSX | PPC2_VSX207 | PPC2_DFP | PPC2_DBRX | \
+PPC2_PERM_ISA206 | PPC2_DIVE_ISA206 | PPC2_ATOMIC_ISA206 |  \
+PPC2_FP_CVT_ISA206 | PPC2_FP_TST_ISA206 | PPC2_BCTAR_ISA207 |   \
+PPC2_LSQ_ISA207 | PPC2_ALTIVEC_207 | PPC2_ISA205 |  \
+PPC2_ISA207S | PPC2_FP_CVT_S64 | PPC2_ISA300 | PPC2_PRCNTL |\
+PPC2_MEM_LWSYNC | PPC2_BCDA_ISA206
+
+#define POWERPC_FAMILY_POWER9_INSNS_FLAGS2  \
+POWERPC_FAMILY_POWER9_INSNS_FLAGS2_COMMON | PPC2_TM
+#define POWERPC_FAMILY_POWER10_INSNS_FLAGS2 \
+POWERPC_FAMILY_POWER9_INSNS_FLAGS2_COMMON | PPC2_ISA310
+
+#define POWERPC_POWER9_COMMON_PCC_MSR_MASK \
+(1ull << MSR_SF) | \
+(1ull << MSR_HV) | \
+(1ull << MSR_VR) | \
+(1ull << MSR_VSX) |\
+(1ull << MSR_EE) | \
+(1ull << MSR_PR) | \
+(1ull << MSR_FP) | \
+(1ull << MSR_ME) | \
+(1ull << MSR_FE0) |\
+(1ull << MSR_SE) | \
+(1ull << MSR_DE) | \
+(1ull << MSR_FE1) |\
+(1ull << MSR_IR) | \
+(1ull << MSR_DR) | \
+(1ull << MSR_PMM) |\
+(1ull << MSR_RI) | \
+(1ull << MSR_LE)
+
+#define POWERPC_POWER9_PCC_MSR_MASK \
+POWERPC_POWER9_COMMON_PCC_MSR_MASK | (1ull << MSR_TM)
+#define POWERPC_POWER10_PCC_MSR_MASK \
+POWERPC_POWER9_COMMON_PCC_MSR_MASK
+#define POWERPC_POWER9_PCC_PCR_MASK \
+PCR_COMPAT_2_05 | PCR_COMPAT_2_06 | PCR_COMPAT_2_07
+#define POWERPC_POWER10_PCC_PCR_MASK \
+POWERPC_POWER9_PCC_PCR_MASK | PCR_COMPAT_3_00
+#define POWERPC_POWER9_PCC_PCR_SUPPORTED \
+PCR_COMPAT_3_00 | PCR_COMPAT_2_07 | PCR_COMPAT_2_06 | PCR_COMPAT_2_05
+#define POWERPC_POWER10_PCC_PCR_SUPPORTED \
+POWERPC_POWER9_PCC_PCR_SUPPORTED | PCR_COMPAT_3_10
+#define POWERPC_POWER9_PCC_LPCR_MASK\
+LPCR_VPM1 | LPCR_ISL | LPCR_KBV | LPCR_DPFD |   \
+(LPCR_PECE_U_MASK & LPCR_HVEE) | LPCR_ILE | LPCR_AIL |  \
+LPCR_UPRT | LPCR_EVIRT | LPCR_ONL | LPCR_HR | LPCR_LD | \
+(LPCR_PECE_L_MASK & (LPCR_PDEE|LPCR_HDEE|LPCR_EEE|LPCR_DEE|LPCR_OEE)) | \
+LPCR_MER | LPCR_GTSE | LPCR_TC | LPCR_HEIC | LPCR_LPES0 | LPCR_HVICE |  \
+LPCR_HDICE
+/* DD2 adds an extra HAIL bit */
+#define POWERPC_POWER10_PCC_LPCR_MASK \
+POWERPC_POWER9_PCC_LPCR_MASK | LPCR_HAIL
+#define POWERPC_POWER9_PCC_FLAGS_COMMON \
+POWERPC_FLAG_VRE | POWERPC_FLAG_SE | POWERPC_FLAG_BE |  \
+POWERPC_FLAG_PMM | POWERPC_FLAG_BUS_CLK | POWERPC_FLAG_CFAR |   \
+POWERPC_FLAG_VSX | POWERPC_FLAG_SCV
+
+#define POWERPC_POWER9_PCC_FLAGS  \
+POWERPC_POWER9_PCC_FLAGS_COMMON | POWERPC_FLAG_TM
+#define POWERPC_POWER10_PCC_FLAGS POWERPC_POWER9_PCC_FLAGS_COMMON
+
+#endif /* TARGET_PPC_CPU_INIT_H */
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 636e12ba7a..48773ec831 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -51,6 +5

[PATCH 3/6] target/ppc: optimize hreg_compute_pmu_hflags_value

2024-05-20 Thread Harsh Prateek Bora

The second if-condition can be true only if the first one above is true.
Nest the latter inside the former to avoid an unnecessary check when the
first condition fails.

Signed-off-by: Harsh Prateek Bora 
---
 target/ppc/helper_regs.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/ppc/helper_regs.c b/target/ppc/helper_regs.c
index 5de0df5795..89aacdf212 100644
--- a/target/ppc/helper_regs.c
+++ b/target/ppc/helper_regs.c
@@ -66,9 +66,9 @@ static uint32_t hreg_compute_pmu_hflags_value(CPUPPCState *env)
 #ifndef CONFIG_USER_ONLY
 if (env->pmc_ins_cnt) {
 hflags |= 1 << HFLAGS_INSN_CNT;
-}
-if (env->pmc_ins_cnt & 0x1e) {
-hflags |= 1 << HFLAGS_PMC_OTHER;
+if (env->pmc_ins_cnt & 0x1e) {
+hflags |= 1 << HFLAGS_PMC_OTHER;
+}
 }
 #endif
 #endif
-- 
2.39.3
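
The claim in this commit message rests on a simple fact: a value with any of the bits in 0x1e set is necessarily non-zero, so the second test can only succeed when the first one does. The following sketch, with hypothetical DEMO_* flag positions rather than QEMU's, puts the flat and nested forms side by side so the equivalence can be checked over a range of inputs.

```c
#include <stdint.h>

/* Hypothetical flag positions, for illustration only. */
enum { DEMO_HFLAGS_INSN_CNT = 0, DEMO_HFLAGS_PMC_OTHER = 1 };

/* Original shape: two independent checks. */
static uint32_t demo_flags_flat(uint32_t pmc_ins_cnt)
{
    uint32_t hflags = 0;

    if (pmc_ins_cnt) {
        hflags |= 1u << DEMO_HFLAGS_INSN_CNT;
    }
    if (pmc_ins_cnt & 0x1e) {
        hflags |= 1u << DEMO_HFLAGS_PMC_OTHER;
    }
    return hflags;
}

/* Patched shape: the second check runs only when the first passed. */
static uint32_t demo_flags_nested(uint32_t pmc_ins_cnt)
{
    uint32_t hflags = 0;

    if (pmc_ins_cnt) {
        hflags |= 1u << DEMO_HFLAGS_INSN_CNT;
        if (pmc_ins_cnt & 0x1e) {
            hflags |= 1u << DEMO_HFLAGS_PMC_OTHER;
        }
    }
    return hflags;
}
```

Both functions return the same result for every input; the nested form simply skips the mask test when the counter value is zero.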

[PATCH 4/6] target/ppc: optimize p9 exception handling routines

2024-05-20 Thread Harsh Prateek Bora

Currently, p9 exception handling has multiple if-condition checks where
it does an indirect access to pending_interrupts via env. Cache the
value during entry and reuse later to avoid multiple indirect accesses.

Signed-off-by: Harsh Prateek Bora 
---
 target/ppc/excp_helper.c | 39 +--
 1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 0712098cf7..4f158196bb 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1844,8 +1844,10 @@ static int p8_next_unmasked_interrupt(CPUPPCState *env)
 
 static int p9_interrupt_powersave(CPUPPCState *env)
 {
+uint32_t pending_interrupts = env->pending_interrupts;
+
 /* External Exception */
-if ((env->pending_interrupts & PPC_INTERRUPT_EXT) &&
+if ((pending_interrupts & PPC_INTERRUPT_EXT) &&
 (env->spr[SPR_LPCR] & LPCR_EEE)) {
 bool heic = !!(env->spr[SPR_LPCR] & LPCR_HEIC);
 if (!heic || !FIELD_EX64_HV(env->msr) ||
@@ -1854,35 +1856,35 @@ static int p9_interrupt_powersave(CPUPPCState *env)
 }
 }
 /* Decrementer Exception */
-if ((env->pending_interrupts & PPC_INTERRUPT_DECR) &&
+if ((pending_interrupts & PPC_INTERRUPT_DECR) &&
 (env->spr[SPR_LPCR] & LPCR_DEE)) {
 return PPC_INTERRUPT_DECR;
 }
 /* Machine Check or Hypervisor Maintenance Exception */
 if (env->spr[SPR_LPCR] & LPCR_OEE) {
-if (env->pending_interrupts & PPC_INTERRUPT_MCK) {
+if (pending_interrupts & PPC_INTERRUPT_MCK) {
 return PPC_INTERRUPT_MCK;
 }
-if (env->pending_interrupts & PPC_INTERRUPT_HMI) {
+if (pending_interrupts & PPC_INTERRUPT_HMI) {
 return PPC_INTERRUPT_HMI;
 }
 }
 /* Privileged Doorbell Exception */
-if ((env->pending_interrupts & PPC_INTERRUPT_DOORBELL) &&
+if ((pending_interrupts & PPC_INTERRUPT_DOORBELL) &&
 (env->spr[SPR_LPCR] & LPCR_PDEE)) {
 return PPC_INTERRUPT_DOORBELL;
 }
 /* Hypervisor Doorbell Exception */
-if ((env->pending_interrupts & PPC_INTERRUPT_HDOORBELL) &&
+if ((pending_interrupts & PPC_INTERRUPT_HDOORBELL) &&
 (env->spr[SPR_LPCR] & LPCR_HDEE)) {
 return PPC_INTERRUPT_HDOORBELL;
 }
 /* Hypervisor virtualization exception */
-if ((env->pending_interrupts & PPC_INTERRUPT_HVIRT) &&
+if ((pending_interrupts & PPC_INTERRUPT_HVIRT) &&
 (env->spr[SPR_LPCR] & LPCR_HVEE)) {
 return PPC_INTERRUPT_HVIRT;
 }
-if (env->pending_interrupts & PPC_INTERRUPT_RESET) {
+if (pending_interrupts & PPC_INTERRUPT_RESET) {
 return PPC_INTERRUPT_RESET;
 }
 return 0;
@@ -1891,11 +1893,12 @@ static int p9_interrupt_powersave(CPUPPCState *env)
 static int p9_next_unmasked_interrupt(CPUPPCState *env)
 {
 CPUState *cs = env_cpu(env);
+uint32_t pending_interrupts = env->pending_interrupts;
 
 /* Ignore MSR[EE] when coming out of some power management states */
 bool msr_ee = FIELD_EX64(env->msr, MSR, EE) || env->resume_as_sreset;
 
-assert((env->pending_interrupts & P9_UNUSED_INTERRUPTS) == 0);
+assert((pending_interrupts & P9_UNUSED_INTERRUPTS) == 0);
 
 if (cs->halted) {
 if (env->spr[SPR_PSSCR] & PSSCR_EC) {
@@ -1914,12 +1917,12 @@ static int p9_next_unmasked_interrupt(CPUPPCState *env)
 }
 
 /* Machine check exception */
-if (env->pending_interrupts & PPC_INTERRUPT_MCK) {
+if (pending_interrupts & PPC_INTERRUPT_MCK) {
 return PPC_INTERRUPT_MCK;
 }
 
 /* Hypervisor decrementer exception */
-if (env->pending_interrupts & PPC_INTERRUPT_HDECR) {
+if (pending_interrupts & PPC_INTERRUPT_HDECR) {
 /* LPCR will be clear when not supported so this will work */
 bool hdice = !!(env->spr[SPR_LPCR] & LPCR_HDICE);
 if ((msr_ee || !FIELD_EX64_HV(env->msr)) && hdice) {
@@ -1929,7 +1932,7 @@ static int p9_next_unmasked_interrupt(CPUPPCState *env)
 }
 
 /* Hypervisor virtualization interrupt */
-if (env->pending_interrupts & PPC_INTERRUPT_HVIRT) {
+if (pending_interrupts & PPC_INTERRUPT_HVIRT) {
 /* LPCR will be clear when not supported so this will work */
 bool hvice = !!(env->spr[SPR_LPCR] & LPCR_HVICE);
 if ((msr_ee || !FIELD_EX64_HV(env->msr)) && hvice) {
@@ -1938,7 +1941,7 @@ static int p9_next_unmasked_interrupt(CPUPPCState *env)
 }
 
 /* External interrupt can ignore MSR:EE under some circumstances */
-if (env->pending_interrupts & PPC_INTERRUPT_EXT) {
+if (pending_interrupts & PPC_INTERRUPT_EXT) {
 bool lpes0 =

[PATCH 2/6] target/ppc: optimize hreg_compute_pmu_hflags_value

2024-05-20 Thread Harsh Prateek Bora

Cache env->spr[SPR_POWER_MMCR0] in a local variable as used in multiple
conditions to avoid multiple indirect accesses.

Signed-off-by: Harsh Prateek Bora 
---
 target/ppc/helper_regs.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/target/ppc/helper_regs.c b/target/ppc/helper_regs.c
index 945fa1a596..5de0df5795 100644
--- a/target/ppc/helper_regs.c
+++ b/target/ppc/helper_regs.c
@@ -50,15 +50,16 @@ void hreg_swap_gpr_tgpr(CPUPPCState *env)
 static uint32_t hreg_compute_pmu_hflags_value(CPUPPCState *env)
 {
 uint32_t hflags = 0;
-
 #if defined(TARGET_PPC64)
-if (env->spr[SPR_POWER_MMCR0] & MMCR0_PMCC0) {
+target_ulong spr_power_mmcr0 = env->spr[SPR_POWER_MMCR0];
+
+if (spr_power_mmcr0 & MMCR0_PMCC0) {
 hflags |= 1 << HFLAGS_PMCC0;
 }
-if (env->spr[SPR_POWER_MMCR0] & MMCR0_PMCC1) {
+if (spr_power_mmcr0 & MMCR0_PMCC1) {
 hflags |= 1 << HFLAGS_PMCC1;
 }
-if (env->spr[SPR_POWER_MMCR0] & MMCR0_PMCjCE) {
+if (spr_power_mmcr0 & MMCR0_PMCjCE) {
 hflags |= 1 << HFLAGS_PMCJCE;
 }
 
-- 
2.39.3

[PATCH 1/6] target/ppc: use locally stored msr and avoid indirect access

2024-05-20 Thread Harsh Prateek Bora

hreg_compute_hflags_value already stores msr locally to be used in most
of the logic in the routine however some instances are still using
env->msr which is unnecessary. Use locally stored value as available.

Signed-off-by: Harsh Prateek Bora 
---
 target/ppc/helper_regs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/ppc/helper_regs.c b/target/ppc/helper_regs.c
index 25258986e3..945fa1a596 100644
--- a/target/ppc/helper_regs.c
+++ b/target/ppc/helper_regs.c
@@ -106,10 +106,10 @@ static uint32_t hreg_compute_hflags_value(CPUPPCState *env)
 
 if (ppc_flags & POWERPC_FLAG_DE) {
 target_ulong dbcr0 = env->spr[SPR_BOOKE_DBCR0];
-if ((dbcr0 & DBCR0_ICMP) && FIELD_EX64(env->msr, MSR, DE)) {
+if ((dbcr0 & DBCR0_ICMP) && FIELD_EX64(msr, MSR, DE)) {
 hflags |= 1 << HFLAGS_SE;
 }
-if ((dbcr0 & DBCR0_BRT) && FIELD_EX64(env->msr, MSR, DE)) {
+if ((dbcr0 & DBCR0_BRT) && FIELD_EX64(msr, MSR, DE)) {
 hflags |= 1 << HFLAGS_BE;
 }
 } else {
-- 
2.39.3

[PATCH 0/6] target/ppc: misc ppc improvements/optimizations

2024-05-20 Thread Harsh Prateek Bora

This is a set of misc ppc arch-specific code improvements/optimizations.
There are similar opportunities for improvement in the legacy ppc code,
but those can be taken up later.

Harsh Prateek Bora (6):
  target/ppc: use locally stored msr and avoid indirect access
  target/ppc: optimize hreg_compute_pmu_hflags_value
  target/ppc: optimize hreg_compute_pmu_hflags_value
  target/ppc: optimize p9 exception handling routines
  target/ppc: reduce duplicate code between init_proc_POWER{9,10}
  target/ppc: reduce code duplication across Power9/10 init code

 target/ppc/cpu_init.h|  79 +++
 target/ppc/cpu_init.c| 166 ++-
 target/ppc/excp_helper.c |  39 -
 target/ppc/helper_regs.c |  19 ++---
 4 files changed, 134 insertions(+), 169 deletions(-)
 create mode 100644 target/ppc/cpu_init.h

-- 
2.39.3

[PATCH 5/6] target/ppc: reduce duplicate code between init_proc_POWER{9, 10}

2024-05-20 Thread Harsh Prateek Bora

Historically, the registration of SPRs has been inherited along with
each new Power arch support being added, leading to a lot of code
duplication. Clean this up now to avoid further duplication as support
for newer archs is added.

Signed-off-by: Harsh Prateek Bora 
---
 target/ppc/cpu_init.c | 43 +--
 1 file changed, 9 insertions(+), 34 deletions(-)

diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 6d82f24c87..636e12ba7a 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -6307,7 +6307,7 @@ static struct ppc_radix_page_info POWER9_radix_page_info = {
 };
 #endif /* CONFIG_USER_ONLY */
 
-static void init_proc_POWER9(CPUPPCState *env)
+static inline void register_power9_common_sprs(CPUPPCState *env)
 {
 /* Common Registers */
 init_proc_book3s_common(env);
@@ -6326,7 +6326,6 @@ static void init_proc_POWER9(CPUPPCState *env)
 register_power5p_ear_sprs(env);
 register_power5p_tb_sprs(env);
 register_power6_common_sprs(env);
-register_HEIR32_spr(env);
 register_power6_dbg_sprs(env);
 register_power8_tce_address_control_sprs(env);
 register_power8_ids_sprs(env);
@@ -6342,6 +6341,12 @@ static void init_proc_POWER9(CPUPPCState *env)
 register_power9_book4_sprs(env);
 register_power8_rpr_sprs(env);
 register_power9_mmu_sprs(env);
+}
+
+static void init_proc_POWER9(CPUPPCState *env)
+{
+register_power9_common_sprs(env);
+register_HEIR32_spr(env);
 
 /* POWER9 Specific registers */
 spr_register_kvm(env, SPR_TIDR, "TIDR", NULL, NULL,
@@ -6499,39 +6504,9 @@ static struct ppc_radix_page_info POWER10_radix_page_info = {
 
 static void init_proc_POWER10(CPUPPCState *env)
 {
-/* Common Registers */
-init_proc_book3s_common(env);
-register_book3s_207_dbg_sprs(env);
-
-/* Common TCG PMU */
-init_tcg_pmu_power8(env);
-
-/* POWER8 Specific Registers */
-register_book3s_ids_sprs(env);
-register_amr_sprs(env);
-register_iamr_sprs(env);
-register_book3s_purr_sprs(env);
-register_power5p_common_sprs(env);
-register_power5p_lpar_sprs(env);
-register_power5p_ear_sprs(env);
-register_power5p_tb_sprs(env);
-register_power6_common_sprs(env);
+register_power9_common_sprs(env);
 register_HEIR64_spr(env);
-register_power6_dbg_sprs(env);
-register_power8_tce_address_control_sprs(env);
-register_power8_ids_sprs(env);
-register_power8_ebb_sprs(env);
-register_power8_fscr_sprs(env);
-register_power8_pmu_sup_sprs(env);
-register_power8_pmu_user_sprs(env);
-register_power8_tm_sprs(env);
-register_power8_pspb_sprs(env);
-register_power8_dpdes_sprs(env);
-register_vtb_sprs(env);
-register_power8_ic_sprs(env);
-register_power9_book4_sprs(env);
-register_power8_rpr_sprs(env);
-register_power9_mmu_sprs(env);
+
 register_power10_hash_sprs(env);
 register_power10_dexcr_sprs(env);
 register_power10_pmu_sup_sprs(env);
-- 
2.39.3
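
The refactoring in the patch above can be illustrated with a small standalone model. This is only a sketch: the `Toy*` names and the bit flags are hypothetical stand-ins for the real `register_*_sprs()` calls, and it assumes that the order in which SPR groups are registered does not matter.

```c
#include <stdint.h>

/* Hypothetical stand-ins for groups of SPRs; not QEMU definitions. */
enum {
    TOY_SPR_COMMON = 1u << 0,   /* everything shared from POWER9 onwards */
    TOY_SPR_HEIR32 = 1u << 1,   /* 32-bit HEIR, POWER9 only */
    TOY_SPR_HEIR64 = 1u << 2,   /* 64-bit HEIR, POWER10 and later */
    TOY_SPR_P10    = 1u << 3,   /* POWER10-specific registers */
};

typedef struct {
    uint32_t registered;        /* bitmask of SPR groups registered so far */
} ToyEnv;

/* Shared helper: one place to maintain the common registration list. */
static void toy_register_power9_common(ToyEnv *env)
{
    env->registered |= TOY_SPR_COMMON;
}

static void toy_init_power9(ToyEnv *env)
{
    toy_register_power9_common(env);
    env->registered |= TOY_SPR_HEIR32;  /* the per-model difference stays here */
}

static void toy_init_power10(ToyEnv *env)
{
    toy_register_power9_common(env);
    env->registered |= TOY_SPR_HEIR64;  /* the per-model difference stays here */
    env->registered |= TOY_SPR_P10;
}
```

Under this model a newer processor only needs to call the shared helper and add its own registers, which is the duplication the commit message says it wants to avoid.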

Re: [PATCH] target/ppc: handle vcpu hotplug failure gracefully

2024-05-20 Thread Harsh Prateek Bora





On 5/17/24 09:30, Nicholas Piggin wrote:

On Thu May 16, 2024 at 2:31 PM AEST, Harsh Prateek Bora wrote:

Hi Nick,

On 5/14/24 08:39, Nicholas Piggin wrote:

On Tue Apr 23, 2024 at 4:30 PM AEST, Harsh Prateek Bora wrote:

+ qemu-devel

On 4/23/24 11:40, Harsh Prateek Bora wrote:

On ppc64, the PowerVM hypervisor runs with limited memory and a VCPU
creation during hotplug may fail during kvm_ioctl for KVM_CREATE_VCPU,
leading to termination of guest since errp is set to &error_fatal while
calling kvm_init_vcpu. This unexpected behaviour can be avoided by
pre-creating vcpu and parking it on success or return error otherwise.
This enables graceful error delivery for any vcpu hotplug failures while
the guest can keep running.


So this puts in on the park list so when kvm_init_vcpu() later runs it
will just take it off the park list instead of issuing another
KVM_CREATE_VCPU ioctl.

And kvm_init_vcpu() runs in the vcpu thread function, which does not
have a good way to indicate failure to the caller.

I don't know a lot about this part of qemu but it seems like a good
idea to move fail-able initialisation out of the vcpu thread in that
case. So the general idea seems good to me.



Yeh ..



Based on api refactoring to create/park vcpus introduced in 1/8 of patch series:
https://lore.kernel.org/qemu-devel/2024031202.12992-2-salil.me...@huawei.com/


So from this series AFAIKS you're just using kvm_create / kvm_park
routines? You could easily pull that patch 1 out ahead of that larger
series if progress is slow on it, it's a decent cleanup by itself by
the looks.



Yeh, patch 1 of that series is all we need, but the author mentioned on
the list that he is about to post next version soon.



Tested OK by repeatedly doing a hotplug/unplug of vcpus as below:

#virsh setvcpus hotplug 40
#virsh setvcpus hotplug 70
error: internal error: unable to execute QEMU command 'device_add':
kvmppc_cpu_realize: vcpu hotplug failed with -12

Reported-by: Anushree Mathur 
Suggested-by: Shivaprasad G Bhat 
Suggested-by: Vaibhav Jain 
Signed-off-by: Harsh Prateek Bora 
---
---
target/ppc/kvm.c | 42 ++
1 file changed, 42 insertions(+)

diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 8231feb2d4..c887f6dfa0 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -48,6 +48,8 @@
#include "qemu/mmap-alloc.h"
#include "elf.h"
#include "sysemu/kvm_int.h"
+#include "sysemu/kvm.h"
+#include "hw/core/accel-cpu.h"

#define PROC_DEVTREE_CPU  "/proc/device-tree/cpus/"

@@ -2339,6 +2341,43 @@ static void alter_insns(uint64_t *word, uint64_t flags, bool on)

}
}

+static int max_cpu_index = 0;

+
+static bool kvmppc_cpu_realize(CPUState *cs, Error **errp)
+{
+int ret;
+
+cs->cpu_index = max_cpu_index++;
+
+POWERPC_CPU(cs)->vcpu_id = cs->cpu_index;


So you're overriding the cpu_get_free_index() allocator here.
And you need to because vcpu_id needs to be assigned before
the KVM create, I guess.



Yes ..


I guess it works. I would add a comment like s390x has.


Not sure which comment you were referring to but with exporting
cpu_get_free_index as suggested later, not sure if we still need any
comment.


Yeah that's true.


+
+if (cs->parent_obj.hotplugged) {


Can _all_ kvm cpu creation go via this path? Why just limit it to
hotplugged?


For the initial bootup, we actually want to abort if the requested vCPUs
can't be allocated so that user can retry until the requested vCPUs are
allocated. For hotplug failure, bringing down entire guest isn't fair,
hence the fix.


But you could make the error handling depend on hotplugged, no?
Perhaps put that error handling decision in common code so policy
is the same for all targets and back ends.


Hmm, I think just setting errp appropriately would suffice for both
cases as existing behaviour takes care of the rest of handling.
Something like below:

+static bool kvmppc_cpu_realize(CPUState *cs, Error **errp)
+{
+int ret;
+const char *vcpu_str = (cs->parent_obj.hotplugged == true) ?
+   "hotplug" : "create";
+cs->cpu_index = cpu_get_free_index();
+
+POWERPC_CPU(cs)->vcpu_id = cs->cpu_index;
+
+/* create and park to fail gracefully in case vcpu hotplug fails */
+ret = kvm_create_and_park_vcpu(cs);
+if (ret) {
+error_setg(errp, "%s: vcpu %s failed with %d",
+ __func__, vcpu_str, ret);
+return false;
+}
+return true;
+}




[...]


+}
+
static void kvmppc_host_cpu_class_init(ObjectClass *oc, void *data)
{
PowerPCCPUClass *pcc = POWERPC_CPU_CLASS(oc);
@@ -2963,4 +3002,7 @@ bool kvm_arch_cpu_check_are_resettable(void)

void kvm_arch_accel_class_init(ObjectClass *oc)

{
+AccelClass *ac = ACCEL_CLASS(oc);
+ac->cpu_common_

Re: [PATCH] ppc/spapr: Add ibm,pi-features

2024-05-20 Thread Harsh Prateek Bora





On 5/18/24 15:26, Nicholas Piggin wrote:

The ibm,pi-features property has a bit to say whether or not
msgsndp should be used. Linux checks if it is being run under
KVM and avoids msgsndp anyway, but it would be preferable to
rely on this bit.

Signed-off-by: Nicholas Piggin 


Reviewed-by: Harsh Prateek Bora 


---
  hw/ppc/spapr.c | 27 +++
  1 file changed, 27 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 611a9e5184..6891d91e6e 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -353,6 +353,31 @@ static void spapr_dt_pa_features(SpaprMachineState *spapr,
  _FDT((fdt_setprop(fdt, offset, "ibm,pa-features", pa_features, pa_size)));
  }
  
+static void spapr_dt_pi_features(SpaprMachineState *spapr,

+ PowerPCCPU *cpu,
+ void *fdt, int offset)
+{
+uint8_t pi_features[] = { 1, 0,
+0x00 };
+
+if (kvm_enabled() && ppc_check_compat(cpu, CPU_POWERPC_LOGICAL_3_00,
+  0, cpu->compat_pvr)) {
+/*
+ * POWER9 and later CPUs with KVM run in LPAR-per-thread mode where
+ * all threads are essentially independent CPUs, and msgsndp does not
+ * work (because it is physically-addressed) and therefore is
+ * emulated by KVM, so disable it here to ensure XIVE will be used.
+ * This is both KVM and CPU implementation-specific behaviour so a KVM
+ * cap would be cleanest, but for now this works. If KVM ever permits
+ * native msgsndp execution by guests, a cap could be added at that
+ * time.
+ */
+pi_features[2] |= 0x08; /* 4: No msgsndp */
+}
+
+_FDT((fdt_setprop(fdt, offset, "ibm,pi-features", pi_features, sizeof(pi_features))));
+}
+
  static hwaddr spapr_node0_size(MachineState *machine)
  {
  if (machine->numa_state->num_nodes) {
@@ -815,6 +840,8 @@ static void spapr_dt_cpu(CPUState *cs, void *fdt, int offset,
  
  spapr_dt_pa_features(spapr, cpu, fdt, offset);
  
+spapr_dt_pi_features(spapr, cpu, fdt, offset);

+
  _FDT((fdt_setprop_cell(fdt, offset, "ibm,chip-id",
 cs->cpu_index / vcpus_per_socket)));
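
The comment "4: No msgsndp" next to the value 0x08 in the patch above is consistent with IBM-style bit numbering, where bit 0 is the most significant bit of a byte. The sketch below only demonstrates that arithmetic; the helper names are hypothetical, and the meaning of the first two bytes of the property is not modelled here.

```c
#include <stdint.h>

/* Mask for bit `bit` (0..7) of a byte when bit 0 is the MOST significant. */
static uint8_t toy_msb0_mask(unsigned bit)
{
    return (uint8_t)(0x80u >> bit);
}

/* Set feature bit `bit` in byte `byte` of a property buffer. */
static void toy_set_feature_bit(uint8_t *prop, unsigned byte, unsigned bit)
{
    prop[byte] |= toy_msb0_mask(bit);
}
```

With this numbering, setting bit 4 of the third byte gives the same result as the `pi_features[2] |= 0x08` line in the patch.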

Re: [PATCH v2 1/4] accel/kvm: Extract common KVM vCPU {creation, parking} code

2024-05-16 Thread Harsh Prateek Bora

Hi Salil,

On 5/16/24 19:05, Salil Mehta wrote:

  From: Harsh Prateek Bora 
  Sent: Thursday, May 16, 2024 2:07 PM

  Hi Salil,

  On 5/16/24 17:42, Salil Mehta wrote:

  > Hi Harsh,
  >
  >>   From: Harsh Prateek Bora 
  >>   Sent: Thursday, May 16, 2024 11:15 AM
  >>
  >>   Hi Salil,
  >>
  >>   Thanks for your email.
  >>   Your patch 1/8 is included here based on review comments on my previous
  >>   patch from one of the maintainers in the community and therefore I had
  >>   kept you in CC to be aware of the desire of having this independent patch to
  >>   get merged earlier even if your other patches in the series may go through
  >>   further reviews.
  >
  > I really don't know which discussion you are pointing at. Please
  > understand you are fixing a bug and we are pushing a feature which has got a
  > large series. It will break the patch-set which is about to be merged.
  >
  > There will be significant overhead of testing on us for the work we
  > have been carrying forward for a long time. This will be disruptive. Please
  > don't!
  >

  I was referring to the review discussion on my prev patch here:

  https://lore.kernel.org/qemu-devel/d191d2jfar7l.2eh4s445m4...@gmail.com/

Sure, I'm not sure what this means.

No worries. If you had followed the conversation on the review
link I shared, I had made it clear that we are expecting a patch update
from you and it is included here just to facilitate review of additional
patches on the top.

  Although your patch was included with this series only to facilitate review of
  the additional patches depending on just one of your patch.

Generally you rebase your patch-set over the other and clearly state in the cover
letter that this patch-set is dependent upon such and such patch-set. Just imagine
if everyone starts to unilaterally pick up patches from each other's patch-set, it
will create chaos not only for the feature owners but also for the maintainers.

Please go through the review discussion on the link I shared above. It
was included on the suggestion of one of the maintainers. However, if
you are going to send v9 soon, everyone would be happy to wait.

  I am not sure what is appearing disruptive here. It is a common practice in
  the community that maintainer(s) can pick individual patches from the
  series if it has been vetted by a significant number of reviewers.

Don’t you think this patch-set is asking for acceptance for a patch already
part of another patch-set which is about to be accepted and is a bigger feature?
Will it cause maintenance overhead at the last moment? Yes, of course!

No, I don't think so.

  However, in this case, since you have mentioned to post next version soon,
  you need not worry about it as that would be the preferred version for both
  of the series.

Yes, but please understand we are working for the benefit of overall community.
Please cooperate here.

Hope I cleared your confusion. We are waiting to see your v9 soon.

  >

  >>
  >>   I am hoping to see your v9 soon and thereafter maintainer(s) may choose to
  >>   pick the latest independent patch if needs to be merged earlier.
  >
  >
  > I don’t think you are understanding what problem it is causing. For
  > your small bug fix you are causing significant delays at our end.
  >

  I hope I clarified above that including your patch here doesn't delay anything.

  Hoping to see your v9 soon!

  Thanks

  Harsh
  >
  > Thanks
  > Salil.
  >>
  >>   Thanks for your work and let's be hopeful it gets merged soon.
  >>
  >>   regards,
  >>   Harsh
  >>
  >>   On 5/16/24 14:00, Salil Mehta wrote:
  >>   > Hi Harsh,
  >>   >
  >>   > Thanks for your interest in the patch-set but taking away patches like
  >>   > this from other series without any discussion can disrupt others' work
  >>   > and its acceptance on time. This is because we will have to put a lot of
  >>   > effort into rebasing the bigger series, and then testing overhead comes
  >>   > along with it.
  >>   >
  >>   > The patch-set (from where this patch has been taken) is part of an even
  >>   > bigger series and there have been many people and companies toiling to
  >>   > fix the bugs collectively in that series and for years.
  >>   >
  >>   > I'm about to float the V9 version of the Arch agnostic series which this
  >>   > patch is part of and you can rebase your patch-set from there. I'm
  >>   > hopeful that it will get accepted in this cycle.
  >>   >
  >>   >
  >>   > Many thanks
  >>   > Salil.
  >>   >
  >>   >>   From: Harsh Prateek Bora

Re: [PATCH v2 1/4] accel/kvm: Extract common KVM vCPU {creation, parking} code

2024-05-16 Thread Harsh Prateek Bora

Hi Salil,

On 5/16/24 17:42, Salil Mehta wrote:

Hi Harsh,

  From: Harsh Prateek Bora 
  Sent: Thursday, May 16, 2024 11:15 AM

  Hi Salil,

  Thanks for your email.

  Your patch 1/8 is included here based on review comments on my previous
  patch from one of the maintainers in the community and therefore I had
  kept you in CC to be aware of the desire of having this independent patch to
  get merged earlier even if your other patches in the series may go through
  further reviews.

I really don't know which discussion you are pointing at. Please understand
you are fixing a bug and we are pushing a feature which has got a large series.
It will break the patch-set which is about to be merged.

There will be significant overhead of testing on us for the work we have been
carrying forward for a long time. This will be disruptive. Please don't!

I was referring to the review discussion on my prev patch here:
https://lore.kernel.org/qemu-devel/d191d2jfar7l.2eh4s445m4...@gmail.com/

Although your patch was included with this series only to facilitate
review of the additional patches depending on just one of your patch.

I am not sure what is appearing disruptive here. It is a common practice
in the community that maintainer(s) can pick individual patches from the
series if it has been vetted by a significant number of reviewers.

However, in this case, since you have mentioned to post next version
soon, you need not worry about it as that would be the preferred version
for both of the series.

  I am hoping to see your v9 soon and thereafter maintainer(s) may choose to

  pick the latest independent patch if needs to be merged earlier.

I don’t think you are understanding what problem it is causing. For your
small bug fix you are causing significant delays at our end.

I hope I clarified above that including your patch here doesn't delay
anything. Hoping to see your v9 soon!

Thanks
Harsh

Thanks
Salil.

  Thanks for your work and let's be hopeful it gets merged soon.

  regards,

  Harsh

  On 5/16/24 14:00, Salil Mehta wrote:

  > Hi Harsh,
  >
  > Thanks for your interest in the patch-set but taking away patches like
  > this from other series without any discussion can disrupt others' work
  > and its acceptance on time. This is because we will have to put a lot of
  > effort into rebasing the bigger series, and then testing overhead comes along
  > with it.
  >
  > The patch-set (from where this patch has been taken) is part of an even
  > bigger series and there have been many people and companies toiling to
  > fix the bugs collectively in that series and for years.
  >
  > I'm about to float the V9 version of the Arch agnostic series which this
  > patch is part of and you can rebase your patch-set from there. I'm
  > hopeful that it will get accepted in this cycle.
  >
  >
  > Many thanks
  > Salil.
  >
  >>   From: Harsh Prateek Bora 
  >>   Sent: Thursday, May 16, 2024 6:32 AM
  >>
  >>   From: Salil Mehta 
  >>
  >>   KVM vCPU creation is done once during the vCPU realization when Qemu
  >>   vCPU thread is spawned. This is common to all the architectures as of now.
  >>
  >>   Hot-unplug of vCPU results in destruction of the vCPU object in QOM but
  >>   the corresponding KVM vCPU object in the Host KVM is not destroyed as
  >>   KVM doesn't support vCPU removal. Therefore, its representative KVM
  >>   vCPU object/context in Qemu is parked.
  >>
  >>   Refactor architecture common logic so that some APIs could be reused by
  >>   vCPU Hotplug code of some architectures like ARM, Loongson etc. Update
  >>   new/old APIs with trace events instead of DPRINTF. No functional change is
  >>   intended here.
  >>
  >>   Signed-off-by: Salil Mehta 
  >>   Reviewed-by: Gavin Shan 
  >>   Tested-by: Vishnu Pajjuri 
  >>   Reviewed-by: Jonathan Cameron 
  >>   Tested-by: Xianglai Li 
  >>   Tested-by: Miguel Luis 
  >>   Reviewed-by: Shaoqin Huang 
  >>   [harshpb: fixed rebase failures in include/sysemu/kvm.h]
  >>   Signed-off-by: Harsh Prateek Bora 
  >>   ---
  >>    include/sysemu/kvm.h   | 15 ++
  >>    accel/kvm/kvm-all.c    | 64 --
  >>    accel/kvm/trace-events |  5 +++-
  >>    3 files changed, 68 insertions(+), 16 deletions(-)
  >>
  >>   diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h index
  >>   eaf801bc93..fa3ec74442 100644
  >>   --- a/include/sysemu/kvm.h
  >>   +++ b/include/sysemu/kvm.h
  >>   @@ -434,6 +434,21 @@ void kvm_set_sigmask_len(KVMState *s,
  unsigned
  >>   int sigmask_len);
  >>
  >>int kvm_physical_memory_addr_from_host(KVMState *s, void
  >>   *ram_add

Re: [PATCH v2 1/4] accel/kvm: Extract common KVM vCPU {creation, parking} code

2024-05-16 Thread Harsh Prateek Bora


Hi Salil,

Thanks for your email.
Your patch 1/8 is included here based on review comments on my previous 
patch from one of the maintainers in the community and therefore I had 
kept you in CC to be aware of the desire of having this independent 
patch to get merged earlier even if your other patches in the series may 
go through further reviews.


I am hoping to see your v9 soon and thereafter maintainer(s) may choose 
to pick the latest independent patch if needs to be merged earlier.


Thanks for your work and let's be hopeful it gets merged soon.

regards,
Harsh

On 5/16/24 14:00, Salil Mehta wrote:

Hi Harsh,

Thanks for your interest in the patch-set but taking away patches like
this from other series without any discussion can disrupt others' work
and its acceptance on time. This is because we will have to put a lot of
effort into rebasing the bigger series, and then testing overhead comes along
with it.

The patch-set (from where this patch has been taken) is part of an even
bigger series and there have been many people and companies toiling
to fix the bugs collectively in that series and for years.

I'm about to float the V9 version of the Arch agnostic series which this
patch is part of and you can rebase your patch-set from there. I'm
hopeful that it will get accepted in this cycle.


Many thanks
Salil.


  From: Harsh Prateek Bora 
  Sent: Thursday, May 16, 2024 6:32 AM
  
  From: Salil Mehta 
  
  KVM vCPU creation is done once during the vCPU realization when Qemu
  vCPU thread is spawned. This is common to all the architectures as of now.
  
  Hot-unplug of vCPU results in destruction of the vCPU object in QOM but
  the corresponding KVM vCPU object in the Host KVM is not destroyed as
  KVM doesn't support vCPU removal. Therefore, its representative KVM
  vCPU object/context in Qemu is parked.
  
  Refactor architecture common logic so that some APIs could be reused by
  vCPU Hotplug code of some architectures like ARM, Loongson etc. Update
  new/old APIs with trace events instead of DPRINTF. No functional change is
  intended here.
  
  Signed-off-by: Salil Mehta 

  Reviewed-by: Gavin Shan 
  Tested-by: Vishnu Pajjuri 
  Reviewed-by: Jonathan Cameron 
  Tested-by: Xianglai Li 
  Tested-by: Miguel Luis 
  Reviewed-by: Shaoqin Huang 
  [harshpb: fixed rebase failures in include/sysemu/kvm.h]
  Signed-off-by: Harsh Prateek Bora 
  ---
   include/sysemu/kvm.h   | 15 ++
   accel/kvm/kvm-all.c    | 64 --
   accel/kvm/trace-events |  5 +++-
   3 files changed, 68 insertions(+), 16 deletions(-)
  
  diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h index

  eaf801bc93..fa3ec74442 100644
  --- a/include/sysemu/kvm.h
  +++ b/include/sysemu/kvm.h
  @@ -434,6 +434,21 @@ void kvm_set_sigmask_len(KVMState *s, unsigned
  int sigmask_len);
  
   int kvm_physical_memory_addr_from_host(KVMState *s, void

  *ram_addr,
  hwaddr *phys_addr);
  +/**
  + * kvm_create_vcpu - Gets a parked KVM vCPU or creates a KVM vCPU
  + * @cpu: QOM CPUState object for which KVM vCPU has to be
  fetched/created.
  + *
  + * @returns: 0 when success, errno (<0) when failed.
  + */
  +int kvm_create_vcpu(CPUState *cpu);
  +
  +/**
  + * kvm_park_vcpu - Park QEMU KVM vCPU context
  + * @cpu: QOM CPUState object for which QEMU KVM vCPU context has to
  be parked.
  + *
  + * @returns: none
  + */
  +void kvm_park_vcpu(CPUState *cpu);
  
   #endif /* COMPILING_PER_TARGET */
  
  diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index

  d7281b93f3..30d42847de 100644
  --- a/accel/kvm/kvm-all.c
  +++ b/accel/kvm/kvm-all.c
  @@ -128,6 +128,7 @@ static QemuMutex kml_slots_lock;
   #define kvm_slots_unlock()  qemu_mutex_unlock(&kml_slots_lock)
  
   static void kvm_slot_init_dirty_bitmap(KVMSlot *mem);

  +static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id);
  
   static inline void kvm_resample_fd_remove(int gsi)
   {
  @@ -340,14 +341,53 @@ err:
   return ret;
   }
  
  +void kvm_park_vcpu(CPUState *cpu)

  +{
  +struct KVMParkedVcpu *vcpu;
  +
  +trace_kvm_park_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
  +
  +vcpu = g_malloc0(sizeof(*vcpu));
  +vcpu->vcpu_id = kvm_arch_vcpu_id(cpu);
  +vcpu->kvm_fd = cpu->kvm_fd;
  +QLIST_INSERT_HEAD(&kvm_state->kvm_parked_vcpus, vcpu, node);
  +}
  +
  +int kvm_create_vcpu(CPUState *cpu)
  +{
  +unsigned long vcpu_id = kvm_arch_vcpu_id(cpu);
  +KVMState *s = kvm_state;
  +int kvm_fd;
  +
  +trace_kvm_create_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
  +
  +/* check if the KVM vCPU already exist but is parked */
  +kvm_fd = kvm_get_vcpu(s, vcpu_id);
  +if (kvm_fd < 0) {
  +/* vCPU not parked: create a new KVM vCPU */
  +kvm_fd = kvm_vm_ioctl(s, KVM_CREATE_VCPU, vcpu_id);
  +if (kvm_fd < 0) {
  +error_report("KVM_CREATE_VCPU IOCTL failed for vCPU %lu",
  v

[PATCH v2 2/4] accel/kvm: Introduce kvm_create_and_park_vcpu() helper

2024-05-15 Thread Harsh Prateek Bora

There are distinct helpers for creating and parking a KVM vCPU.
However, there can be cases where a platform needs to create and
immediately park the vCPU during early stages of vcpu init which
can later be reused when vcpu thread gets initialized. This would
help detect failures with kvm_create_vcpu at an early stage.

Based on api refactoring to create/park vcpus introduced in 1/8 of patch series:
https://lore.kernel.org/qemu-devel/2024031202.12992-2-salil.me...@huawei.com/

Suggested-by: Nicholas Piggin 
Signed-off-by: Harsh Prateek Bora 
---
 include/sysemu/kvm.h |  8 
 accel/kvm/kvm-all.c  | 12 
 2 files changed, 20 insertions(+)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index fa3ec74442..221e6bd55b 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -450,6 +450,14 @@ int kvm_create_vcpu(CPUState *cpu);
  */
 void kvm_park_vcpu(CPUState *cpu);
 
+/**
+ * kvm_create_and_park_vcpu - Create and park a KVM vCPU
+ * @cpu: QOM CPUState object for which KVM vCPU has to be created and parked.
+ *
+ * @returns: 0 when success, errno (<0) when failed.
+ */
+int kvm_create_and_park_vcpu(CPUState *cpu);
+
 #endif /* COMPILING_PER_TARGET */
 
 void kvm_cpu_synchronize_state(CPUState *cpu);
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 30d42847de..3d7e5eaf0b 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -381,6 +381,18 @@ int kvm_create_vcpu(CPUState *cpu)
 return 0;
 }
 
+int kvm_create_and_park_vcpu(CPUState *cpu)
+{
+int ret = 0;
+
+ret = kvm_create_vcpu(cpu);
+if (!ret) {
+kvm_park_vcpu(cpu);
+}
+
+return ret;
+}
+
 static int do_kvm_destroy_vcpu(CPUState *cpu)
 {
 KVMState *s = kvm_state;
-- 
2.39.3
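
The create-then-park flow described in this patch can be modelled outside QEMU. The following is a minimal sketch under stated assumptions: every `toy_*` name is hypothetical, the parked list is a plain linked list, and `toy_ioctl_create()` merely stands in for the KVM_CREATE_VCPU ioctl by failing with -12 once a made-up host limit is reached.

```c
#include <stdlib.h>

#define TOY_MAX_VCPUS 2             /* hypothetical host limit */

typedef struct ToyParked {
    unsigned long vcpu_id;
    int fd;
    struct ToyParked *next;
} ToyParked;

static ToyParked *toy_parked;       /* models the list of parked vCPUs */
static int toy_created;             /* number of successful "ioctl" creations */

/* Stand-in for the KVM_CREATE_VCPU ioctl: fails once the host is "full". */
static int toy_ioctl_create(unsigned long vcpu_id)
{
    (void)vcpu_id;
    if (toy_created >= TOY_MAX_VCPUS) {
        return -12;                 /* same code as in the hotplug report */
    }
    return 100 + toy_created++;     /* fake file descriptor */
}

/* Take a matching vCPU off the parked list, or return -1 if none is parked. */
static int toy_get_parked(unsigned long vcpu_id)
{
    for (ToyParked **pp = &toy_parked; *pp; pp = &(*pp)->next) {
        if ((*pp)->vcpu_id == vcpu_id) {
            ToyParked *p = *pp;
            int fd = p->fd;

            *pp = p->next;
            free(p);
            return fd;
        }
    }
    return -1;
}

/* Reuse a parked vCPU if there is one, otherwise create a new one. */
static int toy_create_vcpu(unsigned long vcpu_id, int *fd_out)
{
    int fd = toy_get_parked(vcpu_id);

    if (fd < 0) {
        fd = toy_ioctl_create(vcpu_id);
        if (fd < 0) {
            return fd;
        }
    }
    *fd_out = fd;
    return 0;
}

static void toy_park_vcpu(unsigned long vcpu_id, int fd)
{
    ToyParked *p = calloc(1, sizeof(*p));

    if (!p) {
        abort();
    }
    p->vcpu_id = vcpu_id;
    p->fd = fd;
    p->next = toy_parked;
    toy_parked = p;
}

/* Create early and park, so a later create call finds the vCPU ready. */
static int toy_create_and_park_vcpu(unsigned long vcpu_id)
{
    int fd;
    int ret = toy_create_vcpu(vcpu_id, &fd);

    if (!ret) {
        toy_park_vcpu(vcpu_id, fd);
    }
    return ret;
}
```

In this model a failure surfaces in `toy_create_and_park_vcpu()`, where the caller can still report it, and a later `toy_create_vcpu()` for the same id picks up the parked descriptor without a second creation attempt.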

[PATCH v2 3/4] cpu-common.c: export cpu_get_free_index to be reused later

2024-05-15 Thread Harsh Prateek Bora

This helper provides an easy way to identify the next available free cpu
index which can be used for vcpu creation. Until now, it has only been
called at a very late stage, and there is a need to be able to call it
earlier (for now, on ppc64), hence the need to export it.

Suggested-by: Nicholas Piggin 
Signed-off-by: Harsh Prateek Bora 
---
 include/exec/cpu-common.h | 2 ++
 cpu-common.c  | 7 ---
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 6d5318895a..0386f1ab29 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -29,6 +29,8 @@ void cpu_list_lock(void);
 void cpu_list_unlock(void);
 unsigned int cpu_list_generation_id_get(void);
 
+int cpu_get_free_index(void);
+
 void tcg_iommu_init_notifier_list(CPUState *cpu);
 void tcg_iommu_free_notifier_list(CPUState *cpu);
 
diff --git a/cpu-common.c b/cpu-common.c
index ce78273af5..82bd1b432d 100644
--- a/cpu-common.c
+++ b/cpu-common.c
@@ -57,14 +57,12 @@ void cpu_list_unlock(void)
 qemu_mutex_unlock(&qemu_cpu_list_lock);
 }
 
-static bool cpu_index_auto_assigned;
 
-static int cpu_get_free_index(void)
+int cpu_get_free_index(void)
 {
 CPUState *some_cpu;
 int max_cpu_index = 0;
 
-cpu_index_auto_assigned = true;
 CPU_FOREACH(some_cpu) {
 if (some_cpu->cpu_index >= max_cpu_index) {
 max_cpu_index = some_cpu->cpu_index + 1;
@@ -83,8 +81,11 @@ unsigned int cpu_list_generation_id_get(void)
 
 void cpu_list_add(CPUState *cpu)
 {
+static bool cpu_index_auto_assigned;
+
 QEMU_LOCK_GUARD(&qemu_cpu_list_lock);
 if (cpu->cpu_index == UNASSIGNED_CPU_INDEX) {
+cpu_index_auto_assigned = true;
 cpu->cpu_index = cpu_get_free_index();
 assert(cpu->cpu_index != UNASSIGNED_CPU_INDEX);
 } else {
-- 
2.39.3
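
The index selection done by the exported helper can be shown in isolation. This sketch replaces the CPU list with a plain array (an assumption made only for illustration; `toy_get_free_index` is a hypothetical name) and keeps the same "highest index plus one" rule as the loop in the diff above.

```c
#include <stddef.h>

/*
 * Return one more than the highest index in use, or 0 for an empty list.
 * `cpu_indexes` models the indices of the CPUs currently on the CPU list.
 */
static int toy_get_free_index(const int *cpu_indexes, size_t n)
{
    int max_cpu_index = 0;

    for (size_t i = 0; i < n; i++) {
        if (cpu_indexes[i] >= max_cpu_index) {
            max_cpu_index = cpu_indexes[i] + 1;
        }
    }
    return max_cpu_index;
}
```

Two properties follow from this rule: gaps left by unplugged CPUs are not reused, and two calls made before the first CPU is added to the list return the same value, so a caller that assigns the index early presumably has to add the CPU to the list before the next allocation.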

[PATCH v2 1/4] accel/kvm: Extract common KVM vCPU {creation, parking} code

2024-05-15 Thread Harsh Prateek Bora

From: Salil Mehta 

KVM vCPU creation is done once during the vCPU realization when Qemu vCPU thread
is spawned. This is common to all the architectures as of now.

Hot-unplug of vCPU results in destruction of the vCPU object in QOM but the
corresponding KVM vCPU object in the Host KVM is not destroyed as KVM doesn't
support vCPU removal. Therefore, its representative KVM vCPU object/context in
Qemu is parked.

Refactor architecture common logic so that some APIs could be reused by vCPU
Hotplug code of some architectures likes ARM, Loongson etc. Update new/old APIs
with trace events instead of DPRINTF. No functional change is intended here.

Signed-off-by: Salil Mehta 
Reviewed-by: Gavin Shan 
Tested-by: Vishnu Pajjuri 
Reviewed-by: Jonathan Cameron 
Tested-by: Xianglai Li 
Tested-by: Miguel Luis 
Reviewed-by: Shaoqin Huang 
[harshpb: fixed rebase failures in include/sysemu/kvm.h]
Signed-off-by: Harsh Prateek Bora 
---
 include/sysemu/kvm.h   | 15 ++
 accel/kvm/kvm-all.c| 64 --
 accel/kvm/trace-events |  5 +++-
 3 files changed, 68 insertions(+), 16 deletions(-)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index eaf801bc93..fa3ec74442 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -434,6 +434,21 @@ void kvm_set_sigmask_len(KVMState *s, unsigned int sigmask_len);
 
 int kvm_physical_memory_addr_from_host(KVMState *s, void *ram_addr,
hwaddr *phys_addr);
+/**
+ * kvm_create_vcpu - Gets a parked KVM vCPU or creates a KVM vCPU
+ * @cpu: QOM CPUState object for which KVM vCPU has to be fetched/created.
+ *
+ * @returns: 0 when success, errno (<0) when failed.
+ */
+int kvm_create_vcpu(CPUState *cpu);
+
+/**
+ * kvm_park_vcpu - Park QEMU KVM vCPU context
+ * @cpu: QOM CPUState object for which QEMU KVM vCPU context has to be parked.
+ *
+ * @returns: none
+ */
+void kvm_park_vcpu(CPUState *cpu);
 
 #endif /* COMPILING_PER_TARGET */
 
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index d7281b93f3..30d42847de 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -128,6 +128,7 @@ static QemuMutex kml_slots_lock;
 #define kvm_slots_unlock()  qemu_mutex_unlock(&kml_slots_lock)
 
 static void kvm_slot_init_dirty_bitmap(KVMSlot *mem);
+static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id);
 
 static inline void kvm_resample_fd_remove(int gsi)
 {
@@ -340,14 +341,53 @@ err:
 return ret;
 }
 
+void kvm_park_vcpu(CPUState *cpu)
+{
+struct KVMParkedVcpu *vcpu;
+
+trace_kvm_park_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
+
+vcpu = g_malloc0(sizeof(*vcpu));
+vcpu->vcpu_id = kvm_arch_vcpu_id(cpu);
+vcpu->kvm_fd = cpu->kvm_fd;
+QLIST_INSERT_HEAD(&kvm_state->kvm_parked_vcpus, vcpu, node);
+}
+
+int kvm_create_vcpu(CPUState *cpu)
+{
+unsigned long vcpu_id = kvm_arch_vcpu_id(cpu);
+KVMState *s = kvm_state;
+int kvm_fd;
+
+trace_kvm_create_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
+
+/* check if the KVM vCPU already exist but is parked */
+kvm_fd = kvm_get_vcpu(s, vcpu_id);
+if (kvm_fd < 0) {
+/* vCPU not parked: create a new KVM vCPU */
+kvm_fd = kvm_vm_ioctl(s, KVM_CREATE_VCPU, vcpu_id);
+if (kvm_fd < 0) {
+error_report("KVM_CREATE_VCPU IOCTL failed for vCPU %lu", vcpu_id);
+return kvm_fd;
+}
+}
+
+cpu->kvm_fd = kvm_fd;
+cpu->kvm_state = s;
+cpu->vcpu_dirty = true;
+cpu->dirty_pages = 0;
+cpu->throttle_us_per_full = 0;
+
+return 0;
+}
+
 static int do_kvm_destroy_vcpu(CPUState *cpu)
 {
 KVMState *s = kvm_state;
 long mmap_size;
-struct KVMParkedVcpu *vcpu = NULL;
 int ret = 0;
 
-trace_kvm_destroy_vcpu();
+trace_kvm_destroy_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
 
 ret = kvm_arch_destroy_vcpu(cpu);
 if (ret < 0) {
@@ -373,10 +413,7 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
 }
 }
 
-vcpu = g_malloc0(sizeof(*vcpu));
-vcpu->vcpu_id = kvm_arch_vcpu_id(cpu);
-vcpu->kvm_fd = cpu->kvm_fd;
-QLIST_INSERT_HEAD(&kvm_state->kvm_parked_vcpus, vcpu, node);
+kvm_park_vcpu(cpu);
 err:
 return ret;
 }
@@ -397,6 +434,8 @@ static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id)
 if (cpu->vcpu_id == vcpu_id) {
 int kvm_fd;
 
+trace_kvm_get_vcpu(vcpu_id);
+
 QLIST_REMOVE(cpu, node);
 kvm_fd = cpu->kvm_fd;
 g_free(cpu);
@@ -404,7 +443,7 @@ static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id)
 }
 }
 
-return kvm_vm_ioctl(s, KVM_CREATE_VCPU, (void *)vcpu_id);
+return -ENOENT;
 }
 
 int kvm_init_vcpu(CPUState *cpu, Error **errp)
@@ -415,19 +454,14 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
 
 trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
 
-ret = kvm_get_vcpu(s, kvm_arch_vcpu_id(cpu));
+ret =

[PATCH v2 4/4] target/ppc: handle vcpu hotplug failure gracefully

2024-05-15 Thread Harsh Prateek Bora

On ppc64, the PowerVM hypervisor runs with limited memory and a VCPU
creation during hotplug may fail during kvm_ioctl for KVM_CREATE_VCPU,
leading to termination of guest since errp is set to &error_fatal while
calling kvm_init_vcpu. This unexpected behaviour can be avoided by
pre-creating and parking vcpu on success or return error otherwise.
This enables graceful error delivery for any vcpu hotplug failures while
the guest can keep running.

Based on api refactoring to create/park vcpus introduced in 1/8 of patch series:
https://lore.kernel.org/qemu-devel/2024031202.12992-2-salil.me...@huawei.com/

Tested OK by repeatedly doing a hotplug/unplug of vcpus as below:

 #virsh setvcpus hotplug 40
 #virsh setvcpus hotplug 70
error: internal error: unable to execute QEMU command 'device_add':
kvmppc_cpu_realize: vcpu hotplug failed with -12

Reported-by: Anushree Mathur 
Suggested-by: Shivaprasad G Bhat 
Suggested-by: Vaibhav Jain 
Signed-off-by: Harsh Prateek Bora 
Tested-by: Anushree Mathur 
---
 target/ppc/kvm.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 63930d4a77..25f0cf0ba8 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -48,6 +48,8 @@
 #include "qemu/mmap-alloc.h"
 #include "elf.h"
 #include "sysemu/kvm_int.h"
+#include "sysemu/kvm.h"
+#include "hw/core/accel-cpu.h"
 
 #define PROC_DEVTREE_CPU  "/proc/device-tree/cpus/"
 
@@ -2339,6 +2341,26 @@ static void alter_insns(uint64_t *word, uint64_t flags, bool on)
 }
 }
 
+static bool kvmppc_cpu_realize(CPUState *cs, Error **errp)
+{
+int ret;
+
+cs->cpu_index = cpu_get_free_index();
+
+POWERPC_CPU(cs)->vcpu_id = cs->cpu_index;
+
+if (cs->parent_obj.hotplugged) {
+/* create and park to fail gracefully in case vcpu hotplug fails */
+ret = kvm_create_and_park_vcpu(cs);
+if (ret) {
+error_setg(errp, "%s: vcpu hotplug failed with %d",
+ __func__, ret);
+return false;
+}
+}
+return true;
+}
+
 static void kvmppc_host_cpu_class_init(ObjectClass *oc, void *data)
 {
 PowerPCCPUClass *pcc = POWERPC_CPU_CLASS(oc);
@@ -2958,4 +2980,6 @@ void kvmppc_set_reg_tb_offset(PowerPCCPU *cpu, int64_t tb_offset)
 
 void kvm_arch_accel_class_init(ObjectClass *oc)
 {
+AccelClass *ac = ACCEL_CLASS(oc);
+ac->cpu_common_realize = kvmppc_cpu_realize;
 }
-- 
2.39.3

[PATCH v2 0/4] target/ppc: vcpu hotplug failure handling fixes

2024-05-15 Thread Harsh Prateek Bora

On ppc64, the PowerVM hypervisor runs with limited memory and a VCPU
creation during hotplug may fail during kvm_ioctl for KVM_CREATE_VCPU,
leading to termination of guest since errp is set to &error_fatal while
calling kvm_init_vcpu. This unexpected behaviour can be avoided by
pre-creating and parking vcpu on success or return error otherwise.
This enables graceful error delivery for any vcpu hotplug failures while
the guest can keep running.

This series adds another helper to create and park vcpu (based on below
patch by Salil), exports cpu_get_free_index to be reused later and adds
ppc arch specific handling for vcpu hotplug failure.

Based on api refactoring to create/park vcpus introduced in 1/8 of patch series:
https://lore.kernel.org/qemu-devel/2024031202.12992-2-salil.me...@huawei.com/

PS: I have included patch 1 of the above series (after fixing a rebase
failure) in this series only to make review easier.

Changelog:

v2: Addressed review comments from Nick
v1: Initial patch

Harsh Prateek Bora (3):
  accel/kvm: Introduce kvm_create_and_park_vcpu() helper
  cpu-common.c: export cpu_get_free_index to be reused later
  target/ppc: handle vcpu hotplug failure gracefully

Salil Mehta (1):
  accel/kvm: Extract common KVM vCPU {creation, parking} code

 include/exec/cpu-common.h |  2 ++
 include/sysemu/kvm.h  | 23 
 accel/kvm/kvm-all.c   | 76 +++
 cpu-common.c  |  7 ++--
 target/ppc/kvm.c  | 24 +
 accel/kvm/trace-events|  5 ++-
 6 files changed, 118 insertions(+), 19 deletions(-)

-- 
2.39.3

Re: [PATCH] target/ppc: handle vcpu hotplug failure gracefully

2024-05-15 Thread Harsh Prateek Bora


Hi Nick,

On 5/14/24 08:39, Nicholas Piggin wrote:

On Tue Apr 23, 2024 at 4:30 PM AEST, Harsh Prateek Bora wrote:

+ qemu-devel

On 4/23/24 11:40, Harsh Prateek Bora wrote:

On ppc64, the PowerVM hypervisor runs with limited memory and a VCPU
creation during hotplug may fail during kvm_ioctl for KVM_CREATE_VCPU,
leading to termination of guest since errp is set to &error_fatal while
calling kvm_init_vcpu. This unexpected behaviour can be avoided by
pre-creating vcpu and parking it on success or return error otherwise.
This enables graceful error delivery for any vcpu hotplug failures while
the guest can keep running.


So this puts it on the park list so when kvm_init_vcpu() later runs it
will just take it off the park list instead of issuing another
KVM_CREATE_VCPU ioctl.

And kvm_init_vcpu() runs in the vcpu thread function, which does not
have a good way to indicate failure to the caller.

I don't know a lot about this part of qemu but it seems like a good
idea to move fail-able initialisation out of the vcpu thread in that
case. So the general idea seems good to me.
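
The park-list behaviour described above can be sketched in a small standalone
C model. This is only an illustration under stated assumptions, not QEMU code:
struct parked_vcpu, fake_create_vcpu(), take_parked(), create_and_park() and
init_vcpu() are hypothetical names, and the "ioctl" is simulated so the sketch
builds without KVM. It shows a realize-time step that can fail and report an
error, and a later init step that reuses the parked entry instead of creating
the vCPU a second time.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a parked vCPU entry; not QEMU's structure. */
struct parked_vcpu {
    unsigned long vcpu_id;
    int fd;
    struct parked_vcpu *next;
};

static struct parked_vcpu *parked_list;
static int create_calls;       /* number of successful simulated creations */
static bool fail_next_create;  /* set to simulate the host refusing a vCPU */

/* Simulated creation call: returns a fake fd, or -ENOMEM on failure. */
static int fake_create_vcpu(unsigned long vcpu_id)
{
    if (fail_next_create) {
        fail_next_create = false;
        return -ENOMEM;
    }
    create_calls++;
    return 100 + (int)vcpu_id;
}

/* Take a vCPU off the park list; returns its fd, or -ENOENT if not parked. */
static int take_parked(unsigned long vcpu_id)
{
    struct parked_vcpu **pp;

    for (pp = &parked_list; *pp; pp = &(*pp)->next) {
        if ((*pp)->vcpu_id == vcpu_id) {
            struct parked_vcpu *p = *pp;
            int fd = p->fd;

            *pp = p->next;
            free(p);
            return fd;
        }
    }
    return -ENOENT;
}

/* Realize-time step: create early so a failure can be returned to the caller. */
static int create_and_park(unsigned long vcpu_id)
{
    struct parked_vcpu *p;
    int fd = fake_create_vcpu(vcpu_id);

    if (fd < 0) {
        return fd;             /* nothing is parked on failure */
    }
    p = calloc(1, sizeof(*p));
    assert(p != NULL);         /* sketch only: abort on allocation failure */
    p->vcpu_id = vcpu_id;
    p->fd = fd;
    p->next = parked_list;
    parked_list = p;
    return 0;
}

/* Later init step: reuse the parked fd if present, otherwise create now. */
static int init_vcpu(unsigned long vcpu_id)
{
    int fd = take_parked(vcpu_id);

    if (fd < 0) {
        fd = fake_create_vcpu(vcpu_id);
    }
    return fd;
}
```

In this model, create_and_park(3) followed by init_vcpu(3) performs exactly
one creation, and a failed create_and_park() leaves the park list untouched.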



Yeh ..



Based on api refactoring to create/park vcpus introduced in 1/8 of patch series:
https://lore.kernel.org/qemu-devel/2024031202.12992-2-salil.me...@huawei.com/


So from this series AFAIKS you're just using kvm_create / kvm_park
routines? You could easily pull that patch 1 out ahead of that larger
series if progress is slow on it, it's a decent cleanup by itself by
the looks.



Yes, patch 1 of that series is all we need, but the author mentioned on
the list that he is about to post the next version soon.




Tested OK by repeatedly doing a hotplug/unplug of vcpus as below:

   #virsh setvcpus hotplug 40
   #virsh setvcpus hotplug 70
error: internal error: unable to execute QEMU command 'device_add':
kvmppc_cpu_realize: vcpu hotplug failed with -12

Reported-by: Anushree Mathur 
Suggested-by: Shivaprasad G Bhat 
Suggested-by: Vaibhav Jain 
Signed-off-by: Harsh Prateek Bora 
---
---
   target/ppc/kvm.c | 42 ++
   1 file changed, 42 insertions(+)

diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 8231feb2d4..c887f6dfa0 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -48,6 +48,8 @@
   #include "qemu/mmap-alloc.h"
   #include "elf.h"
   #include "sysemu/kvm_int.h"
+#include "sysemu/kvm.h"
+#include "hw/core/accel-cpu.h"
   
   #define PROC_DEVTREE_CPU  "/proc/device-tree/cpus/"
   
@@ -2339,6 +2341,43 @@ static void alter_insns(uint64_t *word, uint64_t flags, bool on)

   }
   }
   
+static int max_cpu_index = 0;

+
+static bool kvmppc_cpu_realize(CPUState *cs, Error **errp)
+{
+int ret;
+
+cs->cpu_index = max_cpu_index++;
+
+POWERPC_CPU(cs)->vcpu_id = cs->cpu_index;


So you're overriding the cpu_get_free_index() allocator here.
And you need to because vcpu_id needs to be assigned before
the KVM create, I guess.



Yes ..


I guess it works. I would add a comment like s390x has.


Not sure which comment you were referring to but with exporting
cpu_get_free_index as suggested later, not sure if we still need any
comment.


+
+if (cs->parent_obj.hotplugged) {


Can _all_ kvm cpu creation go via this path? Why just limit it to
hotplugged?


For the initial bootup, we actually want to abort if the requested vCPUs
can't be allocated so that the user can retry until the requested vCPUs are
allocated. For a hotplug failure, bringing down the entire guest isn't fair,
hence the fix.




+/* create and park to fail gracefully in case vcpu hotplug fails */
+ret = kvm_create_vcpu(cs);
+if (!ret) {
+kvm_park_vcpu(cs);


Seems like a small thing, but I would add a new core kvm function
that creates and parks the vcpu, so the target code doesn't have
to know about the parking internals, just that it needs to be
called.


Makes sense, I will add another kvm helper: kvm_create_and_park_vcpu()



Unless I'm missing something, we could get all targets to move their kvm
create to here and remove it from kvm_init_vcpu(), which would then
just expect it to be on the parked list. But that could be done
incrementally.


Hmm ..




+} else {
+max_cpu_index--;
+error_setg(errp, "%s: vcpu hotplug failed with %d",
+ __func__, ret);
+return false;
+}
+}
+return true;
+}
+
+static void kvmppc_cpu_unrealize(CPUState *cpu)
+{
+if (POWERPC_CPU(cpu)->vcpu_id == (max_cpu_index - 1)) {
+/* only reclaim vcpuid if its the last one assigned
+ * as reclaiming random vcpuid for parked vcpus may lead
+ * to unexpected behaviour due to an existing kernel bug
+ * when drc_index doesnt get reclaimed as expected.
+ */
+max_cpu_index--;
+}


This looks like a fairly lossy allocator. Using cpu_get_free_index()
would be the way to go I think. I would export that and call i

Re: [PATCH] spapr: Migrate ail-mode-3 spapr cap

2024-05-06 Thread Harsh Prateek Bora





On 5/6/24 17:26, Nicholas Piggin wrote:

This cap did not add the migration code when it was introduced. This
results in migration failure when changing the default using the
command line.

Cc: qemu-sta...@nongnu.org
Fixes: ccc5a4c5e10 ("spapr: Add SPAPR_CAP_AIL_MODE_3 for AIL mode 3 support for H_SET_MODE hcall")
Signed-off-by: Nicholas Piggin 


Reviewed-by: Harsh Prateek Bora 


---
  include/hw/ppc/spapr.h | 1 +
  hw/ppc/spapr.c | 1 +
  hw/ppc/spapr_caps.c| 1 +
  3 files changed, 3 insertions(+)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 4aaf23d28f..f6de3e9972 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -1004,6 +1004,7 @@ extern const VMStateDescription vmstate_spapr_cap_large_decr;
  extern const VMStateDescription vmstate_spapr_cap_ccf_assist;
  extern const VMStateDescription vmstate_spapr_cap_fwnmi;
  extern const VMStateDescription vmstate_spapr_cap_rpt_invalidate;
+extern const VMStateDescription vmstate_spapr_cap_ail_mode_3;
  extern const VMStateDescription vmstate_spapr_wdt;
  
  static inline uint8_t spapr_get_cap(SpaprMachineState *spapr, int cap)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index d2d1e310a3..065f58ec93 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2169,6 +2169,7 @@ static const VMStateDescription vmstate_spapr = {
  &vmstate_spapr_cap_fwnmi,
  &vmstate_spapr_fwnmi,
  &vmstate_spapr_cap_rpt_invalidate,
+&vmstate_spapr_cap_ail_mode_3,
  &vmstate_spapr_cap_nested_papr,
  NULL
  }
diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index 0a15415a1d..2f74923560 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -974,6 +974,7 @@ SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
  SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
  SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI);
  SPAPR_CAP_MIG_STATE(rpt_invalidate, SPAPR_CAP_RPT_INVALIDATE);
+SPAPR_CAP_MIG_STATE(ail_mode_3, SPAPR_CAP_AIL_MODE_3);
  
  void spapr_caps_init(SpaprMachineState *spapr)

  {

Re: [PATCH V8 1/8] accel/kvm: Extract common KVM vCPU {creation, parking} code

2024-04-23 Thread Harsh Prateek Bora


+ Nick

Hi Salil,
I have posted a patch [1] for ppc which is based on this refactoring patch.
I see there were some comments from Vishnu on this patch.
Are we expecting any further updates on this patch before merge?

Thanks
Harsh

[1] 
https://lore.kernel.org/qemu-devel/a0f9b2fc-4c8a-4c37-bc36-26bbaa627...@linux.ibm.com/T/#u


On 3/22/24 13:45, Harsh Prateek Bora wrote:

+ Vaibhav, Shiva

Hi Salil,

I came across your patch while trying to solve a related problem on 
spapr. One query below ..


On 3/12/24 07:29, Salil Mehta via wrote:
KVM vCPU creation is done once during the vCPU realization when Qemu vCPU thread
is spawned. This is common to all the architectures as of now.

Hot-unplug of vCPU results in destruction of the vCPU object in QOM but the
corresponding KVM vCPU object in the Host KVM is not destroyed as KVM doesn't
support vCPU removal. Therefore, its representative KVM vCPU object/context in
Qemu is parked.

Refactor architecture common logic so that some APIs could be reused by vCPU
Hotplug code of some architectures like ARM, Loongson etc. Update new/old APIs
with trace events instead of DPRINTF. No functional change is intended here.


Signed-off-by: Salil Mehta 
Reviewed-by: Gavin Shan 
Tested-by: Vishnu Pajjuri 
Reviewed-by: Jonathan Cameron 
Tested-by: Xianglai Li 
Tested-by: Miguel Luis 
Reviewed-by: Shaoqin Huang 
---
  accel/kvm/kvm-all.c    | 64 --
  accel/kvm/trace-events |  5 +++-
  include/sysemu/kvm.h   | 16 +++
  3 files changed, 69 insertions(+), 16 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index a8cecd040e..3bc3207bda 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -126,6 +126,7 @@ static QemuMutex kml_slots_lock;
  #define kvm_slots_unlock()  qemu_mutex_unlock(&kml_slots_lock)
  static void kvm_slot_init_dirty_bitmap(KVMSlot *mem);
+static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id);
  static inline void kvm_resample_fd_remove(int gsi)
  {
@@ -314,14 +315,53 @@ err:
  return ret;
  }
+void kvm_park_vcpu(CPUState *cpu)
+{
+    struct KVMParkedVcpu *vcpu;
+
+    trace_kvm_park_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
+
+    vcpu = g_malloc0(sizeof(*vcpu));
+    vcpu->vcpu_id = kvm_arch_vcpu_id(cpu);
+    vcpu->kvm_fd = cpu->kvm_fd;
+    QLIST_INSERT_HEAD(&kvm_state->kvm_parked_vcpus, vcpu, node);
+}
+
+int kvm_create_vcpu(CPUState *cpu)
+{
+    unsigned long vcpu_id = kvm_arch_vcpu_id(cpu);
+    KVMState *s = kvm_state;
+    int kvm_fd;
+
+    trace_kvm_create_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
+
+    /* check if the KVM vCPU already exist but is parked */
+    kvm_fd = kvm_get_vcpu(s, vcpu_id);
+    if (kvm_fd < 0) {
+    /* vCPU not parked: create a new KVM vCPU */
+    kvm_fd = kvm_vm_ioctl(s, KVM_CREATE_VCPU, vcpu_id);
+    if (kvm_fd < 0) {
+    error_report("KVM_CREATE_VCPU IOCTL failed for vCPU %lu", vcpu_id);

+    return kvm_fd;
+    }
+    }
+
+    cpu->kvm_fd = kvm_fd;
+    cpu->kvm_state = s;
+    cpu->vcpu_dirty = true;
+    cpu->dirty_pages = 0;
+    cpu->throttle_us_per_full = 0;
+
+    return 0;
+}
+
  static int do_kvm_destroy_vcpu(CPUState *cpu)
  {
  KVMState *s = kvm_state;
  long mmap_size;
-    struct KVMParkedVcpu *vcpu = NULL;
  int ret = 0;
-    trace_kvm_destroy_vcpu();
+    trace_kvm_destroy_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
  ret = kvm_arch_destroy_vcpu(cpu);
  if (ret < 0) {
@@ -347,10 +387,7 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
  }
  }
-    vcpu = g_malloc0(sizeof(*vcpu));
-    vcpu->vcpu_id = kvm_arch_vcpu_id(cpu);
-    vcpu->kvm_fd = cpu->kvm_fd;
-    QLIST_INSERT_HEAD(&kvm_state->kvm_parked_vcpus, vcpu, node);
+    kvm_park_vcpu(cpu);
  err:
  return ret;
  }
@@ -371,6 +408,8 @@ static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id)

  if (cpu->vcpu_id == vcpu_id) {
  int kvm_fd;
+    trace_kvm_get_vcpu(vcpu_id);
+
  QLIST_REMOVE(cpu, node);
  kvm_fd = cpu->kvm_fd;
  g_free(cpu);
@@ -378,7 +417,7 @@ static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id)

  }
  }
-    return kvm_vm_ioctl(s, KVM_CREATE_VCPU, (void *)vcpu_id);
+    return -ENOENT;
  }
  int kvm_init_vcpu(CPUState *cpu, Error **errp)
@@ -389,19 +428,14 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
  trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
-    ret = kvm_get_vcpu(s, kvm_arch_vcpu_id(cpu));
+    ret = kvm_create_vcpu(cpu);
  if (ret < 0) {
-    error_setg_errno(errp, -ret, "kvm_init_vcpu: kvm_get_vcpu failed (%lu)",

+    error_setg_errno(errp, -ret,
+ "kvm_init_vcpu: kvm_create_vcpu failed (%lu)",
   kvm_arch_vcpu_id(cpu));


If a vcpu hotplug fails due to failure with kvm_create_vcpu ioctl,
current behaviour

Re: [PATCH] target/ppc: handle vcpu hotplug failure gracefully

2024-04-23 Thread Harsh Prateek Bora


+ qemu-devel

On 4/23/24 11:40, Harsh Prateek Bora wrote:

On ppc64, the PowerVM hypervisor runs with limited memory and a VCPU
creation during hotplug may fail during kvm_ioctl for KVM_CREATE_VCPU,
leading to termination of guest since errp is set to &error_fatal while
calling kvm_init_vcpu. This unexpected behaviour can be avoided by
pre-creating vcpu and parking it on success or return error otherwise.
This enables graceful error delivery for any vcpu hotplug failures while
the guest can keep running.

Based on api refactoring to create/park vcpus introduced in 1/8 of patch series:
https://lore.kernel.org/qemu-devel/2024031202.12992-2-salil.me...@huawei.com/

Tested OK by repeatedly doing a hotplug/unplug of vcpus as below:

  #virsh setvcpus hotplug 40
  #virsh setvcpus hotplug 70
error: internal error: unable to execute QEMU command 'device_add':
kvmppc_cpu_realize: vcpu hotplug failed with -12

Reported-by: Anushree Mathur 
Suggested-by: Shivaprasad G Bhat 
Suggested-by: Vaibhav Jain 
Signed-off-by: Harsh Prateek Bora 
---
---
  target/ppc/kvm.c | 42 ++
  1 file changed, 42 insertions(+)

diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 8231feb2d4..c887f6dfa0 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -48,6 +48,8 @@
  #include "qemu/mmap-alloc.h"
  #include "elf.h"
  #include "sysemu/kvm_int.h"
+#include "sysemu/kvm.h"
+#include "hw/core/accel-cpu.h"
  
  #define PROC_DEVTREE_CPU  "/proc/device-tree/cpus/"
  
@@ -2339,6 +2341,43 @@ static void alter_insns(uint64_t *word, uint64_t flags, bool on)

  }
  }
  
+static int max_cpu_index = 0;

+
+static bool kvmppc_cpu_realize(CPUState *cs, Error **errp)
+{
+int ret;
+
+cs->cpu_index = max_cpu_index++;
+
+POWERPC_CPU(cs)->vcpu_id = cs->cpu_index;
+
+if (cs->parent_obj.hotplugged) {
+/* create and park to fail gracefully in case vcpu hotplug fails */
+ret = kvm_create_vcpu(cs);
+if (!ret) {
+kvm_park_vcpu(cs);
+} else {
+max_cpu_index--;
+error_setg(errp, "%s: vcpu hotplug failed with %d",
+ __func__, ret);
+return false;
+}
+}
+return true;
+}
+
+static void kvmppc_cpu_unrealize(CPUState *cpu)
+{
+if (POWERPC_CPU(cpu)->vcpu_id == (max_cpu_index - 1)) {
+/* only reclaim vcpuid if its the last one assigned
+ * as reclaiming random vcpuid for parked vcpus may lead
+ * to unexpected behaviour due to an existing kernel bug
+ * when drc_index doesnt get reclaimed as expected.
+ */
+max_cpu_index--;
+}
+}
+
  static void kvmppc_host_cpu_class_init(ObjectClass *oc, void *data)
  {
  PowerPCCPUClass *pcc = POWERPC_CPU_CLASS(oc);
@@ -2963,4 +3002,7 @@ bool kvm_arch_cpu_check_are_resettable(void)
  
  void kvm_arch_accel_class_init(ObjectClass *oc)

  {
+AccelClass *ac = ACCEL_CLASS(oc);
+ac->cpu_common_realize = kvmppc_cpu_realize;
+ac->cpu_common_unrealize = kvmppc_cpu_unrealize;
  }
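
The "lossy allocator" concern raised in review of this patch can be made
concrete with a small standalone model. This is a sketch under assumptions,
not QEMU code: live[], counter_alloc(), counter_free() and scan_next_index()
are hypothetical names. The counter scheme mirrors the max_cpu_index logic in
the patch above; the scan scheme is only an assumption about what an
allocator that derives the next index from the currently live CPUs could look
like, and is not a copy of cpu_get_free_index().

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CPUS 16

/* live[i] is true while index i belongs to a realized CPU (model only). */
static bool live[MAX_CPUS];

/* Counter-based scheme modelled on the max_cpu_index logic in the patch. */
static int max_cpu_index;

static int counter_alloc(void)
{
    int idx;

    if (max_cpu_index >= MAX_CPUS) {
        return -1;             /* model limit reached */
    }
    idx = max_cpu_index++;
    live[idx] = true;
    return idx;
}

static void counter_free(int idx)
{
    assert(idx >= 0 && idx < MAX_CPUS);
    live[idx] = false;
    if (idx == max_cpu_index - 1) {
        max_cpu_index--;       /* only the most recent index is reclaimed */
    }
}

/* Scan-based scheme: one past the highest index that is still live. */
static int scan_next_index(void)
{
    int next = 0;

    for (int i = 0; i < MAX_CPUS; i++) {
        if (live[i]) {
            next = i + 1;
        }
    }
    return next;
}
```

With CPUs 0, 1 and 2 allocated, freeing 1 and then 2 leaves the counter at 2,
so the next counter allocation skips index 1 even though only CPU 0 is live,
while the scan-based view would hand out index 1.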

Re: [PATCH 22/24] exec: Remove 'exec/tswap.h' from 'exec/cpu-all.h'

2024-04-18 Thread Harsh Prateek Bora





On 4/19/24 00:55, Philippe Mathieu-Daudé wrote:

"exec/cpu-all.h" doesn't require "exec/tswap.h". Remove it,
including it in the sources when required.

Signed-off-by: Philippe Mathieu-Daudé 
---
  hw/xtensa/bootparam.h   | 1 +
  include/exec/cpu-all.h  | 1 -
  accel/tcg/translator.c  | 1 +
  hw/arm/boot.c   | 1 +
  hw/arm/npcm7xx.c| 1 +
  hw/mips/fuloong2e.c | 1 +
  hw/mips/malta.c | 1 +
  hw/ppc/sam460ex.c   | 1 +
  hw/ppc/spapr.c  | 1 +


For spapr:
Reviewed-by: Harsh Prateek Bora 


  hw/ppc/virtex_ml507.c   | 1 +
  hw/sh4/r2d.c| 1 +
  target/arm/gdbstub.c| 1 +
  target/xtensa/xtensa-semi.c | 1 +
  13 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/hw/xtensa/bootparam.h b/hw/xtensa/bootparam.h
index f57ff850bc..e1d47b503c 100644
--- a/hw/xtensa/bootparam.h
+++ b/hw/xtensa/bootparam.h
@@ -1,6 +1,7 @@
  #ifndef HW_XTENSA_BOOTPARAM_H
  #define HW_XTENSA_BOOTPARAM_H
  
+#include "exec/tswap.h"

  #include "exec/cpu-common.h"
  
  #define BP_TAG_COMMAND_LINE 0x1001  /* command line (0-terminated string)*/

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 554b937ddb..cfbf51822c 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -21,7 +21,6 @@
  
  #include "exec/cpu-common.h"

  #include "exec/memory.h"
-#include "exec/tswap.h"
  #include "hw/core/cpu.h"
  
  /* some important defines:

diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index 6832e55135..85950377d9 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -12,6 +12,7 @@
  #include "qemu/error-report.h"
  #include "exec/exec-all.h"
  #include "exec/translator.h"
+#include "exec/tswap.h"
  #include "exec/cpu_ldst.h"
  #include "exec/plugin-gen.h"
  #include "tcg/tcg-op-common.h"
diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 84ea6a807a..93945a1a15 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -22,6 +22,7 @@
  #include "sysemu/reset.h"
  #include "hw/loader.h"
  #include "elf.h"
+#include "exec/tswap.h"
  #include "sysemu/device_tree.h"
  #include "qemu/config-file.h"
  #include "qemu/option.h"
diff --git a/hw/arm/npcm7xx.c b/hw/arm/npcm7xx.c
index cc68b5d8f1..1ef303415b 100644
--- a/hw/arm/npcm7xx.c
+++ b/hw/arm/npcm7xx.c
@@ -27,6 +27,7 @@
  #include "qemu/units.h"
  #include "sysemu/sysemu.h"
  #include "target/arm/cpu-qom.h"
+#include "exec/tswap.h"
  
  /*

   * This covers the whole MMIO space. We'll use this to catch any MMIO accesses
diff --git a/hw/mips/fuloong2e.c b/hw/mips/fuloong2e.c
index a45aac368c..1d0613a76f 100644
--- a/hw/mips/fuloong2e.c
+++ b/hw/mips/fuloong2e.c
@@ -40,6 +40,7 @@
  #include "sysemu/reset.h"
  #include "sysemu/sysemu.h"
  #include "qemu/error-report.h"
+#include "exec/tswap.h"
  
  #define ENVP_PADDR  0x2000

  #define ENVP_VADDR  cpu_mips_phys_to_kseg0(NULL, ENVP_PADDR)
diff --git a/hw/mips/malta.c b/hw/mips/malta.c
index af74008c82..3dca0f100c 100644
--- a/hw/mips/malta.c
+++ b/hw/mips/malta.c
@@ -56,6 +56,7 @@
  #include "semihosting/semihost.h"
  #include "hw/mips/cps.h"
  #include "hw/qdev-clock.h"
+#include "exec/tswap.h"
  #include "target/mips/internal.h"
  #include "trace.h"
  #include "cpu.h"
diff --git a/hw/ppc/sam460ex.c b/hw/ppc/sam460ex.c
index d42b677898..abc02f0817 100644
--- a/hw/ppc/sam460ex.c
+++ b/hw/ppc/sam460ex.c
@@ -24,6 +24,7 @@
  #include "hw/loader.h"
  #include "elf.h"
  #include "exec/memory.h"
+#include "exec/tswap.h"
  #include "ppc440.h"
  #include "hw/pci-host/ppc4xx.h"
  #include "hw/block/flash.h"
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index e9bc97fee0..b4b1f43983 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -74,6 +74,7 @@
  #include "hw/virtio/virtio-scsi.h"
  #include "hw/virtio/vhost-scsi-common.h"
  
+#include "exec/tswap.h"

  #include "exec/ram_addr.h"
  #include "hw/usb.h"
  #include "qemu/config-file.h"
diff --git a/hw/ppc/virtex_ml507.c b/hw/ppc/virtex_ml507.c
index d02f330650..fd23afebf5 100644
--- a/hw/ppc/virtex_ml507.c
+++ b/hw/ppc/virtex_ml507.c
@@ -38,6 +38,7 @@
  #include "qapi/error.h"
  #include "qemu/error-report.h"
  #include "qemu/option.h"
+#include "exec/tswap.h"
  
  #include "hw/intc/ppc-uic.h"

  #include "hw/ppc/ppc.h"
diff --git a/hw/sh4/r2d.c b/hw/sh4/r2d.c
index e5ac6751bd..5f4420f534 100644
--- a/hw/sh4/r2d.c
+++ b/hw/sh4/r2d.c
@@ -43,6 +43,7 @@
  #include "hw/loader.h"

Re: [PATCH v2 03/13] hw/ppc/spapr: Replace sprintf() by snprintf()

2024-04-15 Thread Harsh Prateek Bora





On 4/11/24 15:45, Philippe Mathieu-Daudé wrote:

sprintf() is deprecated on Darwin since macOS 13.0 / XCode 14.1,
resulting in painful developper experience.


s/developper/developer ?



Replace sprintf() by snprintf() in order to avoid:

   hw/ppc/spapr.c:385:5: warning: 'sprintf' is deprecated:
 This function is provided for compatibility reasons only.
 Due to security concerns inherent in the design of sprintf(3),
 it is highly recommended that you use snprintf(3) instead.
 [-Wdeprecated-declarations]
   sprintf(mem_name, "memory@%" HWADDR_PRIx, start);
   ^
   1 warning generated.

Signed-off-by: Philippe Mathieu-Daudé 


With the typo fixed,

Reviewed-by: Harsh Prateek Bora 


---
  hw/ppc/spapr.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index e9bc97fee0..9e97992c79 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -382,7 +382,7 @@ static int spapr_dt_memory_node(SpaprMachineState *spapr, void *fdt, int nodeid,
  mem_reg_property[0] = cpu_to_be64(start);
  mem_reg_property[1] = cpu_to_be64(size);
  
-sprintf(mem_name, "memory@%" HWADDR_PRIx, start);

+snprintf(mem_name, sizeof(mem_name), "memory@%" HWADDR_PRIx, start);
  off = fdt_add_subnode(fdt, 0, mem_name);
  _FDT(off);
  _FDT((fdt_setprop_string(fdt, off, "device_type", "memory")));
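
The sprintf() to snprintf() change above can be illustrated with a standalone
sketch. It is not the QEMU code: format_mem_name() is a hypothetical helper,
and PRIx64 with uint64_t stands in for HWADDR_PRIx and hwaddr as an
assumption. It also checks the return value, which the patch itself does not
do, to show how truncation can be detected.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "memory@<hex start address>" into buf.
 * Returns true if the whole name fit, false if it was truncated.
 */
static bool format_mem_name(char *buf, size_t buflen, uint64_t start)
{
    int n;

    assert(buf != NULL && buflen > 0);
    /* snprintf() never writes more than buflen bytes, including the NUL. */
    n = snprintf(buf, buflen, "memory@%" PRIx64, start);
    /* n is the length that would have been written with unlimited space. */
    return n >= 0 && (size_t)n < buflen;
}
```

Unlike sprintf(), an undersized buffer here yields a truncated but
NUL-terminated string rather than a write past the end of the array.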

[PATCH] spapr: nested: use bitwise NOT operator for flags check

2024-03-28 Thread Harsh Prateek Bora

The check for flag bits other than H_GUEST_GETSET_STATE_FLAG_GUEST_WIDE needs
to use the bitwise NOT operator to ensure that no other flag bits are set.
Reported by Coverity as CID 1540008, 1540009.

Reported-by: Peter Maydell 
Signed-off-by: Harsh Prateek Bora 
---
 hw/ppc/spapr_nested.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index 936659b4c0..c02785756c 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -1511,7 +1511,7 @@ static target_ulong h_guest_getset_state(PowerPCCPU *cpu,
 if (flags & H_GUEST_GETSET_STATE_FLAG_GUEST_WIDE) {
 gsr.flags |= GUEST_STATE_REQUEST_GUEST_WIDE;
 }
-if (flags & !H_GUEST_GETSET_STATE_FLAG_GUEST_WIDE) {
+if (flags & ~H_GUEST_GETSET_STATE_FLAG_GUEST_WIDE) {
 return H_PARAMETER; /* flag not supported yet */
 }
 
-- 
2.39.3
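
The difference between the two operators in the hunk above can be shown with
a standalone sketch. FLAG_GUEST_WIDE and the two helper functions are
hypothetical, and the bit position is chosen for illustration only; it is not
taken from the QEMU headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag value for illustration; any non-zero constant behaves alike. */
#define FLAG_GUEST_WIDE (1ULL << 63)

/* Logical NOT of a non-zero constant is 0, so the buggy mask is empty. */
static_assert((!FLAG_GUEST_WIDE) == 0, "logical NOT of non-zero is 0");

/* Buggy form: (flags & 0) is always 0, so this never reports anything. */
static bool has_unsupported_flags_buggy(uint64_t flags)
{
    return (flags & !FLAG_GUEST_WIDE) != 0;
}

/* Fixed form: the mask covers every bit except FLAG_GUEST_WIDE. */
static bool has_unsupported_flags_fixed(uint64_t flags)
{
    return (flags & ~FLAG_GUEST_WIDE) != 0;
}
```

With flags == 0x1 the buggy check returns false and the fixed check returns
true, which is the dead-code condition Coverity reported.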

Re: [PULL 35/38] spapr: nested: Introduce H_GUEST_[GET|SET]_STATE hcalls.

2024-03-28 Thread Harsh Prateek Bora





On 3/28/24 20:55, Peter Maydell wrote:

On Wed, 27 Mar 2024 at 05:41, Harsh Prateek Bora  wrote:




On 3/26/24 21:32, Peter Maydell wrote:

On Tue, 12 Mar 2024 at 17:11, Nicholas Piggin  wrote:


From: Harsh Prateek Bora 

Introduce the nested PAPR hcalls:
  - H_GUEST_GET_STATE which is used to get state of a nested guest or
a guest VCPU. The value field for each element in the request is
destination to be updated to reflect current state on success.
  - H_GUEST_SET_STATE which is used to modify the state of a guest or
a guest VCPU. On success, guest (or its VCPU) state shall be
updated as per the value field for the requested element(s).

Reviewed-by: Nicholas Piggin 
Signed-off-by: Michael Neuling 
Signed-off-by: Harsh Prateek Bora 
Signed-off-by: Nicholas Piggin 


Hi; Coverity points out a problem with this code (CID 1540008, 1540009):




+static target_ulong h_guest_getset_state(PowerPCCPU *cpu,
+ SpaprMachineState *spapr,
+ target_ulong *args,
+ bool set)
+{
+target_ulong flags = args[0];
+target_ulong lpid = args[1];
+target_ulong vcpuid = args[2];
+target_ulong buf = args[3];
+target_ulong buflen = args[4];
+struct guest_state_request gsr;
+SpaprMachineStateNestedGuest *guest;
+
+guest = spapr_get_nested_guest(spapr, lpid);
+if (!guest) {
+return H_P2;
+}
+gsr.buf = buf;
+assert(buflen <= GSB_MAX_BUF_SIZE);
+gsr.len = buflen;
+gsr.flags = 0;
+if (flags & H_GUEST_GETSET_STATE_FLAG_GUEST_WIDE) {


flags is a target_ulong, which means it might only be 32 bits.
But H_GUEST_GETSET_STATE_FLAG_GUEST_WIDE has a bit set in the
upper 32 bits only. So Coverity complains about this condition
being always-zero and the body of the if being dead code.

What was the intention here?


Hi Peter,
Ideally this is intended to be running on a ppc64 where target_ulong
should be uint64_t. I guess same holds true for existing nested-hv code
as well.


Sorry, I'm afraid I misread the Coverity report here;
sorry for the confusion. The 32-vs-64 bits question is a red
herring.

What Coverity is actually pointing out is in this next bit:


+gsr.flags |= GUEST_STATE_REQUEST_GUEST_WIDE;
+}
+if (flags & !H_GUEST_GETSET_STATE_FLAG_GUEST_WIDE) {


The C operator ! is the logical-NOT operator; since
H_GUEST_GETSET_STATE_FLAG_GUEST_WIDE is a non-zero value
that means that !H_GUEST_GETSET_STATE_FLAG_GUEST_WIDE is 0;
so we're testing (flags & 0), which is always false, and this
is the if() body which is dead-code as a result.

Should this be the bitwise-NOT ~  (ie "if any flag other
than this one is set"), or should this be an else clause
to the previous if() (ie "if this flag is not set") ?


Oh, this should have been bitwise-NOT, I shall send a follow-up patch 
for the fix.


regards,
Harsh



+return H_PARAMETER; /* flag not supported yet */
+}
+
+if (set) {
+gsr.flags |= GUEST_STATE_REQUEST_SET;
+}
+return map_and_getset_state(cpu, guest, vcpuid, &gsr);
+}




thanks
-- PMM

Re: [PULL 35/38] spapr: nested: Introduce H_GUEST_[GET|SET]_STATE hcalls.

2024-03-26 Thread Harsh Prateek Bora





On 3/26/24 21:32, Peter Maydell wrote:

On Tue, 12 Mar 2024 at 17:11, Nicholas Piggin  wrote:


From: Harsh Prateek Bora 

Introduce the nested PAPR hcalls:
 - H_GUEST_GET_STATE which is used to get state of a nested guest or
   a guest VCPU. The value field for each element in the request is
   destination to be updated to reflect current state on success.
 - H_GUEST_SET_STATE which is used to modify the state of a guest or
   a guest VCPU. On success, guest (or its VCPU) state shall be
   updated as per the value field for the requested element(s).

Reviewed-by: Nicholas Piggin 
Signed-off-by: Michael Neuling 
Signed-off-by: Harsh Prateek Bora 
Signed-off-by: Nicholas Piggin 


Hi; Coverity points out a problem with this code (CID 1540008, 1540009):




+static target_ulong h_guest_getset_state(PowerPCCPU *cpu,
+ SpaprMachineState *spapr,
+ target_ulong *args,
+ bool set)
+{
+target_ulong flags = args[0];
+target_ulong lpid = args[1];
+target_ulong vcpuid = args[2];
+target_ulong buf = args[3];
+target_ulong buflen = args[4];
+struct guest_state_request gsr;
+SpaprMachineStateNestedGuest *guest;
+
+guest = spapr_get_nested_guest(spapr, lpid);
+if (!guest) {
+return H_P2;
+}
+gsr.buf = buf;
+assert(buflen <= GSB_MAX_BUF_SIZE);
+gsr.len = buflen;
+gsr.flags = 0;
+if (flags & H_GUEST_GETSET_STATE_FLAG_GUEST_WIDE) {


flags is a target_ulong, which means it might only be 32 bits.
But H_GUEST_GETSET_STATE_FLAG_GUEST_WIDE has a bit set in the
upper 32 bits only. So Coverity complains about this condition
being always-zero and the body of the if being dead code.

What was the intention here?


Hi Peter,
Ideally this is intended to be running on a ppc64 where target_ulong
should be uint64_t. I guess same holds true for existing nested-hv code
as well.

Hi Nick,
Do you think keeping both nested APIs (i.e. entire spapr_nested.c)
within #ifdef TARGET_PPC64 would be a better choice here?

regards,
Harsh




+gsr.flags |= GUEST_STATE_REQUEST_GUEST_WIDE;
+}
+if (flags & !H_GUEST_GETSET_STATE_FLAG_GUEST_WIDE) {
+return H_PARAMETER; /* flag not supported yet */
+}
+
+if (set) {
+gsr.flags |= GUEST_STATE_REQUEST_SET;
+}
+return map_and_getset_state(cpu, guest, vcpuid, &gsr);
+}


thanks
-- PMM

Re: [PATCH for-9.1 v5 1/3] hw: Add compat machines for 9.1

2024-03-26 Thread Harsh Prateek Bora





On 3/25/24 19:44, Paolo Bonzini wrote:

Add 9.1 machine types for arm/i440fx/m68k/q35/s390x/spapr.

Cc: Cornelia Huck 
Cc: Thomas Huth 
Cc: Harsh Prateek Bora 
Cc: Gavin Shan 
Signed-off-by: Paolo Bonzini 
---
  include/hw/boards.h|  3 +++
  include/hw/i386/pc.h   |  3 +++
  hw/arm/virt.c  | 11 +--
  hw/core/machine.c  |  3 +++
  hw/i386/pc.c   |  3 +++
  hw/i386/pc_piix.c  | 17 ++---
  hw/i386/pc_q35.c   | 14 --
  hw/m68k/virt.c | 11 +--
  hw/ppc/spapr.c | 17 ++---


For spapr:
Reviewed-by: Harsh Prateek Bora 


  hw/s390x/s390-virtio-ccw.c | 14 +-
  10 files changed, 83 insertions(+), 13 deletions(-)

diff --git a/include/hw/boards.h b/include/hw/boards.h
index 8b8f6d5c00d..50e0cf4278e 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -425,6 +425,9 @@ struct MachineState {
  } \
  type_init(machine_initfn##_register_types)
  
+extern GlobalProperty hw_compat_9_0[];

+extern const size_t hw_compat_9_0_len;
+
  extern GlobalProperty hw_compat_8_2[];
  extern const size_t hw_compat_8_2_len;
  
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h

index 27a68071d77..349f79df086 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -198,6 +198,9 @@ void pc_system_parse_ovmf_flash(uint8_t *flash_ptr, size_t flash_size);
  /* sgx.c */
  void pc_machine_init_sgx_epc(PCMachineState *pcms);
  
+extern GlobalProperty pc_compat_9_0[];

+extern const size_t pc_compat_9_0_len;
+
  extern GlobalProperty pc_compat_8_2[];
  extern const size_t pc_compat_8_2_len;
  
diff --git a/hw/arm/virt.c b/hw/arm/virt.c

index a9a913aeadb..c9119ef3847 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -3223,10 +3223,17 @@ static void machvirt_machine_init(void)
  }
  type_init(machvirt_machine_init);
  
-static void virt_machine_9_0_options(MachineClass *mc)

+static void virt_machine_9_1_options(MachineClass *mc)
  {
  }
-DEFINE_VIRT_MACHINE_AS_LATEST(9, 0)
+DEFINE_VIRT_MACHINE_AS_LATEST(9, 1)
+
+static void virt_machine_9_0_options(MachineClass *mc)
+{
+virt_machine_9_1_options(mc);
+compat_props_add(mc->compat_props, hw_compat_9_0, hw_compat_9_0_len);
+}
+DEFINE_VIRT_MACHINE(9, 0)
  
  static void virt_machine_8_2_options(MachineClass *mc)

  {
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 37ede0e7d4f..a92bec23147 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -33,6 +33,9 @@
  #include "hw/virtio/virtio-iommu.h"
  #include "audio/audio.h"
  
+GlobalProperty hw_compat_9_0[] = {};

+const size_t hw_compat_9_0_len = G_N_ELEMENTS(hw_compat_9_0);
+
  GlobalProperty hw_compat_8_2[] = {
  { "migration", "zero-page-detection", "legacy"},
  { TYPE_VIRTIO_IOMMU_PCI, "granule", "4k" },
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index e80f02bef41..461fcaa1b48 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -78,6 +78,9 @@
  { "qemu64-" TYPE_X86_CPU, "model-id", "QEMU Virtual CPU version " v, },\
  { "athlon-" TYPE_X86_CPU, "model-id", "QEMU Virtual CPU version " v, },
  
+GlobalProperty pc_compat_9_0[] = {};

+const size_t pc_compat_9_0_len = G_N_ELEMENTS(pc_compat_9_0);
+
  GlobalProperty pc_compat_8_2[] = {};
  const size_t pc_compat_8_2_len = G_N_ELEMENTS(pc_compat_8_2);
  
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c

index 18ba0766092..8850c49c66a 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -513,13 +513,26 @@ static void pc_i440fx_machine_options(MachineClass *m)
   "Use a different south bridge than 
PIIX3");
  }
  
-static void pc_i440fx_9_0_machine_options(MachineClass *m)

+static void pc_i440fx_9_1_machine_options(MachineClass *m)
  {
  pc_i440fx_machine_options(m);
  m->alias = "pc";
  m->is_default = true;
  }
  
+DEFINE_I440FX_MACHINE(v9_1, "pc-i440fx-9.1", NULL,
+  pc_i440fx_9_1_machine_options);
+
+static void pc_i440fx_9_0_machine_options(MachineClass *m)
+{
+pc_i440fx_9_1_machine_options(m);
+m->alias = NULL;
+m->is_default = false;
+
+compat_props_add(m->compat_props, hw_compat_9_0, hw_compat_9_0_len);
+compat_props_add(m->compat_props, pc_compat_9_0, pc_compat_9_0_len);
+}
+
  DEFINE_I440FX_MACHINE(v9_0, "pc-i440fx-9.0", NULL,
pc_i440fx_9_0_machine_options);
  
@@ -528,8 +541,6 @@ static void pc_i440fx_8_2_machine_options(MachineClass *m)
  PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
  
  pc_i440fx_9_0_machine_options(m);
-m->alias = NULL;
-m->is_default = false;
  
  compat_props_add(m->compat_props, hw_compat_8_2, hw_compat_8_2_len);
  compat_props_add(m->compat_props, pc_compat_8_2, pc_compat_8_2_len);
di
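
The version bump in the diff above chains the option functions: the 9.0 options call the 9.1 options first, then undo the "latest" settings (alias, default) and append the 9.0 compat properties. A minimal, self-contained sketch of that chaining follows. The `sketch_*` types and helpers are hypothetical stand-ins, not QEMU's `MachineClass`, `GlobalProperty` or `compat_props_add()`, and the compat entry is invented for illustration (the real `hw_compat_9_0[]` in the patch is empty).

```c
#include <stddef.h>

/* Hypothetical stand-ins for QEMU's GlobalProperty and MachineClass. */
struct sketch_prop {
    const char *driver;
    const char *property;
    const char *value;
};

struct sketch_machine_class {
    const char *alias;
    int is_default;
    const struct sketch_prop *compat[8];
    size_t ncompat;
};

/* Illustrative entry only; the real hw_compat_9_0[] in the patch is empty. */
static const struct sketch_prop sketch_compat_9_0[] = {
    { "hypothetical-device", "hypothetical-prop", "off" },
};

/* Append n properties to the class, silently stopping when the table is full. */
static void sketch_compat_add(struct sketch_machine_class *mc,
                              const struct sketch_prop *props, size_t n)
{
    for (size_t i = 0; i < n && mc->ncompat < 8; i++) {
        mc->compat[mc->ncompat++] = &props[i];
    }
}

/* Latest machine version: owns the alias and is the default. */
static void sketch_9_1_options(struct sketch_machine_class *mc)
{
    mc->alias = "pc";
    mc->is_default = 1;
}

/* Older version: start from the latest, then undo and add compat props. */
static void sketch_9_0_options(struct sketch_machine_class *mc)
{
    sketch_9_1_options(mc);
    mc->alias = NULL;
    mc->is_default = 0;
    sketch_compat_add(mc, sketch_compat_9_0,
                      sizeof(sketch_compat_9_0) / sizeof(sketch_compat_9_0[0]));
}
```

Because each older version starts from the next newer one, moving the `alias`/`is_default` reset from the 8.2 function into the 9.0 function (as the last hunk does) keeps exactly one machine version marked as default.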

Re: [PATCH] hw/ppc/spapr: Include missing 'sysemu/tcg.h' header

2024-03-25 Thread Harsh Prateek Bora





On 3/22/24 21:54, Philippe Mathieu-Daudé wrote:

"sysemu/tcg.h" declares tcg_enabled(), and is implicitly included.
Include it explicitly to avoid the following error when refactoring
headers:

   hw/ppc/spapr.c:2612:9: error: call to undeclared function 'tcg_enabled'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
 if (tcg_enabled()) {
 ^

Signed-off-by: Philippe Mathieu-Daudé 


Reviewed-by: Harsh Prateek Bora 


---
  hw/ppc/spapr.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index c417f9dd52..e9bc97fee0 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -35,6 +35,7 @@
  #include "sysemu/sysemu.h"
  #include "sysemu/hostmem.h"
  #include "sysemu/numa.h"
+#include "sysemu/tcg.h"
  #include "sysemu/qtest.h"
  #include "sysemu/reset.h"
  #include "sysemu/runstate.h"

Re: [PATCH] target/ppc: Do not clear MSR[ME] on MCE interrupts to supervisor

2024-03-22 Thread Harsh Prateek Bora





On 3/21/24 11:24, Nicholas Piggin wrote:

Hardware clears the MSR[ME] bit when delivering a machine check
interrupt, so that is what QEMU does.

The spapr environment runs in supervisor mode though, and receives
machine check interrupts after they are processed by the hypervisor,
and MSR[ME] must always be enabled in supervisor mode (otherwise it
could checkstop the system). So MSR[ME] must not be cleared when
delivering machine checks to the supervisor.

The fix to prevent supervisor mode from modifying MSR[ME] also
prevented it from re-enabling the incorrectly cleared MSR[ME] bit
when returning from handling the interrupt. Before that fix, the
problem was not very noticeable with well-behaved code. So the
Fixes tag is not strictly correct, but practically they go together.

Found by kvm-unit-tests machine check tests (not yet upstream).

Fixes: 678b6f1af75ef ("target/ppc: Prevent supervisor from modifying MSR[ME]")
Signed-off-by: Nicholas Piggin 


Reviewed-by: Harsh Prateek Bora 


---
  target/ppc/excp_helper.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 80f584f933..674c05a2ce 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1345,9 +1345,10 @@ static void powerpc_excp_books(PowerPCCPU *cpu, int excp)
   * clear (e.g., see FWNMI in PAPR).
   */
  new_msr |= (target_ulong)MSR_HVB;
+
+/* HV machine check exceptions don't have ME set */
+new_msr &= ~((target_ulong)1 << MSR_ME);
  }
-/* machine check exceptions don't have ME set */
-new_msr &= ~((target_ulong)1 << MSR_ME);
  
  msr |= env->error_code;
  break;
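
The effect of moving the mask inside the hypervisor branch can be sketched in isolation. This is a minimal illustration, assuming the surrounding branch is taken only when the machine check is delivered in hypervisor state. The function name and the bit positions are placeholders chosen for the example, not QEMU's `MSR_ME`/`MSR_HVB` definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions for illustration; not taken from QEMU headers. */
#define SKETCH_MSR_ME_BIT 12
#define SKETCH_MSR_HV_BIT 60

/*
 * Compute the MSR used when delivering a machine check.
 * deliver_to_hv: true when the interrupt is taken in hypervisor state.
 */
static uint64_t sketch_mce_new_msr(uint64_t msr, bool deliver_to_hv)
{
    uint64_t new_msr = msr;

    if (deliver_to_hv) {
        new_msr |= (uint64_t)1 << SKETCH_MSR_HV_BIT;
        /* Only hypervisor delivery clears ME, as hardware does. */
        new_msr &= ~((uint64_t)1 << SKETCH_MSR_ME_BIT);
    }
    /* Supervisor delivery leaves ME untouched, so it stays enabled. */
    return new_msr;
}
```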

Re: [PATCH V8 1/8] accel/kvm: Extract common KVM vCPU {creation, parking} code

2024-03-22 Thread Harsh Prateek Bora


+ Vaibhav, Shiva

Hi Salil,

I came across your patch while trying to solve a related problem on
spapr. One query below:


On 3/12/24 07:29, Salil Mehta via wrote:

KVM vCPU creation is done once during the vCPU realization when the QEMU vCPU thread
is spawned. This is common to all the architectures as of now.

Hot-unplug of a vCPU results in destruction of the vCPU object in QOM, but the
corresponding KVM vCPU object in the host KVM is not destroyed, as KVM doesn't
support vCPU removal. Therefore, its representative KVM vCPU object/context in
QEMU is parked.

Refactor architecture-common logic so that some APIs can be reused by the vCPU
hotplug code of some architectures like ARM, Loongson etc. Update new/old APIs
with trace events instead of DPRINTF. No functional change is intended here.

Signed-off-by: Salil Mehta 
Reviewed-by: Gavin Shan 
Tested-by: Vishnu Pajjuri 
Reviewed-by: Jonathan Cameron 
Tested-by: Xianglai Li 
Tested-by: Miguel Luis 
Reviewed-by: Shaoqin Huang 
---
  accel/kvm/kvm-all.c| 64 --
  accel/kvm/trace-events |  5 +++-
  include/sysemu/kvm.h   | 16 +++
  3 files changed, 69 insertions(+), 16 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index a8cecd040e..3bc3207bda 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -126,6 +126,7 @@ static QemuMutex kml_slots_lock;
  #define kvm_slots_unlock()  qemu_mutex_unlock(&kml_slots_lock)
  
  static void kvm_slot_init_dirty_bitmap(KVMSlot *mem);
+static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id);
  
  static inline void kvm_resample_fd_remove(int gsi)
  {
@@ -314,14 +315,53 @@ err:
  return ret;
  }
  
+void kvm_park_vcpu(CPUState *cpu)
+{
+struct KVMParkedVcpu *vcpu;
+
+trace_kvm_park_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
+
+vcpu = g_malloc0(sizeof(*vcpu));
+vcpu->vcpu_id = kvm_arch_vcpu_id(cpu);
+vcpu->kvm_fd = cpu->kvm_fd;
+QLIST_INSERT_HEAD(&kvm_state->kvm_parked_vcpus, vcpu, node);
+}
+
+int kvm_create_vcpu(CPUState *cpu)
+{
+unsigned long vcpu_id = kvm_arch_vcpu_id(cpu);
+KVMState *s = kvm_state;
+int kvm_fd;
+
+trace_kvm_create_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
+
+/* check if the KVM vCPU already exist but is parked */
+kvm_fd = kvm_get_vcpu(s, vcpu_id);
+if (kvm_fd < 0) {
+/* vCPU not parked: create a new KVM vCPU */
+kvm_fd = kvm_vm_ioctl(s, KVM_CREATE_VCPU, vcpu_id);
+if (kvm_fd < 0) {
+error_report("KVM_CREATE_VCPU IOCTL failed for vCPU %lu", vcpu_id);
+return kvm_fd;
+}
+}
+
+cpu->kvm_fd = kvm_fd;
+cpu->kvm_state = s;
+cpu->vcpu_dirty = true;
+cpu->dirty_pages = 0;
+cpu->throttle_us_per_full = 0;
+
+return 0;
+}
+
  static int do_kvm_destroy_vcpu(CPUState *cpu)
  {
  KVMState *s = kvm_state;
  long mmap_size;
-struct KVMParkedVcpu *vcpu = NULL;
  int ret = 0;
  
-trace_kvm_destroy_vcpu();
+trace_kvm_destroy_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
  
  ret = kvm_arch_destroy_vcpu(cpu);
  if (ret < 0) {
@@ -347,10 +387,7 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
  }
  }
  
-vcpu = g_malloc0(sizeof(*vcpu));
-vcpu->vcpu_id = kvm_arch_vcpu_id(cpu);
-vcpu->kvm_fd = cpu->kvm_fd;
-QLIST_INSERT_HEAD(&kvm_state->kvm_parked_vcpus, vcpu, node);
+kvm_park_vcpu(cpu);
  err:
  return ret;
  }
@@ -371,6 +408,8 @@ static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id)
  if (cpu->vcpu_id == vcpu_id) {
  int kvm_fd;
  
+trace_kvm_get_vcpu(vcpu_id);
+
  QLIST_REMOVE(cpu, node);
  kvm_fd = cpu->kvm_fd;
  g_free(cpu);
@@ -378,7 +417,7 @@ static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id)
  }
  }
  
-return kvm_vm_ioctl(s, KVM_CREATE_VCPU, (void *)vcpu_id);
+return -ENOENT;
  }
  
  int kvm_init_vcpu(CPUState *cpu, Error **errp)
@@ -389,19 +428,14 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
  
  trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
  
-ret = kvm_get_vcpu(s, kvm_arch_vcpu_id(cpu));
+ret = kvm_create_vcpu(cpu);
  if (ret < 0) {
-error_setg_errno(errp, -ret, "kvm_init_vcpu: kvm_get_vcpu failed (%lu)",
+error_setg_errno(errp, -ret,
+ "kvm_init_vcpu: kvm_create_vcpu failed (%lu)",
   kvm_arch_vcpu_id(cpu));


If a vCPU hotplug fails because the KVM_CREATE_VCPU ioctl fails in
kvm_create_vcpu(), the current behaviour is to bring down the guest, as
errp is &error_fatal. Any thoughts on how we can ensure that such a
failure, for hotplugged CPUs only, doesn't bring down the guest and
instead fails gracefully (by reporting the error to the user on the monitor)?

regards,
Harsh

  goto err;
  }
  
-cpu->kvm_fd = ret;
-cpu->kvm_state = s;
-cpu->vcpu_dirty = true;
-cpu->dirty_pages = 0;
-cpu->throttle_us_per_full = 0;
-
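
The park-then-reuse flow that the patch factors out can be sketched without KVM at all. This is a minimal model under stated assumptions: the `sketch_*` names are hypothetical, a counter stands in for the `KVM_CREATE_VCPU` ioctl, and a plain singly linked list stands in for QEMU's `QLIST`. It shows the lookup returning `-ENOENT` when nothing is parked, with creation as the fallback.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct KVMParkedVcpu. */
struct sketch_parked {
    unsigned long vcpu_id;
    int fd;
    struct sketch_parked *next;
};

static struct sketch_parked *sketch_parked_list;
static int sketch_next_fd = 100; /* fake descriptor source instead of the ioctl */

/* Remember the descriptor of an unplugged vCPU so it can be reused later. */
static int sketch_park(unsigned long vcpu_id, int fd)
{
    struct sketch_parked *p = calloc(1, sizeof(*p));

    if (!p) {
        return -ENOMEM;
    }
    p->vcpu_id = vcpu_id;
    p->fd = fd;
    p->next = sketch_parked_list;
    sketch_parked_list = p;
    return 0;
}

/* Take a parked descriptor off the list, or return -ENOENT if none exists. */
static int sketch_get_parked(unsigned long vcpu_id)
{
    struct sketch_parked **pp;

    for (pp = &sketch_parked_list; *pp; pp = &(*pp)->next) {
        if ((*pp)->vcpu_id == vcpu_id) {
            struct sketch_parked *p = *pp;
            int fd = p->fd;

            *pp = p->next;
            free(p);
            return fd;
        }
    }
    return -ENOENT;
}

/* Reuse a parked vCPU when possible; otherwise "create" a new one. */
static int sketch_create(unsigned long vcpu_id)
{
    int fd = sketch_get_parked(vcpu_id);

    if (fd < 0) {
        fd = sketch_next_fd++;
    }
    return fd;
}
```

Splitting lookup from creation this way lets the caller decide what a creation failure means, which is the point of the hotplug question raised in the reply above.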

Re: [PATCH v2 08/10] ppc/pnv: Set POWER9, POWER10 ibm,pa-features bits

2024-03-12 Thread Harsh Prateek Bora





On 3/12/24 18:44, Nicholas Piggin wrote:

Copy the pa-features arrays from spapr, adjusting slightly as
described in comments.

Signed-off-by: Nicholas Piggin 


Although a future re-org is expected per the discussion on v1, for now:

Reviewed-by: Harsh Prateek Bora 


---
  hw/ppc/pnv.c   | 67 --
  hw/ppc/spapr.c |  1 +
  2 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 52d964f77a..8a502dea90 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -332,6 +332,35 @@ static void pnv_chip_power8_dt_populate(PnvChip *chip, void *fdt)
  }
  }
  
+/*
+ * Same as spapr pa_features_300 except pnv always enables CI largepages bit.
+ */
+static const uint8_t pa_features_300[] = { 66, 0,
+/* 0: MMU|FPU|SLB|RUN|DABR|NX, 1: CILRG|fri[nzpm]|DABRX|SPRG3|SLB0|PP110 */
+/* 2: VPM|DS205|PPR|DS202|DS206, 3: LSD|URG, 5: LE|CFAR|EB|LSQ */
+0xf6, 0x3f, 0xc7, 0xc0, 0x00, 0xf0, /* 0 - 5 */
+/* 6: DS207 */
+0x80, 0x00, 0x00, 0x00, 0x00, 0x00, /* 6 - 11 */
+/* 16: Vector */
+0x00, 0x00, 0x00, 0x00, 0x80, 0x00, /* 12 - 17 */
+/* 18: Vec. Scalar, 20: Vec. XOR, 22: HTM */
+0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 18 - 23 */
+/* 24: Ext. Dec, 26: 64 bit ftrs, 28: PM ftrs */
+0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 24 - 29 */
+/* 32: LE atomic, 34: EBB + ext EBB */
+0x00, 0x00, 0x80, 0x00, 0xC0, 0x00, /* 30 - 35 */
+/* 40: Radix MMU */
+0x00, 0x00, 0x00, 0x00, 0x80, 0x00, /* 36 - 41 */
+/* 42: PM, 44: PC RA, 46: SC vec'd */
+0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 42 - 47 */
+/* 48: SIMD, 50: QP BFP, 52: String */
+0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 48 - 53 */
+/* 54: DecFP, 56: DecI, 58: SHA */
+0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 54 - 59 */
+/* 60: NM atomic, 62: RNG */
+0x80, 0x00, 0x80, 0x00, 0x00, 0x00, /* 60 - 65 */
+};
+
  static void pnv_chip_power9_dt_populate(PnvChip *chip, void *fdt)
  {
  static const char compat[] = "ibm,power9-xscom\0ibm,xscom";
@@ -349,7 +378,7 @@ static void pnv_chip_power9_dt_populate(PnvChip *chip, void *fdt)
  offset = pnv_dt_core(chip, pnv_core, fdt);
  
  _FDT((fdt_setprop(fdt, offset, "ibm,pa-features",
-   pa_features_207, sizeof(pa_features_207))));
+   pa_features_300, sizeof(pa_features_300))));
  }
  
  if (chip->ram_size) {
@@ -359,6 +388,40 @@ static void pnv_chip_power9_dt_populate(PnvChip *chip, void *fdt)
  pnv_dt_lpc(chip, fdt, 0, PNV9_LPCM_BASE(chip), PNV9_LPCM_SIZE);
  }
  
+/*
+ * Same as spapr pa_features_31 except pnv always enables CI largepages bit,
+ * always disables copy/paste.
+ */
+static const uint8_t pa_features_31[] = { 74, 0,
+/* 0: MMU|FPU|SLB|RUN|DABR|NX, 1: CILRG|fri[nzpm]|DABRX|SPRG3|SLB0|PP110 */
+/* 2: VPM|DS205|PPR|DS202|DS206, 3: LSD|URG, 5: LE|CFAR|EB|LSQ */
+0xf6, 0x3f, 0xc7, 0xc0, 0x00, 0xf0, /* 0 - 5 */
+/* 6: DS207 */
+0x80, 0x00, 0x00, 0x00, 0x00, 0x00, /* 6 - 11 */
+/* 16: Vector */
+0x00, 0x00, 0x00, 0x00, 0x80, 0x00, /* 12 - 17 */
+/* 18: Vec. Scalar, 20: Vec. XOR */
+0x80, 0x00, 0x80, 0x00, 0x00, 0x00, /* 18 - 23 */
+/* 24: Ext. Dec, 26: 64 bit ftrs, 28: PM ftrs */
+0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 24 - 29 */
+/* 32: LE atomic, 34: EBB + ext EBB */
+0x00, 0x00, 0x80, 0x00, 0xC0, 0x00, /* 30 - 35 */
+/* 40: Radix MMU */
+0x00, 0x00, 0x00, 0x00, 0x80, 0x00, /* 36 - 41 */
+/* 42: PM, 44: PC RA, 46: SC vec'd */
+0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 42 - 47 */
+/* 48: SIMD, 50: QP BFP, 52: String */
+0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 48 - 53 */
+/* 54: DecFP, 56: DecI, 58: SHA */
+0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 54 - 59 */
+/* 60: NM atomic, 62: RNG */
+0x80, 0x00, 0x80, 0x00, 0x00, 0x00, /* 60 - 65 */
+/* 68: DEXCR[SBHE|IBRTPDUS|SRAPD|NPHIE|PHIE] */
+0x00, 0x00, 0xce, 0x00, 0x00, 0x00, /* 66 - 71 */
+/* 72: [P]HASHST/[P]HASHCHK */
+0x80, 0x00, /* 72 - 73 */
+};
+
  static void pnv_chip_power10_dt_populate(PnvChip *chip, void *fdt)
  {
  static const char compat[] = "ibm,power10-xscom\0ibm,xscom";
@@ -376,7 +439,7 @@ static void pnv_chip_power10_dt_populate(PnvChip *chip, void *fdt)
  offset = pnv_dt_core(chip, pnv_core, fdt);
  
  _FDT((fdt_setprop(fdt, offset, "ibm,pa-features",
-   pa_features_207, sizeof(pa_features_207))));
+   pa_features_31, sizeof(pa_features_31))));
  }
  
  if (chip->ram_size) {
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index a684e0d9dc..abd484023a 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -243,6 +243,7 @@ static void spapr_dt_pa_features(SpaprMachineState *spapr,
   * so there isn't much need for it anyway.
   */
  
+/* These should be kept in sync with pn
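
The arrays in this patch can be read as a two-byte header followed by attribute bytes, with the per-row comments naming the attribute byte index of each feature. The sketch below tests one bit under that reading. It assumes the first header byte is the count of attribute bytes (66 matches the 66 bytes listed in `pa_features_300`) and that the second byte is a type field that can be skipped. `sketch_pa_feature_set` is a hypothetical helper, not a QEMU or libfdt function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if all bits in mask are set in attribute byte `byte`.
 * Assumed layout: prop[0] = number of attribute bytes, prop[1] = type,
 * prop[2..] = attribute bytes.
 */
static bool sketch_pa_feature_set(const uint8_t *prop, size_t prop_len,
                                  unsigned byte, uint8_t mask)
{
    size_t nattr;

    if (prop_len < 2) {
        return false;
    }
    nattr = prop[0];
    if (prop_len < 2 + nattr || byte >= nattr) {
        return false;
    }
    return (prop[2 + byte] & mask) == mask;
}

/* Copy of the pnv pa_features_300 values from the patch above. */
static const uint8_t sketch_pa_features_300[] = { 66, 0,
    0xf6, 0x3f, 0xc7, 0xc0, 0x00, 0xf0, /* 0 - 5 */
    0x80, 0x00, 0x00, 0x00, 0x00, 0x00, /* 6 - 11 */
    0x00, 0x00, 0x00, 0x00, 0x80, 0x00, /* 12 - 17 */
    0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 18 - 23 */
    0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 24 - 29 */
    0x00, 0x00, 0x80, 0x00, 0xC0, 0x00, /* 30 - 35 */
    0x00, 0x00, 0x00, 0x00, 0x80, 0x00, /* 36 - 41 */
    0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 42 - 47 */
    0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 48 - 53 */
    0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 54 - 59 */
    0x80, 0x00, 0x80, 0x00, 0x00, 0x00, /* 60 - 65 */
};
```

With this layout, the "40: Radix MMU" comment corresponds to `sketch_pa_feature_set(sketch_pa_features_300, sizeof(sketch_pa_features_300), 40, 0x80)`.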

Re: [PATCH v2 07/10] ppc/pnv: Permit ibm,pa-features set per machine variant

2024-03-12 Thread Harsh Prateek Bora





On 3/12/24 18:44, Nicholas Piggin wrote:

This allows different pa-features for powernv8/9/10.

Signed-off-by: Nicholas Piggin 


Reviewed-by: Harsh Prateek Bora 


---
  hw/ppc/pnv.c | 41 +
  1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index aa9786e970..52d964f77a 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -133,7 +133,7 @@ static int get_cpus_node(void *fdt)
   * device tree, used in XSCOM to address cores and in interrupt
   * servers.
   */
-static void pnv_dt_core(PnvChip *chip, PnvCore *pc, void *fdt)
+static int pnv_dt_core(PnvChip *chip, PnvCore *pc, void *fdt)
  {
  PowerPCCPU *cpu = pc->threads[0];
  CPUState *cs = CPU(cpu);
@@ -149,11 +149,6 @@ static void pnv_dt_core(PnvChip *chip, PnvCore *pc, void *fdt)
  uint32_t cpufreq = 10;
  uint32_t page_sizes_prop[64];
  size_t page_sizes_prop_size;
-const uint8_t pa_features[] = { 24, 0,
-0xf6, 0x3f, 0xc7, 0xc0, 0x00, 0xf0,
-0x80, 0x00, 0x00, 0x00, 0x00, 0x00,
-0x00, 0x00, 0x00, 0x00, 0x80, 0x00,
-0x80, 0x00, 0x80, 0x00, 0x80, 0x00 };
  int offset;
  char *nodename;
  int cpus_offset = get_cpus_node(fdt);
@@ -236,15 +231,14 @@ static void pnv_dt_core(PnvChip *chip, PnvCore *pc, void *fdt)
 page_sizes_prop, page_sizes_prop_size)));
  }
  
-_FDT((fdt_setprop(fdt, offset, "ibm,pa-features",
-   pa_features, sizeof(pa_features))));
-
  /* Build interrupt servers properties */
  for (i = 0; i < smt_threads; i++) {
  servers_prop[i] = cpu_to_be32(pc->pir + i);
  }
  _FDT((fdt_setprop(fdt, offset, "ibm,ppc-interrupt-server#s",
 servers_prop, sizeof(*servers_prop) * smt_threads)));
+
+return offset;
  }
  
  static void pnv_dt_icp(PnvChip *chip, void *fdt, uint32_t pir,
@@ -299,6 +293,17 @@ PnvChip *pnv_chip_add_phb(PnvChip *chip, PnvPHB *phb)
  return chip;
  }
  
+/*
+ * Same as spapr pa_features_207 except pnv always enables CI largepages bit.
+ * HTM is always enabled because TCG does implement HTM, it's just a
+ * degenerate implementation.
+ */
+static const uint8_t pa_features_207[] = { 24, 0,
+ 0xf6, 0x3f, 0xc7, 0xc0, 0x00, 0xf0,
+ 0x80, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x80, 0x00,
+ 0x80, 0x00, 0x80, 0x00, 0x80, 0x00 };
+
  static void pnv_chip_power8_dt_populate(PnvChip *chip, void *fdt)
  {
  static const char compat[] = "ibm,power8-xscom\0ibm,xscom";
@@ -311,8 +316,12 @@ static void pnv_chip_power8_dt_populate(PnvChip *chip, void *fdt)
  
  for (i = 0; i < chip->nr_cores; i++) {
  PnvCore *pnv_core = chip->cores[i];
+int offset;
+
+offset = pnv_dt_core(chip, pnv_core, fdt);
  
-pnv_dt_core(chip, pnv_core, fdt);
+_FDT((fdt_setprop(fdt, offset, "ibm,pa-features",
+   pa_features_207, sizeof(pa_features_207))));
  
  /* Interrupt Control Presenters (ICP). One per core. */
  pnv_dt_icp(chip, fdt, pnv_core->pir, CPU_CORE(pnv_core)->nr_threads);
@@ -335,8 +344,12 @@ static void pnv_chip_power9_dt_populate(PnvChip *chip, void *fdt)
  
  for (i = 0; i < chip->nr_cores; i++) {
  PnvCore *pnv_core = chip->cores[i];
+int offset;
  
-pnv_dt_core(chip, pnv_core, fdt);
+offset = pnv_dt_core(chip, pnv_core, fdt);
+
+_FDT((fdt_setprop(fdt, offset, "ibm,pa-features",
+   pa_features_207, sizeof(pa_features_207))));
  }
  
  if (chip->ram_size) {
@@ -358,8 +371,12 @@ static void pnv_chip_power10_dt_populate(PnvChip *chip, void *fdt)
  
  for (i = 0; i < chip->nr_cores; i++) {
  PnvCore *pnv_core = chip->cores[i];
+int offset;
+
+offset = pnv_dt_core(chip, pnv_core, fdt);
  
-pnv_dt_core(chip, pnv_core, fdt);
+_FDT((fdt_setprop(fdt, offset, "ibm,pa-features",
+   pa_features_207, sizeof(pa_features_207))));
  }
  
  if (chip->ram_size) {

Re: [PATCH v5 14/14] spapr: nested: Introduce cap-nested-papr for Nested PAPR API

2024-03-12 Thread Harsh Prateek Bora


Hi Nick,

Updated incremental fix below:

On 3/12/24 18:21, Harsh Prateek Bora wrote:



On 3/12/24 18:17, Harsh Prateek Bora wrote:

Hi Nick,

On 3/12/24 17:41, Harsh Prateek Bora wrote:

Hi Nick,

On 3/12/24 17:21, Nicholas Piggin wrote:

On Fri Mar 8, 2024 at 9:19 PM AEST, Harsh Prateek Bora wrote:

Introduce a SPAPR capability cap-nested-papr which enables nested PAPR
API for nested guests. This new API is to enable support for KVM on PowerVM
and the support in Linux kernel has already merged upstream.

Signed-off-by: Michael Neuling 
Signed-off-by: Harsh Prateek Bora 
---
  include/hw/ppc/spapr.h |  6 +++-
  hw/ppc/spapr.c |  2 ++
  hw/ppc/spapr_caps.c    | 62 ++
  hw/ppc/spapr_nested.c  |  8 --
  4 files changed, 74 insertions(+), 4 deletions(-)






+static void cap_nested_papr_apply(SpaprMachineState *spapr,
+    uint8_t val, Error **errp)
+{
+    ERRP_GUARD();
+    PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
+    CPUPPCState *env = &cpu->env;
+
+    if (!val) {
+    /* capability disabled by default */
+    return;
+    }
+
+    if (tcg_enabled()) {
+    if (!(env->insns_flags2 & PPC2_ISA300)) {
+    error_setg(errp, "Nested-PAPR only supported on POWER9 and later");
+    error_append_hint(errp,
+  "Try appending -machine cap-nested-papr=off\n");
+    return;
+    }
+    if (spapr_nested_api(spapr) &&
+    spapr_nested_api(spapr) != NESTED_API_PAPR) {
+    error_setg(errp, "Nested-HV APIs are mutually exclusive/incompatible");
+    error_append_hint(errp, "Please use either cap-nested-hv or "
+    "cap-nested-papr to proceed.\n");
+    return;
+    } else {
+    spapr->nested.api = NESTED_API_PAPR;
+    }
+
+    } else if (kvm_enabled()) {
+    /*
+ * this gets executed in L1 qemu when L2 is launched,
+ * needs kvm-hv support in L1 kernel.
+ */
+    if (!kvmppc_has_cap_nested_kvm_hv()) {
+    error_setg(errp,
+   "KVM implementation does not support Nested-HV");
+    } else if (kvmppc_set_cap_nested_kvm_hv(val) < 0) {
+    error_setg(errp, "Error enabling Nested-HV with KVM");
+    }


I'll just disable this on KVM for now. With that changed,

Reviewed-by: Nicholas Piggin 



AFAIK, the v2 API also expects this capability to be enabled on the L1 kernel.
I guess the reason is that the L1 implementation has used the same capability and
extended it to be used with the v2 API. So, this check is needed in L1 QEMU for
now. We may revisit the L1 implementation later to see if a change is
appropriate.


Please ignore the above response. I think my observation was based on an
older version of the L1 implementation. This doesn't seem to be an issue
with upstream L1. You may disable the kvm_enabled() path for now. I
just tested and it works fine.


Here's the incremental fix:



Updated to keep error_setg for kvm_enabled() case:

diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index d6d5a6b8df..92d8966d60 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -527,18 +527,9 @@ static void cap_nested_papr_apply(SpaprMachineState *spapr,
 } else {
 spapr->nested.api = NESTED_API_PAPR;
 }
-
 } else if (kvm_enabled()) {
-/*
- * this gets executed in L1 qemu when L2 is launched,
- * needs kvm-hv support in L1 kernel.
- */
-if (!kvmppc_has_cap_nested_kvm_hv()) {
 error_setg(errp,
-   "KVM implementation does not support Nested-HV");
-} else if (kvmppc_set_cap_nested_kvm_hv(val) < 0) {
-error_setg(errp, "Error enabling Nested-HV with KVM");
-}
+   "KVM implementation does not support Nested-PAPR");
 }
 }





regards,
Harsh


regards,
Harsh



regards,
Harsh


  }
  }
@@ -735,6 +787,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
  .type = "bool",
  .apply = cap_nested_kvm_hv_apply,
  },
+    [SPAPR_CAP_NESTED_PAPR] = {
+    .name = "nested-papr",
+    .description = "Allow Nested HV (PAPR API)",
+    .index = SPAPR_CAP_NESTED_PAPR,
+    .get = spapr_cap_get_bool,
+    .set = spapr_cap_set_bool,
+    .type = "bool",
+    .apply = cap_nested_papr_apply,
+    },
  [SPAPR_CAP_LARGE_DECREMENTER] = {
  .name = "large-decr",
  .description = "Allow Large Decrementer",
@@ -919,6 +980,7 @@ SPAPR_CAP_MIG_STATE(sbbc, SPAPR_CAP_SBBC);
  SPAPR_CAP_MIG_STATE(ibs, SPAPR_CAP_IBS);
  SPAPR_CAP_MIG_STATE(hpt_maxpagesize, SPAPR_CAP_HPT_MAXPAGESIZE);
  SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NE
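
Both apply functions in the quoted patch perform the same check: if a nested API is already selected and it is not the one being applied, they fail; otherwise they record the selection. A minimal sketch of that mutual-exclusion rule follows. The `Sketch*` types and `sketch_select_api` are hypothetical illustrations, not QEMU's `SpaprMachineState` or `spapr_nested_api()`, and a returned string stands in for `error_setg()`.

```c
#include <stddef.h>

/* Hypothetical mirror of the nested API selector used in the patch. */
typedef enum {
    SKETCH_API_NONE = 0,
    SKETCH_API_KVM_HV,
    SKETCH_API_PAPR,
} SketchNestedApi;

struct sketch_machine {
    SketchNestedApi api;
};

/*
 * Select a nested API for the machine.
 * Returns NULL on success, or an error message if a different API
 * was already selected by an earlier capability.
 */
static const char *sketch_select_api(struct sketch_machine *m,
                                     SketchNestedApi want)
{
    if (m->api != SKETCH_API_NONE && m->api != want) {
        return "Nested-HV APIs are mutually exclusive/incompatible";
    }
    m->api = want;
    return NULL;
}
```

Whichever capability is applied first wins, so the error appears on the second one regardless of the order in which `cap-nested-hv` and `cap-nested-papr` are processed.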

Re: [PATCH v5 14/14] spapr: nested: Introduce cap-nested-papr for Nested PAPR API

2024-03-12 Thread Harsh Prateek Bora





On 3/12/24 18:17, Harsh Prateek Bora wrote:

Hi Nick,

On 3/12/24 17:41, Harsh Prateek Bora wrote:

Hi Nick,

On 3/12/24 17:21, Nicholas Piggin wrote:

On Fri Mar 8, 2024 at 9:19 PM AEST, Harsh Prateek Bora wrote:

Introduce a SPAPR capability cap-nested-papr which enables nested PAPR
API for nested guests. This new API is to enable support for KVM on PowerVM
and the support in Linux kernel has already merged upstream.

Signed-off-by: Michael Neuling 
Signed-off-by: Harsh Prateek Bora 
---
  include/hw/ppc/spapr.h |  6 +++-
  hw/ppc/spapr.c |  2 ++
  hw/ppc/spapr_caps.c    | 62 ++
  hw/ppc/spapr_nested.c  |  8 --
  4 files changed, 74 insertions(+), 4 deletions(-)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 6223873641..4aaf23d28f 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -81,8 +81,10 @@ typedef enum {
  #define SPAPR_CAP_RPT_INVALIDATE    0x0B
  /* Support for AIL modes */
  #define SPAPR_CAP_AIL_MODE_3    0x0C
+/* Nested PAPR */
+#define SPAPR_CAP_NESTED_PAPR   0x0D
  /* Num Caps */
-#define SPAPR_CAP_NUM   (SPAPR_CAP_AIL_MODE_3 + 1)
+#define SPAPR_CAP_NUM   (SPAPR_CAP_NESTED_PAPR + 1)
  /*
   * Capability Values
@@ -592,6 +594,7 @@ struct SpaprMachineState {
  #define H_GUEST_CREATE_VCPU  0x474
  #define H_GUEST_GET_STATE    0x478
  #define H_GUEST_SET_STATE    0x47C
+#define H_GUEST_RUN_VCPU 0x480
  #define H_GUEST_DELETE   0x488
  #define MAX_HCALL_OPCODE H_GUEST_DELETE
@@ -996,6 +999,7 @@ extern const VMStateDescription vmstate_spapr_cap_sbbc;
  extern const VMStateDescription vmstate_spapr_cap_ibs;
  extern const VMStateDescription vmstate_spapr_cap_hpt_maxpagesize;
  extern const VMStateDescription vmstate_spapr_cap_nested_kvm_hv;
+extern const VMStateDescription vmstate_spapr_cap_nested_papr;
  extern const VMStateDescription vmstate_spapr_cap_large_decr;
  extern const VMStateDescription vmstate_spapr_cap_ccf_assist;
  extern const VMStateDescription vmstate_spapr_cap_fwnmi;
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 54fc01e462..beb23fae8f 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2121,6 +2121,7 @@ static const VMStateDescription vmstate_spapr = {
  &vmstate_spapr_cap_fwnmi,
  &vmstate_spapr_fwnmi,
  &vmstate_spapr_cap_rpt_invalidate,
+    &vmstate_spapr_cap_nested_papr,
  NULL
  }
  };
@@ -4687,6 +4688,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
  smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_WORKAROUND;
  smc->default_caps.caps[SPAPR_CAP_HPT_MAXPAGESIZE] = 16; /* 64kiB */
  smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
+    smc->default_caps.caps[SPAPR_CAP_NESTED_PAPR] = SPAPR_CAP_OFF;
  smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
  smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_ON;
  smc->default_caps.caps[SPAPR_CAP_FWNMI] = SPAPR_CAP_ON;
diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index e889244e52..d6d5a6b8df 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -487,6 +487,58 @@ static void cap_nested_kvm_hv_apply(SpaprMachineState *spapr,
  error_append_hint(errp, "Try appending -machine cap-nested-hv=off "
  "or use threads=1 with -smp\n");
  }
+    if (spapr_nested_api(spapr) &&
+    spapr_nested_api(spapr) != NESTED_API_KVM_HV) {
+    error_setg(errp, "Nested-HV APIs are mutually exclusive/incompatible");
+    error_append_hint(errp, "Please use either cap-nested-hv or "
+    "cap-nested-papr to proceed.\n");
+    return;
+    } else {
+    spapr->nested.api = NESTED_API_KVM_HV;
+    }
+    }
+}
+
+static void cap_nested_papr_apply(SpaprMachineState *spapr,
+    uint8_t val, Error **errp)
+{
+    ERRP_GUARD();
+    PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
+    CPUPPCState *env = &cpu->env;
+
+    if (!val) {
+    /* capability disabled by default */
+    return;
+    }
+
+    if (tcg_enabled()) {
+    if (!(env->insns_flags2 & PPC2_ISA300)) {
+    error_setg(errp, "Nested-PAPR only supported on POWER9 and later");
+    error_append_hint(errp,
+  "Try appending -machine cap-nested-papr=off\n");
+    return;
+    }
+    if (spapr_nested_api(spapr) &&
+    spapr_nested_api(spapr) != NESTED_API_PAPR) {
+    error_setg(errp, "Nested-HV APIs are mutually exclusive/incompatible");
+    error_append_hint(errp, "Please use either cap-nested-hv or "
+    "cap-nested-papr t

Re: [PATCH v5 14/14] spapr: nested: Introduce cap-nested-papr for Nested PAPR API

2024-03-12 Thread Harsh Prateek Bora


Hi Nick,

On 3/12/24 17:41, Harsh Prateek Bora wrote:

Hi Nick,

On 3/12/24 17:21, Nicholas Piggin wrote:

On Fri Mar 8, 2024 at 9:19 PM AEST, Harsh Prateek Bora wrote:

Introduce a SPAPR capability cap-nested-papr which enables nested PAPR
API for nested guests. This new API is to enable support for KVM on PowerVM
and the support in Linux kernel has already merged upstream.

Signed-off-by: Michael Neuling 
Signed-off-by: Harsh Prateek Bora 
---
  include/hw/ppc/spapr.h |  6 +++-
  hw/ppc/spapr.c |  2 ++
  hw/ppc/spapr_caps.c    | 62 ++
  hw/ppc/spapr_nested.c  |  8 --
  4 files changed, 74 insertions(+), 4 deletions(-)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 6223873641..4aaf23d28f 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -81,8 +81,10 @@ typedef enum {
  #define SPAPR_CAP_RPT_INVALIDATE    0x0B
  /* Support for AIL modes */
  #define SPAPR_CAP_AIL_MODE_3    0x0C
+/* Nested PAPR */
+#define SPAPR_CAP_NESTED_PAPR   0x0D
  /* Num Caps */
-#define SPAPR_CAP_NUM   (SPAPR_CAP_AIL_MODE_3 + 1)
+#define SPAPR_CAP_NUM   (SPAPR_CAP_NESTED_PAPR + 1)
  /*
   * Capability Values
@@ -592,6 +594,7 @@ struct SpaprMachineState {
  #define H_GUEST_CREATE_VCPU  0x474
  #define H_GUEST_GET_STATE    0x478
  #define H_GUEST_SET_STATE    0x47C
+#define H_GUEST_RUN_VCPU 0x480
  #define H_GUEST_DELETE   0x488
  #define MAX_HCALL_OPCODE H_GUEST_DELETE
@@ -996,6 +999,7 @@ extern const VMStateDescription vmstate_spapr_cap_sbbc;
  extern const VMStateDescription vmstate_spapr_cap_ibs;
  extern const VMStateDescription vmstate_spapr_cap_hpt_maxpagesize;
  extern const VMStateDescription vmstate_spapr_cap_nested_kvm_hv;
+extern const VMStateDescription vmstate_spapr_cap_nested_papr;
  extern const VMStateDescription vmstate_spapr_cap_large_decr;
  extern const VMStateDescription vmstate_spapr_cap_ccf_assist;
  extern const VMStateDescription vmstate_spapr_cap_fwnmi;
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 54fc01e462..beb23fae8f 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2121,6 +2121,7 @@ static const VMStateDescription vmstate_spapr = {
  &vmstate_spapr_cap_fwnmi,
  &vmstate_spapr_fwnmi,
  &vmstate_spapr_cap_rpt_invalidate,
+    &vmstate_spapr_cap_nested_papr,
  NULL
  }
  };
@@ -4687,6 +4688,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
  smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_WORKAROUND;
  smc->default_caps.caps[SPAPR_CAP_HPT_MAXPAGESIZE] = 16; /* 64kiB */
  smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
+    smc->default_caps.caps[SPAPR_CAP_NESTED_PAPR] = SPAPR_CAP_OFF;
  smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
  smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_ON;
  smc->default_caps.caps[SPAPR_CAP_FWNMI] = SPAPR_CAP_ON;
diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index e889244e52..d6d5a6b8df 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -487,6 +487,58 @@ static void cap_nested_kvm_hv_apply(SpaprMachineState *spapr,
  error_append_hint(errp, "Try appending -machine cap-nested-hv=off "
  "or use threads=1 with -smp\n");
  }
+    if (spapr_nested_api(spapr) &&
+    spapr_nested_api(spapr) != NESTED_API_KVM_HV) {
+    error_setg(errp, "Nested-HV APIs are mutually exclusive/incompatible");
+    error_append_hint(errp, "Please use either cap-nested-hv or "
+    "cap-nested-papr to proceed.\n");
+    return;
+    } else {
+    spapr->nested.api = NESTED_API_KVM_HV;
+    }
+    }
+}
+
+static void cap_nested_papr_apply(SpaprMachineState *spapr,
+    uint8_t val, Error **errp)
+{
+    ERRP_GUARD();
+    PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
+    CPUPPCState *env = &cpu->env;
+
+    if (!val) {
+    /* capability disabled by default */
+    return;
+    }
+
+    if (tcg_enabled()) {
+    if (!(env->insns_flags2 & PPC2_ISA300)) {
+    error_setg(errp, "Nested-PAPR only supported on POWER9 and later");
+    error_append_hint(errp,
+  "Try appending -machine cap-nested-papr=off\n");
+    return;
+    }
+    if (spapr_nested_api(spapr) &&
+    spapr_nested_api(spapr) != NESTED_API_PAPR) {
+    error_setg(errp, "Nested-HV APIs are mutually exclusive/incompatible");
+    error_append_hint(errp, "Please use either cap-nested-hv or "
+    "cap-nested-papr to proceed.\n");
+    return;
+    } else {
+

Re: [PATCH v5 14/14] spapr: nested: Introduce cap-nested-papr for Nested PAPR API

2024-03-12 Thread Harsh Prateek Bora


Hi Nick,

On 3/12/24 17:21, Nicholas Piggin wrote:

On Fri Mar 8, 2024 at 9:19 PM AEST, Harsh Prateek Bora wrote:

Introduce a SPAPR capability cap-nested-papr which enables nested PAPR
API for nested guests. This new API is to enable support for KVM on PowerVM
and the support in Linux kernel has already merged upstream.

Signed-off-by: Michael Neuling 
Signed-off-by: Harsh Prateek Bora 
---
  include/hw/ppc/spapr.h |  6 +++-
  hw/ppc/spapr.c |  2 ++
  hw/ppc/spapr_caps.c| 62 ++
  hw/ppc/spapr_nested.c  |  8 --
  4 files changed, 74 insertions(+), 4 deletions(-)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 6223873641..4aaf23d28f 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -81,8 +81,10 @@ typedef enum {
  #define SPAPR_CAP_RPT_INVALIDATE    0x0B
  /* Support for AIL modes */
  #define SPAPR_CAP_AIL_MODE_3    0x0C
+/* Nested PAPR */
+#define SPAPR_CAP_NESTED_PAPR   0x0D
  /* Num Caps */
-#define SPAPR_CAP_NUM   (SPAPR_CAP_AIL_MODE_3 + 1)
+#define SPAPR_CAP_NUM   (SPAPR_CAP_NESTED_PAPR + 1)
  
  /*
   * Capability Values
@@ -592,6 +594,7 @@ struct SpaprMachineState {
  #define H_GUEST_CREATE_VCPU  0x474
  #define H_GUEST_GET_STATE    0x478
  #define H_GUEST_SET_STATE    0x47C
+#define H_GUEST_RUN_VCPU 0x480
  #define H_GUEST_DELETE   0x488
  
  #define MAX_HCALL_OPCODE H_GUEST_DELETE
@@ -996,6 +999,7 @@ extern const VMStateDescription vmstate_spapr_cap_sbbc;
  extern const VMStateDescription vmstate_spapr_cap_ibs;
  extern const VMStateDescription vmstate_spapr_cap_hpt_maxpagesize;
  extern const VMStateDescription vmstate_spapr_cap_nested_kvm_hv;
+extern const VMStateDescription vmstate_spapr_cap_nested_papr;
  extern const VMStateDescription vmstate_spapr_cap_large_decr;
  extern const VMStateDescription vmstate_spapr_cap_ccf_assist;
  extern const VMStateDescription vmstate_spapr_cap_fwnmi;
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 54fc01e462..beb23fae8f 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2121,6 +2121,7 @@ static const VMStateDescription vmstate_spapr = {
  &vmstate_spapr_cap_fwnmi,
  &vmstate_spapr_fwnmi,
  &vmstate_spapr_cap_rpt_invalidate,
+&vmstate_spapr_cap_nested_papr,
  NULL
  }
  };
@@ -4687,6 +4688,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
  smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_WORKAROUND;
  smc->default_caps.caps[SPAPR_CAP_HPT_MAXPAGESIZE] = 16; /* 64kiB */
  smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
+smc->default_caps.caps[SPAPR_CAP_NESTED_PAPR] = SPAPR_CAP_OFF;
  smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
  smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_ON;
  smc->default_caps.caps[SPAPR_CAP_FWNMI] = SPAPR_CAP_ON;
diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index e889244e52..d6d5a6b8df 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -487,6 +487,58 @@ static void cap_nested_kvm_hv_apply(SpaprMachineState *spapr,
  error_append_hint(errp, "Try appending -machine cap-nested-hv=off "
  "or use threads=1 with -smp\n");
  }
+if (spapr_nested_api(spapr) &&
+spapr_nested_api(spapr) != NESTED_API_KVM_HV) {
+error_setg(errp, "Nested-HV APIs are mutually exclusive/incompatible");
+error_append_hint(errp, "Please use either cap-nested-hv or "
+"cap-nested-papr to proceed.\n");
+return;
+} else {
+spapr->nested.api = NESTED_API_KVM_HV;
+}
+}
+}
+
+static void cap_nested_papr_apply(SpaprMachineState *spapr,
+uint8_t val, Error **errp)
+{
+ERRP_GUARD();
+PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
+CPUPPCState *env = &cpu->env;
+
+if (!val) {
+/* capability disabled by default */
+return;
+}
+
+if (tcg_enabled()) {
+if (!(env->insns_flags2 & PPC2_ISA300)) {
+error_setg(errp, "Nested-PAPR only supported on POWER9 and later");
+error_append_hint(errp,
+  "Try appending -machine cap-nested-papr=off\n");
+return;
+}
+if (spapr_nested_api(spapr) &&
+spapr_nested_api(spapr) != NESTED_API_PAPR) {
+error_setg(errp, "Nested-HV APIs are mutually exclusive/incompatible");
+error_append_hint(errp, "Please use either cap-nested-hv or "
+"cap-nested-papr to proceed.\n");
+return;
+} else {
+spapr->nested.api = NESTED_API_PAPR;
+}
+
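
An aside for readers of the archive: the mutual-exclusion check that both cap_nested_kvm_hv_apply() and cap_nested_papr_apply() perform can be sketched in isolation. This is a minimal standalone sketch and not the QEMU code; the enum values, struct machine, and claim_nested_api() are hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for QEMU's nested API identifiers. */
enum nested_api {
    NESTED_API_NONE = 0,
    NESTED_API_KVM_HV,
    NESTED_API_PAPR,
};

struct machine {
    enum nested_api api;   /* which nested API, if any, has been claimed */
};

/*
 * Claim the nested API on behalf of one capability. Returns false and
 * leaves the machine untouched if the other API was already claimed.
 */
static bool claim_nested_api(struct machine *m, enum nested_api want)
{
    assert(want != NESTED_API_NONE);
    if (m->api != NESTED_API_NONE && m->api != want) {
        fprintf(stderr, "Nested-HV APIs are mutually exclusive\n");
        return false;
    }
    m->api = want;
    return true;
}
```

Claiming the same API twice succeeds; asking for the other API afterwards fails and keeps the first choice in place, which is the behaviour the two apply hooks appear to aim for.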

Re: [PATCH v5 12/14] spapr: nested: Use correct source for parttbl info for nested PAPR API.

2024-03-12 Thread Harsh Prateek Bora





On 3/12/24 17:11, Nicholas Piggin wrote:

On Fri Mar 8, 2024 at 9:19 PM AEST, Harsh Prateek Bora wrote:

For nested PAPR API, we use SpaprMachineStateNestedGuest struct to store
partition table info, use the same in spapr_get_pate_nested() via
helper.

Signed-off-by: Michael Neuling 
Signed-off-by: Harsh Prateek Bora 
---
  include/hw/ppc/spapr_nested.h |  4 
  hw/ppc/spapr.c|  6 --
  hw/ppc/spapr_nested.c | 22 +-
  3 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index bd43c6b6ef..152019fe3d 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -518,4 +518,8 @@ bool spapr_get_pate_nested_hv(SpaprMachineState *spapr, PowerPCCPU *cpu,
target_ulong lpid, ppc_v3_pate_t *entry);
  uint8_t spapr_nested_api(SpaprMachineState *spapr);
  void spapr_nested_gsb_init(void);
+bool spapr_get_pate_nested_papr(SpaprMachineState *spapr, PowerPCCPU *cpu,
+target_ulong lpid, ppc_v3_pate_t *entry);
+SpaprMachineStateNestedGuest *spapr_get_nested_guest(SpaprMachineState *spapr,
+ target_ulong lpid);


Why is this made non-static? Doesn't seem to be needed in later patches
either? Other than that,

Reviewed-by: Nicholas Piggin 



You're right, it looks like I missed this in v5. Could you kindly squash
the incremental update below into this patch? It only makes the helper
static and relocates it above its caller.

regards,
Harsh

diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index 152019fe3d..3a36ba446b 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -520,6 +520,4 @@ uint8_t spapr_nested_api(SpaprMachineState *spapr);
 void spapr_nested_gsb_init(void);
 bool spapr_get_pate_nested_papr(SpaprMachineState *spapr, PowerPCCPU *cpu,
 target_ulong lpid, ppc_v3_pate_t *entry);
-SpaprMachineStateNestedGuest *spapr_get_nested_guest(SpaprMachineState *spapr,
- target_ulong lpid);
 #endif /* HW_SPAPR_NESTED_H */
diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index 8db9dc19e3..df22bd69bd 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -60,6 +60,15 @@ bool spapr_get_pate_nested_hv(SpaprMachineState *spapr, PowerPCCPU *cpu,
 return true;
 }

+static SpaprMachineStateNestedGuest *spapr_get_nested_guest(SpaprMachineState *spapr,
+ target_ulong guestid)
+{
+SpaprMachineStateNestedGuest *guest;
+
+guest = g_hash_table_lookup(spapr->nested.guests, GINT_TO_POINTER(guestid));
+return guest;
+}
+
 bool spapr_get_pate_nested_papr(SpaprMachineState *spapr, PowerPCCPU *cpu,
 target_ulong lpid, ppc_v3_pate_t *entry)
 {
@@ -549,15 +558,6 @@ static void spapr_exit_nested_hv(PowerPCCPU *cpu, int excp)
 address_space_unmap(CPU(cpu)->as, regs, len, len, true);
 }

-SpaprMachineStateNestedGuest *spapr_get_nested_guest(SpaprMachineState *spapr,
- target_ulong guestid)
-{
-SpaprMachineStateNestedGuest *guest;
-
-guest = g_hash_table_lookup(spapr->nested.guests, GINT_TO_POINTER(guestid));
-return guest;
-}
-
 static bool spapr_nested_vcpu_check(SpaprMachineStateNestedGuest *guest,
 target_ulong vcpuid, bool inoutbuf)
 {
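
For archive readers, here is a minimal standalone sketch of the pattern the incremental diff moves towards: a file-local (static) lookup helper defined above its only caller, so that no prototype has to be exported through the header. All names and types are simplified, hypothetical stand-ins; a small array replaces the GHashTable, and reading the second word from parttbl[1] is an assumption because the quoted patch is cut off at that point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the QEMU types. */
struct nested_guest {
    uint64_t id;          /* 0 marks an unused slot in this sketch */
    uint64_t parttbl[2];
};

struct pate {
    uint64_t dw0;
    uint64_t dw1;
};

#define MAX_GUESTS 4
static struct nested_guest guests[MAX_GUESTS];

/* File-local helper: defined above its caller, so no prototype is needed. */
static struct nested_guest *get_nested_guest(uint64_t guestid)
{
    for (size_t i = 0; i < MAX_GUESTS; i++) {
        if (guests[i].id == guestid) {
            return &guests[i];
        }
    }
    return NULL;
}

/* Exported entry point: fills 'entry' from the guest's partition table. */
bool get_pate_nested(uint64_t lpid, struct pate *entry)
{
    struct nested_guest *guest;

    assert(lpid != 0);
    guest = get_nested_guest(lpid);
    if (!guest) {
        return false;
    }
    entry->dw0 = guest->parttbl[0];
    entry->dw1 = guest->parttbl[1];   /* assumed, see the note above */
    return true;
}
```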




  #endif /* HW_SPAPR_NESTED_H */
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index a08ffe55b6..54fc01e462 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1376,11 +1376,13 @@ static bool spapr_get_pate(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu,
  entry->dw1 = spapr->patb_entry;
  return true;
  } else {
-assert(spapr_nested_api(spapr));
  if (spapr_nested_api(spapr) == NESTED_API_KVM_HV) {
  return spapr_get_pate_nested_hv(spapr, cpu, lpid, entry);
+} else if (spapr_nested_api(spapr) == NESTED_API_PAPR) {
+return spapr_get_pate_nested_papr(spapr, cpu, lpid, entry);
+} else {
+g_assert_not_reached();
  }
-return false;
  }
  }
  
diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c

index ca99805ce8..e0b234c786 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -58,6 +58,21 @@ bool spapr_get_pate_nested_hv(SpaprMachineState *spapr, PowerPCCPU *cpu,
  return true;
  }
  
+bool spapr_get_pate_nested_papr(SpaprMachineState *spapr, PowerPCCPU *cpu,
+target_ulong lpid, ppc_v3_pate_t *entry)
+{
+SpaprMachineStateNestedGuest *guest;
+assert(lpid != 0);
+guest = spapr_get_nested_guest(spapr, lpid);
+if (!guest) {
+return false;
+}
+
+entry->dw0 = guest->parttbl[0];
+entry->dw1 = guest->p

Re: [PATCH 09/13] target/ppc: Prevent supervisor from modifying MSR[ME]

2024-03-12 Thread Harsh Prateek Bora





On 3/12/24 00:21, Nicholas Piggin wrote:

Prevent guest state modifying the MSR[ME] bit. Per ISA:

   An attempt to modify MSRME in privileged but non-hypervisor state is


s/MSRME/MSR[ME] ?


   ignored (i.e., the bit is not changed).

Signed-off-by: Nicholas Piggin 
---
  target/ppc/helper_regs.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/target/ppc/helper_regs.c b/target/ppc/helper_regs.c
index 410b39c231..25258986e3 100644
--- a/target/ppc/helper_regs.c
+++ b/target/ppc/helper_regs.c
@@ -264,6 +264,11 @@ int hreg_store_msr(CPUPPCState *env, target_ulong value, int alter_hv)
  value &= ~MSR_HVB;
  value |= env->msr & MSR_HVB;
  }
+/* Attempt to modify MSR[ME] in guest state is ignored */
+if (is_book3s_arch2x(env) && !(env->msr & MSR_HVB)) {
+value &= ~(1 << MSR_ME);
+value |= env->msr & (1 << MSR_ME);
+}


Reviewed-by: Harsh Prateek Bora 


  if ((value ^ env->msr) & (R_MSR_IR_MASK | R_MSR_DR_MASK)) {
  cpu_interrupt_exittb(cs);
  }
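
For archive readers, the "clear the bit in the new value, then copy it from the current value" idiom used in the hunk above can be sketched on its own. This is a standalone illustration under stated assumptions: the DEMO_MSR_* bit positions, keep_bits(), and store_msr() are hypothetical names, and the real hreg_store_msr() also checks is_book3s_arch2x() and handles many other bits.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions assumed for illustration only. */
#define DEMO_MSR_HV 60
#define DEMO_MSR_ME 12

/* Return 'value' with the bits selected by 'mask' taken from 'cur' instead. */
static uint64_t keep_bits(uint64_t cur, uint64_t value, uint64_t mask)
{
    return (value & ~mask) | (cur & mask);
}

/* MSR that results from storing 'value' while the current MSR is 'cur'. */
static uint64_t store_msr(uint64_t cur, uint64_t value)
{
    const uint64_t hv = 1ULL << DEMO_MSR_HV;
    const uint64_t me = 1ULL << DEMO_MSR_ME;

    if (!(cur & hv)) {
        /* Not in hypervisor state: the attempted change to ME is ignored. */
        value = keep_bits(cur, value, me);
    }
    return value;
}
```

With ME set and HV clear, storing 0 leaves ME set; with HV set, the same store clears it.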

Re: [PATCH 10/13] spapr: set MSR[ME] and MSR[FP] on client entry

2024-03-12 Thread Harsh Prateek Bora





On 3/12/24 00:21, Nicholas Piggin wrote:

The initial MSR state for PAPR specifies MSR[ME] and MSR[FP] are set.

Signed-off-by: Nicholas Piggin 


It would be good to cite the PAPR section numbers that specify this.
Anyway,

Reviewed-by: Harsh Prateek Bora 


---
  hw/ppc/spapr_cpu_core.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index 50523ead25..f3b01b0801 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -42,6 +42,8 @@ static void spapr_reset_vcpu(PowerPCCPU *cpu)
   * as 32bit (MSR_SF=0) in "8.2.1. Initial Register Values".
   */
  env->msr &= ~(1ULL << MSR_SF);
+env->msr |= (1ULL << MSR_ME) | (1ULL << MSR_FP);
+
  env->spr[SPR_HIOR] = 0;
  
  lpcr = env->spr[SPR_LPCR];
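
For archive readers, a small standalone sketch of the entry-state computation in the hunk above: clear SF, then set ME and FP. The DEMO_MSR_* bit positions and client_entry_msr() are assumptions made for illustration. The sketch also shows why the 1ULL constant matters: bit 63 does not fit in a plain int shift.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions assumed for illustration only. */
#define DEMO_MSR_SF 63
#define DEMO_MSR_FP 13
#define DEMO_MSR_ME 12

/* Apply the PAPR client entry adjustments to an MSR value. */
static uint64_t client_entry_msr(uint64_t msr)
{
    msr &= ~(1ULL << DEMO_MSR_SF);                         /* start in 32-bit mode */
    msr |= (1ULL << DEMO_MSR_ME) | (1ULL << DEMO_MSR_FP);  /* ME and FP are set */
    return msr;
}
```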

Re: [PATCH 06/13] ppc/spapr: Add pa-features for POWER10 machines

2024-03-12 Thread Harsh Prateek Bora





On 3/12/24 00:21, Nicholas Piggin wrote:

From: Benjamin Gray 

Add POWER10 pa-features entry.

Notably DEXCR and and [P]HASHST/[P]HASHCHK instruction support is


s/and and/and


advertised. Each DEXCR aspect is allocated a bit in the device tree,
using the 68--71 byte range (inclusive). The functionality of the
[P]HASHST/[P]HASHCHK instructions is separately declared in byte 72,
bit 0 (BE).

Signed-off-by: Benjamin Gray 
[npiggin: reword title and changelog, adjust a few bits]
Signed-off-by: Nicholas Piggin 
---
  hw/ppc/spapr.c | 34 ++
  1 file changed, 34 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 247f920f07..128bfe11a8 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -265,6 +265,36 @@ static void spapr_dt_pa_features(SpaprMachineState *spapr,
  /* 60: NM atomic, 62: RNG */
  0x80, 0x00, 0x80, 0x00, 0x00, 0x00, /* 60 - 65 */
  };
+/* 3.1 removes SAO, HTM support */
+uint8_t pa_features_31[] = { 74, 0,
+/* 0: MMU|FPU|SLB|RUN|DABR|NX, 1: fri[nzpm]|DABRX|SPRG3|SLB0|PP110 */
+/* 2: VPM|DS205|PPR|DS202|DS206, 3: LSD|URG, 5: LE|CFAR|EB|LSQ */
+0xf6, 0x1f, 0xc7, 0xc0, 0x00, 0xf0, /* 0 - 5 */
+/* 6: DS207 */
+0x80, 0x00, 0x00, 0x00, 0x00, 0x00, /* 6 - 11 */
+/* 16: Vector */
+0x00, 0x00, 0x00, 0x00, 0x80, 0x00, /* 12 - 17 */
+/* 18: Vec. Scalar, 20: Vec. XOR */
+0x80, 0x00, 0x80, 0x00, 0x00, 0x00, /* 18 - 23 */
+/* 24: Ext. Dec, 26: 64 bit ftrs, 28: PM ftrs */
+0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 24 - 29 */
+/* 32: LE atomic, 34: EBB + ext EBB */
+0x00, 0x00, 0x80, 0x00, 0xC0, 0x00, /* 30 - 35 */
+/* 40: Radix MMU */
+0x00, 0x00, 0x00, 0x00, 0x80, 0x00, /* 36 - 41 */
+/* 42: PM, 44: PC RA, 46: SC vec'd */
+0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 42 - 47 */
+/* 48: SIMD, 50: QP BFP, 52: String */
+0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 48 - 53 */
+/* 54: DecFP, 56: DecI, 58: SHA */
+0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 54 - 59 */
+/* 60: NM atomic, 62: RNG */
+0x80, 0x00, 0x80, 0x00, 0x00, 0x00, /* 60 - 65 */
+/* 68: DEXCR[SBHE|IBRTPDUS|SRAPD|NPHIE|PHIE] */
+0x00, 0x00, 0xce, 0x00, 0x00, 0x00, /* 66 - 71 */
+/* 72: [P]HASHCHK */


Do we want to mention [P]HASHST as well in the comment above?


+0x80, 0x00, /* 72 - 73 */
+};
  uint8_t *pa_features = NULL;
  size_t pa_size;
  


In the future, we may want helpers that return a pointer to the
pa_features array and its size, selected by the required ISA level,
instead of having local arrays bloat this routine.

For now, with cosmetic fixes,

Reviewed-by: Harsh Prateek Bora 


@@ -280,6 +310,10 @@ static void spapr_dt_pa_features(SpaprMachineState *spapr,
  pa_features = pa_features_300;
  pa_size = sizeof(pa_features_300);
  }
+if (ppc_check_compat(cpu, CPU_POWERPC_LOGICAL_3_10, 0, cpu->compat_pvr)) {
+pa_features = pa_features_31;
+pa_size = sizeof(pa_features_31);
+}
  if (!pa_features) {
  return;
  }
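
For archive readers, the helper idea raised in the review could look roughly like the sketch below. It is only one possible shape and not a proposed patch: enum demo_isa and pa_features_for() are hypothetical names, and only the 2.06 and 2.07 tables (as quoted elsewhere in this thread after the SAO removal) are included to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical selector; the real code would key off compat checks. */
enum demo_isa { DEMO_ISA_206, DEMO_ISA_207 };

/* Each table starts with a two-byte header: attribute length, then 0. */
static const uint8_t pa_features_206[] = { 6, 0,
    0xf6, 0x1f, 0xc7, 0x00, 0x00, 0xc0 };
static const uint8_t pa_features_207[] = { 24, 0,
    0xf6, 0x1f, 0xc7, 0xc0, 0x00, 0xf0,
    0x80, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x80, 0x00,
    0x80, 0x00, 0x80, 0x00, 0x00, 0x00 };

/* Return the table for 'isa' and store its size, or NULL if unknown. */
static const uint8_t *pa_features_for(enum demo_isa isa, size_t *size)
{
    switch (isa) {
    case DEMO_ISA_206:
        *size = sizeof(pa_features_206);
        return pa_features_206;
    case DEMO_ISA_207:
        *size = sizeof(pa_features_207);
        return pa_features_207;
    }
    *size = 0;
    return NULL;
}
```

Keeping the tables at file scope and const would also let the caller stay free of large local arrays, at the cost of copying a table before patching any bit at runtime.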

Re: [PATCH 05/13] ppc/spapr: Adjust ibm,pa-features for POWER9

2024-03-12 Thread Harsh Prateek Bora





On 3/12/24 00:21, Nicholas Piggin wrote:

"MMR" and "SPR SO" are not implemented in POWER9, so clear those bits.
HTM is not set by default, and only later if the cap is set, so remove
the comment that suggests otherwise.

Signed-off-by: Nicholas Piggin 


Reviewed-by: Harsh Prateek Bora 


---
  hw/ppc/spapr.c | 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 7d7da30f60..247f920f07 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -248,14 +248,14 @@ static void spapr_dt_pa_features(SpaprMachineState *spapr,
  0x80, 0x00, 0x00, 0x00, 0x00, 0x00, /* 6 - 11 */
  /* 16: Vector */
  0x00, 0x00, 0x00, 0x00, 0x80, 0x00, /* 12 - 17 */
-/* 18: Vec. Scalar, 20: Vec. XOR, 22: HTM */
+/* 18: Vec. Scalar, 20: Vec. XOR */
  0x80, 0x00, 0x80, 0x00, 0x00, 0x00, /* 18 - 23 */
  /* 24: Ext. Dec, 26: 64 bit ftrs, 28: PM ftrs */
  0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 24 - 29 */
-/* 30: MMR, 32: LE atomic, 34: EBB + ext EBB */
-0x80, 0x00, 0x80, 0x00, 0xC0, 0x00, /* 30 - 35 */
-/* 36: SPR SO, 40: Radix MMU */
-0x80, 0x00, 0x00, 0x00, 0x80, 0x00, /* 36 - 41 */
+/* 32: LE atomic, 34: EBB + ext EBB */
+0x00, 0x00, 0x80, 0x00, 0xC0, 0x00, /* 30 - 35 */
+/* 40: Radix MMU */
+0x00, 0x00, 0x00, 0x00, 0x80, 0x00, /* 36 - 41 */
  /* 42: PM, 44: PC RA, 46: SC vec'd */
  0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 42 - 47 */
  /* 48: SIMD, 50: QP BFP, 52: String */
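
For archive readers, the table comments use big-endian bit numbering, where bit 0 of a byte is its most significant bit (0x80). A minimal sketch of testing and clearing such a bit follows; be_bit(), pa_test(), and pa_clear() are hypothetical helpers and not part of QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Big-endian bit numbering: bit 0 is the most significant bit (0x80). */
static uint8_t be_bit(unsigned bit)
{
    assert(bit < 8);
    return (uint8_t)(0x80u >> bit);
}

/* Is attribute bit 'bit' of byte 'byte' set? */
static bool pa_test(const uint8_t *attrs, unsigned byte, unsigned bit)
{
    return (attrs[byte] & be_bit(bit)) != 0;
}

/* Clear attribute bit 'bit' of byte 'byte'. */
static void pa_clear(uint8_t *attrs, unsigned byte, unsigned bit)
{
    attrs[byte] &= (uint8_t)~be_bit(bit);
}
```

Applied to the old bytes 30 to 35 (0x80, 0x00, 0x80, 0x00, 0xC0, 0x00), clearing bit 0 of the first byte gives the new row from the patch, with the LE atomic and EBB bits left untouched.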

Re: [PATCH 01/13] ppc: Drop support for POWER9 and POWER10 DD1 chips

2024-03-12 Thread Harsh Prateek Bora





On 3/12/24 14:29, Nicholas Piggin wrote:

On Tue Mar 12, 2024 at 2:55 PM AEST, Harsh Prateek Bora wrote:



On 3/12/24 10:20, Harsh Prateek Bora wrote:



On 3/12/24 00:21, Nicholas Piggin wrote:

The POWER9 DD1 and POWER10 DD1 chips are not public and are no longer of
any use in QEMU. Remove them.

Signed-off-by: Nicholas Piggin 
---
   hw/ppc/spapr_cpu_core.c |  2 --
   target/ppc/cpu-models.c |  4 
   target/ppc/cpu_init.c   |  7 ++-
   target/ppc/kvm.c    | 11 ---
   4 files changed, 2 insertions(+), 22 deletions(-)


Do we want to squash in removal of the macro as well?




Actually both, correcting diff:

diff --git a/target/ppc/cpu-models.h b/target/ppc/cpu-models.h
index 0229ef3a9a..7d89b41214 100644
--- a/target/ppc/cpu-models.h
+++ b/target/ppc/cpu-models.h
@@ -348,11 +348,9 @@ enum {
   CPU_POWERPC_POWER8NVL_BASE = 0x004C,
   CPU_POWERPC_POWER8NVL_v10  = 0x004C0100,
   CPU_POWERPC_POWER9_BASE= 0x004E,
-CPU_POWERPC_POWER9_DD1 = 0x004E1100,
   CPU_POWERPC_POWER9_DD20= 0x004E1200,
   CPU_POWERPC_POWER9_DD22= 0x004E1202,
   CPU_POWERPC_POWER10_BASE   = 0x0080,
-CPU_POWERPC_POWER10_DD1= 0x00801100,
   CPU_POWERPC_POWER10_DD20   = 0x00801200,
   CPU_POWERPC_970_v22= 0x00390202,
   CPU_POWERPC_970FX_v10  = 0x00391100,


That would make sense, but we do seem to use this list as somewhat of a
reference or at least historic graveyard too (note all the other CPUs we
no longer support). So I was going to just leave them there.


Oh ok, in that case, it's fine.

regards,
Harsh


Thanks,
Nick
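
For archive readers, the DD level can be read straight out of the PVR values listed in the enum above. The PVR-matching code in this series tests (pvr & 0x0f00) for the major DD level and (pvr & 0xf) for the minor one; the sketch below only wraps those two masks, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Major DD level: the nibble selected by the 0x0f00 mask. */
static unsigned pvr_dd_major(uint32_t pvr)
{
    return (pvr & 0x0f00) >> 8;
}

/* Minor DD level: the low nibble. */
static unsigned pvr_dd_minor(uint32_t pvr)
{
    return pvr & 0xf;
}
```

For example, 0x004E1202 (CPU_POWERPC_POWER9_DD22) decodes as major 2, minor 2, while the removed 0x004E1100 decodes as major 1.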

Re: [PATCH] spapr: avoid overhead of finding vhyp class in critical operations

2024-03-12 Thread Harsh Prateek Bora





On 3/12/24 14:18, Nicholas Piggin wrote:

On Tue Mar 12, 2024 at 4:38 PM AEST, Harsh Prateek Bora wrote:

Hi Nick,

One minor comment below:

On 2/24/24 13:03, Nicholas Piggin wrote:

PPC_VIRTUAL_HYPERVISOR_GET_CLASS is used in critical operations like
interrupts and TLB misses and is quite costly. Running the
kvm-unit-tests sieve program with radix MMU enabled thrashes the TCG
TLB and spends a lot of time in TLB and page table walking code. The
test takes 67 seconds to complete with a lot of time being spent in
code related to finding the vhyp class:

 12.01%  [.] g_str_hash
  8.94%  [.] g_hash_table_lookup
  8.06%  [.] object_class_dynamic_cast
  6.21%  [.] address_space_ldq
  4.94%  [.] __strcmp_avx2
  4.28%  [.] tlb_set_page_full
  4.08%  [.] address_space_translate_internal
  3.17%  [.] object_class_dynamic_cast_assert
  2.84%  [.] ppc_radix64_xlate

Keep a pointer to the class and avoid this lookup. This reduces the
execution time to 40 seconds.

Signed-off-by: Nicholas Piggin 
---
This feels a bit ugly, but the performance problem of looking up the
class in fast paths can't be ignored. Is there a "nicer" way to get the
same result?

Thanks,
Nick

   target/ppc/cpu.h   |  3 ++-
   target/ppc/mmu-book3s-v3.h |  4 +---
   hw/ppc/pegasos2.c  |  1 +
   target/ppc/cpu_init.c  |  9 +++--
   target/ppc/excp_helper.c   | 16 
   target/ppc/kvm.c   |  4 +---
   target/ppc/mmu-hash64.c| 16 
   target/ppc/mmu-radix64.c   |  4 +---
   8 files changed, 17 insertions(+), 40 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index ec14574d14..eb85d9aa71 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1437,6 +1437,7 @@ struct ArchCPU {
   int vcpu_id;
   uint32_t compat_pvr;
   PPCVirtualHypervisor *vhyp;
+PPCVirtualHypervisorClass *vhyp_class;
   void *machine_data;
   int32_t node_id; /* NUMA node this CPU belongs to */
   PPCHash64Options *hash64_opts;
@@ -1535,7 +1536,7 @@ DECLARE_OBJ_CHECKERS(PPCVirtualHypervisor, PPCVirtualHypervisorClass,
   
   static inline bool vhyp_cpu_in_nested(PowerPCCPU *cpu)

   {
-return PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp)->cpu_in_nested(cpu);
+return cpu->vhyp_class->cpu_in_nested(cpu);
   }
   #endif /* CONFIG_USER_ONLY */
   
diff --git a/target/ppc/mmu-book3s-v3.h b/target/ppc/mmu-book3s-v3.h

index 674377a19e..f3f7993958 100644
--- a/target/ppc/mmu-book3s-v3.h
+++ b/target/ppc/mmu-book3s-v3.h
@@ -108,9 +108,7 @@ static inline hwaddr ppc_hash64_hpt_mask(PowerPCCPU *cpu)
   uint64_t base;
   
   if (cpu->vhyp) {


All the checks for cpu->vhyp need to be changed to check for
cpu->vhyp_class now, in all such instances.


It wasn't supposed to, because vhyp != NULL implies vhyp_class != NULL.
It's supposed to be an equivalent transformation just changing the
lookup function.


I agree. It just looks a bit odd, and my only worry is that if a future
change causes vhyp_class to be NULL before control reaches here, this
check won't really serve its purpose. Anyway, it's not a mandatory
requirement for now, so I shall leave it to your choice.

regards,
Harsh



Okay to leave it as is?

Thanks,
Nick



With that,

Reviewed-by: Harsh Prateek Bora 
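
For archive readers, the optimisation under discussion can be sketched without QOM: resolve the class once when the hypervisor is attached, keep the pointer, and let fast paths use the cached pointer. Everything below is a hypothetical stand-in (a string-keyed table replaces the type lookup), and the NULL check on the cached pointer reflects the reviewer's suggestion rather than what the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct hyp_class {
    const char *type_name;
    bool (*cpu_in_nested)(void);
};

static bool never_nested(void)
{
    return false;
}

static const struct hyp_class class_table[] = {
    { "demo-machine-a", never_nested },
    { "demo-machine-b", never_nested },
};

static unsigned slow_lookups;   /* counts string-compare lookups */

/* Slow path: find a class by name, standing in for the dynamic lookup. */
static const struct hyp_class *lookup_class(const char *type_name)
{
    slow_lookups++;
    for (size_t i = 0; i < sizeof(class_table) / sizeof(class_table[0]); i++) {
        if (strcmp(class_table[i].type_name, type_name) == 0) {
            return &class_table[i];
        }
    }
    return NULL;
}

struct cpu {
    const char *vhyp;                    /* attached hypervisor, or NULL */
    const struct hyp_class *vhyp_class;  /* cached at attach time */
};

/* Attach a hypervisor and cache its class in one place. */
static void cpu_set_vhyp(struct cpu *cpu, const char *vhyp)
{
    cpu->vhyp = vhyp;
    cpu->vhyp_class = vhyp ? lookup_class(vhyp) : NULL;
}

/* Fast path: no lookup, only a pointer test and an indirect call. */
static bool cpu_in_nested(const struct cpu *cpu)
{
    return cpu->vhyp_class && cpu->vhyp_class->cpu_in_nested();
}
```

Setting both fields in a single function is what keeps "vhyp != NULL implies vhyp_class != NULL" true in this sketch; a second assignment site would need the same care.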



-PPCVirtualHypervisorClass *vhc =
-PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
-return vhc->hpt_mask(cpu->vhyp);
+return cpu->vhyp_class->hpt_mask(cpu->vhyp);
   }
   if (cpu->env.mmu_model == POWERPC_MMU_3_00) {
   ppc_v3_pate_t pate;
diff --git a/hw/ppc/pegasos2.c b/hw/ppc/pegasos2.c
index 04d6decb2b..c22e8b336d 100644
--- a/hw/ppc/pegasos2.c
+++ b/hw/ppc/pegasos2.c
@@ -400,6 +400,7 @@ static void pegasos2_machine_reset(MachineState *machine, ShutdownCause reason)
   machine->fdt = fdt;
   
   pm->cpu->vhyp = PPC_VIRTUAL_HYPERVISOR(machine);

+pm->cpu->vhyp_class = PPC_VIRTUAL_HYPERVISOR_GET_CLASS(pm->cpu->vhyp);
   }
   
   enum pegasos2_rtas_tokens {

diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 9bccddb350..63d0094024 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -6631,6 +6631,7 @@ void cpu_ppc_set_vhyp(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
   CPUPPCState *env = &cpu->env;
   
   cpu->vhyp = vhyp;

+cpu->vhyp_class = PPC_VIRTUAL_HYPERVISOR_GET_CLASS(vhyp);
   
   /*

* With a virtual hypervisor mode we never allow the CPU to go
@@ -7224,9 +7225,7 @@ static void ppc_cpu_exec_enter(CPUState *cs)
   PowerPCCPU *cpu = POWERPC_CPU(cs);
   
   if (cpu->vhyp) {

-PPCVirtualHypervisorClass *vhc =
-PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
-vhc->cpu_exec_enter(cpu->vhyp, cpu);
+cpu->vhyp_class->cpu_exec_enter(cpu->vhyp, cpu);
   }
   }
   
@@ -7235,9 +7234,7 @@ static v

Re: [PATCH 04/13] ppc/spapr: Remove copy-paste from pa-features

2024-03-12 Thread Harsh Prateek Bora


Hi Nick,

On 3/12/24 00:21, Nicholas Piggin wrote:

TCG does not support copy/paste instructions. Remove it from
ibm,pa-features. This has never been implemented under TCG or


s/or/nor ?


practically usable under KVM, so it won't be missed.

Signed-off-by: Nicholas Piggin 
---
  hw/ppc/spapr.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 5099f12cc6..7d7da30f60 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -254,8 +254,8 @@ static void spapr_dt_pa_features(SpaprMachineState *spapr,
  0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 24 - 29 */
  /* 30: MMR, 32: LE atomic, 34: EBB + ext EBB */
  0x80, 0x00, 0x80, 0x00, 0xC0, 0x00, /* 30 - 35 */
-/* 36: SPR SO, 38: Copy/Paste, 40: Radix MMU */
-0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 36 - 41 */
+/* 36: SPR SO, 40: Radix MMU */
+0x80, 0x00, 0x00, 0x00, 0x80, 0x00, /* 36 - 41 */
  /* 42: PM, 44: PC RA, 46: SC vec'd */
  0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 42 - 47 */
  /* 48: SIMD, 50: QP BFP, 52: String */
@@ -288,6 +288,10 @@ static void spapr_dt_pa_features(SpaprMachineState *spapr,
   * SSO (SAO) ordering is supported on KVM and thread=single hosts,
   * but not MTTCG, so disable it. To advertise it, a cap would have
   * to be added, or support implemented for MTTCG.
+ *
+ * Copy/paste is not supported by TCG, so it is not advertised. KVM
+ * can execute them but it has no accelerator drivers which are usable,
+ * so there isn't much need for it anyway.
   */


If you are doing a re-spin, you may consider the comments on the previous
patch, which apply above as well. Either way, with the previous typo fixed:

Reviewed-by: Harsh Prateek Bora 

  
  if (ppc_hash64_has(cpu, PPC_HASH64_CI_LARGEPAGE)) {

Re: [PATCH 03/13] ppc/spapr|pnv: Remove SAO from pa-features

2024-03-12 Thread Harsh Prateek Bora


Hi Nick,

One cosmetic comment, in case you are doing a re-spin:

On 3/12/24 00:21, Nicholas Piggin wrote:

SAO is a page table attribute that strengthens the memory ordering of
accesses. QEMU with MTTCG does not implement this, so clear it in
ibm,pa-features. This is an obscure feature that has been removed from
POWER10 ISA v3.1, there isn't much concern with removing it.

Signed-off-by: Nicholas Piggin 
---
  hw/ppc/pnv.c   |  2 +-
  hw/ppc/spapr.c | 14 ++
  2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 0b47b92baa..aa9786e970 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -150,7 +150,7 @@ static void pnv_dt_core(PnvChip *chip, PnvCore *pc, void *fdt)
  uint32_t page_sizes_prop[64];
  size_t page_sizes_prop_size;
  const uint8_t pa_features[] = { 24, 0,
-0xf6, 0x3f, 0xc7, 0xc0, 0x80, 0xf0,
+0xf6, 0x3f, 0xc7, 0xc0, 0x00, 0xf0,
  0x80, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x00, 0x00, 0x00, 0x00, 0x80, 0x00,
  0x80, 0x00, 0x80, 0x00, 0x80, 0x00 };
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 55263f0815..5099f12cc6 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -234,16 +234,16 @@ static void spapr_dt_pa_features(SpaprMachineState *spapr,
   void *fdt, int offset)
  {
  uint8_t pa_features_206[] = { 6, 0,
-0xf6, 0x1f, 0xc7, 0x00, 0x80, 0xc0 };
+0xf6, 0x1f, 0xc7, 0x00, 0x00, 0xc0 };
  uint8_t pa_features_207[] = { 24, 0,
-0xf6, 0x1f, 0xc7, 0xc0, 0x80, 0xf0,
+0xf6, 0x1f, 0xc7, 0xc0, 0x00, 0xf0,
  0x80, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x00, 0x00, 0x00, 0x00, 0x80, 0x00,
  0x80, 0x00, 0x80, 0x00, 0x00, 0x00 };
  uint8_t pa_features_300[] = { 66, 0,
  /* 0: MMU|FPU|SLB|RUN|DABR|NX, 1: fri[nzpm]|DABRX|SPRG3|SLB0|PP110 */
-/* 2: VPM|DS205|PPR|DS202|DS206, 3: LSD|URG, SSO, 5: LE|CFAR|EB|LSQ */
-0xf6, 0x1f, 0xc7, 0xc0, 0x80, 0xf0, /* 0 - 5 */
+/* 2: VPM|DS205|PPR|DS202|DS206, 3: LSD|URG, 5: LE|CFAR|EB|LSQ */


Do we want to mention SSO (disabled) in the comments? Also ..


+0xf6, 0x1f, 0xc7, 0xc0, 0x00, 0xf0, /* 0 - 5 */
  /* 6: DS207 */
  0x80, 0x00, 0x00, 0x00, 0x00, 0x00, /* 6 - 11 */
  /* 16: Vector */
@@ -284,6 +284,12 @@ static void spapr_dt_pa_features(SpaprMachineState *spapr,
  return;
  }
  
+/*

+ * SSO (SAO) ordering is supported on KVM and thread=single hosts,
+ * but not MTTCG, so disable it. To advertise it, a cap would have
+ * to be added, or support implemented for MTTCG.
+ */
+


This comment could go at the beginning, where we actually disable it.

Otherwise,

Reviewed-by: Harsh Prateek Bora 



  if (ppc_hash64_has(cpu, PPC_HASH64_CI_LARGEPAGE)) {
  /*
   * Note: we keep CI large pages off by default because a 64K capable

Re: [PATCH 02/13] target/ppc: POWER10 does not have transactional memory

2024-03-12 Thread Harsh Prateek Bora


Hi Nick,

One query/comment below:

On 3/12/24 00:21, Nicholas Piggin wrote:

POWER10 hardware implements a degenerate transactional memory facility
in POWER8/9 PCR compatibility modes to permit migration from older
CPUs, but POWER10 / ISA v3.1 mode does not support it so the CPU model
should not support it.

Signed-off-by: Nicholas Piggin 
---
  target/ppc/cpu_init.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 572cbdf25f..d7e84a2f40 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -6573,7 +6573,7 @@ POWERPC_FAMILY(POWER10)(ObjectClass *oc, void *data)
  PPC2_FP_TST_ISA206 | PPC2_BCTAR_ISA207 |
  PPC2_LSQ_ISA207 | PPC2_ALTIVEC_207 |
  PPC2_ISA205 | PPC2_ISA207S | PPC2_FP_CVT_S64 |
-PPC2_TM | PPC2_ISA300 | PPC2_PRCNTL | PPC2_ISA310 |
+PPC2_ISA300 | PPC2_PRCNTL | PPC2_ISA310 |
  PPC2_MEM_LWSYNC | PPC2_BCDA_ISA206;
  pcc->msr_mask = (1ull << MSR_SF) |
  (1ull << MSR_HV) |
@@ -6617,7 +6617,7 @@ POWERPC_FAMILY(POWER10)(ObjectClass *oc, void *data)
  pcc->flags = POWERPC_FLAG_VRE | POWERPC_FLAG_SE |
   POWERPC_FLAG_BE | POWERPC_FLAG_PMM |
   POWERPC_FLAG_BUS_CLK | POWERPC_FLAG_CFAR |
- POWERPC_FLAG_VSX | POWERPC_FLAG_TM | POWERPC_FLAG_SCV;
+ POWERPC_FLAG_VSX | POWERPC_FLAG_SCV;
  pcc->l1_dcache_size = 0x8000;
  pcc->l1_icache_size = 0x8000;
  }


Shouldn't we also include the change below with this?

diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index aac095e5fd..faefc0420e 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -6641,7 +6641,6 @@ POWERPC_FAMILY(POWER10)(ObjectClass *oc, void *data)
 PPC2_MEM_LWSYNC | PPC2_BCDA_ISA206 | PPC2_ATTN;
 pcc->msr_mask = (1ull << MSR_SF) |
 (1ull << MSR_HV) |
-(1ull << MSR_TM) |
 (1ull << MSR_VR) |
 (1ull << MSR_VSX) |
     (1ull << MSR_EE) |

Otherwise,
Reviewed-by: Harsh Prateek Bora

Re: [PATCH] spapr: avoid overhead of finding vhyp class in critical operations

2024-03-12 Thread Harsh Prateek Bora


Hi Nick,

One minor comment below:

On 2/24/24 13:03, Nicholas Piggin wrote:

PPC_VIRTUAL_HYPERVISOR_GET_CLASS is used in critical operations like
interrupts and TLB misses and is quite costly. Running the
kvm-unit-tests sieve program with radix MMU enabled thrashes the TCG
TLB and spends a lot of time in TLB and page table walking code. The
test takes 67 seconds to complete with a lot of time being spent in
code related to finding the vhyp class:

12.01%  [.] g_str_hash
 8.94%  [.] g_hash_table_lookup
 8.06%  [.] object_class_dynamic_cast
 6.21%  [.] address_space_ldq
 4.94%  [.] __strcmp_avx2
 4.28%  [.] tlb_set_page_full
 4.08%  [.] address_space_translate_internal
 3.17%  [.] object_class_dynamic_cast_assert
 2.84%  [.] ppc_radix64_xlate

Keep a pointer to the class and avoid this lookup. This reduces the
execution time to 40 seconds.

Signed-off-by: Nicholas Piggin 
---
This feels a bit ugly, but the performance problem of looking up the
class in fast paths can't be ignored. Is there a "nicer" way to get the
same result?

Thanks,
Nick

  target/ppc/cpu.h   |  3 ++-
  target/ppc/mmu-book3s-v3.h |  4 +---
  hw/ppc/pegasos2.c  |  1 +
  target/ppc/cpu_init.c  |  9 +++--
  target/ppc/excp_helper.c   | 16 
  target/ppc/kvm.c   |  4 +---
  target/ppc/mmu-hash64.c| 16 
  target/ppc/mmu-radix64.c   |  4 +---
  8 files changed, 17 insertions(+), 40 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index ec14574d14..eb85d9aa71 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1437,6 +1437,7 @@ struct ArchCPU {
  int vcpu_id;
  uint32_t compat_pvr;
  PPCVirtualHypervisor *vhyp;
+PPCVirtualHypervisorClass *vhyp_class;
  void *machine_data;
  int32_t node_id; /* NUMA node this CPU belongs to */
  PPCHash64Options *hash64_opts;
@@ -1535,7 +1536,7 @@ DECLARE_OBJ_CHECKERS(PPCVirtualHypervisor, PPCVirtualHypervisorClass,
  
  static inline bool vhyp_cpu_in_nested(PowerPCCPU *cpu)

  {
-return PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp)->cpu_in_nested(cpu);
+return cpu->vhyp_class->cpu_in_nested(cpu);
  }
  #endif /* CONFIG_USER_ONLY */
  
diff --git a/target/ppc/mmu-book3s-v3.h b/target/ppc/mmu-book3s-v3.h

index 674377a19e..f3f7993958 100644
--- a/target/ppc/mmu-book3s-v3.h
+++ b/target/ppc/mmu-book3s-v3.h
@@ -108,9 +108,7 @@ static inline hwaddr ppc_hash64_hpt_mask(PowerPCCPU *cpu)
  uint64_t base;
  
  if (cpu->vhyp) {


All the checks for cpu->vhyp need to be changed to check for
cpu->vhyp_class now, in all such instances.


With that,

Reviewed-by: Harsh Prateek Bora 



-PPCVirtualHypervisorClass *vhc =
-PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
-return vhc->hpt_mask(cpu->vhyp);
+return cpu->vhyp_class->hpt_mask(cpu->vhyp);
  }
  if (cpu->env.mmu_model == POWERPC_MMU_3_00) {
  ppc_v3_pate_t pate;
diff --git a/hw/ppc/pegasos2.c b/hw/ppc/pegasos2.c
index 04d6decb2b..c22e8b336d 100644
--- a/hw/ppc/pegasos2.c
+++ b/hw/ppc/pegasos2.c
@@ -400,6 +400,7 @@ static void pegasos2_machine_reset(MachineState *machine, ShutdownCause reason)
  machine->fdt = fdt;
  
  pm->cpu->vhyp = PPC_VIRTUAL_HYPERVISOR(machine);

+pm->cpu->vhyp_class = PPC_VIRTUAL_HYPERVISOR_GET_CLASS(pm->cpu->vhyp);
  }
  
  enum pegasos2_rtas_tokens {

diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 9bccddb350..63d0094024 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -6631,6 +6631,7 @@ void cpu_ppc_set_vhyp(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
  CPUPPCState *env = &cpu->env;
  
  cpu->vhyp = vhyp;

+cpu->vhyp_class = PPC_VIRTUAL_HYPERVISOR_GET_CLASS(vhyp);
  
  /*

   * With a virtual hypervisor mode we never allow the CPU to go
@@ -7224,9 +7225,7 @@ static void ppc_cpu_exec_enter(CPUState *cs)
  PowerPCCPU *cpu = POWERPC_CPU(cs);
  
  if (cpu->vhyp) {

-PPCVirtualHypervisorClass *vhc =
-PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
-vhc->cpu_exec_enter(cpu->vhyp, cpu);
+cpu->vhyp_class->cpu_exec_enter(cpu->vhyp, cpu);
  }
  }
  
@@ -7235,9 +7234,7 @@ static void ppc_cpu_exec_exit(CPUState *cs)

  PowerPCCPU *cpu = POWERPC_CPU(cs);
  
  if (cpu->vhyp) {

-PPCVirtualHypervisorClass *vhc =
-PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
-vhc->cpu_exec_exit(cpu->vhyp, cpu);
+cpu->vhyp_class->cpu_exec_exit(cpu->vhyp, cpu);
  }
  }
  #endif /* CONFIG_TCG */
diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 98952de267..445350488c 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -840,9 +840,7 @@ static void powerpc_excp_7xx(PowerPCCPU *cpu, int excp)
   * HV mode, we need

Re: [PATCH 01/13] ppc: Drop support for POWER9 and POWER10 DD1 chips

2024-03-11 Thread Harsh Prateek Bora





On 3/12/24 10:20, Harsh Prateek Bora wrote:



On 3/12/24 00:21, Nicholas Piggin wrote:

The POWER9 DD1 and POWER10 DD1 chips are not public and are no longer of
any use in QEMU. Remove them.

Signed-off-by: Nicholas Piggin 
---
  hw/ppc/spapr_cpu_core.c |  2 --
  target/ppc/cpu-models.c |  4 
  target/ppc/cpu_init.c   |  7 ++-
  target/ppc/kvm.c    | 11 ---
  4 files changed, 2 insertions(+), 22 deletions(-)


Do we want to squash in removal of the macro as well?




Actually both, correcting diff:

diff --git a/target/ppc/cpu-models.h b/target/ppc/cpu-models.h
index 0229ef3a9a..7d89b41214 100644
--- a/target/ppc/cpu-models.h
+++ b/target/ppc/cpu-models.h
@@ -348,11 +348,9 @@ enum {
 CPU_POWERPC_POWER8NVL_BASE = 0x004C,
 CPU_POWERPC_POWER8NVL_v10  = 0x004C0100,
 CPU_POWERPC_POWER9_BASE= 0x004E,
-CPU_POWERPC_POWER9_DD1 = 0x004E1100,
 CPU_POWERPC_POWER9_DD20= 0x004E1200,
 CPU_POWERPC_POWER9_DD22= 0x004E1202,
 CPU_POWERPC_POWER10_BASE   = 0x0080,
-CPU_POWERPC_POWER10_DD1= 0x00801100,
 CPU_POWERPC_POWER10_DD20   = 0x00801200,
 CPU_POWERPC_970_v22= 0x00390202,
 CPU_POWERPC_970FX_v10  = 0x00391100,



With that,

Reviewed-by: Harsh Prateek Bora 

regards,
Harsh



diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index 40b7c52f7f..50523ead25 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -394,10 +394,8 @@ static const TypeInfo spapr_cpu_core_type_infos[] 
= {

  DEFINE_SPAPR_CPU_CORE_TYPE("power8_v2.0"),
  DEFINE_SPAPR_CPU_CORE_TYPE("power8e_v2.1"),
  DEFINE_SPAPR_CPU_CORE_TYPE("power8nvl_v1.0"),
-    DEFINE_SPAPR_CPU_CORE_TYPE("power9_v1.0"),
  DEFINE_SPAPR_CPU_CORE_TYPE("power9_v2.0"),
  DEFINE_SPAPR_CPU_CORE_TYPE("power9_v2.2"),
-    DEFINE_SPAPR_CPU_CORE_TYPE("power10_v1.0"),
  DEFINE_SPAPR_CPU_CORE_TYPE("power10_v2.0"),
  #ifdef CONFIG_KVM
  DEFINE_SPAPR_CPU_CORE_TYPE("host"),
diff --git a/target/ppc/cpu-models.c b/target/ppc/cpu-models.c
index 36e465b390..f2301b43f7 100644
--- a/target/ppc/cpu-models.c
+++ b/target/ppc/cpu-models.c
@@ -728,14 +728,10 @@
  "POWER8 v2.0")
  POWERPC_DEF("power8nvl_v1.0", CPU_POWERPC_POWER8NVL_v10, 
POWER8,

  "POWER8NVL v1.0")
-    POWERPC_DEF("power9_v1.0",   CPU_POWERPC_POWER9_DD1, 
POWER9,

-    "POWER9 v1.0")
  POWERPC_DEF("power9_v2.0",   CPU_POWERPC_POWER9_DD20,
POWER9,

  "POWER9 v2.0")
  POWERPC_DEF("power9_v2.2",   CPU_POWERPC_POWER9_DD22,
POWER9,

  "POWER9 v2.2")
-    POWERPC_DEF("power10_v1.0",  CPU_POWERPC_POWER10_DD1,
POWER10,

-    "POWER10 v1.0")
  POWERPC_DEF("power10_v2.0",  CPU_POWERPC_POWER10_DD20,   
POWER10,

  "POWER10 v2.0")
  #endif /* defined (TARGET_PPC64) */
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 1d3d1db7c3..572cbdf25f 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -6350,10 +6350,7 @@ static bool 
ppc_pvr_match_power9(PowerPCCPUClass *pcc, uint32_t pvr, bool best)

  return false;
  }
-    if ((pvr & 0x0f00) == 0x100) {
-    /* DD1.x always matches power9_v1.0 */
-    return true;
-    } else if ((pvr & 0x0f00) == 0x200) {
+    if ((pvr & 0x0f00) == 0x200) {
  if ((pvr & 0xf) < 2) {
  /* DD2.0, DD2.1 match power9_v2.0 */
  if ((pcc->pvr & 0xf) == 0) {
@@ -6536,7 +6533,7 @@ static bool 
ppc_pvr_match_power10(PowerPCCPUClass *pcc, uint32_t pvr, bool best)

  }
  if ((pvr & 0x0f00) == (pcc->pvr & 0x0f00)) {
-    /* Major DD version matches to power10_v1.0 and power10_v2.0 */
+    /* Major DD version matches power10_v2.0 */
  return true;
  }
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index bcf30a5400..525fbe3892 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -2369,17 +2369,6 @@ static void 
kvmppc_host_cpu_class_init(ObjectClass *oc, void *data)

  #if defined(TARGET_PPC64)
  pcc->radix_page_info = kvmppc_get_radix_page_info();
-
-    if ((pcc->pvr & 0xff00) == CPU_POWERPC_POWER9_DD1) {
-    /*
- * POWER9 DD1 has some bugs which make it not really ISA 3.00
- * compliant.  More importantly, advertising ISA 3.00
- * architected mode may prevent guests from activating
- * necessary DD1 workarounds.
- */
-    pcc->pcr_supported &= ~(PCR_COMPAT_3_00 | PCR_COMPAT_2_07
-    | PCR_COMPAT_2_06 | PCR_COMPAT_2_05);
-    }
  #endif /* defined(TARGET_PPC64) */
  }

Re: [PATCH 01/13] ppc: Drop support for POWER9 and POWER10 DD1 chips

2024-03-11 Thread Harsh Prateek Bora





On 3/12/24 00:21, Nicholas Piggin wrote:

The POWER9 DD1 and POWER10 DD1 chips are not public and are no longer of
any use in QEMU. Remove them.

Signed-off-by: Nicholas Piggin 
---
  hw/ppc/spapr_cpu_core.c |  2 --
  target/ppc/cpu-models.c |  4 
  target/ppc/cpu_init.c   |  7 ++-
  target/ppc/kvm.c| 11 ---
  4 files changed, 2 insertions(+), 22 deletions(-)


Do we want to squash in removal of the macro as well?

diff --git a/target/ppc/cpu-models.h b/target/ppc/cpu-models.h
index 0229ef3a9a..a5167873ae 100644
--- a/target/ppc/cpu-models.h
+++ b/target/ppc/cpu-models.h
@@ -348,7 +348,6 @@ enum {
 CPU_POWERPC_POWER8NVL_BASE = 0x004C0000,
 CPU_POWERPC_POWER8NVL_v10  = 0x004C0100,
 CPU_POWERPC_POWER9_BASE    = 0x004E0000,
-CPU_POWERPC_POWER9_DD1     = 0x004E1100,
 CPU_POWERPC_POWER9_DD20    = 0x004E1200,
 CPU_POWERPC_POWER9_DD22    = 0x004E1202,
 CPU_POWERPC_POWER10_BASE   = 0x00800000,

With that,

Reviewed-by: Harsh Prateek Bora 

regards,
Harsh



diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index 40b7c52f7f..50523ead25 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -394,10 +394,8 @@ static const TypeInfo spapr_cpu_core_type_infos[] = {
  DEFINE_SPAPR_CPU_CORE_TYPE("power8_v2.0"),
  DEFINE_SPAPR_CPU_CORE_TYPE("power8e_v2.1"),
  DEFINE_SPAPR_CPU_CORE_TYPE("power8nvl_v1.0"),
-DEFINE_SPAPR_CPU_CORE_TYPE("power9_v1.0"),
  DEFINE_SPAPR_CPU_CORE_TYPE("power9_v2.0"),
  DEFINE_SPAPR_CPU_CORE_TYPE("power9_v2.2"),
-DEFINE_SPAPR_CPU_CORE_TYPE("power10_v1.0"),
  DEFINE_SPAPR_CPU_CORE_TYPE("power10_v2.0"),
  #ifdef CONFIG_KVM
  DEFINE_SPAPR_CPU_CORE_TYPE("host"),
diff --git a/target/ppc/cpu-models.c b/target/ppc/cpu-models.c
index 36e465b390..f2301b43f7 100644
--- a/target/ppc/cpu-models.c
+++ b/target/ppc/cpu-models.c
@@ -728,14 +728,10 @@
  "POWER8 v2.0")
  POWERPC_DEF("power8nvl_v1.0", CPU_POWERPC_POWER8NVL_v10, POWER8,
  "POWER8NVL v1.0")
-POWERPC_DEF("power9_v1.0",   CPU_POWERPC_POWER9_DD1, POWER9,
-"POWER9 v1.0")
  POWERPC_DEF("power9_v2.0",   CPU_POWERPC_POWER9_DD20,POWER9,
  "POWER9 v2.0")
  POWERPC_DEF("power9_v2.2",   CPU_POWERPC_POWER9_DD22,POWER9,
  "POWER9 v2.2")
-POWERPC_DEF("power10_v1.0",  CPU_POWERPC_POWER10_DD1,POWER10,
-"POWER10 v1.0")
  POWERPC_DEF("power10_v2.0",  CPU_POWERPC_POWER10_DD20,   POWER10,
  "POWER10 v2.0")
  #endif /* defined (TARGET_PPC64) */
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 1d3d1db7c3..572cbdf25f 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -6350,10 +6350,7 @@ static bool ppc_pvr_match_power9(PowerPCCPUClass *pcc, 
uint32_t pvr, bool best)
  return false;
  }
  
-if ((pvr & 0x0f00) == 0x100) {

-/* DD1.x always matches power9_v1.0 */
-return true;
-} else if ((pvr & 0x0f00) == 0x200) {
+if ((pvr & 0x0f00) == 0x200) {
  if ((pvr & 0xf) < 2) {
  /* DD2.0, DD2.1 match power9_v2.0 */
  if ((pcc->pvr & 0xf) == 0) {
@@ -6536,7 +6533,7 @@ static bool ppc_pvr_match_power10(PowerPCCPUClass *pcc, 
uint32_t pvr, bool best)
  }
  
  if ((pvr & 0x0f00) == (pcc->pvr & 0x0f00)) {

-/* Major DD version matches to power10_v1.0 and power10_v2.0 */
+/* Major DD version matches power10_v2.0 */
  return true;
  }
  
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c

index bcf30a5400..525fbe3892 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -2369,17 +2369,6 @@ static void kvmppc_host_cpu_class_init(ObjectClass *oc, 
void *data)
  
  #if defined(TARGET_PPC64)

  pcc->radix_page_info = kvmppc_get_radix_page_info();
-
-if ((pcc->pvr & 0xff00) == CPU_POWERPC_POWER9_DD1) {
-/*
- * POWER9 DD1 has some bugs which make it not really ISA 3.00
- * compliant.  More importantly, advertising ISA 3.00
- * architected mode may prevent guests from activating
- * necessary DD1 workarounds.
- */
-pcc->pcr_supported &= ~(PCR_COMPAT_3_00 | PCR_COMPAT_2_07
-| PCR_COMPAT_2_06 | PCR_COMPAT_2_05);
-}
  #endif /* defined(TARGET_PPC64) */
  }
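
For readers following the cpu_init.c hunk, here is a minimal standalone sketch of the DD-level decoding that remains once the DD1 branch is removed. The helper name dd_level_matches() is hypothetical and this is not the QEMU function. Only the 0x0f00 and 0xf masks and the DD2.0/DD2.1 rule are taken from the hunk; the handling of minor levels 2 and above is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the QEMU code: compare a host PVR against a
 * CPU class PVR using only the DD level fields.
 *   mask 0x0f00 selects the major DD level
 *   mask 0x000f selects the minor DD level
 */
static bool dd_level_matches(uint32_t class_pvr, uint32_t pvr)
{
    if ((pvr & 0x0f00) != 0x200) {
        /* DD1.x no longer matches any model once power9_v1.0 is gone */
        return false;
    }
    if ((pvr & 0xf) < 2) {
        /* DD2.0, DD2.1 match the v2.0 class (minor level 0) */
        return (class_pvr & 0xf) == 0;
    }
    /* Assumption for illustration: later minors match the v2.2 class */
    return (class_pvr & 0xf) == 2;
}
```

With the PVR values from cpu-models.h, dd_level_matches(0x004E1200, 0x004E1201) is true, while a DD1 host PVR such as 0x004E1100 matches no class.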

[PATCH v5 05/14] spapr: nested: Document Nested PAPR API

2024-03-08 Thread Harsh Prateek Bora

Add initial documentation for the Nested PAPR API, describing the set of
hcalls and their usage. It also covers the Guest State Buffer elements and
their format, which is used between L0 and L1 to communicate L2 state.

Signed-off-by: Michael Neuling 
Signed-off-by: Harsh Prateek Bora 
---
 docs/devel/nested-papr.txt | 119 +
 1 file changed, 119 insertions(+)
 create mode 100644 docs/devel/nested-papr.txt

diff --git a/docs/devel/nested-papr.txt b/docs/devel/nested-papr.txt
new file mode 100644
index 0000000000..90943650db
--- /dev/null
+++ b/docs/devel/nested-papr.txt
@@ -0,0 +1,119 @@
+Nested PAPR API (aka KVM on PowerVM)
+
+
+This API provides support for nested virtualization with KVM on PowerVM.
+Nested KVM on PowerNV is already supported through the cap-nested-hv option.
+With a slight design change, the same is enabled on papr/pseries through a
+new cap-nested-papr option, e.g.:
+
+  qemu-system-ppc64 -cpu POWER10 -machine pseries,cap-nested-papr=true ...
+
+Work by:
+Michael Neuling 
+Vaibhav Jain 
+Jordan Niethe 
+Harsh Prateek Bora 
+Shivaprasad G Bhat 
+Kautuk Consul 
+
+Below taken from the kernel documentation:
+
+Introduction
+
+
+This document explains how a guest operating system can act as a
+hypervisor and run nested guests through the use of hypercalls, if the
+hypervisor has implemented them. The terms L0, L1, and L2 are used to
+refer to different software entities. L0 is the hypervisor mode entity
+that would normally be called the "host" or "hypervisor". L1 is a
+guest virtual machine that is directly run under L0 and is initiated
+and controlled by L0. L2 is a guest virtual machine that is initiated
+and controlled by L1 acting as a hypervisor. A significant design change
+from the existing API is that the entire L2 state is now maintained within L0.
+
+Existing Nested-HV API
+==
+
+Linux/KVM has had support for Nesting as an L0 or L1 since 2018
+
+The L0 code was added::
+
+   commit 8e3f5fc1045dc49fd175b978c5457f5f51e7a2ce
+   Author: Paul Mackerras 
+   Date:   Mon Oct 8 16:31:03 2018 +1100
+   KVM: PPC: Book3S HV: Framework and hcall stubs for nested virtualization
+
+The L1 code was added::
+
+   commit 360cae313702cdd0b90f82c261a8302fecef030a
+   Author: Paul Mackerras 
+   Date:   Mon Oct 8 16:31:04 2018 +1100
+   KVM: PPC: Book3S HV: Nested guest entry via hypercall
+
+This API works primarily using a single hcall h_enter_nested(). This
+call is made by the L1 to tell the L0 to start an L2 vCPU with the given
+state. The L0 then starts this L2 and runs until an L2 exit condition
+is reached. Once the L2 exits, the state of the L2 is given back to
+the L1 by the L0. The full L2 vCPU state is always transferred from
+and to L1 when the L2 is run. The L0 doesn't keep any state on the L2
+vCPU (except in the short sequence in the L0 on L1 -> L2 entry and L2
+-> L1 exit).
+
+The only state kept by the L0 is the partition table. The L1 registers
+its partition table using the h_set_partition_table() hcall. All
+other state held by the L0 about the L2s is cached state (such as
+shadow page tables).
+
+The L1 may run any L2 or vCPU without first informing the L0. It
+simply starts the vCPU using h_enter_nested(). The creation of L2s and
+vCPUs is done implicitly whenever h_enter_nested() is called.
+
+In this document, we call this existing API the v1 API.
+
+New PAPR API
+===
+
+The new PAPR API differs from the v1 API in that the creation of an L2 and
+its associated vCPUs is explicit. In this document, we call this the v2
+API.
+
+h_enter_nested() is replaced with H_GUEST_RUN_VCPU(). Before this can
+be called, the L1 must explicitly create the L2 using H_GUEST_CREATE()
+and create any associated vCPUs with H_GUEST_CREATE_VCPU(). Getting
+and setting vCPU state can also be performed using the
+H_GUEST_{G,S}ET() hcalls.
+
+The basic execution flow for an L1 to create an L2, run it, and
+delete it is:
+
+- L1 and L0 negotiate capabilities with H_GUEST_{G,S}ET_CAPABILITIES()
+  (normally at L1 boot time).
+
+- L1 requests the L0 to create an L2 with H_GUEST_CREATE() and receives a token
+
+- L1 requests the L0 to create an L2 vCPU with H_GUEST_CREATE_VCPU()
+
+- L1 and L0 communicate the vCPU state using the H_GUEST_{G,S}ET() hcall
+
+- L1 requests the L0 to run the vCPU using H_GUEST_RUN_VCPU() hcall
+
+- L1 deletes L2 with H_GUEST_DELETE()
+
+For more details, please refer to:
+
+[1] Linux Kernel documentation (upstream documentation commit):
+
+commit 476652297f94a2e5e5ef29e734b0da37ade94110
+Author: Michael Neuling 
+Date:   Thu Sep 14 13:06:00 2023 +1000
+
+docs: powerpc: Document nested KVM on POWER
+
+Document support for nested KVM on POWER using the existing API as well
+as the new PAPR API. This includes the new HCALL interface and how it
+used by KVM.
+
+Signed-off-by: Michae
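
As a reading aid for the execution flow listed in the document above, here is a minimal mock of the ordering it implies. Everything here is hypothetical: the mock_* names, the return codes, and the single-guest simplification are assumptions for illustration and are not part of QEMU or the PAPR interface. The only behaviour borrowed from the series is that guest creation is refused until capabilities have been set and that vCPU creation requires an existing guest.

```c
#include <stdbool.h>

/* Hypothetical return codes, not the PAPR H_* values */
enum mock_rc {
    MOCK_OK = 0,
    MOCK_BAD_STATE = -1,   /* capabilities not negotiated yet */
    MOCK_NO_GUEST = -2,    /* no L2 has been created */
    MOCK_NO_VCPU = -3      /* vCPU id was never created */
};

/* Mock L0 tracking a single L2, to keep the sketch small */
struct mock_l0 {
    bool caps_set;
    bool guest_exists;
    int nr_vcpus;
};

static enum mock_rc mock_set_capabilities(struct mock_l0 *l0)
{
    l0->caps_set = true;
    return MOCK_OK;
}

static enum mock_rc mock_guest_create(struct mock_l0 *l0)
{
    if (!l0->caps_set) {
        return MOCK_BAD_STATE;
    }
    l0->guest_exists = true;
    l0->nr_vcpus = 0;
    return MOCK_OK;
}

static enum mock_rc mock_guest_create_vcpu(struct mock_l0 *l0)
{
    if (!l0->guest_exists) {
        return MOCK_NO_GUEST;
    }
    l0->nr_vcpus++;
    return MOCK_OK;
}

static enum mock_rc mock_guest_run_vcpu(struct mock_l0 *l0, int vcpuid)
{
    if (!l0->guest_exists) {
        return MOCK_NO_GUEST;
    }
    if (vcpuid < 0 || vcpuid >= l0->nr_vcpus) {
        return MOCK_NO_VCPU;
    }
    return MOCK_OK;
}

static enum mock_rc mock_guest_delete(struct mock_l0 *l0)
{
    if (!l0->guest_exists) {
        return MOCK_NO_GUEST;
    }
    /* Deleting the guest also drops all of its vCPUs */
    l0->guest_exists = false;
    l0->nr_vcpus = 0;
    return MOCK_OK;
}
```

An L1 that follows the documented order (set capabilities, create guest, create vCPU, run, delete) sees MOCK_OK at every step, while skipping a step yields one of the error codes.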

[PATCH v5 07/14] spapr: nested: Introduce H_GUEST_[CREATE|DELETE] hcalls.

2024-03-08 Thread Harsh Prateek Bora

Introduce the nested PAPR hcalls:
- H_GUEST_CREATE which is used to create and allocate resources for
the nested guest being created.
- H_GUEST_DELETE which is used to delete and deallocate resources
for the nested guest being deleted. It also supports deleting all nested
guests at once using a deleteAll flag.

Signed-off-by: Michael Neuling 
Signed-off-by: Harsh Prateek Bora 
---
 include/hw/ppc/spapr.h|   4 +-
 include/hw/ppc/spapr_nested.h |   7 +++
 hw/ppc/spapr_nested.c | 103 ++
 3 files changed, 113 insertions(+), 1 deletion(-)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 2906d59137..13416fc3d7 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -586,8 +586,10 @@ struct SpaprMachineState {
 #define H_WATCHDOG  0x45C
 #define H_GUEST_GET_CAPABILITIES 0x460
 #define H_GUEST_SET_CAPABILITIES 0x464
+#define H_GUEST_CREATE   0x470
+#define H_GUEST_DELETE   0x488
 
-#define MAX_HCALL_OPCODE H_GUEST_SET_CAPABILITIES
+#define MAX_HCALL_OPCODE H_GUEST_DELETE
 
 /* The hcalls above are standardized in PAPR and implemented by pHyp
  * as well.
diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index 73687e03e4..56d43e540b 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -9,8 +9,13 @@ typedef struct SpaprMachineStateNested {
 #define NESTED_API_KVM_HV  1
 bool capabilities_set;
 uint32_t pvr_base;
+GHashTable *guests;
 } SpaprMachineStateNested;
 
+typedef struct SpaprMachineStateNestedGuest {
+uint32_t pvr_logical;
+} SpaprMachineStateNestedGuest;
+
 /* Nested PAPR API related macros */
 #define H_GUEST_CAPABILITIES_COPY_MEM 0x8000
 #define H_GUEST_CAPABILITIES_P9_MODE  0x4000
@@ -20,6 +25,8 @@ typedef struct SpaprMachineStateNested {
 #define H_GUEST_CAP_COPY_MEM_BMAP 0
 #define H_GUEST_CAP_P9_MODE_BMAP  1
 #define H_GUEST_CAP_P10_MODE_BMAP 2
+#define PAPR_NESTED_GUEST_MAX 4096
+#define H_GUEST_DELETE_ALL_FLAG   0x8000ULL
 
 /*
  * Register state for entering a nested guest with H_ENTER_NESTED.
diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index 601f669060..13674c0857 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -520,6 +520,105 @@ static target_ulong h_guest_set_capabilities(PowerPCCPU *cpu,
 }
 }
 
+static void
+destroy_guest_helper(gpointer value)
+{
+struct SpaprMachineStateNestedGuest *guest = value;
+g_free(guest);
+}
+
+static target_ulong h_guest_create(PowerPCCPU *cpu,
+   SpaprMachineState *spapr,
+   target_ulong opcode,
+   target_ulong *args)
+{
+CPUPPCState *env = &cpu->env;
+target_ulong flags = args[0];
+target_ulong continue_token = args[1];
+uint64_t guestid;
+int nguests = 0;
+struct SpaprMachineStateNestedGuest *guest;
+
+if (flags) { /* don't handle any flags for now */
+return H_UNSUPPORTED_FLAG;
+}
+
+if (continue_token != -1) {
+return H_P2;
+}
+
+if (!spapr->nested.capabilities_set) {
+return H_STATE;
+}
+
+if (!spapr->nested.guests) {
+spapr->nested.guests = g_hash_table_new_full(NULL,
+ NULL,
+ NULL,
+ destroy_guest_helper);
+}
+
+nguests = g_hash_table_size(spapr->nested.guests);
+
+if (nguests == PAPR_NESTED_GUEST_MAX) {
+return H_NO_MEM;
+}
+
+/* Lookup for available guestid */
+for (guestid = 1; guestid < PAPR_NESTED_GUEST_MAX; guestid++) {
+if (!(g_hash_table_lookup(spapr->nested.guests,
+  GINT_TO_POINTER(guestid)))) {
+break;
+}
+}
+
+if (guestid == PAPR_NESTED_GUEST_MAX) {
+return H_NO_MEM;
+}
+
+guest = g_try_new0(struct SpaprMachineStateNestedGuest, 1);
+if (!guest) {
+return H_NO_MEM;
+}
+
+guest->pvr_logical = spapr->nested.pvr_base;
+g_hash_table_insert(spapr->nested.guests, GINT_TO_POINTER(guestid), guest);
+env->gpr[4] = guestid;
+
+return H_SUCCESS;
+}
+
+static target_ulong h_guest_delete(PowerPCCPU *cpu,
+   SpaprMachineState *spapr,
+   target_ulong opcode,
+   target_ulong *args)
+{
+target_ulong flags = args[0];
+target_ulong guestid = args[1];
+struct SpaprMachineStateNestedGuest *guest;
+
+/*
+ * handle flag deleteAllGuests, if set:
+ * guestid is ignored and all guests are deleted
+ *
+ */
+if (flags & ~H_GUEST_DELETE_ALL_FLAG) {
+return H_UNSUPPORTED_FLAG; /* other flag bits reserved */
+} els
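
To make the guest ID handling in h_guest_create() and h_guest_delete() easier to follow, here is a minimal sketch of the same idea: hand out the lowest free ID starting at 1, and on delete either free one ID or, with the delete-all flag, free everything and ignore the ID. It deliberately swaps the patch's GHashTable for a plain bool array so it stays self-contained; guest_table, guest_id_alloc() and guest_id_free() are hypothetical names, not QEMU functions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_GUEST_MAX 4096   /* mirrors PAPR_NESTED_GUEST_MAX in the patch */

struct guest_table {
    bool used[SKETCH_GUEST_MAX];   /* used[0] is never handed out */
};

/* Return the lowest free guest ID in [1, SKETCH_GUEST_MAX), or 0 if full. */
static uint32_t guest_id_alloc(struct guest_table *t)
{
    for (uint32_t id = 1; id < SKETCH_GUEST_MAX; id++) {
        if (!t->used[id]) {
            t->used[id] = true;
            return id;
        }
    }
    return 0;
}

/*
 * Free one guest ID. When delete_all is set, the ID argument is ignored
 * and every guest is freed, as the comment in h_guest_delete() describes.
 */
static bool guest_id_free(struct guest_table *t, uint32_t id, bool delete_all)
{
    if (delete_all) {
        memset(t->used, 0, sizeof(t->used));
        return true;
    }
    if (id == 0 || id >= SKETCH_GUEST_MAX || !t->used[id]) {
        return false;   /* unknown guest */
    }
    t->used[id] = false;
    return true;
}
```

A freed ID is reused by the next allocation, since the search always restarts from 1.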

[PATCH v5 11/14] spapr: nested: Introduce H_GUEST_[GET|SET]_STATE hcalls.

2024-03-08 Thread Harsh Prateek Bora

Introduce the nested PAPR hcalls:
- H_GUEST_GET_STATE which is used to get the state of a nested guest or
  a guest VCPU. The value field for each element in the request is the
  destination to be updated to reflect the current state on success.
- H_GUEST_SET_STATE which is used to modify the state of a guest or
  a guest VCPU. On success, guest (or its VCPU) state shall be
  updated as per the value field for the requested element(s).

Signed-off-by: Michael Neuling 
Signed-off-by: Harsh Prateek Bora 
---
 include/hw/ppc/spapr.h|   3 +
 include/hw/ppc/spapr_nested.h |  23 +++
 hw/ppc/spapr_nested.c | 268 ++
 3 files changed, 294 insertions(+)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 070135793a..6223873641 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -366,6 +366,7 @@ struct SpaprMachineState {
 #define H_OVERLAP -68
 #define H_STATE   -75
 #define H_IN_USE  -77
+#define H_INVALID_ELEMENT_VALUE -81
 #define H_UNSUPPORTED_FLAG -256
 #define H_MULTI_THREADS_ACTIVE -9005
 
@@ -589,6 +590,8 @@ struct SpaprMachineState {
 #define H_GUEST_SET_CAPABILITIES 0x464
 #define H_GUEST_CREATE   0x470
 #define H_GUEST_CREATE_VCPU  0x474
+#define H_GUEST_GET_STATE    0x478
+#define H_GUEST_SET_STATE    0x47C
 #define H_GUEST_DELETE   0x488
 
 #define MAX_HCALL_OPCODE H_GUEST_DELETE
diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index 433d93c480..bd43c6b6ef 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -224,6 +224,10 @@ typedef struct SpaprMachineStateNestedGuest {
 #define HVMASK_MSR0xEBBFEFFF
 #define HVMASK_HDEXCR 0x
 #define HVMASK_TB_OFFSET  0x00FF
+#define GSB_MAX_BUF_SIZE  (1024 * 1024)
+#define H_GUEST_GETSET_STATE_FLAG_GUEST_WIDE 0x8000
+#define GUEST_STATE_REQUEST_GUEST_WIDE   0x1
+#define GUEST_STATE_REQUEST_SET  0x2
 
 /* As per ISA v3.1B, following bits are reserved:
  *  0:2
@@ -321,6 +325,25 @@ typedef struct SpaprMachineStateNestedGuest {
 #define GSE_ENV_DWM(i, f, m) \
 GUEST_STATE_ELEMENT_MSK(i, 8, f, copy_state_8to8, m)
 
+struct guest_state_element {
+uint16_t id;
+uint16_t size;
+uint8_t value[];
+} QEMU_PACKED;
+
+struct guest_state_buffer {
+uint32_t num_elements;
+struct guest_state_element elements[];
+} QEMU_PACKED;
+
+/* Actual buffer plus some metadata about the request */
+struct guest_state_request {
+struct guest_state_buffer *gsb;
+int64_t buf;
+int64_t len;
+uint16_t flags;
+};
+
 /*
  * Register state for entering a nested guest with H_ENTER_NESTED.
  * New member must be added at the end.
diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index 07dc294c5a..ca99805ce8 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -1028,6 +1028,140 @@ void spapr_nested_gsb_init(void)
 }
 }
 
+static struct guest_state_element *guest_state_element_next(
+struct guest_state_element *element,
+int64_t *len,
+int64_t *num_elements)
+{
+uint16_t size;
+
+/* size is of element->value[] only. Not whole guest_state_element */
+size = be16_to_cpu(element->size);
+
+if (len) {
+*len -= size + offsetof(struct guest_state_element, value);
+}
+
+if (num_elements) {
+*num_elements -= 1;
+}
+
+return (struct guest_state_element *)(element->value + size);
+}
+
+static
+struct guest_state_element_type *guest_state_element_type_find(uint16_t id)
+{
+int i;
+
+for (i = 0; i < ARRAY_SIZE(guest_state_element_types); i++)
+if (id == guest_state_element_types[i].id) {
+return &guest_state_element_types[i];
+}
+
+return NULL;
+}
+
+static void log_element(struct guest_state_element *element,
+struct guest_state_request *gsr)
+{
+qemu_log_mask(LOG_GUEST_ERROR, "h_guest_%s_state id:0x%04x size:0x%04x",
+  gsr->flags & GUEST_STATE_REQUEST_SET ? "set" : "get",
+  be16_to_cpu(element->id), be16_to_cpu(element->size));
+qemu_log_mask(LOG_GUEST_ERROR, "buf:0x%016lx ...\n",
+  be64_to_cpu(*(uint64_t *)element->value));
+}
+
+static bool guest_state_request_check(struct guest_state_request *gsr)
+{
+int64_t num_elements, len = gsr->len;
+struct guest_state_buffer *gsb = gsr->gsb;
+struct guest_state_element *element;
+struct guest_state_element_type *type;
+uint16_t id, size;
+
+/* gsb->num_elements = 0 == 32 bits long */
+assert(len >= 4);
+
+num_elements = be32_to_cpu(gsb->num_elements);
+element = gsb->elements;
+len -= sizeof(gsb->num_elements);
+
+/* Walk the buffer to validate the leng
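
Since the hunk above is cut off at the buffer walk, here is a minimal sketch of how a guest state buffer with this layout can be length-checked: a big-endian 32-bit element count, followed by elements that each carry a big-endian 16-bit id, a 16-bit size, and size bytes of value. The function gsb_walk_ok() and the rd_be* helpers are hypothetical and are not the code in the patch. Accepting trailing bytes after the last element is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GSB_HDR_SIZE 4u   /* num_elements */
#define GSE_HDR_SIZE 4u   /* id + size, value[] follows */

/* Read big-endian integers byte by byte so host endianness is irrelevant. */
static uint16_t rd_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rd_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical checker: true when buf[0..len) is long enough to hold every
 * element announced in the header. On success, *total_value_bytes (if not
 * NULL) receives the sum of all element value sizes.
 */
static bool gsb_walk_ok(const uint8_t *buf, size_t len,
                        uint32_t *total_value_bytes)
{
    if (len < GSB_HDR_SIZE) {
        return false;
    }

    uint32_t remaining = rd_be32(buf);
    size_t off = GSB_HDR_SIZE;
    uint32_t total = 0;

    while (remaining > 0) {
        if (len - off < GSE_HDR_SIZE) {
            return false;               /* element header truncated */
        }
        /* size covers value[] only, not the element header */
        uint16_t size = rd_be16(buf + off + 2);
        off += GSE_HDR_SIZE;
        if (len - off < size) {
            return false;               /* element value truncated */
        }
        off += size;
        total += size;
        remaining--;
    }

    if (total_value_bytes) {
        *total_value_bytes = total;
    }
    return true;
}
```

The checks are written as len - off < n rather than off + n > len so that a large size field cannot overflow the offset arithmetic.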

[PATCH v5 04/14] spapr: nested: keep nested-hv related code restricted to its API.

2024-03-08 Thread Harsh Prateek Bora

spapr_exit_nested and spapr_get_pate_nested_hv contain code which
is specific to the nested-hv API. Isolating code flows based on the API
in use makes it easier to extend them for other APIs as well.

Signed-off-by: Harsh Prateek Bora 
Suggested-by: Nicholas Piggin 
---
 include/hw/ppc/spapr_nested.h |  3 +++
 hw/ppc/spapr.c|  6 +-
 hw/ppc/spapr_nested.c | 25 ++---
 3 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index 2488ea98da..bf3a7b8d89 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -5,6 +5,8 @@
 
 typedef struct SpaprMachineStateNested {
 uint64_t ptcr;
+uint8_t api;
+#define NESTED_API_KVM_HV  1
 } SpaprMachineStateNested;
 
 /*
@@ -103,4 +105,5 @@ void spapr_exit_nested(PowerPCCPU *cpu, int excp);
 typedef struct SpaprMachineState SpaprMachineState;
 bool spapr_get_pate_nested_hv(SpaprMachineState *spapr, PowerPCCPU *cpu,
   target_ulong lpid, ppc_v3_pate_t *entry);
+uint8_t spapr_nested_api(SpaprMachineState *spapr);
 #endif /* HW_SPAPR_NESTED_H */
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 65d766b898..a08ffe55b6 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1376,7 +1376,11 @@ static bool spapr_get_pate(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu,
 entry->dw1 = spapr->patb_entry;
 return true;
 } else {
-return spapr_get_pate_nested_hv(spapr, cpu, lpid, entry);
+assert(spapr_nested_api(spapr));
+if (spapr_nested_api(spapr) == NESTED_API_KVM_HV) {
+return spapr_get_pate_nested_hv(spapr, cpu, lpid, entry);
+}
+return false;
 }
 }
 
diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index c2a33fc3a9..12fdbe2aba 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -11,11 +11,19 @@
 void spapr_nested_reset(SpaprMachineState *spapr)
 {
 if (spapr_get_cap(spapr, SPAPR_CAP_NESTED_KVM_HV)) {
+spapr->nested.api = NESTED_API_KVM_HV;
 spapr_unregister_nested_hv();
 spapr_register_nested_hv();
+} else {
+spapr->nested.api = 0;
 }
 }
 
+uint8_t spapr_nested_api(SpaprMachineState *spapr)
+{
+return spapr->nested.api;
+}
+
 #ifdef CONFIG_TCG
 
 bool spapr_get_pate_nested_hv(SpaprMachineState *spapr, PowerPCCPU *cpu,
@@ -310,7 +318,7 @@ static target_ulong h_enter_nested(PowerPCCPU *cpu,
 return env->gpr[3];
 }
 
-void spapr_exit_nested(PowerPCCPU *cpu, int excp)
+static void spapr_exit_nested_hv(PowerPCCPU *cpu, int excp)
 {
 CPUPPCState *env = &cpu->env;
 SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
@@ -322,8 +330,6 @@ void spapr_exit_nested(PowerPCCPU *cpu, int excp)
 struct kvmppc_pt_regs *regs;
 hwaddr len;
 
-assert(spapr_cpu->in_nested);
-
 nested_save_state(&l2_state, cpu);
 hsrr0 = env->spr[SPR_HSRR0];
 hsrr1 = env->spr[SPR_HSRR1];
@@ -413,6 +419,19 @@ void spapr_exit_nested(PowerPCCPU *cpu, int excp)
 address_space_unmap(CPU(cpu)->as, regs, len, len, true);
 }
 
+void spapr_exit_nested(PowerPCCPU *cpu, int excp)
+{
+SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
+
+assert(spapr_cpu->in_nested);
+if (spapr_nested_api(spapr) == NESTED_API_KVM_HV) {
+spapr_exit_nested_hv(cpu, excp);
+} else {
+g_assert_not_reached();
+}
+}
+
 void spapr_register_nested_hv(void)
 {
 spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
-- 
2.39.3

[PATCH v5 08/14] spapr: nested: Introduce H_GUEST_CREATE_VCPU hcall.

2024-03-08 Thread Harsh Prateek Bora

Introduce the nested PAPR hcall H_GUEST_CREATE_VCPU which is used to
create and initialize the specified VCPU resource for the previously
created guest. Each guest can have multiple VCPUs, up to a maximum of 2048.
All VCPUs of a guest are deallocated when the guest is deleted.

Signed-off-by: Michael Neuling 
Signed-off-by: Harsh Prateek Bora 
---
 include/hw/ppc/spapr.h|  2 ++
 include/hw/ppc/spapr_nested.h |  8 +
 hw/ppc/spapr_nested.c | 61 +++
 3 files changed, 71 insertions(+)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 13416fc3d7..070135793a 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -365,6 +365,7 @@ struct SpaprMachineState {
 #define H_UNSUPPORTED -67
 #define H_OVERLAP -68
 #define H_STATE   -75
+#define H_IN_USE  -77
 #define H_UNSUPPORTED_FLAG -256
 #define H_MULTI_THREADS_ACTIVE -9005
 
@@ -587,6 +588,7 @@ struct SpaprMachineState {
 #define H_GUEST_GET_CAPABILITIES 0x460
 #define H_GUEST_SET_CAPABILITIES 0x464
 #define H_GUEST_CREATE   0x470
+#define H_GUEST_CREATE_VCPU  0x474
 #define H_GUEST_DELETE   0x488
 
 #define MAX_HCALL_OPCODE H_GUEST_DELETE
diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index 56d43e540b..2ac3076fac 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -14,6 +14,8 @@ typedef struct SpaprMachineStateNested {
 
 typedef struct SpaprMachineStateNestedGuest {
 uint32_t pvr_logical;
+unsigned long nr_vcpus;
+struct SpaprMachineStateNestedGuestVcpu *vcpus;
 } SpaprMachineStateNestedGuest;
 
 /* Nested PAPR API related macros */
@@ -27,6 +29,7 @@ typedef struct SpaprMachineStateNestedGuest {
 #define H_GUEST_CAP_P10_MODE_BMAP 2
 #define PAPR_NESTED_GUEST_MAX 4096
 #define H_GUEST_DELETE_ALL_FLAG   0x8000ULL
+#define PAPR_NESTED_GUEST_VCPU_MAX 2048
 
 /*
  * Register state for entering a nested guest with H_ENTER_NESTED.
@@ -120,6 +123,11 @@ struct nested_ppc_state {
 int64_t tb_offset;
 };
 
+typedef struct SpaprMachineStateNestedGuestVcpu {
+bool enabled;
+struct nested_ppc_state state;
+} SpaprMachineStateNestedGuestVcpu;
+
 void spapr_exit_nested(PowerPCCPU *cpu, int excp);
 typedef struct SpaprMachineState SpaprMachineState;
 bool spapr_get_pate_nested_hv(SpaprMachineState *spapr, PowerPCCPU *cpu,
diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index 13674c0857..4c0e2e91e1 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -8,6 +8,7 @@
 #include "hw/ppc/spapr_nested.h"
 #include "mmu-book3s-v3.h"
 #include "cpu-models.h"
+#include "qemu/log.h"
 
 void spapr_nested_reset(SpaprMachineState *spapr)
 {
@@ -434,6 +435,16 @@ void spapr_exit_nested(PowerPCCPU *cpu, int excp)
 }
 }
 
+static
+SpaprMachineStateNestedGuest *spapr_get_nested_guest(SpaprMachineState *spapr,
+ target_ulong guestid)
+{
+SpaprMachineStateNestedGuest *guest;
+
+guest = g_hash_table_lookup(spapr->nested.guests, GINT_TO_POINTER(guestid));
+return guest;
+}
+
 static target_ulong h_guest_get_capabilities(PowerPCCPU *cpu,
  SpaprMachineState *spapr,
  target_ulong opcode,
@@ -524,6 +535,7 @@ static void
 destroy_guest_helper(gpointer value)
 {
 struct SpaprMachineStateNestedGuest *guest = value;
+g_free(guest->vcpus);
 g_free(guest);
 }
 
@@ -619,6 +631,53 @@ static target_ulong h_guest_delete(PowerPCCPU *cpu,
 return H_SUCCESS;
 }
 
+static target_ulong h_guest_create_vcpu(PowerPCCPU *cpu,
+SpaprMachineState *spapr,
+target_ulong opcode,
+target_ulong *args)
+{
+target_ulong flags = args[0];
+target_ulong guestid = args[1];
+target_ulong vcpuid = args[2];
+SpaprMachineStateNestedGuest *guest;
+
+if (flags) { /* don't handle any flags for now */
+return H_UNSUPPORTED_FLAG;
+}
+
+guest = spapr_get_nested_guest(spapr, guestid);
+if (!guest) {
+return H_P2;
+}
+
+if (vcpuid < guest->nr_vcpus) {
+qemu_log_mask(LOG_UNIMP, "vcpuid %ld already in use, return.", vcpuid);
+return H_IN_USE;
+}
+/* linear vcpuid allocation only */
+assert(vcpuid == guest->nr_vcpus);
+
+if (guest->nr_vcpus >= PAPR_NESTED_GUEST_VCPU_MAX) {
+return H_P3;
+}
+
+SpaprMachineStateNestedGuestVcpu *vcpus, *curr_vcpu;
+vcpus = g_try_renew(struct SpaprMachineStateNestedGuestVcpu,
+guest->vcpus,
+guest->nr_vcpus + 1);
+if (!vcpus) {
+return H_NO_MEM;
+}
+guest->vcpus = vcpus;
+curr_vcpu = &vcpus[guest->nr_vcpus];
+memset(curr_v
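
The hunk above is cut off, so here is a minimal sketch of the linear vCPU allocation it describes: an ID below nr_vcpus is already in use, the only acceptable new ID is nr_vcpus itself, and the array grows by one entry per call. It uses plain realloc() instead of g_try_renew(), and the sketch_* names and return codes are hypothetical. Two points are assumptions rather than facts from the patch: a non-linear ID returns an error here instead of tripping an assert, and the new entry is marked enabled and nr_vcpus incremented after the memset.

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_VCPU_MAX 2048   /* mirrors PAPR_NESTED_GUEST_VCPU_MAX */

/* Hypothetical return codes, not the PAPR H_* values */
enum vcpu_rc { VCPU_OK = 0, VCPU_IN_USE, VCPU_BAD_ID, VCPU_NO_MEM };

struct sketch_vcpu {
    int enabled;
};

struct sketch_guest {
    unsigned long nr_vcpus;
    struct sketch_vcpu *vcpus;
};

static enum vcpu_rc sketch_create_vcpu(struct sketch_guest *g,
                                       unsigned long vcpuid)
{
    if (vcpuid < g->nr_vcpus) {
        return VCPU_IN_USE;
    }
    /* Linear allocation only: the next ID must equal the current count */
    if (vcpuid != g->nr_vcpus || g->nr_vcpus >= SKETCH_VCPU_MAX) {
        return VCPU_BAD_ID;
    }

    struct sketch_vcpu *v = realloc(g->vcpus, (g->nr_vcpus + 1) * sizeof(*v));
    if (!v) {
        return VCPU_NO_MEM;    /* the old array is still valid */
    }
    g->vcpus = v;

    /* Zero the new slot, then publish it (assumed final steps) */
    memset(&v[g->nr_vcpus], 0, sizeof(*v));
    v[g->nr_vcpus].enabled = 1;
    g->nr_vcpus++;
    return VCPU_OK;
}
```

Assigning the realloc() result to a temporary first keeps the original array reachable if the allocation fails, which is the same reason the patch checks vcpus before updating guest->vcpus.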

[PATCH v5 09/14] spapr: nested: Extend nested_ppc_state for nested PAPR API

2024-03-08 Thread Harsh Prateek Bora

Currently, nested_ppc_state stores a certain set of registers and works
with nested_[load|save]_state() for state transfer as required for the
nested-hv API. Extend these with the additional register state required
for the nested PAPR API.

Signed-off-by: Harsh Prateek Bora 
Suggested-by: Nicholas Piggin 
---
 include/hw/ppc/spapr_nested.h |  50 
 target/ppc/cpu.h  |   2 +
 hw/ppc/spapr_nested.c | 106 ++
 3 files changed, 158 insertions(+)

diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index 2ac3076fac..d232014ccb 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -7,6 +7,7 @@ typedef struct SpaprMachineStateNested {
 uint64_t ptcr;
 uint8_t api;
 #define NESTED_API_KVM_HV  1
+#define NESTED_API_PAPR    2
 bool capabilities_set;
 uint32_t pvr_base;
 GHashTable *guests;
@@ -121,6 +122,55 @@ struct nested_ppc_state {
 uint64_t ppr;
 
 int64_t tb_offset;
+/* Nested PAPR API */
+uint64_t amor;
+uint64_t dawr0;
+uint64_t dawrx0;
+uint64_t ciabr;
+uint64_t purr;
+uint64_t spurr;
+uint64_t ic;
+uint64_t vtb;
+uint64_t hdar;
+uint64_t hdsisr;
+uint64_t heir;
+uint64_t asdr;
+uint64_t dawr1;
+uint64_t dawrx1;
+uint64_t dexcr;
+uint64_t hdexcr;
+uint64_t hashkeyr;
+uint64_t hashpkeyr;
+ppc_vsr_t vsr[64] QEMU_ALIGNED(16);
+uint64_t ebbhr;
+uint64_t tar;
+uint64_t ebbrr;
+uint64_t bescr;
+uint64_t iamr;
+uint64_t amr;
+uint64_t uamor;
+uint64_t dscr;
+uint64_t fscr;
+uint64_t pspb;
+uint64_t ctrl;
+uint64_t vrsave;
+uint64_t dar;
+uint64_t dsisr;
+uint64_t pmc1;
+uint64_t pmc2;
+uint64_t pmc3;
+uint64_t pmc4;
+uint64_t pmc5;
+uint64_t pmc6;
+uint64_t mmcr0;
+uint64_t mmcr1;
+uint64_t mmcr2;
+uint64_t mmcra;
+uint64_t sdar;
+uint64_t siar;
+uint64_t sier;
+uint32_t vscr;
+uint64_t fpscr;
 };
 
 typedef struct SpaprMachineStateNestedGuestVcpu {
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 0133da4e07..4cffd46c79 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1755,9 +1755,11 @@ void ppc_compat_add_property(Object *obj, const char *name,
 #define SPR_PSPB  (0x09F)
 #define SPR_DPDES (0x0B0)
 #define SPR_DAWR0 (0x0B4)
+#define SPR_DAWR1 (0x0B5)
 #define SPR_RPR   (0x0BA)
 #define SPR_CIABR (0x0BB)
 #define SPR_DAWRX0 (0x0BC)
+#define SPR_DAWRX1 (0x0BD)
 #define SPR_HFSCR (0x0BE)
 #define SPR_VRSAVE(0x100)
 #define SPR_USPRG0(0x100)
diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index 4c0e2e91e1..09ebf42a57 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -108,6 +108,7 @@ static target_ulong h_copy_tofrom_guest(PowerPCCPU *cpu,
 static void nested_save_state(struct nested_ppc_state *save, PowerPCCPU *cpu)
 {
 CPUPPCState *env = &cpu->env;
+SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
 
 memcpy(save->gpr, env->gpr, sizeof(save->gpr));
 
@@ -134,6 +135,58 @@ static void nested_save_state(struct nested_ppc_state *save, PowerPCCPU *cpu)
 save->pidr = env->spr[SPR_BOOKS_PID];
 save->ppr = env->spr[SPR_PPR];
 
+if (spapr_nested_api(spapr) == NESTED_API_PAPR) {
+save->pvr = env->spr[SPR_PVR];
+save->amor = env->spr[SPR_AMOR];
+save->dawr0 = env->spr[SPR_DAWR0];
+save->dawrx0 = env->spr[SPR_DAWRX0];
+save->ciabr = env->spr[SPR_CIABR];
+save->purr = env->spr[SPR_PURR];
+save->spurr = env->spr[SPR_SPURR];
+save->ic = env->spr[SPR_IC];
+save->vtb = env->spr[SPR_VTB];
+save->hdar = env->spr[SPR_HDAR];
+save->hdsisr = env->spr[SPR_HDSISR];
+save->heir = env->spr[SPR_HEIR];
+save->asdr = env->spr[SPR_ASDR];
+save->dawr1 = env->spr[SPR_DAWR1];
+save->dawrx1 = env->spr[SPR_DAWRX1];
+save->dexcr = env->spr[SPR_DEXCR];
+save->hdexcr = env->spr[SPR_HDEXCR];
+save->hashkeyr = env->spr[SPR_HASHKEYR];
+save->hashpkeyr = env->spr[SPR_HASHPKEYR];
+memcpy(save->vsr, env->vsr, sizeof(save->vsr));
+save->ebbhr = env->spr[SPR_EBBHR];
+save->tar = env->spr[SPR_TAR];
+save->ebbrr = env->spr[SPR_EBBRR];
+save->bescr = env->spr[SPR_BESCR];
+save->iamr = env->spr[SPR_IAMR];
+save->amr = env->spr[SPR_AMR];
+save->uamor = env->spr[SPR_UAMOR];
+save->dscr = env->spr[SPR_DSCR];
+save->fscr = env->spr[SPR_FSCR];
+save->pspb = env->spr[SPR_PSPB];
+save->ctrl = env->

[PATCH v5 01/14] spapr: nested: register nested-hv api hcalls only for cap-nested-hv

2024-03-08 Thread Harsh Prateek Bora

Since cap-nested-hv is an optional capability, it makes sense to register
API-specific hcalls only when the respective capability is enabled. This
requires introducing a new API to unregister hypercalls, to maintain
sanity across guest reboots: caps are re-applied on every reboot, and
re-registration of hypercalls would otherwise hit an assert.
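
To illustrate the register/unregister pairing described above, here is a
minimal standalone sketch. It is not the QEMU code: the demo_* names, the
tiny table size and the opcode 0x8 are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature of a hypercall dispatch table (not QEMU's). */
typedef uint64_t (*demo_hcall_fn)(uint64_t *args);

#define DEMO_MAX_OPCODE 0x10 /* illustrative limit; opcodes are multiples of 4 */

static demo_hcall_fn demo_table[(DEMO_MAX_OPCODE / 4) + 1];

/* Registering an already occupied slot is treated as a programming error. */
static void demo_register(uint64_t opcode, demo_hcall_fn fn)
{
    assert(opcode <= DEMO_MAX_OPCODE && (opcode & 0x3) == 0);
    assert(demo_table[opcode / 4] == NULL);
    demo_table[opcode / 4] = fn;
}

/* Clearing the slot makes a later re-registration legal again. */
static void demo_unregister(uint64_t opcode)
{
    assert(opcode <= DEMO_MAX_OPCODE && (opcode & 0x3) == 0);
    demo_table[opcode / 4] = NULL;
}

static int demo_is_registered(uint64_t opcode)
{
    return opcode <= DEMO_MAX_OPCODE && (opcode & 0x3) == 0 &&
           demo_table[opcode / 4] != NULL;
}

/* Placeholder handler: returns its first argument plus one. */
static uint64_t demo_handler(uint64_t *args)
{
    return args[0] + 1;
}

/* What a machine reset would do while the capability is enabled. */
static void demo_reset(void)
{
    demo_unregister(0x8);
    demo_register(0x8, demo_handler);
}
```

Calling demo_reset() twice in a row is safe here only because the slot is
cleared first; without the unregister step the second call would trip the
assert in demo_register().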

Signed-off-by: Harsh Prateek Bora 
Reviewed-by: Nicholas Piggin 
---
 include/hw/ppc/spapr.h|  4 
 include/hw/ppc/spapr_nested.h |  1 -
 hw/ppc/spapr.c|  1 +
 hw/ppc/spapr_hcall.c  | 24 ++--
 hw/ppc/spapr_nested.c | 25 +++--
 5 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 5b5ba9ef77..78a736297b 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -631,6 +631,7 @@ typedef target_ulong (*spapr_hcall_fn)(PowerPCCPU *cpu, 
SpaprMachineState *sm,
target_ulong *args);
 
 void spapr_register_hypercall(target_ulong opcode, spapr_hcall_fn fn);
+void spapr_unregister_hypercall(target_ulong opcode);
 target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
  target_ulong *args);
 
@@ -1028,5 +1029,8 @@ void spapr_vof_client_dt_finalize(SpaprMachineState 
*spapr, void *fdt);
 
 /* H_WATCHDOG */
 void spapr_watchdog_init(SpaprMachineState *spapr);
+void spapr_register_nested_hv(void);
+void spapr_unregister_nested_hv(void);
+void spapr_nested_reset(SpaprMachineState *spapr);
 
 #endif /* HW_SPAPR_H */
diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index d312a5d61d..09d95182b2 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -95,7 +95,6 @@ struct nested_ppc_state {
 int64_t tb_offset;
 };
 
-void spapr_register_nested(void);
 void spapr_exit_nested(PowerPCCPU *cpu, int excp);
 
 #endif /* HW_SPAPR_NESTED_H */
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 55263f0815..0d3c740c5b 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1689,6 +1689,7 @@ static void spapr_machine_reset(MachineState *machine, 
ShutdownCause reason)
 
 pef_kvm_reset(machine->cgs, &error_fatal);
 spapr_caps_apply(spapr);
+spapr_nested_reset(spapr);
 
 first_ppc_cpu = POWERPC_CPU(first_cpu);
 if (kvm_enabled() && kvmppc_has_cap_mmu_radix() &&
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 75c2d12978..5e1d020e3d 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1525,6 +1525,28 @@ void spapr_register_hypercall(target_ulong opcode, 
spapr_hcall_fn fn)
 *slot = fn;
 }
 
+void spapr_unregister_hypercall(target_ulong opcode)
+{
+spapr_hcall_fn *slot;
+
+if (opcode <= MAX_HCALL_OPCODE) {
+assert((opcode & 0x3) == 0);
+
+slot = &papr_hypercall_table[opcode / 4];
+} else if (opcode >= SVM_HCALL_BASE && opcode <= SVM_HCALL_MAX) {
+/* we only have SVM-related hcall numbers assigned in multiples of 4 */
+assert((opcode & 0x3) == 0);
+
+slot = &svm_hypercall_table[(opcode - SVM_HCALL_BASE) / 4];
+} else {
+assert((opcode >= KVMPPC_HCALL_BASE) && (opcode <= KVMPPC_HCALL_MAX));
+
+slot = &kvmppc_hypercall_table[opcode - KVMPPC_HCALL_BASE];
+}
+
+*slot = NULL;
+}
+
 target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
  target_ulong *args)
 {
@@ -1638,8 +1660,6 @@ static void hypercall_register_types(void)
 spapr_register_hypercall(KVMPPC_H_CAS, h_client_architecture_support);
 
 spapr_register_hypercall(KVMPPC_H_UPDATE_DT, h_update_dt);
-
-spapr_register_nested();
 }
 
 type_init(hypercall_register_types)
diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index 121aa96ddc..8e0ee0d22f 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -7,6 +7,14 @@
 #include "hw/ppc/spapr_cpu_core.h"
 #include "hw/ppc/spapr_nested.h"
 
+void spapr_nested_reset(SpaprMachineState *spapr)
+{
+if (spapr_get_cap(spapr, SPAPR_CAP_NESTED_KVM_HV)) {
+spapr_unregister_nested_hv();
+spapr_register_nested_hv();
+}
+}
+
 #ifdef CONFIG_TCG
 #define PRTS_MASK  0x1f
 
@@ -375,20 +383,33 @@ void spapr_exit_nested(PowerPCCPU *cpu, int excp)
 address_space_unmap(CPU(cpu)->as, regs, len, len, true);
 }
 
-void spapr_register_nested(void)
+void spapr_register_nested_hv(void)
 {
 spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
 spapr_register_hypercall(KVMPPC_H_ENTER_NESTED, h_enter_nested);
 spapr_register_hypercall(KVMPPC_H_TLB_INVALIDATE, h_tlb_invalidate);
 spapr_register_hypercall(KVMPPC_H_COPY_TOFROM_GUEST, h_copy_tofrom_guest);
 }
+
+void spapr_unregister_nested_hv(void)
+{
+spapr_unregister_hypercall(KVMPPC_H_SET_PARTITION_TABLE);
+spapr_unregister_hypercall(KVMPPC_H_ENTER_NESTED);
+spa

[PATCH v5 03/14] spapr: nested: Introduce SpaprMachineStateNested to store related info.

2024-03-08 Thread Harsh Prateek Bora

Currently, nested_ptcr is used by the existing nested-hv API to store
nested guest related info. This needs to be organised so that support can
be extended to the nested PAPR API, which will need to store additional
info related to nested guests in the next series of patches.

Signed-off-by: Michael Neuling 
Signed-off-by: Harsh Prateek Bora 
Reviewed-by: Nicholas Piggin 
---
 include/hw/ppc/spapr.h| 3 ++-
 include/hw/ppc/spapr_nested.h | 5 +
 hw/ppc/spapr_nested.c | 8 
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 78a736297b..0eb01ea6fd 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -12,6 +12,7 @@
 #include "hw/ppc/spapr_xive.h"  /* For SpaprXive */
 #include "hw/ppc/xics.h"/* For ICSState */
 #include "hw/ppc/spapr_tpm_proxy.h"
+#include "hw/ppc/spapr_nested.h" /* For SpaprMachineStateNested */
 
 struct SpaprVioBus;
 struct SpaprPhbState;
@@ -213,7 +214,7 @@ struct SpaprMachineState {
 uint32_t vsmt;   /* Virtual SMT mode (KVM's "core stride") */
 
 /* Nested HV support (TCG only) */
-uint64_t nested_ptcr;
+SpaprMachineStateNested nested;
 
 Notifier epow_notifier;
 QTAILQ_HEAD(, SpaprEventLogEntry) pending_events;
diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index 1df1ce14f6..2488ea98da 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -3,6 +3,10 @@
 
 #include "target/ppc/cpu.h"
 
+typedef struct SpaprMachineStateNested {
+uint64_t ptcr;
+} SpaprMachineStateNested;
+
 /*
  * Register state for entering a nested guest with H_ENTER_NESTED.
  * New member must be added at the end.
@@ -96,6 +100,7 @@ struct nested_ppc_state {
 };
 
 void spapr_exit_nested(PowerPCCPU *cpu, int excp);
+typedef struct SpaprMachineState SpaprMachineState;
 bool spapr_get_pate_nested_hv(SpaprMachineState *spapr, PowerPCCPU *cpu,
   target_ulong lpid, ppc_v3_pate_t *entry);
 #endif /* HW_SPAPR_NESTED_H */
diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index f7888ca8bd..c2a33fc3a9 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -25,8 +25,8 @@ bool spapr_get_pate_nested_hv(SpaprMachineState *spapr, 
PowerPCCPU *cpu,
 
 assert(lpid != 0);
 
-patb = spapr->nested_ptcr & PTCR_PATB;
-pats = spapr->nested_ptcr & PTCR_PATS;
+patb = spapr->nested.ptcr & PTCR_PATB;
+pats = spapr->nested.ptcr & PTCR_PATS;
 
 /* Check if partition table is properly aligned */
 if (patb & MAKE_64BIT_MASK(0, pats + 12)) {
@@ -63,7 +63,7 @@ static target_ulong h_set_ptbl(PowerPCCPU *cpu,
 return H_PARAMETER;
 }
 
-spapr->nested_ptcr = ptcr; /* Save new partition table */
+spapr->nested.ptcr = ptcr; /* Save new partition table */
 
 return H_SUCCESS;
 }
@@ -195,7 +195,7 @@ static target_ulong h_enter_nested(PowerPCCPU *cpu,
 struct kvmppc_pt_regs *regs;
 hwaddr len;
 
-if (spapr->nested_ptcr == 0) {
+if (spapr->nested.ptcr == 0) {
 return H_NOT_AVAILABLE;
 }
 
-- 
2.39.3

[PATCH v5 10/14] spapr: nested: Initialize the GSB elements lookup table.

2024-03-08 Thread Harsh Prateek Bora

The Nested PAPR API provides a standard Guest State Buffer (GSB) format
with a unique ID for each guest state element for which get/set state is
supported by the API. Some of the elements are read-only and/or guest-wide.
Introduce the additional required GSB elements, and helper routines for
state exchange of each nested guest state element for which get/set state
should be supported by the API.
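
As a rough picture of what such a lookup table could look like, here is a
standalone sketch. The struct layout, the flag names and the choice of
which elements are read-only are assumptions made for illustration; only
the element IDs (0x0001, 0x1000, 0x1001, 0x1023, 0x102F) are taken from
the defines in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-vCPU state; field names are illustrative only. */
struct demo_vcpu_state {
    uint64_t gpr[32];
    uint64_t lr;
    uint64_t fpscr;
    uint64_t state_size;
};

#define DEMO_FLAG_READ_ONLY  0x1 /* assumed flag: element cannot be set */
#define DEMO_FLAG_GUEST_WIDE 0x2 /* assumed flag: not per-vCPU */

/* One table row: element ID, payload size, location in the state struct. */
struct demo_gsb_elem {
    uint16_t id;
    uint16_t size;
    size_t offset;
    unsigned flags;
};

static const struct demo_gsb_elem demo_lookup[] = {
    { 0x0001, 8, offsetof(struct demo_vcpu_state, state_size),
      DEMO_FLAG_READ_ONLY | DEMO_FLAG_GUEST_WIDE },
    { 0x1000, 8, offsetof(struct demo_vcpu_state, gpr), 0 },
    { 0x1001, 8, offsetof(struct demo_vcpu_state, gpr) + 8, 0 },
    { 0x1023, 8, offsetof(struct demo_vcpu_state, lr), 0 },
    { 0x102F, 8, offsetof(struct demo_vcpu_state, fpscr), 0 },
};

/* Linear search by element ID; returns NULL for unknown IDs. */
static const struct demo_gsb_elem *demo_find(uint16_t id)
{
    for (size_t i = 0; i < sizeof(demo_lookup) / sizeof(demo_lookup[0]); i++) {
        if (demo_lookup[i].id == id) {
            return &demo_lookup[i];
        }
    }
    return NULL;
}

/* Set one element; rejects unknown IDs, read-only IDs and size mismatches. */
static int demo_set(struct demo_vcpu_state *st, uint16_t id,
                    const void *val, uint16_t size)
{
    const struct demo_gsb_elem *e = demo_find(id);

    assert(st != NULL && val != NULL);
    if (!e || (e->flags & DEMO_FLAG_READ_ONLY) || e->size != size) {
        return -1;
    }
    memcpy((uint8_t *)st + e->offset, val, size);
    return 0;
}
```

Driving get/set through one table keeps the per-element policy (size,
location, read-only) in a single place instead of a long switch statement.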

[amachhiw: set the PCR whenever logical PVR is set]

Signed-off-by: Michael Neuling 
Signed-off-by: Shivaprasad G Bhat 
Signed-off-by: Amit Machhiwal 
Signed-off-by: Harsh Prateek Bora 
---
 include/hw/ppc/spapr_nested.h | 312 ++
 hw/ppc/spapr_nested.c | 486 +-
 2 files changed, 796 insertions(+), 2 deletions(-)

diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index d232014ccb..433d93c480 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -3,6 +3,191 @@
 
 #include "target/ppc/cpu.h"
 
+/* Guest State Buffer Element IDs */
+#define GSB_HV_VCPU_IGNORED_ID  0x0000 /* An element whose value is ignored */
+#define GSB_HV_VCPU_STATE_SIZE  0x0001 /* HV internal format VCPU state size */
+#define GSB_VCPU_OUT_BUF_MIN_SZ 0x0002 /* Min size of the Run VCPU o/p buffer */
+#define GSB_VCPU_LPVR   0x0003 /* Logical PVR */
+#define GSB_TB_OFFSET   0x0004 /* Timebase Offset */
+#define GSB_PART_SCOPED_PAGETBL 0x0005 /* Partition Scoped Page Table */
+#define GSB_PROCESS_TBL 0x0006 /* Process Table */
+/* RESERVED 0x0007 - 0x0BFF */
+#define GSB_VCPU_IN_BUFFER  0x0C00 /* Run VCPU Input Buffer */
+#define GSB_VCPU_OUT_BUFFER 0x0C01 /* Run VCPU Out Buffer */
+#define GSB_VCPU_VPA0x0C02 /* HRA to Guest VCPU VPA */
+/* RESERVED 0x0C03 - 0x0FFF */
+#define GSB_VCPU_GPR0   0x1000
+#define GSB_VCPU_GPR1   0x1001
+#define GSB_VCPU_GPR2   0x1002
+#define GSB_VCPU_GPR3   0x1003
+#define GSB_VCPU_GPR4   0x1004
+#define GSB_VCPU_GPR5   0x1005
+#define GSB_VCPU_GPR6   0x1006
+#define GSB_VCPU_GPR7   0x1007
+#define GSB_VCPU_GPR8   0x1008
+#define GSB_VCPU_GPR9   0x1009
+#define GSB_VCPU_GPR10  0x100A
+#define GSB_VCPU_GPR11  0x100B
+#define GSB_VCPU_GPR12  0x100C
+#define GSB_VCPU_GPR13  0x100D
+#define GSB_VCPU_GPR14  0x100E
+#define GSB_VCPU_GPR15  0x100F
+#define GSB_VCPU_GPR16  0x1010
+#define GSB_VCPU_GPR17  0x1011
+#define GSB_VCPU_GPR18  0x1012
+#define GSB_VCPU_GPR19  0x1013
+#define GSB_VCPU_GPR20  0x1014
+#define GSB_VCPU_GPR21  0x1015
+#define GSB_VCPU_GPR22  0x1016
+#define GSB_VCPU_GPR23  0x1017
+#define GSB_VCPU_GPR24  0x1018
+#define GSB_VCPU_GPR25  0x1019
+#define GSB_VCPU_GPR26  0x101A
+#define GSB_VCPU_GPR27  0x101B
+#define GSB_VCPU_GPR28  0x101C
+#define GSB_VCPU_GPR29  0x101D
+#define GSB_VCPU_GPR30  0x101E
+#define GSB_VCPU_GPR31  0x101F
+#define GSB_VCPU_HDEC_EXPIRY_TB 0x1020
+#define GSB_VCPU_SPR_NIA0x1021
+#define GSB_VCPU_SPR_MSR0x1022
+#define GSB_VCPU_SPR_LR 0x1023
+#define GSB_VCPU_SPR_XER0x1024
+#define GSB_VCPU_SPR_CTR0x1025
+#define GSB_VCPU_SPR_CFAR   0x1026
+#define GSB_VCPU_SPR_SRR0   0x1027
+#define GSB_VCPU_SPR_SRR1   0x1028
+#define GSB_VCPU_SPR_DAR0x1029
+#define GSB_VCPU_DEC_EXPIRE_TB  0x102A
+#define GSB_VCPU_SPR_VTB0x102B
+#define GSB_VCPU_SPR_LPCR   0x102C
+#define GSB_VCPU_SPR_HFSCR  0x102D
+#define GSB_VCPU_SPR_FSCR   0x102E
+#define GSB_VCPU_SPR_FPSCR  0x102F
+#define GSB_VCPU_SPR_DAWR0  0x1030
+#define GSB_VCPU_SPR_DAWR1  0x1031
+#define GSB_VCPU_SPR_CIABR  0x1032
+#define GSB_VCPU_SPR_PURR   0x1033
+#define GSB_VCPU_SPR_SPURR  0x1034
+#define GSB_VCPU_SPR_IC 0x1035
+#define GSB_VCPU_SPR_SPRG0  0x1036
+#define GSB_VCPU_SPR_SPRG1  0x1037
+#define GSB_VCPU_SPR_SPRG2  0x1038
+#define GSB_VCPU_SPR_SPRG3  0x1039
+#define GSB_VCPU_SPR_PPR0x103A
+#define GSB_VCPU_SPR_MMCR0  0x103B
+#define GSB_VCPU_SPR_MMCR1  0x103C
+#define GSB_VCPU_SPR_MMCR2  0x103D
+#define GSB_VCPU_SPR_MMCR3  0x103E
+#define GSB_VCPU_SPR_MMCRA  0x103F
+#define GSB_VCPU_SPR_SIER   0x1040
+#define GSB_VCPU_SPR_SIER2  0x1041
+#define GSB_VCPU_SPR_SIER3  0x1042
+#define GSB_VCPU_SPR_BESCR  0x1043
+#define GSB_VCPU_SPR_EBBHR  0x1044
+#define GSB_VCPU_SPR_EBBRR  0x1045
+#define GSB_VCPU_SPR_AMR0x1046
+#define GSB_VCPU_SPR_IAMR   0x1047
+#define GSB_VCPU_SPR_AMOR   0x1048
+#define GSB_VCPU_SPR_UAMOR  0x1049
+#define GSB_VCPU_SPR_SDAR   0x104A
+#define GSB_VCPU_SPR_SIAR   0x104B
+#define GSB_VCPU_SPR_DSCR   0x104C
+#define GSB_VCPU_SPR_TAR0x104D
+#define GSB_VCPU

[PATCH v5 06/14] spapr: nested: Introduce H_GUEST_[GET|SET]_CAPABILITIES hcalls.

2024-03-08 Thread Harsh Prateek Bora

Introduce the nested PAPR hcalls:
 - H_GUEST_GET_CAPABILITIES which is used to query the capabilities
   of the API and the L2 guests it provides.
 - H_GUEST_SET_CAPABILITIES which is used to set the Guest API
   capabilities that the Host Partition supports and may use.
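
The get/set handshake can be pictured with the standalone sketch below.
The bit positions, return codes and the rule that an unadvertised
capability is rejected are assumptions for illustration; only the flags
check and the rejection of the copy-memory capability mirror what the
patch shows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit values (assumed); the real masks are in spapr_nested.h. */
#define DEMO_CAP_COPY_MEM (1ULL << 63)
#define DEMO_CAP_P9_MODE  (1ULL << 62)
#define DEMO_CAP_P10_MODE (1ULL << 61)

/* Hypothetical return codes for this sketch only. */
enum demo_rc { DEMO_H_SUCCESS = 0, DEMO_H_PARAMETER = 1, DEMO_H_P2 = 2 };

/* Query: report what the L0 supports; no flags are accepted. */
static enum demo_rc demo_get_caps(uint64_t l0_supported, uint64_t flags,
                                  uint64_t *out)
{
    assert(out != NULL);
    if (flags) {
        return DEMO_H_PARAMETER;
    }
    *out = l0_supported;
    return DEMO_H_SUCCESS;
}

/* Set: the L1 states which of the advertised capabilities it may use. */
static enum demo_rc demo_set_caps(uint64_t l0_supported, uint64_t flags,
                                  uint64_t requested, uint64_t *accepted)
{
    assert(accepted != NULL);
    if (flags) {
        return DEMO_H_PARAMETER;
    }
    if (requested & DEMO_CAP_COPY_MEM) {
        return DEMO_H_P2; /* copy-memory is not supported */
    }
    if (requested & ~l0_supported) {
        return DEMO_H_P2; /* assumed: unadvertised capability is rejected */
    }
    *accepted = requested;
    return DEMO_H_SUCCESS;
}
```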

[amachhiw: support for p9 compat mode and return register bug fixes]

Signed-off-by: Michael Neuling 
Signed-off-by: Amit Machhiwal 
Signed-off-by: Harsh Prateek Bora 
---
 include/hw/ppc/spapr.h|   7 ++-
 include/hw/ppc/spapr_nested.h |  12 
 hw/ppc/spapr_nested.c | 112 ++
 3 files changed, 130 insertions(+), 1 deletion(-)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 0eb01ea6fd..2906d59137 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -364,6 +364,7 @@ struct SpaprMachineState {
 #define H_NOOP-63
 #define H_UNSUPPORTED -67
 #define H_OVERLAP -68
+#define H_STATE   -75
 #define H_UNSUPPORTED_FLAG -256
 #define H_MULTI_THREADS_ACTIVE -9005
 
@@ -583,8 +584,10 @@ struct SpaprMachineState {
 #define H_RPT_INVALIDATE0x448
 #define H_SCM_FLUSH 0x44C
 #define H_WATCHDOG  0x45C
+#define H_GUEST_GET_CAPABILITIES 0x460
+#define H_GUEST_SET_CAPABILITIES 0x464
 
-#define MAX_HCALL_OPCODEH_WATCHDOG
+#define MAX_HCALL_OPCODE H_GUEST_SET_CAPABILITIES
 
 /* The hcalls above are standardized in PAPR and implemented by pHyp
  * as well.
@@ -1033,5 +1036,7 @@ void spapr_watchdog_init(SpaprMachineState *spapr);
 void spapr_register_nested_hv(void);
 void spapr_unregister_nested_hv(void);
 void spapr_nested_reset(SpaprMachineState *spapr);
+void spapr_register_nested_papr(void);
+void spapr_unregister_nested_papr(void);
 
 #endif /* HW_SPAPR_H */
diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index bf3a7b8d89..73687e03e4 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -7,8 +7,20 @@ typedef struct SpaprMachineStateNested {
 uint64_t ptcr;
 uint8_t api;
 #define NESTED_API_KVM_HV  1
+bool capabilities_set;
+uint32_t pvr_base;
 } SpaprMachineStateNested;
 
+/* Nested PAPR API related macros */
+#define H_GUEST_CAPABILITIES_COPY_MEM 0x8000
+#define H_GUEST_CAPABILITIES_P9_MODE  0x4000
+#define H_GUEST_CAPABILITIES_P10_MODE 0x2000
+#define H_GUEST_CAP_VALID_MASK(H_GUEST_CAPABILITIES_P10_MODE | \
+   H_GUEST_CAPABILITIES_P9_MODE)
+#define H_GUEST_CAP_COPY_MEM_BMAP 0
+#define H_GUEST_CAP_P9_MODE_BMAP  1
+#define H_GUEST_CAP_P10_MODE_BMAP 2
+
 /*
  * Register state for entering a nested guest with H_ENTER_NESTED.
  * New member must be added at the end.
diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index 12fdbe2aba..601f669060 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -7,6 +7,7 @@
 #include "hw/ppc/spapr_cpu_core.h"
 #include "hw/ppc/spapr_nested.h"
 #include "mmu-book3s-v3.h"
+#include "cpu-models.h"
 
 void spapr_nested_reset(SpaprMachineState *spapr)
 {
@@ -16,6 +17,7 @@ void spapr_nested_reset(SpaprMachineState *spapr)
 spapr_register_nested_hv();
 } else {
 spapr->nested.api = 0;
+spapr->nested.capabilities_set = false;
 }
 }
 
@@ -432,6 +434,92 @@ void spapr_exit_nested(PowerPCCPU *cpu, int excp)
 }
 }
 
+static target_ulong h_guest_get_capabilities(PowerPCCPU *cpu,
+ SpaprMachineState *spapr,
+ target_ulong opcode,
+ target_ulong *args)
+{
+CPUPPCState *env = &cpu->env;
+target_ulong flags = args[0];
+
+if (flags) { /* don't handle any flags capabilities for now */
+return H_PARAMETER;
+}
+
+/* P10 capabilities */
+if (ppc_check_compat(cpu, CPU_POWERPC_LOGICAL_3_10, 0,
+spapr->max_compat_pvr)) {
+env->gpr[4] |= H_GUEST_CAPABILITIES_P10_MODE;
+}
+
+/* P9 capabilities */
+if (ppc_check_compat(cpu, CPU_POWERPC_LOGICAL_3_00, 0,
+spapr->max_compat_pvr)) {
+env->gpr[4] |= H_GUEST_CAPABILITIES_P9_MODE;
+}
+
+return H_SUCCESS;
+}
+
+static target_ulong h_guest_set_capabilities(PowerPCCPU *cpu,
+ SpaprMachineState *spapr,
+ target_ulong opcode,
+  target_ulong *args)
+{
+CPUPPCState *env = &cpu->env;
+target_ulong flags = args[0];
+target_ulong capabilities = args[1];
+env->gpr[4] = 0;
+
+if (flags) { /* don't handle any flags capabilities for now */
+return H_PARAMETER;
+}
+
+if (capabilities & H_GUEST_CAPABILITIES_COPY_MEM) {
+env->gpr[4] = 1;
+return H_P2; /* isn't supported */
+}

[PATCH v5 00/14] Nested PAPR API (KVM on PowerVM)

2024-03-08 Thread Harsh Prateek Bora

There is an existing Nested-HV API to enable nested guests on powernv
machines. However, that is not supported on pseries/PowerVM LPARs.
This patch series implements the required hcall interfaces to enable
nested guests with KVM on PowerVM.
Unlike Nested-HV, with this API the entire L2 state is retained by L0
across guest entry/exit, and a pre-defined Guest State Buffer (GSB)
format is used to communicate guest state between L1 and L2 via L0.
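
For readers new to the GSB, the sketch below walks a buffer under an
assumed layout: a 4-byte big-endian element count, followed by elements
that each carry a 2-byte ID, a 2-byte length and the payload. That layout
and all demo_* names are assumptions for this example; the documentation
patch in the series is the authoritative description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Big-endian readers for the assumed on-the-wire layout. */
static uint16_t demo_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t demo_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

typedef void (*demo_elem_cb)(uint16_t id, const uint8_t *data, uint16_t len,
                             void *opaque);

/* Visit every element; returns the element count or -1 if malformed. */
static int demo_gsb_walk(const uint8_t *buf, size_t buflen,
                         demo_elem_cb cb, void *opaque)
{
    uint32_t n;
    size_t pos = 4;

    assert(buf != NULL);
    if (buflen < 4) {
        return -1;
    }
    n = demo_be32(buf);
    for (uint32_t i = 0; i < n; i++) {
        uint16_t id, len;

        if (buflen - pos < 4) {
            return -1; /* truncated element header */
        }
        id = demo_be16(buf + pos);
        len = demo_be16(buf + pos + 2);
        pos += 4;
        if (buflen - pos < len) {
            return -1; /* truncated payload */
        }
        if (cb) {
            cb(id, buf + pos, len, opaque);
        }
        pos += len;
    }
    return (int)n;
}

/* Example callback: accumulate the total payload size. */
static void demo_sum_len(uint16_t id, const uint8_t *data, uint16_t len,
                         void *opaque)
{
    (void)id;
    (void)data;
    *(size_t *)opaque += len;
}
```

Every length is checked against the remaining buffer before it is used,
since the buffer contents come from the L1 and cannot be trusted.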

L0 here refers to phyp/PowerVM, or to QEMU TCG launched as L0 with the
newly introduced option cap-nested-papr=true.
L1 refers to the LPAR host on PowerVM, or to Linux booted on QEMU TCG with
the above-mentioned option cap-nested-papr=true.
L2 refers to a nested guest running on top of L1 using KVM.
No software changes are needed for QEMU running in L1 Linux or for the
L2 kernel.

Linux Kernel side support is already merged upstream:
---
commit 19d31c5f115754c369c0995df47479c384757f82
Author: Jordan Niethe 
Date:   Thu Sep 14 13:05:59 2023 +1000

KVM: PPC: Add support for nestedv2 guests
---
For more details, refer to the documentation included in either patch
series.

There are scripts available to assist in setting up an environment for
testing nested guests at https://github.com/iamjpn/kvm-powervm-test

A tree with this series is available at:
https://github.com/planetharsh/qemu/tree/upstream-0305-v5

Thanks to Michael Neuling, Shivaprasad Bhat, Amit Machhiwal, Kautuk
Consul, Vaibhav Jain and Jordan Niethe.

Changelog:
v5: addressed review comments from Nick on v4
v4: https://lore.kernel.org/qemu-devel/20240220083609.748325-1-hars...@linux.ibm.com/
v3: https://lore.kernel.org/qemu-devel/20240118052438.1475437-1-hars...@linux.ibm.com/
v2: https://lore.kernel.org/qemu-devel/20231012104951.194876-1-hars...@linux.ibm.com/
v1: https://lore.kernel.org/qemu-devel/2023090604.448244-1-hars...@linux.ibm.com/

Harsh Prateek Bora (14):
  spapr: nested: register nested-hv api hcalls only for cap-nested-hv
  spapr: nested: move nested part of spapr_get_pate into spapr_nested.c
  spapr: nested: Introduce SpaprMachineStateNested to store related
info.
  spapr: nested: keep nested-hv related code restricted to its API.
  spapr: nested: Document Nested PAPR API
  spapr: nested: Introduce H_GUEST_[GET|SET]_CAPABILITIES hcalls.
  spapr: nested: Introduce H_GUEST_[CREATE|DELETE] hcalls.
  spapr: nested: Introduce H_GUEST_CREATE_VCPU hcall.
  spapr: nested: Extend nested_ppc_state for nested PAPR API
  spapr: nested: Initialize the GSB elements lookup table.
  spapr: nested: Introduce H_GUEST_[GET|SET]_STATE hcalls.
  spapr: nested: Use correct source for parttbl info for nested PAPR
API.
  spapr: nested: Introduce H_GUEST_RUN_VCPU hcall.
  spapr: nested: Introduce cap-nested-papr for Nested PAPR API

 docs/devel/nested-papr.txt|  119 +++
 include/hw/ppc/spapr.h|   27 +-
 include/hw/ppc/spapr_nested.h |  428 -
 target/ppc/cpu.h  |4 +
 hw/ppc/ppc.c  |   10 +
 hw/ppc/spapr.c|   35 +-
 hw/ppc/spapr_caps.c   |   62 ++
 hw/ppc/spapr_hcall.c  |   24 +-
 hw/ppc/spapr_nested.c | 1550 -
 9 files changed, 2204 insertions(+), 55 deletions(-)
 create mode 100644 docs/devel/nested-papr.txt

-- 
2.39.3

[PATCH v5 12/14] spapr: nested: Use correct source for parttbl info for nested PAPR API.

2024-03-08 Thread Harsh Prateek Bora

For the nested PAPR API, partition table info is stored in the
SpaprMachineStateNestedGuest struct; use the same in
spapr_get_pate_nested_papr() via a helper.
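
The lookup path can be sketched in isolation as below. The fixed-size
array and the demo_* types are stand-ins chosen for this example and are
not the data structures used by the patch; the point is only that the
partition table entry comes from per-guest state rather than from memory
pointed to by a PTCR value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct demo_pate {
    uint64_t dw0;
    uint64_t dw1;
};

/* Hypothetical per-guest record holding the two partition table words. */
struct demo_guest {
    uint64_t id;
    bool in_use;
    uint64_t parttbl[2];
};

#define DEMO_MAX_GUESTS 4

struct demo_machine {
    struct demo_guest guests[DEMO_MAX_GUESTS];
};

/* Find a guest by ID; returns NULL if no such guest exists. */
static struct demo_guest *demo_get_guest(struct demo_machine *m, uint64_t id)
{
    for (size_t i = 0; i < DEMO_MAX_GUESTS; i++) {
        if (m->guests[i].in_use && m->guests[i].id == id) {
            return &m->guests[i];
        }
    }
    return NULL;
}

/* Fill the entry from the guest's stored partition table info. */
static bool demo_get_pate(struct demo_machine *m, uint64_t lpid,
                          struct demo_pate *entry)
{
    struct demo_guest *guest;

    assert(lpid != 0);
    guest = demo_get_guest(m, lpid);
    if (!guest) {
        return false;
    }
    entry->dw0 = guest->parttbl[0];
    entry->dw1 = guest->parttbl[1];
    return true;
}
```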

Signed-off-by: Michael Neuling 
Signed-off-by: Harsh Prateek Bora 
---
 include/hw/ppc/spapr_nested.h |  4 
 hw/ppc/spapr.c|  6 --
 hw/ppc/spapr_nested.c | 22 +-
 3 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index bd43c6b6ef..152019fe3d 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -518,4 +518,8 @@ bool spapr_get_pate_nested_hv(SpaprMachineState *spapr, 
PowerPCCPU *cpu,
   target_ulong lpid, ppc_v3_pate_t *entry);
 uint8_t spapr_nested_api(SpaprMachineState *spapr);
 void spapr_nested_gsb_init(void);
+bool spapr_get_pate_nested_papr(SpaprMachineState *spapr, PowerPCCPU *cpu,
+target_ulong lpid, ppc_v3_pate_t *entry);
+SpaprMachineStateNestedGuest *spapr_get_nested_guest(SpaprMachineState *spapr,
+ target_ulong lpid);
 #endif /* HW_SPAPR_NESTED_H */
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index a08ffe55b6..54fc01e462 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1376,11 +1376,13 @@ static bool spapr_get_pate(PPCVirtualHypervisor *vhyp, 
PowerPCCPU *cpu,
 entry->dw1 = spapr->patb_entry;
 return true;
 } else {
-assert(spapr_nested_api(spapr));
 if (spapr_nested_api(spapr) == NESTED_API_KVM_HV) {
 return spapr_get_pate_nested_hv(spapr, cpu, lpid, entry);
+} else if (spapr_nested_api(spapr) == NESTED_API_PAPR) {
+return spapr_get_pate_nested_papr(spapr, cpu, lpid, entry);
+} else {
+g_assert_not_reached();
 }
-return false;
 }
 }
 
diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index ca99805ce8..e0b234c786 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -58,6 +58,21 @@ bool spapr_get_pate_nested_hv(SpaprMachineState *spapr, 
PowerPCCPU *cpu,
 return true;
 }
 
+bool spapr_get_pate_nested_papr(SpaprMachineState *spapr, PowerPCCPU *cpu,
+target_ulong lpid, ppc_v3_pate_t *entry)
+{
+SpaprMachineStateNestedGuest *guest;
+assert(lpid != 0);
+guest = spapr_get_nested_guest(spapr, lpid);
+if (!guest) {
+return false;
+}
+
+entry->dw0 = guest->parttbl[0];
+entry->dw1 = guest->parttbl[1];
+return true;
+}
+
 #define PRTS_MASK  0x1f
 
 static target_ulong h_set_ptbl(PowerPCCPU *cpu,
@@ -540,7 +555,6 @@ void spapr_exit_nested(PowerPCCPU *cpu, int excp)
 }
 }
 
-static
 SpaprMachineStateNestedGuest *spapr_get_nested_guest(SpaprMachineState *spapr,
  target_ulong guestid)
 {
@@ -1585,6 +1599,12 @@ bool spapr_get_pate_nested_hv(SpaprMachineState *spapr, 
PowerPCCPU *cpu,
 return false;
 }
 
+bool spapr_get_pate_nested_papr(SpaprMachineState *spapr, PowerPCCPU *cpu,
+target_ulong lpid, ppc_v3_pate_t *entry)
+{
+return false;
+}
+
 void spapr_register_nested_papr(void)
 {
 /* DO NOTHING */
-- 
2.39.3

[PATCH v5 14/14] spapr: nested: Introduce cap-nested-papr for Nested PAPR API

2024-03-08 Thread Harsh Prateek Bora

Introduce a SPAPR capability, cap-nested-papr, which enables the nested
PAPR API for nested guests. This new API enables support for KVM on
PowerVM; the Linux kernel side support has already been merged upstream.
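
Because the two nested APIs cannot be active together, each capability's
apply hook has to check what the other one selected. A minimal standalone
sketch of that rule follows; the enum values and function name are
hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical API selector; NONE means no nested API has been chosen. */
enum demo_api {
    DEMO_API_NONE = 0,
    DEMO_API_KVM_HV = 1,
    DEMO_API_PAPR = 2
};

/*
 * Select a nested API. Re-selecting the same API is fine (caps are
 * re-applied on reset); selecting a different one is an error.
 * Returns 0 on success, -1 on conflict.
 */
static int demo_select_api(enum demo_api *current, enum demo_api wanted)
{
    assert(current != NULL);
    if (*current != DEMO_API_NONE && *current != wanted) {
        return -1; /* the two nested APIs are mutually exclusive */
    }
    *current = wanted;
    return 0;
}
```

In this model, enabling both cap-nested-hv and cap-nested-papr would make
the second selection fail, which is where the real code reports an error
to the user.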

Signed-off-by: Michael Neuling 
Signed-off-by: Harsh Prateek Bora 
---
 include/hw/ppc/spapr.h |  6 +++-
 hw/ppc/spapr.c |  2 ++
 hw/ppc/spapr_caps.c| 62 ++
 hw/ppc/spapr_nested.c  |  8 --
 4 files changed, 74 insertions(+), 4 deletions(-)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 6223873641..4aaf23d28f 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -81,8 +81,10 @@ typedef enum {
 #define SPAPR_CAP_RPT_INVALIDATE0x0B
 /* Support for AIL modes */
 #define SPAPR_CAP_AIL_MODE_30x0C
+/* Nested PAPR */
+#define SPAPR_CAP_NESTED_PAPR   0x0D
 /* Num Caps */
-#define SPAPR_CAP_NUM   (SPAPR_CAP_AIL_MODE_3 + 1)
+#define SPAPR_CAP_NUM   (SPAPR_CAP_NESTED_PAPR + 1)
 
 /*
  * Capability Values
@@ -592,6 +594,7 @@ struct SpaprMachineState {
 #define H_GUEST_CREATE_VCPU  0x474
 #define H_GUEST_GET_STATE0x478
 #define H_GUEST_SET_STATE0x47C
+#define H_GUEST_RUN_VCPU 0x480
 #define H_GUEST_DELETE   0x488
 
 #define MAX_HCALL_OPCODE H_GUEST_DELETE
@@ -996,6 +999,7 @@ extern const VMStateDescription vmstate_spapr_cap_sbbc;
 extern const VMStateDescription vmstate_spapr_cap_ibs;
 extern const VMStateDescription vmstate_spapr_cap_hpt_maxpagesize;
 extern const VMStateDescription vmstate_spapr_cap_nested_kvm_hv;
+extern const VMStateDescription vmstate_spapr_cap_nested_papr;
 extern const VMStateDescription vmstate_spapr_cap_large_decr;
 extern const VMStateDescription vmstate_spapr_cap_ccf_assist;
 extern const VMStateDescription vmstate_spapr_cap_fwnmi;
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 54fc01e462..beb23fae8f 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2121,6 +2121,7 @@ static const VMStateDescription vmstate_spapr = {
 &vmstate_spapr_cap_fwnmi,
 &vmstate_spapr_fwnmi,
 &vmstate_spapr_cap_rpt_invalidate,
+&vmstate_spapr_cap_nested_papr,
 NULL
 }
 };
@@ -4687,6 +4688,7 @@ static void spapr_machine_class_init(ObjectClass *oc, 
void *data)
 smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_WORKAROUND;
 smc->default_caps.caps[SPAPR_CAP_HPT_MAXPAGESIZE] = 16; /* 64kiB */
 smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
+smc->default_caps.caps[SPAPR_CAP_NESTED_PAPR] = SPAPR_CAP_OFF;
 smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
 smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_ON;
 smc->default_caps.caps[SPAPR_CAP_FWNMI] = SPAPR_CAP_ON;
diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index e889244e52..d6d5a6b8df 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -487,6 +487,58 @@ static void cap_nested_kvm_hv_apply(SpaprMachineState 
*spapr,
 error_append_hint(errp, "Try appending -machine cap-nested-hv=off "
 "or use threads=1 with -smp\n");
 }
+if (spapr_nested_api(spapr) &&
+spapr_nested_api(spapr) != NESTED_API_KVM_HV) {
+error_setg(errp, "Nested-HV APIs are mutually exclusive/incompatible");
+error_append_hint(errp, "Please use either cap-nested-hv or "
+"cap-nested-papr to proceed.\n");
+return;
+} else {
+spapr->nested.api = NESTED_API_KVM_HV;
+}
+}
+}
+
+static void cap_nested_papr_apply(SpaprMachineState *spapr,
+uint8_t val, Error **errp)
+{
+ERRP_GUARD();
+PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
+CPUPPCState *env = &cpu->env;
+
+if (!val) {
+/* capability disabled by default */
+return;
+}
+
+if (tcg_enabled()) {
+if (!(env->insns_flags2 & PPC2_ISA300)) {
+error_setg(errp, "Nested-PAPR only supported on POWER9 and later");
+error_append_hint(errp,
+  "Try appending -machine cap-nested-papr=off\n");
+return;
+}
+if (spapr_nested_api(spapr) &&
+spapr_nested_api(spapr) != NESTED_API_PAPR) {
+error_setg(errp, "Nested-HV APIs are mutually exclusive/incompatible");
+error_append_hint(errp, "Please use either cap-nested-hv or "
+"cap-nested-papr to proceed.\n");
+return;
+} else {
+spapr->nested.api = NESTED_API_PAPR;
+}
+
+} else if (kvm_enabled()) {
+/*
+ * this gets executed in L1 qemu when L2 is launched,
+ * needs kvm-hv support in L1 kernel.
+

[PATCH v5 02/14] spapr: nested: move nested part of spapr_get_pate into spapr_nested.c

2024-03-08 Thread Harsh Prateek Bora

Most of the nested code has already been moved to spapr_nested.c.
The logic inside spapr_get_pate that relates to nested guests is
better suited to spapr_nested.c, hence move it there.
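
The arithmetic in the moved code is easy to check in isolation. The sketch
below computes only the address of a partition table entry, without
touching guest memory. The two mask values are assumptions about the PTCR
field layout, and demo_pate_addr() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed PTCR field masks: table base address and table size field. */
#define DEMO_PTCR_PATB 0x0FFFFFFFFFFFF000ULL
#define DEMO_PTCR_PATS 0x000000000000001FULL

/* Compute the address of the entry for lpid; false if out of range. */
static bool demo_pate_addr(uint64_t ptcr, uint64_t lpid, uint64_t *addr)
{
    uint64_t patb = ptcr & DEMO_PTCR_PATB;
    uint64_t pats = ptcr & DEMO_PTCR_PATS;
    uint64_t entries;

    assert(lpid != 0 && addr != NULL);

    /* The table is 2^(pats + 12) bytes and must be naturally aligned. */
    if (patb & ((1ULL << (pats + 12)) - 1)) {
        return false;
    }

    /* Each entry is 16 bytes, so there are 2^(pats + 12 - 4) entries. */
    entries = 1ULL << (pats + 12 - 4);
    if (entries <= lpid) {
        return false;
    }

    *addr = patb + 16 * lpid;
    return true;
}
```

For example, a table at 0x10000 with a size field of 4 spans 64 KiB and
holds 4096 entries, so lpid 5 maps to address 0x10050.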

Signed-off-by: Harsh Prateek Bora 
Reviewed-by: Nicholas Piggin 
---
 include/hw/ppc/spapr_nested.h |  3 ++-
 hw/ppc/spapr.c| 28 ++-
 hw/ppc/spapr_nested.c | 36 +++
 3 files changed, 40 insertions(+), 27 deletions(-)

diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index 09d95182b2..1df1ce14f6 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -96,5 +96,6 @@ struct nested_ppc_state {
 };
 
 void spapr_exit_nested(PowerPCCPU *cpu, int excp);
-
+bool spapr_get_pate_nested_hv(SpaprMachineState *spapr, PowerPCCPU *cpu,
+  target_ulong lpid, ppc_v3_pate_t *entry);
 #endif /* HW_SPAPR_NESTED_H */
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 0d3c740c5b..65d766b898 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1362,7 +1362,6 @@ void spapr_init_all_lpcrs(target_ulong value, 
target_ulong mask)
 }
 }
 
-
 static bool spapr_get_pate(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu,
target_ulong lpid, ppc_v3_pate_t *entry)
 {
@@ -1375,33 +1374,10 @@ static bool spapr_get_pate(PPCVirtualHypervisor *vhyp, 
PowerPCCPU *cpu,
 /* Copy PATE1:GR into PATE0:HR */
 entry->dw0 = spapr->patb_entry & PATE0_HR;
 entry->dw1 = spapr->patb_entry;
-
+return true;
 } else {
-uint64_t patb, pats;
-
-assert(lpid != 0);
-
-patb = spapr->nested_ptcr & PTCR_PATB;
-pats = spapr->nested_ptcr & PTCR_PATS;
-
-/* Check if partition table is properly aligned */
-if (patb & MAKE_64BIT_MASK(0, pats + 12)) {
-return false;
-}
-
-/* Calculate number of entries */
-pats = 1ull << (pats + 12 - 4);
-if (pats <= lpid) {
-return false;
-}
-
-/* Grab entry */
-patb += 16 * lpid;
-entry->dw0 = ldq_phys(CPU(cpu)->as, patb);
-entry->dw1 = ldq_phys(CPU(cpu)->as, patb + 8);
+return spapr_get_pate_nested_hv(spapr, cpu, lpid, entry);
 }
-
-return true;
 }
 
 #define HPTE(_table, _i)   (void *)(((uint64_t *)(_table)) + ((_i) * 2))
diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index 8e0ee0d22f..f7888ca8bd 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -6,6 +6,7 @@
 #include "hw/ppc/spapr.h"
 #include "hw/ppc/spapr_cpu_core.h"
 #include "hw/ppc/spapr_nested.h"
+#include "mmu-book3s-v3.h"
 
 void spapr_nested_reset(SpaprMachineState *spapr)
 {
@@ -16,6 +17,35 @@ void spapr_nested_reset(SpaprMachineState *spapr)
 }
 
 #ifdef CONFIG_TCG
+
+bool spapr_get_pate_nested_hv(SpaprMachineState *spapr, PowerPCCPU *cpu,
+  target_ulong lpid, ppc_v3_pate_t *entry)
+{
+uint64_t patb, pats;
+
+assert(lpid != 0);
+
+patb = spapr->nested_ptcr & PTCR_PATB;
+pats = spapr->nested_ptcr & PTCR_PATS;
+
+/* Check if partition table is properly aligned */
+if (patb & MAKE_64BIT_MASK(0, pats + 12)) {
+return false;
+}
+
+/* Calculate number of entries */
+pats = 1ull << (pats + 12 - 4);
+if (pats <= lpid) {
+return false;
+}
+
+/* Grab entry */
+patb += 16 * lpid;
+entry->dw0 = ldq_phys(CPU(cpu)->as, patb);
+entry->dw1 = ldq_phys(CPU(cpu)->as, patb + 8);
+return true;
+}
+
 #define PRTS_MASK  0x1f
 
 static target_ulong h_set_ptbl(PowerPCCPU *cpu,
@@ -413,4 +443,10 @@ void spapr_unregister_nested_hv(void)
 {
 /* DO NOTHING */
 }
+
+bool spapr_get_pate_nested_hv(SpaprMachineState *spapr, PowerPCCPU *cpu,
+  target_ulong lpid, ppc_v3_pate_t *entry)
+{
+return false;
+}
 #endif
-- 
2.39.3

[PATCH v5 13/14] spapr: nested: Introduce H_GUEST_RUN_VCPU hcall.

2024-03-08 Thread Harsh Prateek Bora

The H_GUEST_RUN_VCPU hcall is used to start execution of a Guest VCPU.
The Hypervisor will update the state of the Guest VCPU based on the
input buffer, restore the saved Guest VCPU state, and start its
execution.

The Guest VCPU can stop running for numerous reasons including HCALLs,
hypervisor exceptions, or an outstanding Host Partition Interrupt.
The reason that the Guest VCPU stopped running is communicated through
R4 and the output buffer will be filled in with any relevant state.
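
One piece of state that has to be switched around a guest run is the
timebase offset, which is what the two small helpers added to hw/ppc/ppc.c
are for. The sketch below models that under the assumption that the
guest's offset is added on entry and subtracted again on exit; the demo_*
names and the demo_read_tb() model of the timebase are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the timebase environment. */
struct demo_tb_env {
    int64_t tb_offset;
};

static void demo_increase_tb(struct demo_tb_env *tb, int64_t offset)
{
    tb->tb_offset += offset;
}

static void demo_decrease_tb(struct demo_tb_env *tb, int64_t offset)
{
    tb->tb_offset -= offset;
}

/* Timebase as seen by the running context: host ticks plus offset. */
static uint64_t demo_read_tb(const struct demo_tb_env *tb, uint64_t host_ticks)
{
    return host_ticks + (uint64_t)tb->tb_offset;
}

/* Assumed pairing: apply the guest offset on entry ... */
static void demo_enter_guest(struct demo_tb_env *tb, int64_t guest_offset)
{
    assert(tb != NULL);
    demo_increase_tb(tb, guest_offset);
}

/* ... and undo it on exit so the L1 sees its own timebase again. */
static void demo_exit_guest(struct demo_tb_env *tb, int64_t guest_offset)
{
    assert(tb != NULL);
    demo_decrease_tb(tb, guest_offset);
}
```

Using relative adjustments instead of saving and restoring an absolute
value means the L1's own offset does not need to be stored separately.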

Signed-off-by: Michael Neuling 
Signed-off-by: Harsh Prateek Bora 
---
 target/ppc/cpu.h  |   2 +
 hw/ppc/ppc.c  |  10 ++
 hw/ppc/spapr_nested.c | 334 ++
 3 files changed, 316 insertions(+), 30 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 4cffd46c79..95b7c86eb3 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1584,6 +1584,8 @@ uint64_t cpu_ppc_load_atbl(CPUPPCState *env);
 uint32_t cpu_ppc_load_atbu(CPUPPCState *env);
 void cpu_ppc_store_atbl(CPUPPCState *env, uint32_t value);
 void cpu_ppc_store_atbu(CPUPPCState *env, uint32_t value);
+void cpu_ppc_increase_tb_by_offset (CPUPPCState *env, int64_t offset);
+void cpu_ppc_decrease_tb_by_offset (CPUPPCState *env, int64_t offset);
 uint64_t cpu_ppc_load_vtb(CPUPPCState *env);
 void cpu_ppc_store_vtb(CPUPPCState *env, uint64_t value);
 bool ppc_decr_clear_on_delivery(CPUPPCState *env);
diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index fadb8f5239..55860b9a83 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -633,6 +633,16 @@ void cpu_ppc_store_atbu (CPUPPCState *env, uint32_t value)
  ((uint64_t)value << 32) | tb);
 }
 
+void cpu_ppc_increase_tb_by_offset (CPUPPCState *env, int64_t offset)
+{
+env->tb_env->tb_offset += offset;
+}
+
+void cpu_ppc_decrease_tb_by_offset (CPUPPCState *env, int64_t offset)
+{
+env->tb_env->tb_offset -= offset;
+}
+
 uint64_t cpu_ppc_load_vtb(CPUPPCState *env)
 {
 ppc_tb_t *tb_env = env->tb_env;
diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index e0b234c786..597dba7fdc 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -200,14 +200,28 @@ static void nested_save_state(struct nested_ppc_state *save, PowerPCCPU *cpu)
 save->sier = env->spr[SPR_POWER_SIER];
 save->vscr = ppc_get_vscr(env);
 save->fpscr = env->fpscr;
+} else if (spapr_nested_api(spapr) == NESTED_API_KVM_HV) {
+save->tb_offset = env->tb_env->tb_offset;
 }
+}
 
-save->tb_offset = env->tb_env->tb_offset;
+static void nested_post_load_state(CPUPPCState *env, CPUState *cs)
+{
+/*
+ * compute hflags and possible interrupts.
+ */
+hreg_compute_hflags(env);
+ppc_maybe_interrupt(env);
+/*
+ * Nested HV does not tag TLB entries between L1 and L2, so must
+ * flush on transition.
+ */
+tlb_flush(cs);
+env->reserve_addr = -1; /* Reset the reservation */
 }
 
 static void nested_load_state(PowerPCCPU *cpu, struct nested_ppc_state *load)
 {
-CPUState *cs = CPU(cpu);
 CPUPPCState *env = &cpu->env;
 SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
 
@@ -285,22 +299,9 @@ static void nested_load_state(PowerPCCPU *cpu, struct nested_ppc_state *load)
 env->spr[SPR_POWER_SIER] = load->sier;
 ppc_store_vscr(env, load->vscr);
 ppc_store_fpscr(env, load->fpscr);
+} else if (spapr_nested_api(spapr) == NESTED_API_KVM_HV) {
+env->tb_env->tb_offset = load->tb_offset;
 }
-
-env->tb_env->tb_offset = load->tb_offset;
-
-/*
- * MSR updated, compute hflags and possible interrupts.
- */
-hreg_compute_hflags(env);
-ppc_maybe_interrupt(env);
-
-/*
- * Nested HV does not tag TLB entries between L1 and L2, so must
- * flush on transition.
- */
-tlb_flush(cs);
-env->reserve_addr = -1; /* Reset the reservation */
 }
 
 /*
@@ -315,6 +316,7 @@ static target_ulong h_enter_nested(PowerPCCPU *cpu,
 {
 PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
 CPUPPCState *env = &cpu->env;
+CPUState *cs = CPU(cpu);
 SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
 struct nested_ppc_state l2_state;
 target_ulong hv_ptr = args[0];
@@ -413,6 +415,7 @@ static target_ulong h_enter_nested(PowerPCCPU *cpu,
  * Switch to the nested guest environment and start the "hdec" timer.
  */
 nested_load_state(cpu, &l2_state);
+nested_post_load_state(env, cs);
 
 hdec = hv_state.hdec_expiry - now;
 cpu_ppc_hdecr_init(env);
@@ -444,6 +447,7 @@ static target_ulong h_enter_nested(PowerPCCPU *cpu,
 static void spapr_exit_nested_hv(PowerPCCPU *cpu, int excp)
 {
 CPUPPCState *env = &cpu->env;
+CPUState *cs = CPU(cpu);
 SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
 struct nested_ppc_state l2_state;
 target_ulong hv_ptr = spapr_cpu->nested_host_state->gpr[4];
@@ -

Re: [PATCH v4 15/15] spapr: nested: Set the PCR when logical PVR is set

2024-03-05 Thread Harsh Prateek Bora





On 2/27/24 15:53, Nicholas Piggin wrote:

On Tue Feb 20, 2024 at 6:36 PM AEST, Harsh Prateek Bora wrote:

From: Amit Machhiwal 

In APIv1, KVM L0 sets the PCR, while in the nested papr APIv2, this
doesn't work as the PCR can't be set via the guest state buffer; the
logical PVR is set via the GSB though.

This change sets the PCR whenever the logical PVR is set via the GSB.
Also, unlike the other registers, the value 1 in a defined bit in the
PCR makes the affected resources unavailable and the value 0 makes
them available. Hence, the PCR is set accordingly.
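
The inverted sense of the PCR bits is easy to get wrong, so here is a minimal sketch of the arithmetic this patch performs in copy_logical_pvr(). The helper name pcr_for_logical_pvr() is hypothetical, and the bit positions and logical PVR values are assumptions declared locally so the example stands alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, declared here only for illustration. */
#define PCR_COMPAT_3_00          (1ULL << 4)
#define PCR_COMPAT_3_10          (1ULL << 5)
#define CPU_POWERPC_LOGICAL_3_00 0x0F000005u
#define CPU_POWERPC_LOGICAL_3_10 0x0F000006u

#define PCR_LOW_BITS (PCR_COMPAT_3_10 | PCR_COMPAT_3_00)
#define HVMASK_PCR   (~PCR_LOW_BITS)

/*
 * Hypothetical helper: derive the PCR value for a logical PVR.
 * A 1 in a defined PCR bit makes the resource unavailable, so the
 * bits for the levels the guest may use are cleared and every other
 * bit is set. Returns false for an unrecognised logical PVR.
 */
static bool pcr_for_logical_pvr(uint32_t pvr_logical, uint64_t *pcr_out)
{
    uint64_t pcr;

    switch (pvr_logical) {
    case 0: /* no logical PVR set */
        pcr = 0;
        break;
    case CPU_POWERPC_LOGICAL_3_10:
        pcr = PCR_COMPAT_3_10 | PCR_COMPAT_3_00;
        break;
    case CPU_POWERPC_LOGICAL_3_00:
        pcr = PCR_COMPAT_3_00;
        break;
    default:
        return false;
    }

    *pcr_out = ~pcr | HVMASK_PCR;
    return true;
}
```

Under these assumed bit positions, a v3.0 logical PVR clears only the PCR_COMPAT_3_00 bit, while a v3.10 logical PVR clears both low bits.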


Should this be squashed in as a fix?


Yes, it can be squashed into the 10/15 GSB initialization patch. I will update
as suggested in v5.


regards,
Harsh


Thanks,
Nick



Signed-off-by: Amit Machhiwal 
Signed-off-by: Harsh Prateek Bora 
---
  include/hw/ppc/spapr_nested.h |  9 +
  hw/ppc/spapr_nested.c | 24 
  2 files changed, 33 insertions(+)

diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index da918d2dd0..f67c721f53 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -229,6 +229,15 @@ typedef struct SpaprMachineStateNestedGuest {
  #define GUEST_STATE_REQUEST_GUEST_WIDE   0x1
  #define GUEST_STATE_REQUEST_SET  0x2
  
+/* As per ISA v3.1B, following bits are reserved:

+ *  0:2
+ *  4:57  (ISA mentions bit 58 as well but it should be used for P10)
+ *  61:63 (hence, haven't included PCR bits for v2.06 and v2.05
+ * in LOW BITS)
+ */
+#define PCR_LOW_BITS   (PCR_COMPAT_3_10 | PCR_COMPAT_3_00)
+#define HVMASK_PCR ~PCR_LOW_BITS
+
  #define GUEST_STATE_ELEMENT(i, sz, s, f, ptr, c) { \
  .id = (i), \
  .size = (sz),  \
diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index 6e6a90616e..af8a482337 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -740,9 +740,11 @@ static void out_buf_min_size(void *a, void *b, bool set)
  
  static void copy_logical_pvr(void *a, void *b, bool set)

  {
+SpaprMachineStateNestedGuest *guest;
  uint32_t *buf; /* 1 word */
  uint32_t *pvr_logical_ptr;
  uint32_t pvr_logical;
+target_ulong pcr = 0;
  
  pvr_logical_ptr = a;

  buf = b;
@@ -755,6 +757,28 @@ static void copy_logical_pvr(void *a, void *b, bool set)
  pvr_logical = be32_to_cpu(buf[0]);
  
  *pvr_logical_ptr = pvr_logical;

+
+if (*pvr_logical_ptr) {
+switch (*pvr_logical_ptr) {
+case CPU_POWERPC_LOGICAL_3_10:
+pcr = PCR_COMPAT_3_10 | PCR_COMPAT_3_00;
+break;
+case CPU_POWERPC_LOGICAL_3_00:
+pcr = PCR_COMPAT_3_00;
+break;
+default:
+qemu_log_mask(LOG_GUEST_ERROR,
+"Could not set PCR for LPVR=0x%08x\n", *pvr_logical_ptr);
+return;
+}
+}
+
+guest = container_of(pvr_logical_ptr,
+ struct SpaprMachineStateNestedGuest,
+ pvr_logical);
+for (int i = 0; i < guest->vcpus; i++) {
+guest->vcpu[i].state.pcr = ~pcr | HVMASK_PCR;
+}
  }
  
  static void copy_tb_offset(void *a, void *b, bool set)

Re: [PATCH v4 14/15] spapr: nested: Introduce cap-nested-papr for Nested PAPR API

2024-03-05 Thread Harsh Prateek Bora





On 2/27/24 15:52, Nicholas Piggin wrote:

On Tue Feb 20, 2024 at 6:36 PM AEST, Harsh Prateek Bora wrote:

Introduce a SPAPR capability cap-nested-papr which enables the nested PAPR
API for nested guests. This new API enables support for KVM on PowerVM,
and the support in the Linux kernel has already been merged upstream.

Signed-off-by: Michael Neuling 
Signed-off-by: Harsh Prateek Bora 
---
  include/hw/ppc/spapr.h |  6 -
  hw/ppc/spapr.c |  2 ++
  hw/ppc/spapr_caps.c| 56 ++
  hw/ppc/spapr_nested.c  | 19 --
  4 files changed, 80 insertions(+), 3 deletions(-)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 036a7db2bc..1b1d37123a 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -81,8 +81,10 @@ typedef enum {
  #define SPAPR_CAP_RPT_INVALIDATE 0x0B
  /* Support for AIL modes */
  #define SPAPR_CAP_AIL_MODE_3    0x0C
+/* Nested PAPR */
+#define SPAPR_CAP_NESTED_PAPR   0x0D
  /* Num Caps */
-#define SPAPR_CAP_NUM   (SPAPR_CAP_AIL_MODE_3 + 1)
+#define SPAPR_CAP_NUM   (SPAPR_CAP_NESTED_PAPR + 1)
  
  /*

   * Capability Values
@@ -994,6 +996,7 @@ extern const VMStateDescription vmstate_spapr_cap_sbbc;
  extern const VMStateDescription vmstate_spapr_cap_ibs;
  extern const VMStateDescription vmstate_spapr_cap_hpt_maxpagesize;
  extern const VMStateDescription vmstate_spapr_cap_nested_kvm_hv;
+extern const VMStateDescription vmstate_spapr_cap_nested_papr;
  extern const VMStateDescription vmstate_spapr_cap_large_decr;
  extern const VMStateDescription vmstate_spapr_cap_ccf_assist;
  extern const VMStateDescription vmstate_spapr_cap_fwnmi;
@@ -1041,5 +1044,6 @@ void spapr_watchdog_init(SpaprMachineState *spapr);
  void spapr_register_nested_hv(void);
  void spapr_unregister_nested_hv(void);
  void spapr_register_nested_papr(void);
+void spapr_unregister_nested_papr(void);
  
  #endif /* HW_SPAPR_H */

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 3453b30a57..cb556ae6a8 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2120,6 +2120,7 @@ static const VMStateDescription vmstate_spapr = {
  &vmstate_spapr_cap_fwnmi,
  &vmstate_spapr_fwnmi,
  &vmstate_spapr_cap_rpt_invalidate,
+&vmstate_spapr_cap_nested_papr,
  NULL
  }
  };
@@ -4688,6 +4689,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
  smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_WORKAROUND;
  smc->default_caps.caps[SPAPR_CAP_HPT_MAXPAGESIZE] = 16; /* 64kiB */
  smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
+smc->default_caps.caps[SPAPR_CAP_NESTED_PAPR] = SPAPR_CAP_OFF;
  smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
  smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_ON;
  smc->default_caps.caps[SPAPR_CAP_FWNMI] = SPAPR_CAP_ON;
diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index 721ddad23b..9a29ce1872 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -487,12 +487,58 @@ static void cap_nested_kvm_hv_apply(SpaprMachineState *spapr,
  error_append_hint(errp, "Try appending -machine cap-nested-hv=off "
  "or use threads=1 with -smp\n");
  }
+if (spapr->nested.api) {
+warn_report("nested.api already set as %d, re-init to kvm-hv",
+spapr->nested.api);
+}


Does this warning trigger when you reset the machine?

It's trying to catch both caps enabled? I would make that an error and
fail and tell user to enable only one or the other.

(In a future patch I think we should try permit both to be enabled at
the same time, but for now restricting it is fine)


Yes, we had kept them mutually exclusive initially in v1, and it looks like
we want them to remain exclusive for now. Future possibilities can be explored
later as suggested.
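
A minimal sketch of what the exclusive behaviour could look like, turning the warning into a hard failure as suggested. The helper nested_api_select() and the enum are hypothetical stand-ins, not QEMU code; allowing the same API to be selected again (so that a machine reset does not trip the check) is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

enum nested_api {
    NESTED_API_NONE = 0,
    NESTED_API_KVM_HV = 1,
    NESTED_API_PAPR = 2,
};

/*
 * Hypothetical helper: record which nested API a capability selects.
 * Fails, leaving *current untouched, if a different API was already
 * selected. Re-selecting the same API succeeds.
 */
static bool nested_api_select(enum nested_api *current,
                              enum nested_api requested,
                              char *err, size_t errlen)
{
    if (*current != NESTED_API_NONE && *current != requested) {
        snprintf(err, errlen,
                 "cap-nested-hv and cap-nested-papr are mutually "
                 "exclusive, enable only one of them");
        return false;
    }

    *current = requested;
    return true;
}
```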






  spapr->nested.api = NESTED_API_KVM_HV;
  spapr_unregister_nested_hv(); /* reset across reboots */
  spapr_register_nested_hv();
  }
  }
  
+static void cap_nested_papr_apply(SpaprMachineState *spapr,

+uint8_t val, Error **errp)
+{
+ERRP_GUARD();
+PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
+CPUPPCState *env = &cpu->env;
+
+if (!val) {
+/* capability disabled by default */
+return;
+}
+
+if (tcg_enabled()) {
+if (!(env->insns_flags2 & PPC2_ISA300)) {
+error_setg(errp, "Nested-PAPR only supported on POWER9 and later");
+error_append_hint(errp,
+  "Try appending -machine cap-nested-papr=off\n");
+return;
+}
+} else if (kvm_enabled()) {
+/*
+ * this gets executed in L1 qemu when L2 is launched,
+ * needs kvm-hv support in L1 kernel.

Re: [PATCH v4 12/15] spapr: nested: Use correct source for parttbl info for nested PAPR API.

2024-03-05 Thread Harsh Prateek Bora


Hi Nick,

On 2/27/24 15:46, Nicholas Piggin wrote:

On Tue Feb 20, 2024 at 6:36 PM AEST, Harsh Prateek Bora wrote:

For the nested PAPR API, we use the SpaprMachineStateNestedGuest struct to store
partition table info; use the same in spapr_get_pate_nested() via a
helper.

Signed-off-by: Michael Neuling 
Signed-off-by: Harsh Prateek Bora 
---
  include/hw/ppc/spapr_nested.h |  4 
  hw/ppc/spapr.c|  2 ++
  hw/ppc/spapr_nested.c | 20 +++-
  3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index 1b7e55f12a..da918d2dd0 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -511,4 +511,8 @@ bool spapr_get_pate_nested_hv(SpaprMachineState *spapr, PowerPCCPU *cpu,
  void spapr_nested_init(SpaprMachineState *spapr);
  uint8_t spapr_nested_api(SpaprMachineState *spapr);
  void spapr_nested_gsb_init(void);
+bool spapr_get_pate_nested_papr(SpaprMachineState *spapr, PowerPCCPU *cpu,
+target_ulong lpid, ppc_v3_pate_t *entry);
+SpaprMachineStateNestedGuest *spapr_get_nested_guest(SpaprMachineState *spapr,
+ target_ulong lpid);
  #endif /* HW_SPAPR_NESTED_H */
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 51a1be027a..3453b30a57 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1379,6 +1379,8 @@ static bool spapr_get_pate(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu,
  assert(spapr_nested_api(spapr));
  if (spapr_nested_api(spapr) == NESTED_API_KVM_HV) {
  return spapr_get_pate_nested_hv(spapr, cpu, lpid, entry);
+} else if (spapr_nested_api(spapr) == NESTED_API_PAPR) {
+return spapr_get_pate_nested_papr(spapr, cpu, lpid, entry);
  }
  return false;
  }


BTW. I would change these asserts to } else { g_assert_not_reached(); }


Sure, updating as suggested.




diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index aba4b25da6..0edb362709 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -52,6 +52,19 @@ bool spapr_get_pate_nested_hv(SpaprMachineState *spapr, PowerPCCPU *cpu,
  return true;
  }
  
+bool spapr_get_pate_nested_papr(SpaprMachineState *spapr, PowerPCCPU *cpu,

+target_ulong lpid, ppc_v3_pate_t *entry)
+{
+SpaprMachineStateNestedGuest *guest;
+assert(lpid != 0);
+guest = spapr_get_nested_guest(spapr, lpid);
+assert(guest != NULL);
+
+entry->dw0 = guest->parttbl[0];
+entry->dw1 = guest->parttbl[1];
+return true;
+}


Asserts should not need to be changed to proper error handling, right?


Hmm, changing the !guest check to return false, as that is more appropriate.
The lpid check shall remain an assert.
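
A minimal sketch of how the function might look after that change. The types and the fixed-size lookup table here are hypothetical stand-ins for ppc_v3_pate_t, SpaprMachineStateNestedGuest and the guest hash table, so that the example stands alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for ppc_v3_pate_t and SpaprMachineStateNestedGuest. */
typedef struct { uint64_t dw0, dw1; } pate_t;
typedef struct { uint64_t parttbl[2]; } guest_t;

/* Hypothetical lookup: a tiny fixed table instead of a hash table. */
#define MAX_GUESTS 4
static guest_t *guest_table[MAX_GUESTS];

static guest_t *get_nested_guest(uint64_t lpid)
{
    return lpid < MAX_GUESTS ? guest_table[lpid] : NULL;
}

static bool get_pate_nested_papr(uint64_t lpid, pate_t *entry)
{
    guest_t *guest;

    /* lpid 0 here would be a caller bug, so this stays an assert. */
    assert(lpid != 0);

    /* An unknown guest is reported to the caller, not asserted on. */
    guest = get_nested_guest(lpid);
    if (!guest) {
        return false;
    }

    entry->dw0 = guest->parttbl[0];
    entry->dw1 = guest->parttbl[1];
    return true;
}
```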

regards,
Harsh



Thanks,
Nick

Re: [PATCH v4 11/15] spapr: nested: Introduce H_GUEST_[GET|SET]_STATE hcalls.

2024-03-05 Thread Harsh Prateek Bora


Hi Nick,

On 2/27/24 15:40, Nicholas Piggin wrote:

On Tue Feb 20, 2024 at 6:36 PM AEST, Harsh Prateek Bora wrote:

Introduce the nested PAPR hcalls:
 - H_GUEST_GET_STATE which is used to get state of a nested guest or
   a guest VCPU. The value field for each element in the request is
   ignored and on success, will be updated to reflect current state.


This is a bit hard to parse. The value fields are destinations for
values to be stored (from the point of view of the caller), which is
familiar to most get or read type APIs.

I don't think you need to say it's ignored. The value it contains is
ignored and overwritten, but the field itself is not actually ignored :)

Patch looks okay though.


Sure, updating the commit log to make it more clear.




 - H_GUEST_SET_STATE which is used to modify the state of a guest or
   a guest VCPU. On success, guest (or its VCPU) state shall be
   updated as per the value field for the requested element(s).
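
Both hcalls exchange a guest state buffer laid out as in the structs added by this patch: a big-endian element count, followed by packed records of a 16-bit id, a 16-bit size and size bytes of value. A minimal sketch of walking such a buffer with bounds checks follows. gsb_walk() and the visitor type are hypothetical names, not part of the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t rd_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rd_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}

typedef bool (*gsb_visit_fn)(uint16_t id, uint16_t size,
                             const uint8_t *value, void *opaque);

/*
 * Hypothetical walker over buf[0..len). Returns false if the buffer is
 * truncated relative to its element count or if the visitor rejects
 * an element.
 */
static bool gsb_walk(const uint8_t *buf, size_t len,
                     gsb_visit_fn visit, void *opaque)
{
    size_t off = 4;
    uint32_t n;

    if (len < 4) {
        return false;
    }
    n = rd_be32(buf);

    while (n--) {
        uint16_t id, size;

        if (len - off < 4) {          /* no room for id + size */
            return false;
        }
        id = rd_be16(buf + off);
        size = rd_be16(buf + off + 2);
        off += 4;

        if (len - off < size) {       /* value runs past the buffer */
            return false;
        }
        if (!visit(id, size, buf + off, opaque)) {
            return false;
        }
        off += size;
    }
    return true;
}

/* Example visitor: totals the value bytes seen. */
static bool gsb_sum_sizes(uint16_t id, uint16_t size,
                          const uint8_t *value, void *opaque)
{
    (void)id;
    (void)value;
    *(size_t *)opaque += size;
    return true;
}
```

A get-state handler would fill each value in place from the visitor, while a set-state handler would read it. Either way the size field covers only the value bytes, not the 4-byte element header.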

Signed-off-by: Michael Neuling 
Signed-off-by: Harsh Prateek Bora 
---
  include/hw/ppc/spapr.h|   3 +
  include/hw/ppc/spapr_nested.h |  23 +++
  hw/ppc/spapr_nested.c | 267 ++
  3 files changed, 293 insertions(+)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 82b077bdd2..aabc32f268 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -366,6 +366,7 @@ struct SpaprMachineState {
  #define H_OVERLAP -68
  #define H_STATE   -75
  #define H_IN_USE  -77
+#define H_INVALID_ELEMENT_VALUE -81
  #define H_UNSUPPORTED_FLAG -256
  #define H_MULTI_THREADS_ACTIVE -9005
  
@@ -589,6 +590,8 @@ struct SpaprMachineState {

  #define H_GUEST_SET_CAPABILITIES 0x464
  #define H_GUEST_CREATE   0x470
  #define H_GUEST_CREATE_VCPU  0x474
+#define H_GUEST_GET_STATE    0x478
+#define H_GUEST_SET_STATE    0x47C
  #define H_GUEST_DELETE   0x488
  
  #define MAX_HCALL_OPCODE H_GUEST_DELETE

diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index 492302a21d..1b7e55f12a 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -224,6 +224,10 @@ typedef struct SpaprMachineStateNestedGuest {
  #define HVMASK_MSR0xEBBFEFFF
  #define HVMASK_HDEXCR 0x
  #define HVMASK_TB_OFFSET  0x00FF
+#define GSB_MAX_BUF_SIZE  (1024 * 1024)
+#define H_GUEST_GETSET_STATE_FLAG_GUEST_WIDE 0x8000
+#define GUEST_STATE_REQUEST_GUEST_WIDE   0x1
+#define GUEST_STATE_REQUEST_SET  0x2
  
  #define GUEST_STATE_ELEMENT(i, sz, s, f, ptr, c) { \

  .id = (i), \
@@ -312,6 +316,25 @@ typedef struct SpaprMachineStateNestedGuest {
  #define GSE_ENV_DWM(i, f, m) \
  GUEST_STATE_ELEMENT_MSK(i, 8, f, copy_state_8to8, m)
  
+struct guest_state_element {

+uint16_t id;
+uint16_t size;
+uint8_t value[];
+} QEMU_PACKED;
+
+struct guest_state_buffer {
+uint32_t num_elements;
+struct guest_state_element elements[];
+} QEMU_PACKED;
+
+/* Actual buffer plus some metadata about the request */
+struct guest_state_request {
+struct guest_state_buffer *gsb;
+int64_t buf;
+int64_t len;
+uint16_t flags;
+};
+
  /*
   * Register state for entering a nested guest with H_ENTER_NESTED.
   * New member must be added at the end.
diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index faba27dd50..aba4b25da6 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -8,6 +8,7 @@
  #include "hw/ppc/spapr_nested.h"
  #include "mmu-book3s-v3.h"
  #include "cpu-models.h"
+#include "qemu/log.h"
  
  void spapr_nested_init(SpaprMachineState *spapr)

  {
@@ -999,6 +1000,140 @@ void spapr_nested_gsb_init(void)
  }
  }
  
+static struct guest_state_element *guest_state_element_next(

+struct guest_state_element *element,
+int64_t *len,
+int64_t *num_elements)
+{
+uint16_t size;
+
+/* size is of element->value[] only. Not whole guest_state_element */
+size = be16_to_cpu(element->size);
+
+if (len) {
+*len -= size + offsetof(struct guest_state_element, value);
+}
+
+if (num_elements) {
+*num_elements -= 1;
+}
+
+return (struct guest_state_element *)(element->value + size);
+}
+
+static
+struct guest_state_element_type *guest_state_element_type_find(uint16_t id)
+{
+int i;
+
+for (i = 0; i < ARRAY_SIZE(guest_state_element_types); i++)
+if (id == guest_state_element_types[i].id) {
+return &guest_state_element_types[i];
+}
+
+return NULL;
+}
+
+static void log_element(struct guest_state_element *element,
+struct guest_state_request *gsr)
+{
+qemu_log_mask(LOG_GUEST_ERROR, "h_guest_%s_state id:0x%04x size:0x%04x",
+

Re: [PATCH v4 10/15] spapr: nested: Initialize the GSB elements lookup table.

2024-02-29 Thread Harsh Prateek Bora





On 2/27/24 15:32, Nicholas Piggin wrote:

On Tue Feb 20, 2024 at 6:36 PM AEST, Harsh Prateek Bora wrote:

Nested PAPR API provides a standard Guest State Buffer (GSB) format
with unique IDs for each guest state element for which get/set state is
supported by the API. Some of the elements are read-only and/or guest-wide.
Introducing helper routines for state exchange of each of the nested guest
state elements for which get/set state should be supported by the API.



This is doing more than just adding helper routines for the GSB access.


Yes, some of the GSB elements are also introduced along with their respective
helpers.




[snip]


+
  typedef struct SpaprMachineStateNested {
  uint64_t ptcr;
  uint8_t api;
@@ -16,6 +201,8 @@ typedef struct SpaprMachineStateNested {
  typedef struct SpaprMachineStateNestedGuest {
  uint32_t pvr_logical;
  unsigned long vcpus;
+uint64_t parttbl[2];
+uint64_t tb_offset;
  struct SpaprMachineStateNestedGuestVcpu *vcpu;
  } SpaprMachineStateNestedGuest;
  

[snip]

  
  /*

   * Register state for entering a nested guest with H_ENTER_NESTED.
@@ -172,17 +452,40 @@ struct nested_ppc_state {
  uint64_t sier;
  uint32_t vscr;
  uint64_t fpscr;
+int64_t dec_expiry_tb;
+};
+
+struct SpaprMachineStateNestedGuestVcpuRunBuf {
+uint64_t addr;
+uint64_t size;
  };
  
  typedef struct SpaprMachineStateNestedGuestVcpu {

  bool enabled;
  struct nested_ppc_state state;
+struct SpaprMachineStateNestedGuestVcpuRunBuf runbufin;
+struct SpaprMachineStateNestedGuestVcpuRunBuf runbufout;
+int64_t tb_offset;
+uint64_t hdecr_expiry_tb;
  } SpaprMachineStateNestedGuestVcpu;


It's adding new fields in existing nested guest state
structures. This should be explained a bit more, split into
another patch, or moved to patches where they get used.


Yes, these new fields actually represent GSB elements.
These elements were explained in the documentation patch, which shall now
point to the documentation in the kernel docs as suggested earlier.
Let me know if we need to document them additionally in this patch's commit
log as well.

regards,
Harsh




Thanks,
Nick

Re: [PATCH v4 09/15] spapr: nested: Extend nested_ppc_state for nested PAPR API

2024-02-29 Thread Harsh Prateek Bora





On 2/27/24 15:29, Nicholas Piggin wrote:

On Tue Feb 20, 2024 at 6:36 PM AEST, Harsh Prateek Bora wrote:

Currently, nested_ppc_state stores a certain set of registers and works
with nested_[load|save]_state() for state transfer as required for the
nested-hv API. Extend these with the additional register state required for
the nested PAPR API.

Signed-off-by: Harsh Prateek Bora 
Suggested-by: Nicholas Piggin 
---
  include/hw/ppc/spapr_nested.h |  49 
  target/ppc/cpu.h  |   2 +
  hw/ppc/spapr_nested.c | 106 ++
  3 files changed, 157 insertions(+)

diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index 24e87bca08..a3b61eb79a 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -7,6 +7,7 @@ typedef struct SpaprMachineStateNested {
  uint64_t ptcr;
  uint8_t api;
  #define NESTED_API_KVM_HV  1
+#define NESTED_API_PAPR    2
  bool capabilities_set;
  uint32_t pvr_base;
  GHashTable *guests;
@@ -123,6 +124,54 @@ struct nested_ppc_state {
  int64_t tb_offset;
  /* Nested PAPR API */
  uint64_t pvr;
+uint64_t amor;
+uint64_t dawr0;
+uint64_t dawrx0;
+uint64_t ciabr;
+uint64_t purr;
+uint64_t spurr;
+uint64_t ic;
+uint64_t vtb;
+uint64_t hdar;
+uint64_t hdsisr;
+uint64_t heir;
+uint64_t asdr;
+uint64_t dawr1;
+uint64_t dawrx1;
+uint64_t dexcr;
+uint64_t hdexcr;
+uint64_t hashkeyr;
+uint64_t hashpkeyr;
+ppc_vsr_t vsr[64] QEMU_ALIGNED(16);
+uint64_t ebbhr;
+uint64_t tar;
+uint64_t ebbrr;
+uint64_t bescr;
+uint64_t iamr;
+uint64_t amr;
+uint64_t uamor;
+uint64_t dscr;
+uint64_t fscr;
+uint64_t pspb;
+uint64_t ctrl;
+uint64_t vrsave;
+uint64_t dar;
+uint64_t dsisr;
+uint64_t pmc1;
+uint64_t pmc2;
+uint64_t pmc3;
+uint64_t pmc4;
+uint64_t pmc5;
+uint64_t pmc6;
+uint64_t mmcr0;
+uint64_t mmcr1;
+uint64_t mmcr2;
+uint64_t mmcra;
+uint64_t sdar;
+uint64_t siar;
+uint64_t sier;
+uint32_t vscr;
+uint64_t fpscr;
  };
  
  typedef struct SpaprMachineStateNestedGuestVcpu {

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index a44de22ca4..11205bb9e3 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1737,9 +1737,11 @@ void ppc_compat_add_property(Object *obj, const char *name,
  #define SPR_PSPB  (0x09F)
  #define SPR_DPDES (0x0B0)
  #define SPR_DAWR0 (0x0B4)
+#define SPR_DAWR1 (0x0B5)
  #define SPR_RPR   (0x0BA)
  #define SPR_CIABR (0x0BB)
  #define SPR_DAWRX0 (0x0BC)
+#define SPR_DAWRX1 (0x0BD)
  #define SPR_HFSCR (0x0BE)
  #define SPR_VRSAVE(0x100)
  #define SPR_USPRG0(0x100)


Might try to put the DAWR1 enable ahead of this, but if not we'll have
to drop these until that is done. Leave it in for now I'll sort it out
if necessary.


Ok




diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index 3cc704adda..39d0c087f1 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -101,6 +101,7 @@ static target_ulong h_copy_tofrom_guest(PowerPCCPU *cpu,
  static void nested_save_state(struct nested_ppc_state *save, PowerPCCPU *cpu)
  {
  CPUPPCState *env = &cpu->env;
+SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());


I hope that won't be a big overhead... nested entry/exit performance probably
isn't top priority at the moment though, so for now okay. Should be
something to look at though.


Hmm.. Ok





  memcpy(save->gpr, env->gpr, sizeof(save->gpr));
  
@@ -127,6 +128,58 @@ static void nested_save_state(struct nested_ppc_state *save, PowerPCCPU *cpu)

  save->pidr = env->spr[SPR_BOOKS_PID];
  save->ppr = env->spr[SPR_PPR];
  
+if (spapr_nested_api(spapr) == NESTED_API_PAPR) {

+save->pvr = env->spr[SPR_PVR];
+save->amor = env->spr[SPR_AMOR];
+save->dawr0 = env->spr[SPR_DAWR0];
+save->dawrx0 = env->spr[SPR_DAWRX0];
+save->ciabr = env->spr[SPR_CIABR];
+save->purr = env->spr[SPR_PURR];
+save->spurr = env->spr[SPR_SPURR];
+save->ic = env->spr[SPR_IC];
+save->vtb = env->spr[SPR_VTB];
+save->hdar = env->spr[SPR_HDAR];
+save->hdsisr = env->spr[SPR_HDSISR];
+save->heir = env->spr[SPR_HEIR];
+save->asdr = env->spr[SPR_ASDR];
+save->dawr1 = env->spr[SPR_DAWR1];
+save->dawrx1 = env->spr[SPR_DAWRX1];
+save->dexcr = env->spr[SPR_DEXCR];
+save->hdexcr = env->spr[SPR_HDEXCR];
+save->hashkeyr = env->spr[SPR_HASHKEYR];
+save->hashpkeyr = env->spr[SPR_HASHPKEYR];
+memcpy(save->vsr, env->vsr, sizeof(save->vsr));
+sav

Re: [PATCH v4 08/15] spapr: nested: Introduce H_GUEST_CREATE_VCPU hcall.

2024-02-29 Thread Harsh Prateek Bora





On 2/27/24 15:21, Nicholas Piggin wrote:

On Tue Feb 20, 2024 at 6:36 PM AEST, Harsh Prateek Bora wrote:

Introduce the nested PAPR hcall H_GUEST_CREATE_VCPU which is used to
create and initialize the specified VCPU resource for the previously
created guest. Each guest can have multiple VCPUs, up to a maximum of 2048.
All VCPUs for a guest get deallocated on guest delete.
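
The per-guest VCPU limit leads to a three-step validity check on any vcpuid passed to later hcalls, as spapr_nested_vcpu_check() in the diff below does. A minimal sketch with hypothetical stand-in types:

```c
#include <stdbool.h>
#include <stdint.h>

#define PAPR_NESTED_GUEST_VCPU_MAX 2048

/* Stand-ins for the nested guest and VCPU structures. */
struct vcpu {
    bool enabled;
};

struct guest {
    unsigned long nr_vcpus; /* number of allocated slots */
    struct vcpu *vcpus;
};

/* Hypothetical helper: is vcpuid within limits, allocated and enabled? */
static bool vcpu_id_usable(const struct guest *g, uint64_t vcpuid)
{
    if (vcpuid >= PAPR_NESTED_GUEST_VCPU_MAX) {
        return false; /* beyond the architected maximum */
    }
    if (vcpuid >= g->nr_vcpus) {
        return false; /* slot was never allocated */
    }
    return g->vcpus[vcpuid].enabled;
}
```

The order matters: the enabled flag is only read once the index is known to be inside the allocated array.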

Signed-off-by: Michael Neuling 
Signed-off-by: Harsh Prateek Bora 
---
  include/hw/ppc/spapr.h|  2 +
  include/hw/ppc/spapr_nested.h | 10 
  hw/ppc/spapr_nested.c | 96 +++
  3 files changed, 108 insertions(+)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index c4a79a1785..82b077bdd2 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -365,6 +365,7 @@ struct SpaprMachineState {
  #define H_UNSUPPORTED -67
  #define H_OVERLAP -68
  #define H_STATE   -75
+#define H_IN_USE  -77
  #define H_UNSUPPORTED_FLAG -256
  #define H_MULTI_THREADS_ACTIVE -9005
  
@@ -587,6 +588,7 @@ struct SpaprMachineState {

  #define H_GUEST_GET_CAPABILITIES 0x460
  #define H_GUEST_SET_CAPABILITIES 0x464
  #define H_GUEST_CREATE   0x470
+#define H_GUEST_CREATE_VCPU  0x474
  #define H_GUEST_DELETE   0x488
  
  #define MAX_HCALL_OPCODE H_GUEST_DELETE

diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index f282479275..24e87bca08 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -14,6 +14,8 @@ typedef struct SpaprMachineStateNested {
  
  typedef struct SpaprMachineStateNestedGuest {

  uint32_t pvr_logical;
+unsigned long vcpus;
+struct SpaprMachineStateNestedGuestVcpu *vcpu;
  } SpaprMachineStateNestedGuest;
  
  /* Nested PAPR API related macros */

@@ -27,6 +29,7 @@ typedef struct SpaprMachineStateNestedGuest {
  #define H_GUEST_CAP_P10_MODE_BMAP 2
  #define PAPR_NESTED_GUEST_MAX 4096
  #define H_GUEST_DELETE_ALL_FLAG   0x8000ULL
+#define PAPR_NESTED_GUEST_VCPU_MAX 2048
  
  /*

   * Register state for entering a nested guest with H_ENTER_NESTED.
@@ -118,8 +121,15 @@ struct nested_ppc_state {
  uint64_t ppr;
  
  int64_t tb_offset;

+/* Nested PAPR API */
+uint64_t pvr;
  };
  
+typedef struct SpaprMachineStateNestedGuestVcpu {

+bool enabled;
+struct nested_ppc_state state;
+} SpaprMachineStateNestedGuestVcpu;
+
  void spapr_exit_nested(PowerPCCPU *cpu, int excp);
  typedef struct SpaprMachineState SpaprMachineState;
  bool spapr_get_pate_nested_hv(SpaprMachineState *spapr, PowerPCCPU *cpu,
diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index 09c4a35908..3cc704adda 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -428,6 +428,41 @@ void spapr_exit_nested(PowerPCCPU *cpu, int excp)
  }
  }
  
+static

+SpaprMachineStateNestedGuest *spapr_get_nested_guest(SpaprMachineState *spapr,
+ target_ulong guestid)
+{
+SpaprMachineStateNestedGuest *guest;
+
+guest = g_hash_table_lookup(spapr->nested.guests, GINT_TO_POINTER(guestid));
+return guest;
+}
+
+static bool spapr_nested_vcpu_check(SpaprMachineStateNestedGuest *guest,
+target_ulong vcpuid)
+{
+struct SpaprMachineStateNestedGuestVcpu *vcpu;
+/*
+ * Perform sanity checks for the provided vcpuid of a guest.
+ * For now, ensure its valid, allocated and enabled for use.
+ */
+
+if (vcpuid >= PAPR_NESTED_GUEST_VCPU_MAX) {
+return false;
+}
+
+if (!(vcpuid < guest->vcpus)) {
+return false;
+}
+
+vcpu = &guest->vcpu[vcpuid];
+if (!vcpu->enabled) {
+return false;
+}
+
+return true;
+}
+
  static target_ulong h_guest_get_capabilities(PowerPCCPU *cpu,
   SpaprMachineState *spapr,
   target_ulong opcode,
@@ -518,6 +553,7 @@ static void
  destroy_guest_helper(gpointer value)
  {
  struct SpaprMachineStateNestedGuest *guest = value;
+g_free(guest->vcpu);
  g_free(guest);
  }
  
@@ -613,6 +649,65 @@ static target_ulong h_guest_delete(PowerPCCPU *cpu,

  return H_SUCCESS;
  }
  
+static target_ulong h_guest_create_vcpu(PowerPCCPU *cpu,

+SpaprMachineState *spapr,
+target_ulong opcode,
+target_ulong *args)
+{
+CPUPPCState *env = &cpu->env;
+struct nested_ppc_state *l2_state;
+target_ulong flags = args[0];
+target_ulong guestid = args[1];
+target_ulong vcpuid = args[2];
+SpaprMachineStateNestedGuest *guest;
+
+if (flags) { /* don't handle any flags for now */
+return H_UNSUPPORTED_FLAG;
+}
+
+guest = spapr_get_nested_guest(spapr, guestid);
+if (!guest) {
+return H_P2;
+}
+
+i

Re: [PATCH v4 05/15] spapr: nested: Document Nested PAPR API

2024-02-28 Thread Harsh Prateek Bora





On 2/27/24 16:09, Nicholas Piggin wrote:

On Tue Feb 27, 2024 at 7:31 PM AEST, Harsh Prateek Bora wrote:



On 2/27/24 14:59, Nicholas Piggin wrote:

On Tue Feb 20, 2024 at 6:35 PM AEST, Harsh Prateek Bora wrote:

Add initial documentation about the Nested PAPR API to describe the set
of APIs and their usage. It also covers the Guest State Buffer elements
and their format, which is used between L0/L1 to communicate L2 state.

Signed-off-by: Michael Neuling 
Signed-off-by: Harsh Prateek Bora 


v2 is upstream in Linux now, I suppose you could reference that too?


Yes, upstream Linux commit is mentioned in the doc at the end.


The API doc commit is mentioned as a reference. I would expect something
following the comments under the Existing Nested-HV API heading for the
New PAPR API.

Oh, is it lifted directly from linux.git docs? Sigh, in that case never
mind, it's better to stick with them. Although could be just a link or
reference.



Well, initially the documentation was posted with both the
kernel and qemu patches, and now we have the kernel side merged. Although I
think it would be more appropriate to keep it in Qemu, which actually
implements the L0 functionality and the related APIs, for now
let's keep a single source of documentation; we can provide a link
after the brief intro about the two APIs.

regards,
Harsh


Thanks,
Nick
