date:20220315

[PATCH v3 09/17] target/m68k: Fix stack frame for EXCP_ILLEGAL

2022-03-15 Thread Richard Henderson

According to the M68040 User's Manual, section 8.4.3, the four-word
stack frame (format 0) is the one used for Illegal Instruction.  Use
the correct frame format, which does not use the ADDR argument.

Signed-off-by: Richard Henderson 
---
 target/m68k/op_helper.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/target/m68k/op_helper.c b/target/m68k/op_helper.c
index 4140f65422..6aebf9d737 100644
--- a/target/m68k/op_helper.c
+++ b/target/m68k/op_helper.c
@@ -391,11 +391,14 @@ static void m68k_interrupt_all(CPUM68KState *env, int is_hw)
 }
 break;
 
+case EXCP_ILLEGAL:
+do_stack_frame(env, &sp, 0, oldsr, 0, env->pc);
+break;
+
 case EXCP_ADDRESS:
 do_stack_frame(env, &sp, 2, oldsr, 0, env->pc);
 break;
 
-case EXCP_ILLEGAL:
 case EXCP_TRAPCC:
 /* FIXME: addr is not only env->pc */
 do_stack_frame(env, &sp, 2, oldsr, env->pc, env->pc);
-- 
2.25.1
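
As an illustration of the difference between the two frame formats named in the commit message above, here is a minimal sketch in C. It is not QEMU's do_stack_frame; the function names are hypothetical. It relies only on the 68k conventions that the format number sits in the top four bits of the format/vector word, that the vector offset is the vector number times four, and that Illegal Instruction is vector 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Illegal Instruction is vector number 4 on the 68k family. */
enum { VEC_ILLEGAL = 4 };

/*
 * Build the format/vector-offset word that is pushed with every frame:
 * bits 15..12 hold the frame format, bits 11..0 the vector offset
 * (vector number * 4).  Hypothetical helper, for illustration only.
 */
uint16_t frame_format_word(unsigned format, unsigned vector)
{
    return (uint16_t)((format << 12) | ((vector << 2) & 0x0fffu));
}

/*
 * Size in bytes of the two frame formats discussed in the patch.
 * Returns 0 for formats this sketch does not cover.
 */
size_t frame_size_bytes(unsigned format)
{
    switch (format) {
    case 0:
        return 8;   /* SR (2) + PC (4) + format word (2): four words */
    case 2:
        return 12;  /* format 0 layout plus a 4-byte address: six words */
    default:
        return 0;
    }
}
```

With these helpers, an Illegal Instruction frame in format 0 carries the word 0x0010 and occupies 8 bytes, while the format 2 frame used before the patch carried 0x2010 and an extra address field.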

[PATCH v3 02/17] target/m68k: Switch over exception type in m68k_interrupt_all

2022-03-15 Thread Richard Henderson

Replace an if ladder with a switch for clarity.

Reviewed-by: Laurent Vivier 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/m68k/op_helper.c | 49 ++++++++++++++++++++++++++++++-------------------
 1 file changed, 30 insertions(+), 19 deletions(-)

diff --git a/target/m68k/op_helper.c b/target/m68k/op_helper.c
index d30f988ae0..2b94a6ec84 100644
--- a/target/m68k/op_helper.c
+++ b/target/m68k/op_helper.c
@@ -333,7 +333,8 @@ static void m68k_interrupt_all(CPUM68KState *env, int is_hw)
 sp &= ~1;
 }
 
-if (cs->exception_index == EXCP_ACCESS) {
+switch (cs->exception_index) {
+case EXCP_ACCESS:
 if (env->mmu.fault) {
 cpu_abort(cs, "DOUBLE MMU FAULT\n");
 }
@@ -391,29 +392,39 @@ static void m68k_interrupt_all(CPUM68KState *env, int is_hw)
  "ssw:  %08x ea:   %08x sfc:  %ddfc: %d\n",
  env->mmu.ssw, env->mmu.ar, env->sfc, env->dfc);
 }
-} else if (cs->exception_index == EXCP_ADDRESS) {
+break;
+
+case EXCP_ADDRESS:
 do_stack_frame(env, &sp, 2, oldsr, 0, retaddr);
-} else if (cs->exception_index == EXCP_ILLEGAL ||
-   cs->exception_index == EXCP_DIV0 ||
-   cs->exception_index == EXCP_CHK ||
-   cs->exception_index == EXCP_TRAPCC ||
-   cs->exception_index == EXCP_TRACE) {
+break;
+
+case EXCP_ILLEGAL:
+case EXCP_DIV0:
+case EXCP_CHK:
+case EXCP_TRAPCC:
+case EXCP_TRACE:
 /* FIXME: addr is not only env->pc */
 do_stack_frame(env, &sp, 2, oldsr, env->pc, retaddr);
-} else if (is_hw && oldsr & SR_M &&
-   cs->exception_index >= EXCP_SPURIOUS &&
-   cs->exception_index <= EXCP_INT_LEVEL_7) {
-do_stack_frame(env, &sp, 0, oldsr, 0, retaddr);
-oldsr = sr;
-env->aregs[7] = sp;
-cpu_m68k_set_sr(env, sr &= ~SR_M);
-sp = env->aregs[7];
-if (!m68k_feature(env, M68K_FEATURE_UNALIGNED_DATA)) {
-sp &= ~1;
+break;
+
+case EXCP_SPURIOUS ... EXCP_INT_LEVEL_7:
+if (is_hw && oldsr & SR_M) {
+do_stack_frame(env, &sp, 0, oldsr, 0, retaddr);
+oldsr = sr;
+env->aregs[7] = sp;
+cpu_m68k_set_sr(env, sr &= ~SR_M);
+sp = env->aregs[7];
+if (!m68k_feature(env, M68K_FEATURE_UNALIGNED_DATA)) {
+sp &= ~1;
+}
+do_stack_frame(env, &sp, 1, oldsr, 0, retaddr);
+break;
 }
-do_stack_frame(env, &sp, 1, oldsr, 0, retaddr);
-} else {
+/* fall through */
+
+default:
 do_stack_frame(env, &sp, 0, oldsr, 0, retaddr);
+break;
 }
 
 env->aregs[7] = sp;
-- 
2.25.1

[PATCH v3 03/17] target/m68k: Fix coding style in m68k_interrupt_all

2022-03-15 Thread Richard Henderson

Add parentheses around the & expression where it is combined with &&.

Remove assignment to sr in function call argument -- note that
sr is unused after the call, so the assignment was never needed,
only the result of the & expression.

Suggested-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/m68k/op_helper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/m68k/op_helper.c b/target/m68k/op_helper.c
index 2b94a6ec84..0f41c2dce3 100644
--- a/target/m68k/op_helper.c
+++ b/target/m68k/op_helper.c
@@ -408,11 +408,11 @@ static void m68k_interrupt_all(CPUM68KState *env, int is_hw)
 break;
 
 case EXCP_SPURIOUS ... EXCP_INT_LEVEL_7:
-if (is_hw && oldsr & SR_M) {
+if (is_hw && (oldsr & SR_M)) {
 do_stack_frame(env, &sp, 0, oldsr, 0, retaddr);
 oldsr = sr;
 env->aregs[7] = sp;
-cpu_m68k_set_sr(env, sr &= ~SR_M);
+cpu_m68k_set_sr(env, sr & ~SR_M);
 sp = env->aregs[7];
 if (!m68k_feature(env, M68K_FEATURE_UNALIGNED_DATA)) {
 sp &= ~1;
-- 
2.25.1
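
The two style points in the patch above can be sketched in isolation. In C, & binds more tightly than &&, so the added parentheses do not change the result; they only make the grouping explicit. Likewise, passing `sr & ~mask` instead of `sr &= ~mask` yields the same argument value without writing back to a variable that is never read again. The mask value and function names below are assumptions made for this example, not taken from the QEMU headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask standing in for SR_M; the value is an assumption. */
#define SR_M_BIT 0x1000u

/*
 * Explicit grouping: true only for a hardware interrupt taken while
 * the master bit is set.  Without the parentheses the expression
 * parses the same way, because & has higher precedence than &&.
 */
bool uses_throwaway_frame(bool is_hw, uint32_t oldsr)
{
    return is_hw && (oldsr & SR_M_BIT);
}

/*
 * Compute the value to hand to a setter without modifying the
 * caller's copy, mirroring "sr & ~SR_M" in the patch.
 */
uint32_t sr_without_master(uint32_t sr)
{
    return sr & ~SR_M_BIT;
}
```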

[PATCH v3 00/17] target/m68k: Conditional traps + trap cleanup

2022-03-15 Thread Richard Henderson

I should have gotten back to this right away after 7.0 devel
tree opened, but oh well.  There's always 7.1.

I believe I've fixed up all of the comments from [v2].


r~


v1: 
https://lore.kernel.org/qemu-devel/20211130103752.72099-1-richard.hender...@linaro.org/
v2: 
https://lore.kernel.org/qemu-devel/20211202204900.50973-1-richard.hender...@linaro.org/

Richard Henderson (17):
  target/m68k: Raise the TRAPn exception with the correct pc
  target/m68k: Switch over exception type in m68k_interrupt_all
  target/m68k: Fix coding style in m68k_interrupt_all
  linux-user/m68k: Handle EXCP_TRAP1 through EXCP_TRAP15
  target/m68k: Remove retaddr in m68k_interrupt_all
  target/m68k: Fix address argument for EXCP_CHK
  target/m68k: Fix pc, c flag, and address argument for EXCP_DIV0
  target/m68k: Fix address argument for EXCP_TRACE
  target/m68k: Fix stack frame for EXCP_ILLEGAL
  target/m68k: Implement TRAPcc
  target/m68k: Implement TPF in terms of TRAPcc
  target/m68k: Implement TRAPV
  target/m68k: Implement FTRAPcc
  tests/tcg/m68k: Add trap.c
  linux-user/strace: Fix print_syscall_err
  linux-user/strace: Adjust get_thread_area for m68k
  target/m68k: Mark helper_raise_exception as noreturn

 target/m68k/cpu.h  |   8 ++
 target/m68k/helper.h   |  14 +--
 linux-user/m68k/cpu_loop.c |  11 +-
 linux-user/strace.c|   4 +-
 target/m68k/cpu.c  |   1 +
 target/m68k/op_helper.c| 173 --
 target/m68k/translate.c| 190 -
 tests/tcg/m68k/trap.c  | 129 ++
 linux-user/strace.list |   5 +
 tests/tcg/m68k/Makefile.target |   3 +
 10 files changed, 394 insertions(+), 144 deletions(-)
 create mode 100644 tests/tcg/m68k/trap.c

-- 
2.25.1

[PATCH v2] coreaudio: Commit the result of init in the end

2022-03-15 Thread Akihiko Odaki

init_out_device may commit only part of the result and leave the
state inconsistent when it encounters an error. Commit the result at
the end of the function so that it is committed if and only if no
error occurs.

With this change, handle_voice_change can rely on core->outputDeviceID
to know whether the output device is initialized after calling
init_out_device.

Signed-off-by: Akihiko Odaki 
---
 audio/coreaudio.m | 49 ++-
 1 file changed, 27 insertions(+), 22 deletions(-)

diff --git a/audio/coreaudio.m b/audio/coreaudio.m
index 3186b68474d..127a368ac23 100644
--- a/audio/coreaudio.m
+++ b/audio/coreaudio.m
@@ -360,7 +360,10 @@ static OSStatus audioDeviceIOProc(
 static OSStatus init_out_device(coreaudioVoiceOut *core)
 {
 OSStatus status;
+AudioDeviceID deviceID;
 AudioValueRange frameRange;
+UInt32 audioDevicePropertyBufferFrameSize;
+AudioDeviceIOProcID ioprocid;
 
 AudioStreamBasicDescription streamBasicDescription = {
 .mBitsPerChannel = core->hw.info.bits,
@@ -373,20 +376,19 @@ static OSStatus init_out_device(coreaudioVoiceOut *core)
 .mSampleRate = core->hw.info.freq
 };
 
-status = coreaudio_get_voice(&core->outputDeviceID);
+status = coreaudio_get_voice(&deviceID);
 if (status != kAudioHardwareNoError) {
 coreaudio_playback_logerr (status,
"Could not get default output Device\n");
 return status;
 }
-if (core->outputDeviceID == kAudioDeviceUnknown) {
+if (deviceID == kAudioDeviceUnknown) {
 dolog ("Could not initialize playback - Unknown Audiodevice\n");
 return status;
 }
 
 /* get minimum and maximum buffer frame sizes */
-status = coreaudio_get_framesizerange(core->outputDeviceID,
-  &frameRange);
+status = coreaudio_get_framesizerange(deviceID, &frameRange);
 if (status == kAudioHardwareBadObjectError) {
 return 0;
 }
@@ -397,31 +399,31 @@ static OSStatus init_out_device(coreaudioVoiceOut *core)
 }
 
 if (frameRange.mMinimum > core->frameSizeSetting) {
-core->audioDevicePropertyBufferFrameSize = (UInt32) frameRange.mMinimum;
+audioDevicePropertyBufferFrameSize = (UInt32) frameRange.mMinimum;
 dolog ("warning: Upsizing Buffer Frames to %f\n", frameRange.mMinimum);
 } else if (frameRange.mMaximum < core->frameSizeSetting) {
-core->audioDevicePropertyBufferFrameSize = (UInt32) frameRange.mMaximum;
+audioDevicePropertyBufferFrameSize = (UInt32) frameRange.mMaximum;
 dolog ("warning: Downsizing Buffer Frames to %f\n", frameRange.mMaximum);
 } else {
-core->audioDevicePropertyBufferFrameSize = core->frameSizeSetting;
+audioDevicePropertyBufferFrameSize = core->frameSizeSetting;
 }
 
 /* set Buffer Frame Size */
-status = coreaudio_set_framesize(core->outputDeviceID,
- &core->audioDevicePropertyBufferFrameSize);
+status = coreaudio_set_framesize(deviceID,
+ &audioDevicePropertyBufferFrameSize);
 if (status == kAudioHardwareBadObjectError) {
 return 0;
 }
 if (status != kAudioHardwareNoError) {
 coreaudio_playback_logerr (status,
 "Could not set device buffer frame size %" PRIu32 "\n",
- (uint32_t)core->audioDevicePropertyBufferFrameSize);
+ (uint32_t)audioDevicePropertyBufferFrameSize);
 return status;
 }
 
 /* get Buffer Frame Size */
-status = coreaudio_get_framesize(core->outputDeviceID,
- &core->audioDevicePropertyBufferFrameSize);
+status = coreaudio_get_framesize(deviceID,
+ &audioDevicePropertyBufferFrameSize);
 if (status == kAudioHardwareBadObjectError) {
 return 0;
 }
@@ -430,11 +432,9 @@ static OSStatus init_out_device(coreaudioVoiceOut *core)
 "Could not get device buffer frame size\n");
 return status;
 }
-core->hw.samples = core->bufferCount * core->audioDevicePropertyBufferFrameSize;
 
 /* set Samplerate */
-status = coreaudio_set_streamformat(core->outputDeviceID,
- &streamBasicDescription);
+status = coreaudio_set_streamformat(deviceID, &streamBasicDescription);
 if (status == kAudioHardwareBadObjectError) {
 return 0;
 }
@@ -442,7 +442,6 @@ static OSStatus init_out_device(coreaudioVoiceOut *core)
 coreaudio_playback_logerr (status,
"Could not set samplerate %lf\n",
streamBasicDescription.mSampleRate);
-core->outputDeviceID = kAudioDeviceUnknown;
 return status;
 }
 
@@ -456,20 +455,24 @@ static OSStatus init_out_device(coreaudioVoiceOut *core)
  * Therefore, the specified callback must be designed to avoid a deadlock
  * with the callers of
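
The "commit at the end" idea from the commit message above can be sketched independently of CoreAudio. The struct, the function, and its parameters below are hypothetical and stand in for coreaudioVoiceOut and the real device queries. Every intermediate result is kept in a local variable and is copied into the long-lived state only after the last step that can fail.

```c
/* Hypothetical long-lived state; device_id == 0 means "not initialized". */
typedef struct {
    int device_id;
    unsigned frame_size;
} Voice;

/*
 * Initialize *v from a probed device id and the frame size range the
 * device reports.  On any failure *v is left exactly as it was, so a
 * caller can test v->device_id to learn whether init succeeded.
 */
int init_voice(Voice *v, int probed_id, unsigned wanted,
               unsigned min, unsigned max)
{
    int device_id = probed_id;
    unsigned frame_size;

    if (device_id == 0) {
        return -1;              /* no device: nothing committed */
    }
    if (min > max) {
        return -1;              /* bogus range: nothing committed */
    }

    /* Clamp the requested size into the range the device supports. */
    frame_size = wanted < min ? min : (wanted > max ? max : wanted);

    /* Commit: every step succeeded, publish all results together. */
    v->device_id = device_id;
    v->frame_size = frame_size;
    return 0;
}
```

A caller that sees a failure return can rely on the Voice being untouched, which is the property the patch wants handle_voice_change to be able to depend on.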

Re: [PATCH v20 5/7] net/vmnet: implement bridged mode (vmnet-bridged)

2022-03-15 Thread Akihiko Odaki


On 2022/03/16 8:07, Vladislav Yaroshchuk wrote:

Signed-off-by: Vladislav Yaroshchuk 
---
  net/vmnet-bridged.m | 128 ++--
  1 file changed, 123 insertions(+), 5 deletions(-)

diff --git a/net/vmnet-bridged.m b/net/vmnet-bridged.m
index 91c1a2f2c7..5936c87718 100644
--- a/net/vmnet-bridged.m
+++ b/net/vmnet-bridged.m
@@ -10,16 +10,134 @@
  
  #include "qemu/osdep.h"

  #include "qapi/qapi-types-net.h"
-#include "vmnet_int.h"
-#include "clients.h"
-#include "qemu/error-report.h"
  #include "qapi/error.h"
+#include "clients.h"
+#include "vmnet_int.h"
  
  #include 
  
+

+static bool validate_ifname(const char *ifname)
+{
+xpc_object_t shared_if_list = vmnet_copy_shared_interface_list();
+bool match = false;
+if (!xpc_array_get_count(shared_if_list)) {
+goto done;
+}
+
+match = !xpc_array_apply(
+shared_if_list,
+^bool(size_t index, xpc_object_t value) {
+return strcmp(xpc_string_get_string_ptr(value), ifname) != 0;
+});
+
+done:
+xpc_release(shared_if_list);
+return match;
+}
+
+
+static bool get_valid_ifnames(char *output_buf)
+{
+xpc_object_t shared_if_list = vmnet_copy_shared_interface_list();
+__block const char *ifname = NULL;
+__block int str_offset = 0;
+bool interfaces_available = true;
+
+if (!xpc_array_get_count(shared_if_list)) {
+interfaces_available = false;
+goto done;
+}
+
+xpc_array_apply(
+shared_if_list,
+^bool(size_t index, xpc_object_t value) {
+/* build list of strings like "en0 en1 en2 " */
+ifname = xpc_string_get_string_ptr(value);
+strcpy(output_buf + str_offset, ifname);
+strcpy(output_buf + str_offset + strlen(ifname), " ");
+str_offset += strlen(ifname) + 1;
+return true;
+});
+
+done:
+xpc_release(shared_if_list);
+return interfaces_available;
+}
+
+
+static bool validate_options(const Netdev *netdev, Error **errp)
+{
+const NetdevVmnetBridgedOptions *options = &(netdev->u.vmnet_bridged);
+char ifnames[1024];


There is no guarantee that the list fits in 1024 bytes. The buffer was 256
bytes in an old version, but growing it to some other arbitrary size is not
an appropriate fix. It should be dynamically allocated, as was done in an
older version.


I'm sorry for missing things repeatedly. This should be *really* the 
last comment so please have a look at this.


P.S. I'm testing the current version and it is pleasantly working well. 
(I'm actually writing this email on QEMU with this series.)


Regards,
Akihiko Odaki


+
+if (!validate_ifname(options->ifname)) {
+if (get_valid_ifnames(ifnames)) {
+error_setg(errp,
+   "unsupported ifname '%s', expected one of [ %s]",
+   options->ifname,
+   ifnames);
+return false;
+}
+error_setg(errp,
+   "unsupported ifname '%s', no supported "
+   "interfaces available",
+   options->ifname);
+return false;
+}
+
+#if !defined(MAC_OS_VERSION_11_0) || \
+MAC_OS_X_VERSION_MIN_REQUIRED < MAC_OS_VERSION_11_0
+if (options->has_isolated) {
+error_setg(errp,
+   "vmnet-bridged.isolated feature is "
+   "unavailable: outdated vmnet.framework API");
+return false;
+}
+#endif
+return true;
+}
+
+
+static xpc_object_t build_if_desc(const Netdev *netdev)
+{
+const NetdevVmnetBridgedOptions *options = &(netdev->u.vmnet_bridged);
+xpc_object_t if_desc = xpc_dictionary_create(NULL, NULL, 0);
+
+xpc_dictionary_set_uint64(if_desc,
+  vmnet_operation_mode_key,
+  VMNET_BRIDGED_MODE
+);
+
+xpc_dictionary_set_string(if_desc,
+  vmnet_shared_interface_name_key,
+  options->ifname);
+
+#if defined(MAC_OS_VERSION_11_0) && \
+MAC_OS_X_VERSION_MIN_REQUIRED >= MAC_OS_VERSION_11_0
+xpc_dictionary_set_bool(if_desc,
+vmnet_enable_isolation_key,
+options->isolated);
+#endif
+return if_desc;
+}
+
+
+static NetClientInfo net_vmnet_bridged_info = {
+.type = NET_CLIENT_DRIVER_VMNET_BRIDGED,
+.size = sizeof(VmnetState),
+.receive = vmnet_receive_common,
+.cleanup = vmnet_cleanup_common,
+};
+
+
  int net_init_vmnet_bridged(const Netdev *netdev, const char *name,
 NetClientState *peer, Error **errp)
  {
-  error_setg(errp, "vmnet-bridged is not implemented yet");
-  return -1;
+NetClientState *nc = qemu_new_net_client(&net_vmnet_bridged_info,
+ peer, "vmnet-bridged", name);
+if (!validate_options(netdev, errp)) {
+return -1;
+}
+return vmnet_if_create(nc, build_if_desc(netdev), errp);
  }
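
On the review comment above about the fixed 1024-byte ifnames buffer: one way to avoid guessing a size is to measure the names first and allocate exactly what is needed. The following is a minimal sketch in plain C under that assumption. The function name and the array-of-strings input are hypothetical; the real code walks an xpc array, and inside QEMU a GLib string builder would probably be the more natural tool.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Join names into a freshly allocated string of the form
 * "en0 en1 en2 " (each name followed by one space).  The buffer is
 * sized from the input, so no fixed limit applies.  Returns NULL if
 * the allocation fails; the caller frees the result.
 */
char *join_ifnames(const char *const *names, size_t count)
{
    size_t total = 1;           /* terminating NUL */
    size_t i;
    char *buf, *p;

    for (i = 0; i < count; i++) {
        total += strlen(names[i]) + 1;  /* name plus separator */
    }

    buf = malloc(total);
    if (buf == NULL) {
        return NULL;
    }

    p = buf;
    for (i = 0; i < count; i++) {
        size_t n = strlen(names[i]);
        memcpy(p, names[i], n);
        p += n;
        *p++ = ' ';
    }
    *p = '\0';
    return buf;
}
```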

Re: [PATCH] softmmu/physmem: Use qemu_madvise

2022-03-15 Thread Peter Xu

On Tue, Mar 15, 2022 at 11:04:05PM -0500, Andrew Deason wrote:
> We have a thin wrapper around madvise, called qemu_madvise, which
> provides consistent behavior for the !CONFIG_MADVISE case, and works
> around some platform-specific quirks (some platforms only provide
> posix_madvise, and some don't offer all 'advise' types). This specific
> caller of madvise has never used it, tracing back to its original
> introduction in commit e0b266f01dd2 ("migration_completion: Take
> current state").
> 
> Call qemu_madvise here, to follow the same logic as all of our other
> madvise callers. This slightly changes the behavior for
> !CONFIG_MADVISE (EINVAL instead of ENOSYS, and a slightly different
> error message), but this is now more consistent with other callers
> that use qemu_madvise.
> 
> Signed-off-by: Andrew Deason 

Reviewed-by: Peter Xu 

-- 
Peter Xu

Re: [PATCH v20 8/9] migration-test: Export migration-test util funtions

2022-03-15 Thread Peter Xu

On Wed, Mar 16, 2022 at 10:21:38AM +0800, huang...@chinatelecom.cn wrote:
> +void cleanup(const char *filename)
> +{
> +g_autofree char *path = g_strdup_printf("%s/%s", tmpfs, filename);
> +
> +unlink(path);
> +}

If we are moving most of these tmpfs helpers out anyway, shouldn't we also
move all tmpfs ops into this helper file?  E.g. the initialization of the
tmpfs variable is still done separately.  That's a bit odd.

Ideally, IIUC, tmpfs doesn't need to be exported in migration-helpers.h at
all (see below); it could stay hidden.

> diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
> index d63bba9..d08551f 100644
> --- a/tests/qtest/migration-helpers.h
> +++ b/tests/qtest/migration-helpers.h
> @@ -14,7 +14,14 @@
>  
>  #include "libqos/libqtest.h"
>  
> +/* For dirty ring test; so far only x86_64 is supported */
> +#if defined(__linux__) && defined(HOST_X86_64)
> +#include "linux/kvm.h"
> +#endif
> +#include 
> +
>  extern bool got_stop;
> +extern const char *tmpfs;

-- 
Peter Xu
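
The suggestion above, keeping the tmpfs variable private to the helper file and exposing only functions that use it, could look roughly like this. This is a sketch in plain C with hypothetical names and a fixed-size buffer; the actual test code uses GLib allocation and its own naming.

```c
#include <stdio.h>
#include <string.h>

/* Private to the helper file: callers never see this variable. */
static char tmpfs_dir[256];

/* Record the temporary directory once; -1 if the path is too long. */
int helpers_set_tmpfs(const char *dir)
{
    if (strlen(dir) >= sizeof(tmpfs_dir)) {
        return -1;
    }
    strcpy(tmpfs_dir, dir);
    return 0;
}

/*
 * Build "<tmpfs>/<filename>" into out.  Returns 0 on success and -1
 * if the result does not fit in outlen bytes.
 */
int helpers_tmp_path(char *out, size_t outlen, const char *filename)
{
    int n = snprintf(out, outlen, "%s/%s", tmpfs_dir, filename);

    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With this shape, a cleanup helper can call helpers_tmp_path internally, and the header needs no extern declaration for the directory variable.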

[PATCH] softmmu/physmem: Use qemu_madvise

2022-03-15 Thread Andrew Deason

We have a thin wrapper around madvise, called qemu_madvise, which
provides consistent behavior for the !CONFIG_MADVISE case, and works
around some platform-specific quirks (some platforms only provide
posix_madvise, and some don't offer all 'advise' types). This specific
caller of madvise has never used it, tracing back to its original
introduction in commit e0b266f01dd2 ("migration_completion: Take
current state").

Call qemu_madvise here, to follow the same logic as all of our other
madvise callers. This slightly changes the behavior for
!CONFIG_MADVISE (EINVAL instead of ENOSYS, and a slightly different
error message), but this is now more consistent with other callers
that use qemu_madvise.

Signed-off-by: Andrew Deason 
---
Looking at the history of commits that touch this madvise() call, it
doesn't _look_ like there's any reason to be directly calling madvise vs
qemu_madvise (I don't see anything mentioned), but I'm not sure.

 softmmu/physmem.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 43ae70fbe2..900c692b5e 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -3584,40 +3584,32 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
  rb->idstr, start, length, ret);
 goto err;
 #endif
 }
 if (need_madvise) {
 /* For normal RAM this causes it to be unmapped,
  * for shared memory it causes the local mapping to disappear
  * and to fall back on the file contents (which we just
  * fallocate'd away).
  */
-#if defined(CONFIG_MADVISE)
 if (qemu_ram_is_shared(rb) && rb->fd < 0) {
-ret = madvise(host_startaddr, length, QEMU_MADV_REMOVE);
+ret = qemu_madvise(host_startaddr, length, QEMU_MADV_REMOVE);
 } else {
-ret = madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
+ret = qemu_madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
 }
 if (ret) {
 ret = -errno;
 error_report("ram_block_discard_range: Failed to discard range "
  "%s:%" PRIx64 " +%zx (%d)",
  rb->idstr, start, length, ret);
 goto err;
 }
-#else
-ret = -ENOSYS;
-error_report("ram_block_discard_range: MADVISE not available"
- "%s:%" PRIx64 " +%zx (%d)",
- rb->idstr, start, length, ret);
-goto err;
-#endif
 }
 trace_ram_block_discard_range(rb->idstr, host_startaddr, length,
   need_madvise, need_fallocate, ret);
 } else {
 error_report("ram_block_discard_range: Overrun block '%s' (%" PRIu64
  "/%zx/" RAM_ADDR_FMT")",
  rb->idstr, start, length, rb->max_length);
 }
 
 err:
-- 
2.11.0
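
To illustrate why a single wrapper gives callers consistent behavior, here is a minimal sketch of the pattern the commit message describes. It is not the qemu_madvise implementation. The names, the sentinel value, and the function-pointer backend are assumptions made so the example stays portable. The point it shows is that both "this advice is not offered here" and "no backend is available" surface as -1 with errno set to EINVAL, so the caller needs only one error path.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical advice values; the sentinel marks an unsupported advice. */
#define ADV_INVALID  (-1)
#define ADV_DONTNEED 4

/* Signature of whatever the platform provides, or NULL if nothing. */
typedef int (*advise_fn)(void *addr, size_t len, int advice);

/*
 * Wrapper with one failure convention: -1 and errno == EINVAL when
 * the advice is unsupported or no backend exists; otherwise the
 * backend's own result.
 */
int wrapped_advise(advise_fn backend, void *addr, size_t len, int advice)
{
    if (advice == ADV_INVALID || backend == NULL) {
        errno = EINVAL;
        return -1;
    }
    return backend(addr, len, advice);
}

/* Stand-in backend for the example; always reports success. */
int fake_backend(void *addr, size_t len, int advice)
{
    (void)addr;
    (void)len;
    (void)advice;
    return 0;
}
```

A caller written against this wrapper can keep a single `if (ret) { ret = -errno; ... }` block, which is the shape the patched ram_block_discard_range ends up with.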

Re: [PATCH v6 06/11] s390x: topology: Adding books to CPU topology

2022-03-15 Thread wangyanan (Y)


Hi Pierre,

On 2022/2/17 21:41, Pierre Morel wrote:

S390 CPU topology may have up to 5 topology containers.
The first container above the cores is level 2, the sockets.
We introduce here the books, book is the level containing sockets.

Let's add books, level3, containers to the CPU topology.

Signed-off-by: Pierre Morel 
---
  hw/core/machine-smp.c  | 29 ++---
  hw/core/machine.c  |  2 ++
  hw/s390x/s390-virtio-ccw.c |  1 +
  include/hw/boards.h|  4 
  qapi/machine.json  |  7 ++-
  softmmu/vl.c   |  3 +++
  6 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/hw/core/machine-smp.c b/hw/core/machine-smp.c
index b39ed21e65..d7aa39d540 100644
--- a/hw/core/machine-smp.c
+++ b/hw/core/machine-smp.c
@@ -31,6 +31,10 @@ static char *cpu_hierarchy_to_string(MachineState *ms)
  MachineClass *mc = MACHINE_GET_CLASS(ms);
  GString *s = g_string_new(NULL);
  
+if (mc->smp_props.books_supported) {

+g_string_append_printf(s, " * books (%u)", ms->smp.books);
+}
+
  g_string_append_printf(s, "sockets (%u)", ms->smp.sockets);
  

Now that books become the top-level container, the string format for sockets
should be tweaked to " * sockets (%u)". We also need to cut off the " * " at
the head of the composite topology string.
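
One way to sidestep the leading-separator problem raised above is to decide on the separator when appending rather than trimming afterwards. The helper below is a sketch in plain C with a hypothetical name and a caller-supplied buffer; the code under review builds the string with GString instead. It emits " * " only when the buffer already holds a level, so the first level printed never needs trimming, whether that is books or sockets.

```c
#include <stdio.h>
#include <string.h>

/*
 * Append "name (count)" to the NUL-terminated string in buf, with a
 * " * " separator in front unless buf is still empty.  buf must
 * already hold a valid string shorter than buflen.  Returns 0 on
 * success, -1 if the addition does not fit.
 */
int append_level(char *buf, size_t buflen, const char *name, unsigned count)
{
    size_t used = strlen(buf);
    int n = snprintf(buf + used, buflen - used, "%s%s (%u)",
                     used ? " * " : "", name, count);

    return (n < 0 || (size_t)n >= buflen - used) ? -1 : 0;
}
```

Appending books, sockets and cores in that order to an empty buffer yields "books (2) * sockets (4) * cores (8)"; skipping the books call on a machine without book support yields "sockets (4) * cores (8)" with no special casing.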

  if (mc->smp_props.dies_supported) {
@@ -73,6 +77,7 @@ void machine_parse_smp_config(MachineState *ms,
  {
  MachineClass *mc = MACHINE_GET_CLASS(ms);
  unsigned cpus= config->has_cpus ? config->cpus : 0;
+unsigned books   = config->has_books ? config->books : 0;
  unsigned sockets = config->has_sockets ? config->sockets : 0;
  unsigned dies= config->has_dies ? config->dies : 0;
  unsigned clusters = config->has_clusters ? config->clusters : 0;
@@ -85,6 +90,7 @@ void machine_parse_smp_config(MachineState *ms,
   * explicit configuration like "cpus=0" is not allowed.
   */
  if ((config->has_cpus && config->cpus == 0) ||
+(config->has_books && config->books == 0) ||
  (config->has_sockets && config->sockets == 0) ||
  (config->has_dies && config->dies == 0) ||
  (config->has_clusters && config->clusters == 0) ||
@@ -111,6 +117,13 @@ void machine_parse_smp_config(MachineState *ms,
  dies = dies > 0 ? dies : 1;
  clusters = clusters > 0 ? clusters : 1;
  
+if (!mc->smp_props.books_supported && books > 1) {

+error_setg(errp, "books not supported by this machine's CPU topology");
+return;
+}
nit: maybe move the part above to the similar sanity checks of "dies and clusters"?...

+
+books = books > 0 ? books : 1;
...and put this line together with the similar operation of "dies and clusters"

+
  /* compute missing values based on the provided ones */
  if (cpus == 0 && maxcpus == 0) {
  sockets = sockets > 0 ? sockets : 1;
@@ -124,33 +137,35 @@ void machine_parse_smp_config(MachineState *ms,
  if (sockets == 0) {
  cores = cores > 0 ? cores : 1;
  threads = threads > 0 ? threads : 1;
-sockets = maxcpus / (dies * clusters * cores * threads);
+sockets = maxcpus / (books * dies * clusters * cores * threads);
  } else if (cores == 0) {
  threads = threads > 0 ? threads : 1;
-cores = maxcpus / (sockets * dies * clusters * threads);
+cores = maxcpus / (books * sockets * dies * clusters * threads);
  }
  } else {
  /* prefer cores over sockets since 6.2 */
  if (cores == 0) {
  sockets = sockets > 0 ? sockets : 1;
  threads = threads > 0 ? threads : 1;
-cores = maxcpus / (sockets * dies * clusters * threads);
+cores = maxcpus / (books * sockets * dies * clusters * threads);
  } else if (sockets == 0) {
  threads = threads > 0 ? threads : 1;
-sockets = maxcpus / (dies * clusters * cores * threads);
+sockets = maxcpus / (books * dies * clusters * cores * threads);
  }
  }
  
  /* try to calculate omitted threads at last */

  if (threads == 0) {
-threads = maxcpus / (sockets * dies * clusters * cores);
+threads = maxcpus / (books * sockets * dies * clusters * cores);
  }
  }
  
-maxcpus = maxcpus > 0 ? maxcpus : sockets * dies * clusters * cores * threads;

+maxcpus = maxcpus > 0 ? maxcpus : books * sockets * dies *
+  clusters * cores * threads;
  cpus = cpus > 0 ? cpus : maxcpus;
  
  ms->smp.cpus = cpus;

+ms->smp.books = books;
  ms->smp.sockets = sockets;
  ms->smp.dies = dies;
  ms->smp.clusters = clusters;
@@ -159,7 +174,7 @@ void machine_parse_smp_config(MachineState *ms,
  ms->smp.max_cpus = maxcpus;
  
  /* sanity-check of

[PATCH v3 2/3] hw/i386/acpi-build: Avoid 'sun' identifier

2022-03-15 Thread Andrew Deason

On Solaris, 'sun' is #define'd to 1, which causes errors if a variable
is named 'sun'. Slightly change the name of the var for the Slot User
Number so we can build on Solaris.

Reviewed-by: Ani Sinha 
Signed-off-by: Andrew Deason 
---
 hw/i386/acpi-build.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 4ad4d7286c..dcf6ece3d0 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -576,32 +576,32 @@ static void build_append_pci_bus_devices(Aml *parent_scope, PCIBus *bus,
 }
 
 Aml *aml_pci_device_dsm(void)
 {
 Aml *method, *UUID, *ifctx, *ifctx1, *ifctx2, *ifctx3, *elsectx;
 Aml *acpi_index = aml_local(0);
 Aml *zero = aml_int(0);
 Aml *bnum = aml_arg(4);
 Aml *func = aml_arg(2);
 Aml *rev = aml_arg(1);
-Aml *sun = aml_arg(5);
+Aml *sunum = aml_arg(5);
 
 method = aml_method("PDSM", 6, AML_SERIALIZED);
 
 /*
  * PCI Firmware Specification 3.1
  * 4.6.  _DSM Definitions for PCI
  */
 UUID = aml_touuid("E5C937D0-3553-4D7A-9117-EA4D19C3434D");
 ifctx = aml_if(aml_equal(aml_arg(0), UUID));
 {
-aml_append(ifctx, aml_store(aml_call2("AIDX", bnum, sun), acpi_index));
+aml_append(ifctx, aml_store(aml_call2("AIDX", bnum, sunum), acpi_index));
 ifctx1 = aml_if(aml_equal(func, zero));
 {
 uint8_t byte_list[1];
 
 ifctx2 = aml_if(aml_equal(rev, aml_int(2)));
 {
 /*
  * advertise function 7 if device has acpi-index
  * acpi_index values:
  *0: not present (default value)
-- 
2.11.0
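
The failure mode described in the commit message above can be reproduced on any host by defining the macro by hand. The snippet below is a sketch: the `#ifndef` guard only simulates what the commit message says Solaris predefines, and the function names are hypothetical. With `sun` defined as 1, a declaration such as `int sun = 5;` is preprocessed into `int 1 = 5;` and rejected by the compiler, which is why the patch renames the variable.

```c
/* Simulate the predefined macro on hosts that do not define it. */
#ifndef sun
#define sun 1
#endif

/*
 * A variable named "sun" cannot be declared here: the preprocessor
 * would replace the identifier with 1 before the compiler sees it.
 * A different name avoids the collision.
 */
int slot_user_number(void)
{
    int sunum = 5;

    return sunum;
}

/* The macro itself still expands to the constant 1. */
int sun_macro_value(void)
{
    return sun;
}
```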

[PATCH v3 3/3] util/osdep: Remove some early cruft

2022-03-15 Thread Andrew Deason

The include for statvfs.h has not been needed since all statvfs calls
were removed in commit 4a1418e07bdc ("Unbreak large mem support by
removing kqemu").

The comment mentioning CONFIG_BSD hasn't made sense since an include
for config-host.h was removed in commit aafd75841001 ("util: Clean up
includes").

Remove this cruft.

Reviewed-by: Peter Maydell 
Signed-off-by: Andrew Deason 
---
 util/osdep.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/util/osdep.c b/util/osdep.c
index 1825399bcf..394804d32e 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -16,27 +16,20 @@
  * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
  * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
  * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
  * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
  * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
  * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
  * THE SOFTWARE.
  */
 #include "qemu/osdep.h"
 #include "qapi/error.h"
-
-/* Needed early for CONFIG_BSD etc. */
-
-#ifdef CONFIG_SOLARIS
-#include <sys/statvfs.h>
-#endif
-
 #include "qemu-common.h"
 #include "qemu/cutils.h"
 #include "qemu/sockets.h"
 #include "qemu/error-report.h"
 #include "qemu/madvise.h"
 #include "qemu/mprotect.h"
 #include "qemu/hw-version.h"
 #include "monitor/monitor.h"
 
 static bool fips_enabled = false;
-- 
2.11.0

[PATCH v3 1/3] util/osdep: Avoid madvise proto on modern Solaris

2022-03-15 Thread Andrew Deason

On older Solaris releases (before Solaris 11), we didn't get a
prototype for madvise, and so util/osdep.c provides its own prototype.
Some time between the public Solaris 11.4 release and Solaris 11.4.42
CBE, we started getting an madvise prototype that looks like this:

extern int madvise(void *, size_t, int);

which conflicts with the prototype in util/osdep.c. Instead of always
declaring this prototype, check if we're missing the madvise()
prototype, and only declare it ourselves if the prototype is missing.
Move the prototype to include/qemu/osdep.h, the normal place to handle
platform-specific header quirks.

The 'missing_madvise_proto' meson check contains an obviously wrong
prototype for madvise. So if that code compiles and links, we must be
missing the actual prototype for madvise.

Signed-off-by: Andrew Deason 
---
Changes since v2:
- Rename new symbol to HAVE_MADVISE_WITHOUT_PROTOTYPE
- Move madvise prototype to include/qemu/osdep.h
- More comments in meson.build

Changes since v1:
- madvise prototype check changed to not be platform-specific, and turned into
  CONFIG_MADVISE_MISSING_PROTOTYPE.

 include/qemu/osdep.h |  8 ++++++++
 meson.build          | 23 +++++++++++++++++++++--
 util/osdep.c         |  3 ---
 3 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 322103aadb..f2274b24cb 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -393,20 +393,28 @@ void qemu_anon_ram_free(void *ptr, size_t size);
 
 #if defined(__linux__) || defined(__FreeBSD__) ||   \
 defined(__FreeBSD_kernel__) || defined(__DragonFly__)
 #define HAVE_CHARDEV_PARPORT 1
 #endif
 
 #if defined(__HAIKU__)
 #define SIGIO SIGPOLL
 #endif
 
+#ifdef HAVE_MADVISE_WITHOUT_PROTOTYPE
+/*
+ * See MySQL bug #7156 (http://bugs.mysql.com/bug.php?id=7156) for discussion
+ * about Solaris missing the madvise() prototype.
+ */
+extern int madvise(char *, size_t, int);
+#endif
+
 #if defined(CONFIG_LINUX)
 #ifndef BUS_MCEERR_AR
 #define BUS_MCEERR_AR 4
 #endif
 #ifndef BUS_MCEERR_AO
 #define BUS_MCEERR_AO 5
 #endif
 #endif
 
 #if defined(__linux__) && \
diff --git a/meson.build b/meson.build
index bae62efc9c..282e7c4650 100644
--- a/meson.build
+++ b/meson.build
@@ -1708,25 +1708,44 @@ config_host_data.set('CONFIG_EVENTFD', cc.links('''
   int main(void) { return eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC); }'''))
 config_host_data.set('CONFIG_FDATASYNC', cc.links(gnu_source_prefix + '''
   #include 
   int main(void) {
   #if defined(_POSIX_SYNCHRONIZED_IO) && _POSIX_SYNCHRONIZED_IO > 0
   return fdatasync(0);
   #else
   #error Not supported
   #endif
   }'''))
-config_host_data.set('CONFIG_MADVISE', cc.links(gnu_source_prefix + '''
+
+has_madvise = cc.links(gnu_source_prefix + '''
   #include 
   #include 
   #include 
-  int main(void) { return madvise(NULL, 0, MADV_DONTNEED); }'''))
+  int main(void) { return madvise(NULL, 0, MADV_DONTNEED); }''')
+missing_madvise_proto = false
+if has_madvise
+  # Some platforms (illumos and Solaris before Solaris 11) provide madvise()
+  # but forget to prototype it. In this case, has_madvise will be true (the
+  # test program links despite a compile warning). To detect the
+  # missing-prototype case, we try again with a definitely-bogus prototype.
+  # This will only compile if the system headers don't provide the prototype;
+  # otherwise the conflicting prototypes will cause a compiler error.
+  missing_madvise_proto = cc.links(gnu_source_prefix + '''
+#include 
+#include 
+#include 
+extern int madvise(int);
+int main(void) { return madvise(0); }''')
+endif
+config_host_data.set('CONFIG_MADVISE', has_madvise)
+config_host_data.set('HAVE_MADVISE_WITHOUT_PROTOTYPE', missing_madvise_proto)
+
 config_host_data.set('CONFIG_MEMFD', cc.links(gnu_source_prefix + '''
   #include 
   int main(void) { return memfd_create("foo", MFD_ALLOW_SEALING); }'''))
 config_host_data.set('CONFIG_OPEN_BY_HANDLE', cc.links(gnu_source_prefix + '''
   #include 
   #if !defined(AT_EMPTY_PATH)
   # error missing definition
   #else
   int main(void) { struct file_handle fh; return open_by_handle_at(0, &fh, 0); }
   #endif'''))
diff --git a/util/osdep.c b/util/osdep.c
index 7c4deda6fe..1825399bcf 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -21,23 +21,20 @@
  * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
  * THE SOFTWARE.
  */
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 
 /* Needed early for CONFIG_BSD etc. */
 
 #ifdef CONFIG_SOLARIS
 #include <sys/statvfs.h>
-/* See MySQL bug #7156 (http://bugs.mysql.com/bug.php?id=7156) for
-   discussion about Solaris header problems */
-extern int madvise(char *, size_t, int);
 #endif
 
 #include "qemu-common.h"
 #include "qemu/cutils.h"
 #include "qemu/sockets.h"
 #include "qemu/error-report.h"
 #include "qemu/madvise.h"
 #include "qemu/mprotect.h"
 #include "qemu/hw-version.h"
 #include "monitor/monitor.h"
-- 
2.11.0

[PATCH v3 0/3] Fixes for building on Solaris 11.4.42 CBE

2022-03-15 Thread Andrew Deason

With these minor fixes, I can build qemu on Solaris 11.4.42 CBE
(Oracle's new rolling release thing), using '--disable-rdma
--enable-modules --disable-dbus-display --target-list=x86_64-softmmu'.
I'm just interested in the guest agent right now, so that's all I've
tested (briefly), but the rest of the build wasn't hard to get working.
With this, the guest agent runs fine using isa-serial.

Changes since v2:
- Rename new symbol to HAVE_MADVISE_WITHOUT_PROTOTYPE
- Move madvise prototype to include/qemu/osdep.h
- More comments in meson.build

Changes since v1:
- Change the CONFIG_MADVISE checks to not be platform-specific
- Add the last commit removing util/osdep.c cruft
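As a rough illustration of how the new symbol is meant to be consumed, the guarded prototype could look like the sketch below. The prototype is the one being removed from util/osdep.c; the wrapper advise_dontneed() is a hypothetical name used only for this example, and the exact placement in include/qemu/osdep.h may differ.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#ifdef HAVE_MADVISE_WITHOUT_PROTOTYPE
/* System headers provide madvise() but do not declare it; declare it here. */
extern int madvise(char *, size_t, int);
#endif

/* Hypothetical wrapper: hint that [addr, addr + len) is no longer needed. */
static int advise_dontneed(void *addr, size_t len)
{
    assert(addr != NULL && len > 0);
    return madvise(addr, len, MADV_DONTNEED);
}
```

On hosts whose headers already declare madvise(), the symbol stays undefined and the extra declaration is never seen, so there is no conflicting prototype.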

Andrew Deason (3):
  util/osdep: Avoid madvise proto on modern Solaris
  hw/i386/acpi-build: Avoid 'sun' identifier
  util/osdep: Remove some early cruft

 hw/i386/acpi-build.c |  4 ++--
 include/qemu/osdep.h |  8 
 meson.build  | 23 +--
 util/osdep.c | 10 --
 4 files changed, 31 insertions(+), 14 deletions(-)

-- 
2.11.0

Re: Time to introduce a migration protocol negotiation (Re: [PATCH v2 00/25] migration: Postcopy Preemption)

2022-03-15 Thread Peter Xu

On Tue, Mar 15, 2022 at 11:15:41AM +, Daniel P. Berrangé wrote:
> > I still remember you mentioned the upper layer softwares can have
> > assumption on using only 1 pair of socket for migration, I think that makes
> > postcopy-preempt by default impossible.
> > 
> > Why multifd is different here?
> 
> It isn't different. We went through the pain of extending libvirt
> to know how to open many channels for multifd. We'll have to do
> the same with this postcopy-pre-empt. To this day though, management
> apps above libvirt largely don't enable multifd, which is a real
> shame. This is the key reason I think we need to handle this at
> the QEMU level automatically.

But I still don't understand how QEMU could know about those tunnels, which
should be beyond QEMU's awareness?

The tunneling program can be some admin initiated socat tcp forwarding
programs, which by default may not allow >1 socket pairs.

Or maybe I have misunderstood which tunneling we're discussing?

> 
> > > This post-copy is another case.  We should start off knowing
> > > we can switch to post-copy at any time.
> > 
> > This one is kind of special and it'll be harder, IMHO.
> > 
> > AFAIU, postcopy users will always initiate the migration with at least a
> > full round of precopy, with the hope that all the static guest pages will
> > be migrated.
> 
> I think I didn't explain myself properly here. Today there are
> two parts to postcopy usage in libvirt
> 
>   - Pass the "VIR_MIGRATE_POSTCOPY" when starting the migration.
> The migration still runs in pre-copy mode. This merely ensures
> we configure a bi-directional socket, so the app has the option
> to switch to postcopy later
> 
>   - Invoke virDomainMigrateStartPostCopy  to flip from pre-copy
> to post-copy phase. This requires you previously passed
> VIR_MIGRATE_POSTCOPY to enable its use.
> 
> The first point using 'VIR_MIGRATE_POSTCOPY' should not exist.
> That should be automatically negotiated and handled by QEMU.
> 
> Libvirt and mgmt apps should only need to care about whether
> or not they call virDomainMigrateStartPostCopy to flip to
> post-copy mode.

Ah I see.  I think Dave also mentioned it'll be a bit tricky to do so, but
it at least sounds doable.

> 
> > > We should further be able to add pre-emption if we find it available.
> > 
> > Yeah here I have the same question per multifd above.  I just have no idea
> > whether QEMU has such knowledge on making this decision.  E.g., how could
> > QEMU know whether upper app is not tunneling the migration stream?  How
> > could QEMU know whether the upper app could handle multiple tcp sockets
> > well?
> 
> It can't do this today - that's why we need the new migration protocol
> feature negotiation I describe below.
> 
> > > So rather than following our historical practice, and adding
> > > yet another migration parameter for a specific feature, I'd
> > > really encourage us to put a stop to it and future proof
> > > ourselves.
> > > 
> > > 
> > > Introduce one *final-no-more-never-again-after-this* migration
> > > capability called "protocol-negotiation".
> > 
> > Let's see how Juan/Dave/others think.. anyway, that's something I always
> > wanted.
> > 
> > IMHO an even simpler term can be as simple as:
> > 
> >   -global migration.handshake=on
> 
> This is just inventing a new migration capability framework. We
> can just use existing QMP for this.

It's not a new one, it's just that a few years ago we exported the
migration capabilities to cmdline too (2081475841fe8), even if it's mostly
for debugging purposes.  In my daily tests it's quite handy.
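To make the negotiation idea in this thread a bit more concrete, here is a minimal sketch of what a handshake could compute once both sides have advertised their features. Everything in it is hypothetical: the MIG_FEAT_* bits and mig_negotiate() do not exist in QEMU, and the dependency rules only encode what is discussed above (postcopy needs a return path, and preemption needs postcopy plus more than one channel).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits; none of these names exist in QEMU. */
enum {
    MIG_FEAT_RETURN_PATH      = 1 << 0,
    MIG_FEAT_MULTI_CHANNEL    = 1 << 1,
    MIG_FEAT_POSTCOPY         = 1 << 2,
    MIG_FEAT_POSTCOPY_PREEMPT = 1 << 3,
    MIG_FEAT_ALL              = (1 << 4) - 1,
};

/* Return the features both sides support, minus any with unmet dependencies. */
static uint32_t mig_negotiate(uint32_t src, uint32_t dst)
{
    assert(((src | dst) & ~(uint32_t)MIG_FEAT_ALL) == 0);

    uint32_t common = src & dst;

    /* Postcopy needs a bi-directional channel. */
    if (!(common & MIG_FEAT_RETURN_PATH)) {
        common &= ~(uint32_t)MIG_FEAT_POSTCOPY;
    }
    /* Preemption needs postcopy and a second channel. */
    if (!(common & MIG_FEAT_POSTCOPY) || !(common & MIG_FEAT_MULTI_CHANNEL)) {
        common &= ~(uint32_t)MIG_FEAT_POSTCOPY_PREEMPT;
    }
    return common;
}
```

The open question from above still applies: a mask like this only says what the two QEMUs can do, not whether a tunnel in between allows more than one socket pair.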

> 
> > > When that capability is set, first declare that henceforth the
> > > migration transport is REQUIRED to support **multiple**,
> > > **bi-directional** channels.
> > 
> > This new capability will simply need to depend on the return-path
> > capability we already have.  E.g. exec-typed migration won't be able to
> > enable return-path, so not applicable to this one too.
> 
> 'exec' can be made to work if desired. Currently we only create
> a unidirectional pipe and wire it up to stdin for outgoing
> migration. Nothing stops us declaring 'exec' uses a socketpair
> wired to stdin + stdout, and support invoking 'exec' multiple
> times to get many sockets

Yeah, that sounds workable; it's just that we need to have users of it first.
One point is that exec shouldn't be used in production, only for quick hacks or
experiments, so supporting new/perf/enhancement features for it sounds like a
bit of over-engineering unless it is explicitly useful.

Thanks,

-- 
Peter Xu

[ANNOUNCE] QEMU 7.0.0-rc0 is now available

2022-03-15 Thread Michael Roth

Hello,

On behalf of the QEMU Team, I'd like to announce the availability of the
first release candidate for the QEMU 7.0 release. This release is meant
for testing purposes and should not be used in a production environment.

  http://download.qemu-project.org/qemu-7.0.0-rc0.tar.xz
  http://download.qemu-project.org/qemu-7.0.0-rc0.tar.xz.sig

You can help improve the quality of the QEMU 7.0 release by testing this
release and reporting bugs using our GitLab issue tracker:

  https://gitlab.com/qemu-project/qemu/-/issues

The release plan, as well as documented known issues for release
candidates, are available at:

  http://wiki.qemu.org/Planning/7.0

Please add entries to the ChangeLog for the 7.0 release below:

  http://wiki.qemu.org/ChangeLog/7.0

Thank you to everyone involved!

Re: [PATCH v2 6/6] libvduse: Add support for reconnecting

2022-03-15 Thread Yongji Xie

On Tue, Mar 15, 2022 at 9:48 PM Stefan Hajnoczi  wrote:
>
> On Tue, Feb 15, 2022 at 06:59:43PM +0800, Xie Yongji wrote:
> > +static int vduse_queue_inflight_get(VduseVirtq *vq, int desc_idx)
> > +{
> > +vq->log->inflight.desc[desc_idx].counter = vq->counter++;
> > +vq->log->inflight.desc[desc_idx].inflight = 1;
>
> Is a barrier needed between these two lines to prevent inflight = 1 with
> an undefined counter value?

Yes, I will fix it in v3.

Thanks,
Yongji

[PATCH v20 9/9] tests: Add dirty page rate limit test

2022-03-15 Thread huangy81

From: Hyman Huang(黄勇) 

Add a dirty page rate limit test that runs if the kernel supports the
dirty ring, and create a standalone file to implement the test case.

The following qmp commands are covered by this test case:
"calc-dirty-rate", "query-dirty-rate", "set-vcpu-dirty-limit",
"cancel-vcpu-dirty-limit" and "query-vcpu-dirty-limit".

Signed-off-by: Hyman Huang(黄勇) 
---
 tests/qtest/dirtylimit-test.c | 319 ++
 tests/qtest/meson.build   |   2 +
 2 files changed, 321 insertions(+)
 create mode 100644 tests/qtest/dirtylimit-test.c

diff --git a/tests/qtest/dirtylimit-test.c b/tests/qtest/dirtylimit-test.c
new file mode 100644
index 000..c9af620
--- /dev/null
+++ b/tests/qtest/dirtylimit-test.c
@@ -0,0 +1,319 @@
+/*
+ * QTest testcase for Dirty Page Rate Limit
+ *
+ * Copyright (c) 2022 CHINA TELECOM CO.,LTD.
+ *
+ * Authors:
+ *  Hyman Huang(黄勇) 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "libqos/libqtest.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qlist.h"
+#include "qapi/qobject-input-visitor.h"
+#include "qapi/qobject-output-visitor.h"
+
+#include "migration-helpers.h"
+#include "tests/migration/i386/a-b-bootblock.h"
+
+/*
+ * Dirtylimit stops adjusting once the dirty page rate error
+ * value is less than DIRTYLIMIT_TOLERANCE_RANGE
+ */
+#define DIRTYLIMIT_TOLERANCE_RANGE  25  /* MB/s */
+
+static QDict *qmp_command(QTestState *who, const char *command, ...)
+{
+va_list ap;
+QDict *resp, *ret;
+
+va_start(ap, command);
+resp = qtest_vqmp(who, command, ap);
+va_end(ap);
+
+g_assert(!qdict_haskey(resp, "error"));
+g_assert(qdict_haskey(resp, "return"));
+
+ret = qdict_get_qdict(resp, "return");
+qobject_ref(ret);
+qobject_unref(resp);
+
+return ret;
+}
+
+static void calc_dirty_rate(QTestState *who, uint64_t calc_time)
+{
+qobject_unref(qmp_command(who,
+  "{ 'execute': 'calc-dirty-rate',"
+  "'arguments': { "
+  "'calc-time': %ld,"
+  "'mode': 'dirty-ring' }}",
+  calc_time));
+}
+
+static QDict *query_dirty_rate(QTestState *who)
+{
+return qmp_command(who, "{ 'execute': 'query-dirty-rate' }");
+}
+
+static void dirtylimit_set_all(QTestState *who, uint64_t dirtyrate)
+{
+qobject_unref(qmp_command(who,
+  "{ 'execute': 'set-vcpu-dirty-limit',"
+  "'arguments': { "
+  "'dirty-rate': %ld } }",
+  dirtyrate));
+}
+
+static void cancel_vcpu_dirty_limit(QTestState *who)
+{
+qobject_unref(qmp_command(who,
+  "{ 'execute': 'cancel-vcpu-dirty-limit' }"));
+}
+
+static QDict *query_vcpu_dirty_limit(QTestState *who)
+{
+QDict *rsp;
+
+rsp = qtest_qmp(who, "{ 'execute': 'query-vcpu-dirty-limit' }");
+g_assert(!qdict_haskey(rsp, "error"));
+g_assert(qdict_haskey(rsp, "return"));
+
+return rsp;
+}
+
+static bool calc_dirtyrate_ready(QTestState *who)
+{
+QDict *rsp_return;
+gchar *status;
+
+rsp_return = query_dirty_rate(who);
+g_assert(rsp_return);
+
+status = g_strdup(qdict_get_str(rsp_return, "status"));
+g_assert(status);
+
+return g_strcmp0(status, "measuring");
+}
+
+static void wait_for_calc_dirtyrate_complete(QTestState *who,
+ int64_t calc_time)
+{
+int max_try_count = 200;
+usleep(calc_time);
+
+while (!calc_dirtyrate_ready(who) && max_try_count--) {
+usleep(1000);
+}
+
+/*
+ * The poll above allows 200 ms (max_try_count * 1000us). On timeout the
+ * post-decrement leaves max_try_count at -1, so fail the test then.
+ */
+g_assert_cmpint(max_try_count, >=, 0);
+}
+
+static int64_t get_dirty_rate(QTestState *who)
+{
+QDict *rsp_return;
+gchar *status;
+QList *rates;
+const QListEntry *entry;
+QDict *rate;
+int64_t dirtyrate;
+
+rsp_return = query_dirty_rate(who);
+g_assert(rsp_return);
+
+status = g_strdup(qdict_get_str(rsp_return, "status"));
+g_assert(status);
+g_assert_cmpstr(status, ==, "measured");
+
+rates = qdict_get_qlist(rsp_return, "vcpu-dirty-rate");
+g_assert(rates && !qlist_empty(rates));
+
+entry = qlist_first(rates);
+g_assert(entry);
+
+rate = qobject_to(QDict, qlist_entry_obj(entry));
+g_assert(rate);
+
+dirtyrate = qdict_get_try_int(rate, "dirty-rate", -1);
+
+qobject_unref(rsp_return);
+return dirtyrate;
+}
+
+static int64_t get_limit_rate(QTestState *who)
+{
+QDict *rsp_return;
+QList *rates;
+const QListEntry *entry;
+QDict *rate;
+int64_t dirtyrate;
+
+rsp_return = query_vcpu_dirty_limit(who);
+g_assert(rsp_return);
+
+rates = qdict_get_qlist(rsp_return, "return");
+g_assert(rates && !qlist_empty(rates));
+
+entry = qlist_first(rates);
+g_assert(entry);
+
+rate = qobject_to(QDict,

[PATCH v20 7/9] softmmu/dirtylimit: Implement dirty page rate limit

2022-03-15 Thread huangy81

From: Hyman Huang(黄勇) 

Implement periodic dirtyrate calculation based on the
dirty ring, and throttle each virtual CPU until it reaches the
dirty page rate quota given by the user.

Introduce qmp commands "set-vcpu-dirty-limit",
"cancel-vcpu-dirty-limit", "query-vcpu-dirty-limit"
to enable, disable, query dirty page limit for virtual CPU.

Meanwhile, introduce corresponding hmp commands
"set_vcpu_dirty_limit", "cancel_vcpu_dirty_limit",
"info vcpu_dirty_limit" so the feature can be more usable.

"query-vcpu-dirty-limit" only succeeds when the dirty page
rate limit is enabled, so add it to the list of skipped
commands to ensure qmp-cmd-test runs successfully.

Signed-off-by: Hyman Huang(黄勇) 
Acked-by: Markus Armbruster 
Reviewed-by: Peter Xu 
---
 hmp-commands-info.hx   |  13 +++
 hmp-commands.hx|  32 
 include/monitor/hmp.h  |   3 +
 qapi/migration.json|  80 +++
 softmmu/dirtylimit.c   | 195 +
 tests/qtest/qmp-cmd-test.c |   2 +
 6 files changed, 325 insertions(+)

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index adfa085..016717d 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -865,6 +865,19 @@ SRST
 Display the vcpu dirty rate information.
 ERST
 
+{
+.name   = "vcpu_dirty_limit",
+.args_type  = "",
+.params = "",
+.help   = "show dirty page limit information of all vCPU",
+.cmd= hmp_info_vcpu_dirty_limit,
+},
+
+SRST
+  ``info vcpu_dirty_limit``
+Display the vcpu dirty page limit information.
+ERST
+
 #if defined(TARGET_I386)
 {
 .name   = "sgx",
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 8476277..82ab75f 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1746,3 +1746,35 @@ ERST
   "\n\t\t\t -b to specify dirty bitmap as method of calculation)",
 .cmd= hmp_calc_dirty_rate,
 },
+
+SRST
+``set_vcpu_dirty_limit``
+  Set dirty page rate limit on virtual CPU, the information about all the
+  virtual CPU dirty limit status can be observed with ``info vcpu_dirty_limit``
+  command.
+ERST
+
+{
+.name   = "set_vcpu_dirty_limit",
+.args_type  = "dirty_rate:l,cpu_index:l?",
+.params = "dirty_rate [cpu_index]",
+.help   = "set dirty page rate limit, use cpu_index to set limit"
+  "\n\t\t\t\t\t on a specified virtual cpu",
+.cmd= hmp_set_vcpu_dirty_limit,
+},
+
+SRST
+``cancel_vcpu_dirty_limit``
+  Cancel dirty page rate limit on virtual CPU, the information about all the
+  virtual CPU dirty limit status can be observed with ``info vcpu_dirty_limit``
+  command.
+ERST
+
+{
+.name   = "cancel_vcpu_dirty_limit",
+.args_type  = "cpu_index:l?",
+.params = "[cpu_index]",
+.help   = "cancel dirty page rate limit, use cpu_index to cancel"
+  "\n\t\t\t\t\t limit on a specified virtual cpu",
+.cmd= hmp_cancel_vcpu_dirty_limit,
+},
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index 96d0148..478820e 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -131,6 +131,9 @@ void hmp_replay_delete_break(Monitor *mon, const QDict *qdict);
 void hmp_replay_seek(Monitor *mon, const QDict *qdict);
 void hmp_info_dirty_rate(Monitor *mon, const QDict *qdict);
 void hmp_calc_dirty_rate(Monitor *mon, const QDict *qdict);
+void hmp_set_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
+void hmp_cancel_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
+void hmp_info_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
 void hmp_human_readable_text_helper(Monitor *mon,
 HumanReadableText *(*qmp_handler)(Error **));
 
diff --git a/qapi/migration.json b/qapi/migration.json
index 18e2610..910a4ff 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1861,6 +1861,86 @@
 { 'command': 'query-dirty-rate', 'returns': 'DirtyRateInfo' }
 
 ##
+# @DirtyLimitInfo:
+#
+# Dirty page rate limit information of a virtual CPU.
+#
+# @cpu-index: index of a virtual CPU.
+#
+# @limit-rate: upper limit of dirty page rate (MB/s) for a virtual
+#  CPU, 0 means unlimited.
+#
+# @current-rate: current dirty page rate (MB/s) for a virtual CPU.
+#
+# Since: 7.0
+#
+##
+{ 'struct': 'DirtyLimitInfo',
+  'data': { 'cpu-index': 'int',
+'limit-rate': 'uint64',
+'current-rate': 'uint64' } }
+
+##
+# @set-vcpu-dirty-limit:
+#
+# Set the upper limit of dirty page rate for virtual CPUs.
+#
+# Requires KVM with accelerator property "dirty-ring-size" set.
+# A virtual CPU's dirty page rate is a measure of its memory load.
+# To observe dirty page rates, use @calc-dirty-rate.
+#
+# @cpu-index: index of a virtual CPU, default is all.
+#
+# @dirty-rate: upper limit of dirty page rate (MB/s) for virtual CPUs.
+#
+# Since: 7.0
+#
+# Example:

[PATCH v20 5/9] accel/kvm/kvm-all: Introduce kvm_dirty_ring_size function

2022-03-15 Thread huangy81

From: Hyman Huang(黄勇) 

Introduce the kvm_dirty_ring_size util function to help calculate
the dirty ring full time.
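As a back-of-the-envelope illustration of what "dirty ring full time" means, the sketch below estimates how long a vCPU takes to fill its ring at a given dirty page rate. The helper name and signature are hypothetical and this is not the code from the series; it assumes each ring entry tracks one page and that the rate is given in MB/s.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: microseconds needed to dirty ring_entries pages of
 * page_size bytes each at dirty_rate_mbps (MB/s, 1 MB = 2^20 bytes).
 */
static int64_t ring_full_time_us(uint32_t ring_entries, uint32_t page_size,
                                 uint64_t dirty_rate_mbps)
{
    assert(dirty_rate_mbps > 0);

    uint64_t ring_bytes = (uint64_t)ring_entries * page_size;
    uint64_t bytes_per_sec = dirty_rate_mbps << 20;

    return (int64_t)(ring_bytes * 1000000 / bytes_per_sec);
}
```

For example, a 4096-entry ring with 4 KiB pages covers 16 MiB, which a vCPU dirtying 100 MB/s fills in about 160 ms.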

Signed-off-by: Hyman Huang(黄勇) 
Acked-by: Peter Xu 
---
 accel/kvm/kvm-all.c| 5 +
 accel/stubs/kvm-stub.c | 6 ++
 include/sysemu/kvm.h   | 2 ++
 3 files changed, 13 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 29bf6a0..0c78bc7 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2313,6 +2313,11 @@ bool kvm_dirty_ring_enabled(void)
 return kvm_state->kvm_dirty_ring_size ? true : false;
 }
 
+uint32_t kvm_dirty_ring_size(void)
+{
+return kvm_state->kvm_dirty_ring_size;
+}
+
 static int kvm_init(MachineState *ms)
 {
 MachineClass *mc = MACHINE_GET_CLASS(ms);
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index 3345882..c5aafaa 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -148,3 +148,9 @@ bool kvm_dirty_ring_enabled(void)
 {
 return false;
 }
+
+uint32_t kvm_dirty_ring_size(void)
+{
+return 0;
+}
+#endif
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index a783c78..efd6dee 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -582,4 +582,6 @@ bool kvm_cpu_check_are_resettable(void);
 bool kvm_arch_cpu_check_are_resettable(void);
 
 bool kvm_dirty_ring_enabled(void);
+
+uint32_t kvm_dirty_ring_size(void);
 #endif
-- 
1.8.3.1

[PATCH v20 4/9] softmmu/dirtylimit: Implement vCPU dirtyrate calculation periodically

2022-03-15 Thread huangy81

From: Hyman Huang(黄勇) 

Introduce a third dirty tracking method, GLOBAL_DIRTY_LIMIT, to
calculate the dirtyrate periodically for the dirty page rate
limit.

Add dirtylimit.c to implement periodic dirtyrate calculation,
which will be used for the dirty page rate limit.

Add dirtylimit.h to export util functions for dirty page rate
limit implementation.
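The reason for a separate flag is that several users can request dirty tracking at once, and tracking should only stop when the last of them is done. A minimal sketch of that bookkeeping follows; GLOBAL_DIRTY_MIGRATION as bit 0 is an assumption (it is not shown in the hunk below), and dirty_tracking_change() is a hypothetical helper, not the series' global_dirty_log_change().

```c
#include <assert.h>
#include <stdbool.h>

#define GLOBAL_DIRTY_MIGRATION  (1U << 0)   /* assumed, not shown in the hunk */
#define GLOBAL_DIRTY_DIRTY_RATE (1U << 1)
#define GLOBAL_DIRTY_LIMIT      (1U << 2)
#define GLOBAL_DIRTY_MASK       (0x7U)

/* Stand-in for global_dirty_tracking. */
static unsigned int dirty_tracking;

/*
 * Hypothetical helper: set or clear one reason for dirty tracking.
 * Returns true while at least one reason is still active.
 */
static bool dirty_tracking_change(unsigned int flag, bool enable)
{
    assert(flag != 0 && (flag & ~GLOBAL_DIRTY_MASK) == 0);

    if (enable) {
        dirty_tracking |= flag;
    } else {
        dirty_tracking &= ~flag;
    }
    return dirty_tracking != 0;
}
```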

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
---
 include/exec/memory.h   |   5 +-
 include/sysemu/dirtylimit.h |  22 +
 softmmu/dirtylimit.c| 116 
 softmmu/meson.build |   1 +
 4 files changed, 143 insertions(+), 1 deletion(-)
 create mode 100644 include/sysemu/dirtylimit.h
 create mode 100644 softmmu/dirtylimit.c

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 4d5997e..88ca510 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -69,7 +69,10 @@ static inline void fuzz_dma_read_cb(size_t addr,
 /* Dirty tracking enabled because measuring dirty rate */
 #define GLOBAL_DIRTY_DIRTY_RATE (1U << 1)
 
-#define GLOBAL_DIRTY_MASK  (0x3)
+/* Dirty tracking enabled because dirty limit */
+#define GLOBAL_DIRTY_LIMIT  (1U << 2)
+
+#define GLOBAL_DIRTY_MASK  (0x7)
 
 extern unsigned int global_dirty_tracking;
 
diff --git a/include/sysemu/dirtylimit.h b/include/sysemu/dirtylimit.h
new file mode 100644
index 000..da459f0
--- /dev/null
+++ b/include/sysemu/dirtylimit.h
@@ -0,0 +1,22 @@
+/*
+ * Dirty page rate limit common functions
+ *
+ * Copyright (c) 2022 CHINA TELECOM CO.,LTD.
+ *
+ * Authors:
+ *  Hyman Huang(黄勇) 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef QEMU_DIRTYRLIMIT_H
+#define QEMU_DIRTYRLIMIT_H
+
+#define DIRTYLIMIT_CALC_TIME_MS 1000/* 1000ms */
+
+int64_t vcpu_dirty_rate_get(int cpu_index);
+void vcpu_dirty_rate_stat_start(void);
+void vcpu_dirty_rate_stat_stop(void);
+void vcpu_dirty_rate_stat_initialize(void);
+void vcpu_dirty_rate_stat_finalize(void);
+#endif
diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
new file mode 100644
index 000..6102e8c
--- /dev/null
+++ b/softmmu/dirtylimit.c
@@ -0,0 +1,116 @@
+/*
+ * Dirty page rate limit implementation code
+ *
+ * Copyright (c) 2022 CHINA TELECOM CO.,LTD.
+ *
+ * Authors:
+ *  Hyman Huang(黄勇) 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/main-loop.h"
+#include "qapi/qapi-commands-migration.h"
+#include "sysemu/dirtyrate.h"
+#include "sysemu/dirtylimit.h"
+#include "exec/memory.h"
+#include "hw/boards.h"
+
+struct {
+VcpuStat stat;
+bool running;
+QemuThread thread;
+} *vcpu_dirty_rate_stat;
+
+static void vcpu_dirty_rate_stat_collect(void)
+{
+VcpuStat stat;
+int i = 0;
+
+/* calculate vcpu dirtyrate */
+vcpu_calculate_dirtyrate(DIRTYLIMIT_CALC_TIME_MS,
+ &stat,
+ GLOBAL_DIRTY_LIMIT,
+ false);
+
+for (i = 0; i < stat.nvcpu; i++) {
+vcpu_dirty_rate_stat->stat.rates[i].id = i;
+vcpu_dirty_rate_stat->stat.rates[i].dirty_rate =
+stat.rates[i].dirty_rate;
+}
+
+free(stat.rates);
+}
+
+static void *vcpu_dirty_rate_stat_thread(void *opaque)
+{
+rcu_register_thread();
+
+/* start log sync */
+global_dirty_log_change(GLOBAL_DIRTY_LIMIT, true);
+
+while (qatomic_read(&vcpu_dirty_rate_stat->running)) {
+vcpu_dirty_rate_stat_collect();
+}
+
+/* stop log sync */
+global_dirty_log_change(GLOBAL_DIRTY_LIMIT, false);
+
+rcu_unregister_thread();
+return NULL;
+}
+
+int64_t vcpu_dirty_rate_get(int cpu_index)
+{
+DirtyRateVcpu *rates = vcpu_dirty_rate_stat->stat.rates;
+return qatomic_read(&rates[cpu_index].dirty_rate);
+}
+
+void vcpu_dirty_rate_stat_start(void)
+{
+if (qatomic_read(&vcpu_dirty_rate_stat->running)) {
+return;
+}
+
+qatomic_set(&vcpu_dirty_rate_stat->running, 1);
+qemu_thread_create(&vcpu_dirty_rate_stat->thread,
+   "dirtyrate-stat",
+   vcpu_dirty_rate_stat_thread,
+   NULL,
+   QEMU_THREAD_JOINABLE);
+}
+
+void vcpu_dirty_rate_stat_stop(void)
+{
+qatomic_set(&vcpu_dirty_rate_stat->running, 0);
+qemu_mutex_unlock_iothread();
+qemu_thread_join(&vcpu_dirty_rate_stat->thread);
+qemu_mutex_lock_iothread();
+}
+
+void vcpu_dirty_rate_stat_initialize(void)
+{
+MachineState *ms = MACHINE(qdev_get_machine());
+int max_cpus = ms->smp.max_cpus;
+
+vcpu_dirty_rate_stat =
+g_malloc0(sizeof(*vcpu_dirty_rate_stat));
+
+vcpu_dirty_rate_stat->stat.nvcpu = max_cpus;
+vcpu_dirty_rate_stat->stat.rates =
+g_malloc0(sizeof(DirtyRateVcpu) * max_cpus);
+
+vcpu_dirty_rate_stat->running = false;
+}
+
+void

[PATCH v20 8/9] migration-test: Export migration-test util funtions

2022-03-15 Thread huangy81

From: Hyman Huang(黄勇) 

Dirtylimit qtest can reuse the mechanisms that have been
implemented by migration-test to start a vm, so export the
relevant util functions.

Signed-off-by: Hyman Huang(黄勇) 
---
 tests/qtest/migration-helpers.c |  95 +
 tests/qtest/migration-helpers.h |  15 ++
 tests/qtest/migration-test.c| 102 
 3 files changed, 110 insertions(+), 102 deletions(-)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index 4ee2601..ffec54b 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -16,6 +16,7 @@
 #include "migration-helpers.h"
 
 bool got_stop;
+const char *tmpfs;
 
 static void check_stop_event(QTestState *who)
 {
@@ -188,3 +189,97 @@ void wait_for_migration_fail(QTestState *from, bool allow_active)
 g_assert(qdict_get_bool(rsp_return, "running"));
 qobject_unref(rsp_return);
 }
+
+void init_bootfile(const char *bootpath, void *content, size_t len)
+{
+FILE *bootfile = fopen(bootpath, "wb");
+
+g_assert_cmpint(fwrite(content, len, 1, bootfile), ==, 1);
+fclose(bootfile);
+}
+
+/*
+ * Wait for some output in the serial output file,
+ * we get an 'A' followed by an endless string of 'B's
+ * but on the destination we won't have the A.
+ */
+void wait_for_serial(const char *side)
+{
+g_autofree char *serialpath = g_strdup_printf("%s/%s", tmpfs, side);
+FILE *serialfile = fopen(serialpath, "r");
+const char *arch = qtest_get_arch();
+int started = (strcmp(side, "src_serial") == 0 &&
+   strcmp(arch, "ppc64") == 0) ? 0 : 1;
+
+do {
+int readvalue = fgetc(serialfile);
+
+if (!started) {
+/* SLOF prints its banner before starting test,
+ * to ignore it, mark the start of the test with '_',
+ * ignore all characters until this marker
+ */
+switch (readvalue) {
+case '_':
+started = 1;
+break;
+case EOF:
+fseek(serialfile, 0, SEEK_SET);
+usleep(1000);
+break;
+}
+continue;
+}
+switch (readvalue) {
+case 'A':
+/* Fine */
+break;
+
+case 'B':
+/* It's alive! */
+fclose(serialfile);
+return;
+
+case EOF:
+started = (strcmp(side, "src_serial") == 0 &&
+   strcmp(arch, "ppc64") == 0) ? 0 : 1;
+fseek(serialfile, 0, SEEK_SET);
+usleep(1000);
+break;
+
+default:
+fprintf(stderr, "Unexpected %d on %s serial\n", readvalue, side);
+g_assert_not_reached();
+}
+} while (true);
+}
+
+bool kvm_dirty_ring_supported(void)
+{
+#if defined(__linux__) && defined(HOST_X86_64)
+int ret, kvm_fd = open("/dev/kvm", O_RDONLY);
+
+if (kvm_fd < 0) {
+return false;
+}
+
+ret = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_DIRTY_LOG_RING);
+close(kvm_fd);
+
+/* We test with 4096 slots */
+if (ret < 4096) {
+return false;
+}
+
+return true;
+#else
+return false;
+#endif
+}
+
+void cleanup(const char *filename)
+{
+g_autofree char *path = g_strdup_printf("%s/%s", tmpfs, filename);
+
+unlink(path);
+}
diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index d63bba9..d08551f 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -14,7 +14,14 @@
 
 #include "libqos/libqtest.h"
 
+/* For dirty ring test; so far only x86_64 is supported */
+#if defined(__linux__) && defined(HOST_X86_64)
+#include "linux/kvm.h"
+#endif
+#include 
+
 extern bool got_stop;
+extern const char *tmpfs;
 
 GCC_FMT_ATTR(3, 4)
 QDict *wait_command_fd(QTestState *who, int fd, const char *command, ...);
@@ -34,4 +41,12 @@ void wait_for_migration_complete(QTestState *who);
 
 void wait_for_migration_fail(QTestState *from, bool allow_active);
 
+void init_bootfile(const char *bootpath, void *content, size_t len);
+
+void wait_for_serial(const char *side);
+
+bool kvm_dirty_ring_supported(void);
+
+void cleanup(const char *filename);
+
 #endif /* MIGRATION_HELPERS_H_ */
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 0870656..eec6dd0 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -27,11 +27,6 @@
 #include "migration-helpers.h"
 #include "tests/migration/migration-test.h"
 
-/* For dirty ring test; so far only x86_64 is supported */
-#if defined(__linux__) && defined(HOST_X86_64)
-#include "linux/kvm.h"
-#endif
-
 /* TODO actually test the results and get rid of this */
 #define qtest_qmp_discard_response(...) qobject_unref(qtest_qmp(__VA_ARGS__))
 
@@ -49,7 +44,6 @@ static bool uffd_feature_thread_id;
 
 #if defined(__linux__) && defined(__NR_userfaultfd) &&

[PATCH v20 0/9] support dirty restraint on vCPU

2022-03-15 Thread huangy81

From: Hyman Huang(黄勇) 

v20
- fix the style problems and let QEMU test pass
- change the dirty limit case logic:
  the test fails if the dirtyrate measurement hits the 200ms timeout
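
The timeout rule above can be illustrated with a bounded polling loop. This is only a sketch: poll_until_ready() and countdown_ready() are hypothetical names, not functions from the test, and the real test polls query-dirty-rate over QMP instead of a counter. Returning an explicit boolean means the caller never has to infer a timeout from the leftover value of a loop counter.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

typedef bool (*ready_fn)(void *opaque);

/*
 * Call ready() up to max_tries times, sleeping interval_us between tries.
 * Returns true as soon as ready() succeeds, false if the tries run out.
 */
static bool poll_until_ready(ready_fn ready, void *opaque,
                             int max_tries, unsigned int interval_us)
{
    assert(ready != NULL && max_tries > 0);

    for (int i = 0; i < max_tries; i++) {
        if (ready(opaque)) {
            return true;
        }
        usleep(interval_us);
    }
    return false;
}

/* Example predicate: becomes ready once the counter has reached zero. */
static bool countdown_ready(void *opaque)
{
    int *remaining = opaque;

    if (*remaining == 0) {
        return true;
    }
    (*remaining)--;
    return false;
}
```

With max_tries = 200 and interval_us = 1000 this gives the same 200 ms budget as the test.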

v19
- rebase on master and fix conflicts
- add test case for dirty page rate limit

Ping.

Adding a test case, in the hope that it can be merged along with the previous
patchset.

Please review. Thanks,

Regards
Yong

v18
- squash commit "Ignore query-vcpu-dirty-limit test" into
  "Implement dirty page rate limit" in  [PATCH v17] to make
  the modification logic self-contained. 

Please review. Thanks,

Regards
Yong 

v17
- rebase on master
- fix qmp-cmd-test 

v16
- rebase on master
- drop the unused typedef syntax in [PATCH v15 6/7] 
- add the Reviewed-by and Acked-by tags by the way 

v15
- rebase on master
- drop the 'init_time_ms' parameter in function vcpu_calculate_dirtyrate 
- drop the 'setup' field in dirtylimit_state and call dirtylimit_process
  directly, which makes code cleaner.
- code clean in dirtylimit_adjust_throttle
- fix miss dirtylimit_state_unlock() in dirtylimit_process and
  dirtylimit_query_all
- add some comment

Please review. Thanks,

Regards
Yong 

v14
- v13 sent by accident, resend patchset. 

v13
- rebase on master
- passing NULL to kvm_dirty_ring_reap in commit
  "refactor per-vcpu dirty ring reaping" to keep the logic unchanged.
  In other words, we still try our best to reap as many PFNs as possible
  if dirtylimit is not in service.
- move the cpu list gen id changes into a separate patch.
- release the lock before sleeping during dirty page rate calculation.
- move the dirty ring size fetch logic into a separate patch.
- drop the DIRTYLIMIT_LINEAR_ADJUSTMENT_WATERMARK macro.
- substitute bh with a function pointer when implementing dirtylimit.
- merge dirtylimit_start/stop into dirtylimit_change.
- change the "cpu-index" parameter type to "int" for consistency.
- fix some syntax errors in the documents.

Please review. Thanks,

Yong

v12
- rebase on master
- add a new commit to refactor per-vcpu dirty ring reaping, which can resolve
  the "vcpu miss the chances to sleep" problem
- remove the dirtylimit_thread and implement throttle in bottom half instead.
- let the dirty ring reaper thread keep sleeping when dirtylimit is in service
- introduce cpu_list_generation_id to identify cpu_list changing.
- keep taking the cpu_list_lock during dirty_stat_wait to prevent vcpu
  plug/unplug when calculating the dirty page rate
- move the dirtylimit global initializations out of dirtylimit_set_vcpu and do
  some code cleanup
- add DIRTYLIMIT_LINEAR_ADJUSTMENT_WATERMARK in case of oscillation when
  throttling
- remove the unmatched count field in dirtylimit_state
- add stub to fix build on non-x86
- refactor the documents

Thanks Peter and Markus for reviewing the previous versions, please review.

Thanks,
Yong

v11
- rebase on master
- add a commit "refactor dirty page rate calculation" so that dirty page rate
  limit can reuse the calculation logic.
- handle the cpu hotplug/unplug case in the dirty page rate calculation logic.
- modify the qmp commands according to Markus's advice.
- introduce a standalone file dirtylimit.c to implement dirty page rate limit
- check if dirty limit is in service by dirtylimit_state pointer instead of a
  global variable
- introduce dirtylimit_mutex to protect dirtylimit_state
- do some code cleanup and docs

See the commits for more detail. Thanks very much to Markus and Peter for the
code review and for the experienced and insightful advice; most modifications
are based on it.

v10:
- rebase on master
- make the following modifications on patch [1/3]:
  1. Make "dirtylimit-calc" thread joinable and join it after quitting.

  2. Add finalize function to free dirtylimit_calc_state

  3. Do some code clean work

- make the following modifications on patch [2/3]:
  1. Remove the original implementation of throttle according to
 Peter's advice.
 
  2. Introduce a negative feedback system and implement the throttle
 on all vcpu in one thread named "dirtylimit". 

  3. Simplify the algo when calculating the throttle_us_per_full:
     increase/decrease linearly when there exists a wide difference
     between quota and current dirty page rate, increase/decrease
     a fixed time slice when the difference is narrow. This makes
     the throttle respond faster and reach the quota smoothly.

  4. Introduce an unfit_cnt in algo to make sure throttle really
     takes effect.

  5. Set the max sleep time to 99 times "ring_full_time_us".

  6. Make "dirtylimit" thread joinable and

[PATCH v20 6/9] softmmu/dirtylimit: Implement virtual CPU throttle

2022-03-15 Thread huangy81

From: Hyman Huang(黄勇) 

Set up a negative feedback system for when the vCPU thread
handles a KVM_EXIT_DIRTY_RING_FULL exit, by introducing a
throttle_us_per_full field in struct CPUState. The vCPU sleeps
for throttle_us_per_full microseconds to throttle itself
if dirtylimit is in service.
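
To give a feel for the feedback loop, here is a simplified sketch of one adjustment step. It is not the code in this patch: the function name, the constants and the 50% "wide difference" threshold are all hypothetical, and only the overall shape (linear steps for a wide gap, a fixed time slice for a narrow one, a dead band near the quota, and a cap at 99 times the ring full time) follows the description in the cover letter.

```c
#include <assert.h>
#include <stdint.h>

#define TOLERANCE_MBPS   25     /* hypothetical dead band, MB/s */
#define WIDE_GAP_PCT     50     /* hypothetical "wide difference" threshold */
#define FIXED_STEP_US    1000   /* hypothetical fixed time slice */
#define MAX_SLEEP_FACTOR 99     /* cap the sleep at 99 * ring_full_us */

/*
 * One feedback step: return the new sleep time per ring-full exit, given
 * the current sleep time, the quota and measured rate (MB/s), and the
 * time the ring takes to fill.
 */
static int64_t throttle_adjust(int64_t throttle_us, uint64_t quota,
                               uint64_t current, int64_t ring_full_us)
{
    assert(throttle_us >= 0 && quota > 0 && ring_full_us > 0);

    uint64_t gap = current > quota ? current - quota : quota - current;
    uint64_t larger = current > quota ? current : quota;
    int64_t pct = (int64_t)(gap * 100 / larger);
    int64_t max_us = ring_full_us * MAX_SLEEP_FACTOR;
    int64_t step;

    if (gap <= TOLERANCE_MBPS) {
        return throttle_us;             /* close enough to the quota */
    }

    /* Wide gap: step proportional to the gap. Narrow gap: fixed slice. */
    step = pct > WIDE_GAP_PCT ? ring_full_us * pct / 100 : FIXED_STEP_US;
    throttle_us += current > quota ? step : -step;

    if (throttle_us < 0) {
        throttle_us = 0;
    }
    if (throttle_us > max_us) {
        throttle_us = max_us;
    }
    return throttle_us;
}
```

Calling this once per measurement period moves the sleep time quickly while the vCPU is far from its quota and in small steps once it is close, which is what keeps the throttle from oscillating.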

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
---
 accel/kvm/kvm-all.c |  20 ++-
 include/hw/core/cpu.h   |   6 +
 include/sysemu/dirtylimit.h |  15 +++
 softmmu/dirtylimit.c| 291 
 softmmu/trace-events|   7 ++
 5 files changed, 338 insertions(+), 1 deletion(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 0c78bc7..2e6f319 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -45,6 +45,7 @@
 #include "qemu/guest-random.h"
 #include "sysemu/hw_accel.h"
 #include "kvm-cpus.h"
+#include "sysemu/dirtylimit.h"
 
 #include "hw/boards.h"
 
@@ -476,6 +477,7 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
 cpu->kvm_state = s;
 cpu->vcpu_dirty = true;
 cpu->dirty_pages = 0;
+cpu->throttle_us_per_full = 0;
 
 mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
 if (mmap_size < 0) {
@@ -1469,6 +1471,11 @@ static void *kvm_dirty_ring_reaper_thread(void *data)
  */
 sleep(1);
 
+/* keep sleeping so that the reaper does not interfere with dirtylimit */
+if (dirtylimit_in_service()) {
+continue;
+}
+
 trace_kvm_dirty_ring_reaper("wakeup");
 r->reaper_state = KVM_DIRTY_RING_REAPER_REAPING;
 
@@ -2965,8 +2972,19 @@ int kvm_cpu_exec(CPUState *cpu)
  */
 trace_kvm_dirty_ring_full(cpu->cpu_index);
 qemu_mutex_lock_iothread();
-kvm_dirty_ring_reap(kvm_state, NULL);
+/*
+ * We throttle a vCPU by making it sleep once it exits from the kernel
+ * due to a full dirty ring. In the dirtylimit scenario, reaping all
+ * vCPUs after a single vCPU's dirty ring gets full would cause the
+ * sleep to be missed, so just reap the vCPU whose ring is full.
+ */
+if (dirtylimit_in_service()) {
+kvm_dirty_ring_reap(kvm_state, cpu);
+} else {
+kvm_dirty_ring_reap(kvm_state, NULL);
+}
 qemu_mutex_unlock_iothread();
+dirtylimit_vcpu_execute(cpu);
 ret = 0;
 break;
 case KVM_EXIT_SYSTEM_EVENT:
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 0efc615..a3f6b2d 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -418,6 +418,12 @@ struct CPUState {
  */
 bool throttle_thread_scheduled;
 
+/*
+ * Sleep throttle_us_per_full microseconds once dirty ring is full
+ * if dirty page rate limit is enabled.
+ */
+int64_t throttle_us_per_full;
+
 bool ignore_memory_transaction_failures;
 
 /* Used for user-only emulation of prctl(PR_SET_UNALIGN). */
diff --git a/include/sysemu/dirtylimit.h b/include/sysemu/dirtylimit.h
index da459f0..8d2c1f3 100644
--- a/include/sysemu/dirtylimit.h
+++ b/include/sysemu/dirtylimit.h
@@ -19,4 +19,19 @@ void vcpu_dirty_rate_stat_start(void);
 void vcpu_dirty_rate_stat_stop(void);
 void vcpu_dirty_rate_stat_initialize(void);
 void vcpu_dirty_rate_stat_finalize(void);
+
+void dirtylimit_state_lock(void);
+void dirtylimit_state_unlock(void);
+void dirtylimit_state_initialize(void);
+void dirtylimit_state_finalize(void);
+bool dirtylimit_in_service(void);
+bool dirtylimit_vcpu_index_valid(int cpu_index);
+void dirtylimit_process(void);
+void dirtylimit_change(bool start);
+void dirtylimit_set_vcpu(int cpu_index,
+ uint64_t quota,
+ bool enable);
+void dirtylimit_set_all(uint64_t quota,
+bool enable);
+void dirtylimit_vcpu_execute(CPUState *cpu);
 #endif
diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
index 6102e8c..76d0b44 100644
--- a/softmmu/dirtylimit.c
+++ b/softmmu/dirtylimit.c
@@ -18,6 +18,26 @@
 #include "sysemu/dirtylimit.h"
 #include "exec/memory.h"
 #include "hw/boards.h"
+#include "sysemu/kvm.h"
+#include "trace.h"
+
+/*
+ * Dirtylimit stop working if dirty page rate error
+ * value less than DIRTYLIMIT_TOLERANCE_RANGE
+ */
+#define DIRTYLIMIT_TOLERANCE_RANGE  25  /* MB/s */
+/*
+ * Plus or minus vcpu sleep time linearly if dirty
+ * page rate error value percentage over
+ * DIRTYLIMIT_LINEAR_ADJUSTMENT_PCT.
+ * Otherwise, plus or minus a fixed vcpu sleep time.
+ */
+#define DIRTYLIMIT_LINEAR_ADJUSTMENT_PCT 50
+/*
+ * Max vcpu sleep time percentage during a cycle
+ * composed of dirty ring full and sleep time.
+ */
+#define DIRTYLIMIT_THROTTLE_PCT_MAX 99
 
 struct {
 VcpuStat stat;
@@ -25,6 +45,30 @@ struct {
 QemuThread thread;
 } *vcpu_dirty_rate_stat;
 
+typedef struct VcpuDirtyLimitState {
+int cpu_index;
+bool enabled;
+/*
+ * Quota dirty page rate, unit is MB/s
+ * zero if not

[PATCH v20 3/9] migration/dirtyrate: Refactor dirty page rate calculation

2022-03-15 Thread huangy81

From: Hyman Huang(黄勇) 

Abstract out the dirty log change logic into the function
global_dirty_log_change.

Abstract out the dirty-ring based dirty page rate calculation
into the function vcpu_calculate_dirtyrate.

Abstract out the mathematical dirty page rate calculation
into do_calculate_dirtyrate and decouple it from DirtyStat.

Rename set_sample_page_period to dirty_stat_wait, which
is easier to understand and will be reused in dirtylimit.

Handle the cpu hotplug/unplug scenario during measurement of
the dirty page rate.

Export util functions outside migration.
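
For reference, the arithmetic in do_calculate_dirtyrate can be tried
outside QEMU with a small sketch like the one below. The 4 KiB page size
and the dirty_rate_mbps and SKETCH_PAGE_SIZE names are assumptions for
the example; the patch itself uses TARGET_PAGE_SIZE.

```c
#include <stdint.h>

/* Assumed page size for the sketch; QEMU uses TARGET_PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096ULL

/*
 * Dirty page rate in MB/s from two page counters sampled
 * calc_time_ms milliseconds apart.
 */
static int64_t dirty_rate_mbps(uint64_t start_pages, uint64_t end_pages,
                               int64_t calc_time_ms)
{
    uint64_t dirtied = end_pages - start_pages;
    /* bytes -> MB; anything below a whole MB is truncated here */
    uint64_t size_mb = (dirtied * SKETCH_PAGE_SIZE) >> 20;

    /* scale from "MB per calc_time_ms" to "MB per second" */
    return (int64_t)(size_mb * 1000 / (uint64_t)calc_time_ms);
}
```

Note that the shift truncates to whole megabytes before scaling, so a
short sample window with less than 1 MB of dirtied memory reports 0.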

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
---
 include/sysemu/dirtyrate.h |  28 ++
 migration/dirtyrate.c  | 227 -
 migration/dirtyrate.h  |   7 +-
 3 files changed, 174 insertions(+), 88 deletions(-)
 create mode 100644 include/sysemu/dirtyrate.h

diff --git a/include/sysemu/dirtyrate.h b/include/sysemu/dirtyrate.h
new file mode 100644
index 000..4d3b9a4
--- /dev/null
+++ b/include/sysemu/dirtyrate.h
@@ -0,0 +1,28 @@
+/*
+ * dirty page rate helper functions
+ *
+ * Copyright (c) 2022 CHINA TELECOM CO.,LTD.
+ *
+ * Authors:
+ *  Hyman Huang(黄勇) 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_DIRTYRATE_H
+#define QEMU_DIRTYRATE_H
+
+typedef struct VcpuStat {
+int nvcpu; /* number of vcpu */
+DirtyRateVcpu *rates; /* array of dirty rate for each vcpu */
+} VcpuStat;
+
+int64_t vcpu_calculate_dirtyrate(int64_t calc_time_ms,
+ VcpuStat *stat,
+ unsigned int flag,
+ bool one_shot);
+
+void global_dirty_log_change(unsigned int flag,
+ bool start);
+#endif
diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index d65e744..79348de 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -46,7 +46,7 @@ static struct DirtyRateStat DirtyStat;
 static DirtyRateMeasureMode dirtyrate_mode =
 DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING;
 
-static int64_t set_sample_page_period(int64_t msec, int64_t initial_time)
+static int64_t dirty_stat_wait(int64_t msec, int64_t initial_time)
 {
 int64_t current_time;
 
@@ -60,6 +60,132 @@ static int64_t set_sample_page_period(int64_t msec, int64_t initial_time)
 return msec;
 }
 
+static inline void record_dirtypages(DirtyPageRecord *dirty_pages,
+ CPUState *cpu, bool start)
+{
+if (start) {
+dirty_pages[cpu->cpu_index].start_pages = cpu->dirty_pages;
+} else {
+dirty_pages[cpu->cpu_index].end_pages = cpu->dirty_pages;
+}
+}
+
+static int64_t do_calculate_dirtyrate(DirtyPageRecord dirty_pages,
+  int64_t calc_time_ms)
+{
+uint64_t memory_size_MB;
+uint64_t increased_dirty_pages =
+dirty_pages.end_pages - dirty_pages.start_pages;
+
+memory_size_MB = (increased_dirty_pages * TARGET_PAGE_SIZE) >> 20;
+
+return memory_size_MB * 1000 / calc_time_ms;
+}
+
+void global_dirty_log_change(unsigned int flag, bool start)
+{
+qemu_mutex_lock_iothread();
+if (start) {
+memory_global_dirty_log_start(flag);
+} else {
+memory_global_dirty_log_stop(flag);
+}
+qemu_mutex_unlock_iothread();
+}
+
+/*
+ * global_dirty_log_sync
+ * 1. sync dirty log from kvm
+ * 2. stop dirty tracking if needed.
+ */
+static void global_dirty_log_sync(unsigned int flag, bool one_shot)
+{
+qemu_mutex_lock_iothread();
+memory_global_dirty_log_sync();
+if (one_shot) {
+memory_global_dirty_log_stop(flag);
+}
+qemu_mutex_unlock_iothread();
+}
+
+static DirtyPageRecord *vcpu_dirty_stat_alloc(VcpuStat *stat)
+{
+CPUState *cpu;
+DirtyPageRecord *records;
+int nvcpu = 0;
+
+CPU_FOREACH(cpu) {
+nvcpu++;
+}
+
+stat->nvcpu = nvcpu;
+stat->rates = g_malloc0(sizeof(DirtyRateVcpu) * nvcpu);
+
+records = g_malloc0(sizeof(DirtyPageRecord) * nvcpu);
+
+return records;
+}
+
+static void vcpu_dirty_stat_collect(VcpuStat *stat,
+DirtyPageRecord *records,
+bool start)
+{
+CPUState *cpu;
+
+CPU_FOREACH(cpu) {
+record_dirtypages(records, cpu, start);
+}
+}
+
+int64_t vcpu_calculate_dirtyrate(int64_t calc_time_ms,
+ VcpuStat *stat,
+ unsigned int flag,
+ bool one_shot)
+{
+DirtyPageRecord *records;
+int64_t init_time_ms;
+int64_t duration;
+int64_t dirtyrate;
+int i = 0;
+unsigned int gen_id;
+
+retry:
+init_time_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
+cpu_list_lock();
+gen_id = cpu_list_generation_id_get();
+records = vcpu_dirty_stat_alloc(stat);
+vcpu_dirty_stat_collect(stat, records, true);
+

[PATCH v20 1/9] accel/kvm/kvm-all: Refactor per-vcpu dirty ring reaping

2022-03-15 Thread huangy81

From: Hyman Huang(黄勇) 

Add an optional 'CPUState' argument to kvm_dirty_ring_reap so that
it can also cover the single-vCPU dirty-ring-reaping scenario.
Passing NULL keeps the existing behavior of reaping all vCPUs.
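
The calling convention can be pictured with a small standalone sketch.
The sketch_vcpu type and the reap_one and reap helpers are hypothetical
stand-ins and not QEMU code; only the "NULL means all vCPUs" rule is
taken from the diff below.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a vCPU with pending dirty-ring entries. */
struct sketch_vcpu {
    uint32_t pending;
    struct sketch_vcpu *next;
};

/* Collect and clear the pending entries of one vCPU. */
static uint32_t reap_one(struct sketch_vcpu *cpu)
{
    uint32_t n = cpu->pending;

    cpu->pending = 0;
    return n;
}

/* Reap only 'cpu' when it is non-NULL, otherwise walk the whole list. */
static uint64_t reap(struct sketch_vcpu *all, struct sketch_vcpu *cpu)
{
    uint64_t total = 0;

    if (cpu) {
        total = reap_one(cpu);
    } else {
        for (cpu = all; cpu; cpu = cpu->next) {
            total += reap_one(cpu);
        }
    }
    return total;
}
```

Existing callers pass NULL and are unaffected; a later patch in the
series passes the vCPU whose ring is full.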

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
---
 accel/kvm/kvm-all.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 27864df..29bf6a0 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -756,17 +756,20 @@ static uint32_t kvm_dirty_ring_reap_one(KVMState *s, CPUState *cpu)
 }
 
 /* Must be with slots_lock held */
-static uint64_t kvm_dirty_ring_reap_locked(KVMState *s)
+static uint64_t kvm_dirty_ring_reap_locked(KVMState *s, CPUState* cpu)
 {
 int ret;
-CPUState *cpu;
 uint64_t total = 0;
 int64_t stamp;
 
 stamp = get_clock();
 
-CPU_FOREACH(cpu) {
-total += kvm_dirty_ring_reap_one(s, cpu);
+if (cpu) {
+total = kvm_dirty_ring_reap_one(s, cpu);
+} else {
+CPU_FOREACH(cpu) {
+total += kvm_dirty_ring_reap_one(s, cpu);
+}
 }
 
 if (total) {
@@ -787,7 +790,7 @@ static uint64_t kvm_dirty_ring_reap_locked(KVMState *s)
  * Currently for simplicity, we must hold BQL before calling this.  We can
  * consider to drop the BQL if we're clear with all the race conditions.
  */
-static uint64_t kvm_dirty_ring_reap(KVMState *s)
+static uint64_t kvm_dirty_ring_reap(KVMState *s, CPUState *cpu)
 {
 uint64_t total;
 
@@ -807,7 +810,7 @@ static uint64_t kvm_dirty_ring_reap(KVMState *s)
  * reset below.
  */
 kvm_slots_lock();
-total = kvm_dirty_ring_reap_locked(s);
+total = kvm_dirty_ring_reap_locked(s, cpu);
 kvm_slots_unlock();
 
 return total;
@@ -854,7 +857,7 @@ static void kvm_dirty_ring_flush(void)
  * vcpus out in a synchronous way.
  */
 kvm_cpu_synchronize_kick_all();
-kvm_dirty_ring_reap(kvm_state);
+kvm_dirty_ring_reap(kvm_state, NULL);
 trace_kvm_dirty_ring_flush(1);
 }
 
@@ -1398,7 +1401,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
  * Not easy.  Let's cross the fingers until it's fixed.
  */
 if (kvm_state->kvm_dirty_ring_size) {
-kvm_dirty_ring_reap_locked(kvm_state);
+kvm_dirty_ring_reap_locked(kvm_state, NULL);
 } else {
 kvm_slot_get_dirty_log(kvm_state, mem);
 }
@@ -1470,7 +1473,7 @@ static void *kvm_dirty_ring_reaper_thread(void *data)
 r->reaper_state = KVM_DIRTY_RING_REAPER_REAPING;
 
 qemu_mutex_lock_iothread();
-kvm_dirty_ring_reap(s);
+kvm_dirty_ring_reap(s, NULL);
 qemu_mutex_unlock_iothread();
 
 r->reaper_iteration++;
@@ -2957,7 +2960,7 @@ int kvm_cpu_exec(CPUState *cpu)
  */
 trace_kvm_dirty_ring_full(cpu->cpu_index);
 qemu_mutex_lock_iothread();
-kvm_dirty_ring_reap(kvm_state);
+kvm_dirty_ring_reap(kvm_state, NULL);
 qemu_mutex_unlock_iothread();
 ret = 0;
 break;
-- 
1.8.3.1

[PATCH v20 2/9] cpus: Introduce cpu_list_generation_id

2022-03-15 Thread huangy81

From: Hyman Huang(黄勇) 

Introduce cpu_list_generation_id to track the generation of the cpu
list so that cpu hotplug/unplug can be detected during measurement
of the dirty page rate.

The id is incremented on every cpu_list_add() and cpu_list_remove(),
so a caller can compare the value read before and after a measurement
to detect changes of the cpu list.
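
A caller-side view of the idea, as a hedged sketch: read the id, do the
work, read the id again, and retry if it moved. All sketch_ names and
measure_stable are made up for the example, and the sketch leaves out
the cpu list locking that a real caller needs.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the CPU list and its generation counter. */
static unsigned int sketch_generation_id;
static int sketch_ncpus;

/* Every list change bumps the generation id. */
static void sketch_cpu_add(void)
{
    sketch_ncpus++;
    sketch_generation_id++;
}

static void sketch_cpu_remove(void)
{
    sketch_ncpus--;
    sketch_generation_id++;
}

/* A measurement that does not touch the CPU list. */
static void sketch_noop(void)
{
}

/*
 * Run 'measure' and report whether the CPU list stayed unchanged while
 * it ran. A caller would discard the result and retry on false.
 */
static bool measure_stable(void (*measure)(void))
{
    unsigned int gen_id = sketch_generation_id;

    measure();
    return gen_id == sketch_generation_id;
}
```

Passing sketch_cpu_add or sketch_cpu_remove as the callback simulates a
hotplug event that happens in the middle of a measurement.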

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
---
 cpus-common.c | 8 
 include/exec/cpu-common.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/cpus-common.c b/cpus-common.c
index 6e73d3e..31c6415 100644
--- a/cpus-common.c
+++ b/cpus-common.c
@@ -73,6 +73,12 @@ static int cpu_get_free_index(void)
 }
 
 CPUTailQ cpus = QTAILQ_HEAD_INITIALIZER(cpus);
+static unsigned int cpu_list_generation_id;
+
+unsigned int cpu_list_generation_id_get(void)
+{
+return cpu_list_generation_id;
+}
 
 void cpu_list_add(CPUState *cpu)
 {
@@ -84,6 +90,7 @@ void cpu_list_add(CPUState *cpu)
 assert(!cpu_index_auto_assigned);
 }
 QTAILQ_INSERT_TAIL_RCU(&cpus, cpu, node);
+cpu_list_generation_id++;
 }
 
 void cpu_list_remove(CPUState *cpu)
@@ -96,6 +103,7 @@ void cpu_list_remove(CPUState *cpu)
 
 QTAILQ_REMOVE_RCU(&cpus, cpu, node);
 cpu->cpu_index = UNASSIGNED_CPU_INDEX;
+cpu_list_generation_id++;
 }
 
 CPUState *qemu_get_cpu(int index)
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 7f7b594..856b0e7 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -32,6 +32,7 @@ extern intptr_t qemu_host_page_mask;
 void qemu_init_cpu_list(void);
 void cpu_list_lock(void);
 void cpu_list_unlock(void);
+unsigned int cpu_list_generation_id_get(void);
 
 void tcg_flush_softmmu_tlb(CPUState *cs);
 
-- 
1.8.3.1

[PATCH v2 2/2] target/riscv: Allow software access to MIP SEIP

2022-03-15 Thread Alistair Francis

From: Alistair Francis 

The RISC-V specification states that:
  "Supervisor-level external interrupts are made pending based on the
  logical-OR of the software-writable SEIP bit and the signal from the
  external interrupt controller."

We currently only allow either the interrupt controller or software to
set the bit, which is incorrect.

This patch removes the miclaim mask when writing MIP to allow M-mode
software to inject interrupts, even with an interrupt controller.

We then also need to keep track of which source is setting MIP_SEIP. The
final value is an OR of both, so we add two bools and use them to keep
track of the current state. This way either source can change without
losing the correct value.
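
The bookkeeping can be sketched in isolation as follows. The sketch_env
type and the sketch_ helpers are hypothetical and only model the
logical-OR rule quoted above; they are not the riscv_cpu_update_mip()
path used by the patch. Bit 9 is the SEIP position in mip.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_IRQ_S_EXT 9
#define SKETCH_MIP_SEIP  (1ULL << SKETCH_IRQ_S_EXT)

/* Hypothetical, reduced CPU state: mip plus one bool per SEIP source. */
struct sketch_env {
    uint64_t mip;
    bool external_seip;   /* level driven by the interrupt controller */
    bool software_seip;   /* value last written by M-mode software */
};

/* mip.SEIP is the logical OR of both sources. */
static void sketch_update_seip(struct sketch_env *env)
{
    if (env->external_seip || env->software_seip) {
        env->mip |= SKETCH_MIP_SEIP;
    } else {
        env->mip &= ~SKETCH_MIP_SEIP;
    }
}

static void sketch_set_external(struct sketch_env *env, bool level)
{
    env->external_seip = level;
    sketch_update_seip(env);
}

static void sketch_set_software(struct sketch_env *env, bool level)
{
    env->software_seip = level;
    sketch_update_seip(env);
}
```

With a single shared bit, lowering the external line would also clear a
pending software-injected interrupt; keeping the two bools avoids that.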

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/904
Signed-off-by: Alistair Francis 
Reviewed-by: Bin Meng 
---
 target/riscv/cpu.h |  8 
 target/riscv/cpu.c | 10 +-
 target/riscv/csr.c |  8 ++--
 3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index c069fe85fa..05d40f8dbd 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -173,6 +173,14 @@ struct CPUArchState {
 uint64_t mstatus;
 
 uint64_t mip;
+/*
+ * MIP contains the software writable version of SEIP ORed with the
+ * external interrupt value. The MIP register is always up-to-date.
+ * To keep track of the current source, we also save booleans of the values
+ * here.
+ */
+bool external_seip;
+bool software_seip;
 
 uint64_t miclaim;
 
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 41b757995d..68373b769c 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -706,7 +706,6 @@ static void riscv_cpu_set_irq(void *opaque, int irq, int level)
 case IRQ_VS_TIMER:
 case IRQ_M_TIMER:
 case IRQ_U_EXT:
-case IRQ_S_EXT:
 case IRQ_VS_EXT:
 case IRQ_M_EXT:
 if (kvm_enabled()) {
@@ -715,6 +714,15 @@ static void riscv_cpu_set_irq(void *opaque, int irq, int level)
 riscv_cpu_update_mip(cpu, 1 << irq, BOOL_TO_MASK(level));
 }
  break;
+case IRQ_S_EXT:
+if (kvm_enabled()) {
+kvm_riscv_set_irq(cpu, irq, level);
+} else {
+env->external_seip = level;
+riscv_cpu_update_mip(cpu, 1 << irq,
+ BOOL_TO_MASK(level | env->software_seip));
+}
+break;
 default:
 g_assert_not_reached();
 }
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 0606cd0ea8..48e78cf91e 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -1403,10 +1403,14 @@ static RISCVException rmw_mip64(CPURISCVState *env, int csrno,
 uint64_t new_val, uint64_t wr_mask)
 {
 RISCVCPU *cpu = env_archcpu(env);
-/* Allow software control of delegable interrupts not claimed by hardware */
-uint64_t old_mip, mask = wr_mask & delegable_ints & ~env->miclaim;
+uint64_t old_mip, mask = wr_mask & delegable_ints;
 uint32_t gin;
 
+if (mask & MIP_SEIP) {
+env->software_seip = new_val & MIP_SEIP;
+}
+new_val |= env->external_seip << IRQ_S_EXT;
+
 if (mask) {
 old_mip = riscv_cpu_update_mip(cpu, mask, (new_val & mask));
 } else {
-- 
2.35.1

[PATCH v2 1/2] target/riscv: cpu: Fixup indentation

2022-03-15 Thread Alistair Francis

From: Alistair Francis 

Signed-off-by: Alistair Francis 
Reviewed-by: Bin Meng 
---
 target/riscv/cpu.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index ddda4906ff..41b757995d 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -567,18 +567,18 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 if (cpu->cfg.ext_i && cpu->cfg.ext_e) {
 error_setg(errp,
"I and E extensions are incompatible");
-   return;
-   }
+return;
+}
 
 if (!cpu->cfg.ext_i && !cpu->cfg.ext_e) {
 error_setg(errp,
"Either I or E extension must be set");
-   return;
-   }
+return;
+}
 
-   if (cpu->cfg.ext_g && !(cpu->cfg.ext_i & cpu->cfg.ext_m &
-   cpu->cfg.ext_a & cpu->cfg.ext_f &
-   cpu->cfg.ext_d)) {
+if (cpu->cfg.ext_g && !(cpu->cfg.ext_i & cpu->cfg.ext_m &
+cpu->cfg.ext_a & cpu->cfg.ext_f &
+cpu->cfg.ext_d)) {
 warn_report("Setting G will also set IMAFD");
 cpu->cfg.ext_i = true;
 cpu->cfg.ext_m = true;
@@ -709,11 +709,11 @@ static void riscv_cpu_set_irq(void *opaque, int irq, int level)
 case IRQ_S_EXT:
 case IRQ_VS_EXT:
 case IRQ_M_EXT:
- if (kvm_enabled()) {
+if (kvm_enabled()) {
 kvm_riscv_set_irq(cpu, irq, level);
- } else {
+} else {
 riscv_cpu_update_mip(cpu, 1 << irq, BOOL_TO_MASK(level));
- }
+}
  break;
 default:
 g_assert_not_reached();
-- 
2.35.1

[PATCH v2 0/2] target/riscv: Allow software access to MIP SEIP

2022-03-15 Thread Alistair Francis

From: Alistair Francis 

The RISC-V specification states that:
  "Supervisor-level external interrupts are made pending based on the
  logical-OR of the software-writable SEIP bit and the signal from the
  external interrupt controller."

We currently only allow either the interrupt controller or software to
set the bit, which is incorrect.

This patch removes the miclaim mask when writing MIP to allow M-mode
software to inject interrupts, even with an interrupt controller.

We then also need to keep track of which source is setting MIP_SEIP. The
final value is an OR of both, so we add two bools and use them to keep
track of the current state. This way either source can change without
losing the correct value.

This fixes: https://gitlab.com/qemu-project/qemu/-/issues/904

Alistair Francis (2):
  target/riscv: cpu: Fixup indentation
  target/riscv: Allow software access to MIP SEIP

 target/riscv/cpu.h |  8 
 target/riscv/cpu.c | 30 +++---
 target/riscv/csr.c |  8 ++--
 3 files changed, 33 insertions(+), 13 deletions(-)

-- 
2.35.1

[PATCH 1/1] MAINTAINERS: Update maintainers for Guest x86 HAXM CPUs

2022-03-15 Thread Wang, Wenchao

diff --git a/MAINTAINERS b/MAINTAINERS
index f2e9ce1da2..36f877cf74 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -492,7 +492,6 @@ Guest CPU Cores (HAXM)
-
X86 HAXM CPUs
M: Wenchao Wang 
-M: Colin Xu 
L: haxm-t...@intel.com
W: https://github.com/intel/haxm/issues
S: Maintained
--
2.17.1


Best Regards,
Wenchao




[PATCH v19 9/9] tests: Add dirty page rate limit test

2022-03-15 Thread huangy81

From: Hyman Huang(黄勇) 

Add a dirty page rate limit test that runs if the kernel supports
the dirty ring; the test case is implemented in a standalone file.

The following qmp commands are covered by this test case:
"calc-dirty-rate", "query-dirty-rate", "set-vcpu-dirty-limit",
"cancel-vcpu-dirty-limit" and "query-vcpu-dirty-limit".

Signed-off-by: Hyman Huang(黄勇) 
---
 tests/qtest/dirtylimit-test.c | 309 ++
 tests/qtest/meson.build   |   2 +
 2 files changed, 311 insertions(+)
 create mode 100644 tests/qtest/dirtylimit-test.c

diff --git a/tests/qtest/dirtylimit-test.c b/tests/qtest/dirtylimit-test.c
new file mode 100644
index 000..35d9f7b
--- /dev/null
+++ b/tests/qtest/dirtylimit-test.c
@@ -0,0 +1,309 @@
+/*
+ * QTest testcase for Dirty Page Rate Limit
+ *
+ * Copyright (c) 2022 CHINA TELECOM CO.,LTD.
+ *
+ * Authors:
+ *  Hyman Huang(黄勇) 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "libqos/libqtest.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qlist.h"
+#include "qapi/qobject-input-visitor.h"
+#include "qapi/qobject-output-visitor.h"
+
+#include "migration-helpers.h"
+#include "tests/migration/i386/a-b-bootblock.h"
+
+/*
+ * Dirtylimit stop working if dirty page rate error
+ * value less than DIRTYLIMIT_TOLERANCE_RANGE
+ */
+#define DIRTYLIMIT_TOLERANCE_RANGE  25  /* MB/s */
+
+static QDict *qmp_command(QTestState *who, const char *command, ...)
+{
+va_list ap;
+QDict *resp, *ret;
+
+va_start(ap, command);
+resp = qtest_vqmp(who, command, ap);
+va_end(ap);
+
+g_assert(!qdict_haskey(resp, "error"));
+g_assert(qdict_haskey(resp, "return"));
+
+ret = qdict_get_qdict(resp, "return");
+qobject_ref(ret);
+qobject_unref(resp);
+
+return ret;
+}
+
+static void calc_dirty_rate(QTestState *who, uint64_t calc_time)
+{
+qobject_unref(qmp_command(who,
+  "{ 'execute': 'calc-dirty-rate',"
+  "'arguments': { "
+  "'calc-time': %ld,"
+  "'mode': 'dirty-ring' }}",
+  calc_time));
+}
+
+static QDict *query_dirty_rate(QTestState *who)
+{
+return qmp_command(who, "{ 'execute': 'query-dirty-rate' }");
+}
+
+static void dirtylimit_set_all(QTestState *who, uint64_t dirtyrate)
+{
+qobject_unref(qmp_command(who,
+  "{ 'execute': 'set-vcpu-dirty-limit',"
+  "'arguments': { "
+  "'dirty-rate': %ld } }",
+  dirtyrate));
+}
+
+static void cancel_vcpu_dirty_limit(QTestState *who)
+{
+qobject_unref(qmp_command(who,
+  "{ 'execute': 'cancel-vcpu-dirty-limit' }"));
+}
+
+static QDict *query_vcpu_dirty_limit(QTestState *who)
+{
+QDict *rsp;
+
+rsp = qtest_qmp(who, "{ 'execute': 'query-vcpu-dirty-limit' }");
+g_assert(!qdict_haskey(rsp, "error"));
+g_assert(qdict_haskey(rsp, "return"));
+
+return rsp;
+}
+
+static bool calc_dirtyrate_ready(QTestState *who)
+{
+QDict *rsp_return;
+gchar *status;
+
+rsp_return = query_dirty_rate(who);
+g_assert(rsp_return);
+
+status = g_strdup(qdict_get_str(rsp_return, "status"));
+g_assert(status);
+
+return g_strcmp0(status, "measuring");
+}
+
+static void wait_for_calc_dirtyrate_complete(QTestState *who,
+ int64_t timeout)
+{
+usleep(timeout);
+/*
+ * Make the test hang until calc dirtyrate return "measured".
+ */
+while (!calc_dirtyrate_ready(who)) {
+usleep(1000);
+}
+}
+
+static int64_t get_dirty_rate(QTestState *who)
+{
+QDict *rsp_return;
+gchar *status;
+QList *rates;
+const QListEntry *entry;
+QDict *rate;
+int64_t dirtyrate;
+
+rsp_return = query_dirty_rate(who);
+g_assert(rsp_return);
+
+status = g_strdup(qdict_get_str(rsp_return, "status"));
+g_assert(status);
+g_assert_cmpstr(status, ==, "measured");
+
+rates = qdict_get_qlist(rsp_return, "vcpu-dirty-rate");
+g_assert(rates && !qlist_empty(rates));
+
+entry = qlist_first(rates);
+g_assert(entry);
+
+rate = qobject_to(QDict, qlist_entry_obj(entry));
+g_assert(rate);
+
+dirtyrate = qdict_get_try_int(rate, "dirty-rate", -1);
+
+qobject_unref(rsp_return);
+return dirtyrate;
+}
+
+static int64_t get_limit_rate(QTestState *who)
+{
+QDict *rsp_return;
+QList *rates;
+const QListEntry *entry;
+QDict *rate;
+int64_t dirtyrate;
+
+rsp_return = query_vcpu_dirty_limit(who);
+g_assert(rsp_return);
+
+rates = qdict_get_qlist(rsp_return, "return");
+g_assert(rates && !qlist_empty(rates));
+
+entry = qlist_first(rates);
+g_assert(entry);
+
+rate = qobject_to(QDict, qlist_entry_obj(entry));
+g_assert(rate);
+
+dirtyrate = qdict_get_try_int(rate, "limit-rate", -1);
+
+qobject_unref(rsp_return);
+return dirtyrate;

[PATCH v19 8/9] migration-test: Export migration-test util functions

2022-03-15 Thread huangy81

From: Hyman Huang(黄勇) 

Dirtylimit qtest can reuse the mechanisms that have been
implemented by migration-test to start a vm, so export the
relevant util functions.

Signed-off-by: Hyman Huang(黄勇) 
---
 tests/qtest/migration-helpers.c |  95 +
 tests/qtest/migration-helpers.h |  15 ++
 tests/qtest/migration-test.c| 102 
 3 files changed, 110 insertions(+), 102 deletions(-)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index 4ee2601..ffec54b 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -16,6 +16,7 @@
 #include "migration-helpers.h"
 
 bool got_stop;
+const char *tmpfs;
 
 static void check_stop_event(QTestState *who)
 {
@@ -188,3 +189,97 @@ void wait_for_migration_fail(QTestState *from, bool allow_active)
 g_assert(qdict_get_bool(rsp_return, "running"));
 qobject_unref(rsp_return);
 }
+
+void init_bootfile(const char *bootpath, void *content, size_t len)
+{
+FILE *bootfile = fopen(bootpath, "wb");
+
+g_assert_cmpint(fwrite(content, len, 1, bootfile), ==, 1);
+fclose(bootfile);
+}
+
+/*
+ * Wait for some output in the serial output file,
+ * we get an 'A' followed by an endless string of 'B's
+ * but on the destination we won't have the A.
+ */
+void wait_for_serial(const char *side)
+{
+g_autofree char *serialpath = g_strdup_printf("%s/%s", tmpfs, side);
+FILE *serialfile = fopen(serialpath, "r");
+const char *arch = qtest_get_arch();
+int started = (strcmp(side, "src_serial") == 0 &&
+   strcmp(arch, "ppc64") == 0) ? 0 : 1;
+
+do {
+int readvalue = fgetc(serialfile);
+
+if (!started) {
+/* SLOF prints its banner before starting test,
+ * to ignore it, mark the start of the test with '_',
+ * ignore all characters until this marker
+ */
+switch (readvalue) {
+case '_':
+started = 1;
+break;
+case EOF:
+fseek(serialfile, 0, SEEK_SET);
+usleep(1000);
+break;
+}
+continue;
+}
+switch (readvalue) {
+case 'A':
+/* Fine */
+break;
+
+case 'B':
+/* It's alive! */
+fclose(serialfile);
+return;
+
+case EOF:
+started = (strcmp(side, "src_serial") == 0 &&
+   strcmp(arch, "ppc64") == 0) ? 0 : 1;
+fseek(serialfile, 0, SEEK_SET);
+usleep(1000);
+break;
+
+default:
+fprintf(stderr, "Unexpected %d on %s serial\n", readvalue, side);
+g_assert_not_reached();
+}
+} while (true);
+}
+
+bool kvm_dirty_ring_supported(void)
+{
+#if defined(__linux__) && defined(HOST_X86_64)
+int ret, kvm_fd = open("/dev/kvm", O_RDONLY);
+
+if (kvm_fd < 0) {
+return false;
+}
+
+ret = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_DIRTY_LOG_RING);
+close(kvm_fd);
+
+/* We test with 4096 slots */
+if (ret < 4096) {
+return false;
+}
+
+return true;
+#else
+return false;
+#endif
+}
+
+void cleanup(const char *filename)
+{
+g_autofree char *path = g_strdup_printf("%s/%s", tmpfs, filename);
+
+unlink(path);
+}
diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index d63bba9..d08551f 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -14,7 +14,14 @@
 
 #include "libqos/libqtest.h"
 
+/* For dirty ring test; so far only x86_64 is supported */
+#if defined(__linux__) && defined(HOST_X86_64)
+#include "linux/kvm.h"
+#endif
+#include 
+
 extern bool got_stop;
+extern const char *tmpfs;
 
 GCC_FMT_ATTR(3, 4)
 QDict *wait_command_fd(QTestState *who, int fd, const char *command, ...);
@@ -34,4 +41,12 @@ void wait_for_migration_complete(QTestState *who);
 
 void wait_for_migration_fail(QTestState *from, bool allow_active);
 
+void init_bootfile(const char *bootpath, void *content, size_t len);
+
+void wait_for_serial(const char *side);
+
+bool kvm_dirty_ring_supported(void);
+
+void cleanup(const char *filename);
+
 #endif /* MIGRATION_HELPERS_H_ */
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 0870656..eec6dd0 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -27,11 +27,6 @@
 #include "migration-helpers.h"
 #include "tests/migration/migration-test.h"
 
-/* For dirty ring test; so far only x86_64 is supported */
-#if defined(__linux__) && defined(HOST_X86_64)
-#include "linux/kvm.h"
-#endif
-
 /* TODO actually test the results and get rid of this */
 #define qtest_qmp_discard_response(...) qobject_unref(qtest_qmp(__VA_ARGS__))
 
@@ -49,7 +44,6 @@ static bool uffd_feature_thread_id;
 
 #if defined(__linux__) && defined(__NR_userfaultfd) &&

[PATCH v19 7/9] softmmu/dirtylimit: Implement dirty page rate limit

2022-03-15 Thread huangy81

From: Hyman Huang(黄勇) 

Implement periodic dirty rate calculation based on the
dirty ring, and throttle the virtual CPU until it reaches the
dirty page rate quota given by the user.
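
What "reaches the quota" can mean in practice is sketched below, based
on the DIRTYLIMIT_TOLERANCE_RANGE comment elsewhere in this series
(25 MB/s). The sketch_limit_reached name is hypothetical, and treating
the boundary value as inside the range is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* MB/s; mirrors DIRTYLIMIT_TOLERANCE_RANGE from the series. */
#define SKETCH_TOLERANCE_RANGE 25

/*
 * True when the current dirty page rate is close enough to the quota
 * that no further throttle adjustment is needed. Both rates in MB/s.
 */
static bool sketch_limit_reached(uint64_t quota, uint64_t current)
{
    uint64_t diff = quota > current ? quota - current : current - quota;

    return diff <= SKETCH_TOLERANCE_RANGE;
}
```

The check is symmetric, so a vCPU that has been throttled too far below
its quota is also treated as not yet settled.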

Introduce qmp commands "set-vcpu-dirty-limit",
"cancel-vcpu-dirty-limit", "query-vcpu-dirty-limit"
to enable, disable, query dirty page limit for virtual CPU.

Meanwhile, introduce corresponding hmp commands
"set_vcpu_dirty_limit", "cancel_vcpu_dirty_limit",
"info vcpu_dirty_limit" so the feature can be more usable.

"query-vcpu-dirty-limit" only succeeds when the dirty page
rate limit is enabled, so add it to the list of skipped
commands to ensure that qmp-cmd-test runs successfully.

Signed-off-by: Hyman Huang(黄勇) 
Acked-by: Markus Armbruster 
Reviewed-by: Peter Xu 
---
 hmp-commands-info.hx   |  13 +++
 hmp-commands.hx|  32 
 include/monitor/hmp.h  |   3 +
 qapi/migration.json|  80 +++
 softmmu/dirtylimit.c   | 195 +
 tests/qtest/qmp-cmd-test.c |   2 +
 6 files changed, 325 insertions(+)

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index adfa085..016717d 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -865,6 +865,19 @@ SRST
 Display the vcpu dirty rate information.
 ERST
 
+{
+.name   = "vcpu_dirty_limit",
+.args_type  = "",
+.params = "",
+.help   = "show dirty page limit information of all vCPU",
+.cmd= hmp_info_vcpu_dirty_limit,
+},
+
+SRST
+  ``info vcpu_dirty_limit``
+Display the vcpu dirty page limit information.
+ERST
+
 #if defined(TARGET_I386)
 {
 .name   = "sgx",
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 8476277..82ab75f 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1746,3 +1746,35 @@ ERST
   "\n\t\t\t -b to specify dirty bitmap as method of 
calculation)",
 .cmd= hmp_calc_dirty_rate,
 },
+
+SRST
+``set_vcpu_dirty_limit``
+  Set dirty page rate limit on virtual CPU, the information about all the
+  virtual CPU dirty limit status can be observed with ``info vcpu_dirty_limit``
+  command.
+ERST
+
+{
+.name   = "set_vcpu_dirty_limit",
+.args_type  = "dirty_rate:l,cpu_index:l?",
+.params = "dirty_rate [cpu_index]",
+.help   = "set dirty page rate limit, use cpu_index to set limit"
+  "\n\t\t\t\t\t on a specified virtual cpu",
+.cmd= hmp_set_vcpu_dirty_limit,
+},
+
+SRST
+``cancel_vcpu_dirty_limit``
+  Cancel dirty page rate limit on virtual CPU, the information about all the
+  virtual CPU dirty limit status can be observed with ``info vcpu_dirty_limit``
+  command.
+ERST
+
+{
+.name   = "cancel_vcpu_dirty_limit",
+.args_type  = "cpu_index:l?",
+.params = "[cpu_index]",
+.help   = "cancel dirty page rate limit, use cpu_index to cancel"
+  "\n\t\t\t\t\t limit on a specified virtual cpu",
+.cmd= hmp_cancel_vcpu_dirty_limit,
+},
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index 96d0148..478820e 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -131,6 +131,9 @@ void hmp_replay_delete_break(Monitor *mon, const QDict *qdict);
 void hmp_replay_seek(Monitor *mon, const QDict *qdict);
 void hmp_info_dirty_rate(Monitor *mon, const QDict *qdict);
 void hmp_calc_dirty_rate(Monitor *mon, const QDict *qdict);
+void hmp_set_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
+void hmp_cancel_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
+void hmp_info_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
 void hmp_human_readable_text_helper(Monitor *mon,
 HumanReadableText *(*qmp_handler)(Error **));
 
diff --git a/qapi/migration.json b/qapi/migration.json
index 18e2610..910a4ff 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1861,6 +1861,86 @@
 { 'command': 'query-dirty-rate', 'returns': 'DirtyRateInfo' }
 
 ##
+# @DirtyLimitInfo:
+#
+# Dirty page rate limit information of a virtual CPU.
+#
+# @cpu-index: index of a virtual CPU.
+#
+# @limit-rate: upper limit of dirty page rate (MB/s) for a virtual
+#  CPU, 0 means unlimited.
+#
+# @current-rate: current dirty page rate (MB/s) for a virtual CPU.
+#
+# Since: 7.0
+#
+##
+{ 'struct': 'DirtyLimitInfo',
+  'data': { 'cpu-index': 'int',
+'limit-rate': 'uint64',
+'current-rate': 'uint64' } }
+
+##
+# @set-vcpu-dirty-limit:
+#
+# Set the upper limit of dirty page rate for virtual CPUs.
+#
+# Requires KVM with accelerator property "dirty-ring-size" set.
+# A virtual CPU's dirty page rate is a measure of its memory load.
+# To observe dirty page rates, use @calc-dirty-rate.
+#
+# @cpu-index: index of a virtual CPU, default is all.
+#
+# @dirty-rate: upper limit of dirty page rate (MB/s) for virtual CPUs.
+#
+# Since: 7.0
+#
+# Example:

[PATCH v19 6/9] softmmu/dirtylimit: Implement virtual CPU throttle

2022-03-15 Thread huangy81

From: Hyman Huang(黄勇) 

Set up a negative feedback system for the vCPU thread's
handling of the KVM_EXIT_DIRTY_RING_FULL exit by introducing
a throttle_us_per_full field in struct CPUState. If dirtylimit
is in service, the vCPU sleeps for throttle_us_per_full
microseconds on each such exit, which throttles it.
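
A minimal sketch of such a feedback step, for illustration only. The
constants mirror the macros added by this patch; the function name
adjust_throttle_us, the fixed step size (FIXED_STEP_DIVISOR) and the exact
formulas are assumptions, not the code from softmmu/dirtylimit.c.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the DIRTYLIMIT_* macros in this patch. */
#define TOLERANCE_RANGE        25  /* MB/s */
#define LINEAR_ADJUSTMENT_PCT  50
#define THROTTLE_PCT_MAX       99

/* Hypothetical: fixed step as a fraction of the ring-full time. */
#define FIXED_STEP_DIVISOR     10

/*
 * Return the new per-ring-full sleep time in microseconds.
 * quota and current are dirty page rates in MB/s; ring_full_us is the
 * time the vCPU needs to fill its dirty ring at the current rate.
 */
static int64_t adjust_throttle_us(int64_t throttle_us, uint64_t quota,
                                  uint64_t current, int64_t ring_full_us)
{
    uint64_t error = quota > current ? quota - current : current - quota;
    uint64_t larger = quota > current ? quota : current;
    int64_t step, max_us;

    assert(ring_full_us > 0);

    /* Close enough to the quota: leave the sleep time alone. */
    if (error <= TOLERANCE_RANGE) {
        return throttle_us;
    }

    if (error * 100 / larger > LINEAR_ADJUSTMENT_PCT) {
        /* Far from the quota: step proportional to the error. */
        step = ring_full_us * (int64_t)error / (int64_t)larger;
    } else {
        /* Near the quota: small fixed step to limit oscillation. */
        step = ring_full_us / FIXED_STEP_DIVISOR;
    }

    /* Negative feedback: too dirty means sleep more, otherwise less. */
    throttle_us += current > quota ? step : -step;

    /* Cap sleep at THROTTLE_PCT_MAX percent of a (run + sleep) cycle. */
    max_us = ring_full_us * THROTTLE_PCT_MAX / (100 - THROTTLE_PCT_MAX);
    if (throttle_us > max_us) {
        throttle_us = max_us;
    }
    if (throttle_us < 0) {
        throttle_us = 0;
    }
    return throttle_us;
}
```

With a quota of 100 MB/s, a measured 400 MB/s and a 1000 us ring-full time,
the error is 75% of the larger rate, so the sketch takes the linear branch
and raises the sleep time by 750 us.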

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
---
 accel/kvm/kvm-all.c |  19 ++-
 include/hw/core/cpu.h   |   6 +
 include/sysemu/dirtylimit.h |  15 +++
 softmmu/dirtylimit.c| 291 
 softmmu/trace-events|   7 ++
 5 files changed, 337 insertions(+), 1 deletion(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 0c78bc7..fd656e0 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -45,6 +45,7 @@
 #include "qemu/guest-random.h"
 #include "sysemu/hw_accel.h"
 #include "kvm-cpus.h"
+#include "sysemu/dirtylimit.h"
 
 #include "hw/boards.h"
 
@@ -476,6 +477,7 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
 cpu->kvm_state = s;
 cpu->vcpu_dirty = true;
 cpu->dirty_pages = 0;
+cpu->throttle_us_per_full = 0;
 
 mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
 if (mmap_size < 0) {
@@ -1469,6 +1471,11 @@ static void *kvm_dirty_ring_reaper_thread(void *data)
  */
 sleep(1);
 
+/* keep sleeping so that dirtylimit not be interfered by reaper */
+if (dirtylimit_in_service()) {
+continue;
+}
+
 trace_kvm_dirty_ring_reaper("wakeup");
 r->reaper_state = KVM_DIRTY_RING_REAPER_REAPING;
 
@@ -2965,8 +2972,18 @@ int kvm_cpu_exec(CPUState *cpu)
  */
 trace_kvm_dirty_ring_full(cpu->cpu_index);
 qemu_mutex_lock_iothread();
-kvm_dirty_ring_reap(kvm_state, NULL);
+/* We throttle vCPU by making it sleep once it exit from kernel
+ * due to dirty ring full. In the dirtylimit scenario, reaping
+ * all vCPUs after a single vCPU dirty ring get full result in
+ * the miss of sleep, so just reap the ring-fulled vCPU.
+ */
+if (dirtylimit_in_service()) {
+kvm_dirty_ring_reap(kvm_state, cpu);
+} else {
+kvm_dirty_ring_reap(kvm_state, NULL);
+}
 qemu_mutex_unlock_iothread();
+dirtylimit_vcpu_execute(cpu);
 ret = 0;
 break;
 case KVM_EXIT_SYSTEM_EVENT:
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 0efc615..a3f6b2d 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -418,6 +418,12 @@ struct CPUState {
  */
 bool throttle_thread_scheduled;
 
+/*
+ * Sleep throttle_us_per_full microseconds once dirty ring is full
+ * if dirty page rate limit is enabled.
+ */
+int64_t throttle_us_per_full;
+
 bool ignore_memory_transaction_failures;
 
 /* Used for user-only emulation of prctl(PR_SET_UNALIGN). */
diff --git a/include/sysemu/dirtylimit.h b/include/sysemu/dirtylimit.h
index da459f0..8d2c1f3 100644
--- a/include/sysemu/dirtylimit.h
+++ b/include/sysemu/dirtylimit.h
@@ -19,4 +19,19 @@ void vcpu_dirty_rate_stat_start(void);
 void vcpu_dirty_rate_stat_stop(void);
 void vcpu_dirty_rate_stat_initialize(void);
 void vcpu_dirty_rate_stat_finalize(void);
+
+void dirtylimit_state_lock(void);
+void dirtylimit_state_unlock(void);
+void dirtylimit_state_initialize(void);
+void dirtylimit_state_finalize(void);
+bool dirtylimit_in_service(void);
+bool dirtylimit_vcpu_index_valid(int cpu_index);
+void dirtylimit_process(void);
+void dirtylimit_change(bool start);
+void dirtylimit_set_vcpu(int cpu_index,
+ uint64_t quota,
+ bool enable);
+void dirtylimit_set_all(uint64_t quota,
+bool enable);
+void dirtylimit_vcpu_execute(CPUState *cpu);
 #endif
diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
index 6102e8c..76d0b44 100644
--- a/softmmu/dirtylimit.c
+++ b/softmmu/dirtylimit.c
@@ -18,6 +18,26 @@
 #include "sysemu/dirtylimit.h"
 #include "exec/memory.h"
 #include "hw/boards.h"
+#include "sysemu/kvm.h"
+#include "trace.h"
+
+/*
+ * Dirtylimit stop working if dirty page rate error
+ * value less than DIRTYLIMIT_TOLERANCE_RANGE
+ */
+#define DIRTYLIMIT_TOLERANCE_RANGE  25  /* MB/s */
+/*
+ * Plus or minus vcpu sleep time linearly if dirty
+ * page rate error value percentage over
+ * DIRTYLIMIT_LINEAR_ADJUSTMENT_PCT.
+ * Otherwise, plus or minus a fixed vcpu sleep time.
+ */
+#define DIRTYLIMIT_LINEAR_ADJUSTMENT_PCT 50
+/*
+ * Max vcpu sleep time percentage during a cycle
+ * composed of dirty ring full and sleep time.
+ */
+#define DIRTYLIMIT_THROTTLE_PCT_MAX 99
 
 struct {
 VcpuStat stat;
@@ -25,6 +45,30 @@ struct {
 QemuThread thread;
 } *vcpu_dirty_rate_stat;
 
+typedef struct VcpuDirtyLimitState {
+int cpu_index;
+bool enabled;
+/*
+ * Quota dirty page rate, unit is MB/s
+ * zero if not enabled.
+ */
+

[PATCH v19 5/9] accel/kvm/kvm-all: Introduce kvm_dirty_ring_size function

2022-03-15 Thread huangy81

From: Hyman Huang(黄勇) 

Introduce kvm_dirty_ring_size util function to help calculate
the dirty ring full time.
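
One way the ring size could feed into a "ring full time" estimate, as a
hedged sketch: the helper name ring_full_time_us and its parameters are
hypothetical, and the later patch in this series may compute it differently.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Estimate how long (in microseconds) a vCPU needs to fill its dirty ring.
 * ring_entries:    number of ring entries, i.e. what kvm_dirty_ring_size()
 *                  would return
 * page_size:       guest page size in bytes
 * dirty_rate_mbps: measured dirty page rate in MB/s
 */
static uint64_t ring_full_time_us(uint32_t ring_entries, uint64_t page_size,
                                  uint64_t dirty_rate_mbps)
{
    uint64_t ring_bytes = (uint64_t)ring_entries * page_size;

    assert(dirty_rate_mbps > 0);

    /* bytes / (bytes per second) gives seconds; scale to microseconds. */
    return ring_bytes * 1000000 / (dirty_rate_mbps << 20);
}
```

For a 4096-entry ring of 4 KiB pages (16 MiB in total), a vCPU dirtying
16 MB/s fills the ring in about one second.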

Signed-off-by: Hyman Huang(黄勇) 
Acked-by: Peter Xu 
---
 accel/kvm/kvm-all.c| 5 +
 accel/stubs/kvm-stub.c | 6 ++
 include/sysemu/kvm.h   | 2 ++
 3 files changed, 13 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 29bf6a0..0c78bc7 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2313,6 +2313,11 @@ bool kvm_dirty_ring_enabled(void)
 return kvm_state->kvm_dirty_ring_size ? true : false;
 }
 
+uint32_t kvm_dirty_ring_size(void)
+{
+return kvm_state->kvm_dirty_ring_size;
+}
+
 static int kvm_init(MachineState *ms)
 {
 MachineClass *mc = MACHINE_GET_CLASS(ms);
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index 3345882..c5aafaa 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -148,3 +148,9 @@ bool kvm_dirty_ring_enabled(void)
 {
 return false;
 }
+
+uint32_t kvm_dirty_ring_size(void)
+{
+return 0;
+}
+#endif
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index a783c78..efd6dee 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -582,4 +582,6 @@ bool kvm_cpu_check_are_resettable(void);
 bool kvm_arch_cpu_check_are_resettable(void);
 
 bool kvm_dirty_ring_enabled(void);
+
+uint32_t kvm_dirty_ring_size(void);
 #endif
-- 
1.8.3.1

[PATCH v19 3/9] migration/dirtyrate: Refactor dirty page rate calculation

2022-03-15 Thread huangy81

From: Hyman Huang(黄勇) 

Abstract out the dirty log change logic into function
global_dirty_log_change.

Abstract out the dirty page rate calculation logic via
dirty-ring into function vcpu_calculate_dirtyrate.

Abstract out the mathematical dirty page rate calculation
into do_calculate_dirtyrate and decouple it from DirtyStat.

Rename set_sample_page_period to dirty_stat_wait, which
is easier to understand and will be reused in dirtylimit.

Handle the cpu hotplug/unplug scenario during measurement of
the dirty page rate.

Export util functions outside migration.
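
The arithmetic in do_calculate_dirtyrate can be tried standalone. This
sketch assumes a 4 KiB page (EXAMPLE_PAGE_SIZE stands in for
TARGET_PAGE_SIZE) and uses a hypothetical function name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4 KiB page; the patch uses TARGET_PAGE_SIZE instead. */
#define EXAMPLE_PAGE_SIZE 4096

/* Dirty page rate in MB/s from start/end page counters over calc_time_ms. */
static int64_t dirty_rate_mbps(uint64_t start_pages, uint64_t end_pages,
                               int64_t calc_time_ms)
{
    uint64_t dirtied_mb;

    assert(calc_time_ms > 0 && end_pages >= start_pages);

    /* Pages to bytes, then bytes to MB (truncating). */
    dirtied_mb = ((end_pages - start_pages) * EXAMPLE_PAGE_SIZE) >> 20;

    /* MB per calc_time_ms, scaled to MB per second. */
    return dirtied_mb * 1000 / calc_time_ms;
}
```

Note the truncation: fewer than 256 dirtied 4 KiB pages round down to
0 MB, so short windows on quiet guests report a rate of zero.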

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
---
 include/sysemu/dirtyrate.h |  28 ++
 migration/dirtyrate.c  | 227 -
 migration/dirtyrate.h  |   7 +-
 3 files changed, 174 insertions(+), 88 deletions(-)
 create mode 100644 include/sysemu/dirtyrate.h

diff --git a/include/sysemu/dirtyrate.h b/include/sysemu/dirtyrate.h
new file mode 100644
index 000..4d3b9a4
--- /dev/null
+++ b/include/sysemu/dirtyrate.h
@@ -0,0 +1,28 @@
+/*
+ * dirty page rate helper functions
+ *
+ * Copyright (c) 2022 CHINA TELECOM CO.,LTD.
+ *
+ * Authors:
+ *  Hyman Huang(黄勇) 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_DIRTYRATE_H
+#define QEMU_DIRTYRATE_H
+
+typedef struct VcpuStat {
+int nvcpu; /* number of vcpu */
+DirtyRateVcpu *rates; /* array of dirty rate for each vcpu */
+} VcpuStat;
+
+int64_t vcpu_calculate_dirtyrate(int64_t calc_time_ms,
+ VcpuStat *stat,
+ unsigned int flag,
+ bool one_shot);
+
+void global_dirty_log_change(unsigned int flag,
+ bool start);
+#endif
diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index d65e744..79348de 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -46,7 +46,7 @@ static struct DirtyRateStat DirtyStat;
 static DirtyRateMeasureMode dirtyrate_mode =
 DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING;
 
-static int64_t set_sample_page_period(int64_t msec, int64_t initial_time)
+static int64_t dirty_stat_wait(int64_t msec, int64_t initial_time)
 {
 int64_t current_time;
 
@@ -60,6 +60,132 @@ static int64_t set_sample_page_period(int64_t msec, int64_t initial_time)
 return msec;
 }
 
+static inline void record_dirtypages(DirtyPageRecord *dirty_pages,
+ CPUState *cpu, bool start)
+{
+if (start) {
+dirty_pages[cpu->cpu_index].start_pages = cpu->dirty_pages;
+} else {
+dirty_pages[cpu->cpu_index].end_pages = cpu->dirty_pages;
+}
+}
+
+static int64_t do_calculate_dirtyrate(DirtyPageRecord dirty_pages,
+  int64_t calc_time_ms)
+{
+uint64_t memory_size_MB;
+uint64_t increased_dirty_pages =
+dirty_pages.end_pages - dirty_pages.start_pages;
+
+memory_size_MB = (increased_dirty_pages * TARGET_PAGE_SIZE) >> 20;
+
+return memory_size_MB * 1000 / calc_time_ms;
+}
+
+void global_dirty_log_change(unsigned int flag, bool start)
+{
+qemu_mutex_lock_iothread();
+if (start) {
+memory_global_dirty_log_start(flag);
+} else {
+memory_global_dirty_log_stop(flag);
+}
+qemu_mutex_unlock_iothread();
+}
+
+/*
+ * global_dirty_log_sync
+ * 1. sync dirty log from kvm
+ * 2. stop dirty tracking if needed.
+ */
+static void global_dirty_log_sync(unsigned int flag, bool one_shot)
+{
+qemu_mutex_lock_iothread();
+memory_global_dirty_log_sync();
+if (one_shot) {
+memory_global_dirty_log_stop(flag);
+}
+qemu_mutex_unlock_iothread();
+}
+
+static DirtyPageRecord *vcpu_dirty_stat_alloc(VcpuStat *stat)
+{
+CPUState *cpu;
+DirtyPageRecord *records;
+int nvcpu = 0;
+
+CPU_FOREACH(cpu) {
+nvcpu++;
+}
+
+stat->nvcpu = nvcpu;
+stat->rates = g_malloc0(sizeof(DirtyRateVcpu) * nvcpu);
+
+records = g_malloc0(sizeof(DirtyPageRecord) * nvcpu);
+
+return records;
+}
+
+static void vcpu_dirty_stat_collect(VcpuStat *stat,
+DirtyPageRecord *records,
+bool start)
+{
+CPUState *cpu;
+
+CPU_FOREACH(cpu) {
+record_dirtypages(records, cpu, start);
+}
+}
+
+int64_t vcpu_calculate_dirtyrate(int64_t calc_time_ms,
+ VcpuStat *stat,
+ unsigned int flag,
+ bool one_shot)
+{
+DirtyPageRecord *records;
+int64_t init_time_ms;
+int64_t duration;
+int64_t dirtyrate;
+int i = 0;
+unsigned int gen_id;
+
+retry:
+init_time_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
+cpu_list_lock();
+gen_id = cpu_list_generation_id_get();
+records = vcpu_dirty_stat_alloc(stat);
+vcpu_dirty_stat_collect(stat, records, true);
+

[PATCH v19 1/9] accel/kvm/kvm-all: Refactor per-vcpu dirty ring reaping

2022-03-15 Thread huangy81

From: Hyman Huang(黄勇) 

Add a non-required argument 'CPUState' to kvm_dirty_ring_reap so
that it can cover single vcpu dirty-ring-reaping scenario.
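
The optional-argument pattern can be sketched without any KVM state.
MockCpu, reap_one and reap below are hypothetical stand-ins for CPUState,
kvm_dirty_ring_reap_one and kvm_dirty_ring_reap_locked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CPUState: just a pending-entry counter. */
typedef struct MockCpu {
    uint32_t pending;
    struct MockCpu *next;
} MockCpu;

/* Drain one ring and return how many entries were reaped. */
static uint32_t reap_one(MockCpu *cpu)
{
    uint32_t n;

    assert(cpu != NULL);
    n = cpu->pending;
    cpu->pending = 0;
    return n;
}

/* cpu == NULL reaps every ring in the list; otherwise only that one. */
static uint64_t reap(MockCpu *list, MockCpu *cpu)
{
    uint64_t total = 0;

    if (cpu) {
        return reap_one(cpu);
    }
    for (cpu = list; cpu; cpu = cpu->next) {
        total += reap_one(cpu);
    }
    return total;
}
```

Existing callers keep their behavior by passing NULL, and a caller that
cares about a single vCPU passes that vCPU and leaves the other rings
untouched.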

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
---
 accel/kvm/kvm-all.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 27864df..29bf6a0 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -756,17 +756,20 @@ static uint32_t kvm_dirty_ring_reap_one(KVMState *s, CPUState *cpu)
 }
 
 /* Must be with slots_lock held */
-static uint64_t kvm_dirty_ring_reap_locked(KVMState *s)
+static uint64_t kvm_dirty_ring_reap_locked(KVMState *s, CPUState* cpu)
 {
 int ret;
-CPUState *cpu;
 uint64_t total = 0;
 int64_t stamp;
 
 stamp = get_clock();
 
-CPU_FOREACH(cpu) {
-total += kvm_dirty_ring_reap_one(s, cpu);
+if (cpu) {
+total = kvm_dirty_ring_reap_one(s, cpu);
+} else {
+CPU_FOREACH(cpu) {
+total += kvm_dirty_ring_reap_one(s, cpu);
+}
 }
 
 if (total) {
@@ -787,7 +790,7 @@ static uint64_t kvm_dirty_ring_reap_locked(KVMState *s)
  * Currently for simplicity, we must hold BQL before calling this.  We can
  * consider to drop the BQL if we're clear with all the race conditions.
  */
-static uint64_t kvm_dirty_ring_reap(KVMState *s)
+static uint64_t kvm_dirty_ring_reap(KVMState *s, CPUState *cpu)
 {
 uint64_t total;
 
@@ -807,7 +810,7 @@ static uint64_t kvm_dirty_ring_reap(KVMState *s)
  * reset below.
  */
 kvm_slots_lock();
-total = kvm_dirty_ring_reap_locked(s);
+total = kvm_dirty_ring_reap_locked(s, cpu);
 kvm_slots_unlock();
 
 return total;
@@ -854,7 +857,7 @@ static void kvm_dirty_ring_flush(void)
  * vcpus out in a synchronous way.
  */
 kvm_cpu_synchronize_kick_all();
-kvm_dirty_ring_reap(kvm_state);
+kvm_dirty_ring_reap(kvm_state, NULL);
 trace_kvm_dirty_ring_flush(1);
 }
 
@@ -1398,7 +1401,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
  * Not easy.  Let's cross the fingers until it's fixed.
  */
 if (kvm_state->kvm_dirty_ring_size) {
-kvm_dirty_ring_reap_locked(kvm_state);
+kvm_dirty_ring_reap_locked(kvm_state, NULL);
 } else {
 kvm_slot_get_dirty_log(kvm_state, mem);
 }
@@ -1470,7 +1473,7 @@ static void *kvm_dirty_ring_reaper_thread(void *data)
 r->reaper_state = KVM_DIRTY_RING_REAPER_REAPING;
 
 qemu_mutex_lock_iothread();
-kvm_dirty_ring_reap(s);
+kvm_dirty_ring_reap(s, NULL);
 qemu_mutex_unlock_iothread();
 
 r->reaper_iteration++;
@@ -2957,7 +2960,7 @@ int kvm_cpu_exec(CPUState *cpu)
  */
 trace_kvm_dirty_ring_full(cpu->cpu_index);
 qemu_mutex_lock_iothread();
-kvm_dirty_ring_reap(kvm_state);
+kvm_dirty_ring_reap(kvm_state, NULL);
 qemu_mutex_unlock_iothread();
 ret = 0;
 break;
-- 
1.8.3.1

[PATCH v19 0/9] support dirty restraint on vCPU

2022-03-15 Thread huangy81

From: Hyman Huang(黄勇) 

v19
- rebase on master and fix conflicts
- add test case for dirty page rate limit

Ping.

Added a test case; I hope it can be merged along with the previous
patchset.

Please review. Thanks,

Regards
Yong

v18
- squash commit "Ignore query-vcpu-dirty-limit test" into
  "Implement dirty page rate limit" in  [PATCH v17] to make
  the modification logic self-contained. 

Please review. Thanks,

Regards
Yong 

v17
- rebase on master
- fix qmp-cmd-test 

v16
- rebase on master
- drop the unused typedef syntax in [PATCH v15 6/7] 
- add the Reviewed-by and Acked-by tags by the way 

v15
- rebase on master
- drop the 'init_time_ms' parameter in function vcpu_calculate_dirtyrate 
- drop the 'setup' field in dirtylimit_state and call dirtylimit_process
  directly, which makes code cleaner.
- code clean in dirtylimit_adjust_throttle
- fix missing dirtylimit_state_unlock() in dirtylimit_process and
  dirtylimit_query_all
- add some comment

Please review. Thanks,

Regards
Yong 

v14
- v13 sent by accident, resend patchset. 

v13
- rebase on master
- pass NULL to kvm_dirty_ring_reap in commit
  "refactor per-vcpu dirty ring reaping" to keep the logic unchanged.
  In other words, we still try our best to reap as many PFNs as possible
  if dirtylimit is not in service.
- move the cpu list gen id changes into a separate patch.
- release the lock before sleep during dirty page rate calculation.
- move the dirty ring size fetch logic into a separate patch.
- drop the DIRTYLIMIT_LINEAR_ADJUSTMENT_WATERMARK macro.
- substitute bh with function pointer when implementing dirtylimit.
- merge the dirtylimit_start/stop into dirtylimit_change.
- change the "cpu-index" parameter type to "int" for consistency.
- fix some syntax errors in documents.

Please review. Thanks,

Yong

v12
- rebase on master
- add a new commit to refactor per-vcpu dirty ring reaping, which can resolve
  the "vcpu miss the chances to sleep" problem
- remove the dirtylimit_thread and implement throttle in bottom half instead.
- let the dirty ring reaper thread keep sleeping when dirtylimit is in service
- introduce cpu_list_generation_id to identify cpu_list changing.
- keep taking the cpu_list_lock during dirty_stat_wait to prevent vcpu
  plug/unplug when calculating the dirty page rate
- move the dirtylimit global initializations out of dirtylimit_set_vcpu and do
  some code clean
- add DIRTYLIMIT_LINEAR_ADJUSTMENT_WATERMARK in case of oscillation when
  throttling
- remove the unmatched count field in dirtylimit_state
- add stub to fix build on non-x86
- refactor the documents

Thanks Peter and Markus for reviewing the previous versions, please review.

Thanks,
Yong

v11
- rebase on master
- add a commit "refactor dirty page rate calculation" so that dirty page rate
  limit can reuse the calculation logic.
- handle the cpu hotplug/unplug case in the dirty page rate calculation logic.
- modify the qmp commands according to Markus's advice.
- introduce a standalone file dirtylimit.c to implement dirty page rate limit
- check if dirty limit in service by dirtylimit_state pointer instead of global
  variable
- introduce dirtylimit_mutex to protect dirtylimit_state
- do some code clean and docs

See the commits for more detail. Thanks very much to Markus and Peter for the
code review and for their experienced and insightful advice; most modifications
are based on it.

v10:
- rebase on master
- make the following modifications on patch [1/3]:
  1. Make "dirtylimit-calc" thread joinable and join it after quitting.

  2. Add finalize function to free dirtylimit_calc_state

  3. Do some code clean work

- make the following modifications on patch [2/3]:
  1. Remove the original implementation of throttle according to
 Peter's advice.
 
  2. Introduce a negative feedback system and implement the throttle
 on all vcpu in one thread named "dirtylimit". 

  3. Simplify the algorithm for calculating throttle_us_per_full:
     increase/decrease linearly when there is a wide difference
     between quota and current dirty page rate, increase/decrease
     by a fixed time slice when the difference is narrow. This makes
     the throttle respond faster and reach the quota smoothly.

  4. Introduce an unfit_cnt in the algorithm to make sure the throttle
     really takes effect.

  5. Set the max sleep time to 99 times "ring_full_time_us".

  6. Make "dirtylimit" thread joinable and join it after quitting.

[PATCH v19 4/9] softmmu/dirtylimit: Implement vCPU dirtyrate calculation periodically

2022-03-15 Thread huangy81

From: Hyman Huang(黄勇) 

Introduce a third dirty tracking method, GLOBAL_DIRTY_LIMIT,
used to calculate the dirty rate periodically for the dirty
page rate limit.

Add dirtylimit.c to implement periodic dirty rate calculation,
which will be used for the dirty page rate limit.

Add dirtylimit.h to export util functions for dirty page rate
limit implementation.
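
A small sketch of why each dirty tracking user gets its own bit: logging
only needs to be switched on for the first user and off after the last
one. The flag values match this patch; GLOBAL_DIRTY_MIGRATION is not
visible in the diff and is taken here to be bit 0, and tracking_start,
tracking_stop and the tracking variable are hypothetical stand-ins, not
the memory.c API.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to be bit 0; this flag is not shown in the diff. */
#define GLOBAL_DIRTY_MIGRATION  (1U << 0)
#define GLOBAL_DIRTY_DIRTY_RATE (1U << 1)
#define GLOBAL_DIRTY_LIMIT      (1U << 2)
#define GLOBAL_DIRTY_MASK       (0x7)

static unsigned int tracking; /* stand-in for global_dirty_tracking */

/* Returns true when this call turned dirty logging on (first user). */
static bool tracking_start(unsigned int flags)
{
    unsigned int old = tracking;

    assert(!(flags & ~GLOBAL_DIRTY_MASK));
    tracking |= flags;
    return old == 0 && tracking != 0;
}

/* Returns true when this call turned dirty logging off (last user). */
static bool tracking_stop(unsigned int flags)
{
    unsigned int old = tracking;

    assert(!(flags & ~GLOBAL_DIRTY_MASK));
    tracking &= ~flags;
    return old != 0 && tracking == 0;
}
```

Widening the mask from 0x3 to 0x7 is what lets the new bit pass the
validity check in this sketch.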

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
---
 include/exec/memory.h   |   5 +-
 include/sysemu/dirtylimit.h |  22 +
 softmmu/dirtylimit.c| 116 
 softmmu/meson.build |   1 +
 4 files changed, 143 insertions(+), 1 deletion(-)
 create mode 100644 include/sysemu/dirtylimit.h
 create mode 100644 softmmu/dirtylimit.c

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 4d5997e..88ca510 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -69,7 +69,10 @@ static inline void fuzz_dma_read_cb(size_t addr,
 /* Dirty tracking enabled because measuring dirty rate */
 #define GLOBAL_DIRTY_DIRTY_RATE (1U << 1)
 
-#define GLOBAL_DIRTY_MASK  (0x3)
+/* Dirty tracking enabled because dirty limit */
+#define GLOBAL_DIRTY_LIMIT  (1U << 2)
+
+#define GLOBAL_DIRTY_MASK  (0x7)
 
 extern unsigned int global_dirty_tracking;
 
diff --git a/include/sysemu/dirtylimit.h b/include/sysemu/dirtylimit.h
new file mode 100644
index 000..da459f0
--- /dev/null
+++ b/include/sysemu/dirtylimit.h
@@ -0,0 +1,22 @@
+/*
+ * Dirty page rate limit common functions
+ *
+ * Copyright (c) 2022 CHINA TELECOM CO.,LTD.
+ *
+ * Authors:
+ *  Hyman Huang(黄勇) 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef QEMU_DIRTYRLIMIT_H
+#define QEMU_DIRTYRLIMIT_H
+
+#define DIRTYLIMIT_CALC_TIME_MS 1000/* 1000ms */
+
+int64_t vcpu_dirty_rate_get(int cpu_index);
+void vcpu_dirty_rate_stat_start(void);
+void vcpu_dirty_rate_stat_stop(void);
+void vcpu_dirty_rate_stat_initialize(void);
+void vcpu_dirty_rate_stat_finalize(void);
+#endif
diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
new file mode 100644
index 000..6102e8c
--- /dev/null
+++ b/softmmu/dirtylimit.c
@@ -0,0 +1,116 @@
+/*
+ * Dirty page rate limit implementation code
+ *
+ * Copyright (c) 2022 CHINA TELECOM CO.,LTD.
+ *
+ * Authors:
+ *  Hyman Huang(黄勇) 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/main-loop.h"
+#include "qapi/qapi-commands-migration.h"
+#include "sysemu/dirtyrate.h"
+#include "sysemu/dirtylimit.h"
+#include "exec/memory.h"
+#include "hw/boards.h"
+
+struct {
+VcpuStat stat;
+bool running;
+QemuThread thread;
+} *vcpu_dirty_rate_stat;
+
+static void vcpu_dirty_rate_stat_collect(void)
+{
+VcpuStat stat;
+int i = 0;
+
+/* calculate vcpu dirtyrate */
+vcpu_calculate_dirtyrate(DIRTYLIMIT_CALC_TIME_MS,
+ &stat,
+ GLOBAL_DIRTY_LIMIT,
+ false);
+
+for (i = 0; i < stat.nvcpu; i++) {
+vcpu_dirty_rate_stat->stat.rates[i].id = i;
+vcpu_dirty_rate_stat->stat.rates[i].dirty_rate =
+stat.rates[i].dirty_rate;
+}
+
+free(stat.rates);
+}
+
+static void *vcpu_dirty_rate_stat_thread(void *opaque)
+{
+rcu_register_thread();
+
+/* start log sync */
+global_dirty_log_change(GLOBAL_DIRTY_LIMIT, true);
+
+while (qatomic_read(&vcpu_dirty_rate_stat->running)) {
+vcpu_dirty_rate_stat_collect();
+}
+
+/* stop log sync */
+global_dirty_log_change(GLOBAL_DIRTY_LIMIT, false);
+
+rcu_unregister_thread();
+return NULL;
+}
+
+int64_t vcpu_dirty_rate_get(int cpu_index)
+{
+DirtyRateVcpu *rates = vcpu_dirty_rate_stat->stat.rates;
+return qatomic_read(&rates[cpu_index].dirty_rate);
+}
+
+void vcpu_dirty_rate_stat_start(void)
+{
+if (qatomic_read(&vcpu_dirty_rate_stat->running)) {
+return;
+}
+
+qatomic_set(&vcpu_dirty_rate_stat->running, 1);
+qemu_thread_create(&vcpu_dirty_rate_stat->thread,
+   "dirtyrate-stat",
+   vcpu_dirty_rate_stat_thread,
+   NULL,
+   QEMU_THREAD_JOINABLE);
+}
+
+void vcpu_dirty_rate_stat_stop(void)
+{
+qatomic_set(&vcpu_dirty_rate_stat->running, 0);
+qemu_mutex_unlock_iothread();
+qemu_thread_join(&vcpu_dirty_rate_stat->thread);
+qemu_mutex_lock_iothread();
+}
+
+void vcpu_dirty_rate_stat_initialize(void)
+{
+MachineState *ms = MACHINE(qdev_get_machine());
+int max_cpus = ms->smp.max_cpus;
+
+vcpu_dirty_rate_stat =
+g_malloc0(sizeof(*vcpu_dirty_rate_stat));
+
+vcpu_dirty_rate_stat->stat.nvcpu = max_cpus;
+vcpu_dirty_rate_stat->stat.rates =
+g_malloc0(sizeof(DirtyRateVcpu) * max_cpus);
+
+vcpu_dirty_rate_stat->running = false;
+}
+
+void

[PATCH v19 2/9] cpus: Introduce cpu_list_generation_id

2022-03-15 Thread huangy81

From: Hyman Huang(黄勇) 

Introduce cpu_list_generation_id to track cpu list generation so
that cpu hotplug/unplug can be detected during measurement of
dirty page rate.

cpu_list_generation_id could be used to detect changes of cpu
list, which is prepared for dirty page rate measurement.
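
A sketch of how a measurement could use such a counter, under stated
assumptions: the list is reduced to a length, and Snapshot, snapshot_take
and snapshot_valid are hypothetical helpers, not part of this patch.

```c
#include <assert.h>
#include <stdbool.h>

static unsigned int list_generation; /* stand-in for cpu_list_generation_id */
static int list_len;                 /* stand-in for the CPU list itself */

/* Every list mutation bumps the generation counter. */
static void list_add(void)
{
    list_len++;
    list_generation++;
}

static void list_remove(void)
{
    assert(list_len > 0);
    list_len--;
    list_generation++;
}

/* State recorded at the start of a measurement. */
typedef struct Snapshot {
    unsigned int gen;
    int len;
} Snapshot;

static Snapshot snapshot_take(void)
{
    Snapshot s = { list_generation, list_len };
    return s;
}

/* True if no CPU was added or removed since the snapshot. */
static bool snapshot_valid(const Snapshot *s)
{
    return s->gen == list_generation;
}
```

Comparing generations catches an unplug followed by a hotplug, which a
plain CPU count would miss; a caller seeing an invalid snapshot would
discard its samples and retry.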

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
---
 cpus-common.c | 8 
 include/exec/cpu-common.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/cpus-common.c b/cpus-common.c
index 6e73d3e..31c6415 100644
--- a/cpus-common.c
+++ b/cpus-common.c
@@ -73,6 +73,12 @@ static int cpu_get_free_index(void)
 }
 
 CPUTailQ cpus = QTAILQ_HEAD_INITIALIZER(cpus);
+static unsigned int cpu_list_generation_id;
+
+unsigned int cpu_list_generation_id_get(void)
+{
+return cpu_list_generation_id;
+}
 
 void cpu_list_add(CPUState *cpu)
 {
@@ -84,6 +90,7 @@ void cpu_list_add(CPUState *cpu)
 assert(!cpu_index_auto_assigned);
 }
 QTAILQ_INSERT_TAIL_RCU(&cpus, cpu, node);
+cpu_list_generation_id++;
 }
 
 void cpu_list_remove(CPUState *cpu)
@@ -96,6 +103,7 @@ void cpu_list_remove(CPUState *cpu)
 
 QTAILQ_REMOVE_RCU(&cpus, cpu, node);
 cpu->cpu_index = UNASSIGNED_CPU_INDEX;
+cpu_list_generation_id++;
 }
 
 CPUState *qemu_get_cpu(int index)
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 7f7b594..856b0e7 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -32,6 +32,7 @@ extern intptr_t qemu_host_page_mask;
 void qemu_init_cpu_list(void);
 void cpu_list_lock(void);
 void cpu_list_unlock(void);
+unsigned int cpu_list_generation_id_get(void);
 
 void tcg_flush_softmmu_tlb(CPUState *cs);
 
-- 
1.8.3.1

Re: [PATCH] target/riscv: Exit current TB after an sfence.vma

2022-03-15 Thread Alistair Francis

On Wed, Mar 16, 2022 at 5:26 AM Idan Horowitz  wrote:
>
> If the pages which control the translation of the currently executing
> instructions are changed, and then the TLB is flushed using sfence.vma
> we have to exit the current TB early, to ensure we don't execute stale
> instructions.
>
> Signed-off-by: Idan Horowitz 

Thanks!

Applied to riscv-to-apply.next

Alistair

> ---
>  target/riscv/insn_trans/trans_privileged.c.inc | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/target/riscv/insn_trans/trans_privileged.c.inc 
> b/target/riscv/insn_trans/trans_privileged.c.inc
> index 53613682e8..f265e8202d 100644
> --- a/target/riscv/insn_trans/trans_privileged.c.inc
> +++ b/target/riscv/insn_trans/trans_privileged.c.inc
> @@ -114,6 +114,13 @@ static bool trans_sfence_vma(DisasContext *ctx, 
> arg_sfence_vma *a)
>  {
>  #ifndef CONFIG_USER_ONLY
>  gen_helper_tlb_flush(cpu_env);
> +/*
> + * The flush might have changed the backing physical memory of
> + * the instructions we're currently executing
> + */
> +gen_set_pc_imm(ctx, ctx->pc_succ_insn);
> +tcg_gen_exit_tb(NULL, 0);
> +ctx->base.is_jmp = DISAS_NORETURN;
>  return true;
>  #endif
>  return false;
> --
> 2.35.1
>
>

Re: [PATCH v19 4/7] net/vmnet: implement host mode (vmnet-host)

2022-03-15 Thread Vladislav Yaroshchuk

On Tue, Mar 15, 2022 at 11:37 PM Akihiko Odaki 
wrote:

> On 2022/03/16 5:27, Vladislav Yaroshchuk wrote:
> > Signed-off-by: Vladislav Yaroshchuk 
> > ---
> >   net/vmnet-host.c | 110 ---
> >   1 file changed, 104 insertions(+), 6 deletions(-)
> >
> > diff --git a/net/vmnet-host.c b/net/vmnet-host.c
> > index a461d507c5..8f7a638967 100644
> > --- a/net/vmnet-host.c
> > +++ b/net/vmnet-host.c
> > @@ -9,16 +9,114 @@
> >*/
> >
> >   #include "qemu/osdep.h"
> > +#include "qemu/uuid.h"
> >   #include "qapi/qapi-types-net.h"
> > -#include "vmnet_int.h"
> > -#include "clients.h"
> > -#include "qemu/error-report.h"
> >   #include "qapi/error.h"
> > +#include "clients.h"
> > +#include "vmnet_int.h"
> >
> >   #include <vmnet/vmnet.h>
> >
> > +
> > +static bool validate_options(const Netdev *netdev, Error **errp)
> > +{
> > +const NetdevVmnetHostOptions *options = &(netdev->u.vmnet_host);
> > +QemuUUID uuid;
>
> The variable uuid is used only when defined(MAC_OS_VERSION_11_0) && \
> MAC_OS_X_VERSION_MIN_REQUIRED >= MAC_OS_VERSION_11_0 and may result in a
> compilation warning otherwise. It should be in the #if block as
> network_uuid variable in build_if_desc is in the counterpart.
>
> Also I suggest to unify the names of identifiers. There are
> options->net_uuid, uuid, and network_uuid, but the differences tells
> nothing.
>
> This should be the last thing to be addressed (unless I missed something
> again.) Thank you for persistence (It's v19!). I really appreciate your
> contribution.
>
>
Thank you for your help and persistence in the review
process :)

Fixed bad naming and moved 'QemuUUID net_uuid'
definition into #if block in validate_options in v20.

Best Regards,
Vladislav Yaroshchuk

Regards,
> Akihiko Odaki
>
> > +
> > +#if defined(MAC_OS_VERSION_11_0) && \
> > +MAC_OS_X_VERSION_MIN_REQUIRED >= MAC_OS_VERSION_11_0
> > +
> > +if (options->has_net_uuid &&
> > +qemu_uuid_parse(options->net_uuid, &uuid) < 0) {
> > +error_setg(errp, "Invalid UUID provided in 'net-uuid'");
> > +return false;
> > +}
> > +#else
> > +if (options->has_isolated) {
> > +error_setg(errp,
> > +   "vmnet-host.isolated feature is "
> > +   "unavailable: outdated vmnet.framework API");
> > +return false;
> > +}
> > +
> > +if (options->has_net_uuid) {
> > +error_setg(errp,
> > +   "vmnet-host.net-uuid feature is "
> > +   "unavailable: outdated vmnet.framework API");
> > +return false;
> > +}
> > +#endif
> > +
> > +if ((options->has_start_address ||
> > + options->has_end_address ||
> > + options->has_subnet_mask) &&
> > +!(options->has_start_address &&
> > +  options->has_end_address &&
> > +  options->has_subnet_mask)) {
> > +error_setg(errp,
> > +   "'start-address', 'end-address', 'subnet-mask' "
> > +   "should be provided together");
> > +return false;
> > +}
> > +
> > +return true;
> > +}
> > +
> > +static xpc_object_t build_if_desc(const Netdev *netdev,
> > +  NetClientState *nc)
> > +{
> > +const NetdevVmnetHostOptions *options = &(netdev->u.vmnet_host);
> > +xpc_object_t if_desc = xpc_dictionary_create(NULL, NULL, 0);
> > +
> > +xpc_dictionary_set_uint64(if_desc,
> > +  vmnet_operation_mode_key,
> > +  VMNET_HOST_MODE);
> > +
> > +#if defined(MAC_OS_VERSION_11_0) && \
> > +MAC_OS_X_VERSION_MIN_REQUIRED >= MAC_OS_VERSION_11_0
> > +
> > +xpc_dictionary_set_bool(if_desc,
> > +vmnet_enable_isolation_key,
> > +options->isolated);
> > +
> > +QemuUUID network_uuid;
> > +if (options->has_net_uuid) {
> > +qemu_uuid_parse(options->net_uuid, &network_uuid);
> > +xpc_dictionary_set_uuid(if_desc,
> > +vmnet_network_identifier_key,
> > +network_uuid.data);
> > +}
> > +#endif
> > +
> > +if (options->has_start_address) {
> > +xpc_dictionary_set_string(if_desc,
> > +  vmnet_start_address_key,
> > +  options->start_address);
> > +xpc_dictionary_set_string(if_desc,
> > +  vmnet_end_address_key,
> > +  options->end_address);
> > +xpc_dictionary_set_string(if_desc,
> > +  vmnet_subnet_mask_key,
> > +  options->subnet_mask);
> > +}
> > +
> > +return if_desc;
> > +}
> > +
> > +static NetClientInfo net_vmnet_host_info = {
> > +.type = NET_CLIENT_DRIVER_VMNET_HOST,
> > +.size = sizeof(VmnetState),
> > +.receive = vmnet_receive_common,
> > +.cleanup = vmnet_cleanup_common,
> > +};
> > +
> >

[PATCH v20 5/7] net/vmnet: implement bridged mode (vmnet-bridged)

2022-03-15 Thread Vladislav Yaroshchuk

Signed-off-by: Vladislav Yaroshchuk 
---
 net/vmnet-bridged.m | 128 ++--
 1 file changed, 123 insertions(+), 5 deletions(-)

diff --git a/net/vmnet-bridged.m b/net/vmnet-bridged.m
index 91c1a2f2c7..5936c87718 100644
--- a/net/vmnet-bridged.m
+++ b/net/vmnet-bridged.m
@@ -10,16 +10,134 @@
 
 #include "qemu/osdep.h"
 #include "qapi/qapi-types-net.h"
-#include "vmnet_int.h"
-#include "clients.h"
-#include "qemu/error-report.h"
 #include "qapi/error.h"
+#include "clients.h"
+#include "vmnet_int.h"
 
 #include <vmnet/vmnet.h>
 
+
+static bool validate_ifname(const char *ifname)
+{
+xpc_object_t shared_if_list = vmnet_copy_shared_interface_list();
+bool match = false;
+if (!xpc_array_get_count(shared_if_list)) {
+goto done;
+}
+
+match = !xpc_array_apply(
+shared_if_list,
+^bool(size_t index, xpc_object_t value) {
+return strcmp(xpc_string_get_string_ptr(value), ifname) != 0;
+});
+
+done:
+xpc_release(shared_if_list);
+return match;
+}
+
+
+static bool get_valid_ifnames(char *output_buf)
+{
+xpc_object_t shared_if_list = vmnet_copy_shared_interface_list();
+__block const char *ifname = NULL;
+__block int str_offset = 0;
+bool interfaces_available = true;
+
+if (!xpc_array_get_count(shared_if_list)) {
+interfaces_available = false;
+goto done;
+}
+
+xpc_array_apply(
+shared_if_list,
+^bool(size_t index, xpc_object_t value) {
+/* build list of strings like "en0 en1 en2 " */
+ifname = xpc_string_get_string_ptr(value);
+strcpy(output_buf + str_offset, ifname);
+strcpy(output_buf + str_offset + strlen(ifname), " ");
+str_offset += strlen(ifname) + 1;
+return true;
+});
+
+done:
+xpc_release(shared_if_list);
+return interfaces_available;
+}
+
+
+static bool validate_options(const Netdev *netdev, Error **errp)
+{
+const NetdevVmnetBridgedOptions *options = &(netdev->u.vmnet_bridged);
+char ifnames[1024];
+
+if (!validate_ifname(options->ifname)) {
+if (get_valid_ifnames(ifnames)) {
+error_setg(errp,
+   "unsupported ifname '%s', expected one of [ %s]",
+   options->ifname,
+   ifnames);
+return false;
+}
+error_setg(errp,
+   "unsupported ifname '%s', no supported "
+   "interfaces available",
+   options->ifname);
+return false;
+}
+
+#if !defined(MAC_OS_VERSION_11_0) || \
+MAC_OS_X_VERSION_MIN_REQUIRED < MAC_OS_VERSION_11_0
+if (options->has_isolated) {
+error_setg(errp,
+   "vmnet-bridged.isolated feature is "
+   "unavailable: outdated vmnet.framework API");
+return false;
+}
+#endif
+return true;
+}
+
+
+static xpc_object_t build_if_desc(const Netdev *netdev)
+{
+const NetdevVmnetBridgedOptions *options = &(netdev->u.vmnet_bridged);
+xpc_object_t if_desc = xpc_dictionary_create(NULL, NULL, 0);
+
+xpc_dictionary_set_uint64(if_desc,
+  vmnet_operation_mode_key,
+  VMNET_BRIDGED_MODE
+);
+
+xpc_dictionary_set_string(if_desc,
+  vmnet_shared_interface_name_key,
+  options->ifname);
+
+#if defined(MAC_OS_VERSION_11_0) && \
+MAC_OS_X_VERSION_MIN_REQUIRED >= MAC_OS_VERSION_11_0
+xpc_dictionary_set_bool(if_desc,
+vmnet_enable_isolation_key,
+options->isolated);
+#endif
+return if_desc;
+}
+
+
+static NetClientInfo net_vmnet_bridged_info = {
+.type = NET_CLIENT_DRIVER_VMNET_BRIDGED,
+.size = sizeof(VmnetState),
+.receive = vmnet_receive_common,
+.cleanup = vmnet_cleanup_common,
+};
+
+
 int net_init_vmnet_bridged(const Netdev *netdev, const char *name,
NetClientState *peer, Error **errp)
 {
-  error_setg(errp, "vmnet-bridged is not implemented yet");
-  return -1;
+NetClientState *nc = qemu_new_net_client(&net_vmnet_bridged_info,
+ peer, "vmnet-bridged", name);
+if (!validate_options(netdev, errp)) {
+return -1;
+}
+return vmnet_if_create(nc, build_if_desc(netdev), errp);
 }
-- 
2.34.1.vfs.0.0

Re: [PATCH experiment 00/16] C++20 coroutine backend

2022-03-15 Thread Paolo Bonzini


On 3/15/22 16:55, Daniel P. Berrangé wrote:

Expecting maintainers to enforce a subset during code review feels
like it would be a tedious burden, that will inevitably let stuff
through because humans are fallible, especially when presented
with uninspiring, tedious, repetitive tasks.

Restricting ourselves to a subset is only viable if we have
an automated tool that can reliably enforce that subset. I'm not
sure that any such tool exists, and not convinced our time is
best served by trying to write & maintain one either.


We don't need to have a policy on which features are used.  We need to 
have goals for what to use C++ for.  I won't go into further details 
here, because I had already posted "When and how to use C++"[1] about an 
hour before your reply.



IOW, I fear once we allow C++ at any level, it won't be practical
to constrain it as much as we desire. I fear us turning QEMU into
even more of a monster like other big C++ apps I see which take
all hours to compile while using all available RAM in Fedora RPM
build hosts.


Sorry but this is FUD.  There's plenty of C++ apps and libraries that do 
not "take hours to compile while using all available RAM".  You're 
probably thinking of the Chromium/Firefox/Libreoffice triplet but those 
are an order of magnitude larger than QEMU.  And in fact, QEMU is 
*already* a monster that takes longer to compile than most other 
packages, no matter the language they're written in.


Most of KDE and everything that uses Qt is written in C++, and so is 
Inkscape in GTK+ land.  LLVM and Clang are written in C++.  Hotspot and 
V8 are written in C++.  Kodi, MAME and DolphinEmu are written in C++. 
GCC and GDB have migrated to C++ and their compile times have not exploded.



My other question is whether adoption of C++ would complicate any
desire to make more use of Rust in QEMU ? I know Rust came out of
work by the Mozilla Firefox crew, and Firefox was C++, but I don't
have any idea how they integrated use of Rust with Firefox, so
whether there are any gotcha's for us or not ?


Any Rust integration would go through C APIs.  Using Rust in the block 
layer would certainly be much harder, though perhaps not impossible, if 
the block layer uses C++ coroutines.  Rust supports something similar, 
but two-direction interoperability would be hard.


For everything else, not much.  Even if using C++, the fact that QEMU's 
APIs are primarily C would not change.  Changing "timer_mod_ns(timer, 
ns)" to "timer.modify_ns(ns)" is not on the table.
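
To make the point about the APIs staying C concrete, here is a minimal
sketch. DemoTimer and the demo_timer_* functions are hypothetical names
invented for this illustration; they are not QEMU's real QEMUTimer API.
The object is always an explicit first argument, which is the shape that
C, C++ and any Rust binding going through C APIs can all call:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a QEMU object; not the real QEMUTimer. */
typedef struct DemoTimer {
    int64_t expire_ns;  /* absolute deadline in nanoseconds */
    int armed;          /* non-zero once a deadline has been set */
} DemoTimer;

/* C-style API: the object is passed explicitly, no member functions. */
void demo_timer_init(DemoTimer *t)
{
    assert(t != NULL);
    t->expire_ns = 0;
    t->armed = 0;
}

/* Set (or move) the deadline of the timer. */
void demo_timer_mod_ns(DemoTimer *t, int64_t expire_ns)
{
    assert(t != NULL);
    t->expire_ns = expire_ns;
    t->armed = 1;
}

/* Return non-zero if the timer is armed and its deadline has passed. */
int demo_timer_expired(const DemoTimer *t, int64_t now_ns)
{
    assert(t != NULL);
    return t->armed && now_ns >= t->expire_ns;
}
```

A caller written in C++ would still write demo_timer_mod_ns(&t, ns)
rather than t.modify_ns(ns), which is all the paragraph above claims.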


But really, first of all the question should be who is doing work on 
integrating Rust with QEMU.  I typically hear about this topic exactly 
once a year at KVM Forum, and then nothing.  We have seen Marc-André's 
QAPI integration experiment, but it's not clear to me what the path 
would be from there to wider use in QEMU.


In particular, after ~3 years of talking about it, it is not even clear:

- what subsystems would benefit the most from the adoption of Rust, and 
whether that would be feasible without a rewrite which will simply never 
happen


- what the plans would be for coexistence of Rust and C code within a 
subsystem


- whether maintainers would be on board with adopting a completely 
different language, and who in the community has enough Rust experience 
to shepherd us through the learning experience


The first two questions have answers in the other message if 
s/Rust/C++/, and as to the last I think we're already further in the 
discussion.


Thanks,

Paolo

[PATCH v20 3/7] net/vmnet: implement shared mode (vmnet-shared)

2022-03-15 Thread Vladislav Yaroshchuk

Interaction with vmnet.framework in different modes
differs only at the configuration stage, so we can create
common `send`, `receive`, etc. procedures and reuse them.
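
The reuse described here can be sketched roughly as follows. This is
only an illustration under stated assumptions: DemoBackend and the
demo_* functions are hypothetical and do not exist in the series. The
point is that each mode contributes a configuration hook, while the data
path is a single shared function:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct DemoBackend DemoBackend;

/* Per-mode hook: fills in the mode-specific configuration only. */
typedef int (*demo_configure_fn)(DemoBackend *b);

struct DemoBackend {
    const char *mode;       /* set by the per-mode configure hook */
    size_t bytes_received;  /* updated by the shared receive path */
};

/* Shared data path: identical for every mode. */
long demo_receive_common(DemoBackend *b, const unsigned char *buf, size_t size)
{
    assert(b != NULL && buf != NULL);
    b->bytes_received += size;
    return (long)size;
}

/* Mode-specific configuration hooks (hypothetical). */
int demo_configure_shared(DemoBackend *b) { b->mode = "shared"; return 0; }
int demo_configure_host(DemoBackend *b) { b->mode = "host"; return 0; }

/* Common creation: reset state, then run the mode-specific hook. */
int demo_backend_create(DemoBackend *b, demo_configure_fn configure)
{
    assert(b != NULL && configure != NULL);
    b->mode = NULL;
    b->bytes_received = 0;
    return configure(b);
}
```

In the patches themselves the same split appears as the per-mode
build_if_desc() functions feeding the common vmnet_if_create().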

Signed-off-by: Phillip Tennen 
Signed-off-by: Vladislav Yaroshchuk 
---
 net/vmnet-common.m | 358 +
 net/vmnet-shared.c |  90 +++-
 net/vmnet_int.h|  40 -
 3 files changed, 483 insertions(+), 5 deletions(-)

diff --git a/net/vmnet-common.m b/net/vmnet-common.m
index 06326efb1c..2cb60b9ddd 100644
--- a/net/vmnet-common.m
+++ b/net/vmnet-common.m
@@ -10,6 +10,8 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/main-loop.h"
+#include "qemu/log.h"
 #include "qapi/qapi-types-net.h"
 #include "vmnet_int.h"
 #include "clients.h"
@@ -17,4 +19,360 @@
 #include "qapi/error.h"
 
 #include <vmnet/vmnet.h>
+#include 
 
+
+static void vmnet_send_completed(NetClientState *nc, ssize_t len);
+
+
+const char *vmnet_status_map_str(vmnet_return_t status)
+{
+switch (status) {
+case VMNET_SUCCESS:
+return "success";
+case VMNET_FAILURE:
+return "general failure (possibly not enough privileges)";
+case VMNET_MEM_FAILURE:
+return "memory allocation failure";
+case VMNET_INVALID_ARGUMENT:
+return "invalid argument specified";
+case VMNET_SETUP_INCOMPLETE:
+return "interface setup is not complete";
+case VMNET_INVALID_ACCESS:
+return "invalid access, permission denied";
+case VMNET_PACKET_TOO_BIG:
+return "packet size is larger than MTU";
+case VMNET_BUFFER_EXHAUSTED:
+return "buffers exhausted in kernel";
+case VMNET_TOO_MANY_PACKETS:
+return "packet count exceeds limit";
+#if defined(MAC_OS_VERSION_11_0) && \
+MAC_OS_X_VERSION_MIN_REQUIRED >= MAC_OS_VERSION_11_0
+case VMNET_SHARING_SERVICE_BUSY:
+return "conflict, sharing service is in use";
+#endif
+default:
+return "unknown vmnet error";
+}
+}
+
+
+/**
+ * Write packets from QEMU to vmnet interface.
+ *
+ * vmnet.framework supports iov, but writing more than
+ * one iov into vmnet interface fails with
+ * 'VMNET_INVALID_ARGUMENT'. Collecting provided iovs into
+ * one and passing it to vmnet works fine. That's the
+ * reason why receive_iov() left unimplemented. But it still
+ * works with good performance having .receive() only.
+ */
+ssize_t vmnet_receive_common(NetClientState *nc,
+ const uint8_t *buf,
+ size_t size)
+{
+VmnetState *s = DO_UPCAST(VmnetState, nc, nc);
+struct vmpktdesc packet;
+struct iovec iov;
+int pkt_cnt;
+vmnet_return_t if_status;
+
+if (size > s->max_packet_size) {
+warn_report("vmnet: packet is too big, %zu > %" PRIu64,
+size,
+s->max_packet_size);
+return -1;
+}
+
+iov.iov_base = (char *) buf;
+iov.iov_len = size;
+
+packet.vm_pkt_iovcnt = 1;
+packet.vm_flags = 0;
+packet.vm_pkt_size = size;
+packet.vm_pkt_iov = &iov;
+pkt_cnt = 1;
+
+if_status = vmnet_write(s->vmnet_if, &packet, &pkt_cnt);
+if (if_status != VMNET_SUCCESS) {
+error_report("vmnet: write error: %s\n",
+ vmnet_status_map_str(if_status));
+return -1;
+}
+
+if (pkt_cnt) {
+return size;
+}
+return 0;
+}
+
+
+/**
+ * Read packets from vmnet interface and write them
+ * to temporary buffers in VmnetState.
+ *
+ * Returns read packets number (may be 0) on success,
+ * -1 on error
+ */
+static int vmnet_read_packets(VmnetState *s)
+{
+assert(s->packets_send_current_pos == s->packets_send_end_pos);
+
+struct vmpktdesc *packets = s->packets_buf;
+vmnet_return_t status;
+int i;
+
+/* Read as many packets as present */
+s->packets_send_current_pos = 0;
+s->packets_send_end_pos = VMNET_PACKETS_LIMIT;
+for (i = 0; i < s->packets_send_end_pos; ++i) {
+packets[i].vm_pkt_size = s->max_packet_size;
+packets[i].vm_pkt_iovcnt = 1;
+packets[i].vm_flags = 0;
+}
+
+status = vmnet_read(s->vmnet_if, packets, &s->packets_send_end_pos);
+if (status != VMNET_SUCCESS) {
+error_printf("vmnet: read failed: %s\n",
+ vmnet_status_map_str(status));
+s->packets_send_current_pos = 0;
+s->packets_send_end_pos = 0;
+return -1;
+}
+return s->packets_send_end_pos;
+}
+
+
+/**
+ * Write packets from temporary buffers in VmnetState
+ * to QEMU.
+ */
+static void vmnet_write_packets_to_qemu(VmnetState *s)
+{
+while (s->packets_send_current_pos < s->packets_send_end_pos) {
+ssize_t size = qemu_send_packet_async(&s->nc,
+  s->iov_buf[s->packets_send_current_pos].iov_base,
+  s->packets_buf[s->packets_send_current_pos].vm_pkt_size,
+  vmnet_send_completed);
+
+if (size == 0) {
+/* QEMU is not ready to consume

[PATCH v20 2/7] net/vmnet: add vmnet backends to qapi/net

2022-03-15 Thread Vladislav Yaroshchuk

Create separate netdevs for each vmnet operating mode:
- vmnet-host
- vmnet-shared
- vmnet-bridged

Signed-off-by: Vladislav Yaroshchuk 
---
 net/clients.h   |  11 
 net/meson.build |   7 +++
 net/net.c   |  10 
 net/vmnet-bridged.m |  25 +
 net/vmnet-common.m  |  20 +++
 net/vmnet-host.c|  24 
 net/vmnet-shared.c  |  25 +
 net/vmnet_int.h |  25 +
 qapi/net.json   | 133 +++-
 9 files changed, 278 insertions(+), 2 deletions(-)
 create mode 100644 net/vmnet-bridged.m
 create mode 100644 net/vmnet-common.m
 create mode 100644 net/vmnet-host.c
 create mode 100644 net/vmnet-shared.c
 create mode 100644 net/vmnet_int.h

diff --git a/net/clients.h b/net/clients.h
index 92f9b59aed..c9157789f2 100644
--- a/net/clients.h
+++ b/net/clients.h
@@ -63,4 +63,15 @@ int net_init_vhost_user(const Netdev *netdev, const char *name,
 
 int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
 NetClientState *peer, Error **errp);
+#ifdef CONFIG_VMNET
+int net_init_vmnet_host(const Netdev *netdev, const char *name,
+  NetClientState *peer, Error **errp);
+
+int net_init_vmnet_shared(const Netdev *netdev, const char *name,
+  NetClientState *peer, Error **errp);
+
+int net_init_vmnet_bridged(const Netdev *netdev, const char *name,
+  NetClientState *peer, Error **errp);
+#endif /* CONFIG_VMNET */
+
 #endif /* QEMU_NET_CLIENTS_H */
diff --git a/net/meson.build b/net/meson.build
index 847bc2ac85..00a88c4951 100644
--- a/net/meson.build
+++ b/net/meson.build
@@ -42,4 +42,11 @@ softmmu_ss.add(when: 'CONFIG_POSIX', if_true: files(tap_posix))
 softmmu_ss.add(when: 'CONFIG_WIN32', if_true: files('tap-win32.c'))
 softmmu_ss.add(when: 'CONFIG_VHOST_NET_VDPA', if_true: files('vhost-vdpa.c'))
 
+vmnet_files = files(
+  'vmnet-common.m',
+  'vmnet-bridged.m',
+  'vmnet-host.c',
+  'vmnet-shared.c'
+)
+softmmu_ss.add(when: vmnet, if_true: vmnet_files)
 subdir('can')
diff --git a/net/net.c b/net/net.c
index f0d14dbfc1..1dbb64b935 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1021,6 +1021,11 @@ static int (* const net_client_init_fun[NET_CLIENT_DRIVER__MAX])(
 #ifdef CONFIG_L2TPV3
 [NET_CLIENT_DRIVER_L2TPV3]= net_init_l2tpv3,
 #endif
+#ifdef CONFIG_VMNET
+[NET_CLIENT_DRIVER_VMNET_HOST] = net_init_vmnet_host,
+[NET_CLIENT_DRIVER_VMNET_SHARED] = net_init_vmnet_shared,
+[NET_CLIENT_DRIVER_VMNET_BRIDGED] = net_init_vmnet_bridged,
+#endif /* CONFIG_VMNET */
 };
 
 
@@ -1106,6 +1111,11 @@ void show_netdevs(void)
 #endif
 #ifdef CONFIG_VHOST_VDPA
 "vhost-vdpa",
+#endif
+#ifdef CONFIG_VMNET
+"vmnet-host",
+"vmnet-shared",
+"vmnet-bridged",
 #endif
 };
 
diff --git a/net/vmnet-bridged.m b/net/vmnet-bridged.m
new file mode 100644
index 0000000000..91c1a2f2c7
--- /dev/null
+++ b/net/vmnet-bridged.m
@@ -0,0 +1,25 @@
+/*
+ * vmnet-bridged.m
+ *
+ * Copyright(c) 2022 Vladislav Yaroshchuk 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/qapi-types-net.h"
+#include "vmnet_int.h"
+#include "clients.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+
+#include <vmnet/vmnet.h>
+
+int net_init_vmnet_bridged(const Netdev *netdev, const char *name,
+   NetClientState *peer, Error **errp)
+{
+  error_setg(errp, "vmnet-bridged is not implemented yet");
+  return -1;
+}
diff --git a/net/vmnet-common.m b/net/vmnet-common.m
new file mode 100644
index 0000000000..06326efb1c
--- /dev/null
+++ b/net/vmnet-common.m
@@ -0,0 +1,20 @@
+/*
+ * vmnet-common.m - network client wrapper for Apple vmnet.framework
+ *
+ * Copyright(c) 2022 Vladislav Yaroshchuk 
+ * Copyright(c) 2021 Phillip Tennen 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/qapi-types-net.h"
+#include "vmnet_int.h"
+#include "clients.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+
+#include <vmnet/vmnet.h>
+
diff --git a/net/vmnet-host.c b/net/vmnet-host.c
new file mode 100644
index 0000000000..a461d507c5
--- /dev/null
+++ b/net/vmnet-host.c
@@ -0,0 +1,24 @@
+/*
+ * vmnet-host.c
+ *
+ * Copyright(c) 2022 Vladislav Yaroshchuk 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/qapi-types-net.h"
+#include "vmnet_int.h"
+#include "clients.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+
+#include <vmnet/vmnet.h>
+
+int net_init_vmnet_host(const Netdev *netdev, const char *name,
+NetClientState *peer, Error **errp) {
+  error_setg(errp, "vmnet-host is not implemented yet");
+  return -1;

[PATCH v20 7/7] net/vmnet: update hmp-commands.hx

2022-03-15 Thread Vladislav Yaroshchuk

Signed-off-by: Vladislav Yaroshchuk 
---
 hmp-commands.hx | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 8476277aa9..8f3d78f177 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1265,7 +1265,11 @@ ERST
 {
 .name   = "netdev_add",
 .args_type  = "netdev:O",
-.params = "[user|tap|socket|vde|bridge|hubport|netmap|vhost-user],id=str[,prop=value][,...]",
+.params = "[user|tap|socket|vde|bridge|hubport|netmap|vhost-user"
+#ifdef CONFIG_VMNET
+  "|vmnet-host|vmnet-shared|vmnet-bridged"
+#endif
+  "],id=str[,prop=value][,...]",
 .help   = "add host network device",
 .cmd= hmp_netdev_add,
 .command_completion = netdev_add_completion,
-- 
2.34.1.vfs.0.0

[PATCH v20 6/7] net/vmnet: update qemu-options.hx

2022-03-15 Thread Vladislav Yaroshchuk

Signed-off-by: Vladislav Yaroshchuk 
---
 qemu-options.hx | 25 +
 1 file changed, 25 insertions(+)

diff --git a/qemu-options.hx b/qemu-options.hx
index 5ce0ada75e..ea00d0eeb6 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2743,6 +2743,25 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
 #ifdef __linux__
 "-netdev vhost-vdpa,id=str,vhostdev=/path/to/dev\n"
 "configure a vhost-vdpa network,Establish a vhost-vdpa netdev\n"
+#endif
+#ifdef CONFIG_VMNET
+"-netdev vmnet-host,id=str[,isolated=on|off][,net-uuid=uuid]\n"
+" [,start-address=addr,end-address=addr,subnet-mask=mask]\n"
+"configure a vmnet network backend in host mode with ID 'str',\n"
+"isolate this interface from others with 'isolated',\n"
+"configure the address range and choose a subnet mask,\n"
+"specify network UUID 'uuid' to disable DHCP and interact with\n"
+"vmnet-host interfaces within this isolated network\n"
+"-netdev vmnet-shared,id=str[,isolated=on|off][,nat66-prefix=addr]\n"
+" [,start-address=addr,end-address=addr,subnet-mask=mask]\n"
+"configure a vmnet network backend in shared mode with ID 'str',\n"
+"configure the address range and choose a subnet mask,\n"
+"set IPv6 ULA prefix (of length 64) to use for internal network,\n"
+"isolate this interface from others with 'isolated'\n"
+"-netdev vmnet-bridged,id=str,ifname=name[,isolated=on|off]\n"
+"configure a vmnet network backend in bridged mode with ID 'str',\n"
+"use 'ifname=name' to select a physical network interface to be bridged,\n"
+"isolate this interface from others with 'isolated'\n"
 #endif
 "-netdev hubport,id=str,hubid=n[,netdev=nd]\n"
 "configure a hub port on the hub with ID 'n'\n", QEMU_ARCH_ALL)
@@ -2762,6 +2781,9 @@ DEF("nic", HAS_ARG, QEMU_OPTION_nic,
 #endif
 #ifdef CONFIG_POSIX
 "vhost-user|"
+#endif
+#ifdef CONFIG_VMNET
+"vmnet-host|vmnet-shared|vmnet-bridged|"
 #endif
 "socket][,option][,...][mac=macaddr]\n"
 "initialize an on-board / default host NIC (using MAC address\n"
@@ -2784,6 +2806,9 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
 #endif
 #ifdef CONFIG_NETMAP
 "netmap|"
+#endif
+#ifdef CONFIG_VMNET
+"vmnet-host|vmnet-shared|vmnet-bridged|"
 #endif
 "socket][,option][,option][,...]\n"
 "old way to initialize a host network interface\n"
-- 
2.34.1.vfs.0.0

[PATCH v20 0/7] Add vmnet.framework based network backend

2022-03-15 Thread Vladislav Yaroshchuk

macOS provides a networking API for VMs called 'vmnet.framework':
https://developer.apple.com/documentation/vmnet

We can support it with new QEMU network backends that
represent the three vmnet.framework interface usage modes:

  * `vmnet-shared`:
allows the guest to communicate with other guests in shared mode and
also with the external network (Internet) via NAT. Has a (macOS-provided)
DHCP server; the subnet mask and IP range can be configured;

  * `vmnet-host`:
allows the guest to communicate with other guests in host mode.
DHCP is enabled by default, as in `vmnet-shared`, but providing a
network unique id (uuid) isolates `vmnet-host` interfaces
from each other and also disables DHCP.

  * `vmnet-bridged`:
bridges the guest with a physical network interface.

These backends cannot work on macOS Catalina 10.15 because we use
vmnet.framework API provided only with macOS 11 and newer. This does
not seem to be a problem, because QEMU guarantees to work on the two most
recent versions of macOS, which are now Big Sur (11) and Monterey (12).

Also, we have one inconvenient restriction: only a privileged user
can create vmnet.framework interfaces:
`$ sudo qemu-system-x86_64 -nic vmnet-shared`

An attempt to create a `vmnet-*` netdev as an unprivileged user fails with
vmnet's 'general failure'.

This happens because vmnet.framework requires the `com.apple.vm.networking`
entitlement, which is "restricted to developers of virtualization software.
To request this entitlement, contact your Apple representative." as the Apple
documentation says:
https://developer.apple.com/documentation/bundleresources/entitlements/com_apple_vm_networking

One more note: some quite useful 'vmnet.framework' features are still
not supported, such as creating port forwarding rules, specifying an
IPv6 NAT prefix, and so on.

Nevertheless, the new backends work fine and were tested with `qemu-system-x86_64`
on a macOS Big Sur 11.5.2 host with the following NIC models:
  * e1000-82545em
  * virtio-net-pci
  * vmxnet3

The guests were:
  * macOS 10.15.7
  * Ubuntu Bionic (server cloudimg)


This series partially reuses patches by Phillip Tennen:
https://patchew.org/QEMU/20210218134947.1860-1-phillip.en...@gmail.com/
So I included his Signed-off-by line in one of the commit messages and
also here.

v1 -> v2:
 Since v1 minor typos were fixed, patches rebased onto latest master,
 redundant changes removed (small commits squashed)
v2 -> v3:
 - QAPI style fixes
 - Typos fixes in comments
 - `#include`'s updated to be in sync with recent master
v3 -> v4:
 - Support vmnet interfaces isolation feature
 - Support vmnet-host network uuid setting feature
 - Refactored sources a bit
v4 -> v5:
 - Missed 6.2 boat, now 7.0 candidate
 - Fix qapi netdev descriptions and styles
   (@subnetmask -> @subnet-mask)
 - Support vmnet-shared IPv6 prefix setting feature
v5 -> v6
 - provide detailed commit messages for commits of
   many changes
 - rename properties @dhcpstart and @dhcpend to
   @start-address and @end-address
 - improve qapi documentation about isolation
   features (@isolated, @net-uuid)
v6 -> v7:
 - update MAINTAINERS list
v7 -> v8
 - QAPI code style fixes
v8 -> v9
 - Fix building on Linux: add missing qapi
   `'if': 'CONFIG_VMNET'` statement to Netdev union
v9 -> v10
 - Disable vmnet feature for macOS < 11.0: add
   vmnet.framework API probe into meson.build.
   This fixes QEMU building on macOS < 11.0:
   https://patchew.org/QEMU/20220110034000.20221-1-jasow...@redhat.com/
v10 -> v11
 - Enable vmnet for macOS 10.15 with subset of available
   features. Disable vmnet for macOS < 10.15.
 - Fix typos
v11 -> v12
 - use more general macOS version check with
   MAC_OS_VERSION_11_0 instead of manual
   definition creating.
v12 -> v13
 - fix incorrect macOS version bound while
   'feature available since 11.0' check.
   Use MAC_OS_X_VERSION_MIN_REQUIRED instead of
   MAC_OS_X_VERSION_MAX_ALLOWED.
v13 -> v14
 - fix memory leaks
 - get rid of direct global mutex taking while resending
   packets from vmnet to QEMU, schedule a bottom half
   instead (it can be a thing to discuss, maybe exists a
   better way to perform the packets transfer)
 - update hmp commands
 - a bit refactor everything
 - change the email from which patches are being
   submitted, same to email in MAINTAINERS list
 - P.S. sorry for so late reply
v14 -> v15
 - restore --enable-vdi and --disable-vdi
   mistakenly dropped in previous series
v15 -> v16
 - common: complete sending pending packets when
   QEMU is ready, refactor, fix memory leaks
 - QAPI: change version to 7.1 (cause 7.0 feature freeze
   happened). This is the only change in QAPI, Markus Armbruster,
   please confirm if you can (decided to drop your Acked-by due
   to this change)
 - vmnet-bridged: extend "supported ifnames" message buffer len
 - fix behaviour dependence on debug (add "return -1" after
   assert_not_reached)
 - use PRIu64 for proper printing
 - NOTE: This version of patch series may be one of the last
   I submit

[PATCH v20 4/7] net/vmnet: implement host mode (vmnet-host)

2022-03-15 Thread Vladislav Yaroshchuk

Signed-off-by: Vladislav Yaroshchuk 
---
 net/vmnet-host.c | 109 ---
 1 file changed, 103 insertions(+), 6 deletions(-)

diff --git a/net/vmnet-host.c b/net/vmnet-host.c
index a461d507c5..e6d01fd65e 100644
--- a/net/vmnet-host.c
+++ b/net/vmnet-host.c
@@ -9,16 +9,113 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/uuid.h"
 #include "qapi/qapi-types-net.h"
-#include "vmnet_int.h"
-#include "clients.h"
-#include "qemu/error-report.h"
 #include "qapi/error.h"
+#include "clients.h"
+#include "vmnet_int.h"
 
 #include <vmnet/vmnet.h>
 
+
+static bool validate_options(const Netdev *netdev, Error **errp)
+{
+const NetdevVmnetHostOptions *options = &(netdev->u.vmnet_host);
+
+#if defined(MAC_OS_VERSION_11_0) && \
+MAC_OS_X_VERSION_MIN_REQUIRED >= MAC_OS_VERSION_11_0
+
+QemuUUID net_uuid;
+if (options->has_net_uuid &&
+qemu_uuid_parse(options->net_uuid, &net_uuid) < 0) {
+error_setg(errp, "Invalid UUID provided in 'net-uuid'");
+return false;
+}
+#else
+if (options->has_isolated) {
+error_setg(errp,
+   "vmnet-host.isolated feature is "
+   "unavailable: outdated vmnet.framework API");
+return false;
+}
+
+if (options->has_net_uuid) {
+error_setg(errp,
+   "vmnet-host.net-uuid feature is "
+   "unavailable: outdated vmnet.framework API");
+return false;
+}
+#endif
+
+if ((options->has_start_address ||
+ options->has_end_address ||
+ options->has_subnet_mask) &&
+!(options->has_start_address &&
+  options->has_end_address &&
+  options->has_subnet_mask)) {
+error_setg(errp,
+   "'start-address', 'end-address', 'subnet-mask' "
+   "should be provided together");
+return false;
+}
+
+return true;
+}
+
+static xpc_object_t build_if_desc(const Netdev *netdev)
+{
+const NetdevVmnetHostOptions *options = &(netdev->u.vmnet_host);
+xpc_object_t if_desc = xpc_dictionary_create(NULL, NULL, 0);
+
+xpc_dictionary_set_uint64(if_desc,
+  vmnet_operation_mode_key,
+  VMNET_HOST_MODE);
+
+#if defined(MAC_OS_VERSION_11_0) && \
+MAC_OS_X_VERSION_MIN_REQUIRED >= MAC_OS_VERSION_11_0
+
+xpc_dictionary_set_bool(if_desc,
+vmnet_enable_isolation_key,
+options->isolated);
+
+QemuUUID net_uuid;
+if (options->has_net_uuid) {
+qemu_uuid_parse(options->net_uuid, &net_uuid);
+xpc_dictionary_set_uuid(if_desc,
+vmnet_network_identifier_key,
+net_uuid.data);
+}
+#endif
+
+if (options->has_start_address) {
+xpc_dictionary_set_string(if_desc,
+  vmnet_start_address_key,
+  options->start_address);
+xpc_dictionary_set_string(if_desc,
+  vmnet_end_address_key,
+  options->end_address);
+xpc_dictionary_set_string(if_desc,
+  vmnet_subnet_mask_key,
+  options->subnet_mask);
+}
+
+return if_desc;
+}
+
+static NetClientInfo net_vmnet_host_info = {
+.type = NET_CLIENT_DRIVER_VMNET_HOST,
+.size = sizeof(VmnetState),
+.receive = vmnet_receive_common,
+.cleanup = vmnet_cleanup_common,
+};
+
 int net_init_vmnet_host(const Netdev *netdev, const char *name,
-NetClientState *peer, Error **errp) {
-  error_setg(errp, "vmnet-host is not implemented yet");
-  return -1;
+NetClientState *peer, Error **errp)
+{
+NetClientState *nc = qemu_new_net_client(&net_vmnet_host_info,
+ peer, "vmnet-host", name);
+if (!validate_options(netdev, errp)) {
+return -1;
+}
+return vmnet_if_create(nc, build_if_desc(netdev), errp);
 }
-- 
2.34.1.vfs.0.0

Re: [PULL 00/21] Darwin patches for 2022-03-15

2022-03-15 Thread Peter Maydell

On Tue, 15 Mar 2022 at 13:02, Philippe Mathieu-Daudé
 wrote:
>
> From: Philippe Mathieu-Daudé 
>
> The following changes since commit a72ada1662ee3105c5d66ddc8930d98e9cab62be:
>
>   Merge tag 'net-pull-request' of https://github.com/jasowang/qemu into staging (2022-03-15 09:53:13 +0000)
>
> are available in the Git repository at:
>
>   https://github.com/philmd/qemu.git tags/darwin-20220315
>
> for you to fetch changes up to c82b7ef16f3efa59e28f821f25a9c084ef84ea9d:
>
>   MAINTAINERS: Volunteer to maintain Darwin-based hosts support (2022-03-15 13:36:33 +0100)
>
> 
> Darwin-based host patches
>
> - Remove various build warnings
> - Fix building with modules on macOS
> - Fix mouse/keyboard GUI interactions
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.0
for any user-visible changes.

-- PMM

[PATCH v20 1/7] net/vmnet: add vmnet dependency and customizable option

2022-03-15 Thread Vladislav Yaroshchuk

The vmnet.framework dependency is added, with a 'vmnet' option
to enable or disable it. The default value is 'auto'.

The vmnet features used are available since macOS 11.0,
but the new backends can be built and work properly with
a subset of them on 10.15 too.

Signed-off-by: Vladislav Yaroshchuk 
---
 meson.build   | 16 +++-
 meson_options.txt |  2 ++
 scripts/meson-buildoptions.sh |  1 +
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index 2d6601467f..806f3869f9 100644
--- a/meson.build
+++ b/meson.build
@@ -522,6 +522,18 @@ if cocoa.found() and get_option('gtk').enabled()
   error('Cocoa and GTK+ cannot be enabled at the same time')
 endif
 
+vmnet = dependency('appleframeworks', modules: 'vmnet', required: get_option('vmnet'))
+if vmnet.found() and not cc.has_header_symbol('vmnet/vmnet.h',
+  'VMNET_BRIDGED_MODE',
+  dependencies: vmnet)
+  vmnet = not_found
+  if get_option('vmnet').enabled()
+error('vmnet.framework API is outdated')
+  else
+warning('vmnet.framework API is outdated, disabling')
+  endif
+endif
+
 seccomp = not_found
 if not get_option('seccomp').auto() or have_system or have_tools
   seccomp = dependency('libseccomp', version: '>=2.3.0',
@@ -1550,6 +1562,7 @@ config_host_data.set('CONFIG_SNAPPY', snappy.found())
 config_host_data.set('CONFIG_TPM', have_tpm)
 config_host_data.set('CONFIG_USB_LIBUSB', libusb.found())
 config_host_data.set('CONFIG_VDE', vde.found())
+config_host_data.set('CONFIG_VMNET', vmnet.found())
 config_host_data.set('CONFIG_VHOST_USER_BLK_SERVER', have_vhost_user_blk_server)
 config_host_data.set('CONFIG_VNC', vnc.found())
 config_host_data.set('CONFIG_VNC_JPEG', jpeg.found())
@@ -3588,7 +3601,8 @@ summary(summary_info, bool_yn: true, section: 'Crypto')
 # Libraries
 summary_info = {}
 if targetos == 'darwin'
-  summary_info += {'Cocoa support':   cocoa}
+  summary_info += {'Cocoa support':   cocoa}
+  summary_info += {'vmnet.framework support': vmnet}
 endif
 summary_info += {'SDL support':   sdl}
 summary_info += {'SDL image support': sdl_image}
diff --git a/meson_options.txt b/meson_options.txt
index 52b11cead4..d2c0b6b412 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -175,6 +175,8 @@ option('netmap', type : 'feature', value : 'auto',
description: 'netmap network backend support')
 option('vde', type : 'feature', value : 'auto',
description: 'vde network backend support')
+option('vmnet', type : 'feature', value : 'auto',
+   description: 'vmnet.framework network backend support')
 option('virglrenderer', type : 'feature', value : 'auto',
description: 'virgl rendering support')
 option('vnc', type : 'feature', value : 'auto',
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index 9ee684ef03..30946f3798 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -116,6 +116,7 @@ meson_options_help() {
   printf "%s\n" '  usb-redir   libusbredir support'
   printf "%s\n" '  vde vde network backend support'
   printf "%s\n" '  vdi vdi image format support'
+  printf "%s\n" '  vmnet   vmnet.framework network backend support'
   printf "%s\n" '  vhost-user-blk-server'
   printf "%s\n" '  build vhost-user-blk server'
   printf "%s\n" '  virglrenderer   virgl rendering support'
-- 
2.34.1.vfs.0.0

Re: [PATCH] target/riscv: Exit current TB after an sfence.vma

2022-03-15 Thread Alistair Francis

On Wed, Mar 16, 2022 at 5:26 AM Idan Horowitz  wrote:
>
> If the pages which control the translation of the currently executing
> instructions are changed, and then the TLB is flushed using sfence.vma
> we have to exit the current TB early, to ensure we don't execute stale
> instructions.
>
> Signed-off-by: Idan Horowitz 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/insn_trans/trans_privileged.c.inc | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/target/riscv/insn_trans/trans_privileged.c.inc b/target/riscv/insn_trans/trans_privileged.c.inc
> index 53613682e8..f265e8202d 100644
> --- a/target/riscv/insn_trans/trans_privileged.c.inc
> +++ b/target/riscv/insn_trans/trans_privileged.c.inc
> @@ -114,6 +114,13 @@ static bool trans_sfence_vma(DisasContext *ctx, arg_sfence_vma *a)
>  {
>  #ifndef CONFIG_USER_ONLY
>  gen_helper_tlb_flush(cpu_env);
> +/*
> + * The flush might have changed the backing physical memory of
> + * the instructions we're currently executing
> + */
> +gen_set_pc_imm(ctx, ctx->pc_succ_insn);
> +tcg_gen_exit_tb(NULL, 0);
> +ctx->base.is_jmp = DISAS_NORETURN;
>  return true;
>  #endif
>  return false;
> --
> 2.35.1
>
>

[PATCH] linux-user: Clean up arg_start/arg_end confusion

2022-03-15 Thread Richard Henderson

We had two sets of variables: arg_start/arg_end, and
arg_strings/env_strings.  In linuxload.c, we set the
first pair to the bounds of the argv strings, but in
elfload.c, we set the first pair to the bounds of the
argv pointers and the second pair to the bounds of
the argv strings.

Remove arg_start/arg_end, replacing them with the standard
argc/argv/envc/envp values.  Retain arg_strings/env_strings.
Update linuxload.c, elfload.c, and arm-compat-semi.c to match.
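
A minimal sketch of the pointer-array layout that the new fields record, assuming the arithmetic shown in the create_elf_tables hunk (the envp array follows the argc + 1 argv slots). The names arg_layout, layout_from_argv and legacy_argc are hypothetical and exist only for illustration, and abi_ulong is assumed to be 64 bits wide here; none of this is code from the patch.

```c
#include <stdint.h>

/* Assumption for this sketch: a 64-bit target ABI word. */
typedef uint64_t abi_ulong;

/* Hypothetical mirror of the four fields the patch adds to image_info. */
struct arg_layout {
    abi_ulong argc;   /* number of argv entries */
    abi_ulong argv;   /* guest address of the argv pointer array */
    abi_ulong envc;   /* number of envp entries */
    abi_ulong envp;   /* guest address of the envp pointer array */
};

/*
 * Derive the layout from the start of the argv pointer array.
 * Each array ends with a NULL slot, hence the "+ 1".
 */
static struct arg_layout layout_from_argv(abi_ulong u_argv, int argc, int envc)
{
    struct arg_layout l;
    l.argc = (abi_ulong)argc;
    l.envc = (abi_ulong)envc;
    l.argv = u_argv;
    l.envp = u_argv + (abi_ulong)(argc + 1) * sizeof(abi_ulong);
    return l;
}

/*
 * The old way of recovering argc from arg_start/arg_end, valid only
 * when both bound the pointer array (the elfload.c interpretation).
 */
static abi_ulong legacy_argc(abi_ulong arg_start, abi_ulong arg_end)
{
    return (arg_end - arg_start) / sizeof(abi_ulong);
}
```

With the count stored directly, consumers such as the hppa init_thread no longer need the division, and the result does not depend on which loader filled in the bounds.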

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/714
Signed-off-by: Richard Henderson 
---
 linux-user/qemu.h | 12 
 linux-user/elfload.c  | 10 ++
 linux-user/linuxload.c| 18 ++
 linux-user/main.c |  5 ++---
 semihosting/arm-compat-semi.c |  4 ++--
 5 files changed, 28 insertions(+), 21 deletions(-)

diff --git a/linux-user/qemu.h b/linux-user/qemu.h
index 98dfbf2096..3ac39793e1 100644
--- a/linux-user/qemu.h
+++ b/linux-user/qemu.h
@@ -40,15 +40,19 @@ struct image_info {
 abi_ulong   data_offset;
 abi_ulong   saved_auxv;
 abi_ulong   auxv_len;
-abi_ulong   arg_start;
-abi_ulong   arg_end;
-abi_ulong   arg_strings;
-abi_ulong   env_strings;
+abi_ulong   argc;
+abi_ulong   argv;
+abi_ulong   envc;
+abi_ulong   envp;
 abi_ulong   file_string;
 uint32_t    elf_flags;
 int personality;
 abi_ulong   alignment;
 
+/* Generic semihosting knows about these pointers. */
+abi_ulong   arg_strings;   /* strings for argv */
+abi_ulong   env_strings;   /* strings for envp; ends arg_strings */
+
 /* The fields below are used in FDPIC mode.  */
 abi_ulong   loadmap_addr;
 uint16_t    nsegs;
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 9628a38361..828ac2d8db 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -1517,8 +1517,8 @@ static inline void init_thread(struct target_pt_regs 
*regs,
 regs->iaoq[0] = infop->entry;
 regs->iaoq[1] = infop->entry + 4;
 regs->gr[23] = 0;
-regs->gr[24] = infop->arg_start;
-regs->gr[25] = (infop->arg_end - infop->arg_start) / sizeof(abi_ulong);
+regs->gr[24] = infop->argv;
+regs->gr[25] = infop->argc;
 /* The top-of-stack contains a linkage buffer.  */
 regs->gr[30] = infop->start_stack + 64;
 regs->gr[31] = infop->entry;
@@ -2121,8 +2121,10 @@ static abi_ulong create_elf_tables(abi_ulong p, int 
argc, int envc,
 u_envp = u_argv + (argc + 1) * n;
 u_auxv = u_envp + (envc + 1) * n;
 info->saved_auxv = u_auxv;
-info->arg_start = u_argv;
-info->arg_end = u_argv + argc * n;
+info->argc = argc;
+info->envc = envc;
+info->argv = u_argv;
+info->envp = u_envp;
 
 /* This is correct because Linux defines
  * elf_addr_t as Elf32_Off / Elf64_Off
diff --git a/linux-user/linuxload.c b/linux-user/linuxload.c
index 2ed5fc45ed..eb010b0109 100644
--- a/linux-user/linuxload.c
+++ b/linux-user/linuxload.c
@@ -92,33 +92,35 @@ abi_ulong loader_build_argptr(int envc, int argc, abi_ulong 
sp,
 envp = sp;
 sp -= (argc + 1) * n;
 argv = sp;
+ts->info->envp = envp;
+ts->info->envc = envc;
+ts->info->argv = argv;
+ts->info->argc = argc;
+
 if (push_ptr) {
-/* FIXME - handle put_user() failures */
 sp -= n;
 put_user_ual(envp, sp);
 sp -= n;
 put_user_ual(argv, sp);
 }
+
 sp -= n;
-/* FIXME - handle put_user() failures */
 put_user_ual(argc, sp);
-ts->info->arg_start = stringp;
+
+ts->info->arg_strings = stringp;
 while (argc-- > 0) {
-/* FIXME - handle put_user() failures */
 put_user_ual(stringp, argv);
 argv += n;
 stringp += target_strlen(stringp) + 1;
 }
-ts->info->arg_end = stringp;
-/* FIXME - handle put_user() failures */
 put_user_ual(0, argv);
+
+ts->info->env_strings = stringp;
 while (envc-- > 0) {
-/* FIXME - handle put_user() failures */
 put_user_ual(stringp, envp);
 envp += n;
 stringp += target_strlen(stringp) + 1;
 }
-/* FIXME - handle put_user() failures */
 put_user_ual(0, envp);
 
 return sp;
diff --git a/linux-user/main.c b/linux-user/main.c
index fbc9bcfd5f..8995379aa3 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -869,9 +869,8 @@ int main(int argc, char **argv, char **envp)
 qemu_log("start_stack 0x" TARGET_ABI_FMT_lx "\n", info->start_stack);
 qemu_log("brk 0x" TARGET_ABI_FMT_lx "\n", info->brk);
 qemu_log("entry   0x" TARGET_ABI_FMT_lx "\n", info->entry);
-qemu_log("argv_start  0x" TARGET_ABI_FMT_lx "\n", info->arg_start);
-qemu_log("env_start   0x" TARGET_ABI_FMT_lx "\n",
- info->arg_end +

Re: [RFC v4 01/21] vfio-user: introduce vfio-user protocol specification

2022-03-15 Thread Alex Williamson

On Tue, 15 Mar 2022 21:43:15 +
Thanos Makatos  wrote:

> > -Original Message-
> > From: Qemu-devel  > bounces+thanos.makatos=nutanix@nongnu.org> On Behalf Of Alex  
> > Williamson
> > Sent: 09 March 2022 22:35
> > To: John Johnson 
> > Cc: qemu-devel@nongnu.org
> > Subject: Re: [RFC v4 01/21] vfio-user: introduce vfio-user protocol 
> > specification
> > 
> > On Tue, 11 Jan 2022 16:43:37 -0800
> > John Johnson  wrote:  
> > > +VFIO region info cap sparse mmap
> > > +
> > > +
> > > ++--++--+
> > > +| Name | Offset | Size |
> > > ++==++==+
> > > +| nr_areas | 0  | 4|
> > > ++--++--+
> > > +| reserved | 4  | 4|
> > > ++--++--+
> > > +| offset   | 8  | 8|
> > > ++--++--+
> > > +| size | 16 | 9|
> > > ++--++--+  
> > 
> > Typo, I'm pretty sure size isn't 9 bytes.
> >   
> > > +| ...  ||  |
> > > ++--++--+
> > > +
> > > +* *nr_areas* is the number of sparse mmap areas in the region.
> > > +* *offset* and size describe a single area that can be mapped by the 
> > > client.
> > > +  There will be *nr_areas* pairs of offset and size. The offset will be 
> > > added to
> > > +  the base offset given in the ``VFIO_USER_DEVICE_GET_REGION_INFO`` to  
> > form the  
> > > +  offset argument of the subsequent mmap() call.
> > > +
> > > +The VFIO sparse mmap area is defined in  (``struct
> > > +vfio_region_info_cap_sparse_mmap``).
> > > +
> > > +VFIO region type cap header
> > > +"""
> > > +
> > > ++--+---+
> > > +| Name | Value |
> > > ++==+===+
> > > +| id   | VFIO_REGION_INFO_CAP_TYPE |
> > > ++--+---+
> > > +| version  | 0x1   |
> > > ++--+---+
> > > +| next | |
> > > ++--+---+
> > > +| region info type | VFIO region info type |
> > > ++--+---+
> > > +
> > > +This capability is defined when a region is specific to the device.
> > > +
> > > +VFIO region info type cap
> > > +"
> > > +
> > > +The VFIO region info type is defined in 
> > > +(``struct vfio_region_info_cap_type``).
> > > +
> > > ++-++--+
> > > +| Name| Offset | Size |
> > > ++=++==+
> > > +| type| 0  | 4|
> > > ++-++--+
> > > +| subtype | 4  | 4|
> > > ++-++--+
> > > +
> > > +The only device-specific region type and subtype supported by vfio-user 
> > > is
> > > +``VFIO_REGION_TYPE_MIGRATION`` (3) and  
> > ``VFIO_REGION_SUBTYPE_MIGRATION`` (1).
> > 
> > These should be considered deprecated from the kernel interface.  I
> > hope there are plans for vfio-user to adopt the new interface that's
> > currently available in linux-next and intended for v5.18.
> > 
> > ...  
> > > +Unused VFIO ``ioctl()`` commands
> > > +
> > > +
> > > +The following VFIO commands do not have an equivalent vfio-user  
> > command:  
> > > +
> > > +* ``VFIO_GET_API_VERSION``
> > > +* ``VFIO_CHECK_EXTENSION``
> > > +* ``VFIO_SET_IOMMU``
> > > +* ``VFIO_GROUP_GET_STATUS``
> > > +* ``VFIO_GROUP_SET_CONTAINER``
> > > +* ``VFIO_GROUP_UNSET_CONTAINER``
> > > +* ``VFIO_GROUP_GET_DEVICE_FD``
> > > +* ``VFIO_IOMMU_GET_INFO``
> > > +
> > > +However, once support for live migration for VFIO devices is finalized 
> > > some
> > > +of the above commands may have to be handled by the client in their
> > > +corresponding vfio-user form. This will be addressed in a future protocol
> > > +version.  
> > 
> > As above, I'd go ahead and drop the migration region interface support,
> > it's being removed from the kernel.  Dirty page handling might also be
> > something you want to pull back on as we're expecting in-kernel vfio to
> > essentially deprecate its iommu backends in favor of a new shared
> > userspace iommufd interface.  We expect to have backwards compatibility
> > via that interface, but as QEMU migration support for vfio-pci devices
> > is experimental and there are desires not to consolidate dirty page
> > tracking behind the iommu interface in the new model, it's not clear if
> > the kernel will continue to expose the current dirty page tracking.
> > 
> > AIUI, we're expecting to see patches officially proposing the iommufd
> > interface in the kernel "soon".  Thanks,  
> 
> Are you referring to the "[RFC v2] /dev/iommu uAPI proposal" work 
> (https://lkml.org/lkml/2021/7/9/89)?

There's a more recent proposal here:

https://lore.kernel.org/all/20210919063848.1476776-1-yi.l@intel.com/

But I suspect based on

Re: [PATCH v2] gitlab: include new aarch32 job in custom-runners

2022-03-15 Thread Richard Henderson


On 3/15/22 05:19, Alex Bennée wrote:

Without linking it in it won't be presented on the UI. Also while
doing that fix the misnamed job from 20.40 to 20.04.

Fixes: cc44a16002 ("gitlab: add a new aarch32 custom runner definition")
Signed-off-by: Alex Bennée 


Reviewed-by: Richard Henderson 


r~

RE: [RFC v4 01/21] vfio-user: introduce vfio-user protocol specification

2022-03-15 Thread Thanos Makatos




> -Original Message-
> From: Qemu-devel  bounces+thanos.makatos=nutanix@nongnu.org> On Behalf Of Alex
> Williamson
> Sent: 09 March 2022 22:35
> To: John Johnson 
> Cc: qemu-devel@nongnu.org
> Subject: Re: [RFC v4 01/21] vfio-user: introduce vfio-user protocol 
> specification
> 
> On Tue, 11 Jan 2022 16:43:37 -0800
> John Johnson  wrote:
> > +VFIO region info cap sparse mmap
> > +
> > +
> > ++--++--+
> > +| Name | Offset | Size |
> > ++==++==+
> > +| nr_areas | 0  | 4|
> > ++--++--+
> > +| reserved | 4  | 4|
> > ++--++--+
> > +| offset   | 8  | 8|
> > ++--++--+
> > +| size | 16 | 9|
> > ++--++--+
> 
> Typo, I'm pretty sure size isn't 9 bytes.
> 
> > +| ...  ||  |
> > ++--++--+
> > +
> > +* *nr_areas* is the number of sparse mmap areas in the region.
> > +* *offset* and size describe a single area that can be mapped by the 
> > client.
> > +  There will be *nr_areas* pairs of offset and size. The offset will be 
> > added to
> > +  the base offset given in the ``VFIO_USER_DEVICE_GET_REGION_INFO`` to
> form the
> > +  offset argument of the subsequent mmap() call.
> > +
> > +The VFIO sparse mmap area is defined in  (``struct
> > +vfio_region_info_cap_sparse_mmap``).
> > +
> > +VFIO region type cap header
> > +"""
> > +
> > ++--+---+
> > +| Name | Value |
> > ++==+===+
> > +| id   | VFIO_REGION_INFO_CAP_TYPE |
> > ++--+---+
> > +| version  | 0x1   |
> > ++--+---+
> > +| next | |
> > ++--+---+
> > +| region info type | VFIO region info type |
> > ++--+---+
> > +
> > +This capability is defined when a region is specific to the device.
> > +
> > +VFIO region info type cap
> > +"
> > +
> > +The VFIO region info type is defined in 
> > +(``struct vfio_region_info_cap_type``).
> > +
> > ++-++--+
> > +| Name| Offset | Size |
> > ++=++==+
> > +| type| 0  | 4|
> > ++-++--+
> > +| subtype | 4  | 4|
> > ++-++--+
> > +
> > +The only device-specific region type and subtype supported by vfio-user is
> > +``VFIO_REGION_TYPE_MIGRATION`` (3) and
> ``VFIO_REGION_SUBTYPE_MIGRATION`` (1).
> 
> These should be considered deprecated from the kernel interface.  I
> hope there are plans for vfio-user to adopt the new interface that's
> currently available in linux-next and intended for v5.18.
> 
> ...
> > +Unused VFIO ``ioctl()`` commands
> > +
> > +
> > +The following VFIO commands do not have an equivalent vfio-user
> command:
> > +
> > +* ``VFIO_GET_API_VERSION``
> > +* ``VFIO_CHECK_EXTENSION``
> > +* ``VFIO_SET_IOMMU``
> > +* ``VFIO_GROUP_GET_STATUS``
> > +* ``VFIO_GROUP_SET_CONTAINER``
> > +* ``VFIO_GROUP_UNSET_CONTAINER``
> > +* ``VFIO_GROUP_GET_DEVICE_FD``
> > +* ``VFIO_IOMMU_GET_INFO``
> > +
> > +However, once support for live migration for VFIO devices is finalized some
> > +of the above commands may have to be handled by the client in their
> > +corresponding vfio-user form. This will be addressed in a future protocol
> > +version.
> 
> As above, I'd go ahead and drop the migration region interface support,
> it's being removed from the kernel.  Dirty page handling might also be
> something you want to pull back on as we're expecting in-kernel vfio to
> essentially deprecate its iommu backends in favor of a new shared
> userspace iommufd interface.  We expect to have backwards compatibility
> via that interface, but as QEMU migration support for vfio-pci devices
> is experimental and there are desires not to consolidate dirty page
> tracking behind the iommu interface in the new model, it's not clear if
> the kernel will continue to expose the current dirty page tracking.
> 
> AIUI, we're expecting to see patches officially proposing the iommufd
> interface in the kernel "soon".  Thanks,

Are you referring to the "[RFC v2] /dev/iommu uAPI proposal" work 
(https://lkml.org/lkml/2021/7/9/89)?

> 
> Alex
>

Re: [PATCH 2/2] target/arm: Log fault address for M-profile faults

2022-03-15 Thread Richard Henderson


On 3/15/22 13:43, Peter Maydell wrote:

For M-profile, the fault address is not always exposed to the guest
in a fault register (for instance the BFAR bus fault address register
is only updated for bus faults on data accesses, not instruction
accesses).  Currently we log the address only if we're putting it
into a particular guest-visible register.  Since we always have it,
log it generically, to make logs of i-side faults a bit clearer.

Signed-off-by: Peter Maydell
---
  target/arm/m_helper.c | 6 ++
  1 file changed, 6 insertions(+)


Reviewed-by: Richard Henderson 

r~

Re: [PATCH 1/2] target/arm: Log M-profile vector table accesses

2022-03-15 Thread Richard Henderson


On 3/15/22 13:43, Peter Maydell wrote:

Currently the CPU_LOG_INT logging misses some useful information
about loads from the vector table.  Add logging where we load vector
table entries.  This is particularly helpful for cases where the user
has accidentally not put a vector table in their image at all, which
can result in confusing guest crashes at startup.

Here's an example of the new logging for a case where
the vector table contains garbage:

Loaded reset SP 0x0 PC 0x0 from vector table
Loaded reset SP 0xd008f8df PC 0xf000bf00 from vector table
Taking exception 3 [Prefetch Abort] on CPU 0
...with CFSR.IACCVIOL
...BusFault with BFSR.STKERR
...taking pending nonsecure exception 3
...loading from element 3 of non-secure vector table at 0xc
...loaded new PC 0x2558

IN:
0x2558:  0879  stmdaeq  r0, {r0, r3, r4, r5, r6}

(The double reset logging is the result of our long-standing
"CPUs all get reset twice" weirdness; it looks a bit ugly
but it'll go away if we ever fix that :-))

Signed-off-by: Peter Maydell
---
  target/arm/cpu.c  | 5 +
  target/arm/m_helper.c | 5 +
  2 files changed, 10 insertions(+)


Reviewed-by: Richard Henderson 

r~

[PATCH 2/2] target/arm: Log fault address for M-profile faults

2022-03-15 Thread Peter Maydell

For M-profile, the fault address is not always exposed to the guest
in a fault register (for instance the BFAR bus fault address register
is only updated for bus faults on data accesses, not instruction
accesses).  Currently we log the address only if we're putting it
into a particular guest-visible register.  Since we always have it,
log it generically, to make logs of i-side faults a bit clearer.

Signed-off-by: Peter Maydell 
---
 target/arm/m_helper.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
index 3bd16c0c465..b7a0fe01141 100644
--- a/target/arm/m_helper.c
+++ b/target/arm/m_helper.c
@@ -2272,7 +2272,13 @@ void arm_v7m_cpu_do_interrupt(CPUState *cs)
  * Note that for M profile we don't have a guest facing FSR, but
  * the env->exception.fsr will be populated by the code that
  * raises the fault, in the A profile short-descriptor format.
+ *
+ * Log the exception.vaddress now regardless of subtype, because
+ * logging below only logs it when it goes into a guest visible
+ * register.
  */
+qemu_log_mask(CPU_LOG_INT, "...at fault address 0x%x\n",
+  (uint32_t)env->exception.vaddress);
 switch (env->exception.fsr & 0xf) {
 case M_FAKE_FSR_NSC_EXEC:
 /*
-- 
2.25.1

[PATCH 0/2] target/arm: Improve M-profile exception logging

2022-03-15 Thread Peter Maydell

Our current logging for M-profile exceptions has a couple of holes
which are particularly confusing for the case of an exception taken
immediately out of reset:
 * we don't log the initial PC/SP loaded from the vector table
 * we don't log the PC we load from the vector table when
   we take an exception
 * we don't log the address for i-side aborts

This case is quite common where the user has failed to provide a
vector table in their ELF file and QEMU thus loads garbage for the
initial PC. At the moment the logging looks like:

$ qemu-system-arm [...] -d in_asm,cpu,exec,int
Taking exception 3 [Prefetch Abort] on CPU 0
...with CFSR.IACCVIOL
...BusFault with BFSR.STKERR
...taking pending nonsecure exception 3

IN: 
0x2558:  0879  stmdaeq  r0, {r0, r3, r4, r5, r6}


After this patchset it looks like:

$ qemu-system-arm [...] -d in_asm,cpu,exec,int
Loaded reset SP 0x0 PC 0x0 from vector table
Loaded reset SP 0xd008f8df PC 0xf000bf00 from vector table
Taking exception 3 [Prefetch Abort] on CPU 0
...at fault address 0xf000bf00
...with CFSR.IACCVIOL
...BusFault with BFSR.STKERR
...taking pending nonsecure exception 3
...loading from element 3 of non-secure vector table at 0xc
...loaded new PC 0x2558

IN: 
0x2558:  0879  stmdaeq  r0, {r0, r3, r4, r5, r6}

and I think it is somewhat clearer that we loaded a bogus
PC from the vector table at reset, faulted at that address,
loaded the HardFault entry point which was bogus but at
least readable, and started executing code from there.
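
The sequence described above can be sketched in plain C. This is an illustrative model only, assuming the masking seen in the arm_cpu_reset context lines of patch 1/2 and a 4-byte stride per vector table entry (consistent with "element 3 ... at 0xc" in the log); m_reset_state, decode_reset_vectors and vector_entry_addr are hypothetical names, not QEMU functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the state derived from the first two vectors. */
struct m_reset_state {
    uint32_t sp;     /* initial main stack pointer, low two bits cleared */
    uint32_t pc;     /* initial PC with the Thumb bit cleared */
    bool thumb;      /* bit 0 of the reset vector selects Thumb state */
};

/* Decode word 0 (initial SP) and word 1 (reset vector) of the table. */
static struct m_reset_state decode_reset_vectors(uint32_t initial_msp,
                                                 uint32_t initial_pc)
{
    struct m_reset_state s;
    s.sp = initial_msp & 0xFFFFFFFCu;
    s.pc = initial_pc & ~UINT32_C(1);
    s.thumb = (initial_pc & 1u) != 0;
    return s;
}

/* Address of the entry for exception number exc, one 32-bit word each. */
static uint32_t vector_entry_addr(uint32_t vecbase, unsigned exc)
{
    return vecbase + 4u * exc;
}
```

Feeding in the garbage values from the log (SP 0xd008f8df, PC 0xf000bf00) gives a reset PC of 0xf000bf00 with the Thumb bit clear, which matches the fault address reported for the prefetch abort.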

The double-logging of the reset loads is the result of
the way we currently reset the CPU twice on QEMU startup.
If we ever manage to fix that silliness it'll go away.


(Patchset inspired by a stackexchange question:
https://stackoverflow.com/questions/71486314/loading-an-elf-file-into-qemu
)

thanks
-- PMM

Peter Maydell (2):
  target/arm: Log M-profile vector table accesses
  target/arm: Log fault address for M-profile faults

 target/arm/cpu.c  |  5 +
 target/arm/m_helper.c | 11 +++
 2 files changed, 16 insertions(+)

-- 
2.25.1

[PATCH 1/2] target/arm: Log M-profile vector table accesses

2022-03-15 Thread Peter Maydell

Currently the CPU_LOG_INT logging misses some useful information
about loads from the vector table.  Add logging where we load vector
table entries.  This is particularly helpful for cases where the user
has accidentally not put a vector table in their image at all, which
can result in confusing guest crashes at startup.

Here's an example of the new logging for a case where
the vector table contains garbage:

Loaded reset SP 0x0 PC 0x0 from vector table
Loaded reset SP 0xd008f8df PC 0xf000bf00 from vector table
Taking exception 3 [Prefetch Abort] on CPU 0
...with CFSR.IACCVIOL
...BusFault with BFSR.STKERR
...taking pending nonsecure exception 3
...loading from element 3 of non-secure vector table at 0xc
...loaded new PC 0x2558

IN:
0x2558:  0879  stmdaeq  r0, {r0, r3, r4, r5, r6}

(The double reset logging is the result of our long-standing
"CPUs all get reset twice" weirdness; it looks a bit ugly
but it'll go away if we ever fix that :-))

Signed-off-by: Peter Maydell 
---
 target/arm/cpu.c  | 5 +
 target/arm/m_helper.c | 5 +
 2 files changed, 10 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 185d4e774d5..498fb9f71b3 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -21,6 +21,7 @@
 #include "qemu/osdep.h"
 #include "qemu/qemu-print.h"
 #include "qemu/timer.h"
+#include "qemu/log.h"
 #include "qemu-common.h"
 #include "target/arm/idau.h"
 #include "qemu/module.h"
@@ -366,6 +367,10 @@ static void arm_cpu_reset(DeviceState *dev)
 initial_pc = ldl_phys(s->as, vecbase + 4);
 }
 
+qemu_log_mask(CPU_LOG_INT,
+  "Loaded reset SP 0x%x PC 0x%x from vector table\n",
+  initial_msp, initial_pc);
+
 env->regs[13] = initial_msp & 0xFFFFFFFC;
 env->regs[15] = initial_pc & ~1;
 env->thumb = initial_pc & 1;
diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
index 648a3b3fc16..3bd16c0c465 100644
--- a/target/arm/m_helper.c
+++ b/target/arm/m_helper.c
@@ -679,6 +679,10 @@ static bool arm_v7m_load_vector(ARMCPU *cpu, int exc, bool 
targets_secure,
 ARMMMUIdx mmu_idx;
 bool exc_secure;
 
+qemu_log_mask(CPU_LOG_INT,
+  "...loading from element %d of %s vector table at 0x%x\n",
+  exc, targets_secure ? "secure" : "non-secure", addr);
+
 mmu_idx = arm_v7m_mmu_idx_for_secstate_and_priv(env, targets_secure, true);
 
 /*
@@ -719,6 +723,7 @@ static bool arm_v7m_load_vector(ARMCPU *cpu, int exc, bool 
targets_secure,
 goto load_fail;
 }
 *pvec = vector_entry;
+qemu_log_mask(CPU_LOG_INT, "...loaded new PC 0x%x\n", *pvec);
 return true;
 
 load_fail:
-- 
2.25.1

Re: [PATCH v19 4/7] net/vmnet: implement host mode (vmnet-host)

2022-03-15 Thread Akihiko Odaki


On 2022/03/16 5:27, Vladislav Yaroshchuk wrote:

Signed-off-by: Vladislav Yaroshchuk 
---
  net/vmnet-host.c | 110 ---
  1 file changed, 104 insertions(+), 6 deletions(-)

diff --git a/net/vmnet-host.c b/net/vmnet-host.c
index a461d507c5..8f7a638967 100644
--- a/net/vmnet-host.c
+++ b/net/vmnet-host.c
@@ -9,16 +9,114 @@
   */
  
  #include "qemu/osdep.h"

+#include "qemu/uuid.h"
  #include "qapi/qapi-types-net.h"
-#include "vmnet_int.h"
-#include "clients.h"
-#include "qemu/error-report.h"
  #include "qapi/error.h"
+#include "clients.h"
+#include "vmnet_int.h"
  
  #include 
  
+

+static bool validate_options(const Netdev *netdev, Error **errp)
+{
+const NetdevVmnetHostOptions *options = &(netdev->u.vmnet_host);
+QemuUUID uuid;


The variable uuid is used only when defined(MAC_OS_VERSION_11_0) && \
MAC_OS_X_VERSION_MIN_REQUIRED >= MAC_OS_VERSION_11_0, and may cause an
unused-variable compilation warning otherwise. It should be declared inside
the #if block, as the network_uuid variable in build_if_desc is.


Also, I suggest unifying the identifier names. There are
options->net_uuid, uuid, and network_uuid, but the differences between
them convey nothing.


This should be the last thing to address (unless I missed something
again). Thank you for your persistence (it's v19!). I really appreciate
your contribution.


Regards,
Akihiko Odaki


+
+#if defined(MAC_OS_VERSION_11_0) && \
+MAC_OS_X_VERSION_MIN_REQUIRED >= MAC_OS_VERSION_11_0
+
+if (options->has_net_uuid &&
+qemu_uuid_parse(options->net_uuid, &uuid) < 0) {
+error_setg(errp, "Invalid UUID provided in 'net-uuid'");
+return false;
+}
+#else
+if (options->has_isolated) {
+error_setg(errp,
+   "vmnet-host.isolated feature is "
+   "unavailable: outdated vmnet.framework API");
+return false;
+}
+
+if (options->has_net_uuid) {
+error_setg(errp,
+   "vmnet-host.net-uuid feature is "
+   "unavailable: outdated vmnet.framework API");
+return false;
+}
+#endif
+
+if ((options->has_start_address ||
+ options->has_end_address ||
+ options->has_subnet_mask) &&
+!(options->has_start_address &&
+  options->has_end_address &&
+  options->has_subnet_mask)) {
+error_setg(errp,
+   "'start-address', 'end-address', 'subnet-mask' "
+   "should be provided together");
+return false;
+}
+
+return true;
+}
+
+static xpc_object_t build_if_desc(const Netdev *netdev,
+  NetClientState *nc)
+{
+const NetdevVmnetHostOptions *options = &(netdev->u.vmnet_host);
+xpc_object_t if_desc = xpc_dictionary_create(NULL, NULL, 0);
+
+xpc_dictionary_set_uint64(if_desc,
+  vmnet_operation_mode_key,
+  VMNET_HOST_MODE);
+
+#if defined(MAC_OS_VERSION_11_0) && \
+MAC_OS_X_VERSION_MIN_REQUIRED >= MAC_OS_VERSION_11_0
+
+xpc_dictionary_set_bool(if_desc,
+vmnet_enable_isolation_key,
+options->isolated);
+
+QemuUUID network_uuid;
+if (options->has_net_uuid) {
+qemu_uuid_parse(options->net_uuid, &network_uuid);
+xpc_dictionary_set_uuid(if_desc,
+vmnet_network_identifier_key,
+network_uuid.data);
+}
+#endif
+
+if (options->has_start_address) {
+xpc_dictionary_set_string(if_desc,
+  vmnet_start_address_key,
+  options->start_address);
+xpc_dictionary_set_string(if_desc,
+  vmnet_end_address_key,
+  options->end_address);
+xpc_dictionary_set_string(if_desc,
+  vmnet_subnet_mask_key,
+  options->subnet_mask);
+}
+
+return if_desc;
+}
+
+static NetClientInfo net_vmnet_host_info = {
+.type = NET_CLIENT_DRIVER_VMNET_HOST,
+.size = sizeof(VmnetState),
+.receive = vmnet_receive_common,
+.cleanup = vmnet_cleanup_common,
+};
+
  int net_init_vmnet_host(const Netdev *netdev, const char *name,
-NetClientState *peer, Error **errp) {
-  error_setg(errp, "vmnet-host is not implemented yet");
-  return -1;
+NetClientState *peer, Error **errp)
+{
+NetClientState *nc = qemu_new_net_client(&net_vmnet_host_info,
+ peer, "vmnet-host", name);
+if (!validate_options(netdev, errp)) {
+return -1;
+}
+return vmnet_if_create(nc, build_if_desc(netdev, nc), errp);
  }
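
The address-range check in validate_options above (reject the options when any of start-address, end-address and subnet-mask is given without the others) is an all-or-none test. A minimal standalone sketch of the same condition follows; range_options_consistent is a hypothetical helper name, not part of the patch.

```c
#include <stdbool.h>

/*
 * Return true when the three flags are either all set or all clear.
 * This is the negation of the patch's error condition:
 *   (a || b || c) && !(a && b && c)
 */
static bool range_options_consistent(bool has_start, bool has_end,
                                     bool has_mask)
{
    return has_start == has_end && has_end == has_mask;
}
```

Writing the test as two equality comparisons avoids repeating each flag twice, at the cost of being slightly less literal than the any-but-not-all form used in the patch.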

[PATCH v19 7/7] net/vmnet: update hmp-commands.hx

2022-03-15 Thread Vladislav Yaroshchuk

Signed-off-by: Vladislav Yaroshchuk 
---
 hmp-commands.hx | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 8476277aa9..8f3d78f177 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1265,7 +1265,11 @@ ERST
 {
 .name   = "netdev_add",
 .args_type  = "netdev:O",
-.params = 
"[user|tap|socket|vde|bridge|hubport|netmap|vhost-user],id=str[,prop=value][,...]",
+.params = "[user|tap|socket|vde|bridge|hubport|netmap|vhost-user"
+#ifdef CONFIG_VMNET
+  "|vmnet-host|vmnet-shared|vmnet-bridged"
+#endif
+  "],id=str[,prop=value][,...]",
 .help   = "add host network device",
 .cmd= hmp_netdev_add,
 .command_completion = netdev_add_completion,
-- 
2.34.1.vfs.0.0

Re: [PATCH v2 1/3] util/osdep: Avoid madvise proto on modern Solaris

2022-03-15 Thread Peter Maydell

On Tue, 15 Mar 2022 at 19:16, Andrew Deason  wrote:
>
> On Tue, 15 Mar 2022 18:33:33 +
> Peter Maydell  wrote:
> > Can you put the prototype in include/qemu/osdep.h, please?
> > (Exact location not very important as long as it's inside
> > the extern-C block, but I suggest just under the bit where we
> > define SIGIO for __HAIKU__.)
>
> Okay, but this will cause callers that call madvise() directly to
> "work", even though they're not going through the qemu_madvise wrapper.
> There's one area in cross-platform code you noted before, in
> softmmu/physmem.c, and that does cause the same build error if the
> prototype is missing. (I'm going to add another commit to make that use
> the wrapper in the next patchset.)
>
> I assume that's not a concern unless I hear otherwise; just pointing it
> out.

Yeah, that's fine. The idea is that osdep.h is where we fix up this
kind of odd system-header bug, and we do it for everywhere, because
otherwise it's too easy to forget to put in the "make this work
on the oddball platform" code where it's needed.

If you add the patch to change physmem.c, please cc: the author
of the commit that added it (commit cdfa56c551bb) -- it looks
a bit complicated so it's possible it is intentional.

-- PMM

[PATCH v19 5/7] net/vmnet: implement bridged mode (vmnet-bridged)

2022-03-15 Thread Vladislav Yaroshchuk

Signed-off-by: Vladislav Yaroshchuk 
---
 net/vmnet-bridged.m | 128 ++--
 1 file changed, 123 insertions(+), 5 deletions(-)

diff --git a/net/vmnet-bridged.m b/net/vmnet-bridged.m
index 91c1a2f2c7..5936c87718 100644
--- a/net/vmnet-bridged.m
+++ b/net/vmnet-bridged.m
@@ -10,16 +10,134 @@
 
 #include "qemu/osdep.h"
 #include "qapi/qapi-types-net.h"
-#include "vmnet_int.h"
-#include "clients.h"
-#include "qemu/error-report.h"
 #include "qapi/error.h"
+#include "clients.h"
+#include "vmnet_int.h"
 
 #include 
 
+
+static bool validate_ifname(const char *ifname)
+{
+xpc_object_t shared_if_list = vmnet_copy_shared_interface_list();
+bool match = false;
+if (!xpc_array_get_count(shared_if_list)) {
+goto done;
+}
+
+match = !xpc_array_apply(
+shared_if_list,
+^bool(size_t index, xpc_object_t value) {
+return strcmp(xpc_string_get_string_ptr(value), ifname) != 0;
+});
+
+done:
+xpc_release(shared_if_list);
+return match;
+}
+
+
+static bool get_valid_ifnames(char *output_buf)
+{
+xpc_object_t shared_if_list = vmnet_copy_shared_interface_list();
+__block const char *ifname = NULL;
+__block int str_offset = 0;
+bool interfaces_available = true;
+
+if (!xpc_array_get_count(shared_if_list)) {
+interfaces_available = false;
+goto done;
+}
+
+xpc_array_apply(
+shared_if_list,
+^bool(size_t index, xpc_object_t value) {
+/* build list of strings like "en0 en1 en2 " */
+ifname = xpc_string_get_string_ptr(value);
+strcpy(output_buf + str_offset, ifname);
+strcpy(output_buf + str_offset + strlen(ifname), " ");
+str_offset += strlen(ifname) + 1;
+return true;
+});
+
+done:
+xpc_release(shared_if_list);
+return interfaces_available;
+}
+
+
+static bool validate_options(const Netdev *netdev, Error **errp)
+{
+const NetdevVmnetBridgedOptions *options = &(netdev->u.vmnet_bridged);
+char ifnames[1024];
+
+if (!validate_ifname(options->ifname)) {
+if (get_valid_ifnames(ifnames)) {
+error_setg(errp,
+   "unsupported ifname '%s', expected one of [ %s]",
+   options->ifname,
+   ifnames);
+return false;
+}
+error_setg(errp,
+   "unsupported ifname '%s', no supported "
+   "interfaces available",
+   options->ifname);
+return false;
+}
+
+#if !defined(MAC_OS_VERSION_11_0) || \
+MAC_OS_X_VERSION_MIN_REQUIRED < MAC_OS_VERSION_11_0
+if (options->has_isolated) {
+error_setg(errp,
+   "vmnet-bridged.isolated feature is "
+   "unavailable: outdated vmnet.framework API");
+return false;
+}
+#endif
+return true;
+}
+
+
+static xpc_object_t build_if_desc(const Netdev *netdev)
+{
+const NetdevVmnetBridgedOptions *options = &(netdev->u.vmnet_bridged);
+xpc_object_t if_desc = xpc_dictionary_create(NULL, NULL, 0);
+
+xpc_dictionary_set_uint64(if_desc,
+  vmnet_operation_mode_key,
+  VMNET_BRIDGED_MODE
+);
+
+xpc_dictionary_set_string(if_desc,
+  vmnet_shared_interface_name_key,
+  options->ifname);
+
+#if defined(MAC_OS_VERSION_11_0) && \
+MAC_OS_X_VERSION_MIN_REQUIRED >= MAC_OS_VERSION_11_0
+xpc_dictionary_set_bool(if_desc,
+vmnet_enable_isolation_key,
+options->isolated);
+#endif
+return if_desc;
+}
+
+
+static NetClientInfo net_vmnet_bridged_info = {
+.type = NET_CLIENT_DRIVER_VMNET_BRIDGED,
+.size = sizeof(VmnetState),
+.receive = vmnet_receive_common,
+.cleanup = vmnet_cleanup_common,
+};
+
+
 int net_init_vmnet_bridged(const Netdev *netdev, const char *name,
NetClientState *peer, Error **errp)
 {
-  error_setg(errp, "vmnet-bridged is not implemented yet");
-  return -1;
+NetClientState *nc = qemu_new_net_client(&net_vmnet_bridged_info,
+ peer, "vmnet-bridged", name);
+if (!validate_options(netdev, errp)) {
+return -1;
+}
+return vmnet_if_create(nc, build_if_desc(netdev), errp);
 }
-- 
2.34.1.vfs.0.0
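
The list-building step in get_valid_ifnames (producing a string such as "en0 en1 en2 ") can be sketched in portable C without the XPC calls. This version takes an explicit buffer size and stops before overflowing; that bound is an addition made for this sketch and is not in the patch, which copies into a fixed 1024-byte buffer. join_names is a hypothetical name.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Append each name followed by a single space to out, never writing more
 * than out_size bytes including the terminating NUL. A name that does not
 * fit completely is dropped rather than truncated.
 * Returns the length of the resulting string.
 */
static size_t join_names(char *out, size_t out_size,
                         const char *const *names, size_t count)
{
    size_t off = 0;

    if (out_size == 0) {
        return 0;
    }
    out[0] = '\0';

    for (size_t i = 0; i < count; i++) {
        int n = snprintf(out + off, out_size - off, "%s ", names[i]);
        if (n < 0 || (size_t)n >= out_size - off) {
            out[off] = '\0';    /* discard the partially written entry */
            break;
        }
        off += (size_t)n;
    }
    return off;
}
```

The trailing space after the last name is kept on purpose, since the error message in validate_options formats the list as "[ %s]".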

[PATCH v19 2/7] net/vmnet: add vmnet backends to qapi/net

2022-03-15 Thread Vladislav Yaroshchuk

Create separate netdevs for each vmnet operating mode:
- vmnet-host
- vmnet-shared
- vmnet-bridged

Signed-off-by: Vladislav Yaroshchuk 
---
 net/clients.h   |  11 
 net/meson.build |   7 +++
 net/net.c   |  10 
 net/vmnet-bridged.m |  25 +
 net/vmnet-common.m  |  20 +++
 net/vmnet-host.c|  24 
 net/vmnet-shared.c  |  25 +
 net/vmnet_int.h |  25 +
 qapi/net.json   | 133 +++-
 9 files changed, 278 insertions(+), 2 deletions(-)
 create mode 100644 net/vmnet-bridged.m
 create mode 100644 net/vmnet-common.m
 create mode 100644 net/vmnet-host.c
 create mode 100644 net/vmnet-shared.c
 create mode 100644 net/vmnet_int.h

diff --git a/net/clients.h b/net/clients.h
index 92f9b59aed..c9157789f2 100644
--- a/net/clients.h
+++ b/net/clients.h
@@ -63,4 +63,15 @@ int net_init_vhost_user(const Netdev *netdev, const char 
*name,
 
 int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
 NetClientState *peer, Error **errp);
+#ifdef CONFIG_VMNET
+int net_init_vmnet_host(const Netdev *netdev, const char *name,
+  NetClientState *peer, Error **errp);
+
+int net_init_vmnet_shared(const Netdev *netdev, const char *name,
+  NetClientState *peer, Error **errp);
+
+int net_init_vmnet_bridged(const Netdev *netdev, const char *name,
+  NetClientState *peer, Error **errp);
+#endif /* CONFIG_VMNET */
+
 #endif /* QEMU_NET_CLIENTS_H */
diff --git a/net/meson.build b/net/meson.build
index 847bc2ac85..00a88c4951 100644
--- a/net/meson.build
+++ b/net/meson.build
@@ -42,4 +42,11 @@ softmmu_ss.add(when: 'CONFIG_POSIX', if_true: files(tap_posix))
 softmmu_ss.add(when: 'CONFIG_WIN32', if_true: files('tap-win32.c'))
 softmmu_ss.add(when: 'CONFIG_VHOST_NET_VDPA', if_true: files('vhost-vdpa.c'))
 
+vmnet_files = files(
+  'vmnet-common.m',
+  'vmnet-bridged.m',
+  'vmnet-host.c',
+  'vmnet-shared.c'
+)
+softmmu_ss.add(when: vmnet, if_true: vmnet_files)
 subdir('can')
diff --git a/net/net.c b/net/net.c
index f0d14dbfc1..1dbb64b935 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1021,6 +1021,11 @@ static int (* const net_client_init_fun[NET_CLIENT_DRIVER__MAX])(
 #ifdef CONFIG_L2TPV3
 [NET_CLIENT_DRIVER_L2TPV3]= net_init_l2tpv3,
 #endif
+#ifdef CONFIG_VMNET
+[NET_CLIENT_DRIVER_VMNET_HOST] = net_init_vmnet_host,
+[NET_CLIENT_DRIVER_VMNET_SHARED] = net_init_vmnet_shared,
+[NET_CLIENT_DRIVER_VMNET_BRIDGED] = net_init_vmnet_bridged,
+#endif /* CONFIG_VMNET */
 };
 
 
@@ -1106,6 +1111,11 @@ void show_netdevs(void)
 #endif
 #ifdef CONFIG_VHOST_VDPA
 "vhost-vdpa",
+#endif
+#ifdef CONFIG_VMNET
+"vmnet-host",
+"vmnet-shared",
+"vmnet-bridged",
 #endif
 };
 
diff --git a/net/vmnet-bridged.m b/net/vmnet-bridged.m
new file mode 100644
index 0000000000..91c1a2f2c7
--- /dev/null
+++ b/net/vmnet-bridged.m
@@ -0,0 +1,25 @@
+/*
+ * vmnet-bridged.m
+ *
+ * Copyright(c) 2022 Vladislav Yaroshchuk 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/qapi-types-net.h"
+#include "vmnet_int.h"
+#include "clients.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+
+#include <vmnet/vmnet.h>
+
+int net_init_vmnet_bridged(const Netdev *netdev, const char *name,
+   NetClientState *peer, Error **errp)
+{
+  error_setg(errp, "vmnet-bridged is not implemented yet");
+  return -1;
+}
diff --git a/net/vmnet-common.m b/net/vmnet-common.m
new file mode 100644
index 0000000000..06326efb1c
--- /dev/null
+++ b/net/vmnet-common.m
@@ -0,0 +1,20 @@
+/*
+ * vmnet-common.m - network client wrapper for Apple vmnet.framework
+ *
+ * Copyright(c) 2022 Vladislav Yaroshchuk 
+ * Copyright(c) 2021 Phillip Tennen 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/qapi-types-net.h"
+#include "vmnet_int.h"
+#include "clients.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+
+#include <vmnet/vmnet.h>
+
diff --git a/net/vmnet-host.c b/net/vmnet-host.c
new file mode 100644
index 0000000000..a461d507c5
--- /dev/null
+++ b/net/vmnet-host.c
@@ -0,0 +1,24 @@
+/*
+ * vmnet-host.c
+ *
+ * Copyright(c) 2022 Vladislav Yaroshchuk 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/qapi-types-net.h"
+#include "vmnet_int.h"
+#include "clients.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+
+#include <vmnet/vmnet.h>
+
+int net_init_vmnet_host(const Netdev *netdev, const char *name,
+NetClientState *peer, Error **errp) {
+  error_setg(errp, "vmnet-host is not implemented yet");
+  return -1;

[PATCH v19 6/7] net/vmnet: update qemu-options.hx

2022-03-15 Thread Vladislav Yaroshchuk

Signed-off-by: Vladislav Yaroshchuk 
---
 qemu-options.hx | 25 +
 1 file changed, 25 insertions(+)

diff --git a/qemu-options.hx b/qemu-options.hx
index 5ce0ada75e..ea00d0eeb6 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2743,6 +2743,25 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
 #ifdef __linux__
 "-netdev vhost-vdpa,id=str,vhostdev=/path/to/dev\n"
 "configure a vhost-vdpa network,Establish a vhost-vdpa netdev\n"
+#endif
+#ifdef CONFIG_VMNET
+"-netdev vmnet-host,id=str[,isolated=on|off][,net-uuid=uuid]\n"
+" [,start-address=addr,end-address=addr,subnet-mask=mask]\n"
+"configure a vmnet network backend in host mode with ID 'str',\n"
+"isolate this interface from others with 'isolated',\n"
+"configure the address range and choose a subnet mask,\n"
+"specify network UUID 'uuid' to disable DHCP and interact with\n"
+"vmnet-host interfaces within this isolated network\n"
+"-netdev vmnet-shared,id=str[,isolated=on|off][,nat66-prefix=addr]\n"
+" [,start-address=addr,end-address=addr,subnet-mask=mask]\n"
+"configure a vmnet network backend in shared mode with ID 'str',\n"
+"configure the address range and choose a subnet mask,\n"
+"set IPv6 ULA prefix (of length 64) to use for internal network,\n"
+"isolate this interface from others with 'isolated'\n"
+"-netdev vmnet-bridged,id=str,ifname=name[,isolated=on|off]\n"
+"configure a vmnet network backend in bridged mode with ID 'str',\n"
+"use 'ifname=name' to select a physical network interface to be bridged,\n"
+"isolate this interface from others with 'isolated'\n"
 #endif
 "-netdev hubport,id=str,hubid=n[,netdev=nd]\n"
 "configure a hub port on the hub with ID 'n'\n", QEMU_ARCH_ALL)
@@ -2762,6 +2781,9 @@ DEF("nic", HAS_ARG, QEMU_OPTION_nic,
 #endif
 #ifdef CONFIG_POSIX
 "vhost-user|"
+#endif
+#ifdef CONFIG_VMNET
+"vmnet-host|vmnet-shared|vmnet-bridged|"
 #endif
 "socket][,option][,...][mac=macaddr]\n"
 "initialize an on-board / default host NIC (using MAC address\n"
@@ -2784,6 +2806,9 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
 #endif
 #ifdef CONFIG_NETMAP
 "netmap|"
+#endif
+#ifdef CONFIG_VMNET
+"vmnet-host|vmnet-shared|vmnet-bridged|"
 #endif
 "socket][,option][,option][,...]\n"
 "old way to initialize a host network interface\n"
-- 
2.34.1.vfs.0.0

[PATCH v19 4/7] net/vmnet: implement host mode (vmnet-host)

2022-03-15 Thread Vladislav Yaroshchuk

Signed-off-by: Vladislav Yaroshchuk 
---
 net/vmnet-host.c | 110 ---
 1 file changed, 104 insertions(+), 6 deletions(-)

diff --git a/net/vmnet-host.c b/net/vmnet-host.c
index a461d507c5..8f7a638967 100644
--- a/net/vmnet-host.c
+++ b/net/vmnet-host.c
@@ -9,16 +9,114 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/uuid.h"
 #include "qapi/qapi-types-net.h"
-#include "vmnet_int.h"
-#include "clients.h"
-#include "qemu/error-report.h"
 #include "qapi/error.h"
+#include "clients.h"
+#include "vmnet_int.h"
 
 #include <vmnet/vmnet.h>
 
+
+static bool validate_options(const Netdev *netdev, Error **errp)
+{
+const NetdevVmnetHostOptions *options = &(netdev->u.vmnet_host);
+QemuUUID uuid;
+
+#if defined(MAC_OS_VERSION_11_0) && \
+MAC_OS_X_VERSION_MIN_REQUIRED >= MAC_OS_VERSION_11_0
+
+if (options->has_net_uuid &&
+qemu_uuid_parse(options->net_uuid, &uuid) < 0) {
+error_setg(errp, "Invalid UUID provided in 'net-uuid'");
+return false;
+}
+#else
+if (options->has_isolated) {
+error_setg(errp,
+   "vmnet-host.isolated feature is "
+   "unavailable: outdated vmnet.framework API");
+return false;
+}
+
+if (options->has_net_uuid) {
+error_setg(errp,
+   "vmnet-host.net-uuid feature is "
+   "unavailable: outdated vmnet.framework API");
+return false;
+}
+#endif
+
+if ((options->has_start_address ||
+ options->has_end_address ||
+ options->has_subnet_mask) &&
+!(options->has_start_address &&
+  options->has_end_address &&
+  options->has_subnet_mask)) {
+error_setg(errp,
+   "'start-address', 'end-address', 'subnet-mask' "
+   "should be provided together");
+return false;
+}
+
+return true;
+}
+
+static xpc_object_t build_if_desc(const Netdev *netdev,
+  NetClientState *nc)
+{
+const NetdevVmnetHostOptions *options = &(netdev->u.vmnet_host);
+xpc_object_t if_desc = xpc_dictionary_create(NULL, NULL, 0);
+
+xpc_dictionary_set_uint64(if_desc,
+  vmnet_operation_mode_key,
+  VMNET_HOST_MODE);
+
+#if defined(MAC_OS_VERSION_11_0) && \
+MAC_OS_X_VERSION_MIN_REQUIRED >= MAC_OS_VERSION_11_0
+
+xpc_dictionary_set_bool(if_desc,
+vmnet_enable_isolation_key,
+options->isolated);
+
+QemuUUID network_uuid;
+if (options->has_net_uuid) {
+qemu_uuid_parse(options->net_uuid, &network_uuid);
+xpc_dictionary_set_uuid(if_desc,
+vmnet_network_identifier_key,
+network_uuid.data);
+}
+#endif
+
+if (options->has_start_address) {
+xpc_dictionary_set_string(if_desc,
+  vmnet_start_address_key,
+  options->start_address);
+xpc_dictionary_set_string(if_desc,
+  vmnet_end_address_key,
+  options->end_address);
+xpc_dictionary_set_string(if_desc,
+  vmnet_subnet_mask_key,
+  options->subnet_mask);
+}
+
+return if_desc;
+}
+
+static NetClientInfo net_vmnet_host_info = {
+.type = NET_CLIENT_DRIVER_VMNET_HOST,
+.size = sizeof(VmnetState),
+.receive = vmnet_receive_common,
+.cleanup = vmnet_cleanup_common,
+};
+
 int net_init_vmnet_host(const Netdev *netdev, const char *name,
-NetClientState *peer, Error **errp) {
-  error_setg(errp, "vmnet-host is not implemented yet");
-  return -1;
+NetClientState *peer, Error **errp)
+{
+NetClientState *nc = qemu_new_net_client(&net_vmnet_host_info,
+ peer, "vmnet-host", name);
+if (!validate_options(netdev, errp)) {
+return -1;
+}
+return vmnet_if_create(nc, build_if_desc(netdev, nc), errp);
 }
-- 
2.34.1.vfs.0.0

[PATCH v19 3/7] net/vmnet: implement shared mode (vmnet-shared)

2022-03-15 Thread Vladislav Yaroshchuk

Interaction with vmnet.framework differs between modes
only at the configuration stage, so we can create
common `send`, `receive`, etc. procedures and reuse them.
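
The reuse described above can be pictured with a small stand-alone sketch. It is not the code from this series: `ModeInfo`, `receive_common`, `find_mode` and the `MODE_*` values are hypothetical stand-ins for QEMU's real types, and only the per-mode configuration field differs between entries while the data-path callback is shared.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a per-backend descriptor; not QEMU's NetClientInfo. */
typedef struct ModeInfo {
    const char *name;      /* backend name, e.g. "vmnet-shared" */
    int operation_mode;    /* the only mode-specific piece (configuration) */
    long (*receive)(const unsigned char *buf, size_t size); /* shared data path */
} ModeInfo;

/* Illustrative values only; the real constants come from vmnet.framework. */
enum { MODE_HOST = 1, MODE_SHARED = 2, MODE_BRIDGED = 3 };

/* One implementation of the data path, reused by every mode. */
static long receive_common(const unsigned char *buf, size_t size)
{
    (void)buf;
    return (long)size; /* pretend the whole packet was written */
}

static const ModeInfo mode_table[] = {
    { "vmnet-host",    MODE_HOST,    receive_common },
    { "vmnet-shared",  MODE_SHARED,  receive_common },
    { "vmnet-bridged", MODE_BRIDGED, receive_common },
};

/* Look a backend up by name; returns NULL when the name is unknown. */
static const ModeInfo *find_mode(const char *name)
{
    for (size_t i = 0; i < sizeof(mode_table) / sizeof(mode_table[0]); i++) {
        if (strcmp(mode_table[i].name, name) == 0) {
            return &mode_table[i];
        }
    }
    return NULL;
}
```

In this sketch every table entry points at the same `receive_common`, which mirrors how each mode's `NetClientInfo` in the series points at `vmnet_receive_common` and `vmnet_cleanup_common`.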

Signed-off-by: Phillip Tennen 
Signed-off-by: Vladislav Yaroshchuk 
---
 net/vmnet-common.m | 358 +
 net/vmnet-shared.c |  90 +++-
 net/vmnet_int.h|  40 -
 3 files changed, 483 insertions(+), 5 deletions(-)

diff --git a/net/vmnet-common.m b/net/vmnet-common.m
index 06326efb1c..2cb60b9ddd 100644
--- a/net/vmnet-common.m
+++ b/net/vmnet-common.m
@@ -10,6 +10,8 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/main-loop.h"
+#include "qemu/log.h"
 #include "qapi/qapi-types-net.h"
 #include "vmnet_int.h"
 #include "clients.h"
@@ -17,4 +19,360 @@
 #include "qapi/error.h"
 
 #include <vmnet/vmnet.h>
+#include <dispatch/dispatch.h>
 
+
+static void vmnet_send_completed(NetClientState *nc, ssize_t len);
+
+
+const char *vmnet_status_map_str(vmnet_return_t status)
+{
+switch (status) {
+case VMNET_SUCCESS:
+return "success";
+case VMNET_FAILURE:
+return "general failure (possibly not enough privileges)";
+case VMNET_MEM_FAILURE:
+return "memory allocation failure";
+case VMNET_INVALID_ARGUMENT:
+return "invalid argument specified";
+case VMNET_SETUP_INCOMPLETE:
+return "interface setup is not complete";
+case VMNET_INVALID_ACCESS:
+return "invalid access, permission denied";
+case VMNET_PACKET_TOO_BIG:
+return "packet size is larger than MTU";
+case VMNET_BUFFER_EXHAUSTED:
+return "buffers exhausted in kernel";
+case VMNET_TOO_MANY_PACKETS:
+return "packet count exceeds limit";
+#if defined(MAC_OS_VERSION_11_0) && \
+MAC_OS_X_VERSION_MIN_REQUIRED >= MAC_OS_VERSION_11_0
+case VMNET_SHARING_SERVICE_BUSY:
+return "conflict, sharing service is in use";
+#endif
+default:
+return "unknown vmnet error";
+}
+}
+
+
+/**
+ * Write packets from QEMU to vmnet interface.
+ *
+ * vmnet.framework supports iov, but writing more than
+ * one iov into vmnet interface fails with
+ * 'VMNET_INVALID_ARGUMENT'. Collecting provided iovs into
+ * one and passing it to vmnet works fine. That's the
+ * reason why receive_iov() left unimplemented. But it still
+ * works with good performance having .receive() only.
+ */
+ssize_t vmnet_receive_common(NetClientState *nc,
+ const uint8_t *buf,
+ size_t size)
+{
+VmnetState *s = DO_UPCAST(VmnetState, nc, nc);
+struct vmpktdesc packet;
+struct iovec iov;
+int pkt_cnt;
+vmnet_return_t if_status;
+
+if (size > s->max_packet_size) {
+warn_report("vmnet: packet is too big, %zu > %" PRIu64,
+size,
+s->max_packet_size);
+return -1;
+}
+
+iov.iov_base = (char *) buf;
+iov.iov_len = size;
+
+packet.vm_pkt_iovcnt = 1;
+packet.vm_flags = 0;
+packet.vm_pkt_size = size;
+packet.vm_pkt_iov = &iov;
+pkt_cnt = 1;
+
+if_status = vmnet_write(s->vmnet_if, &packet, &pkt_cnt);
+if (if_status != VMNET_SUCCESS) {
+error_report("vmnet: write error: %s\n",
+ vmnet_status_map_str(if_status));
+return -1;
+}
+
+if (pkt_cnt) {
+return size;
+}
+return 0;
+}
+
+
+/**
+ * Read packets from vmnet interface and write them
+ * to temporary buffers in VmnetState.
+ *
+ * Returns read packets number (may be 0) on success,
+ * -1 on error
+ */
+static int vmnet_read_packets(VmnetState *s)
+{
+assert(s->packets_send_current_pos == s->packets_send_end_pos);
+
+struct vmpktdesc *packets = s->packets_buf;
+vmnet_return_t status;
+int i;
+
+/* Read as many packets as present */
+s->packets_send_current_pos = 0;
+s->packets_send_end_pos = VMNET_PACKETS_LIMIT;
+for (i = 0; i < s->packets_send_end_pos; ++i) {
+packets[i].vm_pkt_size = s->max_packet_size;
+packets[i].vm_pkt_iovcnt = 1;
+packets[i].vm_flags = 0;
+}
+
+status = vmnet_read(s->vmnet_if, packets, &s->packets_send_end_pos);
+if (status != VMNET_SUCCESS) {
+error_printf("vmnet: read failed: %s\n",
+ vmnet_status_map_str(status));
+s->packets_send_current_pos = 0;
+s->packets_send_end_pos = 0;
+return -1;
+}
+return s->packets_send_end_pos;
+}
+
+
+/**
+ * Write packets from temporary buffers in VmnetState
+ * to QEMU.
+ */
+static void vmnet_write_packets_to_qemu(VmnetState *s)
+{
+while (s->packets_send_current_pos < s->packets_send_end_pos) {
+ssize_t size = qemu_send_packet_async(&s->nc,
+  s->iov_buf[s->packets_send_current_pos].iov_base,
+  s->packets_buf[s->packets_send_current_pos].vm_pkt_size,
+  vmnet_send_completed);
+
+if (size == 0) {
+/* QEMU is not ready to consume

[PATCH v19 1/7] net/vmnet: add vmnet dependency and customizable option

2022-03-15 Thread Vladislav Yaroshchuk

Add the vmnet.framework dependency, with a 'vmnet' option
to enable or disable it. The default value is 'auto'.

The vmnet features used are available since macOS 11.0,
but the new backend can be built and work properly with
a subset of them on 10.15 too.
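
As a rough illustration of how a build can offer only a subset of features on 10.15, the compile-time guard used throughout this series can be reduced to the sketch below. The macro `SKETCH_HAVE_MACOS_11_FEATURES` and the helper `isolation_option_allowed` are hypothetical names introduced here; the sketch also assumes the SDK's `AvailabilityMacros.h` provides `MAC_OS_VERSION_11_0` and `MAC_OS_X_VERSION_MIN_REQUIRED`. On non-Apple hosts the guard simply evaluates to false.

```c
#include <assert.h>
#include <stdbool.h>

#ifdef __APPLE__
#include <AvailabilityMacros.h> /* assumed source of the version macros */
#endif

/* Same condition as in the patches: the SDK must know macOS 11.0 and the
 * minimum deployment target must be at least 11.0. */
#if defined(MAC_OS_VERSION_11_0) && \
    MAC_OS_X_VERSION_MIN_REQUIRED >= MAC_OS_VERSION_11_0
#define SKETCH_HAVE_MACOS_11_FEATURES 1
#else
#define SKETCH_HAVE_MACOS_11_FEATURES 0
#endif

/* Hypothetical helper: an option that needs the macOS 11 API is accepted
 * only when it was not set at all, or when the API is compiled in. */
static bool isolation_option_allowed(bool user_set_isolated)
{
    if (!user_set_isolated) {
        return true;
    }
    return SKETCH_HAVE_MACOS_11_FEATURES != 0;
}
```

The design choice is that an unavailable feature is rejected only when the user actually asks for it, so the remaining options keep working on older hosts.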

Signed-off-by: Vladislav Yaroshchuk 
---
 meson.build   | 16 +++-
 meson_options.txt |  2 ++
 scripts/meson-buildoptions.sh |  1 +
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index 2d6601467f..806f3869f9 100644
--- a/meson.build
+++ b/meson.build
@@ -522,6 +522,18 @@ if cocoa.found() and get_option('gtk').enabled()
   error('Cocoa and GTK+ cannot be enabled at the same time')
 endif
 
+vmnet = dependency('appleframeworks', modules: 'vmnet', required: get_option('vmnet'))
+if vmnet.found() and not cc.has_header_symbol('vmnet/vmnet.h',
+  'VMNET_BRIDGED_MODE',
+  dependencies: vmnet)
+  vmnet = not_found
+  if get_option('vmnet').enabled()
+error('vmnet.framework API is outdated')
+  else
+warning('vmnet.framework API is outdated, disabling')
+  endif
+endif
+
 seccomp = not_found
 if not get_option('seccomp').auto() or have_system or have_tools
   seccomp = dependency('libseccomp', version: '>=2.3.0',
@@ -1550,6 +1562,7 @@ config_host_data.set('CONFIG_SNAPPY', snappy.found())
 config_host_data.set('CONFIG_TPM', have_tpm)
 config_host_data.set('CONFIG_USB_LIBUSB', libusb.found())
 config_host_data.set('CONFIG_VDE', vde.found())
+config_host_data.set('CONFIG_VMNET', vmnet.found())
 config_host_data.set('CONFIG_VHOST_USER_BLK_SERVER', have_vhost_user_blk_server)
 config_host_data.set('CONFIG_VNC', vnc.found())
 config_host_data.set('CONFIG_VNC_JPEG', jpeg.found())
@@ -3588,7 +3601,8 @@ summary(summary_info, bool_yn: true, section: 'Crypto')
 # Libraries
 summary_info = {}
 if targetos == 'darwin'
-  summary_info += {'Cocoa support':   cocoa}
+  summary_info += {'Cocoa support':   cocoa}
+  summary_info += {'vmnet.framework support': vmnet}
 endif
 summary_info += {'SDL support':   sdl}
 summary_info += {'SDL image support': sdl_image}
diff --git a/meson_options.txt b/meson_options.txt
index 52b11cead4..d2c0b6b412 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -175,6 +175,8 @@ option('netmap', type : 'feature', value : 'auto',
description: 'netmap network backend support')
 option('vde', type : 'feature', value : 'auto',
description: 'vde network backend support')
+option('vmnet', type : 'feature', value : 'auto',
+   description: 'vmnet.framework network backend support')
 option('virglrenderer', type : 'feature', value : 'auto',
description: 'virgl rendering support')
 option('vnc', type : 'feature', value : 'auto',
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index 9ee684ef03..30946f3798 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -116,6 +116,7 @@ meson_options_help() {
   printf "%s\n" '  usb-redir   libusbredir support'
   printf "%s\n" '  vde vde network backend support'
   printf "%s\n" '  vdi vdi image format support'
+  printf "%s\n" '  vmnet   vmnet.framework network backend support'
   printf "%s\n" '  vhost-user-blk-server'
   printf "%s\n" '  build vhost-user-blk server'
   printf "%s\n" '  virglrenderer   virgl rendering support'
-- 
2.34.1.vfs.0.0

[PATCH v19 0/7] Add vmnet.framework based network backend

2022-03-15 Thread Vladislav Yaroshchuk

macOS provides a networking API for VMs called 'vmnet.framework':
https://developer.apple.com/documentation/vmnet

We can support it with new QEMU network backends, one for each of the
three vmnet.framework interface operating modes:

  * `vmnet-shared`:
allows the guest to communicate with other guests in shared mode and
also with the external network (Internet) via NAT. Has a (macOS-provided)
DHCP server; the subnet mask and IP range can be configured;

  * `vmnet-host`:
allows the guest to communicate with other guests in host mode.
DHCP is enabled by default, as in `vmnet-shared`, but providing a
network unique id (uuid) can isolate `vmnet-host` interfaces
from each other and also disables DHCP.

  * `vmnet-bridged`:
bridges the guest with a physical network interface.
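
The configurable subnet mask and IP range mentioned above are validated as a group in the vmnet-host patch of this series: 'start-address', 'end-address' and 'subnet-mask' must be given together or not at all. A minimal stand-alone sketch of that rule follows; the `AddrOpts` struct and `addr_opts_consistent` are hypothetical names, and a NULL pointer stands in for "option not given".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical option holder; NULL means the option was not provided. */
typedef struct AddrOpts {
    const char *start_address;
    const char *end_address;
    const char *subnet_mask;
} AddrOpts;

/* All-or-none check: valid when zero or all three options are present. */
static bool addr_opts_consistent(const AddrOpts *o)
{
    int given = (o->start_address != NULL) +
                (o->end_address != NULL) +
                (o->subnet_mask != NULL);
    return given == 0 || given == 3;
}
```

Counting the provided options keeps the condition short and makes it easy to extend if another option ever joins the group.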

These backends cannot work on macOS Catalina 10.15 because we use
vmnet.framework API provided only with macOS 11 and newer. This does
not seem to be a problem, because QEMU guarantees to work on the two most
recent versions of macOS, which are now Big Sur (11) and Monterey (12).

Also, we have one inconvenient restriction: vmnet.framework interfaces
can be created only by a privileged user:
`$ sudo qemu-system-x86_64 -nic vmnet-shared`

An attempt to create a `vmnet-*` netdev as an unprivileged user fails with
vmnet's 'general failure'.

This happens because vmnet.framework requires the `com.apple.vm.networking`
entitlement, which is "restricted to developers of virtualization software.
To request this entitlement, contact your Apple representative.", as the Apple
documentation says:
https://developer.apple.com/documentation/bundleresources/entitlements/com_apple_vm_networking

One more note: some quite useful 'vmnet.framework' features are still
not supported, such as creating port forwarding rules, specifying an IPv6
NAT prefix and so on.

Nevertheless, the new backends work fine and were tested within `qemu-system-x86_64`
on a macOS Big Sur 11.5.2 host with the following NIC models:
  * e1000-82545em
  * virtio-net-pci
  * vmxnet3

The guests were:
  * macOS 10.15.7
  * Ubuntu Bionic (server cloudimg)


This series partially reuses patches by Phillip Tennen:
https://patchew.org/QEMU/20210218134947.1860-1-phillip.en...@gmail.com/
So I included his Signed-off-by line in one of the commit messages and
also here.

v1 -> v2:
 Since v1 minor typos were fixed, patches rebased onto latest master,
 redundant changes removed (small commits squashed)
v2 -> v3:
 - QAPI style fixes
 - Typos fixes in comments
 - `#include`'s updated to be in sync with recent master
v3 -> v4:
 - Support vmnet interfaces isolation feature
 - Support vmnet-host network uuid setting feature
 - Refactored sources a bit
v4 -> v5:
 - Missed 6.2 boat, now 7.0 candidate
 - Fix qapi netdev descriptions and styles
   (@subnetmask -> @subnet-mask)
 - Support vmnet-shared IPv6 prefix setting feature
v5 -> v6
 - provide detailed commit messages for commits of
   many changes
 - rename properties @dhcpstart and @dhcpend to
   @start-address and @end-address
 - improve qapi documentation about isolation
   features (@isolated, @net-uuid)
v6 -> v7:
 - update MAINTAINERS list
v7 -> v8
 - QAPI code style fixes
v8 -> v9
 - Fix building on Linux: add missing qapi
   `'if': 'CONFIG_VMNET'` statement to Netdev union
v9 -> v10
 - Disable vmnet feature for macOS < 11.0: add
   vmnet.framework API probe into meson.build.
   This fixes QEMU building on macOS < 11.0:
   https://patchew.org/QEMU/20220110034000.20221-1-jasow...@redhat.com/
v10 -> v11
 - Enable vmnet for macOS 10.15 with subset of available
   features. Disable vmnet for macOS < 10.15.
 - Fix typos
v11 -> v12
 - use more general macOS version check with
   MAC_OS_VERSION_11_0 instead of manual
   definition creating.
v12 -> v13
 - fix incorrect macOS version bound while
   'feature available since 11.0' check.
   Use MAC_OS_X_VERSION_MIN_REQUIRED instead of
   MAC_OS_X_VERSION_MAX_ALLOWED.
v13 -> v14
 - fix memory leaks
 - get rid of direct global mutex taking while resending
   packets from vmnet to QEMU, schedule a bottom half
   instead (it can be a thing to discuss, maybe exists a
   better way to perform the packets transfer)
 - update hmp commands
 - a bit refactor everything
 - change the email from which patches are being
   submitted, same to email in MAINTAINERS list
 - P.S. sorry for so late reply
v14 -> v15
 - restore --enable-vdi and --disable-vdi
   mistakenly dropped in previous series
v15 -> v16
 - common: complete sending pending packets when
   QEMU is ready, refactor, fix memory leaks
 - QAPI: change version to 7.1 (cause 7.0 feature freeze
   happened). This is the only change in QAPI, Markus Armbruster,
   please confirm if you can (decided to drop your Acked-by due
   to this change)
 - vmnet-bridged: extend "supported ifnames" message buffer len
 - fix behaviour dependence on debug (add "return -1" after
   assert_not_reached)
 - use PRIu64 for proper printing
 - NOTE: This version of the patch series may be one of the last
   I submit

Re: [PATCH v18 3/7] net/vmnet: implement shared mode (vmnet-shared)

2022-03-15 Thread Vladislav Yaroshchuk

On Tue, Mar 15, 2022 at 11:07 PM Akihiko Odaki 
wrote:

> On 2022/03/16 4:59, Vladislav Yaroshchuk wrote:
> > Interaction with vmnet.framework in different modes
> > differs only on configuration stage, so we can create
> > common `send`, `receive`, etc. procedures and reuse them.
> >
> > Signed-off-by: Phillip Tennen 
> > Signed-off-by: Vladislav Yaroshchuk 
> > ---
> >   net/vmnet-common.m | 368 +
> >   net/vmnet-shared.c |  90 ++-
> >   net/vmnet_int.h|  40 -
> >   3 files changed, 493 insertions(+), 5 deletions(-)
> >
> > diff --git a/net/vmnet-common.m b/net/vmnet-common.m
> > index 06326efb1c..b9dac7b241 100644
> > --- a/net/vmnet-common.m
> > +++ b/net/vmnet-common.m
> > @@ -10,6 +10,8 @@
> >*/
> >
> >   #include "qemu/osdep.h"
> > +#include "qemu/main-loop.h"
> > +#include "qemu/log.h"
> >   #include "qapi/qapi-types-net.h"
> >   #include "vmnet_int.h"
> >   #include "clients.h"
> > @@ -17,4 +19,370 @@
> >   #include "qapi/error.h"
> >
> >   #include 
> > +#include 
> >
> > +
>

[...]


> > +/**
> > + * Bottom half callback that transfers packets from vmnet interface
> > + * to QEMU.
> > + *
> > + * The process of transferring packets is three-staged:
> > + * 1. Handle vmnet event;
> > + * 2. Read packets from vmnet interface into temporary buffer;
> > + * 3. Write packets from temporary buffer to QEMU.
> > + *
> > + * QEMU may suspend this process on the last stage, returning 0 from
> > + * qemu_send_packet_async function. If this happens, we should
> > + * respectfully wait until it is ready to consume more packets,
> > + * write left ones in temporary buffer and only after this
> > + * continue reading more packets from vmnet interface.
> > + *
> > + * Packets to be transferred are stored into packets_buf,
> > + * in the window (packets_send_current_pos..packets_send_end_pos]
> > + * excluding current_pos, including end_pos.
>
> I wonder why you changed the window from [packets_send_current_pos,
> packets_send_end_pos). It is an unconventional way to represent such
> kind of window, requires signed integers and calculating
> packets_send_current_pos + 1 before operating with the first item of the
> window.
>
>
Did this mistakenly while removing send_enabled :facepalm:
Sorry for this.

Submitting v19 with this fixed.
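
For reference, the conventional half-open window [current_pos, end_pos) that the review asks for can be sketched in plain C as below. This is only an illustration: the `Window` type and the `window_*` helpers are hypothetical and are not taken from the patch. The point is that unsigned indexes suffice, the first pending item sits at current_pos itself, and "empty" is simply current_pos == end_pos.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of the send window; names are illustrative. */
typedef struct Window {
    size_t current_pos; /* index of the next packet to send (inclusive) */
    size_t end_pos;     /* one past the last packet to send (exclusive) */
} Window;

/* Mark 'count' freshly read packets as pending: [0, count). */
static void window_fill(Window *w, size_t count)
{
    w->current_pos = 0;
    w->end_pos = count;
}

/* The window is empty when both positions coincide. */
static int window_empty(const Window *w)
{
    return w->current_pos == w->end_pos;
}

/* Number of packets still waiting to be sent. */
static size_t window_pending(const Window *w)
{
    return w->end_pos - w->current_pos;
}

/* Return the index of the next packet and advance past it. */
static size_t window_take(Window *w)
{
    assert(!window_empty(w));
    return w->current_pos++;
}
```

With this layout a failed or empty read can reset both positions to 0 instead of -1, and no "+ 1" is needed before touching the first item.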

Best Regards,
Vladislav Yaroshchuk



> Regards,
> Akihiko Odaki
>
> + *
> > + * Thus, if QEMU is not ready, buffer is not read and
> > + * packets_send_current_pos < packets_send_end_pos.
> > + */
> > +static void vmnet_send_bh(void *opaque)
> > +{
> > +NetClientState *nc = (NetClientState *) opaque;
> > +VmnetState *s = DO_UPCAST(VmnetState, nc, nc);
> > +
> > +/*
> > + * Do nothing if QEMU is not ready - wait
> > + * for completion callback invocation
> > + */
> > +if (s->packets_send_current_pos < s->packets_send_end_pos) {
> > +return;
> > +}
> > +
> > +/* Read packets from vmnet interface */
> > +if (vmnet_read_packets(s) > 0) {
> > +/* Send them to QEMU */
> > +vmnet_write_packets_to_qemu(s);
> > +}
> > +}
> > +
> > +
> > +/**
> > + * Completion callback to be invoked by QEMU when it becomes
> > + * ready to consume more packets.
> > + */
> > +static void vmnet_send_completed(NetClientState *nc, ssize_t len)
> > +{
> > +VmnetState *s = DO_UPCAST(VmnetState, nc, nc);
> > +
> > +/* Callback is invoked eq queued packet is sent */
> > +++s->packets_send_current_pos;
> > +
> > +/* Complete sending packets left in VmnetState buffers */
> > +vmnet_write_packets_to_qemu(s);
> > +
> > +/* And read new ones from vmnet if VmnetState buffer is ready */
> > +if (s->packets_send_current_pos < s->packets_send_end_pos) {
> > +qemu_bh_schedule(s->send_bh);
> > +}
> > +}
> > +
> > +
> > +static void vmnet_bufs_init(VmnetState *s)
> > +{
> > +struct vmpktdesc *packets = s->packets_buf;
> > +struct iovec *iov = s->iov_buf;
> > +int i;
> > +
> > +for (i = 0; i < VMNET_PACKETS_LIMIT; ++i) {
> > +iov[i].iov_len = s->max_packet_size;
> > +iov[i].iov_base = g_malloc0(iov[i].iov_len);
> > +packets[i].vm_pkt_iov = iov + i;
> > +}
> > +}
> > +
> > +
> > +int vmnet_if_create(NetClientState *nc,
> > +xpc_object_t if_desc,
> > +Error **errp)
> > +{
> > +VmnetState *s = DO_UPCAST(VmnetState, nc, nc);
> > +dispatch_semaphore_t if_created_sem = dispatch_semaphore_create(0);
> > +__block vmnet_return_t if_status;
> > +
> > +s->if_queue = dispatch_queue_create(
> > +"org.qemu.vmnet.if_queue",
> > +DISPATCH_QUEUE_SERIAL
> > +);
> > +
> > +xpc_dictionary_set_bool(
> > +if_desc,
> > +vmnet_allocate_mac_address_key,
> > +false
> > +);
> > +
> > +#ifdef DEBUG
> > +qemu_log("vmnet.start.interface_desc:\n");
> > +xpc_dictionary_apply(if_desc,
> > + ^bool(const char

Re: [PATCH v18 3/7] net/vmnet: implement shared mode (vmnet-shared)

2022-03-15 Thread Akihiko Odaki


On 2022/03/16 4:59, Vladislav Yaroshchuk wrote:

Interaction with vmnet.framework in different modes
differs only on configuration stage, so we can create
common `send`, `receive`, etc. procedures and reuse them.

Signed-off-by: Phillip Tennen 
Signed-off-by: Vladislav Yaroshchuk 
---
  net/vmnet-common.m | 368 +
  net/vmnet-shared.c |  90 ++-
  net/vmnet_int.h|  40 -
  3 files changed, 493 insertions(+), 5 deletions(-)

diff --git a/net/vmnet-common.m b/net/vmnet-common.m
index 06326efb1c..b9dac7b241 100644
--- a/net/vmnet-common.m
+++ b/net/vmnet-common.m
@@ -10,6 +10,8 @@
   */
  
  #include "qemu/osdep.h"

+#include "qemu/main-loop.h"
+#include "qemu/log.h"
  #include "qapi/qapi-types-net.h"
  #include "vmnet_int.h"
  #include "clients.h"
@@ -17,4 +19,370 @@
  #include "qapi/error.h"
  
  #include <vmnet/vmnet.h>

+#include <dispatch/dispatch.h>
  
+

+static void vmnet_send_completed(NetClientState *nc, ssize_t len);
+
+
+const char *vmnet_status_map_str(vmnet_return_t status)
+{
+switch (status) {
+case VMNET_SUCCESS:
+return "success";
+case VMNET_FAILURE:
+return "general failure (possibly not enough privileges)";
+case VMNET_MEM_FAILURE:
+return "memory allocation failure";
+case VMNET_INVALID_ARGUMENT:
+return "invalid argument specified";
+case VMNET_SETUP_INCOMPLETE:
+return "interface setup is not complete";
+case VMNET_INVALID_ACCESS:
+return "invalid access, permission denied";
+case VMNET_PACKET_TOO_BIG:
+return "packet size is larger than MTU";
+case VMNET_BUFFER_EXHAUSTED:
+return "buffers exhausted in kernel";
+case VMNET_TOO_MANY_PACKETS:
+return "packet count exceeds limit";
+#if defined(MAC_OS_VERSION_11_0) && \
+MAC_OS_X_VERSION_MIN_REQUIRED >= MAC_OS_VERSION_11_0
+case VMNET_SHARING_SERVICE_BUSY:
+return "conflict, sharing service is in use";
+#endif
+default:
+return "unknown vmnet error";
+}
+}
+
+
+/**
+ * Write packets from QEMU to vmnet interface.
+ *
+ * vmnet.framework supports iov, but writing more than
+ * one iov into vmnet interface fails with
+ * 'VMNET_INVALID_ARGUMENT'. Collecting provided iovs into
+ * one and passing it to vmnet works fine. That's the
+ * reason why receive_iov() left unimplemented. But it still
+ * works with good performance having .receive() only.
+ */
+ssize_t vmnet_receive_common(NetClientState *nc,
+ const uint8_t *buf,
+ size_t size)
+{
+VmnetState *s = DO_UPCAST(VmnetState, nc, nc);
+struct vmpktdesc packet;
+struct iovec iov;
+int pkt_cnt;
+vmnet_return_t if_status;
+
+if (size > s->max_packet_size) {
+warn_report("vmnet: packet is too big, %zu > %" PRIu64,
+packet.vm_pkt_size,
+s->max_packet_size);
+return -1;
+}
+
+iov.iov_base = (char *) buf;
+iov.iov_len = size;
+
+packet.vm_pkt_iovcnt = 1;
+packet.vm_flags = 0;
+packet.vm_pkt_size = size;
+packet.vm_pkt_iov = &iov;
+pkt_cnt = 1;
+
+if_status = vmnet_write(s->vmnet_if, &packet, &pkt_cnt);
+if (if_status != VMNET_SUCCESS) {
+error_report("vmnet: write error: %s\n",
+ vmnet_status_map_str(if_status));
+return -1;
+}
+
+if (pkt_cnt) {
+return size;
+}
+return 0;
+}
+
+
+/**
+ * Read packets from vmnet interface and write them
+ * to temporary buffers in VmnetState.
+ *
+ * Returns read packets number (may be 0) on success,
+ * -1 on error
+ */
+static int vmnet_read_packets(VmnetState *s)
+{
+assert(s->packets_send_current_pos == s->packets_send_end_pos);
+
+struct vmpktdesc *packets = s->packets_buf;
+vmnet_return_t status;
+int pkt_cnt;
+int i;
+
+/* Read as many packets as present */
+pkt_cnt = VMNET_PACKETS_LIMIT;
+for (i = 0; i < pkt_cnt; ++i) {
+packets[i].vm_pkt_size = s->max_packet_size;
+packets[i].vm_pkt_iovcnt = 1;
+packets[i].vm_flags = 0;
+}
+
+status = vmnet_read(s->vmnet_if, packets, &pkt_cnt);
+if (status != VMNET_SUCCESS) {
+error_printf("vmnet: read failed: %s\n",
+ vmnet_status_map_str(status));
+s->packets_send_current_pos = -1;
+s->packets_send_end_pos = -1;
+return -1;
+}
+
+/*
+ * Adjust pointers: packets to be sent
+ * lay in (packets_send_current_pos..packets_send_end_pos]
+ * - excluding current_pos, including end_pos.
+ */
+s->packets_send_current_pos = -1;
+s->packets_send_end_pos = pkt_cnt - 1;
+
+return pkt_cnt;
+}
+
+
+/**
+ * Write packets from temporary buffers in VmnetState
+ * to QEMU.
+ */
+static void vmnet_write_packets_to_qemu(VmnetState *s)
+{
+while (s->packets_send_current_pos < s->packets_send_end_pos) {
+int next_packet_pos = s->packets_send_current_pos + 1;
+ssize_t size = qemu_send_packet_async(&s->nc,
+

[PATCH v18 5/7] net/vmnet: implement bridged mode (vmnet-bridged)

2022-03-15 Thread Vladislav Yaroshchuk

Signed-off-by: Vladislav Yaroshchuk 
---
 net/vmnet-bridged.m | 128 ++--
 1 file changed, 123 insertions(+), 5 deletions(-)

diff --git a/net/vmnet-bridged.m b/net/vmnet-bridged.m
index 91c1a2f2c7..5936c87718 100644
--- a/net/vmnet-bridged.m
+++ b/net/vmnet-bridged.m
@@ -10,16 +10,134 @@
 
 #include "qemu/osdep.h"
 #include "qapi/qapi-types-net.h"
-#include "vmnet_int.h"
-#include "clients.h"
-#include "qemu/error-report.h"
 #include "qapi/error.h"
+#include "clients.h"
+#include "vmnet_int.h"
 
 #include <vmnet/vmnet.h>
 
+
+static bool validate_ifname(const char *ifname)
+{
+xpc_object_t shared_if_list = vmnet_copy_shared_interface_list();
+bool match = false;
+if (!xpc_array_get_count(shared_if_list)) {
+goto done;
+}
+
+match = !xpc_array_apply(
+shared_if_list,
+^bool(size_t index, xpc_object_t value) {
+return strcmp(xpc_string_get_string_ptr(value), ifname) != 0;
+});
+
+done:
+xpc_release(shared_if_list);
+return match;
+}
+
+
+static bool get_valid_ifnames(char *output_buf)
+{
+xpc_object_t shared_if_list = vmnet_copy_shared_interface_list();
+__block const char *ifname = NULL;
+__block int str_offset = 0;
+bool interfaces_available = true;
+
+if (!xpc_array_get_count(shared_if_list)) {
+interfaces_available = false;
+goto done;
+}
+
+xpc_array_apply(
+shared_if_list,
+^bool(size_t index, xpc_object_t value) {
+/* build list of strings like "en0 en1 en2 " */
+ifname = xpc_string_get_string_ptr(value);
+strcpy(output_buf + str_offset, ifname);
+strcpy(output_buf + str_offset + strlen(ifname), " ");
+str_offset += strlen(ifname) + 1;
+return true;
+});
+
+done:
+xpc_release(shared_if_list);
+return interfaces_available;
+}
+
+
+static bool validate_options(const Netdev *netdev, Error **errp)
+{
+const NetdevVmnetBridgedOptions *options = &(netdev->u.vmnet_bridged);
+char ifnames[1024];
+
+if (!validate_ifname(options->ifname)) {
+if (get_valid_ifnames(ifnames)) {
+error_setg(errp,
+   "unsupported ifname '%s', expected one of [ %s]",
+   options->ifname,
+   ifnames);
+return false;
+}
+error_setg(errp,
+   "unsupported ifname '%s', no supported "
+   "interfaces available",
+   options->ifname);
+return false;
+}
+
+#if !defined(MAC_OS_VERSION_11_0) || \
+MAC_OS_X_VERSION_MIN_REQUIRED < MAC_OS_VERSION_11_0
+if (options->has_isolated) {
+error_setg(errp,
+   "vmnet-bridged.isolated feature is "
+   "unavailable: outdated vmnet.framework API");
+return false;
+}
+#endif
+return true;
+}
+
+
+static xpc_object_t build_if_desc(const Netdev *netdev)
+{
+const NetdevVmnetBridgedOptions *options = &(netdev->u.vmnet_bridged);
+xpc_object_t if_desc = xpc_dictionary_create(NULL, NULL, 0);
+
+xpc_dictionary_set_uint64(if_desc,
+  vmnet_operation_mode_key,
+  VMNET_BRIDGED_MODE
+);
+
+xpc_dictionary_set_string(if_desc,
+  vmnet_shared_interface_name_key,
+  options->ifname);
+
+#if defined(MAC_OS_VERSION_11_0) && \
+MAC_OS_X_VERSION_MIN_REQUIRED >= MAC_OS_VERSION_11_0
+xpc_dictionary_set_bool(if_desc,
+vmnet_enable_isolation_key,
+options->isolated);
+#endif
+return if_desc;
+}
+
+
+static NetClientInfo net_vmnet_bridged_info = {
+.type = NET_CLIENT_DRIVER_VMNET_BRIDGED,
+.size = sizeof(VmnetState),
+.receive = vmnet_receive_common,
+.cleanup = vmnet_cleanup_common,
+};
+
+
 int net_init_vmnet_bridged(const Netdev *netdev, const char *name,
NetClientState *peer, Error **errp)
 {
-  error_setg(errp, "vmnet-bridged is not implemented yet");
-  return -1;
+NetClientState *nc = qemu_new_net_client(&net_vmnet_bridged_info,
+ peer, "vmnet-bridged", name);
+if (!validate_options(netdev, errp)) {
+return -1;
+}
+return vmnet_if_create(nc, build_if_desc(netdev), errp);
 }
-- 
2.34.1.vfs.0.0
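
The `validate_ifname()` helper above relies on `xpc_array_apply()` returning false when the applier block stops the iteration early, and negates that result to mean "a match was found". A minimal sketch of the same idiom in portable C follows; `toy_array_apply`, `toy_differs` and `toy_name_in_list` are hypothetical stand-ins written for illustration and are not part of XPC or of this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical applier type: return false to stop the iteration early. */
typedef bool (*toy_applier)(size_t index, const char *value, const void *ctx);

/*
 * Toy model of an "apply" helper: returns true only if every applier
 * call returned true, i.e. the iteration ran to completion.
 */
static bool toy_array_apply(const char *const *items, size_t n,
                            toy_applier fn, const void *ctx)
{
    assert(fn != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!fn(i, items[i], ctx)) {
            return false; /* stopped early */
        }
    }
    return true;
}

/* Keep iterating while the current item differs from the wanted name. */
static bool toy_differs(size_t index, const char *value, const void *ctx)
{
    (void)index;
    return strcmp(value, (const char *)ctx) != 0;
}

/* An early stop means the name was found, hence the negation. */
static bool toy_name_in_list(const char *const *items, size_t n,
                             const char *name)
{
    return !toy_array_apply(items, n, toy_differs, name);
}
```

With an empty list the iteration trivially completes, so the negated result is false, which matches the early `goto done` in the patch.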

[PATCH v18 4/7] net/vmnet: implement host mode (vmnet-host)

2022-03-15 Thread Vladislav Yaroshchuk

Signed-off-by: Vladislav Yaroshchuk 
---
 net/vmnet-host.c | 110 ---
 1 file changed, 104 insertions(+), 6 deletions(-)

diff --git a/net/vmnet-host.c b/net/vmnet-host.c
index a461d507c5..8f7a638967 100644
--- a/net/vmnet-host.c
+++ b/net/vmnet-host.c
@@ -9,16 +9,114 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/uuid.h"
 #include "qapi/qapi-types-net.h"
-#include "vmnet_int.h"
-#include "clients.h"
-#include "qemu/error-report.h"
 #include "qapi/error.h"
+#include "clients.h"
+#include "vmnet_int.h"
 
 #include 
 
+
+static bool validate_options(const Netdev *netdev, Error **errp)
+{
+const NetdevVmnetHostOptions *options = &(netdev->u.vmnet_host);
+QemuUUID uuid;
+
+#if defined(MAC_OS_VERSION_11_0) && \
+MAC_OS_X_VERSION_MIN_REQUIRED >= MAC_OS_VERSION_11_0
+
+if (options->has_net_uuid &&
+qemu_uuid_parse(options->net_uuid, &uuid) < 0) {
+error_setg(errp, "Invalid UUID provided in 'net-uuid'");
+return false;
+}
+#else
+if (options->has_isolated) {
+error_setg(errp,
+   "vmnet-host.isolated feature is "
+   "unavailable: outdated vmnet.framework API");
+return false;
+}
+
+if (options->has_net_uuid) {
+error_setg(errp,
+   "vmnet-host.net-uuid feature is "
+   "unavailable: outdated vmnet.framework API");
+return false;
+}
+#endif
+
+if ((options->has_start_address ||
+ options->has_end_address ||
+ options->has_subnet_mask) &&
+!(options->has_start_address &&
+  options->has_end_address &&
+  options->has_subnet_mask)) {
+error_setg(errp,
+   "'start-address', 'end-address', 'subnet-mask' "
+   "should be provided together");
+return false;
+}
+
+return true;
+}
+
+static xpc_object_t build_if_desc(const Netdev *netdev,
+  NetClientState *nc)
+{
+const NetdevVmnetHostOptions *options = &(netdev->u.vmnet_host);
+xpc_object_t if_desc = xpc_dictionary_create(NULL, NULL, 0);
+
+xpc_dictionary_set_uint64(if_desc,
+  vmnet_operation_mode_key,
+  VMNET_HOST_MODE);
+
+#if defined(MAC_OS_VERSION_11_0) && \
+MAC_OS_X_VERSION_MIN_REQUIRED >= MAC_OS_VERSION_11_0
+
+xpc_dictionary_set_bool(if_desc,
+vmnet_enable_isolation_key,
+options->isolated);
+
+QemuUUID network_uuid;
+if (options->has_net_uuid) {
+qemu_uuid_parse(options->net_uuid, &network_uuid);
+xpc_dictionary_set_uuid(if_desc,
+vmnet_network_identifier_key,
+network_uuid.data);
+}
+#endif
+
+if (options->has_start_address) {
+xpc_dictionary_set_string(if_desc,
+  vmnet_start_address_key,
+  options->start_address);
+xpc_dictionary_set_string(if_desc,
+  vmnet_end_address_key,
+  options->end_address);
+xpc_dictionary_set_string(if_desc,
+  vmnet_subnet_mask_key,
+  options->subnet_mask);
+}
+
+return if_desc;
+}
+
+static NetClientInfo net_vmnet_host_info = {
+.type = NET_CLIENT_DRIVER_VMNET_HOST,
+.size = sizeof(VmnetState),
+.receive = vmnet_receive_common,
+.cleanup = vmnet_cleanup_common,
+};
+
 int net_init_vmnet_host(const Netdev *netdev, const char *name,
-NetClientState *peer, Error **errp) {
-  error_setg(errp, "vmnet-host is not implemented yet");
-  return -1;
+NetClientState *peer, Error **errp)
+{
+NetClientState *nc = qemu_new_net_client(&net_vmnet_host_info,
+ peer, "vmnet-host", name);
+if (!validate_options(netdev, errp)) {
+return -1;
+}
+return vmnet_if_create(nc, build_if_desc(netdev, nc), errp);
 }
-- 
2.34.1.vfs.0.0
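
The address-range check in `validate_options()` above is an "all or none" rule: `start-address`, `end-address` and `subnet-mask` must either all be given or all be omitted. A small stand-alone sketch of that rule, using a hypothetical helper name `toy_all_or_none` that does not appear in the series:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true when the three related options are consistent:
 * either none of them is set, or all of them are.
 */
static bool toy_all_or_none(bool has_start, bool has_end, bool has_mask)
{
    bool any = has_start || has_end || has_mask;
    bool all = has_start && has_end && has_mask;

    return !any || all;
}
```

The patch expresses the failing case directly as `any && !all`, which is the negation of the value returned here.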

[PATCH v18 3/7] net/vmnet: implement shared mode (vmnet-shared)

2022-03-15 Thread Vladislav Yaroshchuk

Interaction with vmnet.framework in different modes
differs only at the configuration stage, so we can create
common `send`, `receive`, etc. procedures and reuse them.

Signed-off-by: Phillip Tennen 
Signed-off-by: Vladislav Yaroshchuk 
---
 net/vmnet-common.m | 368 +
 net/vmnet-shared.c |  90 ++-
 net/vmnet_int.h|  40 -
 3 files changed, 493 insertions(+), 5 deletions(-)

diff --git a/net/vmnet-common.m b/net/vmnet-common.m
index 06326efb1c..b9dac7b241 100644
--- a/net/vmnet-common.m
+++ b/net/vmnet-common.m
@@ -10,6 +10,8 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/main-loop.h"
+#include "qemu/log.h"
 #include "qapi/qapi-types-net.h"
 #include "vmnet_int.h"
 #include "clients.h"
@@ -17,4 +19,370 @@
 #include "qapi/error.h"
 
 #include 
+#include 
 
+
+static void vmnet_send_completed(NetClientState *nc, ssize_t len);
+
+
+const char *vmnet_status_map_str(vmnet_return_t status)
+{
+switch (status) {
+case VMNET_SUCCESS:
+return "success";
+case VMNET_FAILURE:
+return "general failure (possibly not enough privileges)";
+case VMNET_MEM_FAILURE:
+return "memory allocation failure";
+case VMNET_INVALID_ARGUMENT:
+return "invalid argument specified";
+case VMNET_SETUP_INCOMPLETE:
+return "interface setup is not complete";
+case VMNET_INVALID_ACCESS:
+return "invalid access, permission denied";
+case VMNET_PACKET_TOO_BIG:
+return "packet size is larger than MTU";
+case VMNET_BUFFER_EXHAUSTED:
+return "buffers exhausted in kernel";
+case VMNET_TOO_MANY_PACKETS:
+return "packet count exceeds limit";
+#if defined(MAC_OS_VERSION_11_0) && \
+MAC_OS_X_VERSION_MIN_REQUIRED >= MAC_OS_VERSION_11_0
+case VMNET_SHARING_SERVICE_BUSY:
+return "conflict, sharing service is in use";
+#endif
+default:
+return "unknown vmnet error";
+}
+}
+
+
+/**
+ * Write packets from QEMU to vmnet interface.
+ *
+ * vmnet.framework supports iov, but writing more than
+ * one iov into vmnet interface fails with
+ * 'VMNET_INVALID_ARGUMENT'. Collecting provided iovs into
+ * one and passing it to vmnet works fine. That's the
+ * reason why receive_iov() is left unimplemented. It still
+ * works with good performance with .receive() only.
+ */
+ssize_t vmnet_receive_common(NetClientState *nc,
+ const uint8_t *buf,
+ size_t size)
+{
+VmnetState *s = DO_UPCAST(VmnetState, nc, nc);
+struct vmpktdesc packet;
+struct iovec iov;
+int pkt_cnt;
+vmnet_return_t if_status;
+
+if (size > s->max_packet_size) {
+warn_report("vmnet: packet is too big, %zu > %" PRIu64,
+size,
+s->max_packet_size);
+return -1;
+}
+
+iov.iov_base = (char *) buf;
+iov.iov_len = size;
+
+packet.vm_pkt_iovcnt = 1;
+packet.vm_flags = 0;
+packet.vm_pkt_size = size;
+packet.vm_pkt_iov = &iov;
+pkt_cnt = 1;
+
+if_status = vmnet_write(s->vmnet_if, &packet, &pkt_cnt);
+if (if_status != VMNET_SUCCESS) {
+error_report("vmnet: write error: %s",
+ vmnet_status_map_str(if_status));
+return -1;
+}
+
+if (pkt_cnt) {
+return size;
+}
+return 0;
+}
+
+
+/**
+ * Read packets from vmnet interface and write them
+ * to temporary buffers in VmnetState.
+ *
+ * Returns read packets number (may be 0) on success,
+ * -1 on error
+ */
+static int vmnet_read_packets(VmnetState *s)
+{
+assert(s->packets_send_current_pos == s->packets_send_end_pos);
+
+struct vmpktdesc *packets = s->packets_buf;
+vmnet_return_t status;
+int pkt_cnt;
+int i;
+
+/* Read as many packets as present */
+pkt_cnt = VMNET_PACKETS_LIMIT;
+for (i = 0; i < pkt_cnt; ++i) {
+packets[i].vm_pkt_size = s->max_packet_size;
+packets[i].vm_pkt_iovcnt = 1;
+packets[i].vm_flags = 0;
+}
+
+status = vmnet_read(s->vmnet_if, packets, &pkt_cnt);
+if (status != VMNET_SUCCESS) {
+error_printf("vmnet: read failed: %s\n",
+ vmnet_status_map_str(status));
+s->packets_send_current_pos = -1;
+s->packets_send_end_pos = -1;
+return -1;
+}
+
+/*
+ * Adjust pointers: packets to be sent
+ * lie in (packets_send_current_pos..packets_send_end_pos]
+ * - excluding current_pos, including end_pos.
+ */
+s->packets_send_current_pos = -1;
+s->packets_send_end_pos = pkt_cnt - 1;
+
+return pkt_cnt;
+}
+
+
+/**
+ * Write packets from temporary buffers in VmnetState
+ * to QEMU.
+ */
+static void vmnet_write_packets_to_qemu(VmnetState *s)
+{
+while (s->packets_send_current_pos < s->packets_send_end_pos) {
+int next_packet_pos = s->packets_send_current_pos + 1;
+ssize_t size = qemu_send_packet_async(&s->nc,
+  s->iov_buf[next_packet_pos].iov_base,

[PATCH v18 7/7] net/vmnet: update hmp-commands.hx

2022-03-15 Thread Vladislav Yaroshchuk

Signed-off-by: Vladislav Yaroshchuk 
---
 hmp-commands.hx | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 8476277aa9..8f3d78f177 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1265,7 +1265,11 @@ ERST
 {
 .name   = "netdev_add",
 .args_type  = "netdev:O",
-.params = 
"[user|tap|socket|vde|bridge|hubport|netmap|vhost-user],id=str[,prop=value][,...]",
+.params = "[user|tap|socket|vde|bridge|hubport|netmap|vhost-user"
+#ifdef CONFIG_VMNET
+  "|vmnet-host|vmnet-shared|vmnet-bridged"
+#endif
+  "],id=str[,prop=value][,...]",
 .help   = "add host network device",
 .cmd= hmp_netdev_add,
 .command_completion = netdev_add_completion,
-- 
2.34.1.vfs.0.0

[PATCH v18 2/7] net/vmnet: add vmnet backends to qapi/net

2022-03-15 Thread Vladislav Yaroshchuk

Create separate netdevs for each vmnet operating mode:
- vmnet-host
- vmnet-shared
- vmnet-bridged

Signed-off-by: Vladislav Yaroshchuk 
---
 net/clients.h   |  11 
 net/meson.build |   7 +++
 net/net.c   |  10 
 net/vmnet-bridged.m |  25 +
 net/vmnet-common.m  |  20 +++
 net/vmnet-host.c|  24 
 net/vmnet-shared.c  |  25 +
 net/vmnet_int.h |  25 +
 qapi/net.json   | 133 +++-
 9 files changed, 278 insertions(+), 2 deletions(-)
 create mode 100644 net/vmnet-bridged.m
 create mode 100644 net/vmnet-common.m
 create mode 100644 net/vmnet-host.c
 create mode 100644 net/vmnet-shared.c
 create mode 100644 net/vmnet_int.h

diff --git a/net/clients.h b/net/clients.h
index 92f9b59aed..c9157789f2 100644
--- a/net/clients.h
+++ b/net/clients.h
@@ -63,4 +63,15 @@ int net_init_vhost_user(const Netdev *netdev, const char 
*name,
 
 int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
 NetClientState *peer, Error **errp);
+#ifdef CONFIG_VMNET
+int net_init_vmnet_host(const Netdev *netdev, const char *name,
+  NetClientState *peer, Error **errp);
+
+int net_init_vmnet_shared(const Netdev *netdev, const char *name,
+  NetClientState *peer, Error **errp);
+
+int net_init_vmnet_bridged(const Netdev *netdev, const char *name,
+  NetClientState *peer, Error **errp);
+#endif /* CONFIG_VMNET */
+
 #endif /* QEMU_NET_CLIENTS_H */
diff --git a/net/meson.build b/net/meson.build
index 847bc2ac85..00a88c4951 100644
--- a/net/meson.build
+++ b/net/meson.build
@@ -42,4 +42,11 @@ softmmu_ss.add(when: 'CONFIG_POSIX', if_true: 
files(tap_posix))
 softmmu_ss.add(when: 'CONFIG_WIN32', if_true: files('tap-win32.c'))
 softmmu_ss.add(when: 'CONFIG_VHOST_NET_VDPA', if_true: files('vhost-vdpa.c'))
 
+vmnet_files = files(
+  'vmnet-common.m',
+  'vmnet-bridged.m',
+  'vmnet-host.c',
+  'vmnet-shared.c'
+)
+softmmu_ss.add(when: vmnet, if_true: vmnet_files)
 subdir('can')
diff --git a/net/net.c b/net/net.c
index f0d14dbfc1..1dbb64b935 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1021,6 +1021,11 @@ static int (* const 
net_client_init_fun[NET_CLIENT_DRIVER__MAX])(
 #ifdef CONFIG_L2TPV3
 [NET_CLIENT_DRIVER_L2TPV3]= net_init_l2tpv3,
 #endif
+#ifdef CONFIG_VMNET
+[NET_CLIENT_DRIVER_VMNET_HOST] = net_init_vmnet_host,
+[NET_CLIENT_DRIVER_VMNET_SHARED] = net_init_vmnet_shared,
+[NET_CLIENT_DRIVER_VMNET_BRIDGED] = net_init_vmnet_bridged,
+#endif /* CONFIG_VMNET */
 };
 
 
@@ -1106,6 +,11 @@ void show_netdevs(void)
 #endif
 #ifdef CONFIG_VHOST_VDPA
 "vhost-vdpa",
+#endif
+#ifdef CONFIG_VMNET
+"vmnet-host",
+"vmnet-shared",
+"vmnet-bridged",
 #endif
 };
 
diff --git a/net/vmnet-bridged.m b/net/vmnet-bridged.m
new file mode 100644
index 00..91c1a2f2c7
--- /dev/null
+++ b/net/vmnet-bridged.m
@@ -0,0 +1,25 @@
+/*
+ * vmnet-bridged.m
+ *
+ * Copyright(c) 2022 Vladislav Yaroshchuk 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/qapi-types-net.h"
+#include "vmnet_int.h"
+#include "clients.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+
+#include 
+
+int net_init_vmnet_bridged(const Netdev *netdev, const char *name,
+   NetClientState *peer, Error **errp)
+{
+  error_setg(errp, "vmnet-bridged is not implemented yet");
+  return -1;
+}
diff --git a/net/vmnet-common.m b/net/vmnet-common.m
new file mode 100644
index 00..06326efb1c
--- /dev/null
+++ b/net/vmnet-common.m
@@ -0,0 +1,20 @@
+/*
+ * vmnet-common.m - network client wrapper for Apple vmnet.framework
+ *
+ * Copyright(c) 2022 Vladislav Yaroshchuk 
+ * Copyright(c) 2021 Phillip Tennen 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/qapi-types-net.h"
+#include "vmnet_int.h"
+#include "clients.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+
+#include 
+
diff --git a/net/vmnet-host.c b/net/vmnet-host.c
new file mode 100644
index 00..a461d507c5
--- /dev/null
+++ b/net/vmnet-host.c
@@ -0,0 +1,24 @@
+/*
+ * vmnet-host.c
+ *
+ * Copyright(c) 2022 Vladislav Yaroshchuk 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/qapi-types-net.h"
+#include "vmnet_int.h"
+#include "clients.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+
+#include 
+
+int net_init_vmnet_host(const Netdev *netdev, const char *name,
+NetClientState *peer, Error **errp) {
+  error_setg(errp, "vmnet-host is not implemented yet");
+  return -1;

[PATCH v18 6/7] net/vmnet: update qemu-options.hx

2022-03-15 Thread Vladislav Yaroshchuk

Signed-off-by: Vladislav Yaroshchuk 
---
 qemu-options.hx | 25 +
 1 file changed, 25 insertions(+)

diff --git a/qemu-options.hx b/qemu-options.hx
index 5ce0ada75e..ea00d0eeb6 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2743,6 +2743,25 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
 #ifdef __linux__
 "-netdev vhost-vdpa,id=str,vhostdev=/path/to/dev\n"
 "configure a vhost-vdpa network,Establish a vhost-vdpa 
netdev\n"
+#endif
+#ifdef CONFIG_VMNET
+"-netdev vmnet-host,id=str[,isolated=on|off][,net-uuid=uuid]\n"
+" [,start-address=addr,end-address=addr,subnet-mask=mask]\n"
+"configure a vmnet network backend in host mode with ID 
'str',\n"
+"isolate this interface from others with 'isolated',\n"
+"configure the address range and choose a subnet mask,\n"
+"specify network UUID 'uuid' to disable DHCP and interact 
with\n"
+"vmnet-host interfaces within this isolated network\n"
+"-netdev vmnet-shared,id=str[,isolated=on|off][,nat66-prefix=addr]\n"
+" [,start-address=addr,end-address=addr,subnet-mask=mask]\n"
+"configure a vmnet network backend in shared mode with ID 
'str',\n"
+"configure the address range and choose a subnet mask,\n"
+"set IPv6 ULA prefix (of length 64) to use for internal 
network,\n"
+"isolate this interface from others with 'isolated'\n"
+"-netdev vmnet-bridged,id=str,ifname=name[,isolated=on|off]\n"
+"configure a vmnet network backend in bridged mode with ID 
'str',\n"
+"use 'ifname=name' to select a physical network interface 
to be bridged,\n"
+"isolate this interface from others with 'isolated'\n"
 #endif
 "-netdev hubport,id=str,hubid=n[,netdev=nd]\n"
 "configure a hub port on the hub with ID 'n'\n", 
QEMU_ARCH_ALL)
@@ -2762,6 +2781,9 @@ DEF("nic", HAS_ARG, QEMU_OPTION_nic,
 #endif
 #ifdef CONFIG_POSIX
 "vhost-user|"
+#endif
+#ifdef CONFIG_VMNET
+"vmnet-host|vmnet-shared|vmnet-bridged|"
 #endif
 "socket][,option][,...][mac=macaddr]\n"
 "initialize an on-board / default host NIC (using MAC 
address\n"
@@ -2784,6 +2806,9 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
 #endif
 #ifdef CONFIG_NETMAP
 "netmap|"
+#endif
+#ifdef CONFIG_VMNET
+"vmnet-host|vmnet-shared|vmnet-bridged|"
 #endif
 "socket][,option][,option][,...]\n"
 "old way to initialize a host network interface\n"
-- 
2.34.1.vfs.0.0

[PATCH v18 1/7] net/vmnet: add vmnet dependency and customizable option

2022-03-15 Thread Vladislav Yaroshchuk

The vmnet.framework dependency is added together with a 'vmnet' option
to enable or disable it. The default value is 'auto'.

The vmnet features used are available since macOS 11.0,
but the new backend can be built and works properly with
a subset of them on 10.15 too.

Signed-off-by: Vladislav Yaroshchuk 
---
 meson.build   | 16 +++-
 meson_options.txt |  2 ++
 scripts/meson-buildoptions.sh |  1 +
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index 2d6601467f..806f3869f9 100644
--- a/meson.build
+++ b/meson.build
@@ -522,6 +522,18 @@ if cocoa.found() and get_option('gtk').enabled()
   error('Cocoa and GTK+ cannot be enabled at the same time')
 endif
 
+vmnet = dependency('appleframeworks', modules: 'vmnet', required: 
get_option('vmnet'))
+if vmnet.found() and not cc.has_header_symbol('vmnet/vmnet.h',
+  'VMNET_BRIDGED_MODE',
+  dependencies: vmnet)
+  vmnet = not_found
+  if get_option('vmnet').enabled()
+error('vmnet.framework API is outdated')
+  else
+warning('vmnet.framework API is outdated, disabling')
+  endif
+endif
+
 seccomp = not_found
 if not get_option('seccomp').auto() or have_system or have_tools
   seccomp = dependency('libseccomp', version: '>=2.3.0',
@@ -1550,6 +1562,7 @@ config_host_data.set('CONFIG_SNAPPY', snappy.found())
 config_host_data.set('CONFIG_TPM', have_tpm)
 config_host_data.set('CONFIG_USB_LIBUSB', libusb.found())
 config_host_data.set('CONFIG_VDE', vde.found())
+config_host_data.set('CONFIG_VMNET', vmnet.found())
 config_host_data.set('CONFIG_VHOST_USER_BLK_SERVER', 
have_vhost_user_blk_server)
 config_host_data.set('CONFIG_VNC', vnc.found())
 config_host_data.set('CONFIG_VNC_JPEG', jpeg.found())
@@ -3588,7 +3601,8 @@ summary(summary_info, bool_yn: true, section: 'Crypto')
 # Libraries
 summary_info = {}
 if targetos == 'darwin'
-  summary_info += {'Cocoa support':   cocoa}
+  summary_info += {'Cocoa support':   cocoa}
+  summary_info += {'vmnet.framework support': vmnet}
 endif
 summary_info += {'SDL support':   sdl}
 summary_info += {'SDL image support': sdl_image}
diff --git a/meson_options.txt b/meson_options.txt
index 52b11cead4..d2c0b6b412 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -175,6 +175,8 @@ option('netmap', type : 'feature', value : 'auto',
description: 'netmap network backend support')
 option('vde', type : 'feature', value : 'auto',
description: 'vde network backend support')
+option('vmnet', type : 'feature', value : 'auto',
+   description: 'vmnet.framework network backend support')
 option('virglrenderer', type : 'feature', value : 'auto',
description: 'virgl rendering support')
 option('vnc', type : 'feature', value : 'auto',
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index 9ee684ef03..30946f3798 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -116,6 +116,7 @@ meson_options_help() {
   printf "%s\n" '  usb-redir   libusbredir support'
   printf "%s\n" '  vde vde network backend support'
   printf "%s\n" '  vdi vdi image format support'
+  printf "%s\n" '  vmnet   vmnet.framework network backend support'
   printf "%s\n" '  vhost-user-blk-server'
   printf "%s\n" '  build vhost-user-blk server'
   printf "%s\n" '  virglrenderer   virgl rendering support'
-- 
2.34.1.vfs.0.0

[PATCH v18 0/7] Add vmnet.framework based network backend

2022-03-15 Thread Vladislav Yaroshchuk

macOS provides networking API for VMs called 'vmnet.framework':
https://developer.apple.com/documentation/vmnet

We can support it with new QEMU network backends that
represent the three vmnet.framework interface usage modes:

  * `vmnet-shared`:
allows the guest to communicate with other guests in shared mode and
also with external network (Internet) via NAT. Has (macOS-provided)
DHCP server; subnet mask and IP range can be configured;

  * `vmnet-host`:
allows the guest to communicate with other guests in host mode.
DHCP is enabled by default, as in `vmnet-shared`, but providing a
network unique ID (UUID) can make `vmnet-host` interfaces isolated
from each other and also disables DHCP.

  * `vmnet-bridged`:
bridges the guest with a physical network interface.

These backends cannot work on macOS Catalina 10.15 because they use
vmnet.framework API provided only with macOS 11 and newer. This does
not seem to be a problem, because QEMU guarantees to work on the two most
recent versions of macOS, which are now Big Sur (11) and Monterey (12).

Also, we have one inconvenient restriction: only a privileged user
can create vmnet.framework interfaces:
`$ sudo qemu-system-x86_64 -nic vmnet-shared`

An attempt to create a `vmnet-*` netdev as an unprivileged user fails with
vmnet's 'general failure'.

This happens because vmnet.framework requires the `com.apple.vm.networking`
entitlement, which is "restricted to developers of virtualization software.
To request this entitlement, contact your Apple representative.", as the Apple
documentation says:
https://developer.apple.com/documentation/bundleresources/entitlements/com_apple_vm_networking

One more note: some quite useful 'vmnet.framework' features are still
not supported, such as creating port forwarding rules, specifying an
IPv6 NAT prefix and so on.

Nevertheless, the new backends work fine and were tested with
`qemu-system-x86_64` on a macOS Big Sur 11.5.2 host with these NIC models:
  * e1000-82545em
  * virtio-net-pci
  * vmxnet3

The guests were:
  * macOS 10.15.7
  * Ubuntu Bionic (server cloudimg)


This series partially reuses patches by Phillip Tennen:
https://patchew.org/QEMU/20210218134947.1860-1-phillip.en...@gmail.com/
So I included his Signed-off-by line in one of the commit messages and
also here.

v1 -> v2:
 Since v1 minor typos were fixed, patches rebased onto latest master,
 redundant changes removed (small commits squashed)
v2 -> v3:
 - QAPI style fixes
 - Typos fixes in comments
 - `#include`'s updated to be in sync with recent master
v3 -> v4:
 - Support vmnet interfaces isolation feature
 - Support vmnet-host network uuid setting feature
 - Refactored sources a bit
v4 -> v5:
 - Missed 6.2 boat, now 7.0 candidate
 - Fix qapi netdev descriptions and styles
   (@subnetmask -> @subnet-mask)
 - Support vmnet-shared IPv6 prefix setting feature
v5 -> v6
 - provide detailed commit messages for commits of
   many changes
 - rename properties @dhcpstart and @dhcpend to
   @start-address and @end-address
 - improve qapi documentation about isolation
   features (@isolated, @net-uuid)
v6 -> v7:
 - update MAINTAINERS list
v7 -> v8
 - QAPI code style fixes
v8 -> v9
 - Fix building on Linux: add missing qapi
   `'if': 'CONFIG_VMNET'` statement to Netdev union
v9 -> v10
 - Disable vmnet feature for macOS < 11.0: add
   vmnet.framework API probe into meson.build.
   This fixes QEMU building on macOS < 11.0:
   https://patchew.org/QEMU/20220110034000.20221-1-jasow...@redhat.com/
v10 -> v11
 - Enable vmnet for macOS 10.15 with subset of available
   features. Disable vmnet for macOS < 10.15.
 - Fix typos
v11 -> v12
 - use more general macOS version check with
   MAC_OS_VERSION_11_0 instead of manual
   definition creating.
v12 -> v13
 - fix incorrect macOS version bound while
   'feature available since 11.0' check.
   Use MAC_OS_X_VERSION_MIN_REQUIRED instead of
   MAC_OS_X_VERSION_MAX_ALLOWED.
v13 -> v14
 - fix memory leaks
 - get rid of direct global mutex taking while resending
   packets from vmnet to QEMU, schedule a bottom half
   instead (it can be a thing to discuss, maybe exists a
   better way to perform the packets transfer)
 - update hmp commands
 - a bit refactor everything
 - change the email from which patches are being
   submitted, same to email in MAINTAINERS list
 - P.S. sorry for so late reply
v14 -> v15
 - restore --enable-vdi and --disable-vdi
   mistakenly dropped in previous series
v15 -> v16
 - common: complete sending pending packets when
   QEMU is ready, refactor, fix memory leaks
 - QAPI: change version to 7.1 (cause 7.0 feature freeze
   happened). This is the only change in QAPI, Markus Armbruster,
   please confirm if you can (decided to drop your Acked-by due
   to this change)
 - vmnet-bridged: extend "supported ifnames" message buffer len
 - fix behaviour dependence on debug (add "return -1" after
   assert_not_reached)
 - use PRIu64 for proper printing
 - NOTE: This version of patch series may be one the last
   I submit
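
The v12 -> v13 note above concerns which availability macro to test. `MAC_OS_X_VERSION_MIN_REQUIRED` reflects the deployment target (the oldest macOS the binary must run on), while `MAC_OS_X_VERSION_MAX_ALLOWED` reflects the SDK used for the build, so gating on the latter could compile in calls that are missing at run time on an older host. A minimal sketch of the guard used throughout the series follows; the macro `TOY_HAVE_VMNET_11_API` and the helper `toy_option_supported` are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

#ifdef __APPLE__
#include <AvailabilityMacros.h>
#endif

/*
 * Same condition as in the patches: the macOS 11 constant must be known
 * to the SDK, and the deployment target must be at least macOS 11.
 * On non-Apple hosts neither macro is defined, so the guard is false.
 */
#if defined(MAC_OS_VERSION_11_0) && \
    MAC_OS_X_VERSION_MIN_REQUIRED >= MAC_OS_VERSION_11_0
#define TOY_HAVE_VMNET_11_API 1
#else
#define TOY_HAVE_VMNET_11_API 0
#endif

/*
 * An option that needs the macOS 11 API is acceptable either when the
 * API is compiled in, or when the user did not set the option at all.
 */
static bool toy_option_supported(bool option_was_set)
{
    return TOY_HAVE_VMNET_11_API || !option_was_set;
}
```

This mirrors how the `isolated` and `net-uuid` options are rejected with an "outdated vmnet.framework API" error when the guard is false.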

Re: [PATCH] target/riscv: Exit current TB after an sfence.vma

2022-03-15 Thread Richard Henderson


On 3/15/22 12:23, Idan Horowitz wrote:

If the pages which control the translation of the currently executing
instructions are changed, and then the TLB is flushed using sfence.vma
we have to exit the current TB early, to ensure we don't execute stale
instructions.

Signed-off-by: Idan Horowitz
---
  target/riscv/insn_trans/trans_privileged.c.inc | 7 +++
  1 file changed, 7 insertions(+)


Reviewed-by: Richard Henderson 


r~

Re: [PATCH v2 1/6] block: Support passing NULL ops to blk_set_dev_ops()

2022-03-15 Thread John Snow

On Tue, Mar 15, 2022 at 4:47 AM Stefan Hajnoczi  wrote:
>
> On Mon, Mar 14, 2022 at 03:09:35PM -0400, John Snow wrote:
> > On Mon, Mar 14, 2022 at 1:23 PM Stefan Hajnoczi  wrote:
> > >
> > > On Tue, Feb 15, 2022 at 06:59:38PM +0800, Xie Yongji wrote:
> > > > This supports passing NULL ops to blk_set_dev_ops()
> > > > so that we can remove stale ops in some cases.
> > > >
> > > > Signed-off-by: Xie Yongji 
> > > > ---
> > > >  block/block-backend.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/block/block-backend.c b/block/block-backend.c
> > > > index 4ff6b4d785..08dd0a3093 100644
> > > > --- a/block/block-backend.c
> > > > +++ b/block/block-backend.c
> > > > @@ -1015,7 +1015,7 @@ void blk_set_dev_ops(BlockBackend *blk, const 
> > > > BlockDevOps *ops,
> > > >  blk->dev_opaque = opaque;
> > > >
> > > >  /* Are we currently quiesced? Should we enforce this right now? */
> > > > -if (blk->quiesce_counter && ops->drained_begin) {
> > > > +if (blk->quiesce_counter && ops && ops->drained_begin) {
> > > >  ops->drained_begin(opaque);
> > > >  }
> > > >  }
> > >
> > > John: You added this code in f4d9cc88ee6. Does blk_set_dev_ops() need to
> > > call ->drained_end() when ops is set to NULL?
> > >
> > > Stefan
> >
> > I'm not sure I trust my memory from five years ago.
> >
> > From what I recall, the problem was that block jobs weren't getting
> > drained/paused when the backend was getting quiesced -- we wanted to
> > be sure that a blockjob wasn't continuing to run and submit requests
> > against a backend we wanted to have on ice during a sensitive
> > operation. This conditional stanza here is meant to check if the node
> > we're already attached to is *already quiesced* and we missed the
> > signal (so-to-speak), so we replay the drained_begin() request right
> > there.
> >
> > i.e. there was some case where blockjobs were getting added to an
> > already quiesced node, and this code here post-hoc relays that drain
> > request to the blockjob. This gets used in
> > 600ac6a0ef5c06418446ef2f37407bddcc51b21c to pause/unpause jobs.
> > Original thread is here:
> > https://lists.gnu.org/archive/html/qemu-devel/2017-03/msg03416.html
> >
> > Now, I'm not sure why you want to set ops to NULL here. If we're in a
> > drained section, that sounds like it might be potentially bad because
> > we lose track of the operation to end the drained section. If your
> > intent is to destroy the thing that we'd need to call drained_end on,
> > I guess it doesn't matter -- provided you've cleaned up the target
> > object correctly. Just calling drained_end() pre-emptively seems like
> > it might be bad, what if it unpauses something you're in the middle of
> > trying to delete?
> >
> > I might need slightly more context to know what you're hoping to
> > accomplish, but I hope this info helps contextualize this code
> > somewhat.
>
> Setting to NULL in this patch is a subset of blk_detach_dev(), which
> gets called when a storage controller is hot unplugged.
>
> In this patch series there is no DeviceState because a VDUSE export is
> not a device. The VDUSE code calls blk_set_dev_ops() to
> register/unregister callbacks (e.g. ->resize_cb()).
>
> The reason I asked about ->drained_end() is for symmetry. If the
> device's ->drained_begin() callback changed state or allocated resources
> then they may need to be freed/reset. On the other hand, the
> blk_set_dev_ops(blk, NULL, NULL) call should be made by the dev_ops
> owner so they can clean up without a ->drained_end() call.

OK, got it... Hm, we don't actually use these for BlockJobs anymore.
It looks like the only user of these callbacks now is the NBD driver.

ad90febaf22d95e49fb6821bfb3ebd05b4919417 followed not long after my
initial patch and removed my intended user. I tried just removing the
fields, but the build chokes on NBD.
It looks like these usages are pretty modern, Sergio added them in
fd6afc50 (2021-06-02). So, I guess we do actually still use these
hooks. (After a period of maybe not using them for 4 years? Wow.)

I'm not clear on what we *want* to happen here, though. It doesn't
sound like NBD is the anticipated use case, so maybe just make the
removal fail if the drained section is active and callbacks are
defined? That's the safe thing to do, probably.

--js

Re: [PULL 0/8] s390x and misc fixes

2022-03-15 Thread Peter Maydell

On Tue, 15 Mar 2022 at 18:58, Peter Maydell  wrote:
>
> On Tue, 15 Mar 2022 at 11:20, Thomas Huth  wrote:
> >
> >  Hi Peter!
> >
> > The following changes since commit 352998df1c53b366413690d95b35f76d0721ebed:
> >
> >   Merge tag 'i2c-20220314' of https://github.com/philmd/qemu into staging 
> > (2022-03-14 14:39:33 +)
> >
> > are available in the Git repository at:
> >
> >   https://gitlab.com/thuth/qemu.git tags/pull-request-2022-03-15
> >
> > for you to fetch changes up to 36149534792dcf07a3c59867f967eaee23ab906c:
> >
> >   meson: Update to version 0.61.3 (2022-03-15 10:32:36 +0100)
> >
> > 
> > * Fixes for s390x branch instruction emulation
> > * Fixes for the tests/avocado/boot_linux.py:BootLinuxS390X test
> > * Fix for "-cpu help" output
> > * Bump meson to 0.61.3 to fix stderr log of the iotests
> >
> > 
>
> This results in every "Linking" step on my macos box producing the
> warning:
>
> ld: warning: directory not found for option
> '-Lns/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/12.0.0'
>
> Obvious suspect here is the new meson version.

Also, after rolling this merge attempt back, older meson barfs
on whatever the new one left behind:

[0/1] Regenerating build files.
Traceback (most recent call last):
  File "/Users/pm215/src/qemu-for-merges/meson/mesonbuild/mesonmain.py",
line 228, in run
return options.run_func(options)
  File "/Users/pm215/src/qemu-for-merges/meson/mesonbuild/msetup.py",
line 281, in run
app.generate()
  File "/Users/pm215/src/qemu-for-merges/meson/mesonbuild/msetup.py",
line 177, in generate
env = environment.Environment(self.source_dir, self.build_dir, self.options)
  File "/Users/pm215/src/qemu-for-merges/meson/mesonbuild/environment.py",
line 462, in __init__
self.coredata = coredata.load(self.get_build_dir())  # type:
coredata.CoreData
  File "/Users/pm215/src/qemu-for-merges/meson/mesonbuild/coredata.py",
line 1003, in load
obj = pickle.load(f)
  File 
"/Users/pm215/src/qemu-for-merges/meson/mesonbuild/mesonlib/universal.py",
line 2076, in __setstate__
self.__init__(**state)  # type: ignore
TypeError: __init__() got an unexpected keyword argument 'module'
FAILED: build.ninja
/usr/local/opt/python@3.9/bin/python3.9
/Users/pm215/src/qemu-for-merges/meson/meson.py --internal regenerate
/Users/pm215/src/qemu-for-merges
/Users/pm215/src/qemu-for-merges/build/all --backend ninja
ninja: error: rebuilding 'build.ninja': subcommand failed
/usr/local/bin/ninja  build.ninja && touch build.ninja.stamp
  GIT ui/keycodemapdb meson tests/fp/berkeley-testfloat-3
tests/fp/berkeley-softfloat-3 dtc capstone slirp
[0/1] Regenerating build files.
Traceback (most recent call last):
  File "/Users/pm215/src/qemu-for-merges/meson/mesonbuild/mesonmain.py",
line 228, in run
return options.run_func(options)
  File "/Users/pm215/src/qemu-for-merges/meson/mesonbuild/msetup.py",
line 281, in run
app.generate()
  File "/Users/pm215/src/qemu-for-merges/meson/mesonbuild/msetup.py",
line 177, in generate
env = environment.Environment(self.source_dir, self.build_dir, self.options)
  File "/Users/pm215/src/qemu-for-merges/meson/mesonbuild/environment.py",
line 462, in __init__
self.coredata = coredata.load(self.get_build_dir())  # type:
coredata.CoreData
  File "/Users/pm215/src/qemu-for-merges/meson/mesonbuild/coredata.py",
line 1003, in load
obj = pickle.load(f)
  File 
"/Users/pm215/src/qemu-for-merges/meson/mesonbuild/mesonlib/universal.py",
line 2076, in __setstate__
self.__init__(**state)  # type: ignore
TypeError: __init__() got an unexpected keyword argument 'module'
FAILED: build.ninja

meson ought to be smart enough to spot that it's got data from an
incompatible version and just discard its cache rather than
choking on it.
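
For illustration only: the usual way to get that behaviour is to stamp the cache with a format version and treat a mismatch as "no cache". The sketch below is in C rather than meson's Python, and every name in it (toy_cache_header, TOY_CACHE_MAGIC, toy_cache_payload) is made up; it says nothing about how meson's coredata is actually stored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk header placed in front of the cached payload. */
struct toy_cache_header {
    uint32_t magic;
    uint32_t format_version;
};

#define TOY_CACHE_MAGIC 0x43414348u    /* arbitrary marker value */

/*
 * Return a pointer to the payload if the cache was written by a
 * compatible version, or NULL if the caller should discard it and
 * regenerate from scratch.
 */
static const uint8_t *toy_cache_payload(const uint8_t *buf, size_t len,
                                        uint32_t expected_version,
                                        size_t *payload_len)
{
    struct toy_cache_header hdr;

    if (len < sizeof(hdr)) {
        return NULL;                   /* truncated or empty cache */
    }
    memcpy(&hdr, buf, sizeof(hdr));
    if (hdr.magic != TOY_CACHE_MAGIC ||
        hdr.format_version != expected_version) {
        return NULL;                   /* written by another version */
    }
    *payload_len = len - sizeof(hdr);
    return buf + sizeof(hdr);
}
```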

thanks
-- PMM

Re: [PATCH v17 3/7] net/vmnet: implement shared mode (vmnet-shared)

2022-03-15 Thread Vladislav Yaroshchuk

On Tue, Mar 15, 2022 at 8:54 PM Akihiko Odaki 
wrote:

> On 2022/03/16 2:45, Vladislav Yaroshchuk wrote:
> >
> >
> > On Tue, Mar 15, 2022 at 1:18 PM Akihiko Odaki wrote:
> >
> > On 2022/03/15 19:02, Vladislav Yaroshchuk wrote:
> >  > Interaction with vmnet.framework in different modes
> >  > differs only on configuration stage, so we can create
> >  > common `send`, `receive`, etc. procedures and reuse them.
> >  >
> >  > Signed-off-by: Phillip Tennen
> >  > Signed-off-by: Vladislav Yaroshchuk
> >  > ---
> >  >   net/vmnet-common.m | 359
> > +
> >  >   net/vmnet-shared.c |  94 +++-
> >  >   net/vmnet_int.h|  41 +-
> >  >   3 files changed, 489 insertions(+), 5 deletions(-)
> >  >
> >  > diff --git a/net/vmnet-common.m b/net/vmnet-common.m
> >  > index 56612c72ce..6af042406b 100644
> >  > --- a/net/vmnet-common.m
> >  > +++ b/net/vmnet-common.m
> >  > @@ -10,6 +10,8 @@
> >  >*/
> >  >
> >  >   #include "qemu/osdep.h"
> >  > +#include "qemu/main-loop.h"
> >  > +#include "qemu/log.h"
> >  >   #include "qapi/qapi-types-net.h"
> >  >   #include "vmnet_int.h"
> >  >   #include "clients.h"
> >  > @@ -17,4 +19,361 @@
> >  >   #include "qapi/error.h"
> >  >
> >  >   #include 
> >  > +#include 
> >  >
> >  > +
> >  > +static void vmnet_send_completed(NetClientState *nc, ssize_t len);
> >  > +
> >  > +
> >  > +const char *vmnet_status_map_str(vmnet_return_t status)
> >  > +{
> >  > +switch (status) {
> >  > +case VMNET_SUCCESS:
> >  > +return "success";
> >  > +case VMNET_FAILURE:
> >  > +return "general failure (possibly not enough privileges)";
> >  > +case VMNET_MEM_FAILURE:
> >  > +return "memory allocation failure";
> >  > +case VMNET_INVALID_ARGUMENT:
> >  > +return "invalid argument specified";
> >  > +case VMNET_SETUP_INCOMPLETE:
> >  > +return "interface setup is not complete";
> >  > +case VMNET_INVALID_ACCESS:
> >  > +return "invalid access, permission denied";
> >  > +case VMNET_PACKET_TOO_BIG:
> >  > +return "packet size is larger than MTU";
> >  > +case VMNET_BUFFER_EXHAUSTED:
> >  > +return "buffers exhausted in kernel";
> >  > +case VMNET_TOO_MANY_PACKETS:
> >  > +return "packet count exceeds limit";
> >  > +#if defined(MAC_OS_VERSION_11_0) && \
> >  > +MAC_OS_X_VERSION_MIN_REQUIRED >= MAC_OS_VERSION_11_0
> >  > +case VMNET_SHARING_SERVICE_BUSY:
> >  > +return "conflict, sharing service is in use";
> >  > +#endif
> >  > +default:
> >  > +return "unknown vmnet error";
> >  > +}
> >  > +}
> >  > +
> >  > +/**
> >  > + * Write packets from QEMU to vmnet interface.
> >  > + *
> >  > + * vmnet.framework supports iov, but writing more than
> >  > + * one iov into vmnet interface fails with
> >  > + * 'VMNET_INVALID_ARGUMENT'. Collecting provided iovs into
> >  > + * one and passing it to vmnet works fine. That's the
> >  > + * reason why receive_iov() left unimplemented. But it still
> >  > + * works with good performance having .receive() only.
> >  > + */
> >  > +ssize_t vmnet_receive_common(NetClientState *nc,
> >  > + const uint8_t *buf,
> >  > + size_t size)
> >  > +{
> >  > +VmnetCommonState *s = DO_UPCAST(VmnetCommonState, nc, nc);
> >  > +struct vmpktdesc packet;
> >  > +struct iovec iov;
> >  > +int pkt_cnt;
> >  > +vmnet_return_t if_status;
> >  > +
> >  > +if (size > s->max_packet_size) {
> >  > +warn_report("vmnet: packet is too big, %zu > %" PRIu64,
> >  > +packet.vm_pkt_size,
> >  > +s->max_packet_size);
> >  > +return -1;
> >  > +}
> >  > +
> >  > +iov.iov_base = (char *) buf;
> >  > +iov.iov_len = size;
> >  > +
> >  > +packet.vm_pkt_iovcnt = 1;
> >  > +packet.vm_flags = 0;
> >  > +packet.vm_pkt_size = size;
> >  > +packet.vm_pkt_iov = &iov;
> >  > +pkt_cnt = 1;
> >  > +
> >  > +if_status = vmnet_write(s->vmnet_if, &packet, &pkt_cnt);
> >  > +if (if_status != VMNET_SUCCESS) {
> >  > +error_report("vmnet: write error: %s\n",
> >  > + vmnet_status_map_str(if_status));
> >  > +return -1;
> >  > +}
> >  > +
> >  > +if (pkt_cnt) {
> >  > +return size;
> >  > +}
> >  > +return 0;
> >  > +}
> >

[PATCH] target/riscv: Exit current TB after an sfence.vma

2022-03-15 Thread Idan Horowitz

If the pages which control the translation of the currently executing
instructions are changed, and then the TLB is flushed using sfence.vma
we have to exit the current TB early, to ensure we don't execute stale
instructions.

Signed-off-by: Idan Horowitz 
---
 target/riscv/insn_trans/trans_privileged.c.inc | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/target/riscv/insn_trans/trans_privileged.c.inc 
b/target/riscv/insn_trans/trans_privileged.c.inc
index 53613682e8..f265e8202d 100644
--- a/target/riscv/insn_trans/trans_privileged.c.inc
+++ b/target/riscv/insn_trans/trans_privileged.c.inc
@@ -114,6 +114,13 @@ static bool trans_sfence_vma(DisasContext *ctx, 
arg_sfence_vma *a)
 {
 #ifndef CONFIG_USER_ONLY
 gen_helper_tlb_flush(cpu_env);
+/*
+ * The flush might have changed the backing physical memory of
+ * the instructions we're currently executing
+ */
+gen_set_pc_imm(ctx, ctx->pc_succ_insn);
+tcg_gen_exit_tb(NULL, 0);
+ctx->base.is_jmp = DISAS_NORETURN;
 return true;
 #endif
 return false;
-- 
2.35.1

Re: [PATCH v2 1/3] util/osdep: Avoid madvise proto on modern Solaris

2022-03-15 Thread Andrew Deason

On Tue, 15 Mar 2022 18:33:33 +
Peter Maydell  wrote:

> Since this is a little bit tricky, I think a comment here will help
> future readers:
> 
> # Older Solaris/Illumos provide madvise() but forget to prototype it.

I don't think it matters much, but just to mention, the prototype is in
there, but it's deliberately hidden by some #ifdef logic for (older?)
POSIX/XPG compliance or something. I sometimes try to phrase this in a
way that reflects that, but it's hard so I probably won't care.

> > +#ifdef HAVE_MADVISE_MISSING_PROTOTYPE
> >  /* See MySQL bug #7156 (http://bugs.mysql.com/bug.php?id=7156) for
> > discussion about Solaris header problems */
> >  extern int madvise(char *, size_t, int);
> >  #endif
> 
> As you note, this doesn't match the name we picked in meson.build.
> I don't feel very strongly about the name (we certainly don't manage
> consistency across the project about CONFIG_ vs HAVE_ !), but my suggestion
> is HAVE_MADVISE_WITHOUT_PROTOTYPE.
> 
> Can you put the prototype in include/qemu/osdep.h, please?
> (Exact location not very important as long as it's inside
> the extern-C block, but I suggest just under the bit where we
> define SIGIO for __HAIKU__.)

Okay, but this will cause callers that call madvise() directly to
"work", even though they're not going through the qemu_madvise wrapper.
There's one area in cross-platform code you noted before, in
softmmu/physmem.c, and that does cause the same build error if the
prototype is missing. (I'm going to add another commit to make that use
the wrapper in the next patchset.)

I assume that's not a concern unless I hear otherwise; just pointing it
out.

And all other comments will be addressed; thanks.

-- 
Andrew Deason
adea...@sinenomine.net

Re: [PATCH v5 42/48] target/nios2: Implement rdprs, wrprs

2022-03-15 Thread Richard Henderson


On 3/15/22 09:26, Amir Gonnen wrote:

Something is wrong when translating rdprs in an interrupt handler when CRS is 
0x1.
I'm hitting "../tcg/tcg.c:3466: tcg_reg_alloc_mov: Assertion `ts->val_type == 
TEMP_VAL_REG' failed."

When stopped on that assertion I can see that :
- ts->val_type  = TEMP_VAL_DEAD
- op->opc = INDEX_op_mov_i32
- ots->name = "pc"
- cpu->ctrl[0] = 0x5f9 (that's STATUS so CRS = 1)
- pc = 0xa2d5c

so, it looks related to an assignment to PC a little after rdprs.


Ok, thanks for the report.  Yes, it's a bug in the indirection lowering.


r~

Re: [PULL 0/8] s390x and misc fixes

2022-03-15 Thread Peter Maydell

On Tue, 15 Mar 2022 at 11:20, Thomas Huth  wrote:
>
>  Hi Peter!
>
> The following changes since commit 352998df1c53b366413690d95b35f76d0721ebed:
>
>   Merge tag 'i2c-20220314' of https://github.com/philmd/qemu into staging 
> (2022-03-14 14:39:33 +)
>
> are available in the Git repository at:
>
>   https://gitlab.com/thuth/qemu.git tags/pull-request-2022-03-15
>
> for you to fetch changes up to 36149534792dcf07a3c59867f967eaee23ab906c:
>
>   meson: Update to version 0.61.3 (2022-03-15 10:32:36 +0100)
>
> 
> * Fixes for s390x branch instruction emulation
> * Fixes for the tests/avocado/boot_linux.py:BootLinuxS390X test
> * Fix for "-cpu help" output
> * Bump meson to 0.61.3 to fix stderr log of the iotests
>
> 

This results in every "Linking" step on my macos box producing the
warning:

ld: warning: directory not found for option
'-Lns/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/12.0.0'

Obvious suspect here is the new meson version.

thanks
-- PMM

Re: [PULL v2 00/47] virtio,pc,pci: features, cleanups, fixes

2022-03-15 Thread Peter Maydell

On Tue, 15 Mar 2022 at 18:35, Philippe Mathieu-Daudé
 wrote:
> On 8/3/22 12:18, Peter Maydell wrote:
> > Using 'unsigned long' in a cast (or anything else) is often
> > the wrong thing in QEMU...
>
> $ git grep -F '(unsigned long)' | wc -l
>   273
>
> Ouch :/

Only "often", not "always" :-) We have some APIs that work on
'long', usually because they're generic APIs borrowed from the
Linux kernel like the clear_bit/set_bit functions. And sometimes
you're interfacing to a host OS API whose types are 'long'.
So it's only one of those things that I tend to have in the
back of my head during code review, rather than something I think
we could enforce automatically.

The stuff in sev.c you list does look a bit suspicious, but
it's not actually buggy because that's all KVM code so we
know 'unsigned long' and pointers are the same size.
'uintptr_t' would be better, though.
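
As a sketch of the uintptr_t form (the helper names are hypothetical, not existing QEMU functions): going through uintptr_t keeps every pointer bit on any host where a pointer fits in 64 bits, including LLP64 hosts where 'long' is only 32 bits.

```c
#include <stdint.h>

/* The conversion below is only lossless if this holds on the host. */
_Static_assert(sizeof(uintptr_t) <= sizeof(uint64_t),
               "host pointers must fit in 64 bits");

/* Pointer -> 64-bit integer without the 'unsigned long' detour. */
static inline uint64_t toy_ptr_to_u64(const void *p)
{
    return (uint64_t)(uintptr_t)p;
}

/* And back again, for values that came from toy_ptr_to_u64(). */
static inline void *toy_u64_to_ptr(uint64_t v)
{
    return (void *)(uintptr_t)v;
}
```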

thanks
-- PMM

Re: [PATCH] Don't include sysemu/tcg.h if it is not necessary

2022-03-15 Thread Richard Henderson


On 3/15/22 07:41, Thomas Huth wrote:

This header only defines the tcg_allowed variable and the tcg_enabled()
function - which are not required in many files that include this
header. Drop the #include statement there.

Signed-off-by: Thomas Huth
---
  accel/tcg/hmp.c  | 1 -
  accel/tcg/tcg-accel-ops-icount.c | 1 -
  bsd-user/main.c  | 1 -
  hw/virtio/vhost.c| 1 -
  linux-user/main.c| 1 -
  monitor/misc.c   | 1 -
  target/arm/helper.c  | 1 -
  target/s390x/cpu_models_sysemu.c | 1 -
  target/s390x/helper.c| 1 -
  9 files changed, 9 deletions(-)


Thanks.  Queued to tcg-next.


r~

Re: [PATCH 2/2] hw/arm/virt: Fix gic-version=max when CONFIG_ARM_GICV3_TCG is unset

2022-03-15 Thread Philippe Mathieu-Daudé


On 8/3/22 19:24, Eric Auger wrote:

In TCG mode, if gic-version=max we always select GICv3 even if
CONFIG_ARM_GICV3_TCG is unset. We shall rather select GICv2.
This also brings the benefit of fixing qos tests errors for tests
using gic-version=max with CONFIG_ARM_GICV3_TCG unset.

Signed-off-by: Eric Auger 

---

v2 -> v3:
- Use module_object_class_by_name() and refer to the renamed
   CONFIG_ARM_GICV3_TCG config
---
  hw/arm/virt.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 46bf7ceddf..39790d29d2 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1852,7 +1852,12 @@ static void finalize_gic_version(VirtMachineState *vms)
  vms->gic_version = VIRT_GIC_VERSION_2;
  break;
  case VIRT_GIC_VERSION_MAX:
-vms->gic_version = VIRT_GIC_VERSION_3;
+if (module_object_class_by_name("arm-gicv3")) {


Too late, but why not use TYPE_ARM_GICV3?


+/* CONFIG_ARM_GICV3_TCG was set */
+vms->gic_version = VIRT_GIC_VERSION_3;
+} else {
+vms->gic_version = VIRT_GIC_VERSION_2;
+}
  break;
  case VIRT_GIC_VERSION_HOST:
  error_report("gic-version=host requires KVM");

Re: [PULL v2 00/47] virtio,pc,pci: features, cleanups, fixes

2022-03-15 Thread Philippe Mathieu-Daudé


On 8/3/22 12:18, Peter Maydell wrote:

On Tue, 8 Mar 2022 at 11:01, Michael S. Tsirkin  wrote:


On Tue, Mar 08, 2022 at 09:05:27AM +, Peter Maydell wrote:

On Mon, 7 Mar 2022 at 22:52, Michael S. Tsirkin  wrote:



Now, I could maybe get behind this if it simply warned about a cast that
loses information (cast to a smaller integer) or integer/pointer cast
that does not go through uintptr_t without regard to size.


This *is* warning about losing information. On 64-bit Windows
pointers are 64 bits but 'long' is 32 bits, so the path
pointer -> long -> uint64_t drops the top half of the pointer.



Yes obviously. My point is that this:
(uint64_t)hdev->vqs[queue].avail
is always harmless but it warns on a 32 bit system.


True, I suppose. But compiler warnings are often like that: we
take the hit of having to tweak some things we know to be OK in
order to catch the real bugs in other cases.


And someone trying to fix that *is* what resulted in
(uint64_t)(unsigned long)hdev->vqs[queue].avail


Using 'unsigned long' in a cast (or anything else) is often
the wrong thing in QEMU...


$ git grep -F '(unsigned long)' | wc -l
 273

Ouch :/

These require cleanup:

target/i386/sev.c:170:input.data = (__u64)(unsigned long)data;
target/i386/sev.c:188:arg.data = (unsigned long)data;
target/i386/sev.c:243:range.addr = (__u64)(unsigned long)host;
target/i386/sev.c:273:range.addr = (__u64)(unsigned long)host;
target/i386/sev.c:730:update.uaddr = (__u64)(unsigned long)addr;

And we might add a Gitlab issue to look at the hw/ ones:

$ git grep -F '(unsigned long)' hw | wc -l
  76

Re: [PATCH v2 1/3] util/osdep: Avoid madvise proto on modern Solaris

2022-03-15 Thread Peter Maydell

On Tue, 15 Mar 2022 at 02:20, Andrew Deason  wrote:
>
> On older Solaris releases, we didn't get a prototype for madvise, and so
> util/osdep.c provides its own prototype. Some time between the public
> Solaris 11.4 release and Solaris 11.4.42 CBE, we started getting an
> madvise prototype that looks like this:
>
> extern int madvise(void *, size_t, int);
>
> which conflicts with the prototype in util/osdep.c. Instead of always
> declaring this prototype, check if we're missing the madvise()
> prototype, and only declare it ourselves if the prototype is missing.
>
> The 'missing_madvise_proto' meson check contains an obviously wrong
> prototype for madvise. So if that code compiles and links, we must be
> missing the actual prototype for madvise.
>
> Signed-off-by: Andrew Deason 
> ---
> To be clear, I'm okay with removing the prototype workaround
> unconditionally; I'm just not sure if there's enough consensus on doing
> that.
>
> The "missing prototype" check is based on getting a compiler error on a
> conflicting prototype, since this just seems more precise and certain
> than getting an error from a missing prototype (needs
> -Werror=missing-prototypes or -Werror). But I can do it the other way
> around if needed.

Seems a reasonable approach to me.

> Changes since v1:
> - madvise prototype check changed to not be platform-specific, and turned
>   into CONFIG_MADVISE_MISSING_PROTOTYPE.
>
>  meson.build  | 17 +++--
>  util/osdep.c |  3 +++
>  2 files changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/meson.build b/meson.build
> index 2d6601467f..ff5fce693e 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -1705,25 +1705,38 @@ config_host_data.set('CONFIG_EVENTFD', cc.links('''
>int main(void) { return eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC); }'''))
>  config_host_data.set('CONFIG_FDATASYNC', cc.links(gnu_source_prefix + '''
>#include 
>int main(void) {
>#if defined(_POSIX_SYNCHRONIZED_IO) && _POSIX_SYNCHRONIZED_IO > 0
>return fdatasync(0);
>#else
>#error Not supported
>#endif
>}'''))
> -config_host_data.set('CONFIG_MADVISE', cc.links(gnu_source_prefix + '''
> +
> +has_madvise = cc.links(gnu_source_prefix + '''
>#include 
>#include 
>#include 
> -  int main(void) { return madvise(NULL, 0, MADV_DONTNEED); }'''))
> +  int main(void) { return madvise(NULL, 0, MADV_DONTNEED); }''')

Since this is a little bit tricky, I think a comment here will help
future readers:

# Older Solaris/Illumos provide madvise() but forget to prototype it.
# In this case has_madvise will be true (the test program links despite
# a compile warning). To detect the missing-prototype case, we try
# again with a definitely-bogus prototype. This will only compile
# if the system headers don't provide the prototype; otherwise the
# conflicting prototypes will cause a compiler error.

> +missing_madvise_proto = false
> +if has_madvise
> +  missing_madvise_proto = cc.links(gnu_source_prefix + '''
> +#include 
> +#include 
> +#include 
> +extern int madvise(int);
> +int main(void) { return madvise(0); }''')
> +endif
> +config_host_data.set('CONFIG_MADVISE', has_madvise)
> +config_host_data.set('CONFIG_MADVISE_MISSING_PROTOTYPE', 
> missing_madvise_proto)

> +#ifdef HAVE_MADVISE_MISSING_PROTOTYPE
>  /* See MySQL bug #7156 (http://bugs.mysql.com/bug.php?id=7156) for
> discussion about Solaris header problems */
>  extern int madvise(char *, size_t, int);
>  #endif

As you note, this doesn't match the name we picked in meson.build.
I don't feel very strongly about the name (we certainly don't manage
consistency across the project about CONFIG_ vs HAVE_ !), but my suggestion
is HAVE_MADVISE_WITHOUT_PROTOTYPE.

Can you put the prototype in include/qemu/osdep.h, please?
(Exact location not very important as long as it's inside
the extern-C block, but I suggest just under the bit where we
define SIGIO for __HAIKU__.)

This means moving the comment, which will then want fixing up to
our coding style, which these days is
 /*
  * line 1
  * line 2
  */

for multi-line comments.
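
Roughly, the relocated declaration could end up looking like the sketch below. This assumes the HAVE_MADVISE_WITHOUT_PROTOTYPE name suggested above is the one that gets picked; toy_discard() is a hypothetical caller added only to show that code using madvise() compiles either way.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

#ifdef HAVE_MADVISE_WITHOUT_PROTOTYPE
/*
 * See MySQL bug #7156 (http://bugs.mysql.com/bug.php?id=7156) for
 * discussion about Solaris header problems.
 */
extern int madvise(char *, size_t, int);
#endif

/* Hypothetical caller: tell the kernel the range is no longer needed. */
static int toy_discard(void *addr, size_t len)
{
    return madvise(addr, len, MADV_DONTNEED);
}
```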

thanks
-- PMM

Re: [PATCH 3/3] linux-user/arm: Implement __kernel_cmpxchg64 with host atomics

2022-03-15 Thread Richard Henderson


On 3/15/22 11:18, Peter Maydell wrote:

-segv:
-end_exclusive();
-/* We get the PC of the entry address - which is as good as anything,
-   on a real kernel what you get depends on which mode it uses. */


This comment about the PC the guest signal handler is going
to see when we take the SEGV is still valid, I think ?


Yes.  I guess I could move it to the block comment in front of atomic_mmu_lookup, because 
it would apply to both the SEGV and the BUS raised there.



r~

Re: [PATCH v2 3/3] util/osdep: Remove some early cruft

2022-03-15 Thread Peter Maydell

On Tue, 15 Mar 2022 at 02:20, Andrew Deason  wrote:
>
> The include for statvfs.h has not been needed since all statvfs calls
> were removed in commit 4a1418e07bdc ("Unbreak large mem support by
> removing kqemu").
>
> The comment mentioning CONFIG_BSD hasn't made sense since an include
> for config-host.h was removed in commit aafd75841001 ("util: Clean up
> includes").
>
> Remove this cruft.
>
> Signed-off-by: Andrew Deason 
> ---
Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [PATCH v5] target/riscv: Add isa extenstion strings to the device tree

2022-03-15 Thread Atish Kumar Patra

On Tue, Mar 15, 2022 at 2:17 AM Bin Meng  wrote:
>
> On Tue, Mar 15, 2022 at 7:43 AM Atish Patra  wrote:
> >
> > The Linux kernel parses the ISA extensions from "riscv,isa" DT
> > property. It used to parse only the single letter base extensions
> > until now. A generic ISA extension parsing framework was proposed[1]
> > recently that can parse multi-letter ISA extensions as well.
> >
> > Generate the extended ISA string by appending  the available ISA extensions
>
> nits: remove one space after "appending"
>

Will fix it.

> > to the "riscv,isa" string if it is enabled so that kernel can process it.
> >
> > [1] https://lkml.org/lkml/2022/2/15/263
>
> Could you please post a link to the "riscv,isa" DT bindings spec or
> discussion thread? It seems not mentioned in the above LKML thread.
>

Latest discussion thread:
https://lkml.org/lkml/2022/3/10/1416

riscv,isa DT binding:
https://elixir.bootlin.com/linux/v5.17-rc8/source/Documentation/devicetree/bindings/riscv/cpus.yaml#L66

> >
> > Reviewed-by: Anup Patel 
> > Reviewed-by: Alistair Francis 
> > Suggested-by: Heiko Stubner 
> > Signed-off-by: Atish Patra 
> > ---
> > Changes from v4->v5:
> > 1. Fixed the order of Zxx extensions.
> > 2. Added a comment clearly describing the rules of extension order.
> >
> > Changes from v3->v4:
> > 1. Fixed the order of the extension names.
> > 2. Added all the available ISA extensions in Qemu.
> >
> > Changes from v2->v3:
> > 1. Used g_strconcat to replace snprintf & a max isa string length as
> > suggested by Anup.
> > 2. I have not included the Tested-by Tag from Heiko because the
> > implementation changed from v2 to v3.
> >
> > Changes from v1->v2:
> > 1. Improved the code readability by using arrays instead of individual checks
> > ---
> >  target/riscv/cpu.c | 61 ++
> >  1 file changed, 61 insertions(+)
> >
> > diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> > index ddda4906ffb7..097c42f5c50f 100644
> > --- a/target/riscv/cpu.c
> > +++ b/target/riscv/cpu.c
> > @@ -34,6 +34,12 @@
> >
> >  /* RISC-V CPU definitions */
> >
> > +/* This includes the null terminated character '\0' */
> > +struct isa_ext_data {
> > +const char *name;
> > +bool enabled;
> > +};
> > +
> >  static const char riscv_exts[26] = "IEMAFDQCLBJTPVNSUHKORWXYZG";
> >
> >  const char * const riscv_int_regnames[] = {
> > @@ -898,6 +904,60 @@ static void riscv_cpu_class_init(ObjectClass *c, void 
> > *data)
> >  device_class_set_props(dc, riscv_cpu_properties);
> >  }
> >
> > +#define ISA_EDATA_ENTRY(name, prop) {#name, cpu->cfg.prop}
> > +
> > +static void riscv_isa_string_ext(RISCVCPU *cpu, char **isa_str, int 
> > max_str_len)
> > +{
> > +char *old = *isa_str;
> > +char *new = *isa_str;
> > +int i;
> > +
> > +/**
> > + * Here are the ordering rules of extension naming defined by RISC-V
> > + * specification :
> > + * 1. All extensions should be separated from other multi-letter 
> > extensions
> > + *from other multi-letter extensions by an underscore.
>
> redundant "from other multi-letter extensions"
>

Oops. Will fix it.

> > + * 2. The first letter following the 'Z' conventionally indicates the 
> > most
>
> Should this be lower case "z"?

Nope. I am just reiterating the rules defined by the spec. The device
tree has lower case 'z' as per the DT binding, which mandates all
lower case.
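
To show what ends up in the device tree, here is a small standalone sketch of the append step: lower-case names, each enabled multi-letter extension joined with an underscore. It uses a fixed buffer and snprintf() rather than the g_strconcat() approach in the patch, and the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table entry: extension name plus whether it is enabled. */
struct toy_isa_ext {
    const char *name;
    bool enabled;
};

/*
 * Append "_<name>" to the NUL-terminated string in buf for every
 * enabled extension, in table order. Returns 0 on success or -1 if
 * buf (of size buflen) is too small.
 */
static int toy_isa_string_append(char *buf, size_t buflen,
                                 const struct toy_isa_ext *exts, size_t n)
{
    size_t used = strlen(buf);

    for (size_t i = 0; i < n; i++) {
        int w;

        if (!exts[i].enabled) {
            continue;
        }
        w = snprintf(buf + used, buflen - used, "_%s", exts[i].name);
        if (w < 0 || (size_t)w >= buflen - used) {
            return -1;
        }
        used += (size_t)w;
    }
    return 0;
}
```

Because the entries are emitted in table order, the table itself has to follow the ordering rules from the comment in the patch.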

>
> > + *closely related alphabetical extension category, IMAFDQLCBKJTPVH.
> > + *If multiple 'Z' extensions are named, they should be ordered 
> > first
> > + *by category, then alphabetically within a category.
> > + * 3. Standard supervisor-level extensions (starts with 'S') should be
>
> lower case "s"?

Same reasoning as above.

>
> > + *listed after standard unprivileged extensions.  If multiple
> > + *supervisor-level extensions are listed, they should be ordered
> > + *alphabetically.
> > + * 4. Non-standard extensions (starts with 'X') must be listed after 
> > all
> > + *standard extensions. They must be separated from other 
> > multi-letter
> > + *extensions by an underscore.
> > + */
> > +struct isa_ext_data isa_edata_arr[] = {
> > +ISA_EDATA_ENTRY(zfh, ext_zfhmin),
>
> This should be (zfh, ext_zfh)

Yeah. It's a typo. Thanks for catching it.

>
> > +ISA_EDATA_ENTRY(zfhmin, ext_zfhmin),
> > +ISA_EDATA_ENTRY(zfinx, ext_zfinx),
> > +ISA_EDATA_ENTRY(zdinx, ext_zdinx),
>
> Should "zdinx" come before "zfinx" *alphabetically* ?

As per the ISA naming rules,

"The first letter following the 'Z' conventionally indicates the most
closely related alphabetical extension category, IMAFDQLCBKJTPVH.
If multiple 'Z' extensions are named, they should be ordered first
by category, then alphabetically within a category."

That's why zfinx comes before zdinx.
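
Expressed as code, the rule is a two-level comparison: first the position of the letter after 'z' in the category string, then plain alphabetical order. The comparator below is only a sketch of that rule with hypothetical names; the patch itself just hard-codes the table in the right order.

```c
#include <ctype.h>
#include <string.h>

/* Category order quoted from the naming rules above. */
static const char toy_z_category_order[] = "IMAFDQLCBKJTPVH";

/* Rank of the letter following 'z'; unknown letters sort last. */
static int toy_z_category_rank(const char *ext)
{
    const char *p;

    if (ext[0] == '\0' || ext[1] == '\0') {
        return (int)strlen(toy_z_category_order);
    }
    p = strchr(toy_z_category_order, toupper((unsigned char)ext[1]));
    return p ? (int)(p - toy_z_category_order)
             : (int)strlen(toy_z_category_order);
}

/*
 * Compare two 'z' extension names: negative if a sorts before b,
 * positive if after, zero if they are equal.
 */
static int toy_z_ext_compare(const char *a, const char *b)
{
    int ra = toy_z_category_rank(a);
    int rb = toy_z_category_rank(b);

    if (ra != rb) {
        return ra - rb;
    }
    return strcmp(a, b);
}
```

With this ordering, toy_z_ext_compare("zfinx", "zdinx") is negative because 'F' precedes 'D' in the category string, even though "zdinx" sorts first alphabetically.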
>
> > +ISA_EDATA_ENTRY(zba, ext_zba),
> > +ISA_EDATA_ENTRY(zbb,

Re: [PATCH 3/3] linux-user/arm: Implement __kernel_cmpxchg64 with host atomics

2022-03-15 Thread Peter Maydell

On Mon, 14 Mar 2022 at 04:46, Richard Henderson
 wrote:
>
> If CONFIG_ATOMIC64, we can use a host cmpxchg and provide
> atomicity across processes; otherwise we have no choice but
> to continue using start/end_exclusive.
>
> Signed-off-by: Richard Henderson 

> -segv:
> -end_exclusive();
> -/* We get the PC of the entry address - which is as good as anything,
> -   on a real kernel what you get depends on which mode it uses. */

This comment about the PC the guest signal handler is going
to see when we take the SEGV is still valid, I think ?

> -/* XXX: check env->error_code */
> -force_sig_fault(TARGET_SIGSEGV, TARGET_SEGV_MAPERR,
> -env->exception.vaddress);
> + segv:
> +force_sig_fault(TARGET_SIGSEGV,
> +page_get_flags(addr) & PAGE_VALID ?
> +TARGET_SEGV_ACCERR : TARGET_SEGV_MAPERR, addr);
>  }

Otherwise
Reviewed-by: Peter Maydell 

thanks
-- PMM
