Re: [PATCH v4] mc146818rtc: add a way to generate RTC interrupts via QMP

2024-05-29 Thread Daniil Tatianin

On 5/29/24 4:39 PM, Philippe Mathieu-Daudé wrote:


On 29/5/24 14:43, Daniil Tatianin wrote:

On 5/29/24 3:36 PM, Philippe Mathieu-Daudé wrote:


On 29/5/24 14:03, Markus Armbruster wrote:

Daniil Tatianin  writes:


This can be used to force-synchronize the time in guest after a long
stop-cont pause, which can be useful for serverless-type workload.

Also add a comment to highlight the fact that this (and one other QMP
command) only works for the MC146818 RTC controller.

Acked-by: Philippe Mathieu-Daudé 
Signed-off-by: Daniil Tatianin 
---

Changes since v0:
- Rename to rtc-inject-irq to match other similar API
- Add a comment to highlight that this only works for the I386 RTC

Changes since v1:
- Added a description below the QMP command to explain how it can be
   used and what it does.

Changes since v2:
- Add a 'broadcast' suffix.
- Change the comments to explain the flags we're setting.
- Change the command description to fix styling & explain that 
it's a broadcast command.


Changes since v3:
- Fix checkpatch complaints about usage of C99 comments

---
  hw/rtc/mc146818rtc.c | 20 
  include/hw/rtc/mc146818rtc.h |  1 +
  qapi/misc-target.json    | 19 +++
  3 files changed, 40 insertions(+)




diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 4e0a6492a9..7d388a3753 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -19,6 +19,25 @@
  { 'command': 'rtc-reset-reinjection',
    'if': 'TARGET_I386' }
  +##
+# @rtc-inject-irq-broadcast:
+#
+# Inject an RTC interrupt for all existing RTCs on the system.
+# The interrupt forces the guest to synchronize the time with RTC.
+# This is useful after a long stop-cont pause, which is common for
+# serverless-type workload.


In previous version you said:

  > This isn't really related to migration though. Serverless is based
  > on constantly stopping and resuming the VM on e.g. every HTTP
  > request to an endpoint.

Which made some sense. Maybe mention HTTP? And point to that use case
(possibly with QMP commands) in the commit description?


Hmm, maybe it would be helpful for people who don't know what 
serverless means.


How about:
 This is useful after a long stop-const pause, which is common 
for serverless-type workloads,
 e.g. stopping/resuming the VM on every HTTP request to an 
endpoint, which might involve
 a long pause in between the requests, causing time drift in the 
guest.


Please help me understand your workflow. Your management layer call
@stop and @cont QMP commands, is that right?


Yes, that is correct.


@cont will emit a @RESUME event.

If we could listen to QAPI events from C code, we could have the
mc146818rtc device automatically sync on VM resume, and no need for
this async command.


Perhaps? I'm not sure how that would be implemented, but let's see what 
Markus has to say.




I'll let our QAPI expert enlighten me on this :)




Re: [PATCH v4] mc146818rtc: add a way to generate RTC interrupts via QMP

2024-05-29 Thread Daniil Tatianin

On 5/29/24 3:36 PM, Philippe Mathieu-Daudé wrote:


On 29/5/24 14:03, Markus Armbruster wrote:

Daniil Tatianin  writes:


This can be used to force-synchronize the time in guest after a long
stop-cont pause, which can be useful for serverless-type workload.

Also add a comment to highlight the fact that this (and one other QMP
command) only works for the MC146818 RTC controller.

Acked-by: Philippe Mathieu-Daudé 
Signed-off-by: Daniil Tatianin 
---

Changes since v0:
- Rename to rtc-inject-irq to match other similar API
- Add a comment to highlight that this only works for the I386 RTC

Changes since v1:
- Added a description below the QMP command to explain how it can be
   used and what it does.

Changes since v2:
- Add a 'broadcast' suffix.
- Change the comments to explain the flags we're setting.
- Change the command description to fix styling & explain that it's 
a broadcast command.


Changes since v3:
- Fix checkpatch complaints about usage of C99 comments

---
  hw/rtc/mc146818rtc.c | 20 
  include/hw/rtc/mc146818rtc.h |  1 +
  qapi/misc-target.json    | 19 +++
  3 files changed, 40 insertions(+)

diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c
index 3379f92748..96ecd43036 100644
--- a/hw/rtc/mc146818rtc.c
+++ b/hw/rtc/mc146818rtc.c
@@ -107,6 +107,11 @@ static void 
rtc_coalesced_timer_update(MC146818RtcState *s)

  static QLIST_HEAD(, MC146818RtcState) rtc_devices =
  QLIST_HEAD_INITIALIZER(rtc_devices);
  +/*
+ * NOTE:
+ * The two QMP functions below are _only_ implemented for the 
MC146818.

+ * All other RTC devices ignore this.
+ */
  void qmp_rtc_reset_reinjection(Error **errp)
  {
  MC146818RtcState *s;
@@ -116,6 +121,21 @@ void qmp_rtc_reset_reinjection(Error **errp)
  }
  }
  +void qmp_rtc_inject_irq_broadcast(Error **errp)
+{
+    MC146818RtcState *s;
+
+    QLIST_FOREACH(s, _devices, link) {
+    /* Update-ended interrupt enable */
+    s->cmos_data[RTC_REG_B] |= REG_B_UIE;
+
+    /* Interrupt request flag | update interrupt flag */
+    s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF;
+
+    qemu_irq_raise(s->irq);
+    }
+}
+
  static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s)
  {
  kvm_reset_irq_delivered();
diff --git a/include/hw/rtc/mc146818rtc.h 
b/include/hw/rtc/mc146818rtc.h

index 97cec0b3e8..e9dd0f9c72 100644
--- a/include/hw/rtc/mc146818rtc.h
+++ b/include/hw/rtc/mc146818rtc.h
@@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus, 
int base_year,
  void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int 
val);

  int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr);
  void qmp_rtc_reset_reinjection(Error **errp);
+void qmp_rtc_inject_irq_broadcast(Error **errp);
    #endif /* HW_RTC_MC146818RTC_H */
diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 4e0a6492a9..7d388a3753 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -19,6 +19,25 @@
  { 'command': 'rtc-reset-reinjection',
    'if': 'TARGET_I386' }
  +##
+# @rtc-inject-irq-broadcast:
+#
+# Inject an RTC interrupt for all existing RTCs on the system.
+# The interrupt forces the guest to synchronize the time with RTC.
+# This is useful after a long stop-cont pause, which is common for
+# serverless-type workload.


In previous version you said:

  > This isn't really related to migration though. Serverless is based
  > on constantly stopping and resuming the VM on e.g. every HTTP
  > request to an endpoint.

Which made some sense. Maybe mention HTTP? And point to that use case
(possibly with QMP commands) in the commit description?


Hmm, maybe it would be helpful for people who don't know what serverless 
means.


How about:
    This is useful after a long stop-const pause, which is common for 
serverless-type workloads,
    e.g. stopping/resuming the VM on every HTTP request to an endpoint, 
which might involve

    a long pause in between the requests, causing time drift in the guest.


Make that "workloads".

"For all existing RTCs" is a lie.  It's really just all mc146818s.  The
command works as documented only as long as the VM has no other RTCs.


@rtc-mc146818-force-sync-broadcast sounds clearer & safer;
IIUC the command goal, mentioning IRQ injection is irrelevant
(implementation detail).

I like this if Markus is okay with that. If we go with this, would it 
make sense to drop the "Bug" clause?



(I'm trying to not spread the problems we already have with
@rtc-reset-reinjection).


+#
+# Since: 9.1
+#
+# Example:
+#
+# -> { "execute": "rtc-inject-irq-broadcast" }
+# <- { "return": {} }
+#
+##
+{ 'command': 'rtc-inject-irq-broadcast',
+  'if': 'TARGET_I386' }


The conditional kind-of-sort-of ensures "VM has no other RTCs":
TARGET_I386 compiles only this file in hw/rtc/, and therefore can't have
other RTCs (unless they're hiding in some other

Re: [PATCH v4] mc146818rtc: add a way to generate RTC interrupts via QMP

2024-05-29 Thread Daniil Tatianin

Thanks for the review Markus!

I will fix the wording and add a "Bug:" clause for the next revision.

On 5/29/24 3:03 PM, Markus Armbruster wrote:

Daniil Tatianin  writes:


This can be used to force-synchronize the time in guest after a long
stop-cont pause, which can be useful for serverless-type workload.

Also add a comment to highlight the fact that this (and one other QMP
command) only works for the MC146818 RTC controller.

Acked-by: Philippe Mathieu-Daudé 
Signed-off-by: Daniil Tatianin 
---

Changes since v0:
- Rename to rtc-inject-irq to match other similar API
- Add a comment to highlight that this only works for the I386 RTC

Changes since v1:
- Added a description below the QMP command to explain how it can be
   used and what it does.

Changes since v2:
- Add a 'broadcast' suffix.
- Change the comments to explain the flags we're setting.
- Change the command description to fix styling & explain that it's a broadcast 
command.

Changes since v3:
- Fix checkpatch complaints about usage of C99 comments

---
  hw/rtc/mc146818rtc.c | 20 
  include/hw/rtc/mc146818rtc.h |  1 +
  qapi/misc-target.json| 19 +++
  3 files changed, 40 insertions(+)

diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c
index 3379f92748..96ecd43036 100644
--- a/hw/rtc/mc146818rtc.c
+++ b/hw/rtc/mc146818rtc.c
@@ -107,6 +107,11 @@ static void rtc_coalesced_timer_update(MC146818RtcState *s)
  static QLIST_HEAD(, MC146818RtcState) rtc_devices =
  QLIST_HEAD_INITIALIZER(rtc_devices);
  
+/*

+ * NOTE:
+ * The two QMP functions below are _only_ implemented for the MC146818.
+ * All other RTC devices ignore this.
+ */
  void qmp_rtc_reset_reinjection(Error **errp)
  {
  MC146818RtcState *s;
@@ -116,6 +121,21 @@ void qmp_rtc_reset_reinjection(Error **errp)
  }
  }
  
+void qmp_rtc_inject_irq_broadcast(Error **errp)

+{
+MC146818RtcState *s;
+
+QLIST_FOREACH(s, _devices, link) {
+/* Update-ended interrupt enable */
+s->cmos_data[RTC_REG_B] |= REG_B_UIE;
+
+/* Interrupt request flag | update interrupt flag */
+s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF;
+
+qemu_irq_raise(s->irq);
+}
+}
+
  static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s)
  {
  kvm_reset_irq_delivered();
diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.h
index 97cec0b3e8..e9dd0f9c72 100644
--- a/include/hw/rtc/mc146818rtc.h
+++ b/include/hw/rtc/mc146818rtc.h
@@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus, int 
base_year,
  void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int val);
  int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr);
  void qmp_rtc_reset_reinjection(Error **errp);
+void qmp_rtc_inject_irq_broadcast(Error **errp);
  
  #endif /* HW_RTC_MC146818RTC_H */

diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 4e0a6492a9..7d388a3753 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -19,6 +19,25 @@
  { 'command': 'rtc-reset-reinjection',
'if': 'TARGET_I386' }
  
+##

+# @rtc-inject-irq-broadcast:
+#
+# Inject an RTC interrupt for all existing RTCs on the system.
+# The interrupt forces the guest to synchronize the time with RTC.
+# This is useful after a long stop-cont pause, which is common for
+# serverless-type workload.

Make that "workloads".

"For all existing RTCs" is a lie.  It's really just all mc146818s.  The
command works as documented only as long as the VM has no other RTCs.


+#
+# Since: 9.1
+#
+# Example:
+#
+# -> { "execute": "rtc-inject-irq-broadcast" }
+# <- { "return": {} }
+#
+##
+{ 'command': 'rtc-inject-irq-broadcast',
+  'if': 'TARGET_I386' }

The conditional kind-of-sort-of ensures "VM has no other RTCs":
TARGET_I386 compiles only this file in hw/rtc/, and therefore can't have
other RTCs (unless they're hiding in some other directory).  Brittle.

When we move to single binary, we will compile in other RTCs.  How can
we ensure "VM has no nother RTCs" then?  What if one of these other RTCs
can be added with -device or device_add?

When this falls apart because the VM does have other RTCs, it can only
do so silently: the command can't tell us for which RTCs it actually
injected an interrupt.

Documentation making promises the implementation doesn't actually
deliver can only end in tears.  The only reason I'm not rejecting this
patch out of hand is the existing and similarly broken
rtc-reset-reinjection.

I'm willing to reluctantly accept it with honest documentation.
Perhaps: "Bug: RTCs other than mc146818rtc are silently ignored."

Much, much better would be an interface that's actually usable with
multiple RTCs.  We'd have to talk how interrupt injection could be used
with such a machine.

Anything less will likely need to be replaced later on.


+
  ##
  # @SevState:
  #




[PATCH v4] mc146818rtc: add a way to generate RTC interrupts via QMP

2024-05-28 Thread Daniil Tatianin
This can be used to force-synchronize the time in guest after a long
stop-cont pause, which can be useful for serverless-type workload.

Also add a comment to highlight the fact that this (and one other QMP
command) only works for the MC146818 RTC controller.

Acked-by: Philippe Mathieu-Daudé 
Signed-off-by: Daniil Tatianin 
---

Changes since v0:
- Rename to rtc-inject-irq to match other similar API
- Add a comment to highlight that this only works for the I386 RTC

Changes since v1:
- Added a description below the QMP command to explain how it can be
  used and what it does.

Changes since v2:
- Add a 'broadcast' suffix.
- Change the comments to explain the flags we're setting.
- Change the command description to fix styling & explain that it's a broadcast 
command.

Changes since v3:
- Fix checkpatch complaints about usage of C99 comments

---
 hw/rtc/mc146818rtc.c | 20 
 include/hw/rtc/mc146818rtc.h |  1 +
 qapi/misc-target.json| 19 +++
 3 files changed, 40 insertions(+)

diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c
index 3379f92748..96ecd43036 100644
--- a/hw/rtc/mc146818rtc.c
+++ b/hw/rtc/mc146818rtc.c
@@ -107,6 +107,11 @@ static void rtc_coalesced_timer_update(MC146818RtcState *s)
 static QLIST_HEAD(, MC146818RtcState) rtc_devices =
 QLIST_HEAD_INITIALIZER(rtc_devices);
 
+/*
+ * NOTE:
+ * The two QMP functions below are _only_ implemented for the MC146818.
+ * All other RTC devices ignore this.
+ */
 void qmp_rtc_reset_reinjection(Error **errp)
 {
 MC146818RtcState *s;
@@ -116,6 +121,21 @@ void qmp_rtc_reset_reinjection(Error **errp)
 }
 }
 
+void qmp_rtc_inject_irq_broadcast(Error **errp)
+{
+MC146818RtcState *s;
+
+QLIST_FOREACH(s, _devices, link) {
+/* Update-ended interrupt enable */
+s->cmos_data[RTC_REG_B] |= REG_B_UIE;
+
+/* Interrupt request flag | update interrupt flag */
+s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF;
+
+qemu_irq_raise(s->irq);
+}
+}
+
 static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s)
 {
 kvm_reset_irq_delivered();
diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.h
index 97cec0b3e8..e9dd0f9c72 100644
--- a/include/hw/rtc/mc146818rtc.h
+++ b/include/hw/rtc/mc146818rtc.h
@@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus, int 
base_year,
 void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int val);
 int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr);
 void qmp_rtc_reset_reinjection(Error **errp);
+void qmp_rtc_inject_irq_broadcast(Error **errp);
 
 #endif /* HW_RTC_MC146818RTC_H */
diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 4e0a6492a9..7d388a3753 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -19,6 +19,25 @@
 { 'command': 'rtc-reset-reinjection',
   'if': 'TARGET_I386' }
 
+##
+# @rtc-inject-irq-broadcast:
+#
+# Inject an RTC interrupt for all existing RTCs on the system.
+# The interrupt forces the guest to synchronize the time with RTC.
+# This is useful after a long stop-cont pause, which is common for
+# serverless-type workload.
+#
+# Since: 9.1
+#
+# Example:
+#
+# -> { "execute": "rtc-inject-irq-broadcast" }
+# <- { "return": {} }
+#
+##
+{ 'command': 'rtc-inject-irq-broadcast',
+  'if': 'TARGET_I386' }
+
 ##
 # @SevState:
 #
-- 
2.34.1




Re: [PATCH v3] mc146818rtc: add a way to generate RTC interrupts via QMP

2024-05-28 Thread Daniil Tatianin

On 5/27/24 8:01 PM, Philippe Mathieu-Daudé wrote:


Hi Daniil,

On 21/5/24 10:08, Daniil Tatianin wrote:
Could you please take a look at this revision? I think I've taken 
everyone's feedback into account.


Sorry for the delay, I missed your patch since you didn't Cc me
(Markus asked me to look at this).


Oops, my bad for forgetting to Cc you!



Thanks for addressing the previous requests.


Thank you!

On 5/14/24 9:57 AM, Daniil Tatianin wrote:

ping :)
06.05.2024, 11:34, "Daniil Tatianin" :

    This can be used to force-synchronize the time in guest after a 
long

    stop-cont pause, which can be useful for serverless-type workload.

    Also add a comment to highlight the fact that this (and one 
other QMP

    command) only works for the MC146818 RTC controller.

    Signed-off-by: Daniil Tatianin 
    ---

    Changes since v0:
    - Rename to rtc-inject-irq to match other similar API
    - Add a comment to highlight that this only works for the I386 RTC

    Changes since v1:
    - Added a description below the QMP command to explain how it 
can be

      used and what it does.

    Changes since v2:
    - Add a 'broadcast' suffix.
    - Change the comments to explain the flags we're setting.
    - Change the command description to fix styling & explain that
    it's a broadcast command.

    ---
     hw/rtc/mc146818rtc.c | 20 
     include/hw/rtc/mc146818rtc.h | 1 +
     qapi/misc-target.json | 19 +++
     3 files changed, 40 insertions(+)

    diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c
    index 3379f92748..2b3754f5c6 100644
    --- a/hw/rtc/mc146818rtc.c
    +++ b/hw/rtc/mc146818rtc.c
    @@ -107,6 +107,11 @@ static void
    rtc_coalesced_timer_update(MC146818RtcState *s)
     static QLIST_HEAD(, MC146818RtcState) rtc_devices =
     QLIST_HEAD_INITIALIZER(rtc_devices);

    +/*
    + * NOTE:
    + * The two QMP functions below are _only_ implemented for the
    MC146818.
    + * All other RTC devices ignore this.
    + */
     void qmp_rtc_reset_reinjection(Error **errp)
     {
     MC146818RtcState *s;
    @@ -116,6 +121,21 @@ void qmp_rtc_reset_reinjection(Error **errp)
     }
     }

    +void qmp_rtc_inject_irq_broadcast(Error **errp)
    +{
    + MC146818RtcState *s;
    +
    + QLIST_FOREACH(s, _devices, link) {
    + // Update-ended interrupt enable


This doesn't pass the checkpatch script because it isn't QEMU coding
style:
https://www.qemu.org/docs/master/devel/submitting-a-patch.html#use-the-qemu-coding-style 




Looks like // comments are not allowed, will fix.


    + s->cmos_data[RTC_REG_B] |= REG_B_UIE;
    +
    + // Interrupt request flag | update interrupt flag
    + s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF;
    +
    + qemu_irq_raise(s->irq);
    + }
    +}
    +
     static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s)
     {
     kvm_reset_irq_delivered();
    diff --git a/include/hw/rtc/mc146818rtc.h
    b/include/hw/rtc/mc146818rtc.h
    index 97cec0b3e8..e9dd0f9c72 100644
    --- a/include/hw/rtc/mc146818rtc.h
    +++ b/include/hw/rtc/mc146818rtc.h
    @@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus,
    int base_year,
     void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int
    val);
     int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr);
     void qmp_rtc_reset_reinjection(Error **errp);
    +void qmp_rtc_inject_irq_broadcast(Error **errp);

     #endif /* HW_RTC_MC146818RTC_H */
    diff --git a/qapi/misc-target.json b/qapi/misc-target.json
    index 4e0a6492a9..7d388a3753 100644
    --- a/qapi/misc-target.json
    +++ b/qapi/misc-target.json
    @@ -19,6 +19,25 @@
     { 'command': 'rtc-reset-reinjection',
       'if': 'TARGET_I386' }



Your new command doesn't make my life harder than the current
'rtc-reset-reinjection' command, so if this is useful to you,
I'm OK to:
Acked-by: Philippe Mathieu-Daudé 
(assuming the comment style is fixed).

I'll see later how to deal with that with heterogeneous
emulation.

Regards,

Phil.


Thank you!


    +##
    +# @rtc-inject-irq-broadcast:
    +#
    +# Inject an RTC interrupt for all existing RTCs on the system.
    +# The interrupt forces the guest to synchronize the time with RTC.
    +# This is useful after a long stop-cont pause, which is common for
    +# serverless-type workload.
    +#
    +# Since: 9.1
    +#
    +# Example:
    +#
    +# -> { "execute": "rtc-inject-irq-broadcast" }
    +# <- { "return": {} }
    +#
    +##
    +{ 'command': 'rtc-inject-irq-broadcast',
    + 'if': 'TARGET_I386' }
    +
     ##
     # @SevState:
     #

    --
    2.34.1







Re: [PATCH v3] mc146818rtc: add a way to generate RTC interrupts via QMP

2024-05-21 Thread Daniil Tatianin
Could you please take a look at this revision? I think I've taken 
everyone's feedback into account.


Thank you!

On 5/14/24 9:57 AM, Daniil Tatianin wrote:

ping :)
06.05.2024, 11:34, "Daniil Tatianin" :

This can be used to force-synchronize the time in guest after a long
stop-cont pause, which can be useful for serverless-type workload.

Also add a comment to highlight the fact that this (and one other QMP
command) only works for the MC146818 RTC controller.

Signed-off-by: Daniil Tatianin 
---

Changes since v0:
- Rename to rtc-inject-irq to match other similar API
- Add a comment to highlight that this only works for the I386 RTC

Changes since v1:
- Added a description below the QMP command to explain how it can be
  used and what it does.

Changes since v2:
- Add a 'broadcast' suffix.
- Change the comments to explain the flags we're setting.
- Change the command description to fix styling & explain that
it's a broadcast command.

---
 hw/rtc/mc146818rtc.c | 20 
 include/hw/rtc/mc146818rtc.h | 1 +
 qapi/misc-target.json | 19 +++
 3 files changed, 40 insertions(+)

diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c
index 3379f92748..2b3754f5c6 100644
--- a/hw/rtc/mc146818rtc.c
+++ b/hw/rtc/mc146818rtc.c
@@ -107,6 +107,11 @@ static void
rtc_coalesced_timer_update(MC146818RtcState *s)
 static QLIST_HEAD(, MC146818RtcState) rtc_devices =
 QLIST_HEAD_INITIALIZER(rtc_devices);

+/*
+ * NOTE:
+ * The two QMP functions below are _only_ implemented for the
MC146818.
+ * All other RTC devices ignore this.
+ */
 void qmp_rtc_reset_reinjection(Error **errp)
 {
 MC146818RtcState *s;
@@ -116,6 +121,21 @@ void qmp_rtc_reset_reinjection(Error **errp)
 }
 }

+void qmp_rtc_inject_irq_broadcast(Error **errp)
+{
+ MC146818RtcState *s;
+
+ QLIST_FOREACH(s, _devices, link) {
+ // Update-ended interrupt enable
+ s->cmos_data[RTC_REG_B] |= REG_B_UIE;
+
+ // Interrupt request flag | update interrupt flag
+ s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF;
+
+ qemu_irq_raise(s->irq);
+ }
+}
+
 static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s)
 {
 kvm_reset_irq_delivered();
diff --git a/include/hw/rtc/mc146818rtc.h
b/include/hw/rtc/mc146818rtc.h
index 97cec0b3e8..e9dd0f9c72 100644
--- a/include/hw/rtc/mc146818rtc.h
+++ b/include/hw/rtc/mc146818rtc.h
@@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus,
int base_year,
 void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int
val);
 int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr);
 void qmp_rtc_reset_reinjection(Error **errp);
+void qmp_rtc_inject_irq_broadcast(Error **errp);

 #endif /* HW_RTC_MC146818RTC_H */
diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 4e0a6492a9..7d388a3753 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -19,6 +19,25 @@
 { 'command': 'rtc-reset-reinjection',
   'if': 'TARGET_I386' }

+##
+# @rtc-inject-irq-broadcast:
+#
+# Inject an RTC interrupt for all existing RTCs on the system.
+# The interrupt forces the guest to synchronize the time with RTC.
+# This is useful after a long stop-cont pause, which is common for
+# serverless-type workload.
+#
+# Since: 9.1
+#
+# Example:
+#
+# -> { "execute": "rtc-inject-irq-broadcast" }
+# <- { "return": {} }
+#
+##
+{ 'command': 'rtc-inject-irq-broadcast',
+ 'if': 'TARGET_I386' }
+
 ##
 # @SevState:
 #

--
2.34.1


Re: [PATCH v3] mc146818rtc: add a way to generate RTC interrupts via QMP

2024-05-14 Thread Daniil Tatianin
ping :) 06.05.2024, 11:34, "Daniil Tatianin" :This can be used to force-synchronize the time in guest after a longstop-cont pause, which can be useful for serverless-type workload.Also add a comment to highlight the fact that this (and one other QMPcommand) only works for the MC146818 RTC controller.Signed-off-by: Daniil Tatianin <d-tatia...@yandex-team.ru>---Changes since v0:- Rename to rtc-inject-irq to match other similar API- Add a comment to highlight that this only works for the I386 RTCChanges since v1:- Added a description below the QMP command to explain how it can be  used and what it does.Changes since v2:- Add a 'broadcast' suffix.- Change the comments to explain the flags we're setting.- Change the command description to fix styling & explain that it's a broadcast command.--- hw/rtc/mc146818rtc.c | 20  include/hw/rtc/mc146818rtc.h | 1 + qapi/misc-target.json | 19 +++ 3 files changed, 40 insertions(+)diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.cindex 3379f92748..2b3754f5c6 100644--- a/hw/rtc/mc146818rtc.c+++ b/hw/rtc/mc146818rtc.c@@ -107,6 +107,11 @@ static void rtc_coalesced_timer_update(MC146818RtcState *s) static QLIST_HEAD(, MC146818RtcState) rtc_devices = QLIST_HEAD_INITIALIZER(rtc_devices); +/*+ * NOTE:+ * The two QMP functions below are _only_ implemented for the MC146818.+ * All other RTC devices ignore this.+ */ void qmp_rtc_reset_reinjection(Error **errp) { MC146818RtcState *s;@@ -116,6 +121,21 @@ void qmp_rtc_reset_reinjection(Error **errp) } } +void qmp_rtc_inject_irq_broadcast(Error **errp)+{+ MC146818RtcState *s;++ QLIST_FOREACH(s, _devices, link) {+ // Update-ended interrupt enable+ s->cmos_data[RTC_REG_B] |= REG_B_UIE;++ // Interrupt request flag | update interrupt flag+ s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF;++ qemu_irq_raise(s->irq);+ }+}+ static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s) { kvm_reset_irq_delivered();diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.hindex 97cec0b3e8..e9dd0f9c72 100644--- a/include/hw/rtc/mc146818rtc.h+++ b/include/hw/rtc/mc146818rtc.h@@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus, int base_year, void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int val); int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr); void qmp_rtc_reset_reinjection(Error **errp);+void qmp_rtc_inject_irq_broadcast(Error **errp);  #endif /* HW_RTC_MC146818RTC_H */diff --git a/qapi/misc-target.json b/qapi/misc-target.jsonindex 4e0a6492a9..7d388a3753 100644--- a/qapi/misc-target.json+++ b/qapi/misc-target.json@@ -19,6 +19,25 @@ { 'command': 'rtc-reset-reinjection',   'if': 'TARGET_I386' } +##+# @rtc-inject-irq-broadcast:+#+# Inject an RTC interrupt for all existing RTCs on the system.+# The interrupt forces the guest to synchronize the time with RTC.+# This is useful after a long stop-cont pause, which is common for+# serverless-type workload.+#+# Since: 9.1+#+# Example:+#+# -> { "execute": "rtc-inject-irq-broadcast" }+# <- { "return": {} }+#+##+{ 'command': 'rtc-inject-irq-broadcast',+ 'if': 'TARGET_I386' }+ ## # @SevState: #--2.34.1 

[PATCH v3] mc146818rtc: add a way to generate RTC interrupts via QMP

2024-05-06 Thread Daniil Tatianin
This can be used to force-synchronize the time in guest after a long
stop-cont pause, which can be useful for serverless-type workload.

Also add a comment to highlight the fact that this (and one other QMP
command) only works for the MC146818 RTC controller.

Signed-off-by: Daniil Tatianin 
---

Changes since v0:
- Rename to rtc-inject-irq to match other similar API
- Add a comment to highlight that this only works for the I386 RTC

Changes since v1:
- Added a description below the QMP command to explain how it can be
  used and what it does.

Changes since v2:
- Add a 'broadcast' suffix.
- Change the comments to explain the flags we're setting.
- Change the command description to fix styling & explain that it's a broadcast 
command.

---
 hw/rtc/mc146818rtc.c | 20 
 include/hw/rtc/mc146818rtc.h |  1 +
 qapi/misc-target.json| 19 +++
 3 files changed, 40 insertions(+)

diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c
index 3379f92748..2b3754f5c6 100644
--- a/hw/rtc/mc146818rtc.c
+++ b/hw/rtc/mc146818rtc.c
@@ -107,6 +107,11 @@ static void rtc_coalesced_timer_update(MC146818RtcState *s)
 static QLIST_HEAD(, MC146818RtcState) rtc_devices =
 QLIST_HEAD_INITIALIZER(rtc_devices);
 
+/*
+ * NOTE:
+ * The two QMP functions below are _only_ implemented for the MC146818.
+ * All other RTC devices ignore this.
+ */
 void qmp_rtc_reset_reinjection(Error **errp)
 {
 MC146818RtcState *s;
@@ -116,6 +121,21 @@ void qmp_rtc_reset_reinjection(Error **errp)
 }
 }
 
+void qmp_rtc_inject_irq_broadcast(Error **errp)
+{
+MC146818RtcState *s;
+
+QLIST_FOREACH(s, _devices, link) {
+// Update-ended interrupt enable
+s->cmos_data[RTC_REG_B] |= REG_B_UIE;
+
+// Interrupt request flag | update interrupt flag
+s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF;
+
+qemu_irq_raise(s->irq);
+}
+}
+
 static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s)
 {
 kvm_reset_irq_delivered();
diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.h
index 97cec0b3e8..e9dd0f9c72 100644
--- a/include/hw/rtc/mc146818rtc.h
+++ b/include/hw/rtc/mc146818rtc.h
@@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus, int 
base_year,
 void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int val);
 int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr);
 void qmp_rtc_reset_reinjection(Error **errp);
+void qmp_rtc_inject_irq_broadcast(Error **errp);
 
 #endif /* HW_RTC_MC146818RTC_H */
diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 4e0a6492a9..7d388a3753 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -19,6 +19,25 @@
 { 'command': 'rtc-reset-reinjection',
   'if': 'TARGET_I386' }
 
+##
+# @rtc-inject-irq-broadcast:
+#
+# Inject an RTC interrupt for all existing RTCs on the system.
+# The interrupt forces the guest to synchronize the time with RTC.
+# This is useful after a long stop-cont pause, which is common for
+# serverless-type workload.
+#
+# Since: 9.1
+#
+# Example:
+#
+# -> { "execute": "rtc-inject-irq-broadcast" }
+# <- { "return": {} }
+#
+##
+{ 'command': 'rtc-inject-irq-broadcast',
+  'if': 'TARGET_I386' }
+
 ##
 # @SevState:
 #
-- 
2.34.1




Re: [PATCH] mc146818rtc: add a way to generate RTC interrupts via QMP

2024-05-02 Thread Daniil Tatianin

On 4/29/24 4:39 PM, Philippe Mathieu-Daudé wrote:


(+Peter who has more experience on such design).

On 29/4/24 13:32, Markus Armbruster wrote:

Philippe Mathieu-Daudé  writes:


Hi Daniil, Markus,

On 26/4/24 10:39, Markus Armbruster wrote:

Daniil Tatianin  writes:


This can be used to force-synchronize the time in guest after a long
stop-cont pause, which can be useful for serverless-type workload.


What is a "serverless-type workload"?


That's when your VM instance only runs on demand for a few seconds 
before going to sleep again

until the next user request comes in.


Signed-off-by: Daniil Tatianin 
---
   hw/rtc/mc146818rtc.c | 15 +++
   include/hw/rtc/mc146818rtc.h |  1 +
   qapi/misc-target.json    | 16 
   3 files changed, 32 insertions(+)

diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c
index f4c1869232..6980a78d5f 100644
--- a/hw/rtc/mc146818rtc.c
+++ b/hw/rtc/mc146818rtc.c
@@ -116,6 +116,21 @@ void qmp_rtc_reset_reinjection(Error **errp)
   }
   }
   +void qmp_rtc_notify(Error **errp)
+{
+    MC146818RtcState *s;
+
+    /*
+ * See:
+ * 
https://www.kernel.org/doc/Documentation/virtual/kvm/timekeeping.txt


What part of this document explains why this change is required?
I probably missed it. Explaining it here briefly would be more
useful.


Sure, will do!




+ */
+    QLIST_FOREACH(s, _devices, link) {
+    s->cmos_data[RTC_REG_B] |= REG_B_UIE;

  // Update-ended interrupt enable


+    s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF;

  // interrupt request flag
  //   update interrupt flag


+    qemu_irq_raise(s->irq);
+    }
+}
+

Note for later: qmp_rtc_notify() works on all realized mc146818rtc
devices.  Other kinds of RTC devices are silently ignored. Just like
qmp_rtc_reset_reinjection().


IMO to avoid any future ambiguity (in heterogeneous machines), this
command must take a QOM device path (or a list of) and only notify
those.


Let's compare:

• With QOM path:

   · You need to know the machine's RTC device(s).

 Unfortunately, this is bothersome, as the QOM path is not stable.


But we'll need more of that with dynamic machines...


 For Q35, it's generally "/machine/unattached/device[N]/rtc", but N
 varies with configuration (TCG N=2, KVM N=3 for me), and it might
 vary with machine type version.  That's because the machine code
 creates ICH9-LPC without a proper name.  We do that a lot. I hate
 it.

 Likewise for i440FX with PIIX3 instead of ICH9-LPC.

 For isapc, it's /machine/unattached/device[3].  I suspect the 3
 isn't reliable there, either.

 microvm doesn't seem to have an RTC by default.

   · If the device so named doesn't support IRQ inject, the command
 should fail.


Yes, why the management app would want to run this command if there
are not RTC on the machine?


   · Could be generalized to non-RTC devices when that's useful.

• Broadcast:

   · You don't need to know the machine's RTC device(s).

   · If there are multiple RTC devices that support IRQ inject, we 
inject

 for each of them.  There is no way to select specific RTCs.

   · If there is no RTC device that supports IRQ inject, the command 
does

 nothing silently.

 I don't like silent failures.  It could be made to fail instead.

If it wasn't for the unstable QOM path problem, I'd advise against
the broadcast interface.

Thoughts?


Something bugs me in this patch but I couldn't figure out what I am
missing. The issue is when migrated VM is restored. I don't get why
the behavior depends on an external decision (via external management
transport). Don't we have post_load() hooks for such tuning?
This device implements it in rtc_post_load().


This isn't really related to migration though. Serverless is based on 
constantly stopping and resuming the VM on e.g. every HTTP request to an 
endpoint.


As far as Markus' comment goes, I propose we go the broadcast route, and 
just add the -broadcast suffix to the current command name.


If everyone is okay with that, I will submit a v3.

Thank you!



Regards,

Phil.

PD: BTW tomorrow community call could be a good opportunity to discuss
this.





Re: [PATCH v1] mc146818rtc: add a way to generate RTC interrupts via QMP

2024-04-29 Thread Daniil Tatianin

On 4/29/24 12:40 PM, Philippe Mathieu-Daudé wrote:


On 29/4/24 11:34, Daniil Tatianin wrote:

On 4/29/24 11:51 AM, Markus Armbruster wrote:


Daniil Tatianin  writes:


This can be used to force-synchronize the time in guest after a long
stop-cont pause, which can be useful for serverless-type workload.

Also add a comment to highlight the fact that this (and one other QMP
command) only works for the MC146818 RTC controller.

Signed-off-by: Daniil Tatianin 
---

Changes since v0:
- Rename to rtc-inject-irq to match other similar API
- Add a comment to highlight that this only works for the I386 RTC

---
  hw/rtc/mc146818rtc.c | 20 
  include/hw/rtc/mc146818rtc.h |  1 +
  qapi/misc-target.json    | 16 
  3 files changed, 37 insertions(+)

diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c
index f4c1869232..8501b55cbd 100644
--- a/hw/rtc/mc146818rtc.c
+++ b/hw/rtc/mc146818rtc.c
@@ -107,6 +107,11 @@ static void 
rtc_coalesced_timer_update(MC146818RtcState *s)

  static QLIST_HEAD(, MC146818RtcState) rtc_devices =
  QLIST_HEAD_INITIALIZER(rtc_devices);
+/*
+ * NOTE:
+ * The two QMP functions below are _only_ implemented for the 
MC146818.

+ * All other RTC devices ignore this.
+ */
  void qmp_rtc_reset_reinjection(Error **errp)
  {
  MC146818RtcState *s;
@@ -116,6 +121,21 @@ void qmp_rtc_reset_reinjection(Error **errp)
  }
  }
+void qmp_rtc_inject_irq(Error **errp)
+{
+    MC146818RtcState *s;
+
+    /*
+ * See:
+ * 
https://www.kernel.org/doc/Documentation/virtual/kvm/timekeeping.txt

+ */
+    QLIST_FOREACH(s, _devices, link) {
+    s->cmos_data[RTC_REG_B] |= REG_B_UIE;
+    s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF;
+    qemu_irq_raise(s->irq);
+    }
+}
+
  static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s)
  {
  kvm_reset_irq_delivered();
diff --git a/include/hw/rtc/mc146818rtc.h 
b/include/hw/rtc/mc146818rtc.h

index 97cec0b3e8..6cd9761d80 100644
--- a/include/hw/rtc/mc146818rtc.h
+++ b/include/hw/rtc/mc146818rtc.h
@@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus, 
int base_year,
  void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int 
val);

  int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr);
  void qmp_rtc_reset_reinjection(Error **errp);
+void qmp_rtc_inject_irq(Error **errp);
  #endif /* HW_RTC_MC146818RTC_H */
diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 4e0a6492a9..d84a5d07a2 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -19,6 +19,22 @@
  { 'command': 'rtc-reset-reinjection',
    'if': 'TARGET_I386' }
+##
+# @rtc-inject-irq:
+#
+# Inject an RTC interrupt.

Your cover letter explains what this could be good for.  Would it make
sense to explain it here, too?


Sure, sounds useful. I'll add a description in the next version.


Please also see my comments on the previous patch:
https://lore.kernel.org/qemu-devel/11c78645-e87b-4a43-8191-a73540c36...@linaro.org/ 



I think this makes sense, but there's already a similar command, which 
doesn't do it. Should that one be changed as well then? Or do we only 
change the interface for this one?




[PATCH v2] mc146818rtc: add a way to generate RTC interrupts via QMP

2024-04-29 Thread Daniil Tatianin
This can be used to force-synchronize the time in guest after a long
stop-cont pause, which can be useful for serverless-type workload.

Also add a comment to highlight the fact that this (and one other QMP
command) only works for the MC146818 RTC controller.

Signed-off-by: Daniil Tatianin 
---

Changes since v0:
- Rename to rtc-inject-irq to match other similar API
- Add a comment to highlight that this only works for the I386 RTC

Changes since v1:
- Added a description below the QMP command to explain how it can be
  used and what it does.

---
 hw/rtc/mc146818rtc.c | 20 
 include/hw/rtc/mc146818rtc.h |  1 +
 qapi/misc-target.json| 18 ++
 3 files changed, 39 insertions(+)

diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c
index f4c1869232..8501b55cbd 100644
--- a/hw/rtc/mc146818rtc.c
+++ b/hw/rtc/mc146818rtc.c
@@ -107,6 +107,11 @@ static void rtc_coalesced_timer_update(MC146818RtcState *s)
 static QLIST_HEAD(, MC146818RtcState) rtc_devices =
 QLIST_HEAD_INITIALIZER(rtc_devices);
 
+/*
+ * NOTE:
+ * The two QMP functions below are _only_ implemented for the MC146818.
+ * All other RTC devices ignore this.
+ */
 void qmp_rtc_reset_reinjection(Error **errp)
 {
 MC146818RtcState *s;
@@ -116,6 +121,21 @@ void qmp_rtc_reset_reinjection(Error **errp)
 }
 }
 
+void qmp_rtc_inject_irq(Error **errp)
+{
+MC146818RtcState *s;
+
+/*
+ * See:
+ * https://www.kernel.org/doc/Documentation/virtual/kvm/timekeeping.txt
+ */
+QLIST_FOREACH(s, _devices, link) {
+s->cmos_data[RTC_REG_B] |= REG_B_UIE;
+s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF;
+qemu_irq_raise(s->irq);
+}
+}
+
 static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s)
 {
 kvm_reset_irq_delivered();
diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.h
index 97cec0b3e8..6cd9761d80 100644
--- a/include/hw/rtc/mc146818rtc.h
+++ b/include/hw/rtc/mc146818rtc.h
@@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus, int 
base_year,
 void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int val);
 int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr);
 void qmp_rtc_reset_reinjection(Error **errp);
+void qmp_rtc_inject_irq(Error **errp);
 
 #endif /* HW_RTC_MC146818RTC_H */
diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 4e0a6492a9..0f2479f8f4 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -19,6 +19,24 @@
 { 'command': 'rtc-reset-reinjection',
   'if': 'TARGET_I386' }
 
+##
+# @rtc-inject-irq:
+#
+# Inject an RTC interrupt. This command forces the guest to synchornize
+# the time with RTC. This is useful after a long stop-cont pause, which
+# is common for serverless-type workload.
+#
+# Since: 9.1
+#
+# Example:
+#
+# -> { "execute": "rtc-inject-irq" }
+# <- { "return": {} }
+#
+##
+{ 'command': 'rtc-inject-irq',
+  'if': 'TARGET_I386' }
+
 ##
 # @SevState:
 #
-- 
2.34.1




Re: [PATCH v1] mc146818rtc: add a way to generate RTC interrupts via QMP

2024-04-29 Thread Daniil Tatianin

On 4/29/24 11:51 AM, Markus Armbruster wrote:


Daniil Tatianin  writes:


This can be used to force-synchronize the time in guest after a long
stop-cont pause, which can be useful for serverless-type workload.

Also add a comment to highlight the fact that this (and one other QMP
command) only works for the MC146818 RTC controller.

Signed-off-by: Daniil Tatianin 
---

Changes since v0:
- Rename to rtc-inject-irq to match other similar API
- Add a comment to highlight that this only works for the I386 RTC

---
  hw/rtc/mc146818rtc.c | 20 
  include/hw/rtc/mc146818rtc.h |  1 +
  qapi/misc-target.json| 16 
  3 files changed, 37 insertions(+)

diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c
index f4c1869232..8501b55cbd 100644
--- a/hw/rtc/mc146818rtc.c
+++ b/hw/rtc/mc146818rtc.c
@@ -107,6 +107,11 @@ static void rtc_coalesced_timer_update(MC146818RtcState *s)
  static QLIST_HEAD(, MC146818RtcState) rtc_devices =
  QLIST_HEAD_INITIALIZER(rtc_devices);
  
+/*

+ * NOTE:
+ * The two QMP functions below are _only_ implemented for the MC146818.
+ * All other RTC devices ignore this.
+ */
  void qmp_rtc_reset_reinjection(Error **errp)
  {
  MC146818RtcState *s;
@@ -116,6 +121,21 @@ void qmp_rtc_reset_reinjection(Error **errp)
  }
  }
  
+void qmp_rtc_inject_irq(Error **errp)

+{
+MC146818RtcState *s;
+
+/*
+ * See:
+ * https://www.kernel.org/doc/Documentation/virtual/kvm/timekeeping.txt
+ */
+QLIST_FOREACH(s, _devices, link) {
+s->cmos_data[RTC_REG_B] |= REG_B_UIE;
+s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF;
+qemu_irq_raise(s->irq);
+}
+}
+
  static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s)
  {
  kvm_reset_irq_delivered();
diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.h
index 97cec0b3e8..6cd9761d80 100644
--- a/include/hw/rtc/mc146818rtc.h
+++ b/include/hw/rtc/mc146818rtc.h
@@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus, int 
base_year,
  void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int val);
  int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr);
  void qmp_rtc_reset_reinjection(Error **errp);
+void qmp_rtc_inject_irq(Error **errp);
  
  #endif /* HW_RTC_MC146818RTC_H */

diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 4e0a6492a9..d84a5d07a2 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -19,6 +19,22 @@
  { 'command': 'rtc-reset-reinjection',
'if': 'TARGET_I386' }
  
+##

+# @rtc-inject-irq:
+#
+# Inject an RTC interrupt.

Your cover letter explains what this could be good for.  Would it make
sense to explain it here, too?


Sure, sounds useful. I'll add a description in the next version.

Thanks


+#
+# Since: 9.1
+#
+# Example:
+#
+# -> { "execute": "rtc-inject-irq" }
+# <- { "return": {} }
+#
+##
+{ 'command': 'rtc-inject-irq',
+  'if': 'TARGET_I386' }
+
  ##
  # @SevState:
  #




[PATCH v1] mc146818rtc: add a way to generate RTC interrupts via QMP

2024-04-27 Thread Daniil Tatianin
This can be used to force-synchronize the time in guest after a long
stop-cont pause, which can be useful for serverless-type workload.

Also add a comment to highlight the fact that this (and one other QMP
command) only works for the MC146818 RTC controller.

Signed-off-by: Daniil Tatianin 
---

Changes since v0:
- Rename to rtc-inject-irq to match other similar API
- Add a comment to highlight that this only works for the I386 RTC

---
 hw/rtc/mc146818rtc.c | 20 
 include/hw/rtc/mc146818rtc.h |  1 +
 qapi/misc-target.json| 16 
 3 files changed, 37 insertions(+)

diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c
index f4c1869232..8501b55cbd 100644
--- a/hw/rtc/mc146818rtc.c
+++ b/hw/rtc/mc146818rtc.c
@@ -107,6 +107,11 @@ static void rtc_coalesced_timer_update(MC146818RtcState *s)
 static QLIST_HEAD(, MC146818RtcState) rtc_devices =
 QLIST_HEAD_INITIALIZER(rtc_devices);
 
+/*
+ * NOTE:
+ * The two QMP functions below are _only_ implemented for the MC146818.
+ * All other RTC devices ignore this.
+ */
 void qmp_rtc_reset_reinjection(Error **errp)
 {
 MC146818RtcState *s;
@@ -116,6 +121,21 @@ void qmp_rtc_reset_reinjection(Error **errp)
 }
 }
 
+void qmp_rtc_inject_irq(Error **errp)
+{
+MC146818RtcState *s;
+
+/*
+ * See:
+ * https://www.kernel.org/doc/Documentation/virtual/kvm/timekeeping.txt
+ */
+QLIST_FOREACH(s, _devices, link) {
+s->cmos_data[RTC_REG_B] |= REG_B_UIE;
+s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF;
+qemu_irq_raise(s->irq);
+}
+}
+
 static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s)
 {
 kvm_reset_irq_delivered();
diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.h
index 97cec0b3e8..6cd9761d80 100644
--- a/include/hw/rtc/mc146818rtc.h
+++ b/include/hw/rtc/mc146818rtc.h
@@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus, int 
base_year,
 void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int val);
 int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr);
 void qmp_rtc_reset_reinjection(Error **errp);
+void qmp_rtc_inject_irq(Error **errp);
 
 #endif /* HW_RTC_MC146818RTC_H */
diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 4e0a6492a9..d84a5d07a2 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -19,6 +19,22 @@
 { 'command': 'rtc-reset-reinjection',
   'if': 'TARGET_I386' }
 
+##
+# @rtc-inject-irq:
+#
+# Inject an RTC interrupt.
+#
+# Since: 9.1
+#
+# Example:
+#
+# -> { "execute": "rtc-inject-irq" }
+# <- { "return": {} }
+#
+##
+{ 'command': 'rtc-inject-irq',
+  'if': 'TARGET_I386' }
+
 ##
 # @SevState:
 #
-- 
2.34.1




Re: [PATCH] mc146818rtc: add a way to generate RTC interrupts via QMP

2024-04-26 Thread Daniil Tatianin

On 4/26/24 11:39 AM, Markus Armbruster wrote:


Daniil Tatianin  writes:


This can be used to force-synchronize the time in guest after a long
stop-cont pause, which can be useful for serverless-type workload.

Signed-off-by: Daniil Tatianin 
---
  hw/rtc/mc146818rtc.c | 15 +++
  include/hw/rtc/mc146818rtc.h |  1 +
  qapi/misc-target.json| 16 
  3 files changed, 32 insertions(+)

diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c
index f4c1869232..6980a78d5f 100644
--- a/hw/rtc/mc146818rtc.c
+++ b/hw/rtc/mc146818rtc.c
@@ -116,6 +116,21 @@ void qmp_rtc_reset_reinjection(Error **errp)
  }
  }
  
+void qmp_rtc_notify(Error **errp)

+{
+MC146818RtcState *s;
+
+/*
+ * See:
+ * https://www.kernel.org/doc/Documentation/virtual/kvm/timekeeping.txt
+ */
+QLIST_FOREACH(s, _devices, link) {
+s->cmos_data[RTC_REG_B] |= REG_B_UIE;
+s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF;
+qemu_irq_raise(s->irq);
+}
+}
+

Note for later: qmp_rtc_notify() works on all realized mc146818rtc
devices.  Other kinds of RTC devices are silently ignored.  Just like
qmp_rtc_reset_reinjection().


  static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s)
  {
  kvm_reset_irq_delivered();
diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.h
index 97cec0b3e8..5229dffbbd 100644
--- a/include/hw/rtc/mc146818rtc.h
+++ b/include/hw/rtc/mc146818rtc.h
@@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus, int 
base_year,
  void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int val);
  int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr);
  void qmp_rtc_reset_reinjection(Error **errp);
+void qmp_rtc_notify(Error **errp);
  
  #endif /* HW_RTC_MC146818RTC_H */

diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 4e0a6492a9..20457b0acc 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -19,6 +19,22 @@

##
# @rtc-reset-reinjection:
#
# This command will reset the RTC interrupt reinjection backlog.  Can
# be used if another mechanism to synchronize guest time is in effect,
# for example QEMU guest agent's guest-set-time command.
#
# Since: 2.1
#
# Example:
#
# -> { "execute": "rtc-reset-reinjection" }
# <- { "return": {} }
##

  { 'command': 'rtc-reset-reinjection',
'if': 'TARGET_I386' }
  
+##

+# @rtc-notify:
+#
+# Generate an RTC interrupt.

Our QMP command to generate NMIs is called inject-nmi.  Call this one
inject-rtc-irq for consistency?  rtc-inject-irq?


This makes sense, I'll rename in the next version. Thanks.


+#
+# Since: 9.1
+#
+# Example:
+#
+# -> { "execute": "rtc-notify" }
+# <- { "return": {} }
+#
+##
+{ 'command': 'rtc-notify',
+  'if': 'TARGET_I386' }
+

As noted above, both commands silently ignore RTCs other than
mc146818rtc.

They're only available with TARGET_I386.

As long as all machines available with TARGET_I386 can only ever contain
mc146818rtc RTCs, ignoring other RTCs is a non-problem in practice.

Feels a bit fragile to me.  Thoughts?


Feels a bit fragile indeed, but this code has been there since 2.1, and 
I guess no one really found this to be a problem.



  ##
  # @SevState:
  #




[PATCH] mc146818rtc: add a way to generate RTC interrupts via QMP

2024-04-25 Thread Daniil Tatianin
This can be used to force-synchronize the time in guest after a long
stop-cont pause, which can be useful for serverless-type workload.

Signed-off-by: Daniil Tatianin 
---
 hw/rtc/mc146818rtc.c | 15 +++
 include/hw/rtc/mc146818rtc.h |  1 +
 qapi/misc-target.json| 16 
 3 files changed, 32 insertions(+)

diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c
index f4c1869232..6980a78d5f 100644
--- a/hw/rtc/mc146818rtc.c
+++ b/hw/rtc/mc146818rtc.c
@@ -116,6 +116,21 @@ void qmp_rtc_reset_reinjection(Error **errp)
 }
 }
 
+void qmp_rtc_notify(Error **errp)
+{
+MC146818RtcState *s;
+
+/*
+ * See:
+ * https://www.kernel.org/doc/Documentation/virtual/kvm/timekeeping.txt
+ */
+QLIST_FOREACH(s, _devices, link) {
+s->cmos_data[RTC_REG_B] |= REG_B_UIE;
+s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF;
+qemu_irq_raise(s->irq);
+}
+}
+
 static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s)
 {
 kvm_reset_irq_delivered();
diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.h
index 97cec0b3e8..5229dffbbd 100644
--- a/include/hw/rtc/mc146818rtc.h
+++ b/include/hw/rtc/mc146818rtc.h
@@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus, int 
base_year,
 void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int val);
 int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr);
 void qmp_rtc_reset_reinjection(Error **errp);
+void qmp_rtc_notify(Error **errp);
 
 #endif /* HW_RTC_MC146818RTC_H */
diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 4e0a6492a9..20457b0acc 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -19,6 +19,22 @@
 { 'command': 'rtc-reset-reinjection',
   'if': 'TARGET_I386' }
 
+##
+# @rtc-notify:
+#
+# Generate an RTC interrupt.
+#
+# Since: 9.1
+#
+# Example:
+#
+# -> { "execute": "rtc-notify" }
+# <- { "return": {} }
+#
+##
+{ 'command': 'rtc-notify',
+  'if': 'TARGET_I386' }
+
 ##
 # @SevState:
 #
-- 
2.34.1




Re: [PATCH v2 1/3] i386/a-b-bootblock: factor test memory addresses out into constants

2023-10-02 Thread Daniil Tatianin
Ping! :) 19.09.2023, 13:23, "Daniil Tatianin" :So that we have less magic numbers to deal with. This also allows us toreuse these in the following commits.Signed-off-by: Daniil Tatianin <d-tatia...@yandex-team.ru>Reviewed-by: Peter Xu <pet...@redhat.com>--- tests/migration/i386/a-b-bootblock.S | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-)diff --git a/tests/migration/i386/a-b-bootblock.S b/tests/migration/i386/a-b-bootblock.Sindex 3d464c7568..036216e4a7 100644--- a/tests/migration/i386/a-b-bootblock.S+++ b/tests/migration/i386/a-b-bootblock.S@@ -34,6 +34,10 @@ start: # at 0x7c00 ? mov $16,%eax mov %eax,%ds +# Start from 1MB+.set TEST_MEM_START, (1024*1024)+.set TEST_MEM_END, (100*1024*1024)+ mov $65,%ax mov $0x3f8,%dx outb %al,%dx@@ -41,12 +45,11 @@ start: # at 0x7c00 ? # bl keeps a counter so we limit the output speed mov $0, %bl mainloop:- # Start from 1MB- mov $(1024*1024),%eax+ mov $TEST_MEM_START,%eax innerloop: incb (%eax) add $4096,%eax- cmp $(100*1024*1024),%eax+ cmp $TEST_MEM_END,%eax jl innerloop  inc %bl--2.34.1 

Re: [PATCH 0/2] i386/a-b-bootblock: zero the first byte of each page on start

2023-09-26 Thread Daniil Tatianin
27.09.2023, 00:02, "Peter Xu" :On Thu, Sep 07, 2023 at 10:29:42PM +0300, Daniil Tatianin wrote: This series fixes an issue where the outcome of the migration qtest relies on the initial memory contents all being the same across the first 100MiB of RAM, which is a very fragile invariant.  We fix this by making sure we zero the first byte of every testable page in range beforehand.What's the difference between this one and:https://lore.kernel.org/r/20230907193051.1609310-1-d-tatia...@yandex-team.ru?There's a v2 version of this series that adds a fix for a similar issue in s390x as well, please look at that one instead.Thanks, --Peter Xu 



Re: [PATCH 2/2] i386/a-b-bootblock: zero the first byte of each page on start

2023-09-26 Thread Daniil Tatianin
26.09.2023, 23:41, "Vladimir Sementsov-Ogievskiy" :On 07.09.23 22:29, Daniil Tatianin wrote: The migration qtest all the way up to this point used to work by sheer luck relying on the contents of all pages from 1MiB to 100MiB to contain the same one value in the first byte initially.  This easily breaks if we reduce the amount of RAM for the test instances from 150MiB to e.g 110MiB since that makes SeaBIOS dirty some of the pages starting at about 0x5dd2000 (~93 MiB) as it reuses those for the HighMemory allocator since commit dc88f9b72df ("malloc: use large ZoneHigh when there is enough memory").  This would result in the following errors:  12/60 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test ERROR 2.74s killed by signal 6 SIGABRT  stderr:  Memory content inconsistency at 5dd2000 first_byte = cc last_byte = cb current = 9e hit_edge = 1  Memory content inconsistency at 5dd3000 first_byte = cc last_byte = cb current = 89 hit_edge = 1  Memory content inconsistency at 5dd4000 first_byte = cc last_byte = cb current = 23 hit_edge = 1  Memory content inconsistency at 5dd5000 first_byte = cc last_byte = cb current = 31 hit_edge = 1  Memory content inconsistency at 5dd6000 first_byte = cc last_byte = cb current = 70 hit_edge = 1  Memory content inconsistency at 5dd7000 first_byte = cc last_byte = cb current = ff hit_edge = 1  Memory content inconsistency at 5dd8000 first_byte = cc last_byte = cb current = 54 hit_edge = 1  Memory content inconsistency at 5dd9000 first_byte = cc last_byte = cb current = 64 hit_edge = 1  Memory content inconsistency at 5dda000 first_byte = cc last_byte = cb current = 1d hit_edge = 1  Memory content inconsistency at 5ddb000 first_byte = cc last_byte = cb current = 1a hit_edge = 1  and in another 26 pages**  ERROR:../tests/qtest/migration-test.c:300:check_guests_ram: assertion failed: (bad == 0)  Fix this by always zeroing the first byte of each page in the range so that we get consistent results no matter the initial contents.  Fixes: ea0c6d62391 ("test: Postcopy") Signed-off-by: Daniil Tatianin <d-tatia...@yandex-team.ru> ---   tests/migration/i386/a-b-bootblock.S | 9 +   tests/migration/i386/a-b-bootblock.h | 16    2 files changed, 17 insertions(+), 8 deletions(-)  diff --git a/tests/migration/i386/a-b-bootblock.S b/tests/migration/i386/a-b-bootblock.S index 036216e4a7..6bbd60 100644 --- a/tests/migration/i386/a-b-bootblock.S +++ b/tests/migration/i386/a-b-bootblock.S @@ -44,6 +44,15 @@ start: # at 0x7c00 ?  # bl keeps a counter so we limit the output speed   mov $0, %bl + +pre_zero: + mov $TEST_MEM_START,%eax +do_zero: + movb $0, (%eax) + add $4096,%eax + cmp $TEST_MEM_END,%eax + jl do_zero +   mainloop:   mov $TEST_MEM_START,%eax   innerloop: diff --git a/tests/migration/i386/a-b-bootblock.h b/tests/migration/i386/a-b-bootblock.h index b7b0fce2ee..5b523917ce 100644 --- a/tests/migration/i386/a-b-bootblock.h +++ b/tests/migration/i386/a-b-bootblock.h @@ -4,18 +4,18 @@* the header and the assembler differences in your patch submission.*/   unsigned char x86_bootsect[] = { - 0xfa, 0x0f, 0x01, 0x16, 0x78, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00, + 0xfa, 0x0f, 0x01, 0x16, 0x8c, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00, 0x0f, 0x22, 0xc0, 0x66, 0xea, 0x20, 0x7c, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe4, 0x92, 0x0c, 0x02, 0xe6, 0x92, 0xb8, 0x10, 0x00, 0x00, 0x00, 0x8e, 0xd8, 0x66, 0xb8, 0x41, 0x00, 0x66, 0xba, 0xf8, 0x03, 0xee, 0xb3, 0x00, 0xb8, 0x00, 0x00, 0x10, - 0x00, 0xfe, 0x00, 0x05, 0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, 0x40, - 0x06, 0x7c, 0xf2, 0xfe, 0xc3, 0x80, 0xe3, 0x3f, 0x75, 0xe6, 0x66, 0xb8, - 0x42, 0x00, 0x66, 0xba, 0xf8, 0x03, 0xee, 0xeb, 0xdb, 0x8d, 0x76, 0x00, - 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x00, 0x00, - 0x00, 0x9a, 0xcf, 0x00, 0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00, - 0x27, 0x00, 0x60, 0x7c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, - 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0xc6, 0x00, 0x00, 0x05, 0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, + 0x40, 0x06, 0x7c, 0xf1, 0xb8, 0x00, 0x00, 0x10, 0x00, 0xfe, 0x00, 0x05, + 0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, 0x40, 0x06, 0x7c, 0xf2, 0xfe, + 0xc3, 0x80, 0xe3, 0x3f, 0x75, 0xe6, 0x66, 0xb8, 0x42, 0x00, 0x66, 0xba, + 0xf8, 0x03, 0xee, 0xeb, 0xdb, 0x8d, 0x76, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x00, 0x00, 0x00, 0x9a, 0xcf, 0x00, + 0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00, 0x27, 0x00, 0x74, 0x7c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,I understand the idea of patch, but don't follow why and how this bo

Re: [PATCH 0/2] i386/a-b-bootblock: zero the first byte of each page on start

2023-09-26 Thread Daniil Tatianin
27.09.2023, 00:02, "Peter Xu" :On Thu, Sep 07, 2023 at 10:29:42PM +0300, Daniil Tatianin wrote: This series fixes an issue where the outcome of the migration qtest relies on the initial memory contents all being the same across the first 100MiB of RAM, which is a very fragile invariant.  We fix this by making sure we zero the first byte of every testable page in range beforehand.What's the difference between this one and:https://lore.kernel.org/r/20230907193051.1609310-1-d-tatia...@yandex-team.ru?I think i initially forgot something and had to resend, but not sure. Anyway, disregard that one. --Peter Xu 



Re: [PATCH v2 0/3] migration-qtest: zero the first byte of each page on start

2023-09-26 Thread Daniil Tatianin
ping :)



[PATCH v2 0/3] migration-qtest: zero the first byte of each page on start

2023-09-19 Thread Daniil Tatianin
This series fixes an issue where the outcome of the migration qtest
relies on the initial memory contents all being the same across the
first 100MiB of RAM, which is a very fragile invariant.
 
We fix this by making sure we zero the first byte of every testable page
in range beforehand.

Changes since v1:
- Add a fix for the s390x test binary as well as suggested by Peter Xu

Daniil Tatianin (3):
  i386/a-b-bootblock: factor test memory addresses out into constants
  i386/a-b-bootblock: zero the first byte of each page on start
  s390x/a-b-bios: zero the first byte of each page on start

 tests/migration/i386/a-b-bootblock.S |  18 +-
 tests/migration/i386/a-b-bootblock.h |  16 +-
 tests/migration/s390x/a-b-bios.c |   8 +
 tests/migration/s390x/a-b-bios.h | 380 ++-
 4 files changed, 234 insertions(+), 188 deletions(-)

-- 
2.34.1




[PATCH v2 2/3] i386/a-b-bootblock: zero the first byte of each page on start

2023-09-19 Thread Daniil Tatianin
The migration qtest all the way up to this point used to work by sheer
luck relying on the contents of all pages from 1MiB to 100MiB to contain
the same one value in the first byte initially.

This easily breaks if we reduce the amount of RAM for the test instances
from 150MiB to e.g 110MiB since that makes SeaBIOS dirty some of the
pages starting at about 0x5dd2000 (~93 MiB) as it reuses those for the
HighMemory allocator since commit dc88f9b72df ("malloc: use large
ZoneHigh when there is enough memory").

This would result in the following errors:
12/60 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test 
ERROR   2.74s   killed by signal 6 SIGABRT
stderr:
Memory content inconsistency at 5dd2000 first_byte = cc last_byte = cb 
current = 9e hit_edge = 1
Memory content inconsistency at 5dd3000 first_byte = cc last_byte = cb 
current = 89 hit_edge = 1
Memory content inconsistency at 5dd4000 first_byte = cc last_byte = cb 
current = 23 hit_edge = 1
Memory content inconsistency at 5dd5000 first_byte = cc last_byte = cb 
current = 31 hit_edge = 1
Memory content inconsistency at 5dd6000 first_byte = cc last_byte = cb 
current = 70 hit_edge = 1
Memory content inconsistency at 5dd7000 first_byte = cc last_byte = cb 
current = ff hit_edge = 1
Memory content inconsistency at 5dd8000 first_byte = cc last_byte = cb 
current = 54 hit_edge = 1
Memory content inconsistency at 5dd9000 first_byte = cc last_byte = cb 
current = 64 hit_edge = 1
Memory content inconsistency at 5dda000 first_byte = cc last_byte = cb 
current = 1d hit_edge = 1
Memory content inconsistency at 5ddb000 first_byte = cc last_byte = cb 
current = 1a hit_edge = 1
and in another 26 pages**
ERROR:../tests/qtest/migration-test.c:300:check_guests_ram: assertion 
failed: (bad == 0)

Fix this by always zeroing the first byte of each page in the range so
that we get consistent results no matter the initial contents.

Fixes: ea0c6d62391 ("test: Postcopy")
Signed-off-by: Daniil Tatianin 
Reviewed-by: Peter Xu 
---
 tests/migration/i386/a-b-bootblock.S |  9 +
 tests/migration/i386/a-b-bootblock.h | 16 
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/tests/migration/i386/a-b-bootblock.S 
b/tests/migration/i386/a-b-bootblock.S
index 036216e4a7..6bbd60 100644
--- a/tests/migration/i386/a-b-bootblock.S
+++ b/tests/migration/i386/a-b-bootblock.S
@@ -44,6 +44,15 @@ start: # at 0x7c00 ?
 
 # bl keeps a counter so we limit the output speed
 mov $0, %bl
+
+pre_zero:
+mov $TEST_MEM_START,%eax
+do_zero:
+movb $0, (%eax)
+add $4096,%eax
+cmp $TEST_MEM_END,%eax
+jl do_zero
+
 mainloop:
 mov $TEST_MEM_START,%eax
 innerloop:
diff --git a/tests/migration/i386/a-b-bootblock.h 
b/tests/migration/i386/a-b-bootblock.h
index b7b0fce2ee..5b523917ce 100644
--- a/tests/migration/i386/a-b-bootblock.h
+++ b/tests/migration/i386/a-b-bootblock.h
@@ -4,18 +4,18 @@
  * the header and the assembler differences in your patch submission.
  */
 unsigned char x86_bootsect[] = {
-  0xfa, 0x0f, 0x01, 0x16, 0x78, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00,
+  0xfa, 0x0f, 0x01, 0x16, 0x8c, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00,
   0x0f, 0x22, 0xc0, 0x66, 0xea, 0x20, 0x7c, 0x00, 0x00, 0x08, 0x00, 0x00,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe4, 0x92, 0x0c, 0x02,
   0xe6, 0x92, 0xb8, 0x10, 0x00, 0x00, 0x00, 0x8e, 0xd8, 0x66, 0xb8, 0x41,
   0x00, 0x66, 0xba, 0xf8, 0x03, 0xee, 0xb3, 0x00, 0xb8, 0x00, 0x00, 0x10,
-  0x00, 0xfe, 0x00, 0x05, 0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, 0x40,
-  0x06, 0x7c, 0xf2, 0xfe, 0xc3, 0x80, 0xe3, 0x3f, 0x75, 0xe6, 0x66, 0xb8,
-  0x42, 0x00, 0x66, 0xba, 0xf8, 0x03, 0xee, 0xeb, 0xdb, 0x8d, 0x76, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x00, 0x00,
-  0x00, 0x9a, 0xcf, 0x00, 0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00,
-  0x27, 0x00, 0x60, 0x7c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+  0x00, 0xc6, 0x00, 0x00, 0x05, 0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00,
+  0x40, 0x06, 0x7c, 0xf1, 0xb8, 0x00, 0x00, 0x10, 0x00, 0xfe, 0x00, 0x05,
+  0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, 0x40, 0x06, 0x7c, 0xf2, 0xfe,
+  0xc3, 0x80, 0xe3, 0x3f, 0x75, 0xe6, 0x66, 0xb8, 0x42, 0x00, 0x66, 0xba,
+  0xf8, 0x03, 0xee, 0xeb, 0xdb, 0x8d, 0x76, 0x00, 0x00, 0x00, 0x00, 0x00,
+  0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x00, 0x00, 0x00, 0x9a, 0xcf, 0x00,
+  0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00, 0x27, 0x00, 0x74, 0x7c,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-- 
2.34.1




[PATCH v2 3/3] s390x/a-b-bios: zero the first byte of each page on start

2023-09-19 Thread Daniil Tatianin
Same as with the x86 verison of this test, we relied on the contents of
all pages in RAM to be the same across the entire test range, which is
very fragile. Zero the first byte of each page before running the
increment loop to fix this.

Fixes: 5571dc824b ("tests/migration: Enable the migration test on s390x, too")
Signed-off-by: Daniil Tatianin 
---
 tests/migration/s390x/a-b-bios.c |   8 +
 tests/migration/s390x/a-b-bios.h | 380 +--
 2 files changed, 211 insertions(+), 177 deletions(-)

diff --git a/tests/migration/s390x/a-b-bios.c b/tests/migration/s390x/a-b-bios.c
index a0327cd153..ff99a3ef57 100644
--- a/tests/migration/s390x/a-b-bios.c
+++ b/tests/migration/s390x/a-b-bios.c
@@ -27,6 +27,14 @@ void main(void)
 sclp_setup();
 sclp_print("A");
 
+/*
+ * Make sure all of the pages have consistent contents before incrementing
+ * the first byte below.
+ */
+for (addr = START_ADDRESS; addr < END_ADDRESS; addr += 4096) {
+*(volatile char *)addr = 0;
+}
+
 while (1) {
 for (addr = START_ADDRESS; addr < END_ADDRESS; addr += 4096) {
 *(volatile char *)addr += 1;  /* Change pages */
diff --git a/tests/migration/s390x/a-b-bios.h b/tests/migration/s390x/a-b-bios.h
index e722dc7c40..96103dadbb 100644
--- a/tests/migration/s390x/a-b-bios.h
+++ b/tests/migration/s390x/a-b-bios.h
@@ -6,10 +6,10 @@
 unsigned char s390x_elf[] = {
   0x7f, 0x45, 0x4c, 0x46, 0x02, 0x02, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x03, 0x00, 0x16, 0x00, 0x00, 0x00, 0x01,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02, 0x78, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x08, 0x80,
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02, 0xa8, 0x00, 0x00, 0x00, 0x00,
+  0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x09, 0x80,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x38, 0x00, 0x07, 0x00, 0x40,
-  0x00, 0x0c, 0x00, 0x0b, 0x00, 0x00, 0x00, 0x06, 0x00, 0x00, 0x00, 0x04,
+  0x00, 0x0d, 0x00, 0x0c, 0x00, 0x00, 0x00, 0x06, 0x00, 0x00, 0x00, 0x04,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00,
   0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x88, 0x00, 0x00, 0x00, 0x00,
@@ -21,140 +21,154 @@ unsigned char s390x_elf[] = {
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01,
   0x00, 0x00, 0x00, 0x05, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x07, 0x0c,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x07, 0x0c, 0x00, 0x00, 0x00, 0x00,
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x07, 0xac,
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x07, 0xac, 0x00, 0x00, 0x00, 0x00,
   0x00, 0x00, 0x10, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x06,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x07, 0x10, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x17, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x17, 0x10,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe8, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x98, 0xf0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x00,
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x07, 0xb8, 0x00, 0x00, 0x00, 0x00,
+  0x00, 0x00, 0x17, 0xb8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x17, 0xb8,
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x38, 0x00, 0x00, 0x00, 0x00,
+  0x00, 0x01, 0x18, 0x48, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x00,
   0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x06, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x07, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x17, 0x10,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x17, 0x10, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0xd0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xd0,
+  0x00, 0x00, 0x07, 0xb8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x17, 0xb8,
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x17, 0xb8, 0x00, 0x00, 0x00, 0x00,
+  0x00, 0x00, 0x01, 0x20, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x20,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x08, 0x64, 0x74, 0xe5, 0x51,
   0x00, 0x00, 0x00, 0x07, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
   0x00, 0x00, 0x00, 0x10, 0x64, 0x74, 0xe5, 0x52, 0x00, 0x00, 0x00, 0x04,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x07, 0x10, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x17, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x17, 0x10,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xd0, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0xd0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01,
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x07, 0xb8, 0x00, 0x00, 0x00, 0x00,
+  0x00, 0x00, 0x17, 0xb8, 0x00, 0x

[PATCH v2 1/3] i386/a-b-bootblock: factor test memory addresses out into constants

2023-09-19 Thread Daniil Tatianin
So that we have less magic numbers to deal with. This also allows us to
reuse these in the following commits.

Signed-off-by: Daniil Tatianin 
Reviewed-by: Peter Xu 
---
 tests/migration/i386/a-b-bootblock.S | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/tests/migration/i386/a-b-bootblock.S 
b/tests/migration/i386/a-b-bootblock.S
index 3d464c7568..036216e4a7 100644
--- a/tests/migration/i386/a-b-bootblock.S
+++ b/tests/migration/i386/a-b-bootblock.S
@@ -34,6 +34,10 @@ start: # at 0x7c00 ?
 mov $16,%eax
 mov %eax,%ds
 
+# Start from 1MB
+.set TEST_MEM_START, (1024*1024)
+.set TEST_MEM_END, (100*1024*1024)
+
 mov $65,%ax
 mov $0x3f8,%dx
 outb %al,%dx
@@ -41,12 +45,11 @@ start: # at 0x7c00 ?
 # bl keeps a counter so we limit the output speed
 mov $0, %bl
 mainloop:
-# Start from 1MB
-mov $(1024*1024),%eax
+mov $TEST_MEM_START,%eax
 innerloop:
 incb (%eax)
 add $4096,%eax
-cmp $(100*1024*1024),%eax
+cmp $TEST_MEM_END,%eax
 jl innerloop
 
 inc %bl
-- 
2.34.1




Re: [PATCH v1 0/2] i386/a-b-bootblock: zero the first byte of each page on start

2023-09-13 Thread Daniil Tatianin
ping 07.09.2023, 22:31, "Daniil Tatianin" :This series fixes an issue where the outcome of the migration qtestrelies on the initial memory contents all being the same across thefirst 100MiB of RAM, which is a very fragile invariant.We fix this by making sure we zero the first byte of every testable pagein range beforehand.Daniil Tatianin (2):  i386/a-b-bootblock: factor test memory addresses out into constants  i386/a-b-bootblock: zero the first byte of each page on start tests/migration/i386/a-b-bootblock.S | 18 +++--- tests/migration/i386/a-b-bootblock.h | 16  2 files changed, 23 insertions(+), 11 deletions(-) --2.34.1 



[PATCH v1 0/2] i386/a-b-bootblock: zero the first byte of each page on start

2023-09-07 Thread Daniil Tatianin
This series fixes an issue where the outcome of the migration qtest
relies on the initial memory contents all being the same across the
first 100MiB of RAM, which is a very fragile invariant.

We fix this by making sure we zero the first byte of every testable page
in range beforehand.

Daniil Tatianin (2):
  i386/a-b-bootblock: factor test memory addresses out into constants
  i386/a-b-bootblock: zero the first byte of each page on start

 tests/migration/i386/a-b-bootblock.S | 18 +++---
 tests/migration/i386/a-b-bootblock.h | 16 
 2 files changed, 23 insertions(+), 11 deletions(-)

-- 
2.34.1




[PATCH v1 2/2] i386/a-b-bootblock: zero the first byte of each page on start

2023-09-07 Thread Daniil Tatianin
The migration qtest all the way up to this point used to work by sheer
luck relying on the contents of all pages from 1MiB to 100MiB to contain
the same one value in the first byte initially.

This easily breaks if we reduce the amount of RAM for the test instances
from 150MiB to e.g 110MiB since that makes SeaBIOS dirty some of the
pages starting at about 0x5dd2000 (~93 MiB) as it reuses those for the
HighMemory allocator since commit dc88f9b72df ("malloc: use large
ZoneHigh when there is enough memory").

This would result in the following errors:
12/60 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test 
ERROR   2.74s   killed by signal 6 SIGABRT
stderr:
Memory content inconsistency at 5dd2000 first_byte = cc last_byte = cb 
current = 9e hit_edge = 1
Memory content inconsistency at 5dd3000 first_byte = cc last_byte = cb 
current = 89 hit_edge = 1
Memory content inconsistency at 5dd4000 first_byte = cc last_byte = cb 
current = 23 hit_edge = 1
Memory content inconsistency at 5dd5000 first_byte = cc last_byte = cb 
current = 31 hit_edge = 1
Memory content inconsistency at 5dd6000 first_byte = cc last_byte = cb 
current = 70 hit_edge = 1
Memory content inconsistency at 5dd7000 first_byte = cc last_byte = cb 
current = ff hit_edge = 1
Memory content inconsistency at 5dd8000 first_byte = cc last_byte = cb 
current = 54 hit_edge = 1
Memory content inconsistency at 5dd9000 first_byte = cc last_byte = cb 
current = 64 hit_edge = 1
Memory content inconsistency at 5dda000 first_byte = cc last_byte = cb 
current = 1d hit_edge = 1
Memory content inconsistency at 5ddb000 first_byte = cc last_byte = cb 
current = 1a hit_edge = 1
and in another 26 pages**
ERROR:../tests/qtest/migration-test.c:300:check_guests_ram: assertion 
failed: (bad == 0)

Fix this by always zeroing the first byte of each page in the range so
that we get consistent results no matter the initial contents.

Fixes: ea0c6d62391 ("test: Postcopy")
Signed-off-by: Daniil Tatianin 
---
 tests/migration/i386/a-b-bootblock.S |  9 +
 tests/migration/i386/a-b-bootblock.h | 16 
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/tests/migration/i386/a-b-bootblock.S 
b/tests/migration/i386/a-b-bootblock.S
index 036216e4a7..6bbd60 100644
--- a/tests/migration/i386/a-b-bootblock.S
+++ b/tests/migration/i386/a-b-bootblock.S
@@ -44,6 +44,15 @@ start: # at 0x7c00 ?
 
 # bl keeps a counter so we limit the output speed
 mov $0, %bl
+
+pre_zero:
+mov $TEST_MEM_START,%eax
+do_zero:
+movb $0, (%eax)
+add $4096,%eax
+cmp $TEST_MEM_END,%eax
+jl do_zero
+
 mainloop:
 mov $TEST_MEM_START,%eax
 innerloop:
diff --git a/tests/migration/i386/a-b-bootblock.h 
b/tests/migration/i386/a-b-bootblock.h
index b7b0fce2ee..5b523917ce 100644
--- a/tests/migration/i386/a-b-bootblock.h
+++ b/tests/migration/i386/a-b-bootblock.h
@@ -4,18 +4,18 @@
  * the header and the assembler differences in your patch submission.
  */
 unsigned char x86_bootsect[] = {
-  0xfa, 0x0f, 0x01, 0x16, 0x78, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00,
+  0xfa, 0x0f, 0x01, 0x16, 0x8c, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00,
   0x0f, 0x22, 0xc0, 0x66, 0xea, 0x20, 0x7c, 0x00, 0x00, 0x08, 0x00, 0x00,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe4, 0x92, 0x0c, 0x02,
   0xe6, 0x92, 0xb8, 0x10, 0x00, 0x00, 0x00, 0x8e, 0xd8, 0x66, 0xb8, 0x41,
   0x00, 0x66, 0xba, 0xf8, 0x03, 0xee, 0xb3, 0x00, 0xb8, 0x00, 0x00, 0x10,
-  0x00, 0xfe, 0x00, 0x05, 0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, 0x40,
-  0x06, 0x7c, 0xf2, 0xfe, 0xc3, 0x80, 0xe3, 0x3f, 0x75, 0xe6, 0x66, 0xb8,
-  0x42, 0x00, 0x66, 0xba, 0xf8, 0x03, 0xee, 0xeb, 0xdb, 0x8d, 0x76, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x00, 0x00,
-  0x00, 0x9a, 0xcf, 0x00, 0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00,
-  0x27, 0x00, 0x60, 0x7c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+  0x00, 0xc6, 0x00, 0x00, 0x05, 0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00,
+  0x40, 0x06, 0x7c, 0xf1, 0xb8, 0x00, 0x00, 0x10, 0x00, 0xfe, 0x00, 0x05,
+  0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, 0x40, 0x06, 0x7c, 0xf2, 0xfe,
+  0xc3, 0x80, 0xe3, 0x3f, 0x75, 0xe6, 0x66, 0xb8, 0x42, 0x00, 0x66, 0xba,
+  0xf8, 0x03, 0xee, 0xeb, 0xdb, 0x8d, 0x76, 0x00, 0x00, 0x00, 0x00, 0x00,
+  0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x00, 0x00, 0x00, 0x9a, 0xcf, 0x00,
+  0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00, 0x27, 0x00, 0x74, 0x7c,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-- 
2.34.1




[PATCH v1 1/2] i386/a-b-bootblock: factor test memory addresses out into constants

2023-09-07 Thread Daniil Tatianin
So that we have less magic numbers to deal with. This also allows us to
reuse these in the following commits.

Signed-off-by: Daniil Tatianin 
---
 tests/migration/i386/a-b-bootblock.S | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/tests/migration/i386/a-b-bootblock.S 
b/tests/migration/i386/a-b-bootblock.S
index 3d464c7568..036216e4a7 100644
--- a/tests/migration/i386/a-b-bootblock.S
+++ b/tests/migration/i386/a-b-bootblock.S
@@ -34,6 +34,10 @@ start: # at 0x7c00 ?
 mov $16,%eax
 mov %eax,%ds
 
+# Start from 1MB
+.set TEST_MEM_START, (1024*1024)
+.set TEST_MEM_END, (100*1024*1024)
+
 mov $65,%ax
 mov $0x3f8,%dx
 outb %al,%dx
@@ -41,12 +45,11 @@ start: # at 0x7c00 ?
 # bl keeps a counter so we limit the output speed
 mov $0, %bl
 mainloop:
-# Start from 1MB
-mov $(1024*1024),%eax
+mov $TEST_MEM_START,%eax
 innerloop:
 incb (%eax)
 add $4096,%eax
-cmp $(100*1024*1024),%eax
+cmp $TEST_MEM_END,%eax
 jl innerloop
 
 inc %bl
-- 
2.34.1




[PATCH 2/2] i386/a-b-bootblock: zero the first byte of each page on start

2023-09-07 Thread Daniil Tatianin
The migration qtest all the way up to this point used to work by sheer
luck relying on the contents of all pages from 1MiB to 100MiB to contain
the same one value in the first byte initially.

This easily breaks if we reduce the amount of RAM for the test instances
from 150MiB to e.g 110MiB since that makes SeaBIOS dirty some of the
pages starting at about 0x5dd2000 (~93 MiB) as it reuses those for the
HighMemory allocator since commit dc88f9b72df ("malloc: use large
ZoneHigh when there is enough memory").

This would result in the following errors:
12/60 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test 
ERROR   2.74s   killed by signal 6 SIGABRT
stderr:
Memory content inconsistency at 5dd2000 first_byte = cc last_byte = cb 
current = 9e hit_edge = 1
Memory content inconsistency at 5dd3000 first_byte = cc last_byte = cb 
current = 89 hit_edge = 1
Memory content inconsistency at 5dd4000 first_byte = cc last_byte = cb 
current = 23 hit_edge = 1
Memory content inconsistency at 5dd5000 first_byte = cc last_byte = cb 
current = 31 hit_edge = 1
Memory content inconsistency at 5dd6000 first_byte = cc last_byte = cb 
current = 70 hit_edge = 1
Memory content inconsistency at 5dd7000 first_byte = cc last_byte = cb 
current = ff hit_edge = 1
Memory content inconsistency at 5dd8000 first_byte = cc last_byte = cb 
current = 54 hit_edge = 1
Memory content inconsistency at 5dd9000 first_byte = cc last_byte = cb 
current = 64 hit_edge = 1
Memory content inconsistency at 5dda000 first_byte = cc last_byte = cb 
current = 1d hit_edge = 1
Memory content inconsistency at 5ddb000 first_byte = cc last_byte = cb 
current = 1a hit_edge = 1
and in another 26 pages**
ERROR:../tests/qtest/migration-test.c:300:check_guests_ram: assertion 
failed: (bad == 0)

Fix this by always zeroing the first byte of each page in the range so
that we get consistent results no matter the initial contents.

Fixes: ea0c6d62391 ("test: Postcopy")
Signed-off-by: Daniil Tatianin 
---
 tests/migration/i386/a-b-bootblock.S |  9 +
 tests/migration/i386/a-b-bootblock.h | 16 
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/tests/migration/i386/a-b-bootblock.S 
b/tests/migration/i386/a-b-bootblock.S
index 036216e4a7..6bbd60 100644
--- a/tests/migration/i386/a-b-bootblock.S
+++ b/tests/migration/i386/a-b-bootblock.S
@@ -44,6 +44,15 @@ start: # at 0x7c00 ?
 
 # bl keeps a counter so we limit the output speed
 mov $0, %bl
+
+pre_zero:
+mov $TEST_MEM_START,%eax
+do_zero:
+movb $0, (%eax)
+add $4096,%eax
+cmp $TEST_MEM_END,%eax
+jl do_zero
+
 mainloop:
 mov $TEST_MEM_START,%eax
 innerloop:
diff --git a/tests/migration/i386/a-b-bootblock.h 
b/tests/migration/i386/a-b-bootblock.h
index b7b0fce2ee..5b523917ce 100644
--- a/tests/migration/i386/a-b-bootblock.h
+++ b/tests/migration/i386/a-b-bootblock.h
@@ -4,18 +4,18 @@
  * the header and the assembler differences in your patch submission.
  */
 unsigned char x86_bootsect[] = {
-  0xfa, 0x0f, 0x01, 0x16, 0x78, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00,
+  0xfa, 0x0f, 0x01, 0x16, 0x8c, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00,
   0x0f, 0x22, 0xc0, 0x66, 0xea, 0x20, 0x7c, 0x00, 0x00, 0x08, 0x00, 0x00,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe4, 0x92, 0x0c, 0x02,
   0xe6, 0x92, 0xb8, 0x10, 0x00, 0x00, 0x00, 0x8e, 0xd8, 0x66, 0xb8, 0x41,
   0x00, 0x66, 0xba, 0xf8, 0x03, 0xee, 0xb3, 0x00, 0xb8, 0x00, 0x00, 0x10,
-  0x00, 0xfe, 0x00, 0x05, 0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, 0x40,
-  0x06, 0x7c, 0xf2, 0xfe, 0xc3, 0x80, 0xe3, 0x3f, 0x75, 0xe6, 0x66, 0xb8,
-  0x42, 0x00, 0x66, 0xba, 0xf8, 0x03, 0xee, 0xeb, 0xdb, 0x8d, 0x76, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x00, 0x00,
-  0x00, 0x9a, 0xcf, 0x00, 0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00,
-  0x27, 0x00, 0x60, 0x7c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+  0x00, 0xc6, 0x00, 0x00, 0x05, 0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00,
+  0x40, 0x06, 0x7c, 0xf1, 0xb8, 0x00, 0x00, 0x10, 0x00, 0xfe, 0x00, 0x05,
+  0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, 0x40, 0x06, 0x7c, 0xf2, 0xfe,
+  0xc3, 0x80, 0xe3, 0x3f, 0x75, 0xe6, 0x66, 0xb8, 0x42, 0x00, 0x66, 0xba,
+  0xf8, 0x03, 0xee, 0xeb, 0xdb, 0x8d, 0x76, 0x00, 0x00, 0x00, 0x00, 0x00,
+  0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x00, 0x00, 0x00, 0x9a, 0xcf, 0x00,
+  0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00, 0x27, 0x00, 0x74, 0x7c,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-- 
2.34.1




[PATCH 1/2] i386/a-b-bootblock: factor test memory addresses out into constants

2023-09-07 Thread Daniil Tatianin
So that we have less magic numbers to deal with. This also allows us to
reuse these in the following commits.

Signed-off-by: Daniil Tatianin 
---
 tests/migration/i386/a-b-bootblock.S | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/tests/migration/i386/a-b-bootblock.S 
b/tests/migration/i386/a-b-bootblock.S
index 3d464c7568..036216e4a7 100644
--- a/tests/migration/i386/a-b-bootblock.S
+++ b/tests/migration/i386/a-b-bootblock.S
@@ -34,6 +34,10 @@ start: # at 0x7c00 ?
 mov $16,%eax
 mov %eax,%ds
 
+# Start from 1MB
+.set TEST_MEM_START, (1024*1024)
+.set TEST_MEM_END, (100*1024*1024)
+
 mov $65,%ax
 mov $0x3f8,%dx
 outb %al,%dx
@@ -41,12 +45,11 @@ start: # at 0x7c00 ?
 # bl keeps a counter so we limit the output speed
 mov $0, %bl
 mainloop:
-# Start from 1MB
-mov $(1024*1024),%eax
+mov $TEST_MEM_START,%eax
 innerloop:
 incb (%eax)
 add $4096,%eax
-cmp $(100*1024*1024),%eax
+cmp $TEST_MEM_END,%eax
 jl innerloop
 
 inc %bl
-- 
2.34.1




[PATCH 0/2] i386/a-b-bootblock: zero the first byte of each page on start

2023-09-07 Thread Daniil Tatianin
This series fixes an issue where the outcome of the migration qtest
relies on the initial memory contents all being the same across the
first 100MiB of RAM, which is a very fragile invariant.

We fix this by making sure we zero the first byte of every testable page
in range beforehand.

Daniil Tatianin (2):
  i386/a-b-bootblock: factor test memory addresses out into constants
  i386/a-b-bootblock: zero the first byte of each page on start

 tests/migration/i386/a-b-bootblock.S | 18 +++---
 tests/migration/i386/a-b-bootblock.h | 16 
 2 files changed, 23 insertions(+), 11 deletions(-)

-- 
2.34.1




Re: [PATCH] replication: compile out some staff when replication is not configured

2023-04-13 Thread Daniil Tatianin

Just a few minor nits

On 4/11/23 5:51 PM, Vladimir Sementsov-Ogievskiy wrote:

Don't compile-in replication-related files when replication is disabled
in config.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

Hi all!

I'm unsure, should it be actually separate
--disable-colo / --enable-colo options or it's really used only together
with replication staff.. So, I decided to start with simpler variant.


You probably meant 'stuff' and not 'staff' in the commit message and 
here as well?




  block/meson.build |  2 +-
  migration/meson.build |  6 --
  net/meson.build   |  8 
  qapi/migration.json   |  6 --
  stubs/colo.c  | 46 +++
  stubs/meson.build |  1 +
  6 files changed, 60 insertions(+), 9 deletions(-)
  create mode 100644 stubs/colo.c

diff --git a/block/meson.build b/block/meson.build
index 382bec0e7d..b9a72e219b 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -84,7 +84,7 @@ block_ss.add(when: 'CONFIG_WIN32', if_true: 
files('file-win32.c', 'win32-aio.c')
  block_ss.add(when: 'CONFIG_POSIX', if_true: [files('file-posix.c'), coref, 
iokit])
  block_ss.add(when: libiscsi, if_true: files('iscsi-opts.c'))
  block_ss.add(when: 'CONFIG_LINUX', if_true: files('nvme.c'))
-if not get_option('replication').disabled()
+if get_option('replication').allowed()
block_ss.add(files('replication.c'))
  endif
  block_ss.add(when: libaio, if_true: files('linux-aio.c'))
diff --git a/migration/meson.build b/migration/meson.build
index 0d1bb9f96e..8180eaea7b 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -13,8 +13,6 @@ softmmu_ss.add(files(
'block-dirty-bitmap.c',
'channel.c',
'channel-block.c',
-  'colo-failover.c',
-  'colo.c',
'exec.c',
'fd.c',
'global_state.c',
@@ -29,6 +27,10 @@ softmmu_ss.add(files(
'threadinfo.c',
  ), gnutls)
  
+if get_option('replication').allowed()

+  softmmu_ss.add(files('colo.c', 'colo-failover.c'))
+endif
+
  softmmu_ss.add(when: rdma, if_true: files('rdma.c'))
  if get_option('live_block_migration').allowed()
softmmu_ss.add(files('block.c'))
diff --git a/net/meson.build b/net/meson.build
index 87afca3e93..634ab71cc6 100644
--- a/net/meson.build
+++ b/net/meson.build
@@ -1,13 +1,9 @@
  softmmu_ss.add(files(
'announce.c',
'checksum.c',
-  'colo-compare.c',
-  'colo.c',
'dump.c',
'eth.c',
'filter-buffer.c',
-  'filter-mirror.c',
-  'filter-rewriter.c',
'filter.c',
'hub.c',
'net-hmp-cmds.c',
@@ -19,6 +15,10 @@ softmmu_ss.add(files(
'util.c',
  ))
  
+if get_option('replication').allowed()

+  softmmu_ss.add(files('colo-compare.c', 'colo.c', 'filter-rewriter.c', 
'filter-mirror.c'))
+endif
+
  softmmu_ss.add(when: 'CONFIG_TCG', if_true: files('filter-replay.c'))
  
  if have_l2tpv3

diff --git a/qapi/migration.json b/qapi/migration.json
index c84fa10e86..5b81e09369 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1685,7 +1685,8 @@
  ##
  { 'struct': 'COLOStatus',
'data': { 'mode': 'COLOMode', 'last-mode': 'COLOMode',
-'reason': 'COLOExitReason' } }
+'reason': 'COLOExitReason' },
+  'if': 'CONFIG_REPLICATION' }
  
  ##

  # @query-colo-status:
@@ -1702,7 +1703,8 @@
  # Since: 3.1
  ##
  { 'command': 'query-colo-status',
-  'returns': 'COLOStatus' }
+  'returns': 'COLOStatus',
+  'if': 'CONFIG_REPLICATION' }
  
  ##

  # @migrate-recover:
diff --git a/stubs/colo.c b/stubs/colo.c
new file mode 100644
index 00..5a02540baa
--- /dev/null
+++ b/stubs/colo.c
@@ -0,0 +1,46 @@
+#include "qemu/osdep.h"
+#include "qemu/notify.h"
+#include "net/colo-compare.h"
+#include "migration/colo.h"
+#include "qapi/error.h"
+#include "qapi/qapi-commands-migration.h"
+
+void colo_compare_cleanup(void)
+{
+abort();
+}
+
+void colo_shutdown(void)
+{
+abort();
+}
+
+void *colo_process_incoming_thread(void *opaque)
+{
+abort();
+}
+
+void colo_checkpoint_notify(void *opaque)
+{
+abort();
+}
+
+void migrate_start_colo_process(MigrationState *s)
+{
+abort();
+}
+
+bool migration_in_colo_state(void)
+{
+return false;
+}
+
+bool migration_incoming_in_colo_state(void)
+{
+return false;
+}
+
+void qmp_x_colo_lost_heartbeat(Error **errp)
+{
+error_setg(errp, "COLO support is not built in");


Maybe 'built-in' with a dash for consistency with usb-dev-stub?


+}
diff --git a/stubs/meson.build b/stubs/meson.build
index b2b5956d97..8412cad15f 100644
--- a/stubs/meson.build
+++ b/stubs/meson.build
@@ -45,6 +45,7 @@ stub_ss.add(files('target-get-monitor-def.c'))
  stub_ss.add(files('target-monitor-defs.c'))
  stub_ss.add(files('trace-control.c'))
  stub_ss.add(files('uuid.c'))
+stub_ss.add(files('colo.c'))
  stub_ss.add(files('vmstate.c'))
  stub_ss.add(files('vm-stop.c'))
  stub_ss.add(files('win32-kbd-hook.c'))




Re: [PATCH 0/2] virtio-scsi: stop using aio_disable_external() during unplug

2023-04-04 Thread Daniil Tatianin

On 3/23/23 9:56 PM, Stefan Hajnoczi wrote:

The aio_disable_external() API is a solution for stopping I/O during critical
sections. The newer BlockDevOps->drained_begin/end/poll() callbacks offer a
cleaner solution that supports the upcoming multi-queue block layer. This
series removes aio_disable_external() from the virtio-scsi emulation code.

Patch 1 is a fix for something I noticed when reading the code.

Patch 2 replaces aio_disable_external() with functionality for safe hot unplug
that's mostly already present in the SCSI emulation code.

Stefan Hajnoczi (2):
   virtio-scsi: avoid race between unplug and transport event
   virtio-scsi: stop using aio_disable_external() during unplug

  hw/scsi/scsi-bus.c|  3 ++-
  hw/scsi/scsi-disk.c   |  1 +
  hw/scsi/virtio-scsi.c | 21 +
  3 files changed, 12 insertions(+), 13 deletions(-)



For both patches:
Reviewed-by: Daniil Tatianin 



Re: [PATCH v0 0/4] backends/hostmem: add an ability to specify prealloc timeout

2023-01-23 Thread Daniil Tatianin

On 1/23/23 4:47 PM, Daniel P. Berrangé wrote:

On Mon, Jan 23, 2023 at 04:30:03PM +0300, Daniil Tatianin wrote:

On 1/23/23 11:57 AM, David Hildenbrand wrote:

On 20.01.23 14:47, Daniil Tatianin wrote:

This series introduces new qemu_prealloc_mem_with_timeout() api,
which allows limiting the maximum amount of time to be spent on memory
preallocation. It also adds prealloc statistics collection that is
exposed via an optional timeout handler.

This new api is then utilized by hostmem for guest RAM preallocation
controlled via new object properties called 'prealloc-timeout' and
'prealloc-timeout-fatal'.

This is useful for limiting VM startup time on systems with
unpredictable page allocation delays due to memory fragmentation or the
backing storage. The timeout can be configured to either simply emit a
warning and continue VM startup without having preallocated the entire
guest RAM or just abort startup entirely if that is not acceptable for
a specific use case.


The major use case for preallocation is memory resources that cannot be
overcommitted (hugetlb, file blocks, ...), to avoid running out of such
resources later, while the guest is already running, and crashing it.


Wouldn't you say that preallocating memory for the sake of speeding up guest
kernel startup & runtime is a valid use case of prealloc? This way we can
avoid expensive (for a multitude of reasons) page faults that will otherwise
slow down the guest significantly at runtime and affect the user experience.


Allocating only a fraction "because it takes too long" looks quite
useless in that (main use-case) context. We shouldn't encourage QEMU
users to play with fire in such a way. IOW, there should be no way
around "prealloc-timeout-fatal". Either preallocation succeeded and the
guest can run, or it failed, and the guest can't run.


Here we basically accept the fact that e.g with fragmented memory the kernel
might take a while in a page fault handler especially for hugetlb because of
page compaction that has to run for every fault.

This way we can prefault at least some number of pages and let the guest
fault the rest on demand later on during runtime even if it's slow and would
cause a noticeable lag.


Rather than treat this as a problem that needs a timeout, can we
restate it as situations need synchronous vs asynchronous
preallocation ?

For the case where we need synchronous prealloc, current QEMU deals
with that. If it doesn't work quickly enough, mgmt can just kill
QEMU already today.

For the case where you would like some prealloc, but don't mind
if it runs without full prealloc, then why not just treat it as an
entirely asynchronous task ? Instead of calling qemu_prealloc_mem
and waiting for it to complete, just spawn a thread to run
qemu_prealloc_mem, so it doesn't block QEMU startup. This will
have minimal maint burden on the existing code, and will avoid
need for mgmt apps to think about what timeout value to give,
which is good because timeouts are hard to get right.

Most of the time that async background prealloc will still finish
before the guest even gets out of the firmware phase, but if it
takes longer it is no big deal. You don't need to quit the prealloc
job early, you just need it to not delay the guest OS boot IIUC.

This impl could be done with the 'prealloc' property turning from
a boolean on/off, to a enum  on/async/off, where 'on' == sync
prealloc. Or add a separate 'prealloc-async' bool property


I like this idea, but I'm not sure how we would go about writing to live 
guest memory. Is that something that can be done safely without racing 
with the guest?



With regards,
Daniel




Re: [PATCH v0 0/4] backends/hostmem: add an ability to specify prealloc timeout

2023-01-23 Thread Daniil Tatianin

On 1/23/23 11:57 AM, David Hildenbrand wrote:

On 20.01.23 14:47, Daniil Tatianin wrote:

This series introduces new qemu_prealloc_mem_with_timeout() api,
which allows limiting the maximum amount of time to be spent on memory
preallocation. It also adds prealloc statistics collection that is
exposed via an optional timeout handler.

This new api is then utilized by hostmem for guest RAM preallocation
controlled via new object properties called 'prealloc-timeout' and
'prealloc-timeout-fatal'.

This is useful for limiting VM startup time on systems with
unpredictable page allocation delays due to memory fragmentation or the
backing storage. The timeout can be configured to either simply emit a
warning and continue VM startup without having preallocated the entire
guest RAM or just abort startup entirely if that is not acceptable for
a specific use case.


The major use case for preallocation is memory resources that cannot be 
overcommitted (hugetlb, file blocks, ...), to avoid running out of such 
resources later, while the guest is already running, and crashing it.


Wouldn't you say that preallocating memory for the sake of speeding up 
guest kernel startup & runtime is a valid use case of prealloc? This way 
we can avoid expensive (for a multitude of reasons) page faults that 
will otherwise slow down the guest significantly at runtime and affect 
the user experience.


Allocating only a fraction "because it takes too long" looks quite 
useless in that (main use-case) context. We shouldn't encourage QEMU 
users to play with fire in such a way. IOW, there should be no way 
around "prealloc-timeout-fatal". Either preallocation succeeded and the 
guest can run, or it failed, and the guest can't run.


Here we basically accept the fact that e.g with fragmented memory the 
kernel might take a while in a page fault handler especially for hugetlb 
because of page compaction that has to run for every fault.


This way we can prefault at least some number of pages and let the guest 
fault the rest on demand later on during runtime even if it's slow and 
would cause a noticeable lag.


... but then, management tools can simply start QEMU with "-S", start an 
own timer, and zap QEMU if it didn't manage to come up in time, and 
simply start a new QEMU instance without preallocation enabled.


The "good" thing about that approach is that it will also cover any 
implicit memory preallocation, like using mlock() or VFIO, that don't 
run in ordinary per-hostmem preallocation context. If setting QEMU up 
takes to long, you might want to try on a different hypervisor in your 
cluster instead.


This approach definitely works too but again it assumes that we always 
want 'prealloc-timeout-fatal' to be on, which is, for the most part only 
the case for working around issues that might be caused by overcommit.




I don't immediately see why we want to make our preallcoation+hostmem 
implementation in QEMU more complicated for such a use case.






[PATCH 3/4] backends/hostmem: add an ability to specify prealloc timeout

2023-01-20 Thread Daniil Tatianin
Use the new qemu_prealloc_mem_with_timeout api so that we can limit the
maximum amount of time to be spent preallocating guest RAM. We also emit
a warning from the timeout handler detailing the current prealloc
progress and letting the user know that it was exceeded.

The timeout is set to zero (no timeout) by default, and can be
configured via the new 'prealloc-timeout' property.

Signed-off-by: Daniil Tatianin 
---
 backends/hostmem.c   | 48 ++--
 include/sysemu/hostmem.h |  2 ++
 qapi/qom.json|  4 
 3 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index 842bfa9eb7..be9af7515e 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -34,6 +34,19 @@ QEMU_BUILD_BUG_ON(HOST_MEM_POLICY_BIND != MPOL_BIND);
 QEMU_BUILD_BUG_ON(HOST_MEM_POLICY_INTERLEAVE != MPOL_INTERLEAVE);
 #endif
 
+static void
+host_memory_on_prealloc_timeout(void *opaque,
+const PreallocStats *stats)
+{
+HostMemoryBackend *backend = opaque;
+
+backend->prealloc_did_timeout = true;
+warn_report("HostMemory preallocation timeout %"PRIu64"s exceeded, "
+"allocated %zu/%zu (%zu byte) pages (%d threads)",
+(uint64_t)stats->seconds_elapsed, stats->allocated_pages,
+stats->total_pages, stats->page_size, stats->threads);
+}
+
 char *
 host_memory_backend_get_name(HostMemoryBackend *backend)
 {
@@ -223,8 +236,26 @@ static bool do_prealloc_mr(HostMemoryBackend *backend, 
Error **errp)
 void *ptr = memory_region_get_ram_ptr(>mr);
 uint64_t sz = memory_region_size(>mr);
 
-qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads,
-  backend->prealloc_context, _err);
+if (backend->prealloc_timeout) {
+PreallocTimeout timeout = {
+.seconds = (time_t)backend->prealloc_timeout,
+.user = backend,
+.on_timeout = host_memory_on_prealloc_timeout,
+};
+
+qemu_prealloc_mem_with_timeout(fd, ptr, sz, backend->prealloc_threads,
+   backend->prealloc_context, ,
+   _err);
+if (local_err && backend->prealloc_did_timeout) {
+error_free(local_err);
+local_err = NULL;
+}
+} else {
+qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads,
+  backend->prealloc_context, _err);
+}
+
+
 if (local_err) {
 error_propagate(errp, local_err);
 return false;
@@ -277,6 +308,13 @@ static void 
host_memory_backend_set_prealloc_threads(Object *obj, Visitor *v,
 backend->prealloc_threads = value;
 }
 
+static void host_memory_backend_get_set_prealloc_timeout(Object *obj,
+Visitor *v, const char *name, void *opaque, Error **errp)
+{
+HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+visit_type_uint32(v, name, >prealloc_timeout, errp);
+}
+
 static void host_memory_backend_init(Object *obj)
 {
 HostMemoryBackend *backend = MEMORY_BACKEND(obj);
@@ -516,6 +554,12 @@ host_memory_backend_class_init(ObjectClass *oc, void *data)
 object_property_allow_set_link, OBJ_PROP_LINK_STRONG);
 object_class_property_set_description(oc, "prealloc-context",
 "Context to use for creating CPU threads for preallocation");
+object_class_property_add(oc, "prealloc-timeout", "int",
+host_memory_backend_get_set_prealloc_timeout,
+host_memory_backend_get_set_prealloc_timeout,
+NULL, NULL);
+object_class_property_set_description(oc, "prealloc-timeout",
+"Maximum memory preallocation timeout in seconds");
 object_class_property_add(oc, "size", "int",
 host_memory_backend_get_size,
 host_memory_backend_set_size,
diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h
index 39326f1d4f..21910f3b45 100644
--- a/include/sysemu/hostmem.h
+++ b/include/sysemu/hostmem.h
@@ -66,7 +66,9 @@ struct HostMemoryBackend {
 uint64_t size;
 bool merge, dump, use_canonical_path;
 bool prealloc, is_mapped, share, reserve;
+bool prealloc_did_timeout;
 uint32_t prealloc_threads;
+uint32_t prealloc_timeout;
 ThreadContext *prealloc_context;
 DECLARE_BITMAP(host_nodes, MAX_NODES + 1);
 HostMemPolicy policy;
diff --git a/qapi/qom.json b/qapi/qom.json
index 30e76653ad..9149c064b8 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -581,6 +581,9 @@
 # @prealloc-context: thread context to use for creation of preallocation 
threads
 #(default: none) (since 7.2)
 #
+# @prealloc-timeout: Maximum memory preallocation timeout in seconds
+#(default: 0) (since 7.3)
+#
 # @share: if false, the memory is private to QEMU; if true

[PATCH 2/4] backends/hostmem: move memory region preallocation logic into a helper

2023-01-20 Thread Daniil Tatianin
...so that we don't have to duplicate it in multiple places throughout
the file.

Signed-off-by: Daniil Tatianin 
---
 backends/hostmem.c | 38 --
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index 747e7838c0..842bfa9eb7 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -216,10 +216,26 @@ static bool host_memory_backend_get_prealloc(Object *obj, 
Error **errp)
 return backend->prealloc;
 }
 
+static bool do_prealloc_mr(HostMemoryBackend *backend, Error **errp)
+{
+Error *local_err = NULL;
+int fd = memory_region_get_fd(>mr);
+void *ptr = memory_region_get_ram_ptr(>mr);
+uint64_t sz = memory_region_size(>mr);
+
+qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads,
+  backend->prealloc_context, _err);
+if (local_err) {
+error_propagate(errp, local_err);
+return false;
+}
+
+return true;
+}
+
 static void host_memory_backend_set_prealloc(Object *obj, bool value,
  Error **errp)
 {
-Error *local_err = NULL;
 HostMemoryBackend *backend = MEMORY_BACKEND(obj);
 
 if (!backend->reserve && value) {
@@ -233,17 +249,7 @@ static void host_memory_backend_set_prealloc(Object *obj, 
bool value,
 }
 
 if (value && !backend->prealloc) {
-int fd = memory_region_get_fd(>mr);
-void *ptr = memory_region_get_ram_ptr(>mr);
-uint64_t sz = memory_region_size(>mr);
-
-qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads,
-  backend->prealloc_context, _err);
-if (local_err) {
-error_propagate(errp, local_err);
-return;
-}
-backend->prealloc = true;
+backend->prealloc = do_prealloc_mr(backend, errp);
 }
 }
 
@@ -399,12 +405,8 @@ host_memory_backend_memory_complete(UserCreatable *uc, 
Error **errp)
  * specified NUMA policy in place.
  */
 if (backend->prealloc) {
-qemu_prealloc_mem(memory_region_get_fd(>mr), ptr, sz,
-  backend->prealloc_threads,
-  backend->prealloc_context, _err);
-if (local_err) {
-goto out;
-}
+do_prealloc_mr(backend, errp);
+return;
 }
 }
 out:
-- 
2.25.1




[PATCH v0 0/4] backends/hostmem: add an ability to specify prealloc timeout

2023-01-20 Thread Daniil Tatianin
This series introduces new qemu_prealloc_mem_with_timeout() api,
which allows limiting the maximum amount of time to be spent on memory
preallocation. It also adds prealloc statistics collection that is
exposed via an optional timeout handler.

This new api is then utilized by hostmem for guest RAM preallocation
controlled via new object properties called 'prealloc-timeout' and
'prealloc-timeout-fatal'.

This is useful for limiting VM startup time on systems with
unpredictable page allocation delays due to memory fragmentation or the
backing storage. The timeout can be configured to either simply emit a
warning and continue VM startup without having preallocated the entire
guest RAM or just abort startup entirely if that is not acceptable for
a specific use case.

Daniil Tatianin (4):
  oslib: introduce new qemu_prealloc_mem_with_timeout() api
  backends/hostmem: move memory region preallocation logic into a helper
  backends/hostmem: add an ability to specify prealloc timeout
  backends/hostmem: add an ability to make prealloc timeout fatal

 backends/hostmem.c   | 112 +++---
 include/qemu/osdep.h |  19 +++
 include/sysemu/hostmem.h |   3 ++
 qapi/qom.json|   8 +++
 util/oslib-posix.c   | 114 +++
 util/oslib-win32.c   |   9 
 6 files changed, 238 insertions(+), 27 deletions(-)

-- 
2.25.1




[PATCH 4/4] backends/hostmem: add an ability to make prealloc timeout fatal

2023-01-20 Thread Daniil Tatianin
This is controlled via the new 'prealloc-timeout-fatal' property and can
be useful for cases when we cannot afford to not preallocate all guest
pages while being time constrained.

Signed-off-by: Daniil Tatianin 
---
 backends/hostmem.c   | 38 ++
 include/sysemu/hostmem.h |  1 +
 qapi/qom.json|  4 
 3 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index be9af7515e..0808dc6951 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -39,12 +39,21 @@ host_memory_on_prealloc_timeout(void *opaque,
 const PreallocStats *stats)
 {
 HostMemoryBackend *backend = opaque;
+const char *msg = "HostMemory preallocation timeout %"PRIu64"s exceeded, "
+  "allocated %zu/%zu (%zu byte) pages (%d threads)";
+
+if (backend->prealloc_timeout_fatal) {
+error_report(msg, (uint64_t)stats->seconds_elapsed,
+ stats->allocated_pages, stats->total_pages,
+ stats->page_size, stats->threads);
+exit(1);
+
+}
 
 backend->prealloc_did_timeout = true;
-warn_report("HostMemory preallocation timeout %"PRIu64"s exceeded, "
-"allocated %zu/%zu (%zu byte) pages (%d threads)",
-(uint64_t)stats->seconds_elapsed, stats->allocated_pages,
-stats->total_pages, stats->page_size, stats->threads);
+warn_report(msg, (uint64_t)stats->seconds_elapsed,
+stats->allocated_pages, stats->total_pages,
+stats->page_size, stats->threads);
 }
 
 char *
@@ -315,6 +324,22 @@ static void 
host_memory_backend_get_set_prealloc_timeout(Object *obj,
 visit_type_uint32(v, name, >prealloc_timeout, errp);
 }
 
+static bool host_memory_backend_get_prealloc_timeout_fatal(
+Object *obj, Error **errp)
+{
+HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+
+return backend->prealloc_timeout_fatal;
+}
+
+static void host_memory_backend_set_prealloc_timeout_fatal(
+Object *obj, bool value, Error **errp)
+{
+HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+
+backend->prealloc_timeout_fatal = value;
+}
+
 static void host_memory_backend_init(Object *obj)
 {
 HostMemoryBackend *backend = MEMORY_BACKEND(obj);
@@ -560,6 +585,11 @@ host_memory_backend_class_init(ObjectClass *oc, void *data)
 NULL, NULL);
 object_class_property_set_description(oc, "prealloc-timeout",
 "Maximum memory preallocation timeout in seconds");
+object_class_property_add_bool(oc, "prealloc-timeout-fatal",
+host_memory_backend_get_prealloc_timeout_fatal,
+host_memory_backend_set_prealloc_timeout_fatal);
+object_class_property_set_description(oc, "prealloc-timeout-fatal",
+"Consider preallocation timeout a fatal error");
 object_class_property_add(oc, "size", "int",
 host_memory_backend_get_size,
 host_memory_backend_set_size,
diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h
index 21910f3b45..b501b5eff2 100644
--- a/include/sysemu/hostmem.h
+++ b/include/sysemu/hostmem.h
@@ -67,6 +67,7 @@ struct HostMemoryBackend {
 bool merge, dump, use_canonical_path;
 bool prealloc, is_mapped, share, reserve;
 bool prealloc_did_timeout;
+bool prealloc_timeout_fatal;
 uint32_t prealloc_threads;
 uint32_t prealloc_timeout;
 ThreadContext *prealloc_context;
diff --git a/qapi/qom.json b/qapi/qom.json
index 9149c064b8..70644d714b 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -584,6 +584,9 @@
 # @prealloc-timeout: Maximum memory preallocation timeout in seconds
 #(default: 0) (since 7.3)
 #
+# @prealloc-timeout-fatal: Consider preallocation timeout a fatal error
+#  (default: false) (since 7.3)
+#
 # @share: if false, the memory is private to QEMU; if true, it is shared
 # (default: false)
 #
@@ -616,6 +619,7 @@
 '*prealloc-threads': 'uint32',
 '*prealloc-context': 'str',
 '*prealloc-timeout': 'uint32',
+'*prealloc-timeout-fatal': 'bool',
 '*share': 'bool',
 '*reserve': 'bool',
 'size': 'size',
-- 
2.25.1




[PATCH 1/4] oslib: introduce new qemu_prealloc_mem_with_timeout() api

2023-01-20 Thread Daniil Tatianin
This helper allows limiting the maximum amount time to be spent
preallocating a block of memory, which is important on systems that
might have unpredictable page allocation delays because of possible
fragmentation or other reasons specific to the backend.

It also exposes a way to register a callback that is invoked in case the
specified timeout is exceeded. The callback is provided with a
PreallocStats structure that includes a bunch of statistics about the
progress including total & allocated number of pages, as well as page
size and number of allocation threads.

The win32 implementation is currently a stub that just calls into the
old qemu_prealloc_mem api.

Signed-off-by: Daniil Tatianin 
---
 include/qemu/osdep.h |  19 
 util/oslib-posix.c   | 114 +++
 util/oslib-win32.c   |   9 
 3 files changed, 133 insertions(+), 9 deletions(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index bd23a08595..21757e5144 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -595,6 +595,25 @@ typedef struct ThreadContext ThreadContext;
 void qemu_prealloc_mem(int fd, char *area, size_t sz, int max_threads,
ThreadContext *tc, Error **errp);
 
+typedef struct PreallocStats {
+size_t page_size;
+size_t total_pages;
+size_t allocated_pages;
+int threads;
+time_t seconds_elapsed;
+} PreallocStats;
+
+typedef struct PreallocTimeout {
+time_t seconds;
+void *user;
+void (*on_timeout)(void *user, const PreallocStats *stats);
+} PreallocTimeout;
+
+void qemu_prealloc_mem_with_timeout(int fd, char *area, size_t sz,
+int max_threads, ThreadContext *tc,
+const PreallocTimeout *timeout,
+Error **errp);
+
 /**
  * qemu_get_pid_name:
  * @pid: pid of a process
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 59a891b6a8..570fca601f 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -74,6 +74,7 @@ typedef struct MemsetContext {
 bool any_thread_failed;
 struct MemsetThread *threads;
 int num_threads;
+PreallocStats stats;
 } MemsetContext;
 
 struct MemsetThread {
@@ -83,6 +84,7 @@ struct MemsetThread {
 QemuThread pgthread;
 sigjmp_buf env;
 MemsetContext *context;
+size_t touched_pages;
 };
 typedef struct MemsetThread MemsetThread;
 
@@ -373,6 +375,7 @@ static void *do_touch_pages(void *arg)
  */
 *(volatile char *)addr = *addr;
 addr += hpagesize;
+qatomic_inc(_args->touched_pages);
 }
 }
 pthread_sigmask(SIG_SETMASK, , NULL);
@@ -396,6 +399,11 @@ static void *do_madv_populate_write_pages(void *arg)
 if (size && qemu_madvise(addr, size, QEMU_MADV_POPULATE_WRITE)) {
 ret = -errno;
 }
+
+if (!ret) {
+qatomic_set(_args->touched_pages, memset_args->numpages);
+}
+
 return (void *)(uintptr_t)ret;
 }
 
@@ -418,8 +426,68 @@ static inline int get_memset_num_threads(size_t hpagesize, 
size_t numpages,
 return ret;
 }
 
+static int do_join_memset_threads_with_timeout(MemsetContext *context,
+   time_t seconds)
+{
+struct timespec ts;
+int i = 0;
+
+if (clock_gettime(CLOCK_REALTIME, ) < 0) {
+return i;
+}
+ts.tv_sec += seconds;
+
+for (; i < context->num_threads; ++i) {
+if (pthread_timedjoin_np(context->threads[i].pgthread.thread,
+ NULL, )) {
+break;
+}
+}
+
+return i;
+}
+
+static void memset_stats_count_pages(MemsetContext *context)
+{
+int i;
+
+for (i = 0; i < context->num_threads; ++i) {
+size_t pages = qatomic_load_acquire(
+>threads[i].touched_pages);
+context->stats.allocated_pages += pages;
+}
+}
+
+static int timed_join_memset_threads(MemsetContext *context,
+ const PreallocTimeout *timeout)
+{
+int i, off;
+PreallocStats *stats = >stats;
+off = do_join_memset_threads_with_timeout(context, timeout->seconds);
+
+if (off != context->num_threads && timeout->on_timeout) {
+memset_stats_count_pages(context);
+
+/*
+ * Guard against possible races if preallocation finishes right
+ * after the timeout is exceeded.
+ */
+if (stats->allocated_pages < stats->total_pages) {
+stats->seconds_elapsed = timeout->seconds;
+timeout->on_timeout(timeout->user, stats);
+}
+}
+
+for (i = off; i < context->num_threads; ++i) {
+pthread_cancel(context->threads[i].pgthread.thread);
+}
+
+return off;
+}
+
 static int touch_all_pages(char *area, size_t hpagesize, size_t numpages,
   

Re: [PATCH v3 0/5] vhost-user-blk: dynamically resize config space based on features

2022-10-11 Thread Daniil Tatianin

Ping :)

On 9/6/22 10:31 AM, Daniil Tatianin wrote:

This patch set attempts to align vhost-user-blk with virtio-blk in
terms of backward compatibility and flexibility. It also improves
the virtio core by introducing new common code that can be used by
a virtio device to calculate its config space size.

In particular it adds the following things:
- Common virtio code for deducing the required device config size based
   on provided host features.
- Ability to disable modern virtio-blk features like
   discard/write-zeroes for vhost-user-blk.
- Dynamic configuration space resizing based on enabled features,
   by reusing the common code introduced in the earlier commits.
- Cleans up the VHostUserBlk structure by reusing parent fields.

Changes since v1 (mostly addresses Stefan's feedback):
- Introduce VirtIOConfigSizeParams & virtio_get_config_size
- Remove virtio_blk_set_config_size altogether, make virtio-blk-common.c
   only hold the virtio-blk config size parameters.
- Reuse parent fields in vhost-user-blk instead of introducing new ones.

Changes since v2:
- Squash the first four commits into one
- Set .min_size for virtio-net as well
- Move maintainer/meson user-blk bits to the last commit

Daniil Tatianin (5):
   virtio: introduce VirtIOConfigSizeParams & virtio_get_config_size
   virtio-blk: move config size params to virtio-blk-common
   vhost-user-blk: make it possible to disable write-zeroes/discard
   vhost-user-blk: make 'config_wce' part of 'host_features'
   vhost-user-blk: dynamically resize config space based on features

  MAINTAINERS   |  4 +++
  hw/block/meson.build  |  4 +--
  hw/block/vhost-user-blk.c | 29 +++-
  hw/block/virtio-blk-common.c  | 39 +++
  hw/block/virtio-blk.c | 28 +++
  hw/net/virtio-net.c   |  9 +--
  hw/virtio/virtio.c| 10 ---
  include/hw/virtio/vhost-user-blk.h|  1 -
  include/hw/virtio/virtio-blk-common.h | 20 ++
  include/hw/virtio/virtio.h| 10 +--
  10 files changed, 105 insertions(+), 49 deletions(-)
  create mode 100644 hw/block/virtio-blk-common.c
  create mode 100644 include/hw/virtio/virtio-blk-common.h





Re: [PATCH v3 0/5] vhost-user-blk: dynamically resize config space based on features

2022-10-11 Thread Daniil Tatianin




On 10/11/22 10:20 AM, Daniil Tatianin wrote:

Ping :)

Oops, didn't see the pull request. Disregard this.


On 9/6/22 10:31 AM, Daniil Tatianin wrote:

This patch set attempts to align vhost-user-blk with virtio-blk in
terms of backward compatibility and flexibility. It also improves
the virtio core by introducing new common code that can be used by
a virtio device to calculate its config space size.

In particular it adds the following things:
- Common virtio code for deducing the required device config size based
   on provided host features.
- Ability to disable modern virtio-blk features like
   discard/write-zeroes for vhost-user-blk.
- Dynamic configuration space resizing based on enabled features,
   by reusing the common code introduced in the earlier commits.
- Cleans up the VHostUserBlk structure by reusing parent fields.

Changes since v1 (mostly addresses Stefan's feedback):
- Introduce VirtIOConfigSizeParams & virtio_get_config_size
- Remove virtio_blk_set_config_size altogether, make virtio-blk-common.c
   only hold the virtio-blk config size parameters.
- Reuse parent fields in vhost-user-blk instead of introducing new ones.

Changes since v2:
- Squash the first four commits into one
- Set .min_size for virtio-net as well
- Move maintainer/meson user-blk bits to the last commit

Daniil Tatianin (5):
   virtio: introduce VirtIOConfigSizeParams & virtio_get_config_size
   virtio-blk: move config size params to virtio-blk-common
   vhost-user-blk: make it possible to disable write-zeroes/discard
   vhost-user-blk: make 'config_wce' part of 'host_features'
   vhost-user-blk: dynamically resize config space based on features

  MAINTAINERS   |  4 +++
  hw/block/meson.build  |  4 +--
  hw/block/vhost-user-blk.c | 29 +++-
  hw/block/virtio-blk-common.c  | 39 +++
  hw/block/virtio-blk.c | 28 +++
  hw/net/virtio-net.c   |  9 +--
  hw/virtio/virtio.c    | 10 ---
  include/hw/virtio/vhost-user-blk.h    |  1 -
  include/hw/virtio/virtio-blk-common.h | 20 ++
  include/hw/virtio/virtio.h    | 10 +--
  10 files changed, 105 insertions(+), 49 deletions(-)
  create mode 100644 hw/block/virtio-blk-common.c
  create mode 100644 include/hw/virtio/virtio-blk-common.h





Re: [PATCH v3 0/5] vhost-user-blk: dynamically resize config space based on features

2022-09-12 Thread Daniil Tatianin
Thanks for reviewing! Could you send a Pull request? Or do we need an 
ack from someone else?


On 9/7/22 7:02 AM, Raphael Norwitz wrote:

Thanks for the changes. For the whole series:

Reviewed-by: Raphael Norwitz 

On Tue, Sep 06, 2022 at 10:31:06AM +0300, Daniil Tatianin wrote:

This patch set attempts to align vhost-user-blk with virtio-blk in
terms of backward compatibility and flexibility. It also improves
the virtio core by introducing new common code that can be used by
a virtio device to calculate its config space size.

In particular it adds the following things:
- Common virtio code for deducing the required device config size based
   on provided host features.
- Ability to disable modern virtio-blk features like
   discard/write-zeroes for vhost-user-blk.
- Dynamic configuration space resizing based on enabled features,
   by reusing the common code introduced in the earlier commits.
- Cleans up the VHostUserBlk structure by reusing parent fields.

Changes since v1 (mostly addresses Stefan's feedback):
- Introduce VirtIOConfigSizeParams & virtio_get_config_size
- Remove virtio_blk_set_config_size altogether, make virtio-blk-common.c
   only hold the virtio-blk config size parameters.
- Reuse parent fields in vhost-user-blk instead of introducing new ones.

Changes since v2:
- Squash the first four commits into one
- Set .min_size for virtio-net as well
- Move maintainer/meson user-blk bits to the last commit

Daniil Tatianin (5):
   virtio: introduce VirtIOConfigSizeParams & virtio_get_config_size
   virtio-blk: move config size params to virtio-blk-common
   vhost-user-blk: make it possible to disable write-zeroes/discard
   vhost-user-blk: make 'config_wce' part of 'host_features'
   vhost-user-blk: dynamically resize config space based on features

  MAINTAINERS   |  4 +++
  hw/block/meson.build  |  4 +--
  hw/block/vhost-user-blk.c | 29 +++-
  hw/block/virtio-blk-common.c  | 39 +++
  hw/block/virtio-blk.c | 28 +++
  hw/net/virtio-net.c   |  9 +--
  hw/virtio/virtio.c| 10 ---
  include/hw/virtio/vhost-user-blk.h|  1 -
  include/hw/virtio/virtio-blk-common.h | 20 ++
  include/hw/virtio/virtio.h| 10 +--
  10 files changed, 105 insertions(+), 49 deletions(-)
  create mode 100644 hw/block/virtio-blk-common.c
  create mode 100644 include/hw/virtio/virtio-blk-common.h

--
2.25.1




[PATCH v3 0/5] vhost-user-blk: dynamically resize config space based on features

2022-09-06 Thread Daniil Tatianin
This patch set attempts to align vhost-user-blk with virtio-blk in
terms of backward compatibility and flexibility. It also improves
the virtio core by introducing new common code that can be used by
a virtio device to calculate its config space size.

In particular it adds the following things:
- Common virtio code for deducing the required device config size based
  on provided host features.
- Ability to disable modern virtio-blk features like
  discard/write-zeroes for vhost-user-blk.
- Dynamic configuration space resizing based on enabled features,
  by reusing the common code introduced in the earlier commits.
- Cleans up the VHostUserBlk structure by reusing parent fields.

Changes since v1 (mostly addresses Stefan's feedback):
- Introduce VirtIOConfigSizeParams & virtio_get_config_size
- Remove virtio_blk_set_config_size altogether, make virtio-blk-common.c
  only hold the virtio-blk config size parameters.
- Reuse parent fields in vhost-user-blk instead of introducing new ones.

Changes since v2:
- Squash the first four commits into one
- Set .min_size for virtio-net as well
- Move maintainer/meson user-blk bits to the last commit

Daniil Tatianin (5):
  virtio: introduce VirtIOConfigSizeParams & virtio_get_config_size
  virtio-blk: move config size params to virtio-blk-common
  vhost-user-blk: make it possible to disable write-zeroes/discard
  vhost-user-blk: make 'config_wce' part of 'host_features'
  vhost-user-blk: dynamically resize config space based on features

 MAINTAINERS   |  4 +++
 hw/block/meson.build  |  4 +--
 hw/block/vhost-user-blk.c | 29 +++-
 hw/block/virtio-blk-common.c  | 39 +++
 hw/block/virtio-blk.c | 28 +++
 hw/net/virtio-net.c   |  9 +--
 hw/virtio/virtio.c| 10 ---
 include/hw/virtio/vhost-user-blk.h|  1 -
 include/hw/virtio/virtio-blk-common.h | 20 ++
 include/hw/virtio/virtio.h| 10 +--
 10 files changed, 105 insertions(+), 49 deletions(-)
 create mode 100644 hw/block/virtio-blk-common.c
 create mode 100644 include/hw/virtio/virtio-blk-common.h

-- 
2.25.1




[PATCH v3 3/5] vhost-user-blk: make it possible to disable write-zeroes/discard

2022-09-06 Thread Daniil Tatianin
It is useful to have the ability to disable these features for
compatibility with older VMs that don't have these implemented.

Signed-off-by: Daniil Tatianin 
Reviewed-by: Raphael Norwitz 
---
 hw/block/vhost-user-blk.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 9117222456..4c9727e08c 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -259,8 +259,6 @@ static uint64_t vhost_user_blk_get_features(VirtIODevice 
*vdev,
 virtio_add_feature(, VIRTIO_BLK_F_BLK_SIZE);
 virtio_add_feature(, VIRTIO_BLK_F_FLUSH);
 virtio_add_feature(, VIRTIO_BLK_F_RO);
-virtio_add_feature(, VIRTIO_BLK_F_DISCARD);
-virtio_add_feature(, VIRTIO_BLK_F_WRITE_ZEROES);
 
 if (s->config_wce) {
 virtio_add_feature(, VIRTIO_BLK_F_CONFIG_WCE);
@@ -592,6 +590,10 @@ static Property vhost_user_blk_properties[] = {
VHOST_USER_BLK_AUTO_NUM_QUEUES),
 DEFINE_PROP_UINT32("queue-size", VHostUserBlk, queue_size, 128),
 DEFINE_PROP_BIT("config-wce", VHostUserBlk, config_wce, 0, true),
+DEFINE_PROP_BIT64("discard", VHostUserBlk, parent_obj.host_features,
+  VIRTIO_BLK_F_DISCARD, true),
+DEFINE_PROP_BIT64("write-zeroes", VHostUserBlk, parent_obj.host_features,
+  VIRTIO_BLK_F_WRITE_ZEROES, true),
 DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.25.1




[PATCH v3 4/5] vhost-user-blk: make 'config_wce' part of 'host_features'

2022-09-06 Thread Daniil Tatianin
No reason to have this be a separate field. This also makes it more akin
to what the virtio-blk device does.

Signed-off-by: Daniil Tatianin 
Reviewed-by: Raphael Norwitz 
---
 hw/block/vhost-user-blk.c  | 6 ++
 include/hw/virtio/vhost-user-blk.h | 1 -
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 4c9727e08c..0d916edefa 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -260,9 +260,6 @@ static uint64_t vhost_user_blk_get_features(VirtIODevice 
*vdev,
 virtio_add_feature(, VIRTIO_BLK_F_FLUSH);
 virtio_add_feature(, VIRTIO_BLK_F_RO);
 
-if (s->config_wce) {
-virtio_add_feature(, VIRTIO_BLK_F_CONFIG_WCE);
-}
 if (s->num_queues > 1) {
 virtio_add_feature(, VIRTIO_BLK_F_MQ);
 }
@@ -589,7 +586,8 @@ static Property vhost_user_blk_properties[] = {
 DEFINE_PROP_UINT16("num-queues", VHostUserBlk, num_queues,
VHOST_USER_BLK_AUTO_NUM_QUEUES),
 DEFINE_PROP_UINT32("queue-size", VHostUserBlk, queue_size, 128),
-DEFINE_PROP_BIT("config-wce", VHostUserBlk, config_wce, 0, true),
+DEFINE_PROP_BIT64("config-wce", VHostUserBlk, parent_obj.host_features,
+  VIRTIO_BLK_F_CONFIG_WCE, true),
 DEFINE_PROP_BIT64("discard", VHostUserBlk, parent_obj.host_features,
   VIRTIO_BLK_F_DISCARD, true),
 DEFINE_PROP_BIT64("write-zeroes", VHostUserBlk, parent_obj.host_features,
diff --git a/include/hw/virtio/vhost-user-blk.h 
b/include/hw/virtio/vhost-user-blk.h
index 7c91f15040..ea085ee1ed 100644
--- a/include/hw/virtio/vhost-user-blk.h
+++ b/include/hw/virtio/vhost-user-blk.h
@@ -34,7 +34,6 @@ struct VHostUserBlk {
 struct virtio_blk_config blkcfg;
 uint16_t num_queues;
 uint32_t queue_size;
-uint32_t config_wce;
 struct vhost_dev dev;
 struct vhost_inflight *inflight;
 VhostUserState vhost_user;
-- 
2.25.1




[PATCH v3 2/5] virtio-blk: move config size params to virtio-blk-common

2022-09-06 Thread Daniil Tatianin
This way we can reuse it for other virtio-blk devices, e.g
vhost-user-blk, which currently does not control its config space size
dynamically.

Signed-off-by: Daniil Tatianin 
Reviewed-by: Raphael Norwitz 
---
 MAINTAINERS   |  2 ++
 hw/block/meson.build  |  2 +-
 hw/block/virtio-blk-common.c  | 39 +++
 hw/block/virtio-blk.c | 24 ++---
 include/hw/virtio/virtio-blk-common.h | 20 ++
 5 files changed, 64 insertions(+), 23 deletions(-)
 create mode 100644 hw/block/virtio-blk-common.c
 create mode 100644 include/hw/virtio/virtio-blk-common.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 5ce4227ff6..2858577c5b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2030,8 +2030,10 @@ virtio-blk
 M: Stefan Hajnoczi 
 L: qemu-bl...@nongnu.org
 S: Supported
+F: hw/block/virtio-blk-common.c
 F: hw/block/virtio-blk.c
 F: hw/block/dataplane/*
+F: include/hw/virtio/virtio-blk-common.h
 F: tests/qtest/virtio-blk-test.c
 T: git https://github.com/stefanha/qemu.git block
 
diff --git a/hw/block/meson.build b/hw/block/meson.build
index 2389326112..8ee1f1f850 100644
--- a/hw/block/meson.build
+++ b/hw/block/meson.build
@@ -16,7 +16,7 @@ softmmu_ss.add(when: 'CONFIG_SWIM', if_true: files('swim.c'))
 softmmu_ss.add(when: 'CONFIG_XEN', if_true: files('xen-block.c'))
 softmmu_ss.add(when: 'CONFIG_TC58128', if_true: files('tc58128.c'))
 
-specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c'))
+specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c', 
'virtio-blk-common.c'))
 specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: 
files('vhost-user-blk.c'))
 
 subdir('dataplane')
diff --git a/hw/block/virtio-blk-common.c b/hw/block/virtio-blk-common.c
new file mode 100644
index 00..ac52d7c176
--- /dev/null
+++ b/hw/block/virtio-blk-common.c
@@ -0,0 +1,39 @@
+/*
+ * Virtio Block Device common helpers
+ *
+ * Copyright IBM, Corp. 2007
+ *
+ * Authors:
+ *  Anthony Liguori   
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+
+#include "standard-headers/linux/virtio_blk.h"
+#include "hw/virtio/virtio.h"
+#include "hw/virtio/virtio-blk-common.h"
+
+/* Config size before the discard support (hide associated config fields) */
+#define VIRTIO_BLK_CFG_SIZE offsetof(struct virtio_blk_config, \
+ max_discard_sectors)
+
+/*
+ * Starting from the discard feature, we can use this array to properly
+ * set the config size depending on the features enabled.
+ */
+static const VirtIOFeature feature_sizes[] = {
+{.flags = 1ULL << VIRTIO_BLK_F_DISCARD,
+ .end = endof(struct virtio_blk_config, discard_sector_alignment)},
+{.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES,
+ .end = endof(struct virtio_blk_config, write_zeroes_may_unmap)},
+{}
+};
+
+const VirtIOConfigSizeParams virtio_blk_cfg_size_params = {
+.min_size = VIRTIO_BLK_CFG_SIZE,
+.max_size = sizeof(struct virtio_blk_config),
+.feature_sizes = feature_sizes
+};
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 10c47c2934..8131ec2dbc 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -32,29 +32,9 @@
 #include "hw/virtio/virtio-bus.h"
 #include "migration/qemu-file-types.h"
 #include "hw/virtio/virtio-access.h"
+#include "hw/virtio/virtio-blk-common.h"
 #include "qemu/coroutine.h"
 
-/* Config size before the discard support (hide associated config fields) */
-#define VIRTIO_BLK_CFG_SIZE offsetof(struct virtio_blk_config, \
- max_discard_sectors)
-/*
- * Starting from the discard feature, we can use this array to properly
- * set the config size depending on the features enabled.
- */
-static const VirtIOFeature feature_sizes[] = {
-{.flags = 1ULL << VIRTIO_BLK_F_DISCARD,
- .end = endof(struct virtio_blk_config, discard_sector_alignment)},
-{.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES,
- .end = endof(struct virtio_blk_config, write_zeroes_may_unmap)},
-{}
-};
-
-static const VirtIOConfigSizeParams cfg_size_params = {
-.min_size = VIRTIO_BLK_CFG_SIZE,
-.max_size = sizeof(struct virtio_blk_config),
-.feature_sizes = feature_sizes
-};
-
 static void virtio_blk_init_request(VirtIOBlock *s, VirtQueue *vq,
 VirtIOBlockReq *req)
 {
@@ -1202,7 +1182,7 @@ static void virtio_blk_device_realize(DeviceState *dev, 
Error **errp)
 return;
 }
 
-s->config_size = virtio_get_config_size(_size_params,
+s->config_size = virtio_get_config_size(_blk_cfg_size_params,
 s->host_features);
 virtio_init(vdev, VIRTIO_ID_BLOCK, s->config_size);
 
diff --git a/include/hw/virti

[PATCH v3 5/5] vhost-user-blk: dynamically resize config space based on features

2022-09-06 Thread Daniil Tatianin
Make vhost-user-blk backwards compatible when migrating from older VMs
running with modern features turned off, the same way it was done for
virtio-blk in 20764be0421c ("virtio-blk: set config size depending on the 
features enabled")

It's currently impossible to migrate from an older VM with
vhost-user-blk (with disable-legacy=off) because of errors like this:

qemu-system-x86_64: get_pci_config_device: Bad config data: i=0x10 read: 41 
device: 1 cmask: ff wmask: 80 w1cmask:0
qemu-system-x86_64: Failed to load PCIDevice:config
qemu-system-x86_64: Failed to load virtio-blk:virtio
qemu-system-x86_64: error while loading state for instance 0x0 of device 
':00:05.0:00.0:02.0/virtio-blk'
qemu-system-x86_64: load of migration failed: Invalid argument

This is caused by the newer (destination) VM requiring a bigger BAR0
alignment because it has to cover a bigger configuration space, which
isn't actually needed since those additional config fields are not
active (write-zeroes/discard).

Signed-off-by: Daniil Tatianin 
Reviewed-by: Raphael Norwitz 
---
 MAINTAINERS   |  2 ++
 hw/block/meson.build  |  2 +-
 hw/block/vhost-user-blk.c | 17 ++---
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 2858577c5b..a7d3914735 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2273,11 +2273,13 @@ S: Maintained
 F: contrib/vhost-user-blk/
 F: contrib/vhost-user-scsi/
 F: hw/block/vhost-user-blk.c
+F: hw/block/virtio-blk-common.c
 F: hw/scsi/vhost-user-scsi.c
 F: hw/virtio/vhost-user-blk-pci.c
 F: hw/virtio/vhost-user-scsi-pci.c
 F: include/hw/virtio/vhost-user-blk.h
 F: include/hw/virtio/vhost-user-scsi.h
+F: include/hw/virtio/virtio-blk-common.h
 
 vhost-user-gpu
 M: Marc-André Lureau 
diff --git a/hw/block/meson.build b/hw/block/meson.build
index 8ee1f1f850..1908abd45c 100644
--- a/hw/block/meson.build
+++ b/hw/block/meson.build
@@ -17,6 +17,6 @@ softmmu_ss.add(when: 'CONFIG_XEN', if_true: 
files('xen-block.c'))
 softmmu_ss.add(when: 'CONFIG_TC58128', if_true: files('tc58128.c'))
 
 specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c', 
'virtio-blk-common.c'))
-specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: 
files('vhost-user-blk.c'))
+specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: 
files('vhost-user-blk.c', 'virtio-blk-common.c'))
 
 subdir('dataplane')
diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 0d916edefa..8dd063eb96 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -23,6 +23,7 @@
 #include "hw/qdev-core.h"
 #include "hw/qdev-properties.h"
 #include "hw/qdev-properties-system.h"
+#include "hw/virtio/virtio-blk-common.h"
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/vhost-user-blk.h"
 #include "hw/virtio/virtio.h"
@@ -63,7 +64,7 @@ static void vhost_user_blk_update_config(VirtIODevice *vdev, 
uint8_t *config)
 /* Our num_queues overrides the device backend */
 virtio_stw_p(vdev, >blkcfg.num_queues, s->num_queues);
 
-memcpy(config, >blkcfg, sizeof(struct virtio_blk_config));
+memcpy(config, >blkcfg, vdev->config_len);
 }
 
 static void vhost_user_blk_set_config(VirtIODevice *vdev, const uint8_t 
*config)
@@ -92,12 +93,12 @@ static int vhost_user_blk_handle_config_change(struct 
vhost_dev *dev)
 {
 int ret;
 struct virtio_blk_config blkcfg;
+VirtIODevice *vdev = dev->vdev;
 VHostUserBlk *s = VHOST_USER_BLK(dev->vdev);
 Error *local_err = NULL;
 
 ret = vhost_dev_get_config(dev, (uint8_t *),
-   sizeof(struct virtio_blk_config),
-   _err);
+   vdev->config_len, _err);
 if (ret < 0) {
 error_report_err(local_err);
 return ret;
@@ -106,7 +107,7 @@ static int vhost_user_blk_handle_config_change(struct 
vhost_dev *dev)
 /* valid for resize only */
 if (blkcfg.capacity != s->blkcfg.capacity) {
 s->blkcfg.capacity = blkcfg.capacity;
-memcpy(dev->vdev->config, >blkcfg, sizeof(struct 
virtio_blk_config));
+memcpy(dev->vdev->config, >blkcfg, vdev->config_len);
 virtio_notify_config(dev->vdev);
 }
 
@@ -442,7 +443,7 @@ static int vhost_user_blk_realize_connect(VHostUserBlk *s, 
Error **errp)
 assert(s->connected);
 
 ret = vhost_dev_get_config(>dev, (uint8_t *)>blkcfg,
-   sizeof(struct virtio_blk_config), errp);
+   s->parent_obj.config_len, errp);
 if (ret < 0) {
 qemu_chr_fe_disconnect(>chardev);
 vhost_dev_cleanup(>dev);
@@ -457,6 +458,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, 
Error **errp)
 ERRP_GUARD();
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 VHostUserBlk *s = VHOST_USER_BLK(vdev);
+size_t co

[PATCH v3 1/5] virtio: introduce VirtIOConfigSizeParams & virtio_get_config_size

2022-09-06 Thread Daniil Tatianin
This is the first step towards moving all device config size calculation
logic into the virtio core code. In particular, this adds a struct that
contains all the necessary information for common virtio code to be able
to calculate the final config size for a device. This is expected to be
used with the new virtio_get_config_size helper, which calculates the
final length based on the provided host features.

This builds on top of already existing code like VirtIOFeature and
virtio_feature_get_config_size(), but adds additional fields, as well as
sanity checking so that device-specifc code doesn't have to duplicate it.

An example usage would be:

static const VirtIOFeature dev_features[] = {
{.flags = 1ULL << FEATURE_1_BIT,
 .end = endof(struct virtio_dev_config, feature_1)},
{.flags = 1ULL << FEATURE_2_BIT,
 .end = endof(struct virtio_dev_config, feature_2)},
{}
};

static const VirtIOConfigSizeParams dev_cfg_size_params = {
.min_size = DEV_BASE_CONFIG_SIZE,
.max_size = sizeof(struct virtio_dev_config),
.feature_sizes = dev_features
};

// code inside my_dev_device_realize()
size_t config_size = virtio_get_config_size(_cfg_size_params,
host_features);
virtio_init(vdev, VIRTIO_ID_MYDEV, config_size);

Currently every device is expected to write its own boilerplate from the
example above in device_realize(), however, the next step of this
transition is moving VirtIOConfigSizeParams into VirtioDeviceClass,
so that it can be done automatically by the virtio initialization code.

All of the users of virtio_feature_get_config_size have been converted
to use virtio_get_config_size so it's no longer needed and is removed
with this commit.

Signed-off-by: Daniil Tatianin 
---
 hw/block/virtio-blk.c  | 16 +++-
 hw/net/virtio-net.c|  9 +++--
 hw/virtio/virtio.c | 10 ++
 include/hw/virtio/virtio.h | 10 --
 4 files changed, 28 insertions(+), 17 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index e9ba752f6b..10c47c2934 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -49,13 +49,11 @@ static const VirtIOFeature feature_sizes[] = {
 {}
 };
 
-static void virtio_blk_set_config_size(VirtIOBlock *s, uint64_t host_features)
-{
-s->config_size = MAX(VIRTIO_BLK_CFG_SIZE,
-virtio_feature_get_config_size(feature_sizes, host_features));
-
-assert(s->config_size <= sizeof(struct virtio_blk_config));
-}
+static const VirtIOConfigSizeParams cfg_size_params = {
+.min_size = VIRTIO_BLK_CFG_SIZE,
+.max_size = sizeof(struct virtio_blk_config),
+.feature_sizes = feature_sizes
+};
 
 static void virtio_blk_init_request(VirtIOBlock *s, VirtQueue *vq,
 VirtIOBlockReq *req)
@@ -1204,8 +1202,8 @@ static void virtio_blk_device_realize(DeviceState *dev, 
Error **errp)
 return;
 }
 
-virtio_blk_set_config_size(s, s->host_features);
-
+s->config_size = virtio_get_config_size(_size_params,
+s->host_features);
 virtio_init(vdev, VIRTIO_ID_BLOCK, s->config_size);
 
 s->blk = conf->conf.blk;
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index dd0d056fde..78b003a158 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -106,6 +106,12 @@ static const VirtIOFeature feature_sizes[] = {
 {}
 };
 
+static const VirtIOConfigSizeParams cfg_size_params = {
+.min_size = endof(struct virtio_net_config, mac),
+.max_size = sizeof(struct virtio_net_config),
+.feature_sizes = feature_sizes
+};
+
 static VirtIONetQueue *virtio_net_get_subqueue(NetClientState *nc)
 {
 VirtIONet *n = qemu_get_nic_opaque(nc);
@@ -3246,8 +3252,7 @@ static void virtio_net_set_config_size(VirtIONet *n, 
uint64_t host_features)
 {
 virtio_add_feature(_features, VIRTIO_NET_F_MAC);
 
-n->config_size = virtio_feature_get_config_size(feature_sizes,
-host_features);
+n->config_size = virtio_get_config_size(_size_params, host_features);
 }
 
 void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 5d607aeaa0..c0bf8dd6fd 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2999,11 +2999,12 @@ int virtio_set_features(VirtIODevice *vdev, uint64_t 
val)
 return ret;
 }
 
-size_t virtio_feature_get_config_size(const VirtIOFeature *feature_sizes,
-  uint64_t host_features)
+size_t virtio_get_config_size(const VirtIOConfigSizeParams *params,
+  uint64_t host_features)
 {
-size_t config_size = 0;
-int i;
+size_t config_size = params->min_size;
+const VirtIOFeature *feature_sizes = params->feature_sizes;
+size_t i

Re: [PATCH v2 3/8] virtio-net: utilize VirtIOConfigSizeParams & virtio_get_config_size

2022-09-02 Thread Daniil Tatianin




On 9/2/22 8:54 PM, Raphael Norwitz wrote:

On Fri, Aug 26, 2022 at 05:32:43PM +0300, Daniil Tatianin wrote:

Use the new common helper. As an added bonus this also makes use of
config size sanity checking via the 'max_size' field.

Signed-off-by: Daniil Tatianin 
---
  hw/net/virtio-net.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index dd0d056fde..dfc8dd8562 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -106,6 +106,11 @@ static const VirtIOFeature feature_sizes[] = {
  {}
  };
  
+static const VirtIOConfigSizeParams cfg_size_params = {


Can we have a zero length virtio-net config size? The both the v1.0 and
v1.1 section 5.1.4 say “The mac address field always exists (though is
only valid if VIRTIO_NET_F_MAC is set)” so we should probably set
min_size to offset_of status?


It currently hardcodes VIRTIO_NET_F_MAC as always on, but I guess it
doesn't hurt to be more explicit about it. Will add that in the next
version.
(resending because forgot to reply-all last time)

Otherwise LGTM.


+.max_size = sizeof(struct virtio_net_config),
+.feature_sizes = feature_sizes
+};
+
  static VirtIONetQueue *virtio_net_get_subqueue(NetClientState *nc)
  {
  VirtIONet *n = qemu_get_nic_opaque(nc);
@@ -3246,8 +3251,7 @@ static void virtio_net_set_config_size(VirtIONet *n, 
uint64_t host_features)
  {
  virtio_add_feature(_features, VIRTIO_NET_F_MAC);
  
-n->config_size = virtio_feature_get_config_size(feature_sizes,

-host_features);
+n->config_size = virtio_get_config_size(_size_params, host_features);
  }
  
  void virtio_net_set_netclient_name(VirtIONet *n, const char *name,

--
2.25.1




Re: [PATCH v2 5/8] virtio-blk: move config size params to virtio-blk-common

2022-09-02 Thread Daniil Tatianin




On 9/2/22 8:57 PM, Raphael Norwitz wrote:

The vhost-user-blk bits in meson.build and Maintainers should probably
be in patch 8?


You're totally right, thanks.


Otherwise LGTM.

On Fri, Aug 26, 2022 at 05:32:45PM +0300, Daniil Tatianin wrote:

This way we can reuse it for other virtio-blk devices, e.g
vhost-user-blk, which currently does not control its config space size
dynamically.

Signed-off-by: Daniil Tatianin 
---
  MAINTAINERS   |  4 +++
  hw/block/meson.build  |  4 +--
  hw/block/virtio-blk-common.c  | 39 +++
  hw/block/virtio-blk.c | 24 ++---
  include/hw/virtio/virtio-blk-common.h | 20 ++
  5 files changed, 67 insertions(+), 24 deletions(-)
  create mode 100644 hw/block/virtio-blk-common.c
  create mode 100644 include/hw/virtio/virtio-blk-common.h





i.e. move this.


@@ -2271,11 +2273,13 @@ S: Maintained
  F: contrib/vhost-user-blk/
  F: contrib/vhost-user-scsi/
  F: hw/block/vhost-user-blk.c
+F: hw/block/virtio-blk-common.c
  F: hw/scsi/vhost-user-scsi.c
  F: hw/virtio/vhost-user-blk-pci.c
  F: hw/virtio/vhost-user-scsi-pci.c
  F: include/hw/virtio/vhost-user-blk.h
  F: include/hw/virtio/vhost-user-scsi.h
+F: include/hw/virtio/virtio-blk-common.h
  
  vhost-user-gpu

  M: Marc-André Lureau 
diff --git a/hw/block/meson.build b/hw/block/meson.build
index 2389326112..1908abd45c 100644
--- a/hw/block/meson.build
+++ b/hw/block/meson.build
@@ -16,7 +16,7 @@ softmmu_ss.add(when: 'CONFIG_SWIM', if_true: files('swim.c'))
  softmmu_ss.add(when: 'CONFIG_XEN', if_true: files('xen-block.c'))
  softmmu_ss.add(when: 'CONFIG_TC58128', if_true: files('tc58128.c'))
  
-specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c'))

-specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: 
files('vhost-user-blk.c'))
+specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c', 
'virtio-blk-common.c'))


And this


+specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: 
files('vhost-user-blk.c', 'virtio-blk-common.c'))




Re: [PATCH v2 1/8] virtio: introduce VirtIOConfigSizeParams & virtio_get_config_size

2022-09-02 Thread Daniil Tatianin




On 9/2/22 8:52 PM, Raphael Norwitz wrote:

I feel like it would be easier to review if the first 4 patches were
squashed together, but that’s not a big deal.


Yeah, I think that's fair although I initially thought that maybe that 
was a bit too big of a change to put in one single commit. I can squash 
the first four if that would be better.



For this one:

Reviewed-by: Raphael Norwitz 

On Fri, Aug 26, 2022 at 05:32:41PM +0300, Daniil Tatianin wrote:

This is the first step towards moving all device config size calculation
logic into the virtio core code. In particular, this adds a struct that
contains all the necessary information for common virtio code to be able
to calculate the final config size for a device. This is expected to be
used with the new virtio_get_config_size helper, which calculates the
final length based on the provided host features.

This builds on top of already existing code like VirtIOFeature and
virtio_feature_get_config_size(), but adds additional fields, as well as
sanity checking so that device-specifc code doesn't have to duplicate it.

An example usage would be:

 static const VirtIOFeature dev_features[] = {
 {.flags = 1ULL << FEATURE_1_BIT,
  .end = endof(struct virtio_dev_config, feature_1)},
 {.flags = 1ULL << FEATURE_2_BIT,
  .end = endof(struct virtio_dev_config, feature_2)},
 {}
 };

 static const VirtIOConfigSizeParams dev_cfg_size_params = {
 .min_size = DEV_BASE_CONFIG_SIZE,
 .max_size = sizeof(struct virtio_dev_config),
 .feature_sizes = dev_features
 };

 // code inside my_dev_device_realize()
 size_t config_size = virtio_get_config_size(_cfg_size_params,
 host_features);
 virtio_init(vdev, VIRTIO_ID_MYDEV, config_size);

Currently every device is expected to write its own boilerplate from the
example above in device_realize(), however, the next step of this
transition is moving VirtIOConfigSizeParams into VirtioDeviceClass,
so that it can be done automatically by the virtio initialization code.

Signed-off-by: Daniil Tatianin 
---
  hw/virtio/virtio.c | 17 +
  include/hw/virtio/virtio.h |  9 +
  2 files changed, 26 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 5d607aeaa0..8518382025 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3014,6 +3014,23 @@ size_t virtio_feature_get_config_size(const 
VirtIOFeature *feature_sizes,
  return config_size;
  }
  
+size_t virtio_get_config_size(const VirtIOConfigSizeParams *params,

+  uint64_t host_features)
+{
+size_t config_size = params->min_size;
+const VirtIOFeature *feature_sizes = params->feature_sizes;
+size_t i;
+
+for (i = 0; feature_sizes[i].flags != 0; i++) {
+if (host_features & feature_sizes[i].flags) {
+config_size = MAX(feature_sizes[i].end, config_size);
+}
+}
+
+assert(config_size <= params->max_size);
+return config_size;
+}
+
  int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
  {
  int i, ret;
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index db1c0ddf6b..1991c58d9b 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -44,6 +44,15 @@ typedef struct VirtIOFeature {
  size_t end;
  } VirtIOFeature;
  
+typedef struct VirtIOConfigSizeParams {

+size_t min_size;
+size_t max_size;
+const VirtIOFeature *feature_sizes;
+} VirtIOConfigSizeParams;
+
+size_t virtio_get_config_size(const VirtIOConfigSizeParams *params,
+  uint64_t host_features);
+
  size_t virtio_feature_get_config_size(const VirtIOFeature *features,
uint64_t host_features);
  
--

2.25.1




Re: [PATCH v2 0/8] vhost-user-blk: dynamically resize config space based on features

2022-09-01 Thread Daniil Tatianin
ping



[PATCH v2 3/8] virtio-net: utilize VirtIOConfigSizeParams & virtio_get_config_size

2022-08-26 Thread Daniil Tatianin
Use the new common helper. As an added bonus this also makes use of
config size sanity checking via the 'max_size' field.

Signed-off-by: Daniil Tatianin 
---
 hw/net/virtio-net.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index dd0d056fde..dfc8dd8562 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -106,6 +106,11 @@ static const VirtIOFeature feature_sizes[] = {
 {}
 };
 
+static const VirtIOConfigSizeParams cfg_size_params = {
+.max_size = sizeof(struct virtio_net_config),
+.feature_sizes = feature_sizes
+};
+
 static VirtIONetQueue *virtio_net_get_subqueue(NetClientState *nc)
 {
 VirtIONet *n = qemu_get_nic_opaque(nc);
@@ -3246,8 +3251,7 @@ static void virtio_net_set_config_size(VirtIONet *n, 
uint64_t host_features)
 {
 virtio_add_feature(_features, VIRTIO_NET_F_MAC);
 
-n->config_size = virtio_feature_get_config_size(feature_sizes,
-host_features);
+n->config_size = virtio_get_config_size(_size_params, host_features);
 }
 
 void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
-- 
2.25.1




[PATCH v2 4/8] virtio: remove the virtio_feature_get_config_size helper

2022-08-26 Thread Daniil Tatianin
This has no more users and is superseded by virtio_get_config_size.

Signed-off-by: Daniil Tatianin 
---
 hw/virtio/virtio.c | 15 ---
 include/hw/virtio/virtio.h |  3 ---
 2 files changed, 18 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 8518382025..c0bf8dd6fd 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2999,21 +2999,6 @@ int virtio_set_features(VirtIODevice *vdev, uint64_t val)
 return ret;
 }
 
-size_t virtio_feature_get_config_size(const VirtIOFeature *feature_sizes,
-  uint64_t host_features)
-{
-size_t config_size = 0;
-int i;
-
-for (i = 0; feature_sizes[i].flags != 0; i++) {
-if (host_features & feature_sizes[i].flags) {
-config_size = MAX(feature_sizes[i].end, config_size);
-}
-}
-
-return config_size;
-}
-
 size_t virtio_get_config_size(const VirtIOConfigSizeParams *params,
   uint64_t host_features)
 {
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 1991c58d9b..f3ff6710d5 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -53,9 +53,6 @@ typedef struct VirtIOConfigSizeParams {
 size_t virtio_get_config_size(const VirtIOConfigSizeParams *params,
   uint64_t host_features);
 
-size_t virtio_feature_get_config_size(const VirtIOFeature *features,
-  uint64_t host_features);
-
 typedef struct VirtQueue VirtQueue;
 
 #define VIRTQUEUE_MAX_SIZE 1024
-- 
2.25.1




[PATCH v2 2/8] virtio-blk: utilize VirtIOConfigSizeParams & virtio_get_config_size

2022-08-26 Thread Daniil Tatianin
Use the common helper instead of duplicating the same logic.

Signed-off-by: Daniil Tatianin 
---
 hw/block/virtio-blk.c | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index e9ba752f6b..10c47c2934 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -49,13 +49,11 @@ static const VirtIOFeature feature_sizes[] = {
 {}
 };
 
-static void virtio_blk_set_config_size(VirtIOBlock *s, uint64_t host_features)
-{
-s->config_size = MAX(VIRTIO_BLK_CFG_SIZE,
-virtio_feature_get_config_size(feature_sizes, host_features));
-
-assert(s->config_size <= sizeof(struct virtio_blk_config));
-}
+static const VirtIOConfigSizeParams cfg_size_params = {
+.min_size = VIRTIO_BLK_CFG_SIZE,
+.max_size = sizeof(struct virtio_blk_config),
+.feature_sizes = feature_sizes
+};
 
 static void virtio_blk_init_request(VirtIOBlock *s, VirtQueue *vq,
 VirtIOBlockReq *req)
@@ -1204,8 +1202,8 @@ static void virtio_blk_device_realize(DeviceState *dev, 
Error **errp)
 return;
 }
 
-virtio_blk_set_config_size(s, s->host_features);
-
+s->config_size = virtio_get_config_size(_size_params,
+s->host_features);
 virtio_init(vdev, VIRTIO_ID_BLOCK, s->config_size);
 
 s->blk = conf->conf.blk;
-- 
2.25.1




[PATCH v2 1/8] virtio: introduce VirtIOConfigSizeParams & virtio_get_config_size

2022-08-26 Thread Daniil Tatianin
This is the first step towards moving all device config size calculation
logic into the virtio core code. In particular, this adds a struct that
contains all the necessary information for common virtio code to be able
to calculate the final config size for a device. This is expected to be
used with the new virtio_get_config_size helper, which calculates the
final length based on the provided host features.

This builds on top of already existing code like VirtIOFeature and
virtio_feature_get_config_size(), but adds additional fields, as well as
sanity checking so that device-specifc code doesn't have to duplicate it.

An example usage would be:

static const VirtIOFeature dev_features[] = {
{.flags = 1ULL << FEATURE_1_BIT,
 .end = endof(struct virtio_dev_config, feature_1)},
{.flags = 1ULL << FEATURE_2_BIT,
 .end = endof(struct virtio_dev_config, feature_2)},
{}
};

static const VirtIOConfigSizeParams dev_cfg_size_params = {
.min_size = DEV_BASE_CONFIG_SIZE,
.max_size = sizeof(struct virtio_dev_config),
.feature_sizes = dev_features
};

// code inside my_dev_device_realize()
size_t config_size = virtio_get_config_size(_cfg_size_params,
host_features);
virtio_init(vdev, VIRTIO_ID_MYDEV, config_size);

Currently every device is expected to write its own boilerplate from the
example above in device_realize(), however, the next step of this
transition is moving VirtIOConfigSizeParams into VirtioDeviceClass,
so that it can be done automatically by the virtio initialization code.

Signed-off-by: Daniil Tatianin 
---
 hw/virtio/virtio.c | 17 +
 include/hw/virtio/virtio.h |  9 +
 2 files changed, 26 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 5d607aeaa0..8518382025 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3014,6 +3014,23 @@ size_t virtio_feature_get_config_size(const 
VirtIOFeature *feature_sizes,
 return config_size;
 }
 
+size_t virtio_get_config_size(const VirtIOConfigSizeParams *params,
+  uint64_t host_features)
+{
+size_t config_size = params->min_size;
+const VirtIOFeature *feature_sizes = params->feature_sizes;
+size_t i;
+
+for (i = 0; feature_sizes[i].flags != 0; i++) {
+if (host_features & feature_sizes[i].flags) {
+config_size = MAX(feature_sizes[i].end, config_size);
+}
+}
+
+assert(config_size <= params->max_size);
+return config_size;
+}
+
 int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
 {
 int i, ret;
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index db1c0ddf6b..1991c58d9b 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -44,6 +44,15 @@ typedef struct VirtIOFeature {
 size_t end;
 } VirtIOFeature;
 
+typedef struct VirtIOConfigSizeParams {
+size_t min_size;
+size_t max_size;
+const VirtIOFeature *feature_sizes;
+} VirtIOConfigSizeParams;
+
+size_t virtio_get_config_size(const VirtIOConfigSizeParams *params,
+  uint64_t host_features);
+
 size_t virtio_feature_get_config_size(const VirtIOFeature *features,
   uint64_t host_features);
 
-- 
2.25.1




[PATCH v2 6/8] vhost-user-blk: make it possible to disable write-zeroes/discard

2022-08-26 Thread Daniil Tatianin
It is useful to have the ability to disable these features for
compatibility with older VMs that don't have these implemented.

Signed-off-by: Daniil Tatianin 
---
 hw/block/vhost-user-blk.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 9117222456..4c9727e08c 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -259,8 +259,6 @@ static uint64_t vhost_user_blk_get_features(VirtIODevice 
*vdev,
 virtio_add_feature(, VIRTIO_BLK_F_BLK_SIZE);
 virtio_add_feature(, VIRTIO_BLK_F_FLUSH);
 virtio_add_feature(, VIRTIO_BLK_F_RO);
-virtio_add_feature(, VIRTIO_BLK_F_DISCARD);
-virtio_add_feature(, VIRTIO_BLK_F_WRITE_ZEROES);
 
 if (s->config_wce) {
 virtio_add_feature(, VIRTIO_BLK_F_CONFIG_WCE);
@@ -592,6 +590,10 @@ static Property vhost_user_blk_properties[] = {
VHOST_USER_BLK_AUTO_NUM_QUEUES),
 DEFINE_PROP_UINT32("queue-size", VHostUserBlk, queue_size, 128),
 DEFINE_PROP_BIT("config-wce", VHostUserBlk, config_wce, 0, true),
+DEFINE_PROP_BIT64("discard", VHostUserBlk, parent_obj.host_features,
+  VIRTIO_BLK_F_DISCARD, true),
+DEFINE_PROP_BIT64("write-zeroes", VHostUserBlk, parent_obj.host_features,
+  VIRTIO_BLK_F_WRITE_ZEROES, true),
 DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.25.1




[PATCH v2 0/8] vhost-user-blk: dynamically resize config space based on features

2022-08-26 Thread Daniil Tatianin
This patch set attempts to align vhost-user-blk with virtio-blk in
terms of backward compatibility and flexibility. It also improves
the virtio core by introducing new common code that can be used by
a virtio device to calculate its config space size.

In particular it adds the following things:
- Common virtio code for deducing the required device config size based
  on provided host features.
- Ability to disable modern virtio-blk features like
  discard/write-zeroes for vhost-user-blk.
- Dynamic configuration space resizing based on enabled features,
  by reusing the common code introduced in the earlier commits.
- Cleans up the VHostUserBlk structure by reusing parent fields.

Changes since v1 (mostly addresses Stefan's feedback):
- Introduce VirtIOConfigSizeParams & virtio_get_config_size
- Remove virtio_blk_set_config_size altogether, make virtio-blk-common.c
  only hold the virtio-blk config size parameters.
- Reuse parent fields in vhost-user-blk instead of introducing new ones.

Daniil Tatianin (8):
  virtio: introduce VirtIOConfigSizeParams & virtio_get_config_size
  virtio-blk: utilize VirtIOConfigSizeParams & virtio_get_config_size
  virtio-net: utilize VirtIOConfigSizeParams & virtio_get_config_size
  virtio: remove the virtio_feature_get_config_size helper
  virtio-blk: move config size params to virtio-blk-common
  vhost-user-blk: make it possible to disable write-zeroes/discard
  vhost-user-blk: make 'config_wce' part of 'host_features'
  vhost-user-blk: dynamically resize config space based on features

 MAINTAINERS   |  4 +++
 hw/block/meson.build  |  4 +--
 hw/block/vhost-user-blk.c | 29 +++-
 hw/block/virtio-blk-common.c  | 39 +++
 hw/block/virtio-blk.c | 28 +++
 hw/net/virtio-net.c   |  8 --
 hw/virtio/virtio.c| 10 ---
 include/hw/virtio/vhost-user-blk.h|  1 -
 include/hw/virtio/virtio-blk-common.h | 20 ++
 include/hw/virtio/virtio.h| 10 +--
 10 files changed, 104 insertions(+), 49 deletions(-)
 create mode 100644 hw/block/virtio-blk-common.c
 create mode 100644 include/hw/virtio/virtio-blk-common.h

-- 
2.25.1




[PATCH v2 7/8] vhost-user-blk: make 'config_wce' part of 'host_features'

2022-08-26 Thread Daniil Tatianin
No reason to have this be a separate field. This also makes it more akin
to what the virtio-blk device does.

Signed-off-by: Daniil Tatianin 
---
 hw/block/vhost-user-blk.c  | 6 ++
 include/hw/virtio/vhost-user-blk.h | 1 -
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 4c9727e08c..0d916edefa 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -260,9 +260,6 @@ static uint64_t vhost_user_blk_get_features(VirtIODevice 
*vdev,
 virtio_add_feature(, VIRTIO_BLK_F_FLUSH);
 virtio_add_feature(, VIRTIO_BLK_F_RO);
 
-if (s->config_wce) {
-virtio_add_feature(, VIRTIO_BLK_F_CONFIG_WCE);
-}
 if (s->num_queues > 1) {
 virtio_add_feature(, VIRTIO_BLK_F_MQ);
 }
@@ -589,7 +586,8 @@ static Property vhost_user_blk_properties[] = {
 DEFINE_PROP_UINT16("num-queues", VHostUserBlk, num_queues,
VHOST_USER_BLK_AUTO_NUM_QUEUES),
 DEFINE_PROP_UINT32("queue-size", VHostUserBlk, queue_size, 128),
-DEFINE_PROP_BIT("config-wce", VHostUserBlk, config_wce, 0, true),
+DEFINE_PROP_BIT64("config-wce", VHostUserBlk, parent_obj.host_features,
+  VIRTIO_BLK_F_CONFIG_WCE, true),
 DEFINE_PROP_BIT64("discard", VHostUserBlk, parent_obj.host_features,
   VIRTIO_BLK_F_DISCARD, true),
 DEFINE_PROP_BIT64("write-zeroes", VHostUserBlk, parent_obj.host_features,
diff --git a/include/hw/virtio/vhost-user-blk.h 
b/include/hw/virtio/vhost-user-blk.h
index 7c91f15040..ea085ee1ed 100644
--- a/include/hw/virtio/vhost-user-blk.h
+++ b/include/hw/virtio/vhost-user-blk.h
@@ -34,7 +34,6 @@ struct VHostUserBlk {
 struct virtio_blk_config blkcfg;
 uint16_t num_queues;
 uint32_t queue_size;
-uint32_t config_wce;
 struct vhost_dev dev;
 struct vhost_inflight *inflight;
 VhostUserState vhost_user;
-- 
2.25.1




[PATCH v2 8/8] vhost-user-blk: dynamically resize config space based on features

2022-08-26 Thread Daniil Tatianin
Make vhost-user-blk backwards compatible when migrating from older VMs
running with modern features turned off, the same way it was done for
virtio-blk in 20764be0421c ("virtio-blk: set config size depending on the 
features enabled")

It's currently impossible to migrate from an older VM with
vhost-user-blk (with disable-legacy=off) because of errors like this:

qemu-system-x86_64: get_pci_config_device: Bad config data: i=0x10 read: 41 
device: 1 cmask: ff wmask: 80 w1cmask:0
qemu-system-x86_64: Failed to load PCIDevice:config
qemu-system-x86_64: Failed to load virtio-blk:virtio
qemu-system-x86_64: error while loading state for instance 0x0 of device 
':00:05.0:00.0:02.0/virtio-blk'
qemu-system-x86_64: load of migration failed: Invalid argument

This is caused by the newer (destination) VM requiring a bigger BAR0
alignment because it has to cover a bigger configuration space, which
isn't actually needed since those additional config fields are not
active (write-zeroes/discard).

Signed-off-by: Daniil Tatianin 
---
 hw/block/vhost-user-blk.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 0d916edefa..8dd063eb96 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -23,6 +23,7 @@
 #include "hw/qdev-core.h"
 #include "hw/qdev-properties.h"
 #include "hw/qdev-properties-system.h"
+#include "hw/virtio/virtio-blk-common.h"
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/vhost-user-blk.h"
 #include "hw/virtio/virtio.h"
@@ -63,7 +64,7 @@ static void vhost_user_blk_update_config(VirtIODevice *vdev, 
uint8_t *config)
 /* Our num_queues overrides the device backend */
 virtio_stw_p(vdev, >blkcfg.num_queues, s->num_queues);
 
-memcpy(config, >blkcfg, sizeof(struct virtio_blk_config));
+memcpy(config, >blkcfg, vdev->config_len);
 }
 
 static void vhost_user_blk_set_config(VirtIODevice *vdev, const uint8_t 
*config)
@@ -92,12 +93,12 @@ static int vhost_user_blk_handle_config_change(struct 
vhost_dev *dev)
 {
 int ret;
 struct virtio_blk_config blkcfg;
+VirtIODevice *vdev = dev->vdev;
 VHostUserBlk *s = VHOST_USER_BLK(dev->vdev);
 Error *local_err = NULL;
 
 ret = vhost_dev_get_config(dev, (uint8_t *),
-   sizeof(struct virtio_blk_config),
-   _err);
+   vdev->config_len, _err);
 if (ret < 0) {
 error_report_err(local_err);
 return ret;
@@ -106,7 +107,7 @@ static int vhost_user_blk_handle_config_change(struct 
vhost_dev *dev)
 /* valid for resize only */
 if (blkcfg.capacity != s->blkcfg.capacity) {
 s->blkcfg.capacity = blkcfg.capacity;
-memcpy(dev->vdev->config, >blkcfg, sizeof(struct 
virtio_blk_config));
+memcpy(dev->vdev->config, >blkcfg, vdev->config_len);
 virtio_notify_config(dev->vdev);
 }
 
@@ -442,7 +443,7 @@ static int vhost_user_blk_realize_connect(VHostUserBlk *s, 
Error **errp)
 assert(s->connected);
 
 ret = vhost_dev_get_config(>dev, (uint8_t *)>blkcfg,
-   sizeof(struct virtio_blk_config), errp);
+   s->parent_obj.config_len, errp);
 if (ret < 0) {
 qemu_chr_fe_disconnect(>chardev);
 vhost_dev_cleanup(>dev);
@@ -457,6 +458,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, 
Error **errp)
 ERRP_GUARD();
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 VHostUserBlk *s = VHOST_USER_BLK(vdev);
+size_t config_size;
 int retries;
 int i, ret;
 
@@ -487,8 +489,9 @@ static void vhost_user_blk_device_realize(DeviceState *dev, 
Error **errp)
 return;
 }
 
-virtio_init(vdev, VIRTIO_ID_BLOCK,
-sizeof(struct virtio_blk_config));
+config_size = virtio_get_config_size(_blk_cfg_size_params,
+ vdev->host_features);
+virtio_init(vdev, VIRTIO_ID_BLOCK, config_size);
 
 s->virtqs = g_new(VirtQueue *, s->num_queues);
 for (i = 0; i < s->num_queues; i++) {
-- 
2.25.1




[PATCH v2 5/8] virtio-blk: move config size params to virtio-blk-common

2022-08-26 Thread Daniil Tatianin
This way we can reuse it for other virtio-blk devices, e.g
vhost-user-blk, which currently does not control its config space size
dynamically.

Signed-off-by: Daniil Tatianin 
---
 MAINTAINERS   |  4 +++
 hw/block/meson.build  |  4 +--
 hw/block/virtio-blk-common.c  | 39 +++
 hw/block/virtio-blk.c | 24 ++---
 include/hw/virtio/virtio-blk-common.h | 20 ++
 5 files changed, 67 insertions(+), 24 deletions(-)
 create mode 100644 hw/block/virtio-blk-common.c
 create mode 100644 include/hw/virtio/virtio-blk-common.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 5ce4227ff6..a7d3914735 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2030,8 +2030,10 @@ virtio-blk
 M: Stefan Hajnoczi 
 L: qemu-bl...@nongnu.org
 S: Supported
+F: hw/block/virtio-blk-common.c
 F: hw/block/virtio-blk.c
 F: hw/block/dataplane/*
+F: include/hw/virtio/virtio-blk-common.h
 F: tests/qtest/virtio-blk-test.c
 T: git https://github.com/stefanha/qemu.git block
 
@@ -2271,11 +2273,13 @@ S: Maintained
 F: contrib/vhost-user-blk/
 F: contrib/vhost-user-scsi/
 F: hw/block/vhost-user-blk.c
+F: hw/block/virtio-blk-common.c
 F: hw/scsi/vhost-user-scsi.c
 F: hw/virtio/vhost-user-blk-pci.c
 F: hw/virtio/vhost-user-scsi-pci.c
 F: include/hw/virtio/vhost-user-blk.h
 F: include/hw/virtio/vhost-user-scsi.h
+F: include/hw/virtio/virtio-blk-common.h
 
 vhost-user-gpu
 M: Marc-André Lureau 
diff --git a/hw/block/meson.build b/hw/block/meson.build
index 2389326112..1908abd45c 100644
--- a/hw/block/meson.build
+++ b/hw/block/meson.build
@@ -16,7 +16,7 @@ softmmu_ss.add(when: 'CONFIG_SWIM', if_true: files('swim.c'))
 softmmu_ss.add(when: 'CONFIG_XEN', if_true: files('xen-block.c'))
 softmmu_ss.add(when: 'CONFIG_TC58128', if_true: files('tc58128.c'))
 
-specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c'))
-specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: 
files('vhost-user-blk.c'))
+specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c', 
'virtio-blk-common.c'))
+specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: 
files('vhost-user-blk.c', 'virtio-blk-common.c'))
 
 subdir('dataplane')
diff --git a/hw/block/virtio-blk-common.c b/hw/block/virtio-blk-common.c
new file mode 100644
index 00..ac52d7c176
--- /dev/null
+++ b/hw/block/virtio-blk-common.c
@@ -0,0 +1,39 @@
+/*
+ * Virtio Block Device common helpers
+ *
+ * Copyright IBM, Corp. 2007
+ *
+ * Authors:
+ *  Anthony Liguori   
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+
+#include "standard-headers/linux/virtio_blk.h"
+#include "hw/virtio/virtio.h"
+#include "hw/virtio/virtio-blk-common.h"
+
+/* Config size before the discard support (hide associated config fields) */
+#define VIRTIO_BLK_CFG_SIZE offsetof(struct virtio_blk_config, \
+ max_discard_sectors)
+
+/*
+ * Starting from the discard feature, we can use this array to properly
+ * set the config size depending on the features enabled.
+ */
+static const VirtIOFeature feature_sizes[] = {
+{.flags = 1ULL << VIRTIO_BLK_F_DISCARD,
+ .end = endof(struct virtio_blk_config, discard_sector_alignment)},
+{.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES,
+ .end = endof(struct virtio_blk_config, write_zeroes_may_unmap)},
+{}
+};
+
+const VirtIOConfigSizeParams virtio_blk_cfg_size_params = {
+.min_size = VIRTIO_BLK_CFG_SIZE,
+.max_size = sizeof(struct virtio_blk_config),
+.feature_sizes = feature_sizes
+};
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 10c47c2934..8131ec2dbc 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -32,29 +32,9 @@
 #include "hw/virtio/virtio-bus.h"
 #include "migration/qemu-file-types.h"
 #include "hw/virtio/virtio-access.h"
+#include "hw/virtio/virtio-blk-common.h"
 #include "qemu/coroutine.h"
 
-/* Config size before the discard support (hide associated config fields) */
-#define VIRTIO_BLK_CFG_SIZE offsetof(struct virtio_blk_config, \
- max_discard_sectors)
-/*
- * Starting from the discard feature, we can use this array to properly
- * set the config size depending on the features enabled.
- */
-static const VirtIOFeature feature_sizes[] = {
-{.flags = 1ULL << VIRTIO_BLK_F_DISCARD,
- .end = endof(struct virtio_blk_config, discard_sector_alignment)},
-{.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES,
- .end = endof(struct virtio_blk_config, write_zeroes_may_unmap)},
-{}
-};
-
-static const VirtIOConfigSizeParams cfg_size_params = {
-.min_size = VIRTIO_BLK_CFG_SIZE,
-.max_size = sizeof(struct virtio_blk_config),
-.feature_sizes = feature_sizes
-};
-
 static void virtio_blk_init_requ

Re: [PATCH v1 2/5] virtio-blk: move config space sizing code to virtio-blk-common

2022-08-24 Thread Daniil Tatianin




On 8/24/22 9:13 PM, Stefan Hajnoczi wrote:

On Wed, Aug 24, 2022 at 12:18:34PM +0300, Daniil Tatianin wrote:

+size_t virtio_blk_common_get_config_size(uint64_t host_features)
+{
+size_t config_size = MAX(VIRTIO_BLK_CFG_SIZE,
+virtio_feature_get_config_size(feature_sizes, host_features));
+
+assert(config_size <= sizeof(struct virtio_blk_config));
+return config_size;
+}


This logic is common to all VIRTIO devices and I think it can be moved
to virtio_feature_get_config_size(). Then
virtio_blk_common_get_config_size() is no longer necessary and the
generic virtio_feature_get_config_size() can be called directly.

The only virtio-blk common part would be the
virtio_feature_get_config_size() parameter struct that describes the
minimum and maximum config space size, as well as how the feature bits
affect the size:

   size = virtio_feature_get_config_size(virtio_blk_config_size_params, 
host_features)

where virtio_blk_config_size_params is:

   const VirtIOConfigSizeParams virtio_blk_config_size_params = {
   .min_size = offsetof(struct virtio_blk_config, max_discard_sectors),
   .max_size = sizeof(struct virtio_blk_config),
   .features = {
   {.flags = 1ULL << VIRTIO_BLK_F_DISCARD,
.end = endof(struct virtio_blk_config, discard_sector_alignment)},
   ...,
   },
   };

Then virtio-blk-common.h just needs to define:

   extern const VirtIOConfigSizeParams virtio_blk_config_size_params;

Taking it one step further, maybe VirtioDeviceClass should include a
const VirtIOConfigSizeParams *config_size_params field so
vdev->config_size can be computed by common VIRTIO code and the devices
only need to describe the parameters.


I think that's a great idea! Do you think it should be done 
automatically in 'virtio_init' if this field is not NULL? One problem I 
see with that is that you would have to make all virtio devices use 
'parent_obj.host_features' for feature properties, which is currently 
far from true, but then again this is very much opt-in. Another thing 
you could do is add a separate helper for that, which maybe defeats the 
purpose a little bit?




Stefan




Re: [PATCH v1 3/5] vhost-user-blk: make it possible to disable write-zeroes/discard

2022-08-24 Thread Daniil Tatianin

On 8/24/22 9:00 PM, Stefan Hajnoczi wrote:

On Wed, Aug 24, 2022 at 12:18:35PM +0300, Daniil Tatianin wrote:

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 9117222456..e89164c358 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -251,6 +251,8 @@ static uint64_t vhost_user_blk_get_features(VirtIODevice 
*vdev,
  {
  VHostUserBlk *s = VHOST_USER_BLK(vdev);
  
+features |= s->host_features;


I think you can eliminate this if you use vdev->host_features in the
qdev properties instead of adding a separate s->host_features field.
That will simplify the code.
Indeed, thanks for spotting that. I wonder why every virtio device 
implementation I've looked at has chosen to add their own host_features 
field (net/blk)?




[PATCH v1 1/5] virtio-blk: decouple config size determination code from VirtIOBlock

2022-08-24 Thread Daniil Tatianin
Make it more stand-alone so that we can reuse it for other virtio-blk
devices that are not VirtIOBlock in the future commits.

Signed-off-by: Daniil Tatianin 
---
 hw/block/virtio-blk.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index e9ba752f6b..a4162dbbf2 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -49,12 +49,13 @@ static const VirtIOFeature feature_sizes[] = {
 {}
 };
 
-static void virtio_blk_set_config_size(VirtIOBlock *s, uint64_t host_features)
+static size_t virtio_blk_common_get_config_size(uint64_t host_features)
 {
-s->config_size = MAX(VIRTIO_BLK_CFG_SIZE,
+size_t config_size = MAX(VIRTIO_BLK_CFG_SIZE,
 virtio_feature_get_config_size(feature_sizes, host_features));
 
-assert(s->config_size <= sizeof(struct virtio_blk_config));
+assert(config_size <= sizeof(struct virtio_blk_config));
+return config_size;
 }
 
 static void virtio_blk_init_request(VirtIOBlock *s, VirtQueue *vq,
@@ -1204,7 +1205,7 @@ static void virtio_blk_device_realize(DeviceState *dev, 
Error **errp)
 return;
 }
 
-virtio_blk_set_config_size(s, s->host_features);
+s->config_size = virtio_blk_common_get_config_size(s->host_features);
 
 virtio_init(vdev, VIRTIO_ID_BLOCK, s->config_size);
 
-- 
2.25.1




[PATCH v1 2/5] virtio-blk: move config space sizing code to virtio-blk-common

2022-08-24 Thread Daniil Tatianin
This way we can reuse it for other virtio-blk devices, e.g
vhost-user-blk, which currently does not control its config space size
dynamically.

Signed-off-by: Daniil Tatianin 
---
 MAINTAINERS   |  4 +++
 hw/block/meson.build  |  4 +--
 hw/block/virtio-blk-common.c  | 42 +++
 hw/block/virtio-blk.c | 24 +--
 include/hw/virtio/virtio-blk-common.h | 21 ++
 5 files changed, 70 insertions(+), 25 deletions(-)
 create mode 100644 hw/block/virtio-blk-common.c
 create mode 100644 include/hw/virtio/virtio-blk-common.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 5ce4227ff6..a7d3914735 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2030,8 +2030,10 @@ virtio-blk
 M: Stefan Hajnoczi 
 L: qemu-bl...@nongnu.org
 S: Supported
+F: hw/block/virtio-blk-common.c
 F: hw/block/virtio-blk.c
 F: hw/block/dataplane/*
+F: include/hw/virtio/virtio-blk-common.h
 F: tests/qtest/virtio-blk-test.c
 T: git https://github.com/stefanha/qemu.git block
 
@@ -2271,11 +2273,13 @@ S: Maintained
 F: contrib/vhost-user-blk/
 F: contrib/vhost-user-scsi/
 F: hw/block/vhost-user-blk.c
+F: hw/block/virtio-blk-common.c
 F: hw/scsi/vhost-user-scsi.c
 F: hw/virtio/vhost-user-blk-pci.c
 F: hw/virtio/vhost-user-scsi-pci.c
 F: include/hw/virtio/vhost-user-blk.h
 F: include/hw/virtio/vhost-user-scsi.h
+F: include/hw/virtio/virtio-blk-common.h
 
 vhost-user-gpu
 M: Marc-André Lureau 
diff --git a/hw/block/meson.build b/hw/block/meson.build
index 2389326112..1908abd45c 100644
--- a/hw/block/meson.build
+++ b/hw/block/meson.build
@@ -16,7 +16,7 @@ softmmu_ss.add(when: 'CONFIG_SWIM', if_true: files('swim.c'))
 softmmu_ss.add(when: 'CONFIG_XEN', if_true: files('xen-block.c'))
 softmmu_ss.add(when: 'CONFIG_TC58128', if_true: files('tc58128.c'))
 
-specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c'))
-specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: 
files('vhost-user-blk.c'))
+specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c', 
'virtio-blk-common.c'))
+specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: 
files('vhost-user-blk.c', 'virtio-blk-common.c'))
 
 subdir('dataplane')
diff --git a/hw/block/virtio-blk-common.c b/hw/block/virtio-blk-common.c
new file mode 100644
index 00..ac54568eb6
--- /dev/null
+++ b/hw/block/virtio-blk-common.c
@@ -0,0 +1,42 @@
+/*
+ * Virtio Block Device common helpers
+ *
+ * Copyright IBM, Corp. 2007
+ *
+ * Authors:
+ *  Anthony Liguori   
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+
+#include "standard-headers/linux/virtio_blk.h"
+#include "hw/virtio/virtio.h"
+#include "hw/virtio/virtio-blk-common.h"
+
+/* Config size before the discard support (hide associated config fields) */
+#define VIRTIO_BLK_CFG_SIZE offsetof(struct virtio_blk_config, \
+ max_discard_sectors)
+
+/*
+ * Starting from the discard feature, we can use this array to properly
+ * set the config size depending on the features enabled.
+ */
+static VirtIOFeature feature_sizes[] = {
+{.flags = 1ULL << VIRTIO_BLK_F_DISCARD,
+ .end = endof(struct virtio_blk_config, discard_sector_alignment)},
+{.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES,
+ .end = endof(struct virtio_blk_config, write_zeroes_may_unmap)},
+{}
+};
+
+size_t virtio_blk_common_get_config_size(uint64_t host_features)
+{
+size_t config_size = MAX(VIRTIO_BLK_CFG_SIZE,
+virtio_feature_get_config_size(feature_sizes, host_features));
+
+assert(config_size <= sizeof(struct virtio_blk_config));
+return config_size;
+}
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index a4162dbbf2..4ca6d0f211 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -32,31 +32,9 @@
 #include "hw/virtio/virtio-bus.h"
 #include "migration/qemu-file-types.h"
 #include "hw/virtio/virtio-access.h"
+#include "hw/virtio/virtio-blk-common.h"
 #include "qemu/coroutine.h"
 
-/* Config size before the discard support (hide associated config fields) */
-#define VIRTIO_BLK_CFG_SIZE offsetof(struct virtio_blk_config, \
- max_discard_sectors)
-/*
- * Starting from the discard feature, we can use this array to properly
- * set the config size depending on the features enabled.
- */
-static const VirtIOFeature feature_sizes[] = {
-{.flags = 1ULL << VIRTIO_BLK_F_DISCARD,
- .end = endof(struct virtio_blk_config, discard_sector_alignment)},
-{.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES,
- .end = endof(struct virtio_blk_config, write_zeroes_may_unmap)},
-{}
-};
-
-static size_t virtio_blk_common_get_config_size(uint64_t host_features)
-{
-size_t config_size = MAX(VIRTIO_BLK_CFG_SIZ

[PATCH v1 0/5] vhost-user-blk: dynamically resize config space based on features

2022-08-24 Thread Daniil Tatianin
This patch set attempts to align vhost-user-blk with virtio-blk in
terms of backward compatibility and flexibility.

In particular it adds the following things:
- Ability to disable modern features like discard/write-zeroes.
- Dynamic configuration space resizing based on enabled features,
  by reusing the code, which was already present in virtio-blk.
- Makes the VHostUserBlk structure a bit less clunky by using the
  'host_features' field to represent enabled features, as opposed to
  using a separate field per feature. This was already done for
  virtio-blk a long time ago.

Daniil Tatianin (5):
  virtio-blk: decouple config size determination code from VirtIOBlock
  virtio-blk: move config space sizing code to virtio-blk-common
  vhost-user-blk: make it possible to disable write-zeroes/discard
  vhost-user-blk: make 'config_wce' part of 'host_features'
  vhost-user-blk: dynamically resize config space based on features

 MAINTAINERS   |  4 +++
 hw/block/meson.build  |  4 +--
 hw/block/vhost-user-blk.c | 29 +-
 hw/block/virtio-blk-common.c  | 42 +++
 hw/block/virtio-blk.c | 25 ++--
 include/hw/virtio/vhost-user-blk.h|  4 ++-
 include/hw/virtio/virtio-blk-common.h | 21 ++
 7 files changed, 90 insertions(+), 39 deletions(-)
 create mode 100644 hw/block/virtio-blk-common.c
 create mode 100644 include/hw/virtio/virtio-blk-common.h

-- 
2.25.1




[PATCH v1 5/5] vhost-user-blk: dynamically resize config space based on features

2022-08-24 Thread Daniil Tatianin
Make vhost-user-blk backwards compatible when migrating from older VMs
running with modern features turned off, the same way it was done for
virtio-blk in 20764be0421c ("virtio-blk: set config size depending on the 
features enabled")

It's currently impossible to migrate from an older VM with
vhost-user-blk (with disable-legacy=off) because of errors like this:

qemu-system-x86_64: get_pci_config_device: Bad config data: i=0x10 read: 41 
device: 1 cmask: ff wmask: 80 w1cmask:0
qemu-system-x86_64: Failed to load PCIDevice:config
qemu-system-x86_64: Failed to load virtio-blk:virtio
qemu-system-x86_64: error while loading state for instance 0x0 of device 
':00:05.0:00.0:02.0/virtio-blk'
qemu-system-x86_64: load of migration failed: Invalid argument

This is caused by the newer (destination) VM requiring a bigger BAR0
alignment because it has to cover a bigger configuration space, which
isn't actually needed since those additional config fields are not
active (write-zeroes/discard).

Signed-off-by: Daniil Tatianin 
---
 hw/block/vhost-user-blk.c  | 15 ---
 include/hw/virtio/vhost-user-blk.h |  1 +
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 64f3457373..d18a7a2cd4 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -23,6 +23,7 @@
 #include "hw/qdev-core.h"
 #include "hw/qdev-properties.h"
 #include "hw/qdev-properties-system.h"
+#include "hw/virtio/virtio-blk-common.h"
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/vhost-user-blk.h"
 #include "hw/virtio/virtio.h"
@@ -63,7 +64,7 @@ static void vhost_user_blk_update_config(VirtIODevice *vdev, 
uint8_t *config)
 /* Our num_queues overrides the device backend */
 virtio_stw_p(vdev, >blkcfg.num_queues, s->num_queues);
 
-memcpy(config, >blkcfg, sizeof(struct virtio_blk_config));
+memcpy(config, >blkcfg, s->config_size);
 }
 
 static void vhost_user_blk_set_config(VirtIODevice *vdev, const uint8_t 
*config)
@@ -96,8 +97,7 @@ static int vhost_user_blk_handle_config_change(struct 
vhost_dev *dev)
 Error *local_err = NULL;
 
 ret = vhost_dev_get_config(dev, (uint8_t *),
-   sizeof(struct virtio_blk_config),
-   _err);
+   s->config_size, _err);
 if (ret < 0) {
 error_report_err(local_err);
 return ret;
@@ -106,7 +106,7 @@ static int vhost_user_blk_handle_config_change(struct 
vhost_dev *dev)
 /* valid for resize only */
 if (blkcfg.capacity != s->blkcfg.capacity) {
 s->blkcfg.capacity = blkcfg.capacity;
-memcpy(dev->vdev->config, >blkcfg, sizeof(struct 
virtio_blk_config));
+memcpy(dev->vdev->config, >blkcfg, s->config_size);
 virtio_notify_config(dev->vdev);
 }
 
@@ -444,7 +444,7 @@ static int vhost_user_blk_realize_connect(VHostUserBlk *s, 
Error **errp)
 assert(s->connected);
 
 ret = vhost_dev_get_config(>dev, (uint8_t *)>blkcfg,
-   sizeof(struct virtio_blk_config), errp);
+   s->config_size, errp);
 if (ret < 0) {
 qemu_chr_fe_disconnect(>chardev);
 vhost_dev_cleanup(>dev);
@@ -489,8 +489,9 @@ static void vhost_user_blk_device_realize(DeviceState *dev, 
Error **errp)
 return;
 }
 
-virtio_init(vdev, VIRTIO_ID_BLOCK,
-sizeof(struct virtio_blk_config));
+s->config_size = virtio_blk_common_get_config_size(s->host_features);
+
+virtio_init(vdev, VIRTIO_ID_BLOCK, s->config_size);
 
 s->virtqs = g_new(VirtQueue *, s->num_queues);
 for (i = 0; i < s->num_queues; i++) {
diff --git a/include/hw/virtio/vhost-user-blk.h 
b/include/hw/virtio/vhost-user-blk.h
index 6252095c45..b7810360b9 100644
--- a/include/hw/virtio/vhost-user-blk.h
+++ b/include/hw/virtio/vhost-user-blk.h
@@ -52,6 +52,7 @@ struct VHostUserBlk {
 bool started_vu;
 
 uint64_t host_features;
+size_t config_size;
 };
 
 #endif
-- 
2.25.1




[PATCH v1 3/5] vhost-user-blk: make it possible to disable write-zeroes/discard

2022-08-24 Thread Daniil Tatianin
It is useful to have the ability to disable these features for
compatibility with older VMs that don't have these implemented.

Signed-off-by: Daniil Tatianin 
---
 hw/block/vhost-user-blk.c  | 8 ++--
 include/hw/virtio/vhost-user-blk.h | 2 ++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 9117222456..e89164c358 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -251,6 +251,8 @@ static uint64_t vhost_user_blk_get_features(VirtIODevice 
*vdev,
 {
 VHostUserBlk *s = VHOST_USER_BLK(vdev);
 
+features |= s->host_features;
+
 /* Turn on pre-defined features */
 virtio_add_feature(, VIRTIO_BLK_F_SIZE_MAX);
 virtio_add_feature(, VIRTIO_BLK_F_SEG_MAX);
@@ -259,8 +261,6 @@ static uint64_t vhost_user_blk_get_features(VirtIODevice 
*vdev,
 virtio_add_feature(, VIRTIO_BLK_F_BLK_SIZE);
 virtio_add_feature(, VIRTIO_BLK_F_FLUSH);
 virtio_add_feature(, VIRTIO_BLK_F_RO);
-virtio_add_feature(, VIRTIO_BLK_F_DISCARD);
-virtio_add_feature(, VIRTIO_BLK_F_WRITE_ZEROES);
 
 if (s->config_wce) {
 virtio_add_feature(, VIRTIO_BLK_F_CONFIG_WCE);
@@ -592,6 +592,10 @@ static Property vhost_user_blk_properties[] = {
VHOST_USER_BLK_AUTO_NUM_QUEUES),
 DEFINE_PROP_UINT32("queue-size", VHostUserBlk, queue_size, 128),
 DEFINE_PROP_BIT("config-wce", VHostUserBlk, config_wce, 0, true),
+DEFINE_PROP_BIT64("discard", VHostUserBlk, host_features,
+  VIRTIO_BLK_F_DISCARD, true),
+DEFINE_PROP_BIT64("write-zeroes", VHostUserBlk, host_features,
+  VIRTIO_BLK_F_WRITE_ZEROES, true),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/virtio/vhost-user-blk.h 
b/include/hw/virtio/vhost-user-blk.h
index 7c91f15040..20573dd586 100644
--- a/include/hw/virtio/vhost-user-blk.h
+++ b/include/hw/virtio/vhost-user-blk.h
@@ -51,6 +51,8 @@ struct VHostUserBlk {
 bool connected;
 /* vhost_user_blk_start/vhost_user_blk_stop */
 bool started_vu;
+
+uint64_t host_features;
 };
 
 #endif
-- 
2.25.1




[PATCH v1 4/5] vhost-user-blk: make 'config_wce' part of 'host_features'

2022-08-24 Thread Daniil Tatianin
No reason to have this be a separate field. This also makes it more akin
to what the virtio-blk device does.

Signed-off-by: Daniil Tatianin 
---
 hw/block/vhost-user-blk.c  | 6 ++
 include/hw/virtio/vhost-user-blk.h | 1 -
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index e89164c358..64f3457373 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -262,9 +262,6 @@ static uint64_t vhost_user_blk_get_features(VirtIODevice 
*vdev,
 virtio_add_feature(, VIRTIO_BLK_F_FLUSH);
 virtio_add_feature(, VIRTIO_BLK_F_RO);
 
-if (s->config_wce) {
-virtio_add_feature(, VIRTIO_BLK_F_CONFIG_WCE);
-}
 if (s->num_queues > 1) {
 virtio_add_feature(, VIRTIO_BLK_F_MQ);
 }
@@ -591,7 +588,8 @@ static Property vhost_user_blk_properties[] = {
 DEFINE_PROP_UINT16("num-queues", VHostUserBlk, num_queues,
VHOST_USER_BLK_AUTO_NUM_QUEUES),
 DEFINE_PROP_UINT32("queue-size", VHostUserBlk, queue_size, 128),
-DEFINE_PROP_BIT("config-wce", VHostUserBlk, config_wce, 0, true),
+DEFINE_PROP_BIT64("config-wce", VHostUserBlk, host_features,
+  VIRTIO_BLK_F_CONFIG_WCE, true),
 DEFINE_PROP_BIT64("discard", VHostUserBlk, host_features,
   VIRTIO_BLK_F_DISCARD, true),
 DEFINE_PROP_BIT64("write-zeroes", VHostUserBlk, host_features,
diff --git a/include/hw/virtio/vhost-user-blk.h 
b/include/hw/virtio/vhost-user-blk.h
index 20573dd586..6252095c45 100644
--- a/include/hw/virtio/vhost-user-blk.h
+++ b/include/hw/virtio/vhost-user-blk.h
@@ -34,7 +34,6 @@ struct VHostUserBlk {
 struct virtio_blk_config blkcfg;
 uint16_t num_queues;
 uint32_t queue_size;
-uint32_t config_wce;
 struct vhost_dev dev;
 struct vhost_inflight *inflight;
 VhostUserState vhost_user;
-- 
2.25.1




Re: [PATCH v1 2/2] osdep: support mempolicy for preallocation in os_mem_prealloc

2021-12-07 Thread Daniil Tatianin
I believe you're right. Looking at the implementation of shmem_alloc_page, it uses the inode policy, which is set viavma->set_policy (from the mbind() call in this case). set_mempolicy is both useless and redundant here, as thread'spolicy is only ever used in case vma->get_policy returns NULL (which it doesn't in our case).Sorry for the confusion.Thanks,Daniil 07.12.2021, 11:13, "David Hildenbrand" :On 07.12.21 08:06, Daniil Tatianin wrote: This is needed for cases where we want to make sure that a shared memory region gets allocated from a specific NUMA node. This is impossible to do with mbind(2) because it ignores the policy for memory mapped with MAP_SHARED. We work around this by calling set_mempolicy from prealloc threads instead.That's not quite true I think, how were you able to observe this? Do youhave a reproducer?While the man page says:"The specified policy will be ignored for any MAP_SHARED mappings inthe specified memory range. Rather the pages will be allocatedaccording to the memory policy of the thread that caused the page to beallocated. Again, this may not be the thread that called mbind()."What it really means is that as long as we access that memory via the*VMA* for which we called mbind(), which is the case when *not* usingfallocate() to preallocate memory, we end up using the correct policy.I did experiments a while ago with hugetlbfs shared memory and itproperly allocated from the right node. So I'd be curious how youtrigger this. --Thanks,David / dhildenb 

[PATCH v1 1/2] hostmem: use a static size for maxnode, validate policy everywhere

2021-12-06 Thread Daniil Tatianin
Previously we would calculate the last set bit in the mask, and add
2 to that value to get the maxnode value. This is unnecessary since
the mbind syscall allows the bitmap to be any (reasonable) size as
long as all the unused bits are clear. This also adds policy validation
in multiple places so that it's guaranteed to be valid when we call
mbind.

Signed-off-by: Daniil Tatianin 
---
 backends/hostmem.c | 64 +++---
 1 file changed, 43 insertions(+), 21 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index 4c05862ed5..392026efe6 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -38,6 +38,29 @@ host_memory_backend_get_name(HostMemoryBackend *backend)
 return object_get_canonical_path(OBJECT(backend));
 }
 
+static bool
+validate_policy(HostMemPolicy policy, bool nodes_empty, Error **errp)
+{
+/*
+ * check for invalid host-nodes and policies and give more verbose
+ * error messages than mbind().
+ */
+if (!nodes_empty && policy == MPOL_DEFAULT) {
+error_setg(errp, "host-nodes must be empty for policy default,"
+   " or you should explicitly specify a policy other"
+   " than default");
+return false;
+}
+
+if (nodes_empty && policy != MPOL_DEFAULT) {
+error_setg(errp, "host-nodes must be set for policy %s",
+   HostMemPolicy_str(policy));
+return false;
+}
+
+return true;
+}
+
 static void
 host_memory_backend_get_size(Object *obj, Visitor *v, const char *name,
  void *opaque, Error **errp)
@@ -110,6 +133,7 @@ host_memory_backend_set_host_nodes(Object *obj, Visitor *v, 
const char *name,
 #ifdef CONFIG_NUMA
 HostMemoryBackend *backend = MEMORY_BACKEND(obj);
 uint16List *l, *host_nodes = NULL;
+bool nodes_empty = bitmap_empty(backend->host_nodes, MAX_NODES + 1);
 
 visit_type_uint16List(v, name, _nodes, errp);
 
@@ -118,6 +142,13 @@ host_memory_backend_set_host_nodes(Object *obj, Visitor 
*v, const char *name,
 error_setg(errp, "Invalid host-nodes value: %d", l->value);
 goto out;
 }
+
+nodes_empty = false;
+}
+
+if (host_memory_backend_mr_inited(backend) &&
+!validate_policy(backend->policy, nodes_empty, errp)) {
+goto out;
 }
 
 for (l = host_nodes; l; l = l->next) {
@@ -142,6 +173,13 @@ static void
 host_memory_backend_set_policy(Object *obj, int policy, Error **errp)
 {
 HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+bool nodes_empty = bitmap_empty(backend->host_nodes, MAX_NODES + 1);
+
+if (host_memory_backend_mr_inited(backend) &&
+!validate_policy(policy, nodes_empty, errp)) {
+return;
+}
+
 backend->policy = policy;
 
 #ifndef CONFIG_NUMA
@@ -347,24 +385,9 @@ host_memory_backend_memory_complete(UserCreatable *uc, 
Error **errp)
 qemu_madvise(ptr, sz, QEMU_MADV_DONTDUMP);
 }
 #ifdef CONFIG_NUMA
-unsigned long lastbit = find_last_bit(backend->host_nodes, MAX_NODES);
-/* lastbit == MAX_NODES means maxnode = 0 */
-unsigned long maxnode = (lastbit + 1) % (MAX_NODES + 1);
-/* ensure policy won't be ignored in case memory is preallocated
- * before mbind(). note: MPOL_MF_STRICT is ignored on hugepages so
- * this doesn't catch hugepage case. */
 unsigned flags = MPOL_MF_STRICT | MPOL_MF_MOVE;
-
-/* check for invalid host-nodes and policies and give more verbose
- * error messages than mbind(). */
-if (maxnode && backend->policy == MPOL_DEFAULT) {
-error_setg(errp, "host-nodes must be empty for policy default,"
-   " or you should explicitly specify a policy other"
-   " than default");
-return;
-} else if (maxnode == 0 && backend->policy != MPOL_DEFAULT) {
-error_setg(errp, "host-nodes must be set for policy %s",
-   HostMemPolicy_str(backend->policy));
+bool nodes_empty = bitmap_empty(backend->host_nodes, MAX_NODES + 1);
+if (!validate_policy(backend->policy, nodes_empty, errp)) {
 return;
 }
 
@@ -373,12 +396,11 @@ host_memory_backend_memory_complete(UserCreatable *uc, 
Error **errp)
  * cuts off the last specified node. This means backend->host_nodes
  * must have MAX_NODES+1 bits available.
  */
-assert(sizeof(backend->host_nodes) >=
+QEMU_BUILD_BUG_ON(sizeof(backend->host_nodes) <
BITS_TO_LONGS(MAX_NODES + 1) * sizeof(unsigned long));
-assert(maxnode <= MAX_NODES);
 
-if (maxnode &&
-mbind(ptr, sz, backend->policy, backend->host_nodes, max

[PATCH v1 2/2] osdep: support mempolicy for preallocation in os_mem_prealloc

2021-12-06 Thread Daniil Tatianin
This is needed for cases where we want to make sure that a shared memory
region gets allocated from a specific NUMA node. This is impossible to do
with mbind(2) because it ignores the policy for memory mapped with
MAP_SHARED. We work around this by calling set_mempolicy from prealloc
threads instead.

Signed-off-by: Daniil Tatianin 
---
 backends/hostmem.c   |  6 --
 include/qemu/osdep.h |  3 ++-
 util/meson.build |  2 ++
 util/oslib-posix.c   | 29 ++---
 util/oslib-win32.c   |  3 ++-
 5 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index 392026efe6..0c508ed9df 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -269,7 +269,8 @@ static void host_memory_backend_set_prealloc(Object *obj, 
bool value,
 void *ptr = memory_region_get_ram_ptr(>mr);
 uint64_t sz = memory_region_size(>mr);
 
-os_mem_prealloc(fd, ptr, sz, backend->prealloc_threads, _err);
+os_mem_prealloc(fd, ptr, sz, backend->prealloc_threads, 
backend->policy,
+backend->host_nodes, MAX_NODES + 1, _err);
 if (local_err) {
 error_propagate(errp, local_err);
 return;
@@ -415,7 +416,8 @@ host_memory_backend_memory_complete(UserCreatable *uc, 
Error **errp)
  */
 if (backend->prealloc) {
 os_mem_prealloc(memory_region_get_fd(>mr), ptr, sz,
-backend->prealloc_threads, _err);
+backend->prealloc_threads, backend->policy,
+backend->host_nodes, MAX_NODES + 1, _err);
 if (local_err) {
 goto out;
 }
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 60718fc342..abf88aeb0e 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -688,7 +688,8 @@ unsigned long qemu_getauxval(unsigned long type);
 void qemu_set_tty_echo(int fd, bool echo);
 
 void os_mem_prealloc(int fd, char *area, size_t sz, int smp_cpus,
- Error **errp);
+ int policy, unsigned long *node_bitmap,
+ unsigned long max_node, Error **errp);
 
 /**
  * qemu_get_pid_name:
diff --git a/util/meson.build b/util/meson.build
index 05b593055a..25f9fca379 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -87,3 +87,5 @@ if have_block
 if_false: files('filemonitor-stub.c'))
   util_ss.add(when: 'CONFIG_LINUX', if_true: files('vfio-helpers.c'))
 endif
+
+util_ss.add(when: 'CONFIG_NUMA', if_true: numa)
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index e8bdb02e1d..bca25698c5 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -38,11 +38,13 @@
 #include "qemu/sockets.h"
 #include "qemu/thread.h"
 #include 
+#include "qemu/bitmap.h"
 #include "qemu/cutils.h"
 #include "qemu/compiler.h"
 
 #ifdef CONFIG_LINUX
 #include 
+#include 
 #endif
 
 #ifdef __FreeBSD__
@@ -79,6 +81,9 @@ struct MemsetThread {
 size_t hpagesize;
 QemuThread pgthread;
 sigjmp_buf env;
+int policy;
+unsigned long *node_bitmap;
+unsigned long max_node;
 };
 typedef struct MemsetThread MemsetThread;
 
@@ -464,6 +469,18 @@ static void *do_touch_pages(void *arg)
 }
 qemu_mutex_unlock(_mutex);
 
+#ifdef CONFIG_NUMA
+if (memset_args->max_node &&
+!bitmap_empty(memset_args->node_bitmap, memset_args->max_node)) {
+long ret = set_mempolicy(memset_args->policy, memset_args->node_bitmap,
+ memset_args->max_node);
+if (ret < 0) {
+memset_thread_failed = true;
+return NULL;
+}
+}
+#endif
+
 /* unblock SIGBUS */
 sigemptyset();
 sigaddset(, SIGBUS);
@@ -510,7 +527,8 @@ static inline int get_memset_num_threads(int smp_cpus)
 }
 
 static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages,
-int smp_cpus)
+int smp_cpus, int policy,
+unsigned long *node_bitmap, unsigned long max_node)
 {
 static gsize initialized = 0;
 size_t numpages_per_thread, leftover;
@@ -533,6 +551,9 @@ static bool touch_all_pages(char *area, size_t hpagesize, 
size_t numpages,
 memset_thread[i].addr = addr;
 memset_thread[i].numpages = numpages_per_thread + (i < leftover);
 memset_thread[i].hpagesize = hpagesize;
+memset_thread[i].policy = policy;
+memset_thread[i].node_bitmap = node_bitmap;
+memset_thread[i].max_node = max_node;
 qemu_thread_create(_thread[i].pgthread, "touch_pages",
do_touch_pages, _thread[i],
QEMU_THREAD_JOINABLE);
@@ -554,7 +575,8 @@ static bool touch_all_pages(char *area, size_t hpagesize, 
size_t numpages,
 }
 
 voi

[PATCH v1] hw/smbios: verify header type for file before using it

2021-11-29 Thread Daniil Tatianin
Signed-off-by: Daniil Tatianin 
---
 hw/smbios/smbios.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
index 7397e56737..c55f77368a 100644
--- a/hw/smbios/smbios.c
+++ b/hw/smbios/smbios.c
@@ -1163,6 +1163,12 @@ void smbios_entry_add(QemuOpts *opts, Error **errp)
 return;
 }
 
+if (header->type > SMBIOS_MAX_TYPE) {
+error_setg(errp,
+   "invalid header type %d!", header->type);
+return;
+}
+
 if (test_bit(header->type, have_fields_bitmap)) {
 error_setg(errp,
"can't load type %d struct, fields already specified!",
-- 
2.25.1




[PATCH v1 2/2] hw/scsi/vhost-scsi: don't double close vhostfd on error

2021-11-29 Thread Daniil Tatianin
vhost_dev_init calls vhost_dev_cleanup on error, which closes vhostfd,
don't double close it.

Signed-off-by: Daniil Tatianin 
---
 hw/scsi/vhost-scsi.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index efb3e14d9e..778f43e4c1 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -222,6 +222,11 @@ static void vhost_scsi_realize(DeviceState *dev, Error 
**errp)
 ret = vhost_dev_init(>dev, (void *)(uintptr_t)vhostfd,
  VHOST_BACKEND_TYPE_KERNEL, 0, errp);
 if (ret < 0) {
+/*
+ * vhost_dev_init calls vhost_dev_cleanup on error, which closes
+ * vhostfd, don't double close it.
+ */
+vhostfd = -1;
 goto free_vqs;
 }
 
@@ -242,7 +247,9 @@ static void vhost_scsi_realize(DeviceState *dev, Error 
**errp)
 error_free(vsc->migration_blocker);
 virtio_scsi_common_unrealize(dev);
  close_fd:
-close(vhostfd);
+if (vhostfd >= 0) {
+close(vhostfd);
+}
 return;
 }
 
-- 
2.25.1




[PATCH v1 1/2] hw/scsi/vhost-scsi: don't leak vqs on error

2021-11-29 Thread Daniil Tatianin
vhost_dev_init calls vhost_dev_cleanup in case of an error during
initialization, which zeroes out the entire vsc->dev as well as the
vsc->dev.vqs pointer. This prevents us from properly freeing it in free_vqs.
Keep a local copy of the pointer so we can free it later.

Signed-off-by: Daniil Tatianin 
---
 hw/scsi/vhost-scsi.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index 039caf2614..efb3e14d9e 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -170,6 +170,7 @@ static void vhost_scsi_realize(DeviceState *dev, Error 
**errp)
 Error *err = NULL;
 int vhostfd = -1;
 int ret;
+struct vhost_virtqueue *vqs = NULL;
 
 if (!vs->conf.wwpn) {
 error_setg(errp, "vhost-scsi: missing wwpn");
@@ -213,7 +214,8 @@ static void vhost_scsi_realize(DeviceState *dev, Error 
**errp)
 }
 
 vsc->dev.nvqs = VHOST_SCSI_VQ_NUM_FIXED + vs->conf.num_queues;
-vsc->dev.vqs = g_new0(struct vhost_virtqueue, vsc->dev.nvqs);
+vqs = g_new0(struct vhost_virtqueue, vsc->dev.nvqs);
+vsc->dev.vqs = vqs;
 vsc->dev.vq_index = 0;
 vsc->dev.backend_features = 0;
 
@@ -232,7 +234,7 @@ static void vhost_scsi_realize(DeviceState *dev, Error 
**errp)
 return;
 
  free_vqs:
-g_free(vsc->dev.vqs);
+g_free(vqs);
 if (!vsc->migratable) {
 migrate_del_blocker(vsc->migration_blocker);
 }
-- 
2.25.1




[PATCH v1] virtio/vhost-vsock: don't double close vhostfd, remove redundant cleanup

2021-11-29 Thread Daniil Tatianin
In case of an error during initialization in vhost_dev_init, vhostfd is
closed in vhost_dev_cleanup. Remove close from err_virtio as it's both
redundant and causes a double close on vhostfd.

Signed-off-by: Daniil Tatianin 
---
 hw/virtio/vhost-vsock.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost-vsock.c b/hw/virtio/vhost-vsock.c
index 478c0c9a87..433d42d897 100644
--- a/hw/virtio/vhost-vsock.c
+++ b/hw/virtio/vhost-vsock.c
@@ -171,6 +171,10 @@ static void vhost_vsock_device_realize(DeviceState *dev, 
Error **errp)
 ret = vhost_dev_init(>vhost_dev, (void *)(uintptr_t)vhostfd,
  VHOST_BACKEND_TYPE_KERNEL, 0, errp);
 if (ret < 0) {
+/*
+ * vhostfd is closed by vhost_dev_cleanup, which is called
+ * by vhost_dev_init on initialization error.
+ */
 goto err_virtio;
 }
 
@@ -183,15 +187,10 @@ static void vhost_vsock_device_realize(DeviceState *dev, 
Error **errp)
 return;
 
 err_vhost_dev:
-vhost_dev_cleanup(>vhost_dev);
 /* vhost_dev_cleanup() closes the vhostfd passed to vhost_dev_init() */
-vhostfd = -1;
+vhost_dev_cleanup(>vhost_dev);
 err_virtio:
 vhost_vsock_common_unrealize(vdev);
-if (vhostfd >= 0) {
-close(vhostfd);
-}
-return;
 }
 
 static void vhost_vsock_device_unrealize(DeviceState *dev)
-- 
2.25.1




[PATCH v1] hw/i386/amd_iommu: clean up broken event logging

2021-11-17 Thread Daniil Tatianin
- Don't create evt buffer in every function where we want to log,
  instead make amdvi_log_event construct the buffer in-place using the
  arguments it was given.

- Correctly place address & info in the event buffer. Previously both
  would get shifted to incorrect bit offsets leading to evt containing
  uninitialized event type and address.

- Reduce buffer size from 32 to 16 bytes, which is the amount of bytes
  actually needed for event storage.

- Remove amdvi_encode_event & amdvi_setevent_bits, as they are no longer
  needed.

Signed-off-by: Daniil Tatianin 
---
 hw/i386/amd_iommu.c | 62 ++---
 1 file changed, 14 insertions(+), 48 deletions(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index fd75cae024..44b995be91 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -164,8 +164,14 @@ static void amdvi_generate_msi_interrupt(AMDVIState *s)
 }
 }
 
-static void amdvi_log_event(AMDVIState *s, uint64_t *evt)
+static void amdvi_log_event(AMDVIState *s, uint16_t devid,
+hwaddr addr, uint16_t info)
 {
+uint64_t evt[2] = {
+devid | ((uint64_t)info << 48),
+addr
+};
+
 /* event logging not enabled */
 if (!s->evtlog_enabled || amdvi_test_mask(s, AMDVI_MMIO_STATUS,
 AMDVI_MMIO_STATUS_EVT_OVF)) {
@@ -190,27 +196,6 @@ static void amdvi_log_event(AMDVIState *s, uint64_t *evt)
 amdvi_generate_msi_interrupt(s);
 }
 
-static void amdvi_setevent_bits(uint64_t *buffer, uint64_t value, int start,
-int length)
-{
-int index = start / 64, bitpos = start % 64;
-uint64_t mask = MAKE_64BIT_MASK(start, length);
-buffer[index] &= ~mask;
-buffer[index] |= (value << bitpos) & mask;
-}
-/*
- * AMDVi event structure
- *0:15   -> DeviceID
- *55:63  -> event type + miscellaneous info
- *63:127 -> related address
- */
-static void amdvi_encode_event(uint64_t *evt, uint16_t devid, uint64_t addr,
-   uint16_t info)
-{
-amdvi_setevent_bits(evt, devid, 0, 16);
-amdvi_setevent_bits(evt, info, 55, 8);
-amdvi_setevent_bits(evt, addr, 63, 64);
-}
 /* log an error encountered during a page walk
  *
  * @addr: virtual address in translation request
@@ -218,11 +203,8 @@ static void amdvi_encode_event(uint64_t *evt, uint16_t 
devid, uint64_t addr,
 static void amdvi_page_fault(AMDVIState *s, uint16_t devid,
  hwaddr addr, uint16_t info)
 {
-uint64_t evt[4];
-
 info |= AMDVI_EVENT_IOPF_I | AMDVI_EVENT_IOPF;
-amdvi_encode_event(evt, devid, addr, info);
-amdvi_log_event(s, evt);
+amdvi_log_event(s, devid, addr, info);
 pci_word_test_and_set_mask(s->pci.dev.config + PCI_STATUS,
 PCI_STATUS_SIG_TARGET_ABORT);
 }
@@ -232,14 +214,10 @@ static void amdvi_page_fault(AMDVIState *s, uint16_t 
devid,
  *  @info : error flags
  */
 static void amdvi_log_devtab_error(AMDVIState *s, uint16_t devid,
-   hwaddr devtab, uint16_t info)
+   hwaddr addr, uint16_t info)
 {
-uint64_t evt[4];
-
 info |= AMDVI_EVENT_DEV_TAB_HW_ERROR;
-
-amdvi_encode_event(evt, devid, devtab, info);
-amdvi_log_event(s, evt);
+amdvi_log_event(s, devid, addr, info);
 pci_word_test_and_set_mask(s->pci.dev.config + PCI_STATUS,
 PCI_STATUS_SIG_TARGET_ABORT);
 }
@@ -248,10 +226,7 @@ static void amdvi_log_devtab_error(AMDVIState *s, uint16_t 
devid,
  */
 static void amdvi_log_command_error(AMDVIState *s, hwaddr addr)
 {
-uint64_t evt[4], info = AMDVI_EVENT_COMMAND_HW_ERROR;
-
-amdvi_encode_event(evt, 0, addr, info);
-amdvi_log_event(s, evt);
+amdvi_log_event(s, 0, addr, AMDVI_EVENT_COMMAND_HW_ERROR);
 pci_word_test_and_set_mask(s->pci.dev.config + PCI_STATUS,
 PCI_STATUS_SIG_TARGET_ABORT);
 }
@@ -261,11 +236,8 @@ static void amdvi_log_command_error(AMDVIState *s, hwaddr 
addr)
 static void amdvi_log_illegalcom_error(AMDVIState *s, uint16_t info,
hwaddr addr)
 {
-uint64_t evt[4];
-
 info |= AMDVI_EVENT_ILLEGAL_COMMAND_ERROR;
-amdvi_encode_event(evt, 0, addr, info);
-amdvi_log_event(s, evt);
+amdvi_log_event(s, 0, addr, info);
 }
 /* log an error accessing device table
  *
@@ -276,11 +248,8 @@ static void amdvi_log_illegalcom_error(AMDVIState *s, 
uint16_t info,
 static void amdvi_log_illegaldevtab_error(AMDVIState *s, uint16_t devid,
   hwaddr addr, uint16_t info)
 {
-uint64_t evt[4];
-
 info |= AMDVI_EVENT_ILLEGAL_DEVTAB_ENTRY;
-amdvi_encode_event(evt, devid, addr, info);
-amdvi_log_event(s, evt);
+amdvi_log_event(s, devid, addr, info);
 }
 /* log an error accessing a PTE entry
  * @addr : address that couldn't be accessed
@@ -288,11 +257,8 @@ static void amdvi_log_illegaldevtab_e

[PATCH v1] chardev/wctable: don't free the instance in wctablet_chr_finalize

2021-11-17 Thread Daniil Tatianin
Object is supposed to be freed by invoking obj->free, and not
obj->instance_finalize. This would lead to use-after-free followed by
double free in object_unref/object_finalize.

Signed-off-by: Daniil Tatianin 
---
 chardev/wctablet.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/chardev/wctablet.c b/chardev/wctablet.c
index e9cb7ca710..fa3c9be04e 100644
--- a/chardev/wctablet.c
+++ b/chardev/wctablet.c
@@ -318,7 +318,6 @@ static void wctablet_chr_finalize(Object *obj)
 TabletChardev *tablet = WCTABLET_CHARDEV(obj);
 
 qemu_input_handler_unregister(tablet->hs);
-g_free(tablet);
 }
 
 static void wctablet_chr_open(Chardev *chr,
-- 
2.25.1