Re: [PATCH 3/3] usbredir: avoid queuing hello packet on snapshot restore

2022-08-12 Thread Joelle van Dyne
On Fri, Aug 12, 2022 at 10:50 PM Victor Toso  wrote:
>
> Hi,
>
> On Fri, Aug 12, 2022 at 10:33:54PM -0700, Joelle van Dyne wrote:
> > On Fri, Aug 12, 2022 at 10:30 PM Victor Toso  wrote:
> > >
> > > Hi,
> > >
> > > On Fri, Aug 12, 2022 at 06:10:31PM -0700, Joelle van Dyne wrote:
> > > > When launching QEMU with "-loadvm", usbredir_create_parser() should 
> > > > avoid
> > > > setting up the hello packet (just as with "-incoming". On the latest 
> > > > version
> > > > of libusbredir, usbredirparser_unserialize() will return error if the 
> > > > parser
> > > > is not "pristine."
> > >
> > > That was wrong in the usbredir side. The fix [0] was merged and
> > > included in the latest 0.13.0 release
> >
> > This is good to know. Should the entire runstate_check in
> > usbredir_create_parser be removed?
>
> From my POV your patch looks correct and would avoid migration
> issues such as [1] with usbredir 0.12.0 that was not patched
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=2096008
>
> So, I'd keep the check and the patch :)

I have to admit I'm not too familiar with the inner workings of
libusbredir. But is it desirable to skip the HELLO packet on "loadvm"?
I wrote this on the assumption that it's correct because libusbredir
enforces it, but if that's wrong, then I'm wondering if maybe we need
the HELLO to re-establish communication (that was the issue I triaged
with "-S", when USB devices did not work due to the HELLO packet not
being sent). In a migration, it makes sense that a SPICE client has
not reset the USB device. However, when re-starting QEMU with
"-loadvm", it's possible the USB device has been disconnected and
reconnected. Ideally, we report that to the guest and let it handle
the reset. Would "usbredirparser_fl_no_hello" prevent that?

>
> > > [0] https://gitlab.freedesktop.org/spice/usbredir/-/merge_requests/61
> > >
> > > > Signed-off-by: Joelle van Dyne 
> > > > ---
> > > >  hw/usb/redirect.c | 3 ++-
> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c
> > > > index fd7df599bc..47fac3895a 100644
> > > > --- a/hw/usb/redirect.c
> > > > +++ b/hw/usb/redirect.c
> > > > @@ -1280,7 +1280,8 @@ static void usbredir_create_parser(USBRedirDevice 
> > > > *dev)
> > > >  }
> > > >  #endif
> > > >
> > > > -if (runstate_check(RUN_STATE_INMIGRATE)) {
> > > > +if (runstate_check(RUN_STATE_INMIGRATE) ||
> > > > +runstate_check(RUN_STATE_RESTORE_VM)) {
> > > >  flags |= usbredirparser_fl_no_hello;
> > > >  }
> > > >  usbredirparser_init(dev->parser, VERSION, caps, 
> > > > USB_REDIR_CAPS_SIZE,
> > > > --
> > > > 2.28.0
> > > >
> > > >
>
> Cheers,
> Victor
>



Re: [PATCH 3/3] usbredir: avoid queuing hello packet on snapshot restore

2022-08-12 Thread Victor Toso
Hi,

On Fri, Aug 12, 2022 at 10:33:54PM -0700, Joelle van Dyne wrote:
> On Fri, Aug 12, 2022 at 10:30 PM Victor Toso  wrote:
> >
> > Hi,
> >
> > On Fri, Aug 12, 2022 at 06:10:31PM -0700, Joelle van Dyne wrote:
> > > When launching QEMU with "-loadvm", usbredir_create_parser() should avoid
> > > setting up the hello packet (just as with "-incoming". On the latest 
> > > version
> > > of libusbredir, usbredirparser_unserialize() will return error if the 
> > > parser
> > > is not "pristine."
> >
> > That was wrong in the usbredir side. The fix [0] was merged and
> > included in the latest 0.13.0 release
>
> This is good to know. Should the entire runstate_check in
> usbredir_create_parser be removed?

From my POV your patch looks correct and would avoid migration
issues such as [1] with usbredir 0.12.0 that was not patched

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2096008

So, I'd keep the check and the patch :)

> > [0] https://gitlab.freedesktop.org/spice/usbredir/-/merge_requests/61
> >
> > > Signed-off-by: Joelle van Dyne 
> > > ---
> > >  hw/usb/redirect.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c
> > > index fd7df599bc..47fac3895a 100644
> > > --- a/hw/usb/redirect.c
> > > +++ b/hw/usb/redirect.c
> > > @@ -1280,7 +1280,8 @@ static void usbredir_create_parser(USBRedirDevice 
> > > *dev)
> > >  }
> > >  #endif
> > >
> > > -if (runstate_check(RUN_STATE_INMIGRATE)) {
> > > +if (runstate_check(RUN_STATE_INMIGRATE) ||
> > > +runstate_check(RUN_STATE_RESTORE_VM)) {
> > >  flags |= usbredirparser_fl_no_hello;
> > >  }
> > >  usbredirparser_init(dev->parser, VERSION, caps, USB_REDIR_CAPS_SIZE,
> > > --
> > > 2.28.0
> > >
> > >

Cheers,
Victor




Re: [PATCH 3/3] usbredir: avoid queuing hello packet on snapshot restore

2022-08-12 Thread Joelle van Dyne
On Fri, Aug 12, 2022 at 10:30 PM Victor Toso  wrote:
>
> Hi,
>
> On Fri, Aug 12, 2022 at 06:10:31PM -0700, Joelle van Dyne wrote:
> > When launching QEMU with "-loadvm", usbredir_create_parser() should avoid
> > setting up the hello packet (just as with "-incoming". On the latest version
> > of libusbredir, usbredirparser_unserialize() will return error if the parser
> > is not "pristine."
>
> That was wrong in the usbredir side. The fix [0] was merged and
> included in the latest 0.13.0 release

This is good to know. Should the entire runstate_check in
usbredir_create_parser be removed?

>
> [0] https://gitlab.freedesktop.org/spice/usbredir/-/merge_requests/61
>
> > Signed-off-by: Joelle van Dyne 
> > ---
> >  hw/usb/redirect.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c
> > index fd7df599bc..47fac3895a 100644
> > --- a/hw/usb/redirect.c
> > +++ b/hw/usb/redirect.c
> > @@ -1280,7 +1280,8 @@ static void usbredir_create_parser(USBRedirDevice 
> > *dev)
> >  }
> >  #endif
> >
> > -if (runstate_check(RUN_STATE_INMIGRATE)) {
> > +if (runstate_check(RUN_STATE_INMIGRATE) ||
> > +runstate_check(RUN_STATE_RESTORE_VM)) {
> >  flags |= usbredirparser_fl_no_hello;
> >  }
> >  usbredirparser_init(dev->parser, VERSION, caps, USB_REDIR_CAPS_SIZE,
> > --
> > 2.28.0
> >
> >
>
> Cheers,
> Victor
>



Re: [PATCH 3/3] usbredir: avoid queuing hello packet on snapshot restore

2022-08-12 Thread Victor Toso
Hi,

On Fri, Aug 12, 2022 at 06:10:31PM -0700, Joelle van Dyne wrote:
> When launching QEMU with "-loadvm", usbredir_create_parser() should avoid
> setting up the hello packet (just as with "-incoming". On the latest version
> of libusbredir, usbredirparser_unserialize() will return error if the parser
> is not "pristine."

That was wrong in the usbredir side. The fix [0] was merged and
included in the latest 0.13.0 release

[0] https://gitlab.freedesktop.org/spice/usbredir/-/merge_requests/61

> Signed-off-by: Joelle van Dyne 
> ---
>  hw/usb/redirect.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c
> index fd7df599bc..47fac3895a 100644
> --- a/hw/usb/redirect.c
> +++ b/hw/usb/redirect.c
> @@ -1280,7 +1280,8 @@ static void usbredir_create_parser(USBRedirDevice *dev)
>  }
>  #endif
>  
> -if (runstate_check(RUN_STATE_INMIGRATE)) {
> +if (runstate_check(RUN_STATE_INMIGRATE) ||
> +runstate_check(RUN_STATE_RESTORE_VM)) {
>  flags |= usbredirparser_fl_no_hello;
>  }
>  usbredirparser_init(dev->parser, VERSION, caps, USB_REDIR_CAPS_SIZE,
> -- 
> 2.28.0
> 
> 

Cheers,
Victor




Re: [PATCH] riscv: Make semihosting configurable for all privilege modes

2022-08-12 Thread Furquan Shaikh
On Fri, Aug 12, 2022 at 7:32 PM Richard Henderson
 wrote:
>
> On 8/12/22 17:50, Furquan Shaikh wrote:
> >> Why do you need such fine-grained control?  What is the use-case?
> >
> > I ran into a problem when I was testing a project (with a microkernel
> > in M-mode and tasks in U-mode) that uses semihosting for debugging.
> > The semihosting worked fine for M-mode but not in U-mode.
>
> Sure.  This would be handled by Peter's proposed userspace-enable=on property.
>
> > As I started
> > digging into this, I realized that this is because qemu restricts
> > semihosting to only M and S modes. From reading the debug spec, I
> > understood that the DCSR presents options for ebreak behavior in each
> > mode including VS and VU.
>
> I strongly suspect that VS also already works, since that's just
>
>env->priv == PRV_S && riscv_cpu_virt_enabled(env)
>
> VU would also be handled by userspace-enable=on.
>
> I do not see any use for 5 separate properties.

It felt more natural to mimic the knobs that are provided by the debug
spec to allow users to easily control the ebreak behavior for
semihosting in all possible modes. I agree that it is possible to just
allow semihosting to be turned on/off for U-mode using the above
proposed config. It would work for the use case I ran into, but if
finer control is required for other cases, it makes sense to provide
the ebreak options. Anyways, if others here feel that the
userspace-enable option is sufficient, we can go ahead with that.

>
>
> r~



Re: [PATCH] riscv: Make semihosting configurable for all privilege modes

2022-08-12 Thread Richard Henderson

On 8/12/22 17:50, Furquan Shaikh wrote:

Why do you need such fine-grained control?  What is the use-case?


I ran into a problem when I was testing a project (with a microkernel
in M-mode and tasks in U-mode) that uses semihosting for debugging.
The semihosting worked fine for M-mode but not in U-mode.


Sure.  This would be handled by Peter's proposed userspace-enable=on property.


As I started
digging into this, I realized that this is because qemu restricts
semihosting to only M and S modes. From reading the debug spec, I
understood that the DCSR presents options for ebreak behavior in each
mode including VS and VU.


I strongly suspect that VS also already works, since that's just

  env->priv == PRV_S && riscv_cpu_virt_enabled(env)

VU would also be handled by userspace-enable=on.

I do not see any use for 5 separate properties.


r~



[PATCH 1/3] Revert "usbredir: avoid queuing hello packet on snapshot restore"

2022-08-12 Thread Joelle van Dyne
Run state is also in RUN_STATE_PRELAUNCH while "-S" is used.

This reverts commit 12d182898a4866e4be418e2abac286b497cfa1b2.

Signed-off-by: Joelle van Dyne 
---
 hw/usb/redirect.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c
index 1bd30efc3e..fd7df599bc 100644
--- a/hw/usb/redirect.c
+++ b/hw/usb/redirect.c
@@ -1280,8 +1280,7 @@ static void usbredir_create_parser(USBRedirDevice *dev)
 }
 #endif
 
-if (runstate_check(RUN_STATE_INMIGRATE) ||
-runstate_check(RUN_STATE_PRELAUNCH)) {
+if (runstate_check(RUN_STATE_INMIGRATE)) {
 flags |= usbredirparser_fl_no_hello;
 }
 usbredirparser_init(dev->parser, VERSION, caps, USB_REDIR_CAPS_SIZE,
-- 
2.28.0




[PATCH 3/3] usbredir: avoid queuing hello packet on snapshot restore

2022-08-12 Thread Joelle van Dyne
When launching QEMU with "-loadvm", usbredir_create_parser() should avoid
setting up the hello packet (just as with "-incoming". On the latest version
of libusbredir, usbredirparser_unserialize() will return error if the parser
is not "pristine."

Signed-off-by: Joelle van Dyne 
---
 hw/usb/redirect.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c
index fd7df599bc..47fac3895a 100644
--- a/hw/usb/redirect.c
+++ b/hw/usb/redirect.c
@@ -1280,7 +1280,8 @@ static void usbredir_create_parser(USBRedirDevice *dev)
 }
 #endif
 
-if (runstate_check(RUN_STATE_INMIGRATE)) {
+if (runstate_check(RUN_STATE_INMIGRATE) ||
+runstate_check(RUN_STATE_RESTORE_VM)) {
 flags |= usbredirparser_fl_no_hello;
 }
 usbredirparser_init(dev->parser, VERSION, caps, USB_REDIR_CAPS_SIZE,
-- 
2.28.0




[PATCH 2/3] vl: on -loadvm set run state to "restore-vm"

2022-08-12 Thread Joelle van Dyne
This allows us to differentiate between a fresh boot and a restore boot.

Signed-off-by: Joelle van Dyne 
---
 softmmu/runstate.c | 1 +
 softmmu/vl.c   | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/softmmu/runstate.c b/softmmu/runstate.c
index 1e68680b9d..fa3dd3a4ab 100644
--- a/softmmu/runstate.c
+++ b/softmmu/runstate.c
@@ -76,6 +76,7 @@ typedef struct {
 
 static const RunStateTransition runstate_transitions_def[] = {
 { RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE },
+{ RUN_STATE_PRELAUNCH, RUN_STATE_RESTORE_VM },
 
 { RUN_STATE_DEBUG, RUN_STATE_RUNNING },
 { RUN_STATE_DEBUG, RUN_STATE_FINISH_MIGRATE },
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 706bd7cff7..29586d94ff 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -3131,6 +3131,9 @@ void qemu_init(int argc, char **argv, char **envp)
 add_device_config(DEV_DEBUGCON, optarg);
 break;
 case QEMU_OPTION_loadvm:
+if (!loadvm) {
+runstate_set(RUN_STATE_RESTORE_VM);
+}
 loadvm = optarg;
 break;
 case QEMU_OPTION_full_screen:
-- 
2.28.0




[PATCH 0/3] Set runstate to RUN_STATE_RESTORE_VM when started with "-loadvm"

2022-08-12 Thread Joelle van Dyne
Previously, there was a bug in usbredir which prevented "-loadvm" from working
because libusbredir's usbredirparser_unserialize() fails when a HELLO packet is
sent as part of the device's initalization.

The fix was to not send HELLO when in the RUN_STATE_PRELAUNCH state. However,
recently we found that this breaks when QEMU is started with "-S" because it
is in the RUN_STATE_PRELAUNCH state while halted before boot as well.

This patch attempts to re-fix the issue by setting the run state to
RUN_STATE_RESTORE_VM when started with "-loadvm" and recognizing that state
in usbredir's initalization.

To make sure there's no unintended side effects, we searched for
"RUN_STATE_RESTORE_VM" in the code base and found only references in
migration/savevm.c, monitor/hmp-cmds.c, and replay/replay-debugging.c. None of
these seems to be affected by RUN_STATE_RESTORE_VM before RUN_STATE_RUNNING.

Joelle van Dyne (3):
  Revert "usbredir: avoid queuing hello packet on snapshot restore"
  vl: on -loadvm set run state to "restore-vm"
  usbredir: avoid queuing hello packet on snapshot restore

 hw/usb/redirect.c  | 2 +-
 softmmu/runstate.c | 1 +
 softmmu/vl.c   | 3 +++
 3 files changed, 5 insertions(+), 1 deletion(-)

-- 
2.28.0




Re: [PATCH] riscv: Make semihosting configurable for all privilege modes

2022-08-12 Thread Furquan Shaikh
On Fri, Aug 12, 2022 at 5:30 PM Richard Henderson
 wrote:
>
> On 8/12/22 16:57, Furquan Shaikh wrote:
> > I am not sure if I understood your comment correctly. Currently, qemu
> > has a check in `riscv_cpu_do_interrupt` to allow semihosting calls
> > only in S and M modes. This prevents semihosting calls from U mode.
> > This patch changes the default behavior of checking `>= PRV_S` for
> > semihosting and instead replaces it with the usage of
> > ebreak{m,s,u,vs,vu} properties to allow the user to configure what
> > modes should allow semihosting calls when `-semihosting` argument is
> > selected.
>
> Why do you need such fine-grained control?  What is the use-case?

I ran into a problem when I was testing a project (with a microkernel
in M-mode and tasks in U-mode) that uses semihosting for debugging.
The semihosting worked fine for M-mode but not in U-mode. As I started
digging into this, I realized that this is because qemu restricts
semihosting to only M and S modes. From reading the debug spec, I
understood that the DCSR presents options for ebreak behavior in each
mode including VS and VU. Adding CPU settable features to enable
ebreak behavior seemed like a flexible solution to support different
cases:
1. Old behavior of restricting semihosting to M and S modes by setting
only ebreakm and ebreaks.
2. My case of enabling semihosting to work in U-mode by setting only ebreaku.
3. Any future case where users would want to enable semihosting in
either M,S,U,VS or VU modes or any combination of these.

>
> r~



Re: [PATCH] riscv: Make semihosting configurable for all privilege modes

2022-08-12 Thread Richard Henderson

On 8/12/22 16:57, Furquan Shaikh wrote:

I am not sure if I understood your comment correctly. Currently, qemu
has a check in `riscv_cpu_do_interrupt` to allow semihosting calls
only in S and M modes. This prevents semihosting calls from U mode.
This patch changes the default behavior of checking `>= PRV_S` for
semihosting and instead replaces it with the usage of
ebreak{m,s,u,vs,vu} properties to allow the user to configure what
modes should allow semihosting calls when `-semihosting` argument is
selected.


Why do you need such fine-grained control?  What is the use-case?

r~



Re: [PATCH] riscv: Make semihosting configurable for all privilege modes

2022-08-12 Thread Furquan Shaikh
On Fri, Aug 12, 2022 at 4:42 PM Richard Henderson
 wrote:
>
> On 8/12/22 16:27, Richard Henderson wrote:
> > On 8/11/22 13:41, Furquan Shaikh wrote:
> >> Unlike ARM, RISC-V does not define a separate breakpoint type for
> >> semihosting. Instead, it is entirely ABI. Thus, we need an option
> >> to allow users to configure what the ebreak behavior should be for
> >> different privilege levels - M, S, U, VS, VU. As per the RISC-V
> >> privilege specification[1], ebreak traps into the execution
> >> environment. However, RISC-V debug specification[2] provides
> >> ebreak{m,s,u,vs,vu} configuration bits to allow ebreak behavior to
> >> be configured to trap into debug mode instead. This change adds
> >> settable properties for RISC-V CPUs - `ebreakm`, `ebreaks`, `ebreaku`,
> >> `ebreakvs` and `ebreakvu` to allow user to configure whether qemu
> >> should treat ebreak as semihosting traps or trap according to the
> >> privilege specification.
> >>
> >> [1]
> >> https://github.com/riscv/riscv-isa-manual/releases/download/draft-20220723-10eea63/riscv-privileged.pdf
> >>
> >> [2] 
> >> https://github.com/riscv/riscv-debug-spec/blob/release/riscv-debug-release.pdf
> >
> > I don't see why you need to change anything at all.
> >
> > Semihosting doesn't only use 'ebreak', but a sequence of 3 insns:
> >
> >  slli x0, x0, 0x1f   # 0x01f01013Entry NOP
> >  ebreak  # 0x00100073Break to debugger
> >  srai x0, x0, 7  # 0x40705013NOP encoding the semihosting 
> > call number 7
> >
> > If the -semihosting command-line argument is absent, then the new DSCR 
> > fields apply as
> > normal.  If the -semihosting command-line argument is present, and the 
> > ebreak is not
> > surrounded by the required nops, then the new DSCR fields apply as normal.  
> > But if the
> > command-line argument is present and the nops are present, then semihosting 
> > overrides the
> > architecture and DSCR does not apply at all.
>
> I note that there's a missing test of semihosting_enabled() in
> target/riscv/insn_trans/trans_privileged.c.inc, and the PRV_S check in
> riscv_cpu_do_interrupt can be done at translation via ctx->mem_idx >= PRV_S.

I am not sure if I understood your comment correctly. Currently, qemu
has a check in `riscv_cpu_do_interrupt` to allow semihosting calls
only in S and M modes. This prevents semihosting calls from U mode.
This patch changes the default behavior of checking `>= PRV_S` for
semihosting and instead replaces it with the usage of
ebreak{m,s,u,vs,vu} properties to allow the user to configure what
modes should allow semihosting calls when `-semihosting` argument is
selected.

>
>
> r~



Re: [PATCH] riscv: Make semihosting configurable for all privilege modes

2022-08-12 Thread Richard Henderson

On 8/12/22 16:27, Richard Henderson wrote:

On 8/11/22 13:41, Furquan Shaikh wrote:

Unlike ARM, RISC-V does not define a separate breakpoint type for
semihosting. Instead, it is entirely ABI. Thus, we need an option
to allow users to configure what the ebreak behavior should be for
different privilege levels - M, S, U, VS, VU. As per the RISC-V
privilege specification[1], ebreak traps into the execution
environment. However, RISC-V debug specification[2] provides
ebreak{m,s,u,vs,vu} configuration bits to allow ebreak behavior to
be configured to trap into debug mode instead. This change adds
settable properties for RISC-V CPUs - `ebreakm`, `ebreaks`, `ebreaku`,
`ebreakvs` and `ebreakvu` to allow user to configure whether qemu
should treat ebreak as semihosting traps or trap according to the
privilege specification.

[1] 
https://github.com/riscv/riscv-isa-manual/releases/download/draft-20220723-10eea63/riscv-privileged.pdf 


[2] 
https://github.com/riscv/riscv-debug-spec/blob/release/riscv-debug-release.pdf


I don't see why you need to change anything at all.

Semihosting doesn't only use 'ebreak', but a sequence of 3 insns:

     slli x0, x0, 0x1f   # 0x01f01013    Entry NOP
     ebreak  # 0x00100073    Break to debugger
     srai x0, x0, 7  # 0x40705013    NOP encoding the semihosting call 
number 7

If the -semihosting command-line argument is absent, then the new DSCR fields apply as 
normal.  If the -semihosting command-line argument is present, and the ebreak is not 
surrounded by the required nops, then the new DSCR fields apply as normal.  But if the 
command-line argument is present and the nops are present, then semihosting overrides the 
architecture and DSCR does not apply at all.


I note that there's a missing test of semihosting_enabled() in 
target/riscv/insn_trans/trans_privileged.c.inc, and the PRV_S check in 
riscv_cpu_do_interrupt can be done at translation via ctx->mem_idx >= PRV_S.



r~



Re: [PATCH] riscv: Make semihosting configurable for all privilege modes

2022-08-12 Thread Furquan Shaikh
On Fri, Aug 12, 2022 at 4:00 PM Palmer Dabbelt  wrote:
>
> On Fri, 12 Aug 2022 15:05:08 PDT (-0700), furq...@rivosinc.com wrote:
> > On Fri, Aug 12, 2022 at 4:04 AM Andrew Jones  
> > wrote:
> >>
> >> On Thu, Aug 11, 2022 at 01:41:04PM -0700, Furquan Shaikh wrote:
> >> > Unlike ARM, RISC-V does not define a separate breakpoint type for
> >> > semihosting. Instead, it is entirely ABI. Thus, we need an option
> >> > to allow users to configure what the ebreak behavior should be for
> >> > different privilege levels - M, S, U, VS, VU. As per the RISC-V
> >> > privilege specification[1], ebreak traps into the execution
> >> > environment. However, RISC-V debug specification[2] provides
> >> > ebreak{m,s,u,vs,vu} configuration bits to allow ebreak behavior to
> >> > be configured to trap into debug mode instead. This change adds
> >> > settable properties for RISC-V CPUs - `ebreakm`, `ebreaks`, `ebreaku`,
> >> > `ebreakvs` and `ebreakvu` to allow user to configure whether qemu
> >> > should treat ebreak as semihosting traps or trap according to the
> >> > privilege specification.
> >> >
> >> > [1] 
> >> > https://github.com/riscv/riscv-isa-manual/releases/download/draft-20220723-10eea63/riscv-privileged.pdf
> >> > [2] 
> >> > https://github.com/riscv/riscv-debug-spec/blob/release/riscv-debug-release.pdf
> >> >
> >> > Signed-off-by: Furquan Shaikh 
> >> > ---
> >> >  target/riscv/cpu.c|  8 
> >> >  target/riscv/cpu.h|  7 +++
> >> >  target/riscv/cpu_helper.c | 26 +-
> >> >  3 files changed, 40 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> >> > index ac6f82ebd0..082194652b 100644
> >> > --- a/target/riscv/cpu.c
> >> > +++ b/target/riscv/cpu.c
> >> > @@ -953,6 +953,14 @@ static Property riscv_cpu_properties[] = {
> >> >  DEFINE_PROP_BOOL("short-isa-string", RISCVCPU,
> >> > cfg.short_isa_string, false),
> >> >
> >> >  DEFINE_PROP_BOOL("rvv_ta_all_1s", RISCVCPU, cfg.rvv_ta_all_1s, 
> >> > false),
> >> > +
> >> > +/* Debug spec */
> >> > +DEFINE_PROP_BOOL("ebreakm", RISCVCPU, cfg.ebreakm, true),
> >> > +DEFINE_PROP_BOOL("ebreaks", RISCVCPU, cfg.ebreaks, false),
> >> > +DEFINE_PROP_BOOL("ebreaku", RISCVCPU, cfg.ebreaku, false),
> >> > +DEFINE_PROP_BOOL("ebreakvs", RISCVCPU, cfg.ebreakvs, false),
> >> > +DEFINE_PROP_BOOL("ebreakvu", RISCVCPU, cfg.ebreakvu, false),
> >> > +
> >> >  DEFINE_PROP_END_OF_LIST(),
> >> >  };
> >> >
> >> > diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> >> > index 5c7acc055a..eee8e487a6 100644
> >> > --- a/target/riscv/cpu.h
> >> > +++ b/target/riscv/cpu.h
> >> > @@ -454,6 +454,13 @@ struct RISCVCPUConfig {
> >> >  bool epmp;
> >> >  bool aia;
> >> >  bool debug;
> >> > +
> >> > +/* Debug spec */
> >> > +bool ebreakm;
> >> > +bool ebreaks;
> >> > +bool ebreaku;
> >> > +bool ebreakvs;
> >> > +bool ebreakvu;
> >>
> >> There's only five of these, so having each separate probably makes the
> >> most sense, but I wanted to point out that we could keep the properties
> >> independent booleans, as we should, but still consolidate the values
> >> into a single bitmap like we did for the sve vq bitmap for arm (see
> >> cpu_arm_get/set_vq). Maybe worth considering?
> >
> > Thanks for the review and feedback, Andrew! I gave your suggestion a
> > try and updated the independent booleans to a single bitmap. It works,
> > but I am not sure if we really need all that additional code for this.
> > Like you mentioned, it is just five of these and having independent
> > booleans isn't too bad. If you or others feel strongly about switching
> > this to a bitmap, I can push a revised patchset. Else, I will keep the
> > change as is.
> >
> >>
> >> >  uint64_t resetvec;
> >> >
> >> >  bool short_isa_string;
> >> > diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> >> > index 59b3680b1b..be09abbe27 100644
> >> > --- a/target/riscv/cpu_helper.c
> >> > +++ b/target/riscv/cpu_helper.c
> >> > @@ -1314,6 +1314,30 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr
> >> > address, int size,
> >> >
> >> >  return true;
> >> >  }
> >> > +
> >> > +static bool semihosting_enabled(RISCVCPU *cpu)
> >> > +{
> >> > +CPURISCVState *env = >env;
> >> > +
> >> > +switch (env->priv) {
> >> > +case PRV_M:
> >> > +return cpu->cfg.ebreakm;
> >> > +case PRV_S:
> >> > +if (riscv_cpu_virt_enabled(env)) {
> >> > +return cpu->cfg.ebreakvs;
> >> > +} else {
> >> > +return cpu->cfg.ebreaks;
> >> > +}
> >> > +case PRV_U:
> >> > +if (riscv_cpu_virt_enabled(env)) {
> >> > +return cpu->cfg.ebreakvu;
> >> > +} else {
> >> > +return cpu->cfg.ebreaku;
> >> > +}
> >> > +}
> >> > +
> >> > +return false;
> >> > +}
> >> >  #endif /* !CONFIG_USER_ONLY */
> >> >
> >> >  /*
> >> > @@ -1342,7 +1366,7 @@ void 

Re: [PATCH] riscv: Make semihosting configurable for all privilege modes

2022-08-12 Thread Richard Henderson

On 8/11/22 13:41, Furquan Shaikh wrote:

Unlike ARM, RISC-V does not define a separate breakpoint type for
semihosting. Instead, it is entirely ABI. Thus, we need an option
to allow users to configure what the ebreak behavior should be for
different privilege levels - M, S, U, VS, VU. As per the RISC-V
privilege specification[1], ebreak traps into the execution
environment. However, RISC-V debug specification[2] provides
ebreak{m,s,u,vs,vu} configuration bits to allow ebreak behavior to
be configured to trap into debug mode instead. This change adds
settable properties for RISC-V CPUs - `ebreakm`, `ebreaks`, `ebreaku`,
`ebreakvs` and `ebreakvu` to allow user to configure whether qemu
should treat ebreak as semihosting traps or trap according to the
privilege specification.

[1] 
https://github.com/riscv/riscv-isa-manual/releases/download/draft-20220723-10eea63/riscv-privileged.pdf
[2] 
https://github.com/riscv/riscv-debug-spec/blob/release/riscv-debug-release.pdf


I don't see why you need to change anything at all.

Semihosting doesn't only use 'ebreak', but a sequence of 3 insns:

slli x0, x0, 0x1f   # 0x01f01013Entry NOP
ebreak  # 0x00100073Break to debugger
srai x0, x0, 7  # 0x40705013NOP encoding the semihosting call 
number 7

If the -semihosting command-line argument is absent, then the new DSCR fields apply as 
normal.  If the -semihosting command-line argument is present, and the ebreak is not 
surrounded by the required nops, then the new DSCR fields apply as normal.  But if the 
command-line argument is present and the nops are present, then semihosting overrides the 
architecture and DSCR does not apply at all.



r~



[PATCH v2] riscv: Make semihosting configurable for all privilege modes

2022-08-12 Thread Furquan Shaikh
Unlike ARM, RISC-V does not define a separate breakpoint type for
semihosting. Instead, it is entirely ABI. Thus, we need an option
to allow users to configure what the ebreak behavior should be for
different privilege levels - M, S, U, VS, VU. As per the RISC-V
privilege specification[1], ebreak traps into the execution
environment. However, RISC-V debug specification[2] provides
ebreak{m,s,u,vs,vu} configuration bits to allow ebreak behavior to
be configured to trap into debug mode instead. This change adds
settable properties for RISC-V CPUs - `ebreakm`, `ebreaks`, `ebreaku`,
`ebreakvs` and `ebreakvu` to allow user to configure whether qemu
should treat ebreak as semihosting traps or trap according to the
privilege specification.

[1] 
https://github.com/riscv/riscv-isa-manual/releases/download/draft-20220723-10eea63/riscv-privileged.pdf
[2] 
https://github.com/riscv/riscv-debug-spec/blob/release/riscv-debug-release.pdf

Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Andrew Jones 
Signed-off-by: Furquan Shaikh 
---
 v2: Updated qemu-options.hx to document the ebreak options.
 Retained Reviewed-by from Philippe and Andrew since no
 functional change in this version.

 qemu-options.hx   |  4 +++-
 target/riscv/cpu.c|  8 
 target/riscv/cpu.h|  7 +++
 target/riscv/cpu_helper.c | 26 +-
 4 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 3f23a42fa8..1e2e153946 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4635,7 +4635,9 @@ SRST
 open/read/write/seek/select. Tensilica baremetal libc for ISS and
 linux platform "sim" use this interface.

-On RISC-V this implements the standard semihosting API, version 0.2.
+On RISC-V this implements the standard semihosting API, version 0.2. See
+the ebreak{m,s,u,vs,vu} CPU properties to control which modes treat
+breakpoints as semihosting calls.

 ``target=native|gdb|auto``
 Defines where the semihosting calls will be addressed, to QEMU
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index ac6f82ebd0..082194652b 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -953,6 +953,14 @@ static Property riscv_cpu_properties[] = {
 DEFINE_PROP_BOOL("short-isa-string", RISCVCPU,
cfg.short_isa_string, false),

 DEFINE_PROP_BOOL("rvv_ta_all_1s", RISCVCPU, cfg.rvv_ta_all_1s, false),
+
+/* Debug spec */
+DEFINE_PROP_BOOL("ebreakm", RISCVCPU, cfg.ebreakm, true),
+DEFINE_PROP_BOOL("ebreaks", RISCVCPU, cfg.ebreaks, false),
+DEFINE_PROP_BOOL("ebreaku", RISCVCPU, cfg.ebreaku, false),
+DEFINE_PROP_BOOL("ebreakvs", RISCVCPU, cfg.ebreakvs, false),
+DEFINE_PROP_BOOL("ebreakvu", RISCVCPU, cfg.ebreakvu, false),
+
 DEFINE_PROP_END_OF_LIST(),
 };

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 5c7acc055a..eee8e487a6 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -454,6 +454,13 @@ struct RISCVCPUConfig {
 bool epmp;
 bool aia;
 bool debug;
+
+/* Debug spec */
+bool ebreakm;
+bool ebreaks;
+bool ebreaku;
+bool ebreakvs;
+bool ebreakvu;
 uint64_t resetvec;

 bool short_isa_string;
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 59b3680b1b..be09abbe27 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -1314,6 +1314,30 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr
address, int size,

 return true;
 }
+
+static bool semihosting_enabled(RISCVCPU *cpu)
+{
+CPURISCVState *env = >env;
+
+switch (env->priv) {
+case PRV_M:
+return cpu->cfg.ebreakm;
+case PRV_S:
+if (riscv_cpu_virt_enabled(env)) {
+return cpu->cfg.ebreakvs;
+} else {
+return cpu->cfg.ebreaks;
+}
+case PRV_U:
+if (riscv_cpu_virt_enabled(env)) {
+return cpu->cfg.ebreakvu;
+} else {
+return cpu->cfg.ebreaku;
+}
+}
+
+return false;
+}
 #endif /* !CONFIG_USER_ONLY */

 /*
@@ -1342,7 +1366,7 @@ void riscv_cpu_do_interrupt(CPUState *cs)
 target_ulong mtval2 = 0;

 if  (cause == RISCV_EXCP_SEMIHOST) {
-if (env->priv >= PRV_S) {
+if (semihosting_enabled(cpu)) {
 do_common_semihosting(cs);
 env->pc += 4;
 return;
--
2.34.1



Re: [PATCH] riscv: Make semihosting configurable for all privilege modes

2022-08-12 Thread Furquan Shaikh
On Fri, Aug 12, 2022 at 4:28 AM Peter Maydell  wrote:
>
> On Thu, 11 Aug 2022 at 21:47, Furquan Shaikh  wrote:
> >
> > Unlike ARM, RISC-V does not define a separate breakpoint type for
> > semihosting. Instead, it is entirely ABI. Thus, we need an option
> > to allow users to configure what the ebreak behavior should be for
> > different privilege levels - M, S, U, VS, VU. As per the RISC-V
> > privilege specification[1], ebreak traps into the execution
> > environment. However, RISC-V debug specification[2] provides
> > ebreak{m,s,u,vs,vu} configuration bits to allow ebreak behavior to
> > be configured to trap into debug mode instead. This change adds
> > settable properties for RISC-V CPUs - `ebreakm`, `ebreaks`, `ebreaku`,
> > `ebreakvs` and `ebreakvu` to allow user to configure whether qemu
> > should treat ebreak as semihosting traps or trap according to the
> > privilege specification.
> >
> > [1] 
> > https://github.com/riscv/riscv-isa-manual/releases/download/draft-20220723-10eea63/riscv-privileged.pdf
> > [2] 
> > https://github.com/riscv/riscv-debug-spec/blob/release/riscv-debug-release.pdf
>
> As a general rule we don't allow userspace to make semihosting
> calls, as a (rather weak) attempt at fencing off unprivileged
> guest code from being able to scribble all over the host
> filesystem. We should try to be consistent across architectures
> about that, and in particular about how we enable it.
>
> I have a half-finished patchset where I was planning to add
> a --semihosting-config userspace-enable=on option or similar
> to that effect.
>
> It sounds like these ebreak bits are somewhat architectural,
> so maybe they make sense as a riscv specific thing, but we
> should consider how they ought to interact with the general
> behaviour of semihosting. As it stands in QEMU today, we
> (at least in theory) ought not to permit userspace to make
> semihosting ebreak calls at all I think.

Thanks for the feedback, Peter. IIUC, the option that you are planning
to add "--semihosting-config userspace-enable=on" would allow
userspace to make semihosting calls on all architectures. The ebreak
bits here are risc-v specific and hence this change adds it as a
property for RISC-V CPUs. I agree with you that we should be
consistent about how we enable semihosting behavior within qemu.
Without seeing more details of how the config you plan to implement
works, it might be difficult to say how both these fit together. But,
just based on the above understanding, I am thinking we can use the
generic semihosting-config as the top-level config to allow
non-supervisor semihosting. Once your change is ready, we can apply
the additional restriction that these RISC-V ebreak bits for user mode
be settable only if the top-level config is enabled. There are other
ways of handling this too. I am new here and still learning, so open
to any ideas that might make this integration easier.

>
> thanks
> -- PMM



Re: [PATCH] riscv: Make semihosting configurable for all privilege modes

2022-08-12 Thread Palmer Dabbelt

On Fri, 12 Aug 2022 15:05:08 PDT (-0700), furq...@rivosinc.com wrote:

On Fri, Aug 12, 2022 at 4:04 AM Andrew Jones  wrote:


On Thu, Aug 11, 2022 at 01:41:04PM -0700, Furquan Shaikh wrote:
> Unlike ARM, RISC-V does not define a separate breakpoint type for
> semihosting. Instead, it is entirely ABI. Thus, we need an option
> to allow users to configure what the ebreak behavior should be for
> different privilege levels - M, S, U, VS, VU. As per the RISC-V
> privilege specification[1], ebreak traps into the execution
> environment. However, RISC-V debug specification[2] provides
> ebreak{m,s,u,vs,vu} configuration bits to allow ebreak behavior to
> be configured to trap into debug mode instead. This change adds
> settable properties for RISC-V CPUs - `ebreakm`, `ebreaks`, `ebreaku`,
> `ebreakvs` and `ebreakvu` to allow user to configure whether qemu
> should treat ebreak as semihosting traps or trap according to the
> privilege specification.
>
> [1] 
https://github.com/riscv/riscv-isa-manual/releases/download/draft-20220723-10eea63/riscv-privileged.pdf
> [2] 
https://github.com/riscv/riscv-debug-spec/blob/release/riscv-debug-release.pdf
>
> Signed-off-by: Furquan Shaikh 
> ---
>  target/riscv/cpu.c|  8 
>  target/riscv/cpu.h|  7 +++
>  target/riscv/cpu_helper.c | 26 +-
>  3 files changed, 40 insertions(+), 1 deletion(-)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index ac6f82ebd0..082194652b 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -953,6 +953,14 @@ static Property riscv_cpu_properties[] = {
>  DEFINE_PROP_BOOL("short-isa-string", RISCVCPU,
> cfg.short_isa_string, false),
>
>  DEFINE_PROP_BOOL("rvv_ta_all_1s", RISCVCPU, cfg.rvv_ta_all_1s, false),
> +
> +/* Debug spec */
> +DEFINE_PROP_BOOL("ebreakm", RISCVCPU, cfg.ebreakm, true),
> +DEFINE_PROP_BOOL("ebreaks", RISCVCPU, cfg.ebreaks, false),
> +DEFINE_PROP_BOOL("ebreaku", RISCVCPU, cfg.ebreaku, false),
> +DEFINE_PROP_BOOL("ebreakvs", RISCVCPU, cfg.ebreakvs, false),
> +DEFINE_PROP_BOOL("ebreakvu", RISCVCPU, cfg.ebreakvu, false),
> +
>  DEFINE_PROP_END_OF_LIST(),
>  };
>
> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> index 5c7acc055a..eee8e487a6 100644
> --- a/target/riscv/cpu.h
> +++ b/target/riscv/cpu.h
> @@ -454,6 +454,13 @@ struct RISCVCPUConfig {
>  bool epmp;
>  bool aia;
>  bool debug;
> +
> +/* Debug spec */
> +bool ebreakm;
> +bool ebreaks;
> +bool ebreaku;
> +bool ebreakvs;
> +bool ebreakvu;

There's only five of these, so having each separate probably makes the
most sense, but I wanted to point out that we could keep the properties
independent booleans, as we should, but still consolidate the values
into a single bitmap like we did for the sve vq bitmap for arm (see
cpu_arm_get/set_vq). Maybe worth considering?


Thanks for the review and feedback, Andrew! I gave your suggestion a
try and updated the independent booleans to a single bitmap. It works,
but I am not sure if we really need all that additional code for this.
Like you mentioned, it is just five of these and having independent
booleans isn't too bad. If you or others feel strongly about switching
this to a bitmap, I can push a revised patchset. Else, I will keep the
change as is.



>  uint64_t resetvec;
>
>  bool short_isa_string;
> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> index 59b3680b1b..be09abbe27 100644
> --- a/target/riscv/cpu_helper.c
> +++ b/target/riscv/cpu_helper.c
> @@ -1314,6 +1314,30 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr
> address, int size,
>
>  return true;
>  }
> +
> +static bool semihosting_enabled(RISCVCPU *cpu)
> +{
> +CPURISCVState *env = >env;
> +
> +switch (env->priv) {
> +case PRV_M:
> +return cpu->cfg.ebreakm;
> +case PRV_S:
> +if (riscv_cpu_virt_enabled(env)) {
> +return cpu->cfg.ebreakvs;
> +} else {
> +return cpu->cfg.ebreaks;
> +}
> +case PRV_U:
> +if (riscv_cpu_virt_enabled(env)) {
> +return cpu->cfg.ebreakvu;
> +} else {
> +return cpu->cfg.ebreaku;
> +}
> +}
> +
> +return false;
> +}
>  #endif /* !CONFIG_USER_ONLY */
>
>  /*
> @@ -1342,7 +1366,7 @@ void riscv_cpu_do_interrupt(CPUState *cs)
>  target_ulong mtval2 = 0;
>
>  if  (cause == RISCV_EXCP_SEMIHOST) {
> -if (env->priv >= PRV_S) {
> +if (semihosting_enabled(cpu)) {
>  do_common_semihosting(cs);
>  env->pc += 4;
>  return;
> --
> 2.34.1
>

Bitmap or no bitmap,

Reviewed-by: Andrew Jones 


Reviewed-by: Palmer Dabbelt 

Also no strong feelings on the bitmap, but I do like the feature.  When 
using an external JTAG debugger to talk to a this would probably be 
configured via something like GDB's monitor commands.  We could probably 
hook this up to QEMU's monitor as well, but doing it this way 

Re: [PATCH for-7.2 v2 10/20] hw/ppc: set machine->fdt in spapr machine

2022-08-12 Thread Daniel Henrique Barboza




On 8/8/22 00:26, David Gibson wrote:

On Fri, Aug 05, 2022 at 06:39:38AM -0300, Daniel Henrique Barboza wrote:

The pSeries machine never bothered with the common machine->fdt
attribute. We do all the FDT related work using spapr->fdt_blob.

We're going to introduce HMP commands to read and save the FDT, which
will rely on setting machine->fdt properly to work across all machine
archs/types.

Let's set machine->fdt in the two places where we manipulate the FDT:
spapr_machine_reset() and CAS. spapr->fdt_blob is left untouched: what
we want is a way to access the FDT from HMP, not replace
spapr->fdt_blob.


Given there is now an fdt field in the generic MACHINE structure, we
should be able to remove the one in spapr->fdt_blob, yes?


I thought about it but I backed down when I realized that spapr->fdt_blob is
being migrated.

At first glance it would be a matter of migrating ms->fdt and then removing
spapr->fdt_blob, but then I got confused about how a migration to an older
version would occur (e.g. QEMU 7.2 with ms->fdt to a QEMU 7.0 with
spapr->fdt_blob).

Migration to a newer QEMU would require us to move the spapr->version_id to 4
and then handle the old version accordingly in spapr_post_load().

Probably something to think about after this work is accepted.


Thanks,


Daniel







Cc: Cédric Le Goater 
Cc: qemu-...@nongnu.org
Signed-off-by: Daniel Henrique Barboza 
---
  hw/ppc/spapr.c   | 6 ++
  hw/ppc/spapr_hcall.c | 8 
  2 files changed, 14 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index bc9ba6e6dc..94c90f0351 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1713,6 +1713,12 @@ static void spapr_machine_reset(MachineState *machine)
  spapr->fdt_initial_size = spapr->fdt_size;
  spapr->fdt_blob = fdt;
  
+/*

+ * Set the common machine->fdt pointer to enable support
+ * for 'dumpdtb' and 'info fdt' commands.
+ */
+machine->fdt = fdt;
+
  /* Set up the entry state */
  first_ppc_cpu->env.gpr[5] = 0;
  
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c

index a8d4a6bcf0..0079bc6fdc 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1256,6 +1256,14 @@ target_ulong do_client_architecture_support(PowerPCCPU 
*cpu,
  spapr->fdt_initial_size = spapr->fdt_size;
  spapr->fdt_blob = fdt;
  
+/*

+ * Set the machine->fdt pointer again since we just freed
+ * it above (by freeing spapr->fdt_blob). We set this
+ * pointer to enable support for 'dumpdtb' and 'info fdt'
+ * HMP commands.
+ */
+MACHINE(spapr)->fdt = fdt;
+
  return H_SUCCESS;
  }
  






Re: [PATCH] riscv: Make semihosting configurable for all privilege modes

2022-08-12 Thread Furquan Shaikh
On Fri, Aug 12, 2022 at 4:04 AM Andrew Jones  wrote:
>
> On Thu, Aug 11, 2022 at 01:41:04PM -0700, Furquan Shaikh wrote:
> > Unlike ARM, RISC-V does not define a separate breakpoint type for
> > semihosting. Instead, it is entirely ABI. Thus, we need an option
> > to allow users to configure what the ebreak behavior should be for
> > different privilege levels - M, S, U, VS, VU. As per the RISC-V
> > privilege specification[1], ebreak traps into the execution
> > environment. However, RISC-V debug specification[2] provides
> > ebreak{m,s,u,vs,vu} configuration bits to allow ebreak behavior to
> > be configured to trap into debug mode instead. This change adds
> > settable properties for RISC-V CPUs - `ebreakm`, `ebreaks`, `ebreaku`,
> > `ebreakvs` and `ebreakvu` to allow user to configure whether qemu
> > should treat ebreak as semihosting traps or trap according to the
> > privilege specification.
> >
> > [1] 
> > https://github.com/riscv/riscv-isa-manual/releases/download/draft-20220723-10eea63/riscv-privileged.pdf
> > [2] 
> > https://github.com/riscv/riscv-debug-spec/blob/release/riscv-debug-release.pdf
> >
> > Signed-off-by: Furquan Shaikh 
> > ---
> >  target/riscv/cpu.c|  8 
> >  target/riscv/cpu.h|  7 +++
> >  target/riscv/cpu_helper.c | 26 +-
> >  3 files changed, 40 insertions(+), 1 deletion(-)
> >
> > diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> > index ac6f82ebd0..082194652b 100644
> > --- a/target/riscv/cpu.c
> > +++ b/target/riscv/cpu.c
> > @@ -953,6 +953,14 @@ static Property riscv_cpu_properties[] = {
> >  DEFINE_PROP_BOOL("short-isa-string", RISCVCPU,
> > cfg.short_isa_string, false),
> >
> >  DEFINE_PROP_BOOL("rvv_ta_all_1s", RISCVCPU, cfg.rvv_ta_all_1s, false),
> > +
> > +/* Debug spec */
> > +DEFINE_PROP_BOOL("ebreakm", RISCVCPU, cfg.ebreakm, true),
> > +DEFINE_PROP_BOOL("ebreaks", RISCVCPU, cfg.ebreaks, false),
> > +DEFINE_PROP_BOOL("ebreaku", RISCVCPU, cfg.ebreaku, false),
> > +DEFINE_PROP_BOOL("ebreakvs", RISCVCPU, cfg.ebreakvs, false),
> > +DEFINE_PROP_BOOL("ebreakvu", RISCVCPU, cfg.ebreakvu, false),
> > +
> >  DEFINE_PROP_END_OF_LIST(),
> >  };
> >
> > diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> > index 5c7acc055a..eee8e487a6 100644
> > --- a/target/riscv/cpu.h
> > +++ b/target/riscv/cpu.h
> > @@ -454,6 +454,13 @@ struct RISCVCPUConfig {
> >  bool epmp;
> >  bool aia;
> >  bool debug;
> > +
> > +/* Debug spec */
> > +bool ebreakm;
> > +bool ebreaks;
> > +bool ebreaku;
> > +bool ebreakvs;
> > +bool ebreakvu;
>
> There's only five of these, so having each separate probably makes the
> most sense, but I wanted to point out that we could keep the properties
> independent booleans, as we should, but still consolidate the values
> into a single bitmap like we did for the sve vq bitmap for arm (see
> cpu_arm_get/set_vq). Maybe worth considering?

Thanks for the review and feedback, Andrew! I gave your suggestion a
try and updated the independent booleans to a single bitmap. It works,
but I am not sure if we really need all that additional code for this.
Like you mentioned, it is just five of these and having independent
booleans isn't too bad. If you or others feel strongly about switching
this to a bitmap, I can push a revised patchset. Else, I will keep the
change as is.

>
> >  uint64_t resetvec;
> >
> >  bool short_isa_string;
> > diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> > index 59b3680b1b..be09abbe27 100644
> > --- a/target/riscv/cpu_helper.c
> > +++ b/target/riscv/cpu_helper.c
> > @@ -1314,6 +1314,30 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr
> > address, int size,
> >
> >  return true;
> >  }
> > +
> > +static bool semihosting_enabled(RISCVCPU *cpu)
> > +{
> > +CPURISCVState *env = >env;
> > +
> > +switch (env->priv) {
> > +case PRV_M:
> > +return cpu->cfg.ebreakm;
> > +case PRV_S:
> > +if (riscv_cpu_virt_enabled(env)) {
> > +return cpu->cfg.ebreakvs;
> > +} else {
> > +return cpu->cfg.ebreaks;
> > +}
> > +case PRV_U:
> > +if (riscv_cpu_virt_enabled(env)) {
> > +return cpu->cfg.ebreakvu;
> > +} else {
> > +return cpu->cfg.ebreaku;
> > +}
> > +}
> > +
> > +return false;
> > +}
> >  #endif /* !CONFIG_USER_ONLY */
> >
> >  /*
> > @@ -1342,7 +1366,7 @@ void riscv_cpu_do_interrupt(CPUState *cs)
> >  target_ulong mtval2 = 0;
> >
> >  if  (cause == RISCV_EXCP_SEMIHOST) {
> > -if (env->priv >= PRV_S) {
> > +if (semihosting_enabled(cpu)) {
> >  do_common_semihosting(cs);
> >  env->pc += 4;
> >  return;
> > --
> > 2.34.1
> >
>
> Bitmap or no bitmap,
>
> Reviewed-by: Andrew Jones 



Re: [PATCH for-7.2 v2 01/20] hw/arm: do not free machine->fdt in arm_load_dtb()

2022-08-12 Thread Daniel Henrique Barboza

David,

On 8/8/22 00:23, David Gibson wrote:

On Fri, Aug 05, 2022 at 06:39:29AM -0300, Daniel Henrique Barboza wrote:

At this moment, arm_load_dtb() can free machine->fdt when
binfo->dtb_filename is NULL. If there's no 'dtb_filename', 'fdt' will be
retrieved by binfo->get_dtb(). If get_dtb() returns machine->fdt, as is
the case of machvirt_dtb() from hw/arm/virt.c, fdt now has a pointer to
machine->fdt. And, in that case, the existing g_free(fdt) at the end of
arm_load_dtb() will make machine->fdt point to an invalid memory region.

This is not an issue right now because there's no code that access
machine->fdt after arm_load_dtb(), but we're going to add a couple do
FDT HMP commands that will rely on machine->fdt being valid.

Instead of freeing 'fdt' at the end of arm_load_dtb(), assign it to
machine->fdt. This will allow the FDT of ARM machines that relies on
arm_load_dtb() to be accessed later on.

Since all ARM machines allocates the FDT only once, we don't need to
worry about leaking the existing FDT during a machine reset (which is
something that other machines have to look after, e.g. the ppc64 pSeries
machine).

Cc: Peter Maydell 
Cc: qemu-...@nongnu.org
Signed-off-by: Daniel Henrique Barboza 
---
  hw/arm/boot.c | 8 +++-
  1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index ada2717f76..9f5ceb62d2 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -684,7 +684,13 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info 
*binfo,
   */
  rom_add_blob_fixed_as("dtb", fdt, size, addr, as);
  
-g_free(fdt);

+/*
+ * Update the ms->fdt pointer to enable support for 'dumpdtb'
+ * and 'info fdt' commands. Use fdt_pack() to shrink the blob
+ * size we're going to store.
+ */
+fdt_pack(fdt);
+ms->fdt = fdt;
  
  return size;


fdt_pack() could change (reduce) the effective size of the dtb blob,
so returning a 'size' value from above rather than the new value of
fdt_totalsize(fdt) doesn't see right.


After some thought I think executing fdt_pack() like I'm doing here is not
a good idea. The first problem is that I'm not returning the updated size,
as you've said.

But I can't just amend a 'return fdt_totalsize(fdt);' either. I'm packing the
FDT **after** the machine store it in the guest physical memory. If I return
the packed size, but the machine isn't packing the FDT before a
cpu_physical_memory_write(), I'll be under-reporting the FDT size written.

Machines such as e500 (patch 4) uses this returned value to put more stuff in
the guest memory. In that case, returning a smaller size that what was actually
written can cause the machine to overwrite the FDT by accident. In fact, only
a handful of machines (ppc/pseries, ppc/pvn, riscv, oepenrisc) is doing a
fdt_pack() before a cpu_physical_memory_write(). So this change would be
potentially harmful to a lot of people.

One alternative would be to do a fdt_pack() before the machine writes the
FDT in the guest memory, but that is too intrusive to do because I can't say
if each of these machines will be OK with that. I have a hunch that it would
be OK, but a hunch isn't going to cut it.

I'll just drop the fdt_pack() for each of these patches. If the machine code
is already packing it, fine. If not, that's fine too. Each maintainer can
assess whether packing the FDT is worth it or not.


Thanks,


Daniel




I believe some of the other patches in the series have similar concerns.





[PATCH] can: fix Xilinx ZynqMP CAN RX FIFO logic

2022-08-12 Thread Anton Kochkov
Function "update_rx_fifo()" should operate on the RX FIFO
registers, not the TX FIFO ones.

Signed-off-by: Anton Kochkov 
Resolves: https://gitlab.com/qemu-projects/qemu/-/issues/1123
---
 hw/net/can/xlnx-zynqmp-can.c | 32 
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/hw/net/can/xlnx-zynqmp-can.c b/hw/net/can/xlnx-zynqmp-can.c
index 82ac48cee2..e93e6c5e19 100644
--- a/hw/net/can/xlnx-zynqmp-can.c
+++ b/hw/net/can/xlnx-zynqmp-can.c
@@ -696,30 +696,30 @@ static void update_rx_fifo(XlnxZynqMPCANState *s, const 
qemu_can_frame *frame)
timestamp));

 /* First 32 bit of the data. */
-fifo32_push(>rx_fifo, deposit32(0, R_TXFIFO_DATA1_DB3_SHIFT,
-   R_TXFIFO_DATA1_DB3_LENGTH,
+fifo32_push(>rx_fifo, deposit32(0, R_RXFIFO_DATA1_DB3_SHIFT,
+   R_RXFIFO_DATA1_DB3_LENGTH,
frame->data[0]) |
- deposit32(0, R_TXFIFO_DATA1_DB2_SHIFT,
-   R_TXFIFO_DATA1_DB2_LENGTH,
+ deposit32(0, R_RXFIFO_DATA1_DB2_SHIFT,
+   R_RXFIFO_DATA1_DB2_LENGTH,
frame->data[1]) |
- deposit32(0, R_TXFIFO_DATA1_DB1_SHIFT,
-   R_TXFIFO_DATA1_DB1_LENGTH,
+ deposit32(0, R_RXFIFO_DATA1_DB1_SHIFT,
+   R_RXFIFO_DATA1_DB1_LENGTH,
frame->data[2]) |
- deposit32(0, R_TXFIFO_DATA1_DB0_SHIFT,
-   R_TXFIFO_DATA1_DB0_LENGTH,
+ deposit32(0, R_RXFIFO_DATA1_DB0_SHIFT,
+   R_RXFIFO_DATA1_DB0_LENGTH,
frame->data[3]));
 /* Last 32 bit of the data. */
-fifo32_push(>rx_fifo, deposit32(0, R_TXFIFO_DATA2_DB7_SHIFT,
-   R_TXFIFO_DATA2_DB7_LENGTH,
+fifo32_push(>rx_fifo, deposit32(0, R_RXFIFO_DATA2_DB7_SHIFT,
+   R_RXFIFO_DATA2_DB7_LENGTH,
frame->data[4]) |
- deposit32(0, R_TXFIFO_DATA2_DB6_SHIFT,
-   R_TXFIFO_DATA2_DB6_LENGTH,
+ deposit32(0, R_RXFIFO_DATA2_DB6_SHIFT,
+   R_RXFIFO_DATA2_DB6_LENGTH,
frame->data[5]) |
- deposit32(0, R_TXFIFO_DATA2_DB5_SHIFT,
-   R_TXFIFO_DATA2_DB5_LENGTH,
+ deposit32(0, R_RXFIFO_DATA2_DB5_SHIFT,
+   R_RXFIFO_DATA2_DB5_LENGTH,
frame->data[6]) |
- deposit32(0, R_TXFIFO_DATA2_DB4_SHIFT,
-   R_TXFIFO_DATA2_DB4_LENGTH,
+ deposit32(0, R_RXFIFO_DATA2_DB4_SHIFT,
+   R_RXFIFO_DATA2_DB4_LENGTH,
frame->data[7]));

 ARRAY_FIELD_DP32(s->regs, INTERRUPT_STATUS_REGISTER, RXOK, 1);
--
2.37.1





Re: [PULL 0/5] target-arm queue

2022-08-12 Thread Richard Henderson

On 8/12/22 04:45, Peter Maydell wrote:

This pullreq has:
  * two arm bug fixes which fix some "Linux fails to boot" bugs
  * a docs typo-fixing patch
  * a couple of compile failure/warning issues

I think they're all pretty safe and worth having in rc3.

thanks
-- PMM

The following changes since commit a6b1c53e79d08a99a28cc3e67a3e1a7c34102d6b:

   Merge tag 'linux-user-for-7.1-pull-request' of 
https://gitlab.com/laurent_vivier/qemu into staging (2022-08-10 10:26:57 -0700)

are available in the Git repository at:

   https://git.linaro.org/people/pmaydell/qemu-arm.git 
tags/pull-target-arm-20220812

for you to fetch changes up to 4311682ea8293f720730f260e8a7601117d79e65:

   cutils: Add missing dyld(3) include on macOS (2022-08-12 11:33:52 +0100)


target-arm queue:
  * Don't report Statistical Profiling Extension in ID registers
  * virt ACPI tables: Present the GICR structure properly for GICv4
  * Fix some typos in documentation
  * tests/unit: fix a -Wformat-truncation warning
  * cutils: Add missing dyld(3) include on macOS


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/7.1 as 
appropriate.


r~





Marc-André Lureau (1):
   tests/unit: fix a -Wformat-truncation warning

Peter Maydell (1):
   target/arm: Don't report Statistical Profiling Extension in ID registers

Philippe Mathieu-Daudé (1):
   cutils: Add missing dyld(3) include on macOS

Stefan Weil (1):
   Fix some typos in documentation (most of them found by codespell)

Zenghui Yu (1):
   hw/arm/virt-acpi-build: Present the GICR structure properly for GICv4

  docs/about/deprecated.rst   |  2 +-
  docs/specs/acpi_erst.rst|  4 ++--
  docs/system/devices/canokey.rst |  8 
  docs/system/devices/cxl.rst | 12 ++--
  hw/arm/virt-acpi-build.c|  4 ++--
  target/arm/cpu.c| 11 +++
  tests/unit/test-qobject-input-visitor.c |  3 +--
  util/cutils.c   |  4 
  util/oslib-posix.c  |  4 
  9 files changed, 31 insertions(+), 21 deletions(-)






Re: [PATCH] target/arm: Rearrange cpu64.c so all the CPU initfns are together

2022-08-12 Thread Richard Henderson

On 8/12/22 10:41, Peter Maydell wrote:

cpu64.c has ended up in a slightly odd order -- it starts with the
initfns for most of the models-real-hardware CPUs; after that comes a
bunch of support code for SVE, SME, pauth and LPA2 properties.  Then
come the initfns for the 'host' and 'max' CPU types, and then after
that one more models-real-hardware CPU initfn, for a64fx.  (This
ordering is partly historical and partly required because a64fx needs
the SVE properties.)

Reorder the file into:
  * CPU property support functions
  * initfns for real hardware CPUs
  * initfns for host and max
  * class boilerplate

Signed-off-by: Peter Maydell
---
I started off thinking this would be a relatively simple "move the
a64fx initfn up to live with the others", but because we effectively
have to move all the cpu initfns the diffstat has ended up quite big.
On the other hand this patch is purely code motion, and the resulting
order in the file does seem to me to be more sensible.
---
  target/arm/cpu64.c | 712 ++---
  1 file changed, 356 insertions(+), 356 deletions(-)


Reviewed-by: Richard Henderson 

r~



[PATCH for-7.2 21/21] accel/tcg: Use DisasContextBase in plugin_gen_tb_start

2022-08-12 Thread Richard Henderson
Use the pc coming from db->pc_first rather than the TB.

Use the cached host_addr rather than re-computing for the
first page.  We still need a separate lookup for the second
page because it won't be computed for DisasContextBase until
the translator actually performs a read from the page.

Signed-off-by: Richard Henderson 
---
 include/exec/plugin-gen.h |  7 ---
 accel/tcg/plugin-gen.c| 23 ---
 accel/tcg/translator.c|  2 +-
 3 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/include/exec/plugin-gen.h b/include/exec/plugin-gen.h
index f92f169739..5004728c61 100644
--- a/include/exec/plugin-gen.h
+++ b/include/exec/plugin-gen.h
@@ -19,7 +19,8 @@ struct DisasContextBase;
 
 #ifdef CONFIG_PLUGIN
 
-bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool 
supress);
+bool plugin_gen_tb_start(CPUState *cpu, const struct DisasContextBase *db,
+ bool supress);
 void plugin_gen_tb_end(CPUState *cpu);
 void plugin_gen_insn_start(CPUState *cpu, const struct DisasContextBase *db);
 void plugin_gen_insn_end(void);
@@ -48,8 +49,8 @@ static inline void plugin_insn_append(abi_ptr pc, const void 
*from, size_t size)
 
 #else /* !CONFIG_PLUGIN */
 
-static inline
-bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool 
supress)
+static inline bool
+plugin_gen_tb_start(CPUState *cpu, const struct DisasContextBase *db, bool sup)
 {
 return false;
 }
diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index 8377c15383..0f080386af 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -852,7 +852,8 @@ static void plugin_gen_inject(const struct qemu_plugin_tb 
*plugin_tb)
 pr_ops();
 }
 
-bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool 
mem_only)
+bool plugin_gen_tb_start(CPUState *cpu, const DisasContextBase *db,
+ bool mem_only)
 {
 bool ret = false;
 
@@ -870,9 +871,9 @@ bool plugin_gen_tb_start(CPUState *cpu, const 
TranslationBlock *tb, bool mem_onl
 
 ret = true;
 
-ptb->vaddr = tb->pc;
+ptb->vaddr = db->pc_first;
 ptb->vaddr2 = -1;
-get_page_addr_code_hostp(cpu->env_ptr, tb->pc, true, >haddr1);
+ptb->haddr1 = db->host_addr[0];
 ptb->haddr2 = NULL;
 ptb->mem_only = mem_only;
 
@@ -898,16 +899,16 @@ void plugin_gen_insn_start(CPUState *cpu, const 
DisasContextBase *db)
  * Note that we skip this when haddr1 == NULL, e.g. when we're
  * fetching instructions from a region not backed by RAM.
  */
-if (likely(ptb->haddr1 != NULL && ptb->vaddr2 == -1) &&
-unlikely((db->pc_next & TARGET_PAGE_MASK) !=
- (db->pc_first & TARGET_PAGE_MASK))) {
-get_page_addr_code_hostp(cpu->env_ptr, db->pc_next,
- true, >haddr2);
-ptb->vaddr2 = db->pc_next;
-}
-if (likely(ptb->vaddr2 == -1)) {
+if (ptb->haddr1 == NULL) {
+pinsn->haddr = NULL;
+} else if (is_same_page(db, db->pc_next)) {
 pinsn->haddr = ptb->haddr1 + pinsn->vaddr - ptb->vaddr;
 } else {
+if (ptb->vaddr2 == -1) {
+ptb->vaddr2 = TARGET_PAGE_ALIGN(db->pc_first);
+get_page_addr_code_hostp(cpu->env_ptr, ptb->vaddr2,
+ true, >haddr2);
+}
 pinsn->haddr = ptb->haddr2 + pinsn->vaddr - ptb->vaddr2;
 }
 }
diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index a693c17259..3e6fab482e 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -81,7 +81,7 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, int 
max_insns,
 ops->tb_start(db, cpu);
 tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
 
-plugin_enabled = plugin_gen_tb_start(cpu, tb, cflags & CF_MEMI_ONLY);
+plugin_enabled = plugin_gen_tb_start(cpu, db, cflags & CF_MEMI_ONLY);
 
 while (true) {
 db->num_insns++;
-- 
2.34.1




[PATCH for-7.2 20/21] accel/tcg: Add fast path for translator_ld*

2022-08-12 Thread Richard Henderson
Cache the translation from guest to host address, so we may
use direct loads when we hit on the primary translation page.

Look up the second translation page only once, during translation.
This obviates another lookup of the second page within tb_gen_code
after translation.

Fixes a bug in that plugin_insn_append should be passed the bytes
in the original memory order, not bswapped by pieces.

Signed-off-by: Richard Henderson 
---
 include/exec/translator.h |  52 --
 accel/tcg/translate-all.c |  22 +++-
 accel/tcg/translator.c| 111 +++---
 3 files changed, 135 insertions(+), 50 deletions(-)

diff --git a/include/exec/translator.h b/include/exec/translator.h
index 69db0f5c21..177a001698 100644
--- a/include/exec/translator.h
+++ b/include/exec/translator.h
@@ -81,13 +81,14 @@ typedef enum DisasJumpType {
  * Architecture-agnostic disassembly context.
  */
 typedef struct DisasContextBase {
-const TranslationBlock *tb;
+TranslationBlock *tb;
 target_ulong pc_first;
 target_ulong pc_next;
 DisasJumpType is_jmp;
 int num_insns;
 int max_insns;
 bool singlestep_enabled;
+void *host_addr[2];
 #ifdef CONFIG_USER_ONLY
 /*
  * Guest address of the last byte of the last protected page.
@@ -183,24 +184,43 @@ bool translator_use_goto_tb(DisasContextBase *db, 
target_ulong dest);
  * the relevant information at translation time.
  */
 
-#define GEN_TRANSLATOR_LD(fullname, type, load_fn, swap_fn) \
-type fullname ## _swap(CPUArchState *env, DisasContextBase *dcbase, \
-   abi_ptr pc, bool do_swap);   \
-static inline type fullname(CPUArchState *env,  \
-DisasContextBase *dcbase, abi_ptr pc)   \
-{   \
-return fullname ## _swap(env, dcbase, pc, false);   \
+uint8_t translator_ldub(CPUArchState *env, DisasContextBase *db, abi_ptr pc);
+uint16_t translator_lduw(CPUArchState *env, DisasContextBase *db, abi_ptr pc);
+uint32_t translator_ldl(CPUArchState *env, DisasContextBase *db, abi_ptr pc);
+uint64_t translator_ldq(CPUArchState *env, DisasContextBase *db, abi_ptr pc);
+
+static inline uint16_t
+translator_lduw_swap(CPUArchState *env, DisasContextBase *db,
+ abi_ptr pc, bool do_swap)
+{
+uint16_t ret = translator_lduw(env, db, pc);
+if (do_swap) {
+ret = bswap16(ret);
 }
+return ret;
+}
 
-#define FOR_EACH_TRANSLATOR_LD(F)   \
-F(translator_ldub, uint8_t, cpu_ldub_code, /* no swap */)   \
-F(translator_lduw, uint16_t, cpu_lduw_code, bswap16)\
-F(translator_ldl, uint32_t, cpu_ldl_code, bswap32)  \
-F(translator_ldq, uint64_t, cpu_ldq_code, bswap64)
+static inline uint32_t
+translator_ldl_swap(CPUArchState *env, DisasContextBase *db,
+abi_ptr pc, bool do_swap)
+{
+uint32_t ret = translator_ldl(env, db, pc);
+if (do_swap) {
+ret = bswap32(ret);
+}
+return ret;
+}
 
-FOR_EACH_TRANSLATOR_LD(GEN_TRANSLATOR_LD)
-
-#undef GEN_TRANSLATOR_LD
+static inline uint64_t
+translator_ldq_swap(CPUArchState *env, DisasContextBase *db,
+abi_ptr pc, bool do_swap)
+{
+uint64_t ret = translator_ldq_swap(env, db, pc, false);
+if (do_swap) {
+ret = bswap64(ret);
+}
+return ret;
+}
 
 /*
  * Return whether addr is on the same page as where disassembly started.
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index d52097ab2d..299b068f9c 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1333,8 +1333,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 {
 CPUArchState *env = cpu->env_ptr;
 TranslationBlock *tb, *existing_tb;
-tb_page_addr_t phys_page2;
-target_ulong virt_page2;
 tcg_insn_unit *gen_code_buf;
 int gen_code_size, search_size, max_insns;
 #ifdef CONFIG_PROFILER
@@ -1374,6 +1372,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 tb->flags = flags;
 tb->cflags = cflags;
 tb->trace_vcpu_dstate = *cpu->trace_dstate;
+tb->page_addr[0] = phys_pc;
+tb->page_addr[1] = -1;
 tcg_ctx->tb_cflags = cflags;
  tb_overflow:
 
@@ -1567,13 +1567,11 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 }
 
 /*
- * If the TB is not associated with a physical RAM page then
- * it must be a temporary one-insn TB, and we have nothing to do
- * except fill in the page_addr[] fields. Return early before
- * attempting to link to other TBs or add to the lookup table.
+ * If the TB is not associated with a physical RAM page then it must be
+ * a temporary one-insn TB, and we have nothing left to do. Return early
+ * before attempting to link to other TBs or add to the lookup table.
  */
-if (phys_pc == -1) {
-

[PATCH for-7.2 16/21] accel/tcg: Raise PROT_EXEC exception early

2022-08-12 Thread Richard Henderson
We currently ignore PROT_EXEC on the initial lookup, and
defer raising the exception until cpu_ld*_code().
It makes more sense to raise the exception early.

Signed-off-by: Richard Henderson 
---
 accel/tcg/cpu-exec.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 5278d1837b..6a3ca8224f 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -376,7 +376,7 @@ const void *HELPER(lookup_tb_ptr)(CPUArchState *env)
 cpu_loop_exit(cpu);
 }
 
-phys_pc = get_page_addr_code(env, pc);
+phys_pc = get_page_addr_code_hostp(env, pc, false, NULL);
 if (phys_pc == -1) {
 return tcg_code_gen_epilogue;
 }
@@ -509,7 +509,7 @@ void cpu_exec_step_atomic(CPUState *cpu)
  */
 
 mmap_lock();
-phys_pc = get_page_addr_code_hostp(env, pc, true, _pc);
+phys_pc = get_page_addr_code_hostp(env, pc, false, _pc);
 if (phys_pc == -1) {
 tb = NULL;
 } else {
@@ -983,7 +983,7 @@ int cpu_exec(CPUState *cpu)
 
 mmap_lock();
 phys_pc = get_page_addr_code_hostp(cpu->env_ptr, pc,
-   true, _pc);
+   false, _pc);
 if (phys_pc == -1) {
 tb = NULL;
 } else {
-- 
2.34.1




[PATCH for-7.2 17/21] accel/tcg: Introduce is_same_page()

2022-08-12 Thread Richard Henderson
From: Ilya Leoshkevich 

Introduce a function that checks whether a given address is on the same
page as where disassembly started. Having it improves readability of
the following patches.

Signed-off-by: Ilya Leoshkevich 
Message-Id: <20220811095534.241224-3-...@linux.ibm.com>
Reviewed-by: Richard Henderson 
[rth: Make the DisasContextBase parameter const.]
Signed-off-by: Richard Henderson 
---
 include/exec/translator.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/include/exec/translator.h b/include/exec/translator.h
index 7db6845535..0d0bf3a31e 100644
--- a/include/exec/translator.h
+++ b/include/exec/translator.h
@@ -187,4 +187,14 @@ FOR_EACH_TRANSLATOR_LD(GEN_TRANSLATOR_LD)
 
 #undef GEN_TRANSLATOR_LD
 
+/*
+ * Return whether addr is on the same page as where disassembly started.
+ * Translators can use this to enforce the rule that only single-insn
+ * translation blocks are allowed to cross page boundaries.
+ */
+static inline bool is_same_page(const DisasContextBase *db, target_ulong addr)
+{
+return ((addr ^ db->pc_first) & TARGET_PAGE_MASK) == 0;
+}
+
 #endif /* EXEC__TRANSLATOR_H */
-- 
2.34.1




[PATCH for-7.2 15/21] accel/tcg: Hoist get_page_addr_code out of tb_gen_code

2022-08-12 Thread Richard Henderson
Reuse the result that we just used with tb_lookup.
Pass in host_pc while touching these lines, to be used shortly.
We must widen the scope of the mmap_lock, so that the page table
lookup that is finally used is covered by the lock.

Signed-off-by: Richard Henderson 
---
 accel/tcg/internal.h  |  7 ---
 accel/tcg/cpu-exec.c  | 20 
 accel/tcg/translate-all.c |  5 ++---
 3 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index 3092bfa964..920d35e8bb 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -11,9 +11,10 @@
 
 #include "exec/exec-all.h"
 
-TranslationBlock *tb_gen_code(CPUState *cpu, target_ulong pc,
-  target_ulong cs_base, uint32_t flags,
-  int cflags);
+TranslationBlock *tb_gen_code(CPUState *cpu,
+  tb_page_addr_t phys_pc, void *host_pc,
+  target_ulong pc, target_ulong cs_base,
+  uint32_t flags, int cflags);
 G_NORETURN void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr);
 void page_init(void);
 void tb_htable_init(void);
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 889355b341..5278d1837b 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -485,6 +485,7 @@ void cpu_exec_step_atomic(CPUState *cpu)
 target_ulong cs_base, pc;
 uint32_t flags, cflags;
 tb_page_addr_t phys_pc;
+void *host_pc;
 int tb_exit;
 
 if (sigsetjmp(cpu->jmp_env, 0) == 0) {
@@ -507,17 +508,17 @@ void cpu_exec_step_atomic(CPUState *cpu)
  * Any breakpoint for this insn will have been recognized earlier.
  */
 
-phys_pc = get_page_addr_code(env, pc);
+mmap_lock();
+phys_pc = get_page_addr_code_hostp(env, pc, true, _pc);
 if (phys_pc == -1) {
 tb = NULL;
 } else {
 tb = tb_lookup(cpu, phys_pc, pc, cs_base, flags, cflags);
 }
 if (tb == NULL) {
-mmap_lock();
-tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
-mmap_unlock();
+tb = tb_gen_code(cpu, phys_pc, host_pc, pc, cs_base, flags, 
cflags);
 }
+mmap_unlock();
 
 cpu_exec_enter(cpu);
 /* execute the generated code */
@@ -958,6 +959,7 @@ int cpu_exec(CPUState *cpu)
 target_ulong cs_base, pc;
 uint32_t flags, cflags;
 tb_page_addr_t phys_pc;
+void *host_pc;
 
 cpu_get_tb_cpu_state(cpu->env_ptr, , _base, );
 
@@ -979,22 +981,24 @@ int cpu_exec(CPUState *cpu)
 break;
 }
 
-phys_pc = get_page_addr_code(cpu->env_ptr, pc);
+mmap_lock();
+phys_pc = get_page_addr_code_hostp(cpu->env_ptr, pc,
+   true, _pc);
 if (phys_pc == -1) {
 tb = NULL;
 } else {
 tb = tb_lookup(cpu, phys_pc, pc, cs_base, flags, cflags);
 }
 if (tb == NULL) {
-mmap_lock();
-tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
-mmap_unlock();
+tb = tb_gen_code(cpu, phys_pc, host_pc, pc,
+ cs_base, flags, cflags);
 /*
  * We add the TB in the virtual pc hash table
  * for the fast lookup
  */
 qatomic_set(>tb_jmp_cache[tb_jmp_cache_hash_func(pc)], 
tb);
 }
+mmap_unlock();
 
 #ifndef CONFIG_USER_ONLY
 /*
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 65a23f47d6..86e7644c1b 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1326,12 +1326,13 @@ tb_link_page(TranslationBlock *tb, tb_page_addr_t 
phys_pc,
 
 /* Called with mmap_lock held for user mode emulation.  */
 TranslationBlock *tb_gen_code(CPUState *cpu,
+  tb_page_addr_t phys_pc, void *host_pc,
   target_ulong pc, target_ulong cs_base,
   uint32_t flags, int cflags)
 {
 CPUArchState *env = cpu->env_ptr;
 TranslationBlock *tb, *existing_tb;
-tb_page_addr_t phys_pc, phys_page2;
+tb_page_addr_t phys_page2;
 target_ulong virt_page2;
 tcg_insn_unit *gen_code_buf;
 int gen_code_size, search_size, max_insns;
@@ -1343,8 +1344,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 assert_memory_lock();
 qemu_thread_jit_write();
 
-phys_pc = get_page_addr_code(env, pc);
-
 if (phys_pc == -1) {
 /* Generate a one-shot TB with 1 insn in it */
 cflags = (cflags & ~CF_COUNT_MASK) | CF_LAST_IO | 1;
-- 
2.34.1




[PATCH for-7.2 12/21] accel/tcg: Add nofault parameter to get_page_addr_code_hostp

2022-08-12 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 include/exec/exec-all.h | 10 +-
 accel/tcg/cputlb.c  |  8 
 accel/tcg/plugin-gen.c  |  4 ++--
 accel/tcg/user-exec.c   |  4 ++--
 4 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 9f35e3b7a9..7a6dc44d86 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -599,6 +599,8 @@ struct MemoryRegionSection *iotlb_to_section(CPUState *cpu,
  * get_page_addr_code_hostp()
  * @env: CPUArchState
  * @addr: guest virtual address of guest code
+ * @nofault: do not raise an exception
+ * @hostp: output for host pointer
  *
  * See get_page_addr_code() (full-system version) for documentation on the
  * return value.
@@ -607,10 +609,10 @@ struct MemoryRegionSection *iotlb_to_section(CPUState 
*cpu,
  * If the return value is -1, sets *@hostp to NULL. Otherwise, sets *@hostp
  * to the host address where @addr's content is kept.
  *
- * Note: this function can trigger an exception.
+ * Note: Unless @nofault, this function can trigger an exception.
  */
 tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, target_ulong addr,
-void **hostp);
+bool nofault, void **hostp);
 
 /**
  * get_page_addr_code()
@@ -620,13 +622,11 @@ tb_page_addr_t get_page_addr_code_hostp(CPUArchState 
*env, target_ulong addr,
  * If we cannot translate and execute from the entire RAM page, or if
  * the region is not backed by RAM, returns -1. Otherwise, returns the
  * ram_addr_t corresponding to the guest code at @addr.
- *
- * Note: this function can trigger an exception.
  */
 static inline tb_page_addr_t get_page_addr_code(CPUArchState *env,
 target_ulong addr)
 {
-return get_page_addr_code_hostp(env, addr, NULL);
+return get_page_addr_code_hostp(env, addr, true, NULL);
 }
 
 #if defined(CONFIG_USER_ONLY)
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 2dc2affa12..ae7b40dd51 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1644,16 +1644,16 @@ void *tlb_vaddr_to_host(CPUArchState *env, abi_ptr addr,
  * of RAM.  This will force us to execute by loading and translating
  * one insn at a time, without caching.
  *
- * NOTE: This function will trigger an exception if the page is
- * not executable.
+ * NOTE: Unless @nofault, this function will trigger an exception
+ * if the page is not executable.
  */
 tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, target_ulong addr,
-void **hostp)
+bool nofault, void **hostp)
 {
 void *p;
 
 (void)probe_access_internal(env, addr, 1, MMU_INST_FETCH,
-cpu_mmu_index(env, true), true, , 0);
+cpu_mmu_index(env, true), nofault, , 0);
 if (p == NULL) {
 return -1;
 }
diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index 3d0b101e34..8377c15383 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -872,7 +872,7 @@ bool plugin_gen_tb_start(CPUState *cpu, const 
TranslationBlock *tb, bool mem_onl
 
 ptb->vaddr = tb->pc;
 ptb->vaddr2 = -1;
-get_page_addr_code_hostp(cpu->env_ptr, tb->pc, >haddr1);
+get_page_addr_code_hostp(cpu->env_ptr, tb->pc, true, >haddr1);
 ptb->haddr2 = NULL;
 ptb->mem_only = mem_only;
 
@@ -902,7 +902,7 @@ void plugin_gen_insn_start(CPUState *cpu, const 
DisasContextBase *db)
 unlikely((db->pc_next & TARGET_PAGE_MASK) !=
  (db->pc_first & TARGET_PAGE_MASK))) {
 get_page_addr_code_hostp(cpu->env_ptr, db->pc_next,
- >haddr2);
+ true, >haddr2);
 ptb->vaddr2 = db->pc_next;
 }
 if (likely(ptb->vaddr2 == -1)) {
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index a20234fb02..1b3403a064 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -200,11 +200,11 @@ void *probe_access(CPUArchState *env, target_ulong addr, 
int size,
 }
 
 tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, target_ulong addr,
-void **hostp)
+bool nofault, void **hostp)
 {
 int flags;
 
-flags = probe_access_internal(env, addr, 1, MMU_INST_FETCH, true, 0);
+flags = probe_access_internal(env, addr, 1, MMU_INST_FETCH, nofault, 0);
 if (unlikely(flags)) {
 return -1;
 }
-- 
2.34.1




[PATCH for-7.2 19/21] accel/tcg: Add pc and host_pc params to gen_intermediate_code

2022-08-12 Thread Richard Henderson
Pass these along to translator_loop -- pc may be used instead
of tb->pc, and host_pc is currently unused.  Adjust all targets
at one time.

Signed-off-by: Richard Henderson 
---
 include/exec/exec-all.h   |  1 -
 include/exec/translator.h | 24 
 accel/tcg/translate-all.c |  3 ++-
 accel/tcg/translator.c|  9 +
 target/alpha/translate.c  |  5 +++--
 target/arm/translate.c|  5 +++--
 target/avr/translate.c|  5 +++--
 target/cris/translate.c   |  5 +++--
 target/hexagon/translate.c|  6 --
 target/hppa/translate.c   |  5 +++--
 target/i386/tcg/translate.c   |  5 +++--
 target/loongarch/translate.c  |  6 --
 target/m68k/translate.c   |  5 +++--
 target/microblaze/translate.c |  5 +++--
 target/mips/tcg/translate.c   |  5 +++--
 target/nios2/translate.c  |  5 +++--
 target/openrisc/translate.c   |  6 --
 target/ppc/translate.c|  5 +++--
 target/riscv/translate.c  |  5 +++--
 target/rx/translate.c |  5 +++--
 target/s390x/tcg/translate.c  |  5 +++--
 target/sh4/translate.c|  5 +++--
 target/sparc/translate.c  |  5 +++--
 target/tricore/translate.c|  6 --
 target/xtensa/translate.c |  6 --
 25 files changed, 95 insertions(+), 52 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 7a6dc44d86..4ad166966b 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -39,7 +39,6 @@ typedef ram_addr_t tb_page_addr_t;
 #define TB_PAGE_ADDR_FMT RAM_ADDR_FMT
 #endif
 
-void gen_intermediate_code(CPUState *cpu, TranslationBlock *tb, int max_insns);
 void restore_state_to_opc(CPUArchState *env, TranslationBlock *tb,
   target_ulong *data);
 
diff --git a/include/exec/translator.h b/include/exec/translator.h
index 45b9268ca4..69db0f5c21 100644
--- a/include/exec/translator.h
+++ b/include/exec/translator.h
@@ -26,6 +26,19 @@
 #include "exec/translate-all.h"
 #include "tcg/tcg.h"
 
+/**
+ * gen_intermediate_code
+ * @cpu: cpu context
+ * @tb: translation block
+ * @max_insns: max number of instructions to translate
+ * @pc: guest virtual program counter address
+ * @host_pc: host physical program counter address
+ *
+ * This function must be provided by the target, which should create
+ * the target-specific DisasContext, and then invoke translator_loop.
+ */
+void gen_intermediate_code(CPUState *cpu, TranslationBlock *tb, int max_insns,
+   target_ulong pc, void *host_pc);
 
 /**
  * DisasJumpType:
@@ -123,11 +136,13 @@ typedef struct TranslatorOps {
 
 /**
  * translator_loop:
- * @ops: Target-specific operations.
- * @db: Disassembly context.
  * @cpu: Target vCPU.
  * @tb: Translation block.
  * @max_insns: Maximum number of insns to translate.
+ * @pc: guest virtual program counter address
+ * @host_pc: host physical program counter address
+ * @ops: Target-specific operations.
+ * @db: Disassembly context.
  *
  * Generic translator loop.
  *
@@ -141,8 +156,9 @@ typedef struct TranslatorOps {
  * - When single-stepping is enabled (system-wide or on the current vCPU).
  * - When too many instructions have been translated.
  */
-void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
- CPUState *cpu, TranslationBlock *tb, int max_insns);
+void translator_loop(CPUState *cpu, TranslationBlock *tb, int max_insns,
+ target_ulong pc, void *host_pc,
+ const TranslatorOps *ops, DisasContextBase *db);
 
 void translator_loop_temp_check(DisasContextBase *db);
 
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 86e7644c1b..d52097ab2d 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -46,6 +46,7 @@
 
 #include "exec/cputlb.h"
 #include "exec/translate-all.h"
+#include "exec/translator.h"
 #include "qemu/bitmap.h"
 #include "qemu/qemu-print.h"
 #include "qemu/timer.h"
@@ -1390,7 +1391,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 tcg_func_start(tcg_ctx);
 
 tcg_ctx->cpu = env_cpu(env);
-gen_intermediate_code(cpu, tb, max_insns);
+gen_intermediate_code(cpu, tb, max_insns, pc, host_pc);
 assert(tb->size != 0);
 tcg_ctx->cpu = NULL;
 max_insns = tb->icount;
diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index fe7af9b943..3eef30d93a 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -51,16 +51,17 @@ static inline void translator_page_protect(DisasContextBase 
*dcbase,
 #endif
 }
 
-void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
- CPUState *cpu, TranslationBlock *tb, int max_insns)
+void translator_loop(CPUState *cpu, TranslationBlock *tb, int max_insns,
+ target_ulong pc, void *host_pc,
+ const TranslatorOps *ops, DisasContextBase *db)
 {
 uint32_t cflags = tb_cflags(tb);
 bool plugin_enabled;
 
 /* Initialize DisasContext */
 

[PATCH for-7.2 13/21] accel/tcg: Unlock mmap_lock after longjmp

2022-08-12 Thread Richard Henderson
The mmap_lock is held around tb_gen_code.  While the comment
is correct that the lock is dropped when tb_gen_code runs out
of memory, the lock is *not* dropped when an exception is
raised reading code for translation.

Signed-off-by: Richard Henderson 
---
 accel/tcg/cpu-exec.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index f6c0c0aff6..a9b7053274 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -517,13 +517,11 @@ void cpu_exec_step_atomic(CPUState *cpu)
 cpu_tb_exec(cpu, tb, _exit);
 cpu_exec_exit(cpu);
 } else {
-/*
- * The mmap_lock is dropped by tb_gen_code if it runs out of
- * memory.
- */
 #ifndef CONFIG_SOFTMMU
 clear_helper_retaddr();
-tcg_debug_assert(!have_mmap_lock());
+if (have_mmap_lock()) {
+mmap_unlock();
+}
 #endif
 if (qemu_mutex_iothread_locked()) {
 qemu_mutex_unlock_iothread();
@@ -930,7 +928,9 @@ int cpu_exec(CPUState *cpu)
 
 #ifndef CONFIG_SOFTMMU
 clear_helper_retaddr();
-tcg_debug_assert(!have_mmap_lock());
+if (have_mmap_lock()) {
+mmap_unlock();
+}
 #endif
 if (qemu_mutex_iothread_locked()) {
 qemu_mutex_unlock_iothread();
-- 
2.34.1




[PATCH for-7.2 14/21] accel/tcg: Hoist get_page_addr_code out of tb_lookup

2022-08-12 Thread Richard Henderson
We will want to re-use the result of get_page_addr_code
beyond the scope of tb_lookup.

Signed-off-by: Richard Henderson 
---
 accel/tcg/cpu-exec.c | 34 --
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index a9b7053274..889355b341 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -209,13 +209,12 @@ static bool tb_lookup_cmp(const void *p, const void *d)
 }
 
 /* Might cause an exception, so have a longjmp destination ready */
-static TranslationBlock *tb_lookup(CPUState *cpu, target_ulong pc,
-   target_ulong cs_base,
+static TranslationBlock *tb_lookup(CPUState *cpu, tb_page_addr_t phys_pc,
+   target_ulong pc, target_ulong cs_base,
uint32_t flags, uint32_t cflags)
 {
 CPUArchState *env = cpu->env_ptr;
 TranslationBlock *tb;
-tb_page_addr_t phys_pc;
 struct tb_desc desc;
 uint32_t jmp_hash, tb_hash;
 
@@ -240,11 +239,8 @@ static TranslationBlock *tb_lookup(CPUState *cpu, 
target_ulong pc,
 desc.cflags = cflags;
 desc.trace_vcpu_dstate = *cpu->trace_dstate;
 desc.pc = pc;
-phys_pc = get_page_addr_code(desc.env, pc);
-if (phys_pc == -1) {
-return NULL;
-}
 desc.phys_page1 = phys_pc & TARGET_PAGE_MASK;
+
 tb_hash = tb_hash_func(phys_pc, pc, flags, cflags, *cpu->trace_dstate);
 tb = qht_lookup_custom(_ctx.htable, , tb_hash, tb_lookup_cmp);
 if (tb == NULL) {
@@ -371,6 +367,7 @@ const void *HELPER(lookup_tb_ptr)(CPUArchState *env)
 TranslationBlock *tb;
 target_ulong cs_base, pc;
 uint32_t flags, cflags;
+tb_page_addr_t phys_pc;
 
 cpu_get_tb_cpu_state(env, , _base, );
 
@@ -379,7 +376,12 @@ const void *HELPER(lookup_tb_ptr)(CPUArchState *env)
 cpu_loop_exit(cpu);
 }
 
-tb = tb_lookup(cpu, pc, cs_base, flags, cflags);
+phys_pc = get_page_addr_code(env, pc);
+if (phys_pc == -1) {
+return tcg_code_gen_epilogue;
+}
+
+tb = tb_lookup(cpu, phys_pc, pc, cs_base, flags, cflags);
 if (tb == NULL) {
 return tcg_code_gen_epilogue;
 }
@@ -482,6 +484,7 @@ void cpu_exec_step_atomic(CPUState *cpu)
 TranslationBlock *tb;
 target_ulong cs_base, pc;
 uint32_t flags, cflags;
+tb_page_addr_t phys_pc;
 int tb_exit;
 
 if (sigsetjmp(cpu->jmp_env, 0) == 0) {
@@ -504,7 +507,12 @@ void cpu_exec_step_atomic(CPUState *cpu)
  * Any breakpoint for this insn will have been recognized earlier.
  */
 
-tb = tb_lookup(cpu, pc, cs_base, flags, cflags);
+phys_pc = get_page_addr_code(env, pc);
+if (phys_pc == -1) {
+tb = NULL;
+} else {
+tb = tb_lookup(cpu, phys_pc, pc, cs_base, flags, cflags);
+}
 if (tb == NULL) {
 mmap_lock();
 tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
@@ -949,6 +957,7 @@ int cpu_exec(CPUState *cpu)
 TranslationBlock *tb;
 target_ulong cs_base, pc;
 uint32_t flags, cflags;
+tb_page_addr_t phys_pc;
 
 cpu_get_tb_cpu_state(cpu->env_ptr, , _base, );
 
@@ -970,7 +979,12 @@ int cpu_exec(CPUState *cpu)
 break;
 }
 
-tb = tb_lookup(cpu, pc, cs_base, flags, cflags);
+phys_pc = get_page_addr_code(cpu->env_ptr, pc);
+if (phys_pc == -1) {
+tb = NULL;
+} else {
+tb = tb_lookup(cpu, phys_pc, pc, cs_base, flags, cflags);
+}
 if (tb == NULL) {
 mmap_lock();
 tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
-- 
2.34.1




[PATCH for-7.2 11/21] accel/tcg: Use probe_access_internal for softmmu get_page_addr_code_hostp

2022-08-12 Thread Richard Henderson
Simplify the implementation of get_page_addr_code_hostp
by reusing the existing probe_access infrastructure.

Signed-off-by: Richard Henderson 
---
 accel/tcg/cputlb.c | 76 --
 1 file changed, 26 insertions(+), 50 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 80a3eb4f1c..2dc2affa12 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1482,56 +1482,6 @@ static bool victim_tlb_hit(CPUArchState *env, size_t 
mmu_idx, size_t index,
   victim_tlb_hit(env, mmu_idx, index, offsetof(CPUTLBEntry, TY), \
  (ADDR) & TARGET_PAGE_MASK)
 
-/*
- * Return a ram_addr_t for the virtual address for execution.
- *
- * Return -1 if we can't translate and execute from an entire page
- * of RAM.  This will force us to execute by loading and translating
- * one insn at a time, without caching.
- *
- * NOTE: This function will trigger an exception if the page is
- * not executable.
- */
-tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, target_ulong addr,
-void **hostp)
-{
-uintptr_t mmu_idx = cpu_mmu_index(env, true);
-uintptr_t index = tlb_index(env, mmu_idx, addr);
-CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
-void *p;
-
-if (unlikely(!tlb_hit(entry->addr_code, addr))) {
-if (!VICTIM_TLB_HIT(addr_code, addr)) {
-tlb_fill(env_cpu(env), addr, 0, MMU_INST_FETCH, mmu_idx, 0);
-index = tlb_index(env, mmu_idx, addr);
-entry = tlb_entry(env, mmu_idx, addr);
-
-if (unlikely(entry->addr_code & TLB_INVALID_MASK)) {
-/*
- * The MMU protection covers a smaller range than a target
- * page, so we must redo the MMU check for every insn.
- */
-return -1;
-}
-}
-assert(tlb_hit(entry->addr_code, addr));
-}
-
-if (unlikely(entry->addr_code & TLB_MMIO)) {
-/* The region is not backed by RAM.  */
-if (hostp) {
-*hostp = NULL;
-}
-return -1;
-}
-
-p = (void *)((uintptr_t)addr + entry->addend);
-if (hostp) {
-*hostp = p;
-}
-return qemu_ram_addr_from_host_nofail(p);
-}
-
 static void notdirty_write(CPUState *cpu, vaddr mem_vaddr, unsigned size,
CPUIOTLBEntry *iotlbentry, uintptr_t retaddr)
 {
@@ -1687,6 +1637,32 @@ void *tlb_vaddr_to_host(CPUArchState *env, abi_ptr addr,
 return flags ? NULL : host;
 }
 
+/*
+ * Return a ram_addr_t for the virtual address for execution.
+ *
+ * Return -1 if we can't translate and execute from an entire page
+ * of RAM.  This will force us to execute by loading and translating
+ * one insn at a time, without caching.
+ *
+ * NOTE: This function will trigger an exception if the page is
+ * not executable.
+ */
+tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, target_ulong addr,
+void **hostp)
+{
+void *p;
+
+(void)probe_access_internal(env, addr, 1, MMU_INST_FETCH,
+cpu_mmu_index(env, true), true, , 0);
+if (p == NULL) {
+return -1;
+}
+if (hostp) {
+*hostp = p;
+}
+return qemu_ram_addr_from_host_nofail(p);
+}
+
 #ifdef CONFIG_PLUGIN
 /*
  * Perform a TLB lookup and populate the qemu_plugin_hwaddr structure.
-- 
2.34.1




[PATCH for-7.2 09/21] accel/tcg: Move qemu_ram_addr_from_host_nofail to physmem.c

2022-08-12 Thread Richard Henderson
The base qemu_ram_addr_from_host function is already in
softmmu/physmem.c; move the nofail version to be adjacent.

Signed-off-by: Richard Henderson 
---
 include/exec/cpu-common.h |  1 +
 accel/tcg/cputlb.c| 12 
 softmmu/physmem.c | 12 
 3 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 2281be4e10..d909429427 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -72,6 +72,7 @@ typedef uintptr_t ram_addr_t;
 void qemu_ram_remap(ram_addr_t addr, ram_addr_t length);
 /* This should not be used by devices.  */
 ram_addr_t qemu_ram_addr_from_host(void *ptr);
+ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr);
 RAMBlock *qemu_ram_block_by_name(const char *name);
 RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
ram_addr_t *offset);
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index a46f3a654d..5db56bcd1e 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1283,18 +1283,6 @@ void tlb_set_page(CPUState *cpu, target_ulong vaddr,
 prot, mmu_idx, size);
 }
 
-static inline ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr)
-{
-ram_addr_t ram_addr;
-
-ram_addr = qemu_ram_addr_from_host(ptr);
-if (ram_addr == RAM_ADDR_INVALID) {
-error_report("Bad ram pointer %p", ptr);
-abort();
-}
-return ram_addr;
-}
-
 /*
  * Note: tlb_fill() can trigger a resize of the TLB. This means that all of the
  * caller's prior references to the TLB table (e.g. CPUTLBEntry pointers) must
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index dc3c3e5f2e..d4c30e99ea 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -2460,6 +2460,18 @@ ram_addr_t qemu_ram_addr_from_host(void *ptr)
 return block->offset + offset;
 }
 
+ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr)
+{
+ram_addr_t ram_addr;
+
+ram_addr = qemu_ram_addr_from_host(ptr);
+if (ram_addr == RAM_ADDR_INVALID) {
+error_report("Bad ram pointer %p", ptr);
+abort();
+}
+return ram_addr;
+}
+
 static MemTxResult flatview_read(FlatView *fv, hwaddr addr,
  MemTxAttrs attrs, void *buf, hwaddr len);
 static MemTxResult flatview_write(FlatView *fv, hwaddr addr, MemTxAttrs attrs,
-- 
2.34.1




[PATCH for-7.2 10/21] accel/tcg: Properly implement get_page_addr_code for user-only

2022-08-12 Thread Richard Henderson
The current implementation is a no-op, simply returning addr.
This is incorrect, because we ought to be checking the page
permissions for execution.

Make get_page_addr_code inline for both implementations.

Signed-off-by: Richard Henderson 
---
 include/exec/exec-all.h | 85 ++---
 accel/tcg/cputlb.c  |  5 ---
 accel/tcg/user-exec.c   | 15 
 3 files changed, 43 insertions(+), 62 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index e7e30d55b8..9f35e3b7a9 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -595,43 +595,44 @@ struct MemoryRegionSection *iotlb_to_section(CPUState 
*cpu,
  hwaddr index, MemTxAttrs attrs);
 #endif
 
-#if defined(CONFIG_USER_ONLY)
-void mmap_lock(void);
-void mmap_unlock(void);
-bool have_mmap_lock(void);
-
 /**
- * get_page_addr_code() - user-mode version
+ * get_page_addr_code_hostp()
  * @env: CPUArchState
  * @addr: guest virtual address of guest code
  *
- * Returns @addr.
+ * See get_page_addr_code() (full-system version) for documentation on the
+ * return value.
+ *
+ * Sets *@hostp (when @hostp is non-NULL) as follows.
+ * If the return value is -1, sets *@hostp to NULL. Otherwise, sets *@hostp
+ * to the host address where @addr's content is kept.
+ *
+ * Note: this function can trigger an exception.
+ */
+tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, target_ulong addr,
+void **hostp);
+
+/**
+ * get_page_addr_code()
+ * @env: CPUArchState
+ * @addr: guest virtual address of guest code
+ *
+ * If we cannot translate and execute from the entire RAM page, or if
+ * the region is not backed by RAM, returns -1. Otherwise, returns the
+ * ram_addr_t corresponding to the guest code at @addr.
+ *
+ * Note: this function can trigger an exception.
  */
 static inline tb_page_addr_t get_page_addr_code(CPUArchState *env,
 target_ulong addr)
 {
-return addr;
+return get_page_addr_code_hostp(env, addr, NULL);
 }
 
-/**
- * get_page_addr_code_hostp() - user-mode version
- * @env: CPUArchState
- * @addr: guest virtual address of guest code
- *
- * Returns @addr.
- *
- * If @hostp is non-NULL, sets *@hostp to the host address where @addr's 
content
- * is kept.
- */
-static inline tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env,
-  target_ulong addr,
-  void **hostp)
-{
-if (hostp) {
-*hostp = g2h_untagged(addr);
-}
-return addr;
-}
+#if defined(CONFIG_USER_ONLY)
+void mmap_lock(void);
+void mmap_unlock(void);
+bool have_mmap_lock(void);
 
 /**
  * adjust_signal_pc:
@@ -688,36 +689,6 @@ G_NORETURN void cpu_loop_exit_sigbus(CPUState *cpu, 
target_ulong addr,
 static inline void mmap_lock(void) {}
 static inline void mmap_unlock(void) {}
 
-/**
- * get_page_addr_code() - full-system version
- * @env: CPUArchState
- * @addr: guest virtual address of guest code
- *
- * If we cannot translate and execute from the entire RAM page, or if
- * the region is not backed by RAM, returns -1. Otherwise, returns the
- * ram_addr_t corresponding to the guest code at @addr.
- *
- * Note: this function can trigger an exception.
- */
-tb_page_addr_t get_page_addr_code(CPUArchState *env, target_ulong addr);
-
-/**
- * get_page_addr_code_hostp() - full-system version
- * @env: CPUArchState
- * @addr: guest virtual address of guest code
- *
- * See get_page_addr_code() (full-system version) for documentation on the
- * return value.
- *
- * Sets *@hostp (when @hostp is non-NULL) as follows.
- * If the return value is -1, sets *@hostp to NULL. Otherwise, sets *@hostp
- * to the host address where @addr's content is kept.
- *
- * Note: this function can trigger an exception.
- */
-tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, target_ulong addr,
-void **hostp);
-
 void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, ram_addr_t length);
 void tlb_set_dirty(CPUState *cpu, target_ulong vaddr);
 
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 5db56bcd1e..80a3eb4f1c 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1532,11 +1532,6 @@ tb_page_addr_t get_page_addr_code_hostp(CPUArchState 
*env, target_ulong addr,
 return qemu_ram_addr_from_host_nofail(p);
 }
 
-tb_page_addr_t get_page_addr_code(CPUArchState *env, target_ulong addr)
-{
-return get_page_addr_code_hostp(env, addr, NULL);
-}
-
 static void notdirty_write(CPUState *cpu, vaddr mem_vaddr, unsigned size,
CPUIOTLBEntry *iotlbentry, uintptr_t retaddr)
 {
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index 20ada5472b..a20234fb02 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -199,6 +199,21 @@ void *probe_access(CPUArchState *env, 

[PATCH for-7.2 06/21] accel/tcg: Remove PageDesc code_bitmap

2022-08-12 Thread Richard Henderson
This bitmap is created and discarded immediately.
We gain nothing by its existence.

Signed-off-by: Richard Henderson 
---
 accel/tcg/translate-all.c | 78 ++-
 1 file changed, 4 insertions(+), 74 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index ef62a199c7..cf99b2b876 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -101,21 +101,14 @@
 #define assert_memory_lock() tcg_debug_assert(have_mmap_lock())
 #endif
 
-#define SMC_BITMAP_USE_THRESHOLD 10
-
 typedef struct PageDesc {
 /* list of TBs intersecting this ram page */
 uintptr_t first_tb;
-#ifdef CONFIG_SOFTMMU
-/* in order to optimize self modifying code, we count the number
-   of lookups we do to a given page to use a bitmap */
-unsigned long *code_bitmap;
-unsigned int code_write_count;
-#else
+#ifdef CONFIG_USER_ONLY
 unsigned long flags;
 void *target_data;
 #endif
-#ifndef CONFIG_USER_ONLY
+#ifdef CONFIG_SOFTMMU
 QemuSpin lock;
 #endif
 } PageDesc;
@@ -906,17 +899,6 @@ void tb_htable_init(void)
 qht_init(_ctx.htable, tb_cmp, CODE_GEN_HTABLE_SIZE, mode);
 }
 
-/* call with @p->lock held */
-static inline void invalidate_page_bitmap(PageDesc *p)
-{
-assert_page_locked(p);
-#ifdef CONFIG_SOFTMMU
-g_free(p->code_bitmap);
-p->code_bitmap = NULL;
-p->code_write_count = 0;
-#endif
-}
-
 /* Set to NULL all the 'first_tb' fields in all PageDescs. */
 static void page_flush_tb_1(int level, void **lp)
 {
@@ -931,7 +913,6 @@ static void page_flush_tb_1(int level, void **lp)
 for (i = 0; i < V_L2_SIZE; ++i) {
 page_lock([i]);
 pd[i].first_tb = (uintptr_t)NULL;
-invalidate_page_bitmap(pd + i);
 page_unlock([i]);
 }
 } else {
@@ -1196,11 +1177,9 @@ static void do_tb_phys_invalidate(TranslationBlock *tb, 
bool rm_from_page_list)
 if (rm_from_page_list) {
 p = page_find(tb->page_addr[0] >> TARGET_PAGE_BITS);
 tb_page_remove(p, tb);
-invalidate_page_bitmap(p);
 if (tb->page_addr[1] != -1) {
 p = page_find(tb->page_addr[1] >> TARGET_PAGE_BITS);
 tb_page_remove(p, tb);
-invalidate_page_bitmap(p);
 }
 }
 
@@ -1245,35 +1224,6 @@ void tb_phys_invalidate(TranslationBlock *tb, 
tb_page_addr_t page_addr)
 }
 }
 
-#ifdef CONFIG_SOFTMMU
-/* call with @p->lock held */
-static void build_page_bitmap(PageDesc *p)
-{
-int n, tb_start, tb_end;
-TranslationBlock *tb;
-
-assert_page_locked(p);
-p->code_bitmap = bitmap_new(TARGET_PAGE_SIZE);
-
-PAGE_FOR_EACH_TB(p, tb, n) {
-/* NOTE: this is subtle as a TB may span two physical pages */
-if (n == 0) {
-/* NOTE: tb_end may be after the end of the page, but
-   it is not a problem */
-tb_start = tb->pc & ~TARGET_PAGE_MASK;
-tb_end = tb_start + tb->size;
-if (tb_end > TARGET_PAGE_SIZE) {
-tb_end = TARGET_PAGE_SIZE;
- }
-} else {
-tb_start = 0;
-tb_end = ((tb->pc + tb->size) & ~TARGET_PAGE_MASK);
-}
-bitmap_set(p->code_bitmap, tb_start, tb_end - tb_start);
-}
-}
-#endif
-
 /* add the tb in the target page and protect it if necessary
  *
  * Called with mmap_lock held for user-mode emulation.
@@ -1294,7 +1244,6 @@ static inline void tb_page_add(PageDesc *p, 
TranslationBlock *tb,
 page_already_protected = p->first_tb != (uintptr_t)NULL;
 #endif
 p->first_tb = (uintptr_t)tb | n;
-invalidate_page_bitmap(p);
 
 #if defined(CONFIG_USER_ONLY)
 /* translator_loop() must have made all TB pages non-writable */
@@ -1356,10 +1305,8 @@ tb_link_page(TranslationBlock *tb, tb_page_addr_t 
phys_pc,
 /* remove TB from the page(s) if we couldn't insert it */
 if (unlikely(existing_tb)) {
 tb_page_remove(p, tb);
-invalidate_page_bitmap(p);
 if (p2) {
 tb_page_remove(p2, tb);
-invalidate_page_bitmap(p2);
 }
 tb = existing_tb;
 }
@@ -1736,7 +1683,6 @@ tb_invalidate_phys_page_range__locked(struct 
page_collection *pages,
 #if !defined(CONFIG_USER_ONLY)
 /* if no code remaining, no need to continue to use slow writes */
 if (!p->first_tb) {
-invalidate_page_bitmap(p);
 tlb_unprotect_code(start);
 }
 #endif
@@ -1832,24 +1778,8 @@ void tb_invalidate_phys_page_fast(struct page_collection 
*pages,
 }
 
 assert_page_locked(p);
-if (!p->code_bitmap &&
-++p->code_write_count >= SMC_BITMAP_USE_THRESHOLD) {
-build_page_bitmap(p);
-}
-if (p->code_bitmap) {
-unsigned int nr;
-unsigned long b;
-
-nr = start & ~TARGET_PAGE_MASK;
-b = p->code_bitmap[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG - 1));
-if (b & ((1 << len) - 1)) {
-goto do_invalidate;
-}
-} else {
-do_invalidate:
-   

[PATCH for-7.2 07/21] accel/tcg: Use bool for page_find_alloc

2022-08-12 Thread Richard Henderson
Bool is more appropriate type for the alloc parameter.

Signed-off-by: Richard Henderson 
---
 accel/tcg/translate-all.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index cf99b2b876..65a23f47d6 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -464,7 +464,7 @@ void page_init(void)
 #endif
 }
 
-static PageDesc *page_find_alloc(tb_page_addr_t index, int alloc)
+static PageDesc *page_find_alloc(tb_page_addr_t index, bool alloc)
 {
 PageDesc *pd;
 void **lp;
@@ -532,11 +532,11 @@ static PageDesc *page_find_alloc(tb_page_addr_t index, 
int alloc)
 
 static inline PageDesc *page_find(tb_page_addr_t index)
 {
-return page_find_alloc(index, 0);
+return page_find_alloc(index, false);
 }
 
 static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
-   PageDesc **ret_p2, tb_page_addr_t phys2, int alloc);
+   PageDesc **ret_p2, tb_page_addr_t phys2, bool 
alloc);
 
 /* In user-mode page locks aren't used; mmap_lock is enough */
 #ifdef CONFIG_USER_ONLY
@@ -650,7 +650,7 @@ static inline void page_unlock(PageDesc *pd)
 /* lock the page(s) of a TB in the correct acquisition order */
 static inline void page_lock_tb(const TranslationBlock *tb)
 {
-page_lock_pair(NULL, tb->page_addr[0], NULL, tb->page_addr[1], 0);
+page_lock_pair(NULL, tb->page_addr[0], NULL, tb->page_addr[1], false);
 }
 
 static inline void page_unlock_tb(const TranslationBlock *tb)
@@ -839,7 +839,7 @@ void page_collection_unlock(struct page_collection *set)
 #endif /* !CONFIG_USER_ONLY */
 
 static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
-   PageDesc **ret_p2, tb_page_addr_t phys2, int alloc)
+   PageDesc **ret_p2, tb_page_addr_t phys2, bool alloc)
 {
 PageDesc *p1, *p2;
 tb_page_addr_t page1;
@@ -1289,7 +1289,7 @@ tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
  * Note that inserting into the hash table first isn't an option, since
  * we can only insert TBs that are fully initialized.
  */
-page_lock_pair(, phys_pc, , phys_page2, 1);
+page_lock_pair(, phys_pc, , phys_page2, true);
 tb_page_add(p, tb, 0, phys_pc & TARGET_PAGE_MASK);
 if (p2) {
 tb_page_add(p2, tb, 1, phys_page2);
@@ -2224,7 +2224,7 @@ void page_set_flags(target_ulong start, target_ulong end, 
int flags)
 for (addr = start, len = end - start;
  len != 0;
  len -= TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
-PageDesc *p = page_find_alloc(addr >> TARGET_PAGE_BITS, 1);
+PageDesc *p = page_find_alloc(addr >> TARGET_PAGE_BITS, true);
 
 /* If the write protection bit is set, then we invalidate
the code inside.  */
-- 
2.34.1




[PATCH for-7.2 08/21] accel/tcg: Merge tb_htable_lookup into caller

2022-08-12 Thread Richard Henderson
This function is used only once, so merge it into
its only caller, tb_lookup.  This requires moving
the support routine, tb_lookup_cmp, and its private
data structure, tb_desc, up in the file.

Signed-off-by: Richard Henderson 
---
 include/exec/exec-all.h |   3 -
 accel/tcg/cpu-exec.c| 134 +++-
 2 files changed, 64 insertions(+), 73 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 311e5fb422..e7e30d55b8 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -552,9 +552,6 @@ void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr, 
MemTxAttrs attrs);
 #endif
 void tb_flush(CPUState *cpu);
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
-TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
-   target_ulong cs_base, uint32_t flags,
-   uint32_t cflags);
 void tb_set_jmp_target(TranslationBlock *tb, int n, uintptr_t addr);
 
 /* GETPC is the true target of the return instruction that we'll execute.  */
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index a565a3f8ec..f6c0c0aff6 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -170,19 +170,60 @@ uint32_t curr_cflags(CPUState *cpu)
 return cflags;
 }
 
-/* Might cause an exception, so have a longjmp destination ready */
-static inline TranslationBlock *tb_lookup(CPUState *cpu, target_ulong pc,
-  target_ulong cs_base,
-  uint32_t flags, uint32_t cflags)
+struct tb_desc {
+target_ulong pc;
+target_ulong cs_base;
+CPUArchState *env;
+tb_page_addr_t phys_page1;
+uint32_t flags;
+uint32_t cflags;
+uint32_t trace_vcpu_dstate;
+};
+
+static bool tb_lookup_cmp(const void *p, const void *d)
 {
+const TranslationBlock *tb = p;
+const struct tb_desc *desc = d;
+
+if (tb->pc == desc->pc &&
+tb->page_addr[0] == desc->phys_page1 &&
+tb->cs_base == desc->cs_base &&
+tb->flags == desc->flags &&
+tb->trace_vcpu_dstate == desc->trace_vcpu_dstate &&
+tb_cflags(tb) == desc->cflags) {
+/* check next page if needed */
+if (tb->page_addr[1] == -1) {
+return true;
+} else {
+tb_page_addr_t phys_page2;
+target_ulong virt_page2;
+
+virt_page2 = (desc->pc & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
+phys_page2 = get_page_addr_code(desc->env, virt_page2);
+if (tb->page_addr[1] == phys_page2) {
+return true;
+}
+}
+}
+return false;
+}
+
+/* Might cause an exception, so have a longjmp destination ready */
+static TranslationBlock *tb_lookup(CPUState *cpu, target_ulong pc,
+   target_ulong cs_base,
+   uint32_t flags, uint32_t cflags)
+{
+CPUArchState *env = cpu->env_ptr;
 TranslationBlock *tb;
-uint32_t hash;
+tb_page_addr_t phys_pc;
+struct tb_desc desc;
+uint32_t jmp_hash, tb_hash;
 
 /* we should never be trying to look up an INVALID tb */
 tcg_debug_assert(!(cflags & CF_INVALID));
 
-hash = tb_jmp_cache_hash_func(pc);
-tb = qatomic_rcu_read(>tb_jmp_cache[hash]);
+jmp_hash = tb_jmp_cache_hash_func(pc);
+tb = qatomic_rcu_read(>tb_jmp_cache[jmp_hash]);
 
 if (likely(tb &&
tb->pc == pc &&
@@ -192,11 +233,25 @@ static inline TranslationBlock *tb_lookup(CPUState *cpu, 
target_ulong pc,
tb_cflags(tb) == cflags)) {
 return tb;
 }
-tb = tb_htable_lookup(cpu, pc, cs_base, flags, cflags);
+
+desc.env = env;
+desc.cs_base = cs_base;
+desc.flags = flags;
+desc.cflags = cflags;
+desc.trace_vcpu_dstate = *cpu->trace_dstate;
+desc.pc = pc;
+phys_pc = get_page_addr_code(desc.env, pc);
+if (phys_pc == -1) {
+return NULL;
+}
+desc.phys_page1 = phys_pc & TARGET_PAGE_MASK;
+tb_hash = tb_hash_func(phys_pc, pc, flags, cflags, *cpu->trace_dstate);
+tb = qht_lookup_custom(_ctx.htable, , tb_hash, tb_lookup_cmp);
 if (tb == NULL) {
 return NULL;
 }
-qatomic_set(>tb_jmp_cache[hash], tb);
+
+qatomic_set(>tb_jmp_cache[jmp_hash], tb);
 return tb;
 }
 
@@ -487,67 +542,6 @@ void cpu_exec_step_atomic(CPUState *cpu)
 end_exclusive();
 }
 
-struct tb_desc {
-target_ulong pc;
-target_ulong cs_base;
-CPUArchState *env;
-tb_page_addr_t phys_page1;
-uint32_t flags;
-uint32_t cflags;
-uint32_t trace_vcpu_dstate;
-};
-
-static bool tb_lookup_cmp(const void *p, const void *d)
-{
-const TranslationBlock *tb = p;
-const struct tb_desc *desc = d;
-
-if (tb->pc == desc->pc &&
-tb->page_addr[0] == desc->phys_page1 &&
-tb->cs_base == desc->cs_base &&
-tb->flags == desc->flags &&
-tb->trace_vcpu_dstate == 

[PATCH for-7.2 03/21] linux-user/x86_64: Allocate vsyscall page as a commpage

2022-08-12 Thread Richard Henderson
We're about to start validating PAGE_EXEC, which means that we've
got to the vsyscall page executable.  We had been special casing
this entirely within translate.

Signed-off-by: Richard Henderson 
---
 linux-user/elfload.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 29d910c4cc..e315155dad 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -195,6 +195,27 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs, 
const CPUX86State *en
 (*regs)[26] = tswapreg(env->segs[R_GS].selector & 0x);
 }
 
+#define HI_COMMPAGE  TARGET_VSYSCALL_PAGE
+
+static bool init_guest_commpage(void)
+{
+/*
+ * The vsyscall page is at a high negative address aka kernel space,
+ * which means that we cannot actually allocate it with target_mmap.
+ * We still should be able to use page_set_flags, unless the user
+ * has specified -R reserved_va, which would trigger an assert().
+ */
+if (reserved_va != 0 &&
+TARGET_VSYSCALL_PAGE + TARGET_PAGE_SIZE >= reserved_va) {
+error_report("Cannot allocate vsyscall page");
+exit(EXIT_FAILURE);
+}
+page_set_flags(TARGET_VSYSCALL_PAGE,
+   TARGET_VSYSCALL_PAGE + TARGET_PAGE_SIZE,
+   PAGE_EXEC | PAGE_VALID);
+return true;
+}
+
 #else
 
 #define ELF_START_MMAP 0x8000
-- 
2.34.1




[PATCH for-7.2 02/21] linux-user/hppa: Allocate page zero as a commpage

2022-08-12 Thread Richard Henderson
We're about to start validating PAGE_EXEC, which means that we've
got to mark page zero executable.  We had been special casing this
entirely within translate.

Signed-off-by: Richard Henderson 
---
 linux-user/elfload.c | 34 +++---
 1 file changed, 31 insertions(+), 3 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 3e3dc02499..29d910c4cc 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -1646,6 +1646,34 @@ static inline void init_thread(struct target_pt_regs 
*regs,
 regs->gr[31] = infop->entry;
 }
 
+#define LO_COMMPAGE  0
+
+static bool init_guest_commpage(void)
+{
+void *want = g2h_untagged(LO_COMMPAGE);
+void *addr = mmap(want, qemu_host_page_size, PROT_NONE,
+  MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0);
+
+if (addr == MAP_FAILED) {
+perror("Allocating guest commpage");
+exit(EXIT_FAILURE);
+}
+if (addr != want) {
+return false;
+}
+
+/*
+ * On Linux, page zero is normally marked execute only + gateway.
+ * Normal read or write is supposed to fail (thus PROT_NONE above),
+ * but specific offsets have kernel code mapped to raise permissions
+ * and implement syscalls.  Here, simply mark the page executable.
+ * Special case the entry points during translation (see do_page_zero).
+ */
+page_set_flags(LO_COMMPAGE, LO_COMMPAGE + TARGET_PAGE_SIZE,
+   PAGE_EXEC | PAGE_VALID);
+return true;
+}
+
 #endif /* TARGET_HPPA */
 
 #ifdef TARGET_XTENSA
@@ -2326,12 +2354,12 @@ static abi_ulong create_elf_tables(abi_ulong p, int 
argc, int envc,
 }
 
 #if defined(HI_COMMPAGE)
-#define LO_COMMPAGE 0
+#define LO_COMMPAGE -1
 #elif defined(LO_COMMPAGE)
 #define HI_COMMPAGE 0
 #else
 #define HI_COMMPAGE 0
-#define LO_COMMPAGE 0
+#define LO_COMMPAGE -1
 #define init_guest_commpage() true
 #endif
 
@@ -2555,7 +2583,7 @@ static void pgb_static(const char *image_name, abi_ulong 
orig_loaddr,
 } else {
 offset = -(HI_COMMPAGE & -align);
 }
-} else if (LO_COMMPAGE != 0) {
+} else if (LO_COMMPAGE != -1) {
 loaddr = MIN(loaddr, LO_COMMPAGE & -align);
 }
 
-- 
2.34.1




[PATCH for-7.2 04/21] linux-user: Honor PT_GNU_STACK

2022-08-12 Thread Richard Henderson
Map the stack executable if required by default or on demand.

Signed-off-by: Richard Henderson 
---
 include/elf.h|  1 +
 linux-user/qemu.h|  1 +
 linux-user/elfload.c | 19 ++-
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/include/elf.h b/include/elf.h
index 3a4bcb646a..3d6b9062c0 100644
--- a/include/elf.h
+++ b/include/elf.h
@@ -31,6 +31,7 @@ typedef int64_t  Elf64_Sxword;
 #define PT_LOPROC  0x7000
 #define PT_HIPROC  0x7fff
 
+#define PT_GNU_STACK  (PT_LOOS + 0x474e551)
 #define PT_GNU_PROPERTY   (PT_LOOS + 0x474e553)
 
 #define PT_MIPS_REGINFO   0x7000
diff --git a/linux-user/qemu.h b/linux-user/qemu.h
index 7d90de1b15..e2e93fbd1d 100644
--- a/linux-user/qemu.h
+++ b/linux-user/qemu.h
@@ -48,6 +48,7 @@ struct image_info {
 uint32_telf_flags;
 int personality;
 abi_ulong   alignment;
+boolexec_stack;
 
 /* Generic semihosting knows about these pointers. */
 abi_ulong   arg_strings;   /* strings for argv */
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index e315155dad..b1169ca6df 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -232,6 +232,7 @@ static bool init_guest_commpage(void)
 #define ELF_ARCHEM_386
 
 #define ELF_PLATFORM get_elf_platform()
+#define EXSTACK_DEFAULT true
 
 static const char *get_elf_platform(void)
 {
@@ -308,6 +309,7 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs, 
const CPUX86State *en
 
 #define ELF_ARCHEM_ARM
 #define ELF_CLASS   ELFCLASS32
+#define EXSTACK_DEFAULT true
 
 static inline void init_thread(struct target_pt_regs *regs,
struct image_info *infop)
@@ -776,6 +778,7 @@ static inline void init_thread(struct target_pt_regs *regs,
 #else
 
 #define ELF_CLASS   ELFCLASS32
+#define EXSTACK_DEFAULT true
 
 #endif
 
@@ -973,6 +976,7 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs, 
const CPUPPCState *en
 
 #define ELF_CLASS   ELFCLASS64
 #define ELF_ARCHEM_LOONGARCH
+#define EXSTACK_DEFAULT true
 
 #define elf_check_arch(x) ((x) == EM_LOONGARCH)
 
@@ -1068,6 +1072,7 @@ static uint32_t get_elf_hwcap(void)
 #define ELF_CLASS   ELFCLASS32
 #endif
 #define ELF_ARCHEM_MIPS
+#define EXSTACK_DEFAULT true
 
 #ifdef TARGET_ABI_MIPSN32
 #define elf_check_abi(x) ((x) & EF_MIPS_ABI2)
@@ -1806,6 +1811,10 @@ static inline void init_thread(struct target_pt_regs 
*regs,
 #define bswaptls(ptr) bswap32s(ptr)
 #endif
 
+#ifndef EXSTACK_DEFAULT
+#define EXSTACK_DEFAULT false
+#endif
+
 #include "elf.h"
 
 /* We must delay the following stanzas until after "elf.h". */
@@ -2081,6 +2090,7 @@ static abi_ulong setup_arg_pages(struct linux_binprm 
*bprm,
  struct image_info *info)
 {
 abi_ulong size, error, guard;
+int prot;
 
 size = guest_stack_size;
 if (size < STACK_LOWER_LIMIT) {
@@ -2091,7 +2101,11 @@ static abi_ulong setup_arg_pages(struct linux_binprm 
*bprm,
 guard = qemu_real_host_page_size();
 }
 
-error = target_mmap(0, size + guard, PROT_READ | PROT_WRITE,
+prot = PROT_READ | PROT_WRITE;
+if (info->exec_stack) {
+prot |= PROT_EXEC;
+}
+error = target_mmap(0, size + guard, prot,
 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
 if (error == -1) {
 perror("mmap stack");
@@ -2919,6 +2933,7 @@ static void load_elf_image(const char *image_name, int 
image_fd,
  */
 loaddr = -1, hiaddr = 0;
 info->alignment = 0;
+info->exec_stack = EXSTACK_DEFAULT;
 for (i = 0; i < ehdr->e_phnum; ++i) {
 struct elf_phdr *eppnt = phdr + i;
 if (eppnt->p_type == PT_LOAD) {
@@ -2961,6 +2976,8 @@ static void load_elf_image(const char *image_name, int 
image_fd,
 if (!parse_elf_properties(image_fd, info, eppnt, bprm_buf, )) {
 goto exit_errmsg;
 }
+} else if (eppnt->p_type == PT_GNU_STACK) {
+info->exec_stack = eppnt->p_flags & PF_X;
 }
 }
 
-- 
2.34.1




[PATCH for-7.2 18/21] accel/tcg: Remove translator_ldsw

2022-08-12 Thread Richard Henderson
The only user can easily use translator_lduw and
adjust the type to signed during the return.

Signed-off-by: Richard Henderson 
---
 include/exec/translator.h   | 1 -
 target/i386/tcg/translate.c | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/exec/translator.h b/include/exec/translator.h
index 0d0bf3a31e..45b9268ca4 100644
--- a/include/exec/translator.h
+++ b/include/exec/translator.h
@@ -178,7 +178,6 @@ bool translator_use_goto_tb(DisasContextBase *db, 
target_ulong dest);
 
 #define FOR_EACH_TRANSLATOR_LD(F)   \
 F(translator_ldub, uint8_t, cpu_ldub_code, /* no swap */)   \
-F(translator_ldsw, int16_t, cpu_ldsw_code, bswap16) \
 F(translator_lduw, uint16_t, cpu_lduw_code, bswap16)\
 F(translator_ldl, uint32_t, cpu_ldl_code, bswap32)  \
 F(translator_ldq, uint64_t, cpu_ldq_code, bswap64)
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index b7972f0ff5..a23417d058 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -2033,7 +2033,7 @@ static inline uint8_t x86_ldub_code(CPUX86State *env, 
DisasContext *s)
 
 static inline int16_t x86_ldsw_code(CPUX86State *env, DisasContext *s)
 {
-return translator_ldsw(env, >base, advance_pc(env, s, 2));
+return translator_lduw(env, >base, advance_pc(env, s, 2));
 }
 
 static inline uint16_t x86_lduw_code(CPUX86State *env, DisasContext *s)
-- 
2.34.1




[PATCH for-7.2 01/21] linux-user/arm: Mark the commpage executable

2022-08-12 Thread Richard Henderson
We're about to start validating PAGE_EXEC, which means
that we've got to mark the commpage executable.  We had
been placing the commpage outside of reserved_va, which
was incorrect and lead to an abort.

Signed-off-by: Richard Henderson 
---
 linux-user/arm/target_cpu.h | 4 ++--
 linux-user/elfload.c| 6 +-
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/linux-user/arm/target_cpu.h b/linux-user/arm/target_cpu.h
index 709d19bc9e..89ba274cfc 100644
--- a/linux-user/arm/target_cpu.h
+++ b/linux-user/arm/target_cpu.h
@@ -34,9 +34,9 @@ static inline unsigned long arm_max_reserved_va(CPUState *cs)
 } else {
 /*
  * We need to be able to map the commpage.
- * See validate_guest_space in linux-user/elfload.c.
+ * See init_guest_commpage in linux-user/elfload.c.
  */
-return 0xul;
+return 0xul;
 }
 }
 #define MAX_RESERVED_VA  arm_max_reserved_va
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index ce902dbd56..3e3dc02499 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -398,7 +398,8 @@ enum {
 
 static bool init_guest_commpage(void)
 {
-void *want = g2h_untagged(HI_COMMPAGE & -qemu_host_page_size);
+abi_ptr commpage = HI_COMMPAGE & -qemu_host_page_size;
+void *want = g2h_untagged(commpage);
 void *addr = mmap(want, qemu_host_page_size, PROT_READ | PROT_WRITE,
   MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0);
 
@@ -417,6 +418,9 @@ static bool init_guest_commpage(void)
 perror("Protecting guest commpage");
 exit(EXIT_FAILURE);
 }
+
+page_set_flags(commpage, commpage + qemu_host_page_size,
+   PAGE_READ | PAGE_EXEC | PAGE_VALID);
 return true;
 }
 
-- 
2.34.1




[PATCH for-7.2 05/21] tests/tcg/i386: Move smc_code2 to an executable section

2022-08-12 Thread Richard Henderson
We're about to start validating PAGE_EXEC, which means
that we've got to put this code into a section that is
both writable and executable.

Note that this test did not run on hardware beforehand either.

Signed-off-by: Richard Henderson 
---
 tests/tcg/i386/test-i386.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/tcg/i386/test-i386.c b/tests/tcg/i386/test-i386.c
index ac8d5a3c1f..e6b308a2c0 100644
--- a/tests/tcg/i386/test-i386.c
+++ b/tests/tcg/i386/test-i386.c
@@ -1998,7 +1998,7 @@ uint8_t code[] = {
 0xc3, /* ret */
 };
 
-asm(".section \".data\"\n"
+asm(".section \".data_x\",\"awx\"\n"
 "smc_code2:\n"
 "movl 4(%esp), %eax\n"
 "movl %eax, smc_patch_addr2 + 1\n"
-- 
2.34.1




[PATCH for-7.2 00/21] accel/tcg: minimize tlb lookups during translate + user-only PROT_EXEC fixes

2022-08-12 Thread Richard Henderson
This is part of a larger body of work, but in the process of
reorganizing I was reminded that PROT_EXEC wasn't being enforced
properly for user-only.  As this has come up in the context of
some of Ilya's patches, I thought I'd go ahead and post this part.


r~


Ilya Leoshkevich (1):
  accel/tcg: Introduce is_same_page()

Richard Henderson (20):
  linux-user/arm: Mark the commpage executable
  linux-user/hppa: Allocate page zero as a commpage
  linux-user/x86_64: Allocate vsyscall page as a commpage
  linux-user: Honor PT_GNU_STACK
  tests/tcg/i386: Move smc_code2 to an executable section
  accel/tcg: Remove PageDesc code_bitmap
  accel/tcg: Use bool for page_find_alloc
  accel/tcg: Merge tb_htable_lookup into caller
  accel/tcg: Move qemu_ram_addr_from_host_nofail to physmem.c
  accel/tcg: Properly implement get_page_addr_code for user-only
  accel/tcg: Use probe_access_internal for softmmu
get_page_addr_code_hostp
  accel/tcg: Add nofault parameter to get_page_addr_code_hostp
  accel/tcg: Unlock mmap_lock after longjmp
  accel/tcg: Hoist get_page_addr_code out of tb_lookup
  accel/tcg: Hoist get_page_addr_code out of tb_gen_code
  accel/tcg: Raise PROT_EXEC exception early
  accel/tcg: Remove translator_ldsw
  accel/tcg: Add pc and host_pc params to gen_intermediate_code
  accel/tcg: Add fast path for translator_ld*
  accel/tcg: Use DisasContextBase in plugin_gen_tb_start

 accel/tcg/internal.h  |   7 +-
 include/elf.h |   1 +
 include/exec/cpu-common.h |   1 +
 include/exec/exec-all.h   |  87 +---
 include/exec/plugin-gen.h |   7 +-
 include/exec/translator.h |  85 
 linux-user/arm/target_cpu.h   |   4 +-
 linux-user/qemu.h |   1 +
 accel/tcg/cpu-exec.c  | 184 ++
 accel/tcg/cputlb.c|  93 +
 accel/tcg/plugin-gen.c|  23 +++--
 accel/tcg/translate-all.c | 120 --
 accel/tcg/translator.c| 122 +-
 accel/tcg/user-exec.c |  15 +++
 linux-user/elfload.c  |  80 ++-
 softmmu/physmem.c |  12 +++
 target/alpha/translate.c  |   5 +-
 target/arm/translate.c|   5 +-
 target/avr/translate.c|   5 +-
 target/cris/translate.c   |   5 +-
 target/hexagon/translate.c|   6 +-
 target/hppa/translate.c   |   5 +-
 target/i386/tcg/translate.c   |   7 +-
 target/loongarch/translate.c  |   6 +-
 target/m68k/translate.c   |   5 +-
 target/microblaze/translate.c |   5 +-
 target/mips/tcg/translate.c   |   5 +-
 target/nios2/translate.c  |   5 +-
 target/openrisc/translate.c   |   6 +-
 target/ppc/translate.c|   5 +-
 target/riscv/translate.c  |   5 +-
 target/rx/translate.c |   5 +-
 target/s390x/tcg/translate.c  |   5 +-
 target/sh4/translate.c|   5 +-
 target/sparc/translate.c  |   5 +-
 target/tricore/translate.c|   6 +-
 target/xtensa/translate.c |   6 +-
 tests/tcg/i386/test-i386.c|   2 +-
 38 files changed, 532 insertions(+), 424 deletions(-)

-- 
2.34.1




Re: [PATCH v3 1/4] accel/tcg: Invalidate translations when clearing PAGE_EXEC

2022-08-12 Thread Richard Henderson

On 8/12/22 08:02, Ilya Leoshkevich wrote:

tb_lookup() skips get_page_addr_code() if tb is found in tb_jmp_cache.
I assume it's a bug?


Yes, I think so.  I've rearranged that for other reasons, and so may have inadvertently 
fix this.  I'll post the in-progress work in a moment.



r~



Re: [PATCH 00/62] target/arm: Implement FEAT_HAFDBS

2022-08-12 Thread Richard Henderson

On 8/12/22 09:31, Peter Maydell wrote:

Is it possible to rearrange this patchset so the easy
refactoring patches that do "use a struct to return
values from get_phys_addr and friends" are at the front
(ie before the stuff that touches core code) ?
That way they're easy to take into the tree early while
the rest of the series is still under review...


Yes, I think so.


r~



Re: [PULL 0/1] Linux user for 7.1 patches

2022-08-12 Thread Richard Henderson

On 8/12/22 03:33, Laurent Vivier wrote:

The following changes since commit a6b1c53e79d08a99a28cc3e67a3e1a7c34102d6b:

   Merge tag 'linux-user-for-7.1-pull-request' of 
https://gitlab.com/laurent_vivier/qemu into staging (2022-08-10 10:26:57 -0700)

are available in the Git repository at:

   https://gitlab.com/laurent_vivier/qemu.git 
tags/linux-user-for-7.1-pull-request

for you to fetch changes up to dbbf89751b14aa5d281bad3af273e9ffaae82262:

   linux-user/aarch64: Reset target data on MADV_DONTNEED (2022-08-11 11:34:17 
+0200)


Pull request linux-user 20220812


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/7.1 as 
appropriate.


r~






Vitaly Buka (1):
   linux-user/aarch64: Reset target data on MADV_DONTNEED

  accel/tcg/translate-all.c | 26 ++
  include/exec/cpu-all.h|  1 +
  linux-user/mmap.c |  3 +++
  3 files changed, 30 insertions(+)






[PATCH] target/arm: Rearrange cpu64.c so all the CPU initfns are together

2022-08-12 Thread Peter Maydell
cpu64.c has ended up in a slightly odd order -- it starts with the
initfns for most of the models-real-hardware CPUs; after that comes a
bunch of support code for SVE, SME, pauth and LPA2 properties.  Then
come the initfns for the 'host' and 'max' CPU types, and then after
that one more models-real-hardware CPU initfn, for a64fx.  (This
ordering is partly historical and partly required because a64fx needs
the SVE properties.)

Reorder the file into:
 * CPU property support functions
 * initfns for real hardware CPUs
 * initfns for host and max
 * class boilerplate

Signed-off-by: Peter Maydell 
---
I started off thinking this would be a relatively simple "move the
a64fx initfn up to live with the others", but because we effectively
have to move all the cpu initfns the diffstat has ended up quite big.
On the other hand this patch is purely code motion, and the resulting
order in the file does seem to me to be more sensible.
---
 target/arm/cpu64.c | 712 ++---
 1 file changed, 356 insertions(+), 356 deletions(-)

diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 78e27f778ac..741f959fbe8 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -37,313 +37,6 @@
 #include "internals.h"
 
 
-static void aarch64_a57_initfn(Object *obj)
-{
-ARMCPU *cpu = ARM_CPU(obj);
-
-cpu->dtb_compatible = "arm,cortex-a57";
-set_feature(>env, ARM_FEATURE_V8);
-set_feature(>env, ARM_FEATURE_NEON);
-set_feature(>env, ARM_FEATURE_GENERIC_TIMER);
-set_feature(>env, ARM_FEATURE_AARCH64);
-set_feature(>env, ARM_FEATURE_CBAR_RO);
-set_feature(>env, ARM_FEATURE_EL2);
-set_feature(>env, ARM_FEATURE_EL3);
-set_feature(>env, ARM_FEATURE_PMU);
-cpu->kvm_target = QEMU_KVM_ARM_TARGET_CORTEX_A57;
-cpu->midr = 0x411fd070;
-cpu->revidr = 0x;
-cpu->reset_fpsid = 0x41034070;
-cpu->isar.mvfr0 = 0x10110222;
-cpu->isar.mvfr1 = 0x1211;
-cpu->isar.mvfr2 = 0x0043;
-cpu->ctr = 0x8444c004;
-cpu->reset_sctlr = 0x00c50838;
-cpu->isar.id_pfr0 = 0x0131;
-cpu->isar.id_pfr1 = 0x00011011;
-cpu->isar.id_dfr0 = 0x03010066;
-cpu->id_afr0 = 0x;
-cpu->isar.id_mmfr0 = 0x10101105;
-cpu->isar.id_mmfr1 = 0x4000;
-cpu->isar.id_mmfr2 = 0x0126;
-cpu->isar.id_mmfr3 = 0x02102211;
-cpu->isar.id_isar0 = 0x02101110;
-cpu->isar.id_isar1 = 0x13112111;
-cpu->isar.id_isar2 = 0x21232042;
-cpu->isar.id_isar3 = 0x01112131;
-cpu->isar.id_isar4 = 0x00011142;
-cpu->isar.id_isar5 = 0x00011121;
-cpu->isar.id_isar6 = 0;
-cpu->isar.id_aa64pfr0 = 0x;
-cpu->isar.id_aa64dfr0 = 0x10305106;
-cpu->isar.id_aa64isar0 = 0x00011120;
-cpu->isar.id_aa64mmfr0 = 0x1124;
-cpu->isar.dbgdidr = 0x3516d000;
-cpu->isar.dbgdevid = 0x01110f13;
-cpu->isar.dbgdevid1 = 0x2;
-cpu->isar.reset_pmcr_el0 = 0x41013000;
-cpu->clidr = 0x0a200023;
-cpu->ccsidr[0] = 0x701fe00a; /* 32KB L1 dcache */
-cpu->ccsidr[1] = 0x201fe012; /* 48KB L1 icache */
-cpu->ccsidr[2] = 0x70ffe07a; /* 2048KB L2 cache */
-cpu->dcz_blocksize = 4; /* 64 bytes */
-cpu->gic_num_lrs = 4;
-cpu->gic_vpribits = 5;
-cpu->gic_vprebits = 5;
-cpu->gic_pribits = 5;
-define_cortex_a72_a57_a53_cp_reginfo(cpu);
-}
-
-static void aarch64_a53_initfn(Object *obj)
-{
-ARMCPU *cpu = ARM_CPU(obj);
-
-cpu->dtb_compatible = "arm,cortex-a53";
-set_feature(>env, ARM_FEATURE_V8);
-set_feature(>env, ARM_FEATURE_NEON);
-set_feature(>env, ARM_FEATURE_GENERIC_TIMER);
-set_feature(>env, ARM_FEATURE_AARCH64);
-set_feature(>env, ARM_FEATURE_CBAR_RO);
-set_feature(>env, ARM_FEATURE_EL2);
-set_feature(>env, ARM_FEATURE_EL3);
-set_feature(>env, ARM_FEATURE_PMU);
-cpu->kvm_target = QEMU_KVM_ARM_TARGET_CORTEX_A53;
-cpu->midr = 0x410fd034;
-cpu->revidr = 0x;
-cpu->reset_fpsid = 0x41034070;
-cpu->isar.mvfr0 = 0x10110222;
-cpu->isar.mvfr1 = 0x1211;
-cpu->isar.mvfr2 = 0x0043;
-cpu->ctr = 0x84448004; /* L1Ip = VIPT */
-cpu->reset_sctlr = 0x00c50838;
-cpu->isar.id_pfr0 = 0x0131;
-cpu->isar.id_pfr1 = 0x00011011;
-cpu->isar.id_dfr0 = 0x03010066;
-cpu->id_afr0 = 0x;
-cpu->isar.id_mmfr0 = 0x10101105;
-cpu->isar.id_mmfr1 = 0x4000;
-cpu->isar.id_mmfr2 = 0x0126;
-cpu->isar.id_mmfr3 = 0x02102211;
-cpu->isar.id_isar0 = 0x02101110;
-cpu->isar.id_isar1 = 0x13112111;
-cpu->isar.id_isar2 = 0x21232042;
-cpu->isar.id_isar3 = 0x01112131;
-cpu->isar.id_isar4 = 0x00011142;
-cpu->isar.id_isar5 = 0x00011121;
-cpu->isar.id_isar6 = 0;
-cpu->isar.id_aa64pfr0 = 0x;
-cpu->isar.id_aa64dfr0 = 0x10305106;
-cpu->isar.id_aa64isar0 = 0x00011120;
-cpu->isar.id_aa64mmfr0 = 0x1122; /* 40 bit physical addr */
-cpu->isar.dbgdidr = 0x3516d000;
-cpu->isar.dbgdevid = 0x00110f13;
-cpu->isar.dbgdevid1 = 0x1;
-

Re: [PATCH 00/62] target/arm: Implement FEAT_HAFDBS

2022-08-12 Thread Peter Maydell
On Sun, 3 Jul 2022 at 09:25, Richard Henderson
 wrote:
>
> This is a major reorg to arm page table walking.  While the result
> here is "merely" Hardware-assited Access Flag and Dirty Bit Setting
> (HAFDBS), the ultimate goal is the Realm Management Extension (RME).
> RME "recommends" that HAFDBS be implemented (I_CSLWZ).

> Richard Henderson (62):
>   accel/tcg: Introduce PageEntryExtra
>   target/arm: Enable PageEntryExtra
>   target/arm: Fix MTE check in sve_ldnfff1_r
>   target/arm: Record tagged bit for user-only in sve_probe_page
>   target/arm: Use PageEntryExtra for MTE
>   target/arm: Use PageEntryExtra for BTI
>   include/exec: Remove target_tlb_bitN from MemTxAttrs
>   target/arm: Create GetPhysAddrResult
>   target/arm: Fix ipa_secure in get_phys_addr
>   target/arm: Use GetPhysAddrResult in get_phys_addr_lpae
>   target/arm: Use GetPhysAddrResult in get_phys_addr_v6
>   target/arm: Use GetPhysAddrResult in get_phys_addr_v5
>   target/arm: Use GetPhysAddrResult in get_phys_addr_pmsav5
>   target/arm: Use GetPhysAddrResult in get_phys_addr_pmsav7
>   target/arm: Use GetPhysAddrResult in get_phys_addr_pmsav8
>   target/arm: Use GetPhysAddrResult in pmsav8_mpu_lookup
>   target/arm: Remove is_subpage argument to pmsav8_mpu_lookup
>   target/arm: Add is_secure parameter to v8m_security_lookup
>   target/arm: Add is_secure parameter to pmsav8_mpu_lookup
>   target/arm: Add is_secure parameter to get_phys_addr_v5
>   target/arm: Add is_secure parameter to get_phys_addr_v6
>   target/arm: Add secure parameter to get_phys_addr_pmsav8
>   target/arm: Add is_secure parameter to pmsav7_use_background_region
>   target/arm: Add is_secure parameter to get_phys_addr_lpae
>   target/arm: Add is_secure parameter to get_phys_addr_pmsav7
>   target/arm: Add is_secure parameter to regime_translation_disabled
>   target/arm: Add is_secure parameter to get_phys_addr_pmsav5

Is it possible to rearrange this patchset so the easy
refactoring patches that do "use a struct to return
values from get_phys_addr and friends" are at the front
(ie before the stuff that touches core code) ?
That way they're easy to take into the tree early while
the rest of the series is still under review...

thanks
-- PMM



Re: [RFC v2 02/10] Drop unused static function return values

2022-08-12 Thread Alberto Faria
On Wed, Aug 3, 2022 at 12:15 PM Richard W.M. Jones  wrote:
> If it helps to think about this, Coverity checks for consistency.
> Across the whole code base, is the return value of a function used or
> ignored consistently.  You will see Coverity errors like:
>
>   Error: CHECKED_RETURN (CWE-252): [#def37]
>   libnbd-1.12.5/fuse/operations.c:180: check_return: Calling "nbd_poll" 
> without checking return value (as is done elsewhere 5 out of 6 times).
>   libnbd-1.12.5/examples/aio-connect-read.c:96: example_checked: Example 
> 1: "nbd_poll(nbd, -1)" has its value checked in "nbd_poll(nbd, -1) == -1".
>   libnbd-1.12.5/examples/aio-connect-read.c:128: example_checked: Example 
> 2: "nbd_poll(nbd, -1)" has its value checked in "nbd_poll(nbd, -1) == -1".
>   libnbd-1.12.5/examples/strict-structured-reads.c:246: example_checked: 
> Example 3: "nbd_poll(nbd, -1)" has its value checked in "nbd_poll(nbd, -1) == 
> -1".
>   libnbd-1.12.5/ocaml/nbd-c.c:2599: example_assign: Example 4: Assigning: 
> "r" = return value from "nbd_poll(h, timeout)".
>   libnbd-1.12.5/ocaml/nbd-c.c:2602: example_checked: Example 4 (cont.): 
> "r" has its value checked in "r == -1".
>   libnbd-1.12.5/python/methods.c:2806: example_assign: Example 5: 
> Assigning: "ret" = return value from "nbd_poll(h, timeout)".
>   libnbd-1.12.5/python/methods.c:2808: example_checked: Example 5 
> (cont.): "ret" has its value checked in "ret == -1".
>   #  178|   /* Dispatch work while there are commands in flight. */
>   #  179|   while (thread->in_flight > 0)
>   #  180|->   nbd_poll (h, -1);
>   #  181| }
>   #  182|
>
> What it's saying is that in this code base, nbd_poll's return value
> was checked by the caller 5 out of 6 times, but ignored here.  (This
> turned out to be a real bug which we fixed).
>
> It seems like the check implemented in your patch is: If the return
> value is used 0 times anywhere in the code base, change the return
> value to 'void'.  Coverity would not flag this.
>
> Maybe a consistent use check is better?

Note that the analyzer is currently limited to analyzing a single
translation unit at a time, so we would only be able to implement a
consistent use check for static functions (this is why the current
"return-value-never-used" check only applies to static functions).

It may be worthwhile exploring cross-translation unit analysis,
although it may be difficult to accomplish while also avoiding
reanalyzing the entire tree every time a single translation unit is
modified.

Alberto




Re: [PATCH for-7.2 v4 06/11] ppc/pnv: enable user created pnv-phb for powernv8

2022-08-12 Thread Frederic Barrat




On 11/08/2022 18:39, Daniel Henrique Barboza wrote:

The bulk of the work was already done by previous patches.

Use defaults_enabled() to determine whether we need to create the
default devices or not.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Daniel Henrique Barboza 
---



The QOM relationship for user-created root port is not ideal, but it's 
addressed in the last patch of the series.

Reviewed-by: Frederic Barrat 

  Fred



  hw/pci-host/pnv_phb.c |  9 +++--
  hw/ppc/pnv.c  | 32 ++--
  2 files changed, 25 insertions(+), 16 deletions(-)

diff --git a/hw/pci-host/pnv_phb.c b/hw/pci-host/pnv_phb.c
index 5dc44f45d1..1f53ff77c5 100644
--- a/hw/pci-host/pnv_phb.c
+++ b/hw/pci-host/pnv_phb.c
@@ -17,6 +17,7 @@
  #include "hw/ppc/pnv.h"
  #include "hw/qdev-properties.h"
  #include "qom/object.h"
+#include "sysemu/sysemu.h"
  
  
  /*

@@ -166,6 +167,10 @@ static void pnv_phb_realize(DeviceState *dev, Error **errp)
  pnv_phb4_bus_init(dev, PNV_PHB4(phb->backend));
  }
  
+if (phb->version == 3 && !defaults_enabled()) {

+return;
+}
+
  pnv_phb_attach_root_port(pci);
  }
  
@@ -201,7 +206,7 @@ static void pnv_phb_class_init(ObjectClass *klass, void *data)

  dc->realize = pnv_phb_realize;
  device_class_set_props(dc, pnv_phb_properties);
  set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
-dc->user_creatable = false;
+dc->user_creatable = true;
  }
  
  static void pnv_phb_root_port_reset(DeviceState *dev)

@@ -292,7 +297,7 @@ static void pnv_phb_root_port_class_init(ObjectClass 
*klass, void *data)
  device_class_set_parent_reset(dc, pnv_phb_root_port_reset,
>parent_reset);
  dc->reset = _phb_root_port_reset;
-dc->user_creatable = false;
+dc->user_creatable = true;
  
  k->vendor_id = PCI_VENDOR_ID_IBM;

  /* device_id will be written during realize() */
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index e53e9e297d..e82d6522f0 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -1178,20 +1178,22 @@ static void pnv_chip_power8_instance_init(Object *obj)
  
  object_initialize_child(obj, "homer", >homer, TYPE_PNV8_HOMER);
  
-chip8->num_phbs = pcc->num_phbs;

-
-for (i = 0; i < chip8->num_phbs; i++) {
-Object *phb = object_new(TYPE_PNV_PHB);
-
-/*
- * We need the chip to parent the PHB to allow the DT
- * to build correctly (via pnv_xscom_dt()).
- *
- * TODO: the PHB should be parented by a PEC device that, at
- * this moment, is not modelled powernv8/phb3.
- */
-object_property_add_child(obj, "phb[*]", phb);
-chip8->phbs[i] = PNV_PHB(phb);
+if (defaults_enabled()) {
+chip8->num_phbs = pcc->num_phbs;
+
+for (i = 0; i < chip8->num_phbs; i++) {
+Object *phb = object_new(TYPE_PNV_PHB);
+
+/*
+ * We need the chip to parent the PHB to allow the DT
+ * to build correctly (via pnv_xscom_dt()).
+ *
+ * TODO: the PHB should be parented by a PEC device that, at
+ * this moment, is not modelled powernv8/phb3.
+ */
+object_property_add_child(obj, "phb[*]", phb);
+chip8->phbs[i] = PNV_PHB(phb);
+}
  }
  
  }

@@ -2130,6 +2132,8 @@ static void pnv_machine_power8_class_init(ObjectClass 
*oc, void *data)
  
  pmc->compat = compat;

  pmc->compat_size = sizeof(compat);
+
+machine_class_allow_dynamic_sysbus_dev(mc, TYPE_PNV_PHB);
  }
  
  static void pnv_machine_power9_class_init(ObjectClass *oc, void *data)




Re: [BUG] cxl can not create region

2022-08-12 Thread Jonathan Cameron via
On Fri, 12 Aug 2022 09:03:02 -0700
Dan Williams  wrote:

> Jonathan Cameron wrote:
> > On Thu, 11 Aug 2022 18:08:57 +0100
> > Jonathan Cameron via  wrote:
> >   
> > > On Tue, 9 Aug 2022 17:08:25 +0100
> > > Jonathan Cameron  wrote:
> > >   
> > > > On Tue, 9 Aug 2022 21:07:06 +0800
> > > > Bobo WL  wrote:
> > > > 
> > > > > Hi Jonathan
> > > > > 
> > > > > Thanks for your reply!
> > > > > 
> > > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
> > > > >  wrote:  
> > > > > >
> > > > > > Probably not related to your problem, but there is a disconnect in 
> > > > > > QEMU /
> > > > > > kernel assumptionsaround the presence of an HDM decoder when a HB 
> > > > > > only
> > > > > > has a single root port. Spec allows it to be provided or not as an 
> > > > > > implementation choice.
> > > > > > Kernel assumes it isn't provide. Qemu assumes it is.
> > > > > >
> > > > > > The temporary solution is to throw in a second root port on the HB 
> > > > > > and not
> > > > > > connect anything to it.  Longer term I may special case this so 
> > > > > > that the particular
> > > > > > decoder defaults to pass through settings in QEMU if there is only 
> > > > > > one root port.
> > > > > >
> > > > > 
> > > > > You are right! After adding an extra HB in qemu, I can create a x1
> > > > > region successfully.
> > > > > But have some errors in Nvdimm:
> > > > > 
> > > > > [   74.925838] Unknown online node for memory at 0x100, 
> > > > > assuming node 0
> > > > > [   74.925846] Unknown target node for memory at 0x100, 
> > > > > assuming node 0
> > > > > [   74.927470] nd_region region0: nmem0: is disabled, failing probe   
> > > > >
> > > > 
> > > > Ah. I've seen this one, but not chased it down yet.  Was on my todo 
> > > > list to chase
> > > > down. Once I reach this state I can verify the HDM Decode is correct 
> > > > which is what
> > > > I've been using to test (Which wasn't true until earlier this week). 
> > > > I'm currently testing via devmem, more for historical reasons than 
> > > > because it makes
> > > > that much sense anymore.  
> > > 
> > > *embarassed cough*.  We haven't fully hooked the LSA up in qemu yet.
> > > I'd forgotten that was still on the todo list. I don't think it will
> > > be particularly hard to do and will take a look in next few days.
> > > 
> > > Very very indirectly this error is causing a driver probe fail that means 
> > > that
> > > we hit a code path that has a rather odd looking check on NDD_LABELING.
> > > Should not have gotten near that path though - hence the problem is 
> > > actually
> > > when we call cxl_pmem_get_config_data() and it returns an error because
> > > we haven't fully connected up the command in QEMU.  
> > 
> > So a least one bug in QEMU. We were not supporting variable length payloads 
> > on mailbox
> > inputs (but were on outputs).  That hasn't mattered until we get to LSA 
> > writes.
> > We just need to relax condition on the supplied length.
> > 
> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > index c352a935c4..fdda9529fe 100644
> > --- a/hw/cxl/cxl-mailbox-utils.c
> > +++ b/hw/cxl/cxl-mailbox-utils.c
> > @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
> >  cxl_cmd = _cmd_set[set][cmd];
> >  h = cxl_cmd->handler;
> >  if (h) {
> > -if (len == cxl_cmd->in) {
> > +if (len == cxl_cmd->in || !cxl_cmd->in) {
> >  cxl_cmd->payload = cxl_dstate->mbox_reg_state +
> >  A_CXL_DEV_CMD_PAYLOAD;
> >  ret = (*h)(cxl_cmd, cxl_dstate, );
> > 
> > 
> > This lets the nvdimm/region probe fine, but I'm getting some issues with
> > namespace capacity so I'll look at what is causing that next.
> > Unfortunately I'm not that familiar with the driver/nvdimm side of things
> > so it's take a while to figure out what kicks off what!  
> 
> The whirlwind tour is that 'struct nd_region' instances that represent a
> persitent memory address range are composed of one more mappings of
> 'struct nvdimm' objects. The nvdimm object is driven by the dimm driver
> in drivers/nvdimm/dimm.c. That driver is mainly charged with unlocking
> the dimm (if locked) and interrogating the label area to look for
> namespace labels.
> 
> The label command calls are routed to the '->ndctl()' callback that was
> registered when the CXL nvdimm_bus_descriptor was created. That callback
> handles both 'bus' scope calls, currently none for CXL, and per nvdimm
> calls. cxl_pmem_nvdimm_ctl() translates those generic LIBNVDIMM commands
> to CXL commands.
> 
> The 'struct nvdimm' objects that the CXL side registers have the
> NDD_LABELING flag set which means that namespaces need to be explicitly
> created / provisioned from region capacity. Otherwise, if
> drivers/nvdimm/dimm.c does not find a namespace-label-index block then
> the region reverts to label-less mode and a default namespace equal to
> the size of the region is instantiated.
> 
> 

Re: [PATCH for-7.2 v4 08/11] ppc/pnv: enable user created pnv-phb for powernv9

2022-08-12 Thread Frederic Barrat




On 11/08/2022 18:39, Daniel Henrique Barboza wrote:

Enable pnv-phb user created devices for powernv9 now that we have
everything in place.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Daniel Henrique Barboza 
---



Same comment as in patch 6 regarding the QOM relationship of the 
user-created root port.

Reviewed-by: Frederic Barrat 

  Fred



  hw/pci-host/pnv_phb.c  | 2 +-
  hw/pci-host/pnv_phb4_pec.c | 6 --
  hw/ppc/pnv.c   | 2 ++
  3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/hw/pci-host/pnv_phb.c b/hw/pci-host/pnv_phb.c
index 1f53ff77c5..17d9960aa1 100644
--- a/hw/pci-host/pnv_phb.c
+++ b/hw/pci-host/pnv_phb.c
@@ -167,7 +167,7 @@ static void pnv_phb_realize(DeviceState *dev, Error **errp)
  pnv_phb4_bus_init(dev, PNV_PHB4(phb->backend));
  }
  
-if (phb->version == 3 && !defaults_enabled()) {

+if (!defaults_enabled()) {
  return;
  }
  
diff --git a/hw/pci-host/pnv_phb4_pec.c b/hw/pci-host/pnv_phb4_pec.c

index 8dc363d69c..9871f462cd 100644
--- a/hw/pci-host/pnv_phb4_pec.c
+++ b/hw/pci-host/pnv_phb4_pec.c
@@ -146,8 +146,10 @@ static void pnv_pec_realize(DeviceState *dev, Error **errp)
  pec->num_phbs = pecc->num_phbs[pec->index];
  
  /* Create PHBs if running with defaults */

-for (i = 0; i < pec->num_phbs; i++) {
-pnv_pec_default_phb_realize(pec, i, errp);
+if (defaults_enabled()) {
+for (i = 0; i < pec->num_phbs; i++) {
+pnv_pec_default_phb_realize(pec, i, errp);
+}
  }
  
  /* Initialize the XSCOM regions for the PEC registers */

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 0644f45a1d..ec0558ed1c 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -2188,6 +2188,8 @@ static void pnv_machine_power9_class_init(ObjectClass 
*oc, void *data)
  pmc->compat = compat;
  pmc->compat_size = sizeof(compat);
  pmc->dt_power_mgt = pnv_dt_power_mgt;
+
+machine_class_allow_dynamic_sysbus_dev(mc, TYPE_PNV_PHB);
  }
  
  static void pnv_machine_power10_class_init(ObjectClass *oc, void *data)




Re: [BUG] cxl can not create region

2022-08-12 Thread Dan Williams
Jonathan Cameron wrote:
> On Thu, 11 Aug 2022 18:08:57 +0100
> Jonathan Cameron via  wrote:
> 
> > On Tue, 9 Aug 2022 17:08:25 +0100
> > Jonathan Cameron  wrote:
> > 
> > > On Tue, 9 Aug 2022 21:07:06 +0800
> > > Bobo WL  wrote:
> > >   
> > > > Hi Jonathan
> > > > 
> > > > Thanks for your reply!
> > > > 
> > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
> > > >  wrote:
> > > > >
> > > > > Probably not related to your problem, but there is a disconnect in 
> > > > > QEMU /
> > > > > kernel assumptionsaround the presence of an HDM decoder when a HB only
> > > > > has a single root port. Spec allows it to be provided or not as an 
> > > > > implementation choice.
> > > > > Kernel assumes it isn't provide. Qemu assumes it is.
> > > > >
> > > > > The temporary solution is to throw in a second root port on the HB 
> > > > > and not
> > > > > connect anything to it.  Longer term I may special case this so that 
> > > > > the particular
> > > > > decoder defaults to pass through settings in QEMU if there is only 
> > > > > one root port.
> > > > >  
> > > > 
> > > > You are right! After adding an extra HB in qemu, I can create a x1
> > > > region successfully.
> > > > But have some errors in Nvdimm:
> > > > 
> > > > [   74.925838] Unknown online node for memory at 0x100, 
> > > > assuming node 0
> > > > [   74.925846] Unknown target node for memory at 0x100, 
> > > > assuming node 0
> > > > [   74.927470] nd_region region0: nmem0: is disabled, failing probe
> > > 
> > > Ah. I've seen this one, but not chased it down yet.  Was on my todo list 
> > > to chase
> > > down. Once I reach this state I can verify the HDM Decode is correct 
> > > which is what
> > > I've been using to test (Which wasn't true until earlier this week). 
> > > I'm currently testing via devmem, more for historical reasons than 
> > > because it makes
> > > that much sense anymore.
> > 
> > *embarassed cough*.  We haven't fully hooked the LSA up in qemu yet.
> > I'd forgotten that was still on the todo list. I don't think it will
> > be particularly hard to do and will take a look in next few days.
> > 
> > Very very indirectly this error is causing a driver probe fail that means 
> > that
> > we hit a code path that has a rather odd looking check on NDD_LABELING.
> > Should not have gotten near that path though - hence the problem is actually
> > when we call cxl_pmem_get_config_data() and it returns an error because
> > we haven't fully connected up the command in QEMU.
> 
> So a least one bug in QEMU. We were not supporting variable length payloads 
> on mailbox
> inputs (but were on outputs).  That hasn't mattered until we get to LSA 
> writes.
> We just need to relax condition on the supplied length.
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index c352a935c4..fdda9529fe 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
>  cxl_cmd = _cmd_set[set][cmd];
>  h = cxl_cmd->handler;
>  if (h) {
> -if (len == cxl_cmd->in) {
> +if (len == cxl_cmd->in || !cxl_cmd->in) {
>  cxl_cmd->payload = cxl_dstate->mbox_reg_state +
>  A_CXL_DEV_CMD_PAYLOAD;
>  ret = (*h)(cxl_cmd, cxl_dstate, );
> 
> 
> This lets the nvdimm/region probe fine, but I'm getting some issues with
> namespace capacity so I'll look at what is causing that next.
> Unfortunately I'm not that familiar with the driver/nvdimm side of things
> so it's take a while to figure out what kicks off what!

The whirlwind tour is that 'struct nd_region' instances that represent a
persitent memory address range are composed of one more mappings of
'struct nvdimm' objects. The nvdimm object is driven by the dimm driver
in drivers/nvdimm/dimm.c. That driver is mainly charged with unlocking
the dimm (if locked) and interrogating the label area to look for
namespace labels.

The label command calls are routed to the '->ndctl()' callback that was
registered when the CXL nvdimm_bus_descriptor was created. That callback
handles both 'bus' scope calls, currently none for CXL, and per nvdimm
calls. cxl_pmem_nvdimm_ctl() translates those generic LIBNVDIMM commands
to CXL commands.

The 'struct nvdimm' objects that the CXL side registers have the
NDD_LABELING flag set which means that namespaces need to be explicitly
created / provisioned from region capacity. Otherwise, if
drivers/nvdimm/dimm.c does not find a namespace-label-index block then
the region reverts to label-less mode and a default namespace equal to
the size of the region is instantiated.

If you are seeing small mismatches in namespace capacity then it may
just be the fact that by default 'ndctl create-namespace' results in an
'fsdax' mode namespace which just means that it is a block device where
1.5% of the capacity is reserved for 'struct page' metadata. You should
be able to see namespace capacity == 

Re: [PATCH for-7.2 v4 11/11] ppc/pnv: fix QOM parenting of user creatable root ports

2022-08-12 Thread Frederic Barrat




On 11/08/2022 18:39, Daniel Henrique Barboza wrote:

User creatable root ports are being parented by the 'peripheral' or the
'peripheral-anon' container. This happens because this is the regular
QOM schema for sysbus devices that are added via the command line.

Let's make this QOM hierarchy similar to what we have with default root
ports, i.e. the root port must be parented by the pnv-root-bus. To do
that we change the qom and bus parent of the root port during
root_port_realize(). The realize() is shared by the default root port
code path, so we can remove the code inside pnv_phb_attach_root_port()
that was adding the root port as a child of the bus as well.

While we're at it, change pnv_phb_attach_root_port() to receive a PCIBus
instead of a PCIHostState to make it clear that the function does not
make use of the PHB.

Signed-off-by: Daniel Henrique Barboza 
---
  hw/pci-host/pnv_phb.c | 35 +++
  1 file changed, 15 insertions(+), 20 deletions(-)

diff --git a/hw/pci-host/pnv_phb.c b/hw/pci-host/pnv_phb.c
index 17d9960aa1..65ed1f9eb4 100644
--- a/hw/pci-host/pnv_phb.c
+++ b/hw/pci-host/pnv_phb.c
@@ -51,27 +51,11 @@ static void pnv_parent_bus_fixup(DeviceState *parent, 
DeviceState *child,
  }
  }
  
-/*

- * Attach a root port device.
- *
- * 'index' will be used both as a PCIE slot value and to calculate
- * QOM id. 'chip_id' is going to be used as PCIE chassis for the
- * root port.
- */
-static void pnv_phb_attach_root_port(PCIHostState *pci)
+static void pnv_phb_attach_root_port(PCIBus *bus)
  {
  PCIDevice *root = pci_new(PCI_DEVFN(0, 0), TYPE_PNV_PHB_ROOT_PORT);
-const char *dev_id = DEVICE(root)->id;
-g_autofree char *default_id = NULL;
-int index;
-
-index = object_property_get_int(OBJECT(pci->bus), "phb-id", _fatal);
-default_id = g_strdup_printf("%s[%d]", TYPE_PNV_PHB_ROOT_PORT, index);
-
-object_property_add_child(OBJECT(pci->bus), dev_id ? dev_id : default_id,
-  OBJECT(root));
  
-pci_realize_and_unref(root, pci->bus, _fatal);

+pci_realize_and_unref(root, bus, _fatal);
  }
  
  /*

@@ -171,7 +155,7 @@ static void pnv_phb_realize(DeviceState *dev, Error **errp)
  return;
  }
  
-pnv_phb_attach_root_port(pci);

+pnv_phb_attach_root_port(pci->bus);
  }
  
  static const char *pnv_phb_root_bus_path(PCIHostState *host_bridge,

@@ -240,12 +224,18 @@ static void pnv_phb_root_port_realize(DeviceState *dev, 
Error **errp)
  {
  PCIERootPortClass *rpc = PCIE_ROOT_PORT_GET_CLASS(dev);
  PnvPHBRootPort *phb_rp = PNV_PHB_ROOT_PORT(dev);
-PCIBus *bus = PCI_BUS(qdev_get_parent_bus(dev));
+BusState *qbus = qdev_get_parent_bus(dev);
+PCIBus *bus = PCI_BUS(qbus);
  PCIDevice *pci = PCI_DEVICE(dev);
  uint16_t device_id = 0;
  Error *local_err = NULL;
  int chip_id, index;
  
+/*

+ * 'index' will be used both as a PCIE slot value and to calculate
+ * QOM id. 'chip_id' is going to be used as PCIE chassis for the
+ * root port.
+ */
  chip_id = object_property_get_int(OBJECT(bus), "chip-id", _fatal);
  index = object_property_get_int(OBJECT(bus), "phb-id", _fatal);
  
@@ -253,6 +243,11 @@ static void pnv_phb_root_port_realize(DeviceState *dev, Error **errp)

  qdev_prop_set_uint8(dev, "chassis", chip_id);
  qdev_prop_set_uint16(dev, "slot", index);
  
+pnv_parent_qom_fixup(OBJECT(bus), OBJECT(dev), index);

+if (!qdev_set_parent_bus(dev, qbus, _fatal)) {



That line looks surprising at first, because we got qbus from 
qdev_get_parent_bus() just above, so it looks like a noop. I talked to 
Daniel about it: the device<->bus relationship is correct when entering 
the function but the call to pnv_parent_qom_fixup() interferes with the 
bus relationship, so it needs to be re-established.

Short of a better suggestion, it probably need a comment.

  Fred




+return;
+}
+
  rpc->parent_realize(dev, _err);
  if (local_err) {
  error_propagate(errp, local_err);




Re: [RFC v2 02/10] Drop unused static function return values

2022-08-12 Thread Alberto Faria
On Wed, Aug 3, 2022 at 1:30 PM Peter Maydell  wrote:
> The problem with a patch like this is that it rolls up into a
> single patch changes to the API of many functions in multiple
> subsystems across the whole codebase. Some of those changes
> might be right; some might be wrong. No single person is going
> to be in a position to review the whole lot, and a +248-403
> patch email makes it very unwieldy to try to discuss.
>
> If you want to propose some of these I think you need to:
>  * split it out so that you're only suggesting changes in
>one subsystem at a time
>  * look at the places you are suggesting changes, to see if
>the correct answer is actually "add the missing error
>check in the caller(s)"
>  * not change places that are following standard API patterns
>like "return bool and have an Error** argument"

Sounds good. For now, I'll limit the changes to a few representative
cases e.g. in the block layer, since this patch is mostly intended as
a demonstration of the type of issue that the check catches. Once
there is agreement on the semantics for the check, I'll probably send
a separate tree-wide series with per-subsystem patches.

Thanks,
Alberto




Re: [RFC v2 00/10] Introduce an extensible static analyzer

2022-08-12 Thread Alberto Faria
On Thu, Aug 4, 2022 at 12:44 PM Marc-André Lureau
 wrote:
> Hi
>
> Great work so far! This seems easier to hack than my attempt to use
> clang-tidy to write some qemu checks
> (https://github.com/elmarco/clang-tools-extra)
>
> The code seems quite generic, I wonder if such a tool in python wasn't
> already developed (I couldn't find it easily searching on github).
>
> Why not make it standalone from qemu? Similar to
> https://gitlab.com/qemu-project/python-qemu-qmp, you could have your
> own release management, issue tracker, code formatting, license, CI
> etc. (you should add copyright header in each file, at least that's
> pretty much required in qemu nowadays). You could also have the
> qemu-specific checks there imho (clang-tidy has google & llvm specific
> checks too)

This is an interesting idea. Indeed, the analyzer is essentially a
more easily extensible, Python version of clang-tidy (without the big
built-in library of checks).

I think I'll continue working on it embedded in QEMU for now, mostly
because it depends on some aspects of the build system, and gradually
generalize it to a point where it could be made into a standalone
thing.

> It would be nice to write some docs, in docs/devel/testing.rst and
> some new meson/ninja/make targets to run the checks directly from a
> build tree.

Sounds good, I'll work on it.

> On fc36, I had several dependencies I needed to install manually (imho
> they should have been pulled by python3-clang), but more annoyingly I
> got:
> clang.cindex.LibclangError: libclang.so: cannot open shared object
> file: No such file or directory. To provide a path to libclang use
> Config.set_library_path() or Config.set_library_file().
>
> clang-libs doesn't install libclang.so, I wonder why. I made a link
> manually and it works, but it's probably incorrect. I'll try to open
> issues for the clang packaging.

That's strange. Thanks for looking into this.

Alberto




Re: [BUG] cxl can not create region

2022-08-12 Thread Jonathan Cameron via
On Thu, 11 Aug 2022 18:08:57 +0100
Jonathan Cameron via  wrote:

> On Tue, 9 Aug 2022 17:08:25 +0100
> Jonathan Cameron  wrote:
> 
> > On Tue, 9 Aug 2022 21:07:06 +0800
> > Bobo WL  wrote:
> >   
> > > Hi Jonathan
> > > 
> > > Thanks for your reply!
> > > 
> > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
> > >  wrote:
> > > >
> > > > Probably not related to your problem, but there is a disconnect in QEMU 
> > > > /
> > > > kernel assumptionsaround the presence of an HDM decoder when a HB only
> > > > has a single root port. Spec allows it to be provided or not as an 
> > > > implementation choice.
> > > > Kernel assumes it isn't provide. Qemu assumes it is.
> > > >
> > > > The temporary solution is to throw in a second root port on the HB and 
> > > > not
> > > > connect anything to it.  Longer term I may special case this so that 
> > > > the particular
> > > > decoder defaults to pass through settings in QEMU if there is only one 
> > > > root port.
> > > >  
> > > 
> > > You are right! After adding an extra HB in qemu, I can create a x1
> > > region successfully.
> > > But have some errors in Nvdimm:
> > > 
> > > [   74.925838] Unknown online node for memory at 0x100, assuming 
> > > node 0
> > > [   74.925846] Unknown target node for memory at 0x100, assuming 
> > > node 0
> > > [   74.927470] nd_region region0: nmem0: is disabled, failing probe
> > 
> > Ah. I've seen this one, but not chased it down yet.  Was on my todo list to 
> > chase
> > down. Once I reach this state I can verify the HDM Decode is correct which 
> > is what
> > I've been using to test (Which wasn't true until earlier this week). 
> > I'm currently testing via devmem, more for historical reasons than because 
> > it makes
> > that much sense anymore.
> 
> *embarassed cough*.  We haven't fully hooked the LSA up in qemu yet.
> I'd forgotten that was still on the todo list. I don't think it will
> be particularly hard to do and will take a look in next few days.
> 
> Very very indirectly this error is causing a driver probe fail that means that
> we hit a code path that has a rather odd looking check on NDD_LABELING.
> Should not have gotten near that path though - hence the problem is actually
> when we call cxl_pmem_get_config_data() and it returns an error because
> we haven't fully connected up the command in QEMU.

So a least one bug in QEMU. We were not supporting variable length payloads on 
mailbox
inputs (but were on outputs).  That hasn't mattered until we get to LSA writes.
We just need to relax condition on the supplied length.

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index c352a935c4..fdda9529fe 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
 cxl_cmd = _cmd_set[set][cmd];
 h = cxl_cmd->handler;
 if (h) {
-if (len == cxl_cmd->in) {
+if (len == cxl_cmd->in || !cxl_cmd->in) {
 cxl_cmd->payload = cxl_dstate->mbox_reg_state +
 A_CXL_DEV_CMD_PAYLOAD;
 ret = (*h)(cxl_cmd, cxl_dstate, );


This lets the nvdimm/region probe fine, but I'm getting some issues with
namespace capacity so I'll look at what is causing that next.
Unfortunately I'm not that familiar with the driver/nvdimm side of things
so it's take a while to figure out what kicks off what!

Jonathan

> 
> Jonathan
> 
> 
> >   
> > > 
> > > And x4 region still failed with same errors, using latest cxl/preview
> > > branch don't work.
> > > I have picked "Two CXL emulation fixes" patches in qemu, still not 
> > > working.
> > > 
> > > Bob
> 
> 




Re: [PATCH for-7.2 v4 10/11] ppc/pnv: user creatable pnv-phb for powernv10

2022-08-12 Thread Frederic Barrat




On 11/08/2022 18:39, Daniel Henrique Barboza wrote:

Given that powernv9 and powernv10 uses the same pnv-phb backend, the
logic to allow user created pnv-phbs for powernv10 is already in place.
Let's flip the switch.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Daniel Henrique Barboza 
---



Reviewed-by: Frederic Barrat 

  Fred



  hw/ppc/pnv.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index e6c14fe231..9bf35ca9d6 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -2226,6 +2226,8 @@ static void pnv_machine_power10_class_init(ObjectClass 
*oc, void *data)
  pmc->dt_power_mgt = pnv_dt_power_mgt;
  
  xfc->match_nvt = pnv10_xive_match_nvt;

+
+machine_class_allow_dynamic_sysbus_dev(mc, TYPE_PNV_PHB);
  }
  
  static bool pnv_machine_get_hb(Object *obj, Error **errp)




Re: [PATCH for-7.2 v4 09/11] ppc/pnv: change pnv_phb4_get_pec() to also retrieve chip10->pecs

2022-08-12 Thread Frederic Barrat




On 11/08/2022 18:39, Daniel Henrique Barboza wrote:

The function assumes that we're always dealing with a PNV9_CHIP()
object. This is not the case when the pnv-phb device belongs to a
powernv10 machine.

Change pnv_phb4_get_pec() to be able to work with PNV10_CHIP() if
necessary.

Signed-off-by: Daniel Henrique Barboza 
---



Reviewed-by: Frederic Barrat 

  Fred



  hw/ppc/pnv.c | 17 +++--
  1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index ec0558ed1c..e6c14fe231 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -284,17 +284,30 @@ static void pnv_dt_icp(PnvChip *chip, void *fdt, uint32_t 
pir,
  static PnvPhb4PecState *pnv_phb4_get_pec(PnvChip *chip, PnvPHB4 *phb,
   Error **errp)
  {
-Pnv9Chip *chip9 = PNV9_CHIP(chip);
+PnvPHB *phb_base = phb->phb_base;
+PnvPhb4PecState *pecs = NULL;
  int chip_id = phb->chip_id;
  int index = phb->phb_id;
  int i, j;
  
+if (phb_base->version == 4) {

+Pnv9Chip *chip9 = PNV9_CHIP(chip);
+
+pecs = chip9->pecs;
+} else if (phb_base->version == 5) {
+Pnv10Chip *chip10 = PNV10_CHIP(chip);
+
+pecs = chip10->pecs;
+} else {
+g_assert_not_reached();
+}
+
  for (i = 0; i < chip->num_pecs; i++) {
  /*
   * For each PEC, check the amount of phbs it supports
   * and see if the given phb4 index matches an index.
   */
-PnvPhb4PecState *pec = >pecs[i];
+PnvPhb4PecState *pec = [i];
  
  for (j = 0; j < pec->num_phbs; j++) {

  if (index == pnv_phb4_pec_get_phb_id(pec, j)) {




Re: [PATCH for-7.2 v4 07/11] ppc/pnv: add PHB4 helpers for user created pnv-phb

2022-08-12 Thread Frederic Barrat




On 11/08/2022 18:39, Daniel Henrique Barboza wrote:

The PHB4 backend relies on a link with the corresponding PEC element.
This is trivial to do during machine_init() time for default devices,
but not so much for user created ones.

pnv_phb4_get_pec() is a small variation of the function that was
reverted by commit 9c10d86fee "ppc/pnv: Remove user-created PHB{3,4,5}
devices". We'll use it to determine the appropriate PEC for a given user
created pnv-phb that uses a PHB4 backend.

This is done during realize() time, in pnv_phb_user_device_init().

Signed-off-by: Daniel Henrique Barboza 
---



Reviewed-by: Frederic Barrat 

  Fred



  hw/ppc/pnv.c | 35 ---
  1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index e82d6522f0..0644f45a1d 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -281,6 +281,34 @@ static void pnv_dt_icp(PnvChip *chip, void *fdt, uint32_t 
pir,
  g_free(reg);
  }
  
+static PnvPhb4PecState *pnv_phb4_get_pec(PnvChip *chip, PnvPHB4 *phb,

+ Error **errp)
+{
+Pnv9Chip *chip9 = PNV9_CHIP(chip);
+int chip_id = phb->chip_id;
+int index = phb->phb_id;
+int i, j;
+
+for (i = 0; i < chip->num_pecs; i++) {
+/*
+ * For each PEC, check the amount of phbs it supports
+ * and see if the given phb4 index matches an index.
+ */
+PnvPhb4PecState *pec = >pecs[i];
+
+for (j = 0; j < pec->num_phbs; j++) {
+if (index == pnv_phb4_pec_get_phb_id(pec, j)) {
+return pec;
+}
+}
+}
+error_setg(errp,
+   "pnv-phb4 chip-id %d index %d didn't match any existing PEC",
+   chip_id, index);
+
+return NULL;
+}
+
  /*
   * Adds a PnvPHB to the chip. Returns the parent obj of the
   * PHB which varies with each version (phb version 3 is parented
@@ -302,10 +330,11 @@ Object *pnv_chip_add_phb(PnvChip *chip, PnvPHB *phb, 
Error **errp)
  chip8->num_phbs++;
  
  return OBJECT(chip);

-} else {
-/* phb4 support will be added later */
-return NULL;
  }
+
+phb->pec = pnv_phb4_get_pec(chip, PNV_PHB4(phb->backend), errp);
+
+return OBJECT(phb->pec);
  }
  
  static void pnv_chip_power8_dt_populate(PnvChip *chip, void *fdt)




[PATCH] hw/arm/bcm2835_property: Add support for RPI_FIRMWARE_FRAMEBUFFER_GET_NUM_DISPLAYS

2022-08-12 Thread Enrik Berkhan
In more recent Raspbian OS Linux kernels, the fb driver gives up
immediately if RPI_FIRMWARE_FRAMEBUFFER_GET_NUM_DISPLAYS fails or no
displays are reported.

This change simply always reports one display. It makes bcm2835_fb work
again with these more recent kernels.

Signed-off-by: Enrik Berkhan 
---
 hw/misc/bcm2835_property.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/misc/bcm2835_property.c b/hw/misc/bcm2835_property.c
index e94e951057..890ae7bae5 100644
--- a/hw/misc/bcm2835_property.c
+++ b/hw/misc/bcm2835_property.c
@@ -270,6 +270,10 @@ static void 
bcm2835_property_mbox_push(BCM2835PropertyState *s, uint32_t value)
 stl_le_phys(>dma_as, value + 12, 0);
 resplen = 4;
 break;
+case 0x00040013: /* Get number of displays */
+stl_le_phys(>dma_as, value + 12, 1);
+resplen = 4;
+break;
 
 case 0x00060001: /* Get DMA channels */
 /* channels 2-5 */
-- 
2.34.1




Re: qemu-system-aarch64: Failed to retrieve host CPU features

2022-08-12 Thread Marc Zyngier
On Fri, 12 Aug 2022 10:25:55 +0100,
Peter Maydell  wrote:
> 
> I've added some more relevant mailing lists to the cc.
> 
> On Fri, 12 Aug 2022 at 09:45, Vitaly Chikunov  wrote:
> > On Fri, Aug 12, 2022 at 05:14:27AM +0300, Vitaly Chikunov wrote:
> > > I noticed that we starting to get many errors like this:
> > >
> > >   qemu-system-aarch64: Failed to retrieve host CPU features
> > >
> > > Where many is 1-2% per run, depends on host, host is Kunpeng-920, and
> > > Linux kernel is v5.15.59, but it started to appear months before that.
> > >
> > > strace shows in erroneous case:
> > >
> > >   1152244 ioctl(9, KVM_CREATE_VM, 0x30)   = -1 EINTR (Interrupted system 
> > > call)
> > >
> > > And I see in target/arm/kvm.c:kvm_arm_create_scratch_host_vcpu:
> > >
> > > vmfd = ioctl(kvmfd, KVM_CREATE_VM, max_vm_pa_size);
> > > if (vmfd < 0) {
> > > goto err;
> > > }
> > >
> > > Maybe it should restart ioctl on EINTR?
> > >
> > > I don't see EINTR documented in ioctl(2) nor in Linux'
> > > Documentation/virt/kvm/api.rst for KVM_CREATE_VM, but for KVM_RUN it
> > > says "an unmasked signal is pending".
> >
> > I am suggested that almost any blocking syscall could return EINTR, so I
> > checked the strace log and it does not show evidence of arriving a signal,
> > the log ends like this:
> >
> >   1152244 openat(AT_FDCWD, "/dev/kvm", O_RDWR|O_CLOEXEC) = 9
> >   1152244 ioctl(9, KVM_CHECK_EXTENSION, KVM_CAP_ARM_VM_IPA_SIZE) = 48
> >   1152244 ioctl(9, KVM_CREATE_VM, 0x30)   = -1 EINTR (Interrupted system 
> > call)
> >   1152244 close(9)= 0
> >   1152244 newfstatat(2, "", {st_dev=makedev(0, 0xd), st_ino=57869925, 
> > st_mode=S_IFIFO|0600, st_nlink=1, st_uid=517, st_gid=517, st_blksize=4096, 
> > st_blocks=0, st_size=0, st_atime=1660268019 /* 
> > 2022-08-12T01:33:39.850436293+ */, st_atime_nsec=850436293, 
> > st_mtime=1660268019 /* 2022-08-12T01:33:39.850436293+ */, 
> > st_mtime_nsec=850436293, st_ctime=1660268019 /* 
> > 2022-08-12T01:33:39.850436293+ */, st_ctime_nsec=850436293}, 
> > AT_EMPTY_PATH) = 0
> >   1152244 write(2, "qemu-system-aarch64: Failed to r"..., 58) = 58
> >   1152244 exit_group(1)   = ?
> >   1152245 <... clock_nanosleep resumed> ) = ?
> >   1152245 +++ exited with 1 +++
> >   1152244 +++ exited with 1 +++
> 
> KVM folks: should we expect that KVM_CREATE_VM might fail EINTR
> and need retrying?
> 
> (I suspect the answer is "yes", given we do this in the generic
> code in kvm-all.c.)

Interestingly, this has cropped up in the (distant) past:

https://lists.gnu.org/archive/html/qemu-devel/2014-01/msg01031.html

and seems to point at the path I was mentioning earlier (the code
hasn't changed too much since, apparently).

I'd still like to understand the underlying reason though.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.



Re: [PATCH v2 0/8] parallels: Refactor the code of images checks and fix a bug

2022-08-12 Thread Denis V. Lunev

On 11.08.2022 17:00, Alexander Ivanov wrote:

Fix image inflation when offset in BAT is out of image.

Replace whole BAT syncing by flushing only dirty blocks.

Move all the checks outside the main check function in
separate functions

Use WITH_QEMU_LOCK_GUARD for more clean code.

Alexander Ivanov (8):
   parallels: Out of image offset in BAT leads to image inflation
   parallels: Move BAT entry setting to a separate function
   parallels: Replace bdrv_co_pwrite_sync by bdrv_co_flush for BAT
 flushing
   parallels: Move check of unclean image to a separate function
   parallels: Move check of cluster outside image to a separate function
   parallels: Move check of leaks to a separate function
   parallels: Move statistic collection to a separate function
   parallels: Replace qemu_co_mutex_lock by WITH_QEMU_LOCK_GUARD

  block/parallels.c | 188 --
  1 file changed, 132 insertions(+), 56 deletions(-)


I believe that this series is ready to go once commit
messages will be improved.

Den



Re: [PATCH v3 1/4] accel/tcg: Invalidate translations when clearing PAGE_EXEC

2022-08-12 Thread Ilya Leoshkevich
On Thu, 2022-08-11 at 08:42 -0700, Richard Henderson wrote:
> On 8/11/22 02:28, Ilya Leoshkevich wrote:
> > How is qemu-user's get_page_addr_code() involved here?
> > 
> > I tried to experiment with it, and while I agree that it looks
> > buggy,
> > it's called only from translation code paths. If we already have a
> > translation block, these code paths are not used.
> 
> It's called from tb_lookup too, when we're trying to find an existing
> TB.
> 
> 
> r~
> 

Oh, I see. I was first worried about direct block chaining with
goto_tb, but it turned out that translator_use_goto_tb() prevented it.

tb_lookup() skips get_page_addr_code() if tb is found in tb_jmp_cache.
I assume it's a bug?



Re: [PATCH v2] hw/smbios: support for type 8 (port connector)

2022-08-12 Thread Michael S. Tsirkin
On Fri, Aug 12, 2022 at 03:51:53PM +0200, Hal Martin wrote:
> PATCH v1: add support for SMBIOS type 8 to qemu
> PATCH v2: incorporate patch v1 feedback and add smbios type=8 to qemu-options

history after --- pls

> internal_reference: internal reference designator
> external_reference: external reference designator
> connector_type: hex value for port connector type (see SMBIOS 7.9.2)
> port_type: hex value for port type (see SMBIOS 7.9.3)
> 
> After studying various vendor implementationsi (Dell, Lenovo, MSI),
> the value of internal connector type was hard-coded to 0x0 (None).
> 
> Example usage:
> -smbios 
> type=8,internal_reference=JUSB1,external_reference=USB1,connector_type=0x12,port_type=0x10
>  \
> -smbios type=8,internal_reference=JAUD1,external_reference="Audio 
> Jack",connector_type=0x1f,port_type=0x1d \
> -smbios 
> type=8,internal_reference=LAN,external_reference=Ethernet,connector_type=0x0b,port_type=0x1f
>  \
> -smbios 
> type=8,internal_reference=PS2,external_reference=Mouse,connector_type=0x0f,port_type=0x0e
>  \
> -smbios 
> type=8,internal_reference=PS2,external_reference=Keyboard,connector_type=0x0f,port_type=0x0d
> 
> 
> Signed-off-by: Hal Martin 

We are in freeze, I tagged this for after the release.
Just to make sure pls ping me after the release if possible.



> ---
>  hw/smbios/smbios.c   | 63 
>  include/hw/firmware/smbios.h | 10 ++
>  qemu-options.hx  |  2 ++
>  3 files changed, 75 insertions(+)
> 
> diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
> index 60349ee402..578cae0f0a 100644
> --- a/hw/smbios/smbios.c
> +++ b/hw/smbios/smbios.c
> @@ -111,6 +111,13 @@ static struct {
>  .processor_id = 0,
>  };
>  
> +struct type8_instance {
> +const char *internal_reference, *external_reference;
> +uint8_t connector_type, port_type;
> +QTAILQ_ENTRY(type8_instance) next;
> +};
> +static QTAILQ_HEAD(, type8_instance) type8 = QTAILQ_HEAD_INITIALIZER(type8);
> +
>  static struct {
>  size_t nvalues;
>  char **values;
> @@ -337,6 +344,29 @@ static const QemuOptDesc qemu_smbios_type4_opts[] = {
>  { /* end of list */ }
>  };
>  
> +static const QemuOptDesc qemu_smbios_type8_opts[] = {
> +{
> +.name = "internal_reference",
> +.type = QEMU_OPT_STRING,
> +.help = "internal reference designator",
> +},
> +{
> +.name = "external_reference",
> +.type = QEMU_OPT_STRING,
> +.help = "external reference designator",
> +},
> +{
> +.name = "connector_type",
> +.type = QEMU_OPT_NUMBER,
> +.help = "connector type",
> +},
> +{
> +.name = "port_type",
> +.type = QEMU_OPT_NUMBER,
> +.help = "port type",
> +},
> +};
> +
>  static const QemuOptDesc qemu_smbios_type11_opts[] = {
>  {
>  .name = "value",
> @@ -718,6 +748,26 @@ static void smbios_build_type_4_table(MachineState *ms, 
> unsigned instance)
>  smbios_type4_count++;
>  }
>  
> +static void smbios_build_type_8_table(void)
> +{
> +unsigned instance = 0;
> +struct type8_instance *t8;
> +
> +QTAILQ_FOREACH(t8, , next) {
> +SMBIOS_BUILD_TABLE_PRE(8, T0_BASE + instance, true);
> +
> +SMBIOS_TABLE_SET_STR(8, internal_reference_str, 
> t8->internal_reference);
> +SMBIOS_TABLE_SET_STR(8, external_reference_str, 
> t8->external_reference);
> +/* most vendors seem to set this to None */
> +t->internal_connector_type = 0x0;
> +t->external_connector_type = t8->connector_type;
> +t->port_type = t8->port_type;
> +
> +SMBIOS_BUILD_TABLE_POST;
> +instance++;
> +}
> +}
> +
>  static void smbios_build_type_11_table(void)
>  {
>  char count_str[128];
> @@ -1030,6 +1080,7 @@ void smbios_get_tables(MachineState *ms,
>  smbios_build_type_4_table(ms, i);
>  }
>  
> +smbios_build_type_8_table();
>  smbios_build_type_11_table();
>  
>  #define MAX_DIMM_SZ (16 * GiB)
> @@ -1346,6 +1397,18 @@ void smbios_entry_add(QemuOpts *opts, Error **errp)
> UINT16_MAX);
>  }
>  return;
> +case 8:
> +if (!qemu_opts_validate(opts, qemu_smbios_type8_opts, errp)) {
> +return;
> +}
> +struct type8_instance *t;
> +t = g_new0(struct type8_instance, 1);
> +save_opt(>internal_reference, opts, "internal_reference");
> +save_opt(>external_reference, opts, "external_reference");
> +t->connector_type = qemu_opt_get_number(opts, "connector_type", 
> 0);
> +t->port_type = qemu_opt_get_number(opts, "port_type", 0);
> +QTAILQ_INSERT_TAIL(, t, next);
> +return;
>  case 11:
>  if (!qemu_opts_validate(opts, qemu_smbios_type11_opts, errp)) {
>  return;
> diff --git a/include/hw/firmware/smbios.h 

Re: [PATCH v2 8/8] parallels: Replace qemu_co_mutex_lock by WITH_QEMU_LOCK_GUARD

2022-08-12 Thread Denis V. Lunev

On 11.08.2022 17:00, Alexander Ivanov wrote:

Replace the way we use mutex in parallels_co_check() for more clean code.

I think that "cleaness" is the same, but new code would be just shorter ;)
or less error prone.


v2: Fix an incorrect usage of WITH_QEMU_LOCK_GUARD.

Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 26 --
  1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index d0364182bb..e124a8bb7d 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -553,24 +553,22 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
  BDRVParallelsState *s = bs->opaque;
  int ret;
  
-qemu_co_mutex_lock(>lock);

+WITH_QEMU_LOCK_GUARD(>lock) {
+parallels_check_unclean(bs, res, fix);
  
-parallels_check_unclean(bs, res, fix);

+ret = parallels_check_outside_image(bs, res, fix);
+if (ret < 0) {
+return ret;
+}
  
-ret = parallels_check_outside_image(bs, res, fix);

-if (ret < 0) {
-goto out;
-}
-
-ret = parallels_check_leak(bs, res, fix);
-if (ret < 0) {
-goto out;
-}
+ret = parallels_check_leak(bs, res, fix);
+if (ret < 0) {
+return ret;
+}
  
-parallels_collect_statistics(bs, res, fix);

+parallels_collect_statistics(bs, res, fix);
  
-out:

-qemu_co_mutex_unlock(>lock);
+}
  
  ret = bdrv_co_flush(bs);

  if (ret < 0) {





Re: [PATCH for-7.2 v4 05/11] ppc/pnv: turn chip8->phbs[] into a PnvPHB* array

2022-08-12 Thread Frederic Barrat




On 11/08/2022 18:39, Daniel Henrique Barboza wrote:

When enabling user created PHBs (a change reverted by commit 9c10d86fee)
we were handling PHBs created by default versus by the user in different
manners. The only difference between these PHBs is that one will have a
valid phb3->chip that is assigned during pnv_chip_power8_realize(),
while the user created needs to search which chip it belongs to.

Aside from that there shouldn't be any difference. Making the default
PHBs behave in line with the user created ones will make it easier to
re-introduce them later on. It will also make the code easier to follow
since we are dealing with them in equal manner.

The first step is to turn chip8->phbs[] into a PnvPHB3 pointer array.
This will allow us to assign user created PHBs into it later on. The way
we initilize the default case is now more in line with that would happen
with the user created case: the object is created, parented by the chip
because pnv_xscom_dt() relies on it, and then assigned to the array.

Signed-off-by: Daniel Henrique Barboza 
---



Reviewed-by: Frederic Barrat 

  Fred



  hw/ppc/pnv.c | 27 ++-
  include/hw/ppc/pnv.h |  6 +-
  2 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 2deaac17f7..e53e9e297d 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -294,6 +294,13 @@ static void pnv_dt_icp(PnvChip *chip, void *fdt, uint32_t 
pir,
  Object *pnv_chip_add_phb(PnvChip *chip, PnvPHB *phb, Error **errp)
  {
  if (phb->version == 3) {
+Pnv8Chip *chip8 = PNV8_CHIP(chip);
+
+phb->chip = chip;
+
+chip8->phbs[chip8->num_phbs] = phb;
+chip8->num_phbs++;
+
  return OBJECT(chip);
  } else {
  /* phb4 support will be added later */
@@ -681,7 +688,7 @@ static void pnv_chip_power8_pic_print_info(PnvChip *chip, 
Monitor *mon)
  ics_pic_print_info(>psi.ics, mon);
  
  for (i = 0; i < chip8->num_phbs; i++) {

-PnvPHB *phb = >phbs[i];
+PnvPHB *phb = chip8->phbs[i];
  PnvPHB3 *phb3 = PNV_PHB3(phb->backend);
  
  pnv_phb3_msi_pic_print_info(>msis, mon);

@@ -1174,7 +1181,17 @@ static void pnv_chip_power8_instance_init(Object *obj)
  chip8->num_phbs = pcc->num_phbs;
  
  for (i = 0; i < chip8->num_phbs; i++) {

-object_initialize_child(obj, "phb[*]", >phbs[i], TYPE_PNV_PHB);
+Object *phb = object_new(TYPE_PNV_PHB);
+
+/*
+ * We need the chip to parent the PHB to allow the DT
+ * to build correctly (via pnv_xscom_dt()).
+ *
+ * TODO: the PHB should be parented by a PEC device that, at
+ * this moment, is not modelled powernv8/phb3.
+ */
+object_property_add_child(obj, "phb[*]", phb);
+chip8->phbs[i] = PNV_PHB(phb);
  }
  
  }

@@ -1290,7 +1307,7 @@ static void pnv_chip_power8_realize(DeviceState *dev, 
Error **errp)
  
  /* PHB controllers */

  for (i = 0; i < chip8->num_phbs; i++) {
-PnvPHB *phb = >phbs[i];
+PnvPHB *phb = chip8->phbs[i];
  
  object_property_set_int(OBJECT(phb), "index", i, _fatal);

  object_property_set_int(OBJECT(phb), "chip-id", chip->chip_id,
@@ -1958,7 +1975,7 @@ static ICSState *pnv_ics_get(XICSFabric *xi, int irq)
  }
  
  for (j = 0; j < chip8->num_phbs; j++) {

-PnvPHB *phb = >phbs[j];
+PnvPHB *phb = chip8->phbs[j];
  PnvPHB3 *phb3 = PNV_PHB3(phb->backend);
  
  if (ics_valid_irq(>lsis, irq)) {

@@ -1997,7 +2014,7 @@ static void pnv_ics_resend(XICSFabric *xi)
  ics_resend(>psi.ics);
  
  for (j = 0; j < chip8->num_phbs; j++) {

-PnvPHB *phb = >phbs[j];
+PnvPHB *phb = chip8->phbs[j];
  PnvPHB3 *phb3 = PNV_PHB3(phb->backend);
  
  ics_resend(>lsis);

diff --git a/include/hw/ppc/pnv.h b/include/hw/ppc/pnv.h
index 781d0acffa..49433281d7 100644
--- a/include/hw/ppc/pnv.h
+++ b/include/hw/ppc/pnv.h
@@ -81,7 +81,11 @@ struct Pnv8Chip {
  PnvHomer homer;
  
  #define PNV8_CHIP_PHB3_MAX 4

-PnvPHB   phbs[PNV8_CHIP_PHB3_MAX];
+/*
+ * The array is used to allow quick access to the phbs by
+ * pnv_ics_get_child() and pnv_ics_resend_child().
+ */
+PnvPHB   *phbs[PNV8_CHIP_PHB3_MAX];
  uint32_t num_phbs;
  
  XICSFabric*xics;




Re: qemu-system-aarch64: Failed to retrieve host CPU features

2022-08-12 Thread Marc Zyngier
Hi Peter,

On Fri, 12 Aug 2022 10:25:55 +0100,
Peter Maydell  wrote:
> 
> I've added some more relevant mailing lists to the cc.
> 
> On Fri, 12 Aug 2022 at 09:45, Vitaly Chikunov  wrote:
> > On Fri, Aug 12, 2022 at 05:14:27AM +0300, Vitaly Chikunov wrote:
> > > I noticed that we starting to get many errors like this:
> > >
> > >   qemu-system-aarch64: Failed to retrieve host CPU features
> > >
> > > Where many is 1-2% per run, depends on host, host is Kunpeng-920, and
> > > Linux kernel is v5.15.59, but it started to appear months before that.
> > >
> > > strace shows in erroneous case:
> > >
> > >   1152244 ioctl(9, KVM_CREATE_VM, 0x30)   = -1 EINTR (Interrupted system 
> > > call)
> > >
> > > And I see in target/arm/kvm.c:kvm_arm_create_scratch_host_vcpu:
> > >
> > > vmfd = ioctl(kvmfd, KVM_CREATE_VM, max_vm_pa_size);
> > > if (vmfd < 0) {
> > > goto err;
> > > }
> > >
> > > Maybe it should restart ioctl on EINTR?
> > >
> > > I don't see EINTR documented in ioctl(2) nor in Linux'
> > > Documentation/virt/kvm/api.rst for KVM_CREATE_VM, but for KVM_RUN it
> > > says "an unmasked signal is pending".
> >
> > I am suggested that almost any blocking syscall could return EINTR, so I
> > checked the strace log and it does not show evidence of arriving a signal,
> > the log ends like this:
> >
> >   1152244 openat(AT_FDCWD, "/dev/kvm", O_RDWR|O_CLOEXEC) = 9
> >   1152244 ioctl(9, KVM_CHECK_EXTENSION, KVM_CAP_ARM_VM_IPA_SIZE) = 48
> >   1152244 ioctl(9, KVM_CREATE_VM, 0x30)   = -1 EINTR (Interrupted system 
> > call)
> >   1152244 close(9)= 0
> >   1152244 newfstatat(2, "", {st_dev=makedev(0, 0xd), st_ino=57869925, 
> > st_mode=S_IFIFO|0600, st_nlink=1, st_uid=517, st_gid=517, st_blksize=4096, 
> > st_blocks=0, st_size=0, st_atime=1660268019 /* 
> > 2022-08-12T01:33:39.850436293+ */, st_atime_nsec=850436293, 
> > st_mtime=1660268019 /* 2022-08-12T01:33:39.850436293+ */, 
> > st_mtime_nsec=850436293, st_ctime=1660268019 /* 
> > 2022-08-12T01:33:39.850436293+ */, st_ctime_nsec=850436293}, 
> > AT_EMPTY_PATH) = 0
> >   1152244 write(2, "qemu-system-aarch64: Failed to r"..., 58) = 58
> >   1152244 exit_group(1)   = ?
> >   1152245 <... clock_nanosleep resumed> ) = ?
> >   1152245 +++ exited with 1 +++
> >   1152244 +++ exited with 1 +++
> 
> KVM folks: should we expect that KVM_CREATE_VM might fail EINTR
> and need retrying?

In general, yes. But for this particular one, this is pretty odd.

The only path I can so far see that would match this behaviour is if
mm_take_all_locks() (called from __mmu_notifier_register()) was
getting interrupted by a signal (I'm looking at a 5.19-ish kernel,
which may slightly differ from the 5.15 mentioned above).

But as Vitaly points out, it doesn't seem to be a signal delivered
here.

Vitaly: could you please share your exact test case (full qemu command
line), and instrument your kernel to see if mm_take_all_locks() is the
one failing?

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.



Re: [PATCH for-7.2 v4 04/11] ppc/pnv: add helpers for pnv-phb user devices

2022-08-12 Thread Frederic Barrat




On 11/08/2022 18:39, Daniel Henrique Barboza wrote:

pnv_parent_qom_fixup() and pnv_parent_bus_fixup() are versions of the
helpers that were reverted by commit 9c10d86fee "ppc/pnv: Remove
user-created PHB{3,4,5} devices". They are needed to amend the QOM and
bus hierarchies of user created pnv-phbs, matching them with default
pnv-phbs.

A new helper pnv_phb_user_device_init() is created to handle
user-created devices setup. We're going to call it inside
pnv_phb_realize() in case we're realizing an user created device. This
will centralize all user device realated in a single spot, leaving the
realize functions of the phb3/phb4 backends untouched.

Another helper called pnv_chip_add_phb() was added to handle the
particularities of each chip version when adding a new PHB.

Signed-off-by: Daniel Henrique Barboza 
---



Just a minor typo in a comment below.
Reviewed-by: Frederic Barrat 



diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index f9e5a3d248..2deaac17f7 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -281,6 +281,26 @@ static void pnv_dt_icp(PnvChip *chip, void *fdt, uint32_t 
pir,
  g_free(reg);
  }
  
+/*

+ * Adds a PnvPHB to the chip. Returns the parent obj of the
+ * PHB which varies with each version (phb version 3 is parented
+ * by the chip, version 4 and 4 are parented by the PEC



typo-^

  Fred



+ * device).
+ *
+ * TODO: for version 3 we're still parenting the PHB with the
+ * chip. We should parent with a (so far not implemented)
+ * PHB3 PEC device.
+ */
+Object *pnv_chip_add_phb(PnvChip *chip, PnvPHB *phb, Error **errp)
+{
+if (phb->version == 3) {
+return OBJECT(chip);
+} else {
+/* phb4 support will be added later */
+return NULL;
+}
+}
+
  static void pnv_chip_power8_dt_populate(PnvChip *chip, void *fdt)
  {
  static const char compat[] = "ibm,power8-xscom\0ibm,xscom";
diff --git a/include/hw/ppc/pnv.h b/include/hw/ppc/pnv.h
index 033d907287..781d0acffa 100644
--- a/include/hw/ppc/pnv.h
+++ b/include/hw/ppc/pnv.h
@@ -231,6 +231,7 @@ struct PnvMachineState {
  };
  
  PnvChip *pnv_get_chip(PnvMachineState *pnv, uint32_t chip_id);

+Object *pnv_chip_add_phb(PnvChip *chip, PnvPHB *phb, Error **errp);
  
  #define PNV_FDT_ADDR  0x0100

  #define PNV_TIMEBASE_FREQ 51200ULL




Re: [PATCH v2 7/8] parallels: Move statistic collection to a separate function

2022-08-12 Thread Denis V. Lunev

On 11.08.2022 17:00, Alexander Ivanov wrote:

v2: Move fragmentation counting code to this function too.


same note here about ChnageLog and motivation


Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 54 +++
  1 file changed, 31 insertions(+), 23 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 8737eadfb4..d0364182bb 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -518,48 +518,56 @@ static int parallels_check_leak(BlockDriverState *bs,
  return 0;
  }
  
-static int coroutine_fn parallels_co_check(BlockDriverState *bs,

-   BdrvCheckResult *res,
-   BdrvCheckMode fix)
+static void parallels_collect_statistics(BlockDriverState *bs,
+ BdrvCheckResult *res,
+ BdrvCheckMode fix)
  {
  BDRVParallelsState *s = bs->opaque;
-int64_t prev_off;
-int ret;
+int64_t off, prev_off;
  uint32_t i;
  
-

-qemu_co_mutex_lock(>lock);
-
-parallels_check_unclean(bs, res, fix);
-
-ret = parallels_check_outside_image(bs, res, fix);
-if (ret < 0) {
-goto out;
-}
-
-ret = parallels_check_leak(bs, res, fix);
-if (ret < 0) {
-goto out;
-}
-
  res->bfi.total_clusters = s->bat_size;
  res->bfi.compressed_clusters = 0; /* compression is not supported */
  
  prev_off = 0;

  for (i = 0; i < s->bat_size; i++) {
-int64_t off = bat2sect(s, i) << BDRV_SECTOR_BITS;
+off = bat2sect(s, i) << BDRV_SECTOR_BITS;
  if (off == 0) {
  prev_off = 0;
  continue;
  }
  
-res->bfi.allocated_clusters++;

-
  if (prev_off != 0 && (prev_off + s->cluster_size) != off) {
  res->bfi.fragmented_clusters++;
  }
+
  prev_off = off;
+res->bfi.allocated_clusters++;
  }
+}
+
+static int coroutine_fn parallels_co_check(BlockDriverState *bs,
+   BdrvCheckResult *res,
+   BdrvCheckMode fix)
+{
+BDRVParallelsState *s = bs->opaque;
+int ret;
+
+qemu_co_mutex_lock(>lock);
+
+parallels_check_unclean(bs, res, fix);
+
+ret = parallels_check_outside_image(bs, res, fix);
+if (ret < 0) {
+goto out;
+}
+
+ret = parallels_check_leak(bs, res, fix);
+if (ret < 0) {
+goto out;
+}
+
+parallels_collect_statistics(bs, res, fix);
  
  out:

  qemu_co_mutex_unlock(>lock);





Re: [PATCH v2 6/8] parallels: Move check of leaks to a separate function

2022-08-12 Thread Denis V. Lunev

On 11.08.2022 17:00, Alexander Ivanov wrote:

v2: No changes.

same notes about motivation, changelog as before


Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 85 +--
  1 file changed, 52 insertions(+), 33 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 12104ba5ad..8737eadfb4 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -469,14 +469,13 @@ static int parallels_check_outside_image(BlockDriverState 
*bs,
  return 0;
  }
  
-static int coroutine_fn parallels_co_check(BlockDriverState *bs,

-   BdrvCheckResult *res,
-   BdrvCheckMode fix)
+static int parallels_check_leak(BlockDriverState *bs,
+BdrvCheckResult *res,
+BdrvCheckMode fix)
  {
  BDRVParallelsState *s = bs->opaque;
-int64_t size, prev_off, high_off;
-int ret;
-uint32_t i;
+int64_t size, off, high_off, count;
+int i, ret;
  
  size = bdrv_getlength(bs->file->bs);

  if (size < 0) {
@@ -484,41 +483,16 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
  return size;
  }
  
-qemu_co_mutex_lock(>lock);

-
-parallels_check_unclean(bs, res, fix);
-
-ret = parallels_check_outside_image(bs, res, fix);
-if (ret < 0) {
-goto out;
-}
-
-res->bfi.total_clusters = s->bat_size;
-res->bfi.compressed_clusters = 0; /* compression is not supported */
-
  high_off = 0;
-prev_off = 0;
  for (i = 0; i < s->bat_size; i++) {
-int64_t off = bat2sect(s, i) << BDRV_SECTOR_BITS;
-if (off == 0) {
-prev_off = 0;
-continue;
-}
-
-res->bfi.allocated_clusters++;
+off = bat2sect(s, i) << BDRV_SECTOR_BITS;
  if (off > high_off) {
  high_off = off;
  }
-
-if (prev_off != 0 && (prev_off + s->cluster_size) != off) {
-res->bfi.fragmented_clusters++;
-}
-prev_off = off;
  }
  
  res->image_end_offset = high_off + s->cluster_size;

  if (size > res->image_end_offset) {
-int64_t count;
  count = DIV_ROUND_UP(size - res->image_end_offset, s->cluster_size);
  fprintf(stderr, "%s space leaked at the end of the image %" PRId64 
"\n",
  fix & BDRV_FIX_LEAKS ? "Repairing" : "ERROR",
@@ -536,11 +510,56 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
  if (ret < 0) {
  error_report_err(local_err);
  res->check_errors++;
-goto out;
+return ret;
  }
  res->leaks_fixed += count;
  }
  }
+return 0;
+}
+
+static int coroutine_fn parallels_co_check(BlockDriverState *bs,
+   BdrvCheckResult *res,
+   BdrvCheckMode fix)
+{
+BDRVParallelsState *s = bs->opaque;
+int64_t prev_off;
+int ret;
+uint32_t i;
+
+


please remove extra empty line. This looks like an accident

+qemu_co_mutex_lock(>lock);
+
+parallels_check_unclean(bs, res, fix);
+
+ret = parallels_check_outside_image(bs, res, fix);
+if (ret < 0) {
+goto out;
+}
+
+ret = parallels_check_leak(bs, res, fix);
+if (ret < 0) {
+goto out;
+}
+
+res->bfi.total_clusters = s->bat_size;
+res->bfi.compressed_clusters = 0; /* compression is not supported */
+
+prev_off = 0;
+for (i = 0; i < s->bat_size; i++) {
+int64_t off = bat2sect(s, i) << BDRV_SECTOR_BITS;
+if (off == 0) {
+prev_off = 0;
+continue;
+}
+
+res->bfi.allocated_clusters++;
+
+if (prev_off != 0 && (prev_off + s->cluster_size) != off) {
+res->bfi.fragmented_clusters++;
+}
+prev_off = off;
+}
  
  out:

  qemu_co_mutex_unlock(>lock);





Re: [PULL 02/28] target/arm: Add coproc parameter to syn_fp_access_trap

2022-08-12 Thread Peter Maydell
On Fri, 10 Jun 2022 at 17:07, Peter Maydell  wrote:
>
> From: Richard Henderson 
>
> With ARMv8, this field is always RES0.
> With ARMv7, targeting EL2 and TA=0, it is always 0xA.

I was just looking at this change again because we still
have the loose end of syn_simd_access_trap() not being used,
and I realized that the claim in this commit message and the
comment isn't right. The "RES0 or fill in TA/copro fields"
test is not v8 vs v7, but "are we reporting this syndrome
to AArch64 in ESR_ELx or to AArch32 in HSR?".

I filed https://gitlab.com/qemu-project/qemu/-/issues/1153
to make a note of this since we might not get around to
fixing this for a while, given it's not very important.

thanks
-- PMM



Re: [PATCH v2 5/8] parallels: Move check of cluster outside image to a separate function

2022-08-12 Thread Denis V. Lunev

On 11.08.2022 17:00, Alexander Ivanov wrote:

v2: Move unrelated helper parallels_set_bat_entry creation to
 a separate patch.

same notes as for previous patch



Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 48 ++-
  1 file changed, 35 insertions(+), 13 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index c53b2810cf..12104ba5ad 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -439,6 +439,36 @@ static void parallels_check_unclean(BlockDriverState *bs,
  }
  }
  
+static int parallels_check_outside_image(BlockDriverState *bs,

+ BdrvCheckResult *res,
+ BdrvCheckMode fix)
+{
+BDRVParallelsState *s = bs->opaque;
+uint32_t i;
+int64_t off, size;
+
+size = bdrv_getlength(bs->file->bs);
+if (size < 0) {
+res->check_errors++;
+return size;
+}
+
+for (i = 0; i < s->bat_size; i++) {
+off = bat2sect(s, i) << BDRV_SECTOR_BITS;
+if (off > size) {
+fprintf(stderr, "%s cluster %u is outside image\n",
+fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i);
+res->corruptions++;
+if (fix & BDRV_FIX_ERRORS) {
+parallels_set_bat_entry(s, i, 0);
+res->corruptions_fixed++;
+}
+}
+}
+
+return 0;
+}
+
  static int coroutine_fn parallels_co_check(BlockDriverState *bs,
 BdrvCheckResult *res,
 BdrvCheckMode fix)
@@ -458,6 +488,11 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
  
  parallels_check_unclean(bs, res, fix);
  
+ret = parallels_check_outside_image(bs, res, fix);

+if (ret < 0) {
+goto out;
+}
+
  res->bfi.total_clusters = s->bat_size;
  res->bfi.compressed_clusters = 0; /* compression is not supported */
  
@@ -470,19 +505,6 @@ static int coroutine_fn parallels_co_check(BlockDriverState *bs,

  continue;
  }
  
-/* cluster outside the image */

-if (off > size) {
-fprintf(stderr, "%s cluster %u is outside image\n",
-fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i);
-res->corruptions++;
-if (fix & BDRV_FIX_ERRORS) {
-prev_off = 0;
-parallels_set_bat_entry(s, i, 0);
-res->corruptions_fixed++;
-continue;
-}
-}
-
  res->bfi.allocated_clusters++;
  if (off > high_off) {
  high_off = off;





Re: [PATCH for-7.2 v4 01/11] ppc/pnv: add phb-id/chip-id PnvPHB3RootBus properties

2022-08-12 Thread Frederic Barrat




On 11/08/2022 18:39, Daniel Henrique Barboza wrote:

We rely on the phb-id and chip-id, which are PHB properties, to assign
chassis and slot to the root port. For default devices this is no big
deal: the root port is being created under pnv_phb_realize() and the
values are being passed on via the 'index' and 'chip-id' of the
pnv_phb_attach_root_port() helper.

If we want to implement user created root ports we have a problem. The
user created root port will not be aware of which PHB it belongs to,
unless we're willing to violate QOM best practices and access the PHB
via dev->parent_bus->parent. What we can do is to access the root bus
parent bus.

Since we're already assigning the root port as QOM child of the bus, and
the bus is initiated using PHB properties, let's add phb-id and chip-id
as properties of the bus. This will allow us trivial access to them, for
both user-created and default root ports, without doing anything too
shady with QOM.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Daniel Henrique Barboza 
---



Reviewed-by: Frederic Barrat 

  Fred



  hw/pci-host/pnv_phb3.c | 50 ++
  include/hw/pci-host/pnv_phb3.h |  9 +-
  2 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/hw/pci-host/pnv_phb3.c b/hw/pci-host/pnv_phb3.c
index d4c04a281a..af8575c007 100644
--- a/hw/pci-host/pnv_phb3.c
+++ b/hw/pci-host/pnv_phb3.c
@@ -1006,6 +1006,11 @@ void pnv_phb3_bus_init(DeviceState *dev, PnvPHB3 *phb)
   >pci_mmio, >pci_io,
   0, 4, TYPE_PNV_PHB3_ROOT_BUS);
  
+object_property_set_int(OBJECT(pci->bus), "phb-id", phb->phb_id,

+_abort);
+object_property_set_int(OBJECT(pci->bus), "chip-id", phb->chip_id,
+_abort);
+
  pci_setup_iommu(pci->bus, pnv_phb3_dma_iommu, phb);
  }
  
@@ -1105,10 +1110,55 @@ static const TypeInfo pnv_phb3_type_info = {

  .instance_init = pnv_phb3_instance_init,
  };
  
+static void pnv_phb3_root_bus_get_prop(Object *obj, Visitor *v,

+   const char *name,
+   void *opaque, Error **errp)
+{
+PnvPHB3RootBus *bus = PNV_PHB3_ROOT_BUS(obj);
+uint64_t value = 0;
+
+if (strcmp(name, "phb-id") == 0) {
+value = bus->phb_id;
+} else {
+value = bus->chip_id;
+}
+
+visit_type_size(v, name, , errp);
+}
+
+static void pnv_phb3_root_bus_set_prop(Object *obj, Visitor *v,
+   const char *name,
+   void *opaque, Error **errp)
+
+{
+PnvPHB3RootBus *bus = PNV_PHB3_ROOT_BUS(obj);
+uint64_t value;
+
+if (!visit_type_size(v, name, , errp)) {
+return;
+}
+
+if (strcmp(name, "phb-id") == 0) {
+bus->phb_id = value;
+} else {
+bus->chip_id = value;
+}
+}
+
  static void pnv_phb3_root_bus_class_init(ObjectClass *klass, void *data)
  {
  BusClass *k = BUS_CLASS(klass);
  
+object_class_property_add(klass, "phb-id", "int",

+  pnv_phb3_root_bus_get_prop,
+  pnv_phb3_root_bus_set_prop,
+  NULL, NULL);
+
+object_class_property_add(klass, "chip-id", "int",
+  pnv_phb3_root_bus_get_prop,
+  pnv_phb3_root_bus_set_prop,
+  NULL, NULL);
+
  /*
   * PHB3 has only a single root complex. Enforce the limit on the
   * parent bus
diff --git a/include/hw/pci-host/pnv_phb3.h b/include/hw/pci-host/pnv_phb3.h
index bff69201d9..4854f6d2f6 100644
--- a/include/hw/pci-host/pnv_phb3.h
+++ b/include/hw/pci-host/pnv_phb3.h
@@ -104,9 +104,16 @@ struct PnvPBCQState {
  };
  
  /*

- * PHB3 PCIe Root port
+ * PHB3 PCIe Root Bus
   */
  #define TYPE_PNV_PHB3_ROOT_BUS "pnv-phb3-root"
+struct PnvPHB3RootBus {
+PCIBus parent;
+
+uint32_t chip_id;
+uint32_t phb_id;
+};
+OBJECT_DECLARE_SIMPLE_TYPE(PnvPHB3RootBus, PNV_PHB3_ROOT_BUS)
  
  /*

   * PHB3 PCIe Host Bridge for PowerNV machines (POWER8)




Re: [PATCH v2 4/8] parallels: Move check of unclean image to a separate function

2022-08-12 Thread Denis V. Lunev

On 11.08.2022 17:00, Alexander Ivanov wrote:

v2: Revert the condition with s->header_unclean.

same comment about change log as previously

And commit message misses motivation part, why we are
doing this rework. What is the goal of this change?

The code part is clean.


Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 31 +--
  1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 6879ea4597..c53b2810cf 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -419,6 +419,25 @@ static coroutine_fn int 
parallels_co_readv(BlockDriverState *bs,
  return ret;
  }
  
+static void parallels_check_unclean(BlockDriverState *bs,

+BdrvCheckResult *res,
+BdrvCheckMode fix)
+{
+BDRVParallelsState *s = bs->opaque;
+
+if (!s->header_unclean) {
+return;
+}
+
+fprintf(stderr, "%s image was not closed correctly\n",
+fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR");
+res->corruptions++;
+if (fix & BDRV_FIX_ERRORS) {
+/* parallels_close will do the job right */
+res->corruptions_fixed++;
+s->header_unclean = false;
+}
+}
  
  static int coroutine_fn parallels_co_check(BlockDriverState *bs,

 BdrvCheckResult *res,
@@ -436,16 +455,8 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
  }
  
  qemu_co_mutex_lock(>lock);

-if (s->header_unclean) {
-fprintf(stderr, "%s image was not closed correctly\n",
-fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR");
-res->corruptions++;
-if (fix & BDRV_FIX_ERRORS) {
-/* parallels_close will do the job right */
-res->corruptions_fixed++;
-s->header_unclean = false;
-}
-}
+
+parallels_check_unclean(bs, res, fix);
  
  res->bfi.total_clusters = s->bat_size;

  res->bfi.compressed_clusters = 0; /* compression is not supported */





Re: [PATCH for-7.2 v4 03/11] ppc/pnv: set root port chassis and slot using Bus properties

2022-08-12 Thread Frederic Barrat




On 11/08/2022 18:39, Daniel Henrique Barboza wrote:

For default root ports we have a way of accessing chassis and slot,
before root_port_realize(), via pnv_phb_attach_root_port(). For the
future user created root ports this won't be the case: we can't use
this helper because we don't have access to the PHB phb-id/chip-id
values.

In earlier patches we've added phb-id and chip-id to pnv-phb-root-bus
objects. We're now able to use the bus to retrieve them. The bus is
reachable for both user created and default devices, so we're changing
all the code paths. This also allow us to validate these changes with
the existing default devices.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Daniel Henrique Barboza 
---



Reviewed-by: Frederic Barrat 

  Fred



  hw/pci-host/pnv_phb.c | 25 -
  1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/hw/pci-host/pnv_phb.c b/hw/pci-host/pnv_phb.c
index c47ed92462..826c0c144e 100644
--- a/hw/pci-host/pnv_phb.c
+++ b/hw/pci-host/pnv_phb.c
@@ -25,21 +25,19 @@
   * QOM id. 'chip_id' is going to be used as PCIE chassis for the
   * root port.
   */
-static void pnv_phb_attach_root_port(PCIHostState *pci, int index, int chip_id)
+static void pnv_phb_attach_root_port(PCIHostState *pci)
  {
  PCIDevice *root = pci_new(PCI_DEVFN(0, 0), TYPE_PNV_PHB_ROOT_PORT);
-g_autofree char *default_id = g_strdup_printf("%s[%d]",
-  TYPE_PNV_PHB_ROOT_PORT,
-  index);
  const char *dev_id = DEVICE(root)->id;
+g_autofree char *default_id = NULL;
+int index;
+
+index = object_property_get_int(OBJECT(pci->bus), "phb-id", _fatal);
+default_id = g_strdup_printf("%s[%d]", TYPE_PNV_PHB_ROOT_PORT, index);
  
  object_property_add_child(OBJECT(pci->bus), dev_id ? dev_id : default_id,

OBJECT(root));
  
-/* Set unique chassis/slot values for the root port */

-qdev_prop_set_uint8(DEVICE(root), "chassis", chip_id);
-qdev_prop_set_uint16(DEVICE(root), "slot", index);
-
  pci_realize_and_unref(root, pci->bus, _fatal);
  }
  
@@ -93,7 +91,7 @@ static void pnv_phb_realize(DeviceState *dev, Error **errp)

  pnv_phb4_bus_init(dev, PNV_PHB4(phb->backend));
  }
  
-pnv_phb_attach_root_port(pci, phb->phb_id, phb->chip_id);

+pnv_phb_attach_root_port(pci);
  }
  
  static const char *pnv_phb_root_bus_path(PCIHostState *host_bridge,

@@ -162,9 +160,18 @@ static void pnv_phb_root_port_realize(DeviceState *dev, 
Error **errp)
  {
  PCIERootPortClass *rpc = PCIE_ROOT_PORT_GET_CLASS(dev);
  PnvPHBRootPort *phb_rp = PNV_PHB_ROOT_PORT(dev);
+PCIBus *bus = PCI_BUS(qdev_get_parent_bus(dev));
  PCIDevice *pci = PCI_DEVICE(dev);
  uint16_t device_id = 0;
  Error *local_err = NULL;
+int chip_id, index;
+
+chip_id = object_property_get_int(OBJECT(bus), "chip-id", _fatal);
+index = object_property_get_int(OBJECT(bus), "phb-id", _fatal);
+
+/* Set unique chassis/slot values for the root port */
+qdev_prop_set_uint8(dev, "chassis", chip_id);
+qdev_prop_set_uint16(dev, "slot", index);
  
  rpc->parent_realize(dev, _err);

  if (local_err) {




Re: [PATCH for-7.2 v4 02/11] ppc/pnv: add phb-id/chip-id PnvPHB4RootBus properties

2022-08-12 Thread Frederic Barrat




On 11/08/2022 18:39, Daniel Henrique Barboza wrote:

The same rationale provided in the PHB3 bus case applies here.

Note: we could have merged both buses in a single object, like we did
with the root ports, and spare some boilerplate. The reason we opted to
preserve both buses objects is twofold:

- there's not user side advantage in doing so. Unifying the root ports
presents a clear user QOL change when we enable user created devices back.
The buses objects, aside from having a different QOM name, is transparent
to the user;

- we leave a door opened in case we want to increase the root port limit
for phb4/5 later on without having to deal with phb3 code.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Daniel Henrique Barboza 
---



Reviewed-by: Frederic Barrat 

  Fred



  hw/pci-host/pnv_phb4.c | 51 ++
  include/hw/pci-host/pnv_phb4.h | 10 +++
  2 files changed, 61 insertions(+)

diff --git a/hw/pci-host/pnv_phb4.c b/hw/pci-host/pnv_phb4.c
index b98c394713..824e1a73fb 100644
--- a/hw/pci-host/pnv_phb4.c
+++ b/hw/pci-host/pnv_phb4.c
@@ -1551,6 +1551,12 @@ void pnv_phb4_bus_init(DeviceState *dev, PnvPHB4 *phb)
   pnv_phb4_set_irq, pnv_phb4_map_irq, phb,
   >pci_mmio, >pci_io,
   0, 4, TYPE_PNV_PHB4_ROOT_BUS);
+
+object_property_set_int(OBJECT(pci->bus), "phb-id", phb->phb_id,
+_abort);
+object_property_set_int(OBJECT(pci->bus), "chip-id", phb->chip_id,
+_abort);
+
  pci_setup_iommu(pci->bus, pnv_phb4_dma_iommu, phb);
  pci->bus->flags |= PCI_BUS_EXTENDED_CONFIG_SPACE;
  }
@@ -1708,10 +1714,55 @@ static const TypeInfo pnv_phb5_type_info = {
  .instance_size = sizeof(PnvPHB4),
  };
  
+static void pnv_phb4_root_bus_get_prop(Object *obj, Visitor *v,

+   const char *name,
+   void *opaque, Error **errp)
+{
+PnvPHB4RootBus *bus = PNV_PHB4_ROOT_BUS(obj);
+uint64_t value = 0;
+
+if (strcmp(name, "phb-id") == 0) {
+value = bus->phb_id;
+} else {
+value = bus->chip_id;
+}
+
+visit_type_size(v, name, , errp);
+}
+
+static void pnv_phb4_root_bus_set_prop(Object *obj, Visitor *v,
+   const char *name,
+   void *opaque, Error **errp)
+
+{
+PnvPHB4RootBus *bus = PNV_PHB4_ROOT_BUS(obj);
+uint64_t value;
+
+if (!visit_type_size(v, name, , errp)) {
+return;
+}
+
+if (strcmp(name, "phb-id") == 0) {
+bus->phb_id = value;
+} else {
+bus->chip_id = value;
+}
+}
+
  static void pnv_phb4_root_bus_class_init(ObjectClass *klass, void *data)
  {
  BusClass *k = BUS_CLASS(klass);
  
+object_class_property_add(klass, "phb-id", "int",

+  pnv_phb4_root_bus_get_prop,
+  pnv_phb4_root_bus_set_prop,
+  NULL, NULL);
+
+object_class_property_add(klass, "chip-id", "int",
+  pnv_phb4_root_bus_get_prop,
+  pnv_phb4_root_bus_set_prop,
+  NULL, NULL);
+
  /*
   * PHB4 has only a single root complex. Enforce the limit on the
   * parent bus
diff --git a/include/hw/pci-host/pnv_phb4.h b/include/hw/pci-host/pnv_phb4.h
index 20aa4819d3..50d4faa001 100644
--- a/include/hw/pci-host/pnv_phb4.h
+++ b/include/hw/pci-host/pnv_phb4.h
@@ -45,7 +45,17 @@ typedef struct PnvPhb4DMASpace {
  QLIST_ENTRY(PnvPhb4DMASpace) list;
  } PnvPhb4DMASpace;
  
+/*

+ * PHB4 PCIe Root Bus
+ */
  #define TYPE_PNV_PHB4_ROOT_BUS "pnv-phb4-root"
+struct PnvPHB4RootBus {
+PCIBus parent;
+
+uint32_t chip_id;
+uint32_t phb_id;
+};
+OBJECT_DECLARE_SIMPLE_TYPE(PnvPHB4RootBus, PNV_PHB4_ROOT_BUS)
  
  /*

   * PHB4 PCIe Host Bridge for PowerNV machines (POWER9)




Re: [PATCH v2 3/8] parallels: Replace bdrv_co_pwrite_sync by bdrv_co_flush for BAT flushing

2022-08-12 Thread Denis V. Lunev

Use generic infrastructure for BAT writing in parallels_co_check()

On 11.08.2022 17:00, Alexander Ivanov wrote:

It's too costly to write all the BAT to the disk. Let the flush function
write only dirty blocks.
Use parallels_set_bat_entry for setting a BAT entry and marking a relevant
block as dirty.
Move bdrv_co_flush call outside the locked area.

This is not correct too:
- with a lot of cases inside parallels_co_check, which will be split to
  smaller functions, we would like to avoid writing BAT once each
  stage is complete

Thus the comment should look like the following:
"BAT is written in the context of conventional operations over
the image inside bdrv_co_flush() when it calls
parallels_co_flush_to_os() callback. Thus we should not
modify BAT array directly, but call parallels_set_bat_entry()
helper and bdrv_co_flush() further on. After that there is no
need to manually write BAT and track its modification.

This makes code more generic and allows to split
parallels_set_bat_entry() for independent pieces."


v2: Patch order was changed so the replacement is done in parallels_co_check.
 Now we use a helper to set BAT entry and mark the block dirty.

Same note about changelog as n other patches.

Side note. I like Linux kernel a lot and there is a good
guide in. Please look how commit message
https://www.kernel.org/doc/html/latest/process/submitting-patches.html

May be you could spend some more time on commit
message.

With a better message:
Reviewed-by: Denis V. Lunev 


Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 19 +++
  1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 7f68f3cbc9..6879ea4597 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -428,7 +428,6 @@ static int coroutine_fn parallels_co_check(BlockDriverState 
*bs,
  int64_t size, prev_off, high_off;
  int ret;
  uint32_t i;
-bool flush_bat = false;
  
  size = bdrv_getlength(bs->file->bs);

  if (size < 0) {
@@ -467,9 +466,8 @@ static int coroutine_fn parallels_co_check(BlockDriverState 
*bs,
  res->corruptions++;
  if (fix & BDRV_FIX_ERRORS) {
  prev_off = 0;
-s->bat_bitmap[i] = 0;
+parallels_set_bat_entry(s, i, 0);
  res->corruptions_fixed++;
-flush_bat = true;
  continue;
  }
  }
@@ -485,15 +483,6 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
  prev_off = off;
  }
  
-ret = 0;

-if (flush_bat) {
-ret = bdrv_co_pwrite_sync(bs->file, 0, s->header_size, s->header, 0);
-if (ret < 0) {
-res->check_errors++;
-goto out;
-}
-}
-
  res->image_end_offset = high_off + s->cluster_size;
  if (size > res->image_end_offset) {
  int64_t count;
@@ -522,6 +511,12 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
  
  out:

  qemu_co_mutex_unlock(>lock);
+
+ret = bdrv_co_flush(bs);
+if (ret < 0) {
+res->check_errors++;
+}
+
  return ret;
  }
  





Re: [PATCH v2 2/8] parallels: Move BAT entry setting to a separate function

2022-08-12 Thread Denis V. Lunev

On 11.08.2022 17:00, Alexander Ivanov wrote:

Will need to set BAT entry in multiple places.
Move the code of settings entries and marking relevant blocks dirty
to a separate helper parallels_set_bat_entry.

The comment and the patch text is ambiguous.
You say that we need to set BAT in multiple places
but patch changes one place only. I think that it would
be better to say like the following:

"parallels: create parallels_set_bat_entry_helper() to assign BAT value

This helper will be reused in next patches during parallels_co_check
rework to simplify its code"


v2: A new patch - a part of a splitted patch.

Same note about version tracking

With above taken into account:
Reviewed-by: Denis V. Lunev 


Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 12 +---
  1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index a76cf9d993..7f68f3cbc9 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -165,6 +165,13 @@ static int64_t block_status(BDRVParallelsState *s, int64_t 
sector_num,
  return start_off;
  }
  
+static void parallels_set_bat_entry(BDRVParallelsState *s,

+uint32_t index, uint32_t offset)
+{
+s->bat_bitmap[index] = offset;
+bitmap_set(s->bat_dirty_bmap, bat_entry_off(index) / s->bat_dirty_block, 
1);
+}
+
  static int64_t allocate_clusters(BlockDriverState *bs, int64_t sector_num,
   int nb_sectors, int *pnum)
  {
@@ -250,10 +257,9 @@ static int64_t allocate_clusters(BlockDriverState *bs, 
int64_t sector_num,
  }
  
  for (i = 0; i < to_allocate; i++) {

-s->bat_bitmap[idx + i] = cpu_to_le32(s->data_end / s->off_multiplier);
+parallels_set_bat_entry(s, idx + i,
+cpu_to_le32(s->data_end / s->off_multiplier));
  s->data_end += s->tracks;
-bitmap_set(s->bat_dirty_bmap,
-   bat_entry_off(idx + i) / s->bat_dirty_block, 1);
  }
  
  return bat2sect(s, idx) + sector_num % s->tracks;





Re: [PATCH 1/2] target/riscv: fence.i: update decode pattern

2022-08-12 Thread Peter Maydell
On Fri, 12 Aug 2022 at 15:11, Philipp Tomsich  wrote:
>
> On Fri, 12 Aug 2022 at 16:01, Andrew Jones  wrote:
> >
> > > Update the decode pattern to reflect the specification.
> >
> > I got hung-up on this for a bit since there isn't any "must-be-0" fields,
>
> Please refer to '“Zifencei” Instruction-Fetch Fence, Version 2.0' in
> the specification.
> The encoding diagram clearly states 0 for imm[11:0], 0 for rs1 and 0 for rd.
>
> However, there is an explanatory paragraph below (unfortunately, it is
> not clear whether this is normative or informative):

> > The unused fields in the FENCE.I instruction, imm[11:0], rs1, and rd, are 
> > reserved for finer-grain fences in future extensions. For forward 
> > compatibility, base implementations shall ignore these fields, and standard 
> > software shall zero these fields.

That's pretty clear that this patch is wrong, then -- QEMU
is an implementation, and so we must ignore these fields.
Otherwise when a future version of the spec defines a finer-grain
fence instruction in this part of the encoding space, older
QEMU will incorrectly make software that uses it crash.

If you think the spec is insufficiently clear about whether that
is normative then that would be something to raise with the
spec authors, preferably before anybody builds hardware that
enforces must-be-zeroes on these fields...

thanks
-- PMM



Re: [PATCH 2/2] target/riscv: fence: reconcile with specification

2022-08-12 Thread Philipp Tomsich
Happy to lower it back into the decode file.
However, I initially pulled it up into the trans-function to more
closely match the ISA specification: there is only one FENCE
instruction with 3 arguments (FM, PRED, and SUCC).
One might argue that the decode table for "RV32I Base Instruction Set"
in the specification lists FENCE.TSO as a separate instruction, but
the normative text doesn't (and FENCE overlaps FENCE.TSO in the
tabular representation) — so I would consider the table as
informative.

I'll wait until we see what consensus emerges from the discussion.

Philipp.

On Fri, 12 Aug 2022 at 15:21, Peter Maydell  wrote:
>
> On Fri, 12 Aug 2022 at 14:17, Philipp Tomsich  
> wrote:
> >
> > Our decoding of fence-instructions is problematic in respect to the
> > RISC-V ISA specification:
> > - rs and rd are ignored, but need to be 0
> > - fm is ignored
> >
> > This change adjusts the decode pattern to enfore rs and rd being 0,
> > and validates the fm-field (together with pred/succ for FENCE.TSO) to
> > determine whether a reserved instruction is specified.
> >
> > While the specification allows UNSPECIFIED behaviour for reserved
> > instructions, we now always raise an illegal instruction exception.
> >
> > Signed-off-by: Philipp Tomsich 
> >
> > ---
> >
> >  target/riscv/insn32.decode  |  2 +-
> >  target/riscv/insn_trans/trans_rvi.c.inc | 19 ++-
> >  2 files changed, 19 insertions(+), 2 deletions(-)
> >
> > diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> > index 089128c3dc..4e53df1b62 100644
> > --- a/target/riscv/insn32.decode
> > +++ b/target/riscv/insn32.decode
> > @@ -150,7 +150,7 @@ srl  000 .. 101 . 0110011 @r
> >  sra  010 .. 101 . 0110011 @r
> >  or   000 .. 110 . 0110011 @r
> >  and  000 .. 111 . 0110011 @r
> > -fence pred:4 succ:4 - 000 - 000
> > +fencefm:4 pred:4 succ:4 0 000 0 000
> >  fence_i   0 001 0 000
> >  csrrw . 001 . 1110011 @csr
> >  csrrs . 010 . 1110011 @csr
> > diff --git a/target/riscv/insn_trans/trans_rvi.c.inc 
> > b/target/riscv/insn_trans/trans_rvi.c.inc
> > index ca8e3d1ea1..515bb3b22a 100644
> > --- a/target/riscv/insn_trans/trans_rvi.c.inc
> > +++ b/target/riscv/insn_trans/trans_rvi.c.inc
> > @@ -795,7 +795,24 @@ static bool trans_srad(DisasContext *ctx, arg_srad *a)
> >
> >  static bool trans_fence(DisasContext *ctx, arg_fence *a)
> >  {
> > -/* FENCE is a full memory barrier. */
> > +switch (a->fm) {
> > +case 0b:
> > + /* normal fence */
> > + break;
> > +
> > +case 0b0001:
> > + /* FENCE.TSO requires PRED and SUCC to be RW */
> > + if (a->pred != 0xb0011 || a->succ != 0b0011) {
> > +return false;
> > + }
> > + break;
> > +
> > +default:
> > +/* reserved for future use */
> > +return false;
> > +}
>
> I think it would be neater to do this decode in the
> .decode file, rather than by hand in the trans function.
>
> thanks
> -- PMM



Re: [PATCH v2 1/8] parallels: Out of image offset in BAT leads to image inflation

2022-08-12 Thread Denis V. Lunev

On 11.08.2022 17:00, Alexander Ivanov wrote:

When an image is opened, data_end field in BDRVParallelsState
is setted as the biggest offset in the BAT plus cluster size.
If there is a corrupted offset pointing outside the image,
the image size increase accordingly. It potentially leads
to attempts to create a file size of petabytes.

Set the data_end field with the original file size if the image
was opened for checking and repairing purposes or raise an error.

v2: No changes.

Changelog should be below ---
In that case it will not be merged.

There are a lot of typos/mistakes inside, I'd better use the comment
below.

"data_end field in BDRVParallelsState is set to the biggest offset present
in BAT. If this offset is outside of the image, any further write will 
create

the cluster at this offset and/or the image will be truncated to this
offset on close. This is definitely not correct and should be fixed."

With this change:
Reviewed-by: Denis V. Lunev 


Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 17 +
  1 file changed, 17 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index a229c06f25..a76cf9d993 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -732,6 +732,7 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
  BDRVParallelsState *s = bs->opaque;
  ParallelsHeader ph;
  int ret, size, i;
+int64_t file_size;
  QemuOpts *opts = NULL;
  Error *local_err = NULL;
  char *buf;
@@ -811,6 +812,22 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
  }
  }
  
+file_size = bdrv_getlength(bs->file->bs);

+if (file_size < 0) {
+goto fail;
+}
+
+file_size >>= BDRV_SECTOR_BITS;
+if (s->data_end > file_size) {
+if (flags & BDRV_O_CHECK) {
+s->data_end = file_size;
+} else {
+error_setg(errp, "parallels: Offset in BAT is out of image");
+ret = -EINVAL;
+goto fail;
+}
+}
+
  if (le32_to_cpu(ph.inuse) == HEADER_INUSE_MAGIC) {
  /* Image was not closed correctly. The check is mandatory */
  s->header_unclean = true;





Re: [PATCH 1/2] target/riscv: fence.i: update decode pattern

2022-08-12 Thread Philipp Tomsich
On Fri, 12 Aug 2022 at 16:01, Andrew Jones  wrote:
>
> > Update the decode pattern to reflect the specification.
>
> I got hung-up on this for a bit since there isn't any "must-be-0" fields,

Please refer to '“Zifencei” Instruction-Fetch Fence, Version 2.0' in
the specification.
The encoding diagram clearly states 0 for imm[11:0], 0 for rs1 and 0 for rd.

However, there is an explanatory paragraph below (unfortunately, it is
not clear whether this is normative or informative):
> The unused fields in the FENCE.I instruction, imm[11:0], rs1, and rd, are 
> reserved for finer-grain fences in future extensions. For forward 
> compatibility, base implementations shall ignore these fields, and standard 
> software shall zero these fields.

Strictly speaking, this patch may be too restrictive (it violates the
"for forward-compatibility" part — which I consider informative only,
though).

Thanks,
Philipp.



Re: [PATCH 2/2] target/riscv: fence: reconcile with specification

2022-08-12 Thread Andrew Jones
On Fri, Aug 12, 2022 at 03:13:04PM +0200, Philipp Tomsich wrote:
> Our decoding of fence-instructions is problematic in respect to the
> RISC-V ISA specification:
> - rs and rd are ignored, but need to be 0
> - fm is ignored
> 
> This change adjusts the decode pattern to enfore rs and rd being 0,
> and validates the fm-field (together with pred/succ for FENCE.TSO) to
> determine whether a reserved instruction is specified.
> 
> While the specification allows UNSPECIFIED behaviour for reserved
> instructions, we now always raise an illegal instruction exception.
> 
> Signed-off-by: Philipp Tomsich 
> 
> ---
> 
>  target/riscv/insn32.decode  |  2 +-
>  target/riscv/insn_trans/trans_rvi.c.inc | 19 ++-
>  2 files changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index 089128c3dc..4e53df1b62 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -150,7 +150,7 @@ srl  000 .. 101 . 0110011 @r
>  sra  010 .. 101 . 0110011 @r
>  or   000 .. 110 . 0110011 @r
>  and  000 .. 111 . 0110011 @r
> -fence pred:4 succ:4 - 000 - 000
> +fencefm:4 pred:4 succ:4 0 000 0 000
>  fence_i   0 001 0 000
>  csrrw . 001 . 1110011 @csr
>  csrrs . 010 . 1110011 @csr
> diff --git a/target/riscv/insn_trans/trans_rvi.c.inc 
> b/target/riscv/insn_trans/trans_rvi.c.inc
> index ca8e3d1ea1..515bb3b22a 100644
> --- a/target/riscv/insn_trans/trans_rvi.c.inc
> +++ b/target/riscv/insn_trans/trans_rvi.c.inc
> @@ -795,7 +795,24 @@ static bool trans_srad(DisasContext *ctx, arg_srad *a)
>  
>  static bool trans_fence(DisasContext *ctx, arg_fence *a)
>  {
> -/* FENCE is a full memory barrier. */
> +switch (a->fm) {
> +case 0b:
> + /* normal fence */
> + break;
> +
> +case 0b0001:
> + /* FENCE.TSO requires PRED and SUCC to be RW */
> + if (a->pred != 0xb0011 || a->succ != 0b0011) {
> +return false;
> + }
> + break;
> +
> +default:
> +/* reserved for future use */
> +return false;
> +}
> +
> +/* We implement FENCE(.TSO) is a full memory barrier. */

s/is/as/

Thanks,
drew

>  tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
>  return true;
>  }
> -- 
> 2.34.1
> 
> 



Re: [PATCH 1/2] target/riscv: fence.i: update decode pattern

2022-08-12 Thread Andrew Jones


Please use a cover-letter for multi-patch patch series.

On Fri, Aug 12, 2022 at 03:13:03PM +0200, Philipp Tomsich wrote:
> The RISC-V specification specifies imm12, rs1 and rd to be all-zeros,
> so we can't ignore these bits when decoding into fence.i.
> 
> Update the decode pattern to reflect the specification.

I got hung-up on this for a bit since there isn't any "must-be-0" fields,
only ignored fields, but the next patch gives a clue which helped me make
sense of this. The encoding of these instructions with ignored fields set
to anything except zero gets into reserved instruction territory, and QEMU
may legally raise an illegal-instruction in that case, which this patch
will start doing. It'd be nice to have a bit more text in this commit
message to make that clear.

> 
> Signed-off-by: Philipp Tomsich 
> ---
> 
>  target/riscv/insn32.decode | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index 014127d066..089128c3dc 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -151,7 +151,7 @@ sra  010 .. 101 . 0110011 @r
>  or   000 .. 110 . 0110011 @r
>  and  000 .. 111 . 0110011 @r
>  fence pred:4 succ:4 - 000 - 000
> -fence_i         - 001 - 000
> +fence_i   0 001 0 000
^ need two more spaces here to line up with fence.

>  csrrw . 001 . 1110011 @csr
>  csrrs . 010 . 1110011 @csr
>  csrrc . 011 . 1110011 @csr
> -- 
> 2.34.1
> 
> 

Thanks,
drew



[PATCH v2] hw/smbios: support for type 8 (port connector)

2022-08-12 Thread Hal Martin
PATCH v1: add support for SMBIOS type 8 to qemu
PATCH v2: incorporate patch v1 feedback and add smbios type=8 to qemu-options

internal_reference: internal reference designator
external_reference: external reference designator
connector_type: hex value for port connector type (see SMBIOS 7.9.2)
port_type: hex value for port type (see SMBIOS 7.9.3)

After studying various vendor implementationsi (Dell, Lenovo, MSI),
the value of internal connector type was hard-coded to 0x0 (None).

Example usage:
-smbios 
type=8,internal_reference=JUSB1,external_reference=USB1,connector_type=0x12,port_type=0x10
 \
-smbios type=8,internal_reference=JAUD1,external_reference="Audio 
Jack",connector_type=0x1f,port_type=0x1d \
-smbios 
type=8,internal_reference=LAN,external_reference=Ethernet,connector_type=0x0b,port_type=0x1f
 \
-smbios 
type=8,internal_reference=PS2,external_reference=Mouse,connector_type=0x0f,port_type=0x0e
 \
-smbios 
type=8,internal_reference=PS2,external_reference=Keyboard,connector_type=0x0f,port_type=0x0d


Signed-off-by: Hal Martin 

---
 hw/smbios/smbios.c   | 63 
 include/hw/firmware/smbios.h | 10 ++
 qemu-options.hx  |  2 ++
 3 files changed, 75 insertions(+)

diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
index 60349ee402..578cae0f0a 100644
--- a/hw/smbios/smbios.c
+++ b/hw/smbios/smbios.c
@@ -111,6 +111,13 @@ static struct {
 .processor_id = 0,
 };
 
+struct type8_instance {
+const char *internal_reference, *external_reference;
+uint8_t connector_type, port_type;
+QTAILQ_ENTRY(type8_instance) next;
+};
+static QTAILQ_HEAD(, type8_instance) type8 = QTAILQ_HEAD_INITIALIZER(type8);
+
 static struct {
 size_t nvalues;
 char **values;
@@ -337,6 +344,29 @@ static const QemuOptDesc qemu_smbios_type4_opts[] = {
 { /* end of list */ }
 };
 
+static const QemuOptDesc qemu_smbios_type8_opts[] = {
+{
+.name = "internal_reference",
+.type = QEMU_OPT_STRING,
+.help = "internal reference designator",
+},
+{
+.name = "external_reference",
+.type = QEMU_OPT_STRING,
+.help = "external reference designator",
+},
+{
+.name = "connector_type",
+.type = QEMU_OPT_NUMBER,
+.help = "connector type",
+},
+{
+.name = "port_type",
+.type = QEMU_OPT_NUMBER,
+.help = "port type",
+},
+};
+
 static const QemuOptDesc qemu_smbios_type11_opts[] = {
 {
 .name = "value",
@@ -718,6 +748,26 @@ static void smbios_build_type_4_table(MachineState *ms, 
unsigned instance)
 smbios_type4_count++;
 }
 
+static void smbios_build_type_8_table(void)
+{
+unsigned instance = 0;
+struct type8_instance *t8;
+
+QTAILQ_FOREACH(t8, , next) {
+SMBIOS_BUILD_TABLE_PRE(8, T0_BASE + instance, true);
+
+SMBIOS_TABLE_SET_STR(8, internal_reference_str, 
t8->internal_reference);
+SMBIOS_TABLE_SET_STR(8, external_reference_str, 
t8->external_reference);
+/* most vendors seem to set this to None */
+t->internal_connector_type = 0x0;
+t->external_connector_type = t8->connector_type;
+t->port_type = t8->port_type;
+
+SMBIOS_BUILD_TABLE_POST;
+instance++;
+}
+}
+
 static void smbios_build_type_11_table(void)
 {
 char count_str[128];
@@ -1030,6 +1080,7 @@ void smbios_get_tables(MachineState *ms,
 smbios_build_type_4_table(ms, i);
 }
 
+smbios_build_type_8_table();
 smbios_build_type_11_table();
 
 #define MAX_DIMM_SZ (16 * GiB)
@@ -1346,6 +1397,18 @@ void smbios_entry_add(QemuOpts *opts, Error **errp)
UINT16_MAX);
 }
 return;
+case 8:
+if (!qemu_opts_validate(opts, qemu_smbios_type8_opts, errp)) {
+return;
+}
+struct type8_instance *t;
+t = g_new0(struct type8_instance, 1);
+save_opt(>internal_reference, opts, "internal_reference");
+save_opt(>external_reference, opts, "external_reference");
+t->connector_type = qemu_opt_get_number(opts, "connector_type", 0);
+t->port_type = qemu_opt_get_number(opts, "port_type", 0);
+QTAILQ_INSERT_TAIL(, t, next);
+return;
 case 11:
 if (!qemu_opts_validate(opts, qemu_smbios_type11_opts, errp)) {
 return;
diff --git a/include/hw/firmware/smbios.h b/include/hw/firmware/smbios.h
index 4b7ad77a44..e7d386f7c8 100644
--- a/include/hw/firmware/smbios.h
+++ b/include/hw/firmware/smbios.h
@@ -189,6 +189,16 @@ struct smbios_type_4 {
 uint16_t processor_family2;
 } QEMU_PACKED;
 
+/* SMBIOS type 8 - Port Connector Information */
+struct smbios_type_8 {
+struct smbios_structure_header header;
+uint8_t internal_reference_str;
+uint8_t internal_connector_type;
+uint8_t external_reference_str;
+uint8_t external_connector_type;
+

[PATCH v4 1/1] os-posix: asynchronous teardown for shutdown on Linux

2022-08-12 Thread Claudio Imbrenda
This patch adds support for asynchronously tearing down a VM on Linux.

When qemu terminates, either naturally or because of a fatal signal,
the VM is torn down. If the VM is huge, it can take a considerable
amount of time for it to be cleaned up. In case of a protected VM, it
might take even longer than a non-protected VM (this is the case on
s390x, for example).

Some users might want to shut down a VM and restart it immediately,
without having to wait. This is especially true if management
infrastructure like libvirt is used.

This patch implements a simple trick on Linux to allow qemu to return
immediately, with the teardown of the VM being performed
asynchronously.

If the new commandline option -async-teardown is used, a new process is
spawned from qemu at startup, using the clone syscall, in such way that
it will share its address space with qemu.The new process will have the
name "cleanup/". It will wait until qemu terminates
completely, and then it will exit itself.

This allows qemu to terminate quickly, without having to wait for the
whole address space to be torn down. The cleanup process will exit
after qemu, so it will be the last user of the address space, and
therefore it will take care of the actual teardown. The cleanup
process will share the same cgroups as qemu, so both memory usage and
cpu time will be accounted properly.

If possible, close_range will be used in the cleanup process to close
all open file descriptors. If it is not available or if it fails, /proc
will be used to determine which file descriptors to close.

If the cleanup process is forcefully killed with SIGKILL before the
main qemu process has terminated completely, the mechanism is defeated
and the teardown will not be asynchronous.

This feature can already be used with libvirt by adding the following
to the XML domain definition to pass the parameter to qemu directly:

  http://libvirt.org/schemas/domain/qemu/1.0;>
  
  

Signed-off-by: Claudio Imbrenda 
Reviewed-by: Murilo Opsfelder Araujo 
Tested-by: Murilo Opsfelder Araujo 
---
 include/qemu/async-teardown.h |  22 +
 meson.build   |   1 +
 os-posix.c|   6 ++
 qemu-options.hx   |  19 +
 util/async-teardown.c | 155 ++
 util/meson.build  |   1 +
 6 files changed, 204 insertions(+)
 create mode 100644 include/qemu/async-teardown.h
 create mode 100644 util/async-teardown.c

diff --git a/include/qemu/async-teardown.h b/include/qemu/async-teardown.h
new file mode 100644
index 00..092e7a37e7
--- /dev/null
+++ b/include/qemu/async-teardown.h
@@ -0,0 +1,22 @@
+/*
+ * Asynchronous teardown
+ *
+ * Copyright IBM, Corp. 2022
+ *
+ * Authors:
+ *  Claudio Imbrenda 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at your
+ * option) any later version.  See the COPYING file in the top-level directory.
+ *
+ */
+#ifndef QEMU_ASYNC_TEARDOWN_H
+#define QEMU_ASYNC_TEARDOWN_H
+
+#include "config-host.h"
+
+#ifdef CONFIG_LINUX
+void init_async_teardown(void);
+#endif
+
+#endif
diff --git a/meson.build b/meson.build
index 294e9a8f32..7bccad93d0 100644
--- a/meson.build
+++ b/meson.build
@@ -1892,6 +1892,7 @@ config_host_data.set('HAVE_SYS_IOCCOM_H', 
cc.has_header('sys/ioccom.h'))
 config_host_data.set('HAVE_SYS_KCOV_H', cc.has_header('sys/kcov.h'))
 
 # has_function
+config_host_data.set('CONFIG_CLOSE_RANGE', cc.has_function('close_range'))
 config_host_data.set('CONFIG_ACCEPT4', cc.has_function('accept4'))
 config_host_data.set('CONFIG_CLOCK_ADJTIME', cc.has_function('clock_adjtime'))
 config_host_data.set('CONFIG_DUP3', cc.has_function('dup3'))
diff --git a/os-posix.c b/os-posix.c
index 321fc4bd13..4858650c3e 100644
--- a/os-posix.c
+++ b/os-posix.c
@@ -39,6 +39,7 @@
 
 #ifdef CONFIG_LINUX
 #include 
+#include "qemu/async-teardown.h"
 #endif
 
 /*
@@ -150,6 +151,11 @@ int os_parse_cmd_args(int index, const char *optarg)
 case QEMU_OPTION_daemonize:
 daemonize = 1;
 break;
+#if defined(CONFIG_LINUX)
+case QEMU_OPTION_asyncteardown:
+init_async_teardown();
+break;
+#endif
 default:
 return -1;
 }
diff --git a/qemu-options.hx b/qemu-options.hx
index 3f23a42fa8..f913fc307f 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4743,6 +4743,25 @@ HXCOMM Internal use
 DEF("qtest", HAS_ARG, QEMU_OPTION_qtest, "", QEMU_ARCH_ALL)
 DEF("qtest-log", HAS_ARG, QEMU_OPTION_qtest_log, "", QEMU_ARCH_ALL)
 
+#ifdef __linux__
+DEF("async-teardown", 0, QEMU_OPTION_asyncteardown,
+"-async-teardown enable asynchronous teardown\n",
+QEMU_ARCH_ALL)
+#endif
+SRST
+``-async-teardown``
+Enable asynchronous teardown. A new process called "cleanup/"
+will be created at startup sharing the address space with the main qemu
+process, using clone. It will wait for the main qemu process to
+terminate completely, and then exit.
+This allows qemu to terminate very quickly even if the guest was

Re: [PATCH 2/2] target/riscv: fence: reconcile with specification

2022-08-12 Thread Peter Maydell
On Fri, 12 Aug 2022 at 14:17, Philipp Tomsich  wrote:
>
> Our decoding of fence-instructions is problematic in respect to the
> RISC-V ISA specification:
> - rs and rd are ignored, but need to be 0
> - fm is ignored
>
> This change adjusts the decode pattern to enfore rs and rd being 0,
> and validates the fm-field (together with pred/succ for FENCE.TSO) to
> determine whether a reserved instruction is specified.
>
> While the specification allows UNSPECIFIED behaviour for reserved
> instructions, we now always raise an illegal instruction exception.
>
> Signed-off-by: Philipp Tomsich 
>
> ---
>
>  target/riscv/insn32.decode  |  2 +-
>  target/riscv/insn_trans/trans_rvi.c.inc | 19 ++-
>  2 files changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index 089128c3dc..4e53df1b62 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -150,7 +150,7 @@ srl  000 .. 101 . 0110011 @r
>  sra  010 .. 101 . 0110011 @r
>  or   000 .. 110 . 0110011 @r
>  and  000 .. 111 . 0110011 @r
> -fence pred:4 succ:4 - 000 - 000
> +fencefm:4 pred:4 succ:4 0 000 0 000
>  fence_i   0 001 0 000
>  csrrw . 001 . 1110011 @csr
>  csrrs . 010 . 1110011 @csr
> diff --git a/target/riscv/insn_trans/trans_rvi.c.inc 
> b/target/riscv/insn_trans/trans_rvi.c.inc
> index ca8e3d1ea1..515bb3b22a 100644
> --- a/target/riscv/insn_trans/trans_rvi.c.inc
> +++ b/target/riscv/insn_trans/trans_rvi.c.inc
> @@ -795,7 +795,24 @@ static bool trans_srad(DisasContext *ctx, arg_srad *a)
>
>  static bool trans_fence(DisasContext *ctx, arg_fence *a)
>  {
> -/* FENCE is a full memory barrier. */
> +switch (a->fm) {
> +case 0b:
> + /* normal fence */
> + break;
> +
> +case 0b0001:
> + /* FENCE.TSO requires PRED and SUCC to be RW */
> + if (a->pred != 0xb0011 || a->succ != 0b0011) {
> +return false;
> + }
> + break;
> +
> +default:
> +/* reserved for future use */
> +return false;
> +}

I think it would be neater to do this decode in the
.decode file, rather than by hand in the trans function.

thanks
-- PMM



[PATCH 1/2] target/riscv: fence.i: update decode pattern

2022-08-12 Thread Philipp Tomsich
The RISC-V specification specifies imm12, rs1 and rd to be all-zeros,
so we can't ignore these bits when decoding into fence.i.

Update the decode pattern to reflect the specification.

Signed-off-by: Philipp Tomsich 
---

 target/riscv/insn32.decode | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 014127d066..089128c3dc 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -151,7 +151,7 @@ sra  010 .. 101 . 0110011 @r
 or   000 .. 110 . 0110011 @r
 and  000 .. 111 . 0110011 @r
 fence pred:4 succ:4 - 000 - 000
-fence_i         - 001 - 000
+fence_i   0 001 0 000
 csrrw . 001 . 1110011 @csr
 csrrs . 010 . 1110011 @csr
 csrrc . 011 . 1110011 @csr
-- 
2.34.1




[PATCH 2/2] target/riscv: fence: reconcile with specification

2022-08-12 Thread Philipp Tomsich
Our decoding of fence-instructions is problematic in respect to the
RISC-V ISA specification:
- rs and rd are ignored, but need to be 0
- fm is ignored

This change adjusts the decode pattern to enfore rs and rd being 0,
and validates the fm-field (together with pred/succ for FENCE.TSO) to
determine whether a reserved instruction is specified.

While the specification allows UNSPECIFIED behaviour for reserved
instructions, we now always raise an illegal instruction exception.

Signed-off-by: Philipp Tomsich 

---

 target/riscv/insn32.decode  |  2 +-
 target/riscv/insn_trans/trans_rvi.c.inc | 19 ++-
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 089128c3dc..4e53df1b62 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -150,7 +150,7 @@ srl  000 .. 101 . 0110011 @r
 sra  010 .. 101 . 0110011 @r
 or   000 .. 110 . 0110011 @r
 and  000 .. 111 . 0110011 @r
-fence pred:4 succ:4 - 000 - 000
+fencefm:4 pred:4 succ:4 0 000 0 000
 fence_i   0 001 0 000
 csrrw . 001 . 1110011 @csr
 csrrs . 010 . 1110011 @csr
diff --git a/target/riscv/insn_trans/trans_rvi.c.inc 
b/target/riscv/insn_trans/trans_rvi.c.inc
index ca8e3d1ea1..515bb3b22a 100644
--- a/target/riscv/insn_trans/trans_rvi.c.inc
+++ b/target/riscv/insn_trans/trans_rvi.c.inc
@@ -795,7 +795,24 @@ static bool trans_srad(DisasContext *ctx, arg_srad *a)
 
 static bool trans_fence(DisasContext *ctx, arg_fence *a)
 {
-/* FENCE is a full memory barrier. */
+switch (a->fm) {
+case 0b:
+ /* normal fence */
+ break;
+
+case 0b0001:
+ /* FENCE.TSO requires PRED and SUCC to be RW */
+ if (a->pred != 0xb0011 || a->succ != 0b0011) {
+return false;
+ }
+ break;
+
+default:
+/* reserved for future use */
+return false;
+}
+
+/* We implement FENCE(.TSO) is a full memory barrier. */
 tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
 return true;
 }
-- 
2.34.1




Re: [PATCH v2 3/9] target/arm: Make RVBAR available for all ARMv8 CPUs

2022-08-12 Thread Peter Maydell
(I've added your rwth-aachen.de address because the quicinc
one seems to be bouncing :-( )

On Mon, 18 Jul 2022 at 12:54, Tobias Roehmel  wrote:
>
> From: Tobias Röhmel 
>
> Signed-off-by: Tobias Röhmel 

Having looked a bit more carefully at the architecture
manual, I think this is not complete. In particular
you don't actually define the AArch32 RVBAR register
(which needs to exist with permissions permitting only
EL2 access if EL2 is the highest EL, and with permissions
for EL1 access if EL1 is the highest EL, in the same way
the AArch64 RVBAR_EL2 and RVBAR_EL1 work (ie by adding
AArch32 alias registers where we define RVBAR_EL2 and
RVBAR_EL1).

(AArch32 has no equivalent of RVBAR_EL3; MVBAR is kind
of the same thing, but it's not required that it has the
actual reset address in it on reset. So we can ignore that
as we're architecturally compliant for A-profile, and for
the R52 we don't care about EL3.).

The commit message here could also usefully be expanded
to explain what we're doing and why it's the right thing.

> ---
>  target/arm/cpu.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index 1b5d535788..9007768418 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -258,6 +258,10 @@ static void arm_cpu_reset(DeviceState *dev)
>  env->cp15.cpacr_el1 = FIELD_DP64(env->cp15.cpacr_el1,
>   CPACR, CP11, 3);
>  #endif
> +if (arm_feature(env, ARM_FEATURE_V8)) {
> +env->cp15.rvbar = cpu->rvbar_prop;
> +env->regs[15] = cpu->rvbar_prop;
> +}
>  }
>
>  #if defined(CONFIG_USER_ONLY)
> @@ -1273,7 +1277,7 @@ void arm_cpu_post_init(Object *obj)
>  qdev_property_add_static(DEVICE(obj), 
> _cpu_reset_hivecs_property);
>  }
>
> -if (arm_feature(>env, ARM_FEATURE_AARCH64)) {
> +if (arm_feature(>env, ARM_FEATURE_V8)) {
>  object_property_add_uint64_ptr(obj, "rvbar",
> >rvbar_prop,
> OBJ_PROP_FLAG_READWRITE);

thanks
-- PMM



Re: [PATCH v2 2/9] target/arm: Don't add all MIDR aliases for Cortex-R

2022-08-12 Thread Peter Maydell
On Mon, 18 Jul 2022 at 12:54, Tobias Roehmel  wrote:
>
> From: Tobias Röhmel 
>
> Cortex-R52 has the MPUIR register which has the same encoding
> has the MIDR alias with opc2=4. So we only add that alias
> when we are not realizing a Cortex-R.
>
> Signed-off-by: Tobias Röhmel 
> ---
>  target/arm/helper.c | 12 +---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 6457e6301c..03bdc3d149 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -8189,9 +8189,6 @@ void register_cp_regs_for_features(ARMCPU *cpu)
>.fieldoffset = offsetof(CPUARMState, cp15.c0_cpuid),
>.readfn = midr_read },
>  /* crn = 0 op1 = 0 crm = 0 op2 = 4,7 : AArch32 aliases of MIDR */

Moving the op2=4 register definition makes this comment out of date,
so you need to update it (ie remove the '4' here, and add a comment
similarly noting what id_v8_midr_alias_cp_reginfo[] is doing).

> -{ .name = "MIDR", .type = ARM_CP_ALIAS | ARM_CP_CONST,
> -  .cp = 15, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 4,
> -  .access = PL1_R, .resetvalue = cpu->midr },
>  { .name = "MIDR", .type = ARM_CP_ALIAS | ARM_CP_CONST,
>.cp = 15, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 7,
>.access = PL1_R, .resetvalue = cpu->midr },
> @@ -8201,6 +8198,11 @@ void register_cp_regs_for_features(ARMCPU *cpu)
>.accessfn = access_aa64_tid1,
>.type = ARM_CP_CONST, .resetvalue = cpu->revidr },
>  };
> +ARMCPRegInfo id_v8_midr_alias_cp_reginfo[] = {
> +{ .name = "MIDR", .type = ARM_CP_ALIAS | ARM_CP_CONST,
> +  .cp = 15, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 4,
> +  .access = PL1_R, .resetvalue = cpu->midr },
> +};

If there's only one register, you don't need to define a
single-element array, you can use define_one_arm_cp_reg()
and pass it a single ARMCPRegInfo struct.

>  ARMCPRegInfo id_cp_reginfo[] = {
>  /* These are common to v8 and pre-v8 */
>  { .name = "CTR",
> @@ -8264,8 +8266,12 @@ void register_cp_regs_for_features(ARMCPU *cpu)
>  id_mpuir_reginfo.access = PL1_RW;
>  id_tlbtr_reginfo.access = PL1_RW;
>  }
> +
>  if (arm_feature(env, ARM_FEATURE_V8)) {
>  define_arm_cp_regs(cpu, id_v8_midr_cp_reginfo);
> +if (!arm_feature(env, ARM_FEATURE_V8_R)) {

You can use ARM_FEATURE_PMSA here.

> +define_arm_cp_regs(cpu, id_v8_midr_alias_cp_reginfo);
> +}
>  } else {
>  define_arm_cp_regs(cpu, id_pre_v8_midr_cp_reginfo);
>  }
> --

thanks
-- PMM



Re: [PATCH 1/2] osdeps: Introduce qemu_socketpair()

2022-08-12 Thread Peter Maydell
On Fri, 12 Aug 2022 at 12:44,  wrote:
>
> From: Guoyi Tu 
>
> qemu_socketpair() will create a pair of connected sockets
> with FD_CLOEXEC set
>
> Signed-off-by: Guoyi Tu 
> ---
>  include/qemu/sockets.h |  3 +++
>  util/osdep.c   | 24 
>  2 files changed, 27 insertions(+)
>
> diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h
> index 038faa157f..52cf2855df 100644
> --- a/include/qemu/sockets.h
> +++ b/include/qemu/sockets.h
> @@ -14,6 +14,9 @@ int inet_aton(const char *cp, struct in_addr *ia);
>  /* misc helpers */
>  bool fd_is_socket(int fd);
>  int qemu_socket(int domain, int type, int protocol);
> +#ifndef WIN32
> +int qemu_socketpair(int domain, int type, int protocol, int sv[2]);

Any new function declaration in a header file needs a
doc-comment documenting what it does, please.

> +#endif
>  int qemu_accept(int s, struct sockaddr *addr, socklen_t *addrlen);
>  int socket_set_cork(int fd, int v);
>  int socket_set_nodelay(int fd);
> diff --git a/util/osdep.c b/util/osdep.c
> index 60fcbbaebe..4b1ab623c7 100644
> --- a/util/osdep.c
> +++ b/util/osdep.c
> @@ -481,6 +481,30 @@ int qemu_socket(int domain, int type, int protocol)
>  return ret;
>  }
>
> +#ifndef _WIN32

If this function only exists and is usable on posix
hosts, put it in util/oslib-posix.c rather than having
it here with a win32 ifdef.

thanks
-- PMM



Re: [PATCH v3 1/1] os-posix: asynchronous teardown for shutdown on Linux

2022-08-12 Thread Claudio Imbrenda
On Fri, 12 Aug 2022 08:38:59 -0300
Murilo Opsfelder Araújo  wrote:

> On 8/12/22 04:26, Claudio Imbrenda wrote:
> > On Thu, 11 Aug 2022 23:05:52 -0300
> > Murilo Opsfelder Araújo  wrote:
> >  
> >> On 8/11/22 11:02, Daniel P. Berrangé wrote:
> >> [...]  
> > Hmm, I was hoping you could just use SIGKILL to guarantee that this
> > gets killed off.  Is SIGKILL delivered too soon to allow for the
> > main QEMU process to have exited quickly ?  
> 
>  yes, I tried. qemu has not finished exiting when the signal is
>  delivered, the cleanup process dies before qemu, which defeats the
>  purpose  
> >>>
> >>> Ok, too bad.
> >>>  
> > If so I wonder what happens when systemd just delivers SIGKILL to
> > all processes in the cgroup - I'm not sure there's a guarantee it
> > will SIGKILL the main qemu before it SIGKILLs this helper  
> 
>  I'm afraid in that case there is no guarantee.
> 
>  for what it's worth, both virsh shutdown and destroy seem to do things
>  properly.  
> >>>
> >>> Hmm, probably because libvirt tells QEMU to exit before systemd comes
> >>> along and tells everything in the cgroup to die with SIGKILL.  
> >>
> >> It seems Libvirt sends SIGKILL if qemu process doesn't terminate within 10
> >> seconds after Libvirt sent SIGTERM:
> >>
> >> https://gitlab.com/libvirt/libvirt/-/blob/0615df084ec9996b5df88d6a1b59c557e22f3a12/src/util/virprocess.c#L375
> >>   
> >
> > but this is fine.
> >
> > with asynchronous teardown, qemu will exit almost immediately when
> > receiving SIGTERM, and the cleanup process will start cleaning up.  
> 
> Under normal and orderly conditions, yes.
> 
> >> So I guess this patch happened to work with Libvirt because the main qemu
> >> process terminated before the timeout and before SIGKILL was delivered.  
> >
> > it seems so
> >  
> >>
> >> The cleanup process is trying to solve the problem where the main qemu 
> >> process
> >> takes too long to terminate. However, if the cleanup process itself takes 
> >> too
> >> long, SIGKILL will be sent by Libvirt anyway.  
> >
> > but that is not a problem, the sole purpose of the cleanup process is
> > to terminate _after_ qemu. it doesn't matter what happens after qemu
> > has terminated. if you look at the patch, after going to great lengths
> > to assure that qemu has terminated, all the child process does is
> > _exit(0).
> >  
> >>
> >> Perhaps we can describe this situation in the parameter help, e.g.: If
> >> management layer decides to send SIGKILL (e.g.: due to timeout or 
> >> deliberate
> >> decision), the cleanup process can exit before the main process, deceiving 
> >> its
> >> purpose.  
> >
> > if the management layer (or the user) decides to send SIGKILL
> > immediately to the whole cgroup without sending SIGTERM first, then
> > this whole asynchronous teardown mechanism is defeated, yes.  
> 
> This situation is what we likely want to describe in the parameter help. I 
> don't
> want to give users the false impression that this option will *always* behave
> the manner we expect it to work *most* of the time.

fair enough, I'll improve the documentation

> 
> --
> Murilo




[PULL 3/5] tests/unit: fix a -Wformat-truncation warning

2022-08-12 Thread Peter Maydell
From: Marc-André Lureau 

../tests/test-qobject-input-visitor.c: In function ‘test_visitor_in_list’:
../tests/test-qobject-input-visitor.c:454:49: warning: ‘%d’ directive output 
may be truncated writing between 1 and 10 bytes into a region of size 6 
[-Wformat-truncation=]
  454 | snprintf(string, sizeof(string), "string%d", i);
  | ^~
../tests/test-qobject-input-visitor.c:454:42: note: directive argument in the 
range [0, 2147483606]
  454 | snprintf(string, sizeof(string), "string%d", i);
  |  ^~
../tests/test-qobject-input-visitor.c:454:9: note: ‘snprintf’ output between 8 
and 17 bytes into a destination of size 12
  454 | snprintf(string, sizeof(string), "string%d", i);
  | ^~~

Rather than trying to be clever, since this is called 3 times during
tests, let's simply use g_strdup_printf().

Signed-off-by: Marc-André Lureau 
Reviewed-by: Markus Armbruster 
Message-id: 20220810121513.1356081-1-marcandre.lur...@redhat.com
Reviewed-by: Peter Maydell 
[PMM: fixed commit message typos]
Signed-off-by: Peter Maydell 
---
 tests/unit/test-qobject-input-visitor.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tests/unit/test-qobject-input-visitor.c 
b/tests/unit/test-qobject-input-visitor.c
index 14329dabcfe..5f614afdbf1 100644
--- a/tests/unit/test-qobject-input-visitor.c
+++ b/tests/unit/test-qobject-input-visitor.c
@@ -447,9 +447,8 @@ static void test_visitor_in_list(TestInputVisitorData *data,
 g_assert(head != NULL);
 
 for (i = 0, item = head; item; item = item->next, i++) {
-char string[12];
+g_autofree char *string = g_strdup_printf("string%d", i);
 
-snprintf(string, sizeof(string), "string%d", i);
 g_assert_cmpstr(item->value->string, ==, string);
 g_assert_cmpint(item->value->integer, ==, 42 + i);
 }
-- 
2.25.1




Re: [PATCH 2/2] vhost-user: Call qemu_socketpair() instead of socketpair()

2022-08-12 Thread Peter Maydell
On Fri, 12 Aug 2022 at 12:44,  wrote:
>
> From: Guoyi Tu 
>
> set close-on-exec flag on the new opened file descriptors at default

What goes wrong if we don't do this? The commit message
is a good place to explain what bug the commit is fixing,
and its consequences.

thanks
-- PMM



  1   2   >