On Tue, Jul 29, 2025 at 11:25:12AM +0200, Nam Cao wrote:
> On Tue, Jul 29, 2025 at 10:46:51AM +0200, Gabriele Monaco wrote:
> > On Mon, 2025-07-28 at 17:53 +0200, Nam Cao wrote:
> > > I gave this a try on riscv64 and observed some errors:
> > >
> > > [ 620.696055] rv: monitor sts does not allow event sched_switch on
> > > state enable_to_exit
> > > [ 621.047705] rv: monitor sts does not allow event sched_switch on
> > > state enable_to_exit
> > > [ 642.440209] rv: monitor sts does not allow event sched_switch on
> > > state enable_to_exit
> > >
> > > I tested with two user programs:
> > >
> > >     int main() { asm ("unimp"); }
> > >     int main() { asm ("ebreak"); }
> > >
> > > The two programs are repeatedly executed:
> > >
> > >     #!/bin/bash
> > >     ./test1 &
> > >     ./test2 &
> > >     # ... repeat lots of time
> > >
> > > Any idea?
> >
> > Mmh I see what you're doing here..
> > Those instructions are supposed to raise some sort of exception in the
> > CPU which apparently disables and enables interrupts without raising an
> > interrupt handler tracepoint (the discriminator for this monitor).
> > This lets the monitor believe we passed the time a switch is possible
> > and complain when it actually sees one.
> >
> > I still couldn't reproduce it on my VM, yet I find the timing a bit
> > strange: it's alright we handle the illegal instruction like this, but
> > do we really end up doing that while scheduling although it doesn't
> > look like an interrupt?!
> >
> > Could you share a bit more about your riscv setup? It might be some
> > configuration/hardware specific thing.
>
> Kernel:
>   - base: ftrace/for-next
>   - config: defconfig + mod2noconfig + PREEMPT_RT + monitors
>
> Hardware:
>   qemu-system-riscv64 -machine virt \
>     -kernel ../linux/arch/riscv/boot/Image \
>     -append "console=ttyS0 root=/dev/vda rw" \
>     -nographic \
>     -drive if=virtio,format=raw,file=riscv64.img \
>     -smp 4 -m 4G
>
> riscv64.img is a Debian trixie image from debootstrap
>
> Test:
>   echo 0 > /proc/sys/debug/exception-trace
>   ./testall # see attached

I should note that this takes a few tries before something shows up.

Below is the backtrace, in case it helps:

illegal 3246 [000] 1020.132675: rv:error_sts: event sched_switch not expected in the state enable_to_exit
        ffffffff8013231c __traceiter_error_sts+0x28 ([kernel.kallsyms])
        ffffffff8013231c __traceiter_error_sts+0x28 ([kernel.kallsyms])
        ffffffff80138aa4 da_event_sts+0x198 ([kernel.kallsyms])
        ffffffff80138cf0 handle_sched_switch+0x46 ([kernel.kallsyms])
        ffffffff80aaf222 __schedule+0x4ba ([kernel.kallsyms])
        ffffffff80aafb80 preempt_schedule_irq+0x32 ([kernel.kallsyms])
        ffffffff80aac714 irqentry_exit+0x76 ([kernel.kallsyms])
        ffffffff80aac1dc do_irq+0x38 ([kernel.kallsyms])
        ffffffff80ab7da6 __lock_text_end+0x12e ([kernel.kallsyms])
        ffffffff80a93e50 mas_find+0x0 ([kernel.kallsyms])
        ffffffff8021ea60 vms_clear_ptes+0xe8 ([kernel.kallsyms])
        ffffffff8021f81a vms_complete_munmap_vmas+0x58 ([kernel.kallsyms])
        ffffffff80220706 do_vmi_align_munmap+0x15c ([kernel.kallsyms])
        ffffffff802207d0 do_vmi_munmap+0xa6 ([kernel.kallsyms])
        ffffffff80221f3c __vm_munmap+0xa2 ([kernel.kallsyms])
        ffffffff8020be7c vm_munmap+0xe ([kernel.kallsyms])
        ffffffff802bbdbe elf_load+0x14c ([kernel.kallsyms])
        ffffffff802bc1f4 load_elf_binary+0x36e ([kernel.kallsyms])
        ffffffff80264426 bprm_execve+0x254 ([kernel.kallsyms])
        ffffffff8026570c do_execveat_common.isra.0+0x11e ([kernel.kallsyms])
        ffffffff802664de __riscv_sys_execve+0x32 ([kernel.kallsyms])
        ffffffff80aabf84 do_trap_ecall_u+0x1bc ([kernel.kallsyms])
        ffffffff80ab7dc8 __lock_text_end+0x150 ([kernel.kallsyms])