[PATCH] mtd: rawnand: fix an error code in nand_setup_interface()
We should return an error code if the timing mode is not acknowledged by the NAND chip. Fixes: 415ae78ffb5d ("mtd: rawnand: check ONFI timings have been acked by the chip") Signed-off-by: Dan Carpenter --- >From static analysis. Not tested. drivers/mtd/nand/raw/nand_base.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c index fb072c95..d83c0503f96f 100644 --- a/drivers/mtd/nand/raw/nand_base.c +++ b/drivers/mtd/nand/raw/nand_base.c @@ -880,6 +880,7 @@ static int nand_setup_interface(struct nand_chip *chip, int chipnr) if (tmode_param[0] != chip->best_interface_config->timings.mode) { pr_warn("timing mode %d not acknowledged by the NAND chip\n", chip->best_interface_config->timings.mode); + ret = -EINVAL; goto err_reset_chip; } -- 2.30.2
Re: [syzbot] KASAN: use-after-free Read in get_wchan
On Wed, Apr 14, 2021 at 7:52 AM syzbot wrote: > > Hello, > > syzbot found the following issue on: > > HEAD commit:b2b3d18f riscv: Make NUMA depend on MMU > git tree: git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git > fixes > console output: https://syzkaller.appspot.com/x/log.txt?x=12b59d16d0 > kernel config: https://syzkaller.appspot.com/x/.config?x=81b3e7c68dad6e > dashboard link: https://syzkaller.appspot.com/bug?extid=0806291048161061627c > userspace arch: riscv64 > > Unfortunately, I don't have any reproducer for this issue yet. > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > Reported-by: syzbot+08062910481610616...@syzkaller.appspotmail.com > > == > BUG: KASAN: use-after-free in walk_stackframe > arch/riscv/kernel/stacktrace.c:60 [inline] > BUG: KASAN: use-after-free in get_wchan+0x156/0x196 > arch/riscv/kernel/stacktrace.c:136 > Read of size 8 at addr ffe0058e3d90 by task syz-executor.0/4667 > > CPU: 1 PID: 4667 Comm: syz-executor.0 Not tainted > 5.12.0-rc5-syzkaller-00721-gb2b3d18fc20e #0 > Hardware name: riscv-virtio,qemu (DT) > Call Trace: > [] walk_stackframe+0x0/0x23c arch/riscv/kernel/traps.c:201 > [] dump_backtrace+0x40/0x4e > arch/riscv/kernel/stacktrace.c:113 > [] show_stack+0x22/0x2e arch/riscv/kernel/stacktrace.c:118 > [] __dump_stack lib/dump_stack.c:79 [inline] > [] dump_stack+0x148/0x1d8 lib/dump_stack.c:120 > [] print_address_description.constprop.0+0x52/0x31e > mm/kasan/report.c:232 > [] __kasan_report mm/kasan/report.c:399 [inline] > [] kasan_report+0x16e/0x18c mm/kasan/report.c:416 > [] check_region_inline mm/kasan/generic.c:180 [inline] > [] __asan_load8+0x6e/0x80 mm/kasan/generic.c:253 > [] walk_stackframe arch/riscv/kernel/stacktrace.c:60 > [inline] If it's walking the stack of another task, then it needs to use READ_ONCE_NOCHECK. See x86/arm64/s390 for examples: https://elixir.bootlin.com/linux/v5.12-rc7/A/ident/READ_ONCE_NOCHECK > [] get_wchan+0x156/0x196 arch/riscv/kernel/stacktrace.c:136 > [] proc_pid_wchan+0x48/0xa4 fs/proc/base.c:390 > [] proc_single_show+0x9c/0x13c fs/proc/base.c:774 > [] seq_read_iter+0x2e0/0x8f2 fs/seq_file.c:227 > [] seq_read+0x200/0x298 fs/seq_file.c:159 > [] vfs_read+0x108/0x2ac fs/read_write.c:494 > [] ksys_read+0xb4/0x1b8 fs/read_write.c:634 > [] __do_sys_read fs/read_write.c:644 [inline] > [] sys_read+0x28/0x36 fs/read_write.c:642 > [] ret_from_syscall+0x0/0x2 > > The buggy address belongs to the page: > page:ffcf0216b8c0 refcount:0 mapcount:0 mapping: > index:0x0 pfn:0x85ae3 > flags: 0xffe() > raw: 0ffe ffcf0216b8c8 ffcf0216b8c8 > raw: > page dumped because: kasan: bad access detected > > Memory state around the buggy address: > ffe0058e3c80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > ffe0058e3d00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > >ffe0058e3d80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > ^ > ffe0058e3e00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > ffe0058e3e80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > == > > > --- > This report is generated by a bot. It may contain errors. > See https://goo.gl/tpsmEJ for more information about syzbot. > syzbot engineers can be reached at syzkal...@googlegroups.com. > > syzbot will keep track of this issue. See: > https://goo.gl/tpsmEJ#status for how to communicate with syzbot. > > -- > You received this message because you are subscribed to the Google Groups > "syzkaller-bugs" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to syzkaller-bugs+unsubscr...@googlegroups.com. > To view this discussion on the web visit > https://groups.google.com/d/msgid/syzkaller-bugs/9862e005bfe859c8%40google.com.
[PATCH 4/4] selftests: add tests for process_vm_exec
Output: $ make run_tests TAP version 13 1..4 # selftests: process_vm_exec: process_vm_exec # 1..1 # ok 1 275 ns/syscall # # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0 ok 1 selftests: process_vm_exec: process_vm_exec # selftests: process_vm_exec: process_vm_exec_fault # 1..1 # ok 1 789 ns/signal # # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0 ok 2 selftests: process_vm_exec: process_vm_exec_fault # selftests: process_vm_exec: ptrace_vm_exec # 1..1 # ok 1 1378 ns/syscall# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0 ok 3 selftests: process_vm_exec: ptrace_vm_exec # selftests: process_vm_exec: process_vm_exec_syscall # 1..1 # ok 1 write works as expectd # # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0 ok 4 selftests: process_vm_exec: process_vm_exec_syscall Signed-off-by: Andrei Vagin --- .../selftests/process_vm_exec/Makefile| 7 ++ tools/testing/selftests/process_vm_exec/log.h | 26 .../process_vm_exec/process_vm_exec.c | 105 + .../process_vm_exec/process_vm_exec_fault.c | 111 ++ .../process_vm_exec/process_vm_exec_syscall.c | 81 + .../process_vm_exec/ptrace_vm_exec.c | 111 ++ 6 files changed, 441 insertions(+) create mode 100644 tools/testing/selftests/process_vm_exec/Makefile create mode 100644 tools/testing/selftests/process_vm_exec/log.h create mode 100644 tools/testing/selftests/process_vm_exec/process_vm_exec.c create mode 100644 tools/testing/selftests/process_vm_exec/process_vm_exec_fault.c create mode 100644 tools/testing/selftests/process_vm_exec/process_vm_exec_syscall.c create mode 100644 tools/testing/selftests/process_vm_exec/ptrace_vm_exec.c diff --git a/tools/testing/selftests/process_vm_exec/Makefile b/tools/testing/selftests/process_vm_exec/Makefile new file mode 100644 index ..bdf7fcf0fdd3 --- /dev/null +++ b/tools/testing/selftests/process_vm_exec/Makefile @@ -0,0 +1,7 @@ +# SPDX-License-Identifier: GPL-2.0 + +UNAME_M := $(shell uname -m) +TEST_GEN_PROGS_x86_64 := process_vm_exec process_vm_exec_fault ptrace_vm_exec process_vm_exec_syscall +TEST_GEN_PROGS += $(TEST_GEN_PROGS_$(UNAME_M)) + +include ../lib.mk diff --git a/tools/testing/selftests/process_vm_exec/log.h b/tools/testing/selftests/process_vm_exec/log.h new file mode 100644 index ..ef268c2cf2b8 --- /dev/null +++ b/tools/testing/selftests/process_vm_exec/log.h @@ -0,0 +1,26 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef __SELFTEST_PROCESS_VM_EXEC_LOG_H__ +#define __SELFTEST_PROCESS_VM_EXEC_LOG_H__ + +#define pr_msg(fmt, lvl, ...) \ + ksft_print_msg("[%s] (%s:%d)\t" fmt "\n", \ + lvl, __FILE__, __LINE__, ##__VA_ARGS__) + +#define pr_p(func, fmt, ...) func(fmt ": %m", ##__VA_ARGS__) + +#define pr_err(fmt, ...) \ + ({ \ + ksft_test_result_error(fmt "\n", ##__VA_ARGS__); \ + -1; \ + }) + +#define pr_fail(fmt, ...) \ + ({ \ + ksft_test_result_fail(fmt "\n", ##__VA_ARGS__); \ + -1; \ + }) + +#define pr_perror(fmt, ...)pr_p(pr_err, fmt, ##__VA_ARGS__) + +#endif diff --git a/tools/testing/selftests/process_vm_exec/process_vm_exec.c b/tools/testing/selftests/process_vm_exec/process_vm_exec.c new file mode 100644 index ..aa4009c43e01 --- /dev/null +++ b/tools/testing/selftests/process_vm_exec/process_vm_exec.c @@ -0,0 +1,105 @@ +// SPDX-License-Identifier: GPL-2.0 + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "asm/unistd.h" +#include +#include + +#include "../kselftest.h" +#include "log.h" + +#ifndef __NR_process_vm_exec +#define __NR_process_vm_exec 441 +#endif + +#define TEST_SYSCALL 123 +#define TEST_SYSCALL_RET 456 +#define TEST_MARKER 789 +#define TEST_TIMEOUT 5 +#define TEST_STACK_SIZE 65536 + +static inline long __syscall1(long n, long a1) +{ + unsigned long ret; + + __asm__ __volatile__ ("syscall" : "=a"(ret) : "a"(n), "D"(a1) : "rcx", "r11", "memory"); + + return ret; +} + +int marker; + +static void guest(void) +{ + while (1) + if (__syscall1(TEST_SYSCALL, marker) != TEST_SYSCALL_RET) + abort(); +} + +int main(int argc, char **argv) +{ + struct sigcontext ctx = {}; + struct timespec start, cur; + int status, ret; + pid_t pid; + long sysnr; + void *stack; + + ksft_set_plan(1); + + stack = mmap(NULL, TEST_STACK_SIZE, PROT_READ | PROT_WRITE,
[PATCH 3/4] arch/x86: allow to execute syscalls via process_vm_exec
process_vm_exec allows to execute code in an address space of another process. It changes the current address space to the target address space and resume the current process with registers from sigcontex that is passed in the arguments. This changes adds the PROCESS_VM_EXEC_SYSCALL flag and if it is set process_vm_exec will execute a system call with arguments from sigcontext. process_vm_exec retuns 0 if the system call has been executed and an error code in other cases. A return code of the system call can be found in a proper register in sigcontext. Signed-off-by: Andrei Vagin --- arch/x86/entry/common.c | 5 - arch/x86/kernel/process_vm_exec.c| 29 +++- include/linux/entry-common.h | 2 ++ include/linux/process_vm_exec.h | 2 ++ include/uapi/linux/process_vm_exec.h | 8 kernel/entry/common.c| 2 +- 6 files changed, 45 insertions(+), 3 deletions(-) create mode 100644 include/uapi/linux/process_vm_exec.h diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c index 42eac459b25b..8de02ca19aca 100644 --- a/arch/x86/entry/common.c +++ b/arch/x86/entry/common.c @@ -40,7 +40,10 @@ __visible noinstr void do_syscall_64(unsigned long nr, struct pt_regs *regs) { #ifdef CONFIG_PROCESS_VM_EXEC - if (current->exec_mm && current->exec_mm->ctx) { + struct exec_mm *exec_mm = current->exec_mm; + + if (exec_mm && exec_mm->ctx && + !(exec_mm->flags & PROCESS_VM_EXEC_SYSCALL)) { kernel_siginfo_t info = { .si_signo = SIGSYS, .si_call_addr = (void __user *)KSTK_EIP(current), diff --git a/arch/x86/kernel/process_vm_exec.c b/arch/x86/kernel/process_vm_exec.c index 28b32330f744..9124b23f1e9b 100644 --- a/arch/x86/kernel/process_vm_exec.c +++ b/arch/x86/kernel/process_vm_exec.c @@ -11,6 +11,7 @@ #include #include #include +#include #include static void swap_mm(struct mm_struct *prev_mm, struct mm_struct *target_mm) @@ -73,7 +74,7 @@ SYSCALL_DEFINE6(process_vm_exec, pid_t, pid, struct sigcontext __user *, uctx, sigset_t mask; - if (flags) + if (flags & ~PROCESS_VM_EXEC_SYSCALL) return -EINVAL; if (sizemask != sizeof(sigset_t)) @@ -97,6 +98,9 @@ SYSCALL_DEFINE6(process_vm_exec, pid_t, pid, struct sigcontext __user *, uctx, } current_pt_regs()->ax = 0; + if (flags & PROCESS_VM_EXEC_SYSCALL) + syscall_exit_to_user_mode_prepare(current_pt_regs()); + ret = swap_vm_exec_context(uctx); if (ret < 0) goto err_mm_put; @@ -117,6 +121,29 @@ SYSCALL_DEFINE6(process_vm_exec, pid_t, pid, struct sigcontext __user *, uctx, mmgrab(prev_mm); swap_mm(prev_mm, mm); + if (flags & PROCESS_VM_EXEC_SYSCALL) { + struct pt_regs *regs = current_pt_regs(); + kernel_siginfo_t info; + int sysno; + + regs->orig_ax = regs->ax; + regs->ax = -ENOSYS; + sysno = syscall_get_nr(current, regs); + + do_syscall_64(sysno, regs); + + restore_vm_exec_context(regs); + info.si_signo = SIGSYS; + info.si_call_addr = (void __user *)KSTK_EIP(current); + info.si_arch = syscall_get_arch(current); + info.si_syscall = sysno; + ret = copy_siginfo_to_user(current->exec_mm->siginfo, ); + current_pt_regs()->orig_ax = __NR_process_vm_exec; + current_pt_regs()->ax = -ENOSYS; + syscall_enter_from_user_mode_work(current_pt_regs(), current_pt_regs()->orig_ax); + return ret; + } + ret = current_pt_regs()->ax; return ret; diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h index 474f29638d2c..d0ebbe9ca9e4 100644 --- a/include/linux/entry-common.h +++ b/include/linux/entry-common.h @@ -285,6 +285,8 @@ static inline void arch_syscall_exit_tracehook(struct pt_regs *regs, bool step) } #endif +void syscall_exit_to_user_mode_prepare(struct pt_regs *regs); + /** * syscall_exit_to_user_mode - Handle work before returning to user mode * @regs: Pointer to currents pt_regs diff --git a/include/linux/process_vm_exec.h b/include/linux/process_vm_exec.h index a02535fbd5c8..2e04b4875a92 100644 --- a/include/linux/process_vm_exec.h +++ b/include/linux/process_vm_exec.h @@ -2,6 +2,8 @@ #ifndef _LINUX_PROCESS_VM_EXEC_H #define _LINUX_PROCESS_VM_EXEC_H +#include + struct exec_mm { struct sigcontext *ctx; struct mm_struct *mm; diff --git a/include/uapi/linux/process_vm_exec.h b/include/uapi/linux/process_vm_exec.h new file mode 100644 index ..35465b5d3ebf --- /dev/null +++ b/include/uapi/linux/process_vm_exec.h @@ -0,0 +1,8 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ + +#ifndef _UAPI_LINUX_PROCESS_VM_EXEC_H
[PATCH 1/4] signal: add a helper to restore a process state from sigcontex
It will be used to implement process_vm_exec. Signed-off-by: Andrei Vagin --- arch/x86/kernel/signal.c | 78 ++-- 1 file changed, 43 insertions(+), 35 deletions(-) diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c index be0d7d4152ec..cc269a20dd5f 100644 --- a/arch/x86/kernel/signal.c +++ b/arch/x86/kernel/signal.c @@ -79,51 +79,43 @@ static void force_valid_ss(struct pt_regs *regs) # define CONTEXT_COPY_SIZE sizeof(struct sigcontext) #endif -static int restore_sigcontext(struct pt_regs *regs, - struct sigcontext __user *usc, +static int __restore_sigcontext(struct pt_regs *regs, + struct sigcontext __user *sc, unsigned long uc_flags) { - struct sigcontext sc; - - /* Always make any pending restarted system calls return -EINTR */ - current->restart_block.fn = do_no_restart_syscall; - - if (copy_from_user(, usc, CONTEXT_COPY_SIZE)) - return -EFAULT; - #ifdef CONFIG_X86_32 - set_user_gs(regs, sc.gs); - regs->fs = sc.fs; - regs->es = sc.es; - regs->ds = sc.ds; + set_user_gs(regs, sc->gs); + regs->fs = sc->fs; + regs->es = sc->es; + regs->ds = sc->ds; #endif /* CONFIG_X86_32 */ - regs->bx = sc.bx; - regs->cx = sc.cx; - regs->dx = sc.dx; - regs->si = sc.si; - regs->di = sc.di; - regs->bp = sc.bp; - regs->ax = sc.ax; - regs->sp = sc.sp; - regs->ip = sc.ip; + regs->bx = sc->bx; + regs->cx = sc->cx; + regs->dx = sc->dx; + regs->si = sc->si; + regs->di = sc->di; + regs->bp = sc->bp; + regs->ax = sc->ax; + regs->sp = sc->sp; + regs->ip = sc->ip; #ifdef CONFIG_X86_64 - regs->r8 = sc.r8; - regs->r9 = sc.r9; - regs->r10 = sc.r10; - regs->r11 = sc.r11; - regs->r12 = sc.r12; - regs->r13 = sc.r13; - regs->r14 = sc.r14; - regs->r15 = sc.r15; + regs->r8 = sc->r8; + regs->r9 = sc->r9; + regs->r10 = sc->r10; + regs->r11 = sc->r11; + regs->r12 = sc->r12; + regs->r13 = sc->r13; + regs->r14 = sc->r14; + regs->r15 = sc->r15; #endif /* CONFIG_X86_64 */ /* Get CS/SS and force CPL3 */ - regs->cs = sc.cs | 0x03; - regs->ss = sc.ss | 0x03; + regs->cs = sc->cs | 0x03; + regs->ss = sc->ss | 0x03; - regs->flags = (regs->flags & ~FIX_EFLAGS) | (sc.flags & FIX_EFLAGS); + regs->flags = (regs->flags & ~FIX_EFLAGS) | (sc->flags & FIX_EFLAGS); /* disable syscall checks */ regs->orig_ax = -1; @@ -136,10 +128,26 @@ static int restore_sigcontext(struct pt_regs *regs, force_valid_ss(regs); #endif - return fpu__restore_sig((void __user *)sc.fpstate, + return fpu__restore_sig((void __user *)sc->fpstate, IS_ENABLED(CONFIG_X86_32)); } +static int restore_sigcontext(struct pt_regs *regs, + struct sigcontext __user *usc, + unsigned long uc_flags) +{ + struct sigcontext sc; + + /* Always make any pending restarted system calls return -EINTR */ + current->restart_block.fn = do_no_restart_syscall; + + if (copy_from_user(, usc, CONTEXT_COPY_SIZE)) + return -EFAULT; + + return __restore_sigcontext(regs, , uc_flags); +} + + static __always_inline int __unsafe_setup_sigcontext(struct sigcontext __user *sc, void __user *fpstate, struct pt_regs *regs, unsigned long mask) -- 2.29.2
[PATCH 2/4] arch/x86: implement the process_vm_exec syscall
This change introduces the new system call: process_vm_exec(pid_t pid, struct sigcontext *uctx, unsigned long flags, siginfo_t * uinfo, sigset_t *sigmask, size_t sizemask) process_vm_exec allows to execute the current process in an address space of another process. process_vm_exec swaps the current address space with an address space of a specified process, sets a state from sigcontex and resumes the process. When a process receives a signal or calls a system call, process_vm_exec saves the process state back to sigcontext, restores the origin address space, restores the origin process state, and returns to userspace. If it was interrupted by a signal and the signal is in the user_mask, the signal is dequeued and information about it is saved in uinfo. If process_vm_exec is interrupted by a system call, a synthetic siginfo for the SIGSYS signal is generated. The behavior of this system call is similar to PTRACE_SYSEMU but everything is happing in the context of one process, so process_vm_exec shows a better performance. PTRACE_SYSEMU is primarily used to implement sandboxes (application kernels) like User-mode Linux or gVisor. These type of sandboxes intercepts applications system calls and acts as the guest kernel. A simple benchmark, where a "tracee" process executes systems calls in a loop and a "tracer" process traps syscalls and handles them just incrementing the tracee instruction pointer to skip the syscall instruction shows that process_vm_exec works more than 5 times faster than PTRACE_SYSEMU. Signed-off-by: Andrei Vagin --- arch/Kconfig | 15 +++ arch/x86/Kconfig | 1 + arch/x86/entry/common.c| 16 +++ arch/x86/entry/syscalls/syscall_64.tbl | 1 + arch/x86/include/asm/sigcontext.h | 2 + arch/x86/kernel/Makefile | 1 + arch/x86/kernel/process_vm_exec.c | 133 + arch/x86/kernel/signal.c | 47 + include/linux/process_vm_exec.h| 15 +++ include/linux/sched.h | 7 ++ include/linux/syscalls.h | 6 ++ include/uapi/asm-generic/unistd.h | 4 +- kernel/fork.c | 9 ++ kernel/sys_ni.c| 2 + 14 files changed, 258 insertions(+), 1 deletion(-) create mode 100644 arch/x86/kernel/process_vm_exec.c create mode 100644 include/linux/process_vm_exec.h diff --git a/arch/Kconfig b/arch/Kconfig index ba4e966484ab..3ed9b8fb1727 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -514,6 +514,21 @@ config SECCOMP_FILTER See Documentation/userspace-api/seccomp_filter.rst for details. +config HAVE_ARCH_PROCESS_VM_EXEC + bool + help + An arch should select this symbol to support the process_vm_exec system call. + +config PROCESS_VM_EXEC + prompt "Enable the process_vm_exec syscall" + def_bool y + depends on HAVE_ARCH_PROCESS_VM_EXEC + help + process_vm_exec allows executing code and system calls in a specified + address space. + + If unsure, say Y. + config HAVE_ARCH_STACKLEAK bool help diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index fbf26e0f7a6a..1c7ebb58865e 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -27,6 +27,7 @@ config X86_64 select ARCH_SUPPORTS_INT128 if CC_HAS_INT128 select ARCH_USE_CMPXCHG_LOCKREF select HAVE_ARCH_SOFT_DIRTY + select HAVE_ARCH_PROCESS_VM_EXEC select MODULES_USE_ELF_RELA select NEED_DMA_MAP_STATE select SWIOTLB diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c index 870efeec8bda..42eac459b25b 100644 --- a/arch/x86/entry/common.c +++ b/arch/x86/entry/common.c @@ -19,6 +19,7 @@ #include #include #include +#include #ifdef CONFIG_XEN_PV #include @@ -38,6 +39,21 @@ #ifdef CONFIG_X86_64 __visible noinstr void do_syscall_64(unsigned long nr, struct pt_regs *regs) { +#ifdef CONFIG_PROCESS_VM_EXEC + if (current->exec_mm && current->exec_mm->ctx) { + kernel_siginfo_t info = { + .si_signo = SIGSYS, + .si_call_addr = (void __user *)KSTK_EIP(current), + .si_arch = syscall_get_arch(current), + .si_syscall = nr, + }; + restore_vm_exec_context(regs); + regs->ax = copy_siginfo_to_user(current->exec_mm->siginfo, ); + syscall_exit_to_user_mode(regs); + return; + } +#endif + nr = syscall_enter_from_user_mode(regs, nr); instrumentation_begin(); diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index 379819244b91..2a8e27b2d87e 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -362,6 +362,7 @@ 438common pidfd_getfd sys_pidfd_getfd 439common
[PATCH 0/4 POC] Allow executing code and syscalls in another address space
We already have process_vm_readv and process_vm_writev to read and write to a process memory faster than we can do this with ptrace. And now it is time for process_vm_exec that allows executing code in an address space of another process. We can do this with ptrace but it is much slower. = Use-cases = Here are two known use-cases. The first one is “application kernel” sandboxes like User-mode Linux and gVisor. In this case, we have a process that runs the sandbox kernel and a set of stub processes that are used to manage guest address spaces. Guest code is executed in the context of stub processes but all system calls are intercepted and handled in the sandbox kernel. Right now, these sort of sandboxes use PTRACE_SYSEMU to trap system calls, but the process_vm_exec can significantly speed them up. Another use-case is CRIU (Checkpoint/Restore in User-space). Several process properties can be received only from the process itself. Right now, we use a parasite code that is injected into the process. We do this with ptrace but it is slow, unsafe, and tricky. process_vm_exec can simplify the process of injecting a parasite code and it will allow pre-dump memory without stopping processes. The pre-dump here is when we enable a memory tracker and dump the memory while a process is continue running. On each interaction we dump memory that has been changed from the previous iteration. In the final step, we will stop processes and dump their full state. Right now the most effective way to dump process memory is to create a set of pipes and splice memory into these pipes from the parasite code. With process_vm_exec, we will be able to call vmsplice directly. It means that we will not need to stop a process to inject the parasite code. = How it works = process_vm_exec has two modes: * Execute code in an address space of a target process and stop on any signal or system call. * Execute a system call in an address space of a target process. int process_vm_exec(pid_t pid, struct sigcontext uctx, unsigned long flags, siginfo_t siginfo, sigset_t *sigmask, size_t sizemask) PID - target process identification. We can consider to use pidfd instead of PID here. sigcontext contains a process state with what the process will be resumed after switching the address space and then when a process will be stopped, its sate will be saved back to sigcontext. siginfo is information about a signal that has interrupted the process. If a process is interrupted by a system call, signfo will contain a synthetic siginfo of the SIGSYS signal. sigmask is a set of signals that process_vm_exec returns via signfo. # How fast is it In the fourth patch, you can find two benchmarks that execute a function that calls system calls in a loop. ptrace_vm_exe uses ptrace to trap system calls, proces_vm_exec uses the process_vm_exec syscall to do the same thing. ptrace_vm_exec: 1446 ns/syscall ptrocess_vm_exec: 289 ns/syscall PS: This version is just a prototype. Its goal is to collect the initial feedback, to discuss the interfaces, and maybe to get some advice on implementation.. Cc: Andrew Morton Cc: Andy Lutomirski Cc: Anton Ivanov Cc: Christian Brauner Cc: Dmitry Safonov <0x7f454...@gmail.com> Cc: Ingo Molnar Cc: Jeff Dike Cc: Mike Rapoport Cc: Michael Kerrisk (man-pages) Cc: Oleg Nesterov Cc: Peter Zijlstra Cc: Richard Weinberger Cc: Thomas Gleixner Andrei Vagin (4): signal: add a helper to restore a process state from sigcontex arch/x86: implement the process_vm_exec syscall arch/x86: allow to execute syscalls via process_vm_exec selftests: add tests for process_vm_exec arch/Kconfig | 15 ++ arch/x86/Kconfig | 1 + arch/x86/entry/common.c | 19 +++ arch/x86/entry/syscalls/syscall_64.tbl| 1 + arch/x86/include/asm/sigcontext.h | 2 + arch/x86/kernel/Makefile | 1 + arch/x86/kernel/process_vm_exec.c | 160 ++ arch/x86/kernel/signal.c | 125 ++ include/linux/entry-common.h | 2 + include/linux/process_vm_exec.h | 17 ++ include/linux/sched.h | 7 + include/linux/syscalls.h | 6 + include/uapi/asm-generic/unistd.h | 4 +- include/uapi/linux/process_vm_exec.h | 8 + kernel/entry/common.c | 2 +- kernel/fork.c | 9 + kernel/sys_ni.c | 2 + .../selftests/process_vm_exec/Makefile| 7 + tools/testing/selftests/process_vm_exec/log.h | 26 +++ .../process_vm_exec/process_vm_exec.c | 105 .../process_vm_exec/process_vm_exec_fault.c | 111 .../process_vm_exec/process_vm_exec_syscall.c | 81 + .../process_vm_exec/ptrace_vm_exec.c
Re: [PATCH] riscv: locks: introduce ticket-based spinlock implementation
On Tue, Apr 13, 2021 at 6:54 PM David Laight wrote: > > From: Catalin Marinas > > Sent: 13 April 2021 11:45 > ... > > This indeed needs some care. IIUC RISC-V has similar restrictions as arm > > here, no load/store instructions are allowed between LR and SC. You > > can't guarantee that the compiler won't spill some variable onto the > > stack. > > You can probably never guarantee the compiler won't spill to stack. > Especially if someone compiles with -O0. > > Which probably means that anything using LR/SC must be written in > asm and the C wrappers disabled. Agree, and cmpxchg has been widely used in Linux. I think it's the last requirement for complex atomic API, although cmpxchg has ABA problem: CPU0 CPU1 === == do { old32 = load32; *ptr32 = new32_tmp; *ptr32 = old32; load32 = cmpxchg(ptr32, old32, new32); //still success } while (load32 != old32); That means cmpxhg only cares about the result but not the middle situation. It's different from LR/SC or AMO instructions. > > David > > - > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 > 1PT, UK > Registration No: 1397386 (Wales) > -- Best Regards Guo Ren ML: https://lore.kernel.org/linux-csky/
[PATCH v7 3/9] regulator: add warning flags
Add 'warning' level events and error flags to regulator core. Current regulator core notifications are used to inform consumers about errors where HW is misbehaving in such way it is assumed to be broken/unrecoverable. There are PMICs which are designed for system(s) that may have use for regulator indications sent before HW is damaged so that some board/consumer specific recovery-event can be performed while continuing most of the normal operations. Add new WARNING level events and notifications to be used for that purpose. Signed-off-by: Matti Vaittinen --- No changes since RFC-v2 --- include/linux/regulator/consumer.h | 14 ++ 1 file changed, 14 insertions(+) diff --git a/include/linux/regulator/consumer.h b/include/linux/regulator/consumer.h index 20e84a84fb77..f72ca73631be 100644 --- a/include/linux/regulator/consumer.h +++ b/include/linux/regulator/consumer.h @@ -119,6 +119,16 @@ struct regulator_dev; #define REGULATOR_EVENT_PRE_DISABLE0x400 #define REGULATOR_EVENT_ABORT_DISABLE 0x800 #define REGULATOR_EVENT_ENABLE 0x1000 +/* + * Following notifications should be emitted only if detected condition + * is such that the HW is likely to still be working but consumers should + * take a recovery action to prevent problems esacalating into errors. + */ +#define REGULATOR_EVENT_UNDER_VOLTAGE_WARN 0x2000 +#define REGULATOR_EVENT_OVER_CURRENT_WARN 0x4000 +#define REGULATOR_EVENT_OVER_VOLTAGE_WARN 0x8000 +#define REGULATOR_EVENT_OVER_TEMP_WARN 0x1 +#define REGULATOR_EVENT_WARN_MASK 0x1E000 /* * Regulator errors that can be queried using regulator_get_error_flags @@ -138,6 +148,10 @@ struct regulator_dev; #define REGULATOR_ERROR_FAIL BIT(4) #define REGULATOR_ERROR_OVER_TEMP BIT(5) +#define REGULATOR_ERROR_UNDER_VOLTAGE_WARN BIT(6) +#define REGULATOR_ERROR_OVER_CURRENT_WARN BIT(7) +#define REGULATOR_ERROR_OVER_VOLTAGE_WARN BIT(8) +#define REGULATOR_ERROR_OVER_TEMP_WARN BIT(9) /** * struct pre_voltage_change_data - Data sent with PRE_VOLTAGE_CHANGE event -- 2.25.4 -- Matti Vaittinen, Linux device drivers ROHM Semiconductors, Finland SWDC Kiviharjunlenkki 1E 90220 OULU FINLAND ~~~ "I don't think so," said Rene Descartes. Just then he vanished ~~~ Simon says - in Latin please. ~~~ "non cogito me" dixit Rene Descarte, deinde evanescavit ~~~ Thanks to Simon Glass for the translation =]
Re: [PATCH net-next] net: Space: remove hp100 probe
On Wed, Apr 14, 2021, 00:42 Stephen Hemminger wrote: > > On Tue, 13 Apr 2021 16:16:17 +0200 Arnd Bergmann wrote: > > > */ > > static struct devprobe2 isa_probes[] __initdata = { > > -#if defined(CONFIG_HP100) && defined(CONFIG_ISA) /* ISA, EISA */ > > - {hp100_probe, 0}, > > -#endif > > #ifdef CONFIG_3C515 > > {tc515_probe, 0}, > > #endif > > Thanks, do we even need to have the static initialization anymore? I actually did some more cleanups after I sent the above patch when I found out that this code still exists. It turned out that above half of the static initializations are completely pointless because the drivers never rely on the netdev= command line arguments and can simply be changed to always using module_init() instead of relying on net_olddevs_init() for the built-in case. The remaining ones are all ISA drivers: 3c515, Ultra, WD80x3, NE2000, Lance, SMC9194, CS89x0, NI65 and COPS. With my cleanups, I move the netdev_boot_setup infrastructure into drivers/net/Space.c and only compile it when at least one of these eight drivers is enabled. All these drivers also support being built as loadable modules, but in that configuration they only support a single device (back in the day you could copy the module and just load it twice to support more than one instance, not sure we still want to support that). None of these drivers have a maintainer listed, but I suppose there are still some PC/104 machines with NE2000 network cards that could theoretically run a modern kernel. Arnd
[PATCH v7 2/9] reboot: thermal: Export hardware protection shutdown
Thermal core contains a logic for safety shutdown. System is attempted to be powered off if temperature exceeds safety limits. Currently this can be also utilized by regulator subsystem as a final protection measure if PMICs report dangerous over-voltage, over-current or over-temperature and if per regulator counter measures fail or do not exist. Move this logic to kernel/reboot.c and export the functionality for other subsystems to use. Also replace the mutex with a spinlock to allow using the function from any context. Also the EMIF bus code has implemented a safety shut-down. EMIF does not attempt orderly_poweroff at all. Thus the EMIF code is not converted to use this new function. Signed-off-by: Matti Vaittinen --- Changelog v7: - new patch Please note - this patch has received only a minimal amount of testing. (The new API call was tested to shut-down my system at driver probe but no odd corner-cases have been tested). Any testing for thermal shutdown is appreciated. --- drivers/thermal/thermal_core.c | 63 ++--- include/linux/reboot.h | 1 + kernel/reboot.c| 86 ++ 3 files changed, 91 insertions(+), 59 deletions(-) diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c index 996c038f83a4..b1444845af38 100644 --- a/drivers/thermal/thermal_core.c +++ b/drivers/thermal/thermal_core.c @@ -36,10 +36,8 @@ static LIST_HEAD(thermal_governor_list); static DEFINE_MUTEX(thermal_list_lock); static DEFINE_MUTEX(thermal_governor_lock); -static DEFINE_MUTEX(poweroff_lock); static atomic_t in_suspend; -static bool power_off_triggered; static struct thermal_governor *def_governor; @@ -327,70 +325,18 @@ static void handle_non_critical_trips(struct thermal_zone_device *tz, int trip) def_governor->throttle(tz, trip); } -/** - * thermal_emergency_poweroff_func - emergency poweroff work after a known delay - * @work: work_struct associated with the emergency poweroff function - * - * This function is called in very critical situations to force - * a kernel poweroff after a configurable timeout value. - */ -static void thermal_emergency_poweroff_func(struct work_struct *work) -{ - /* -* We have reached here after the emergency thermal shutdown -* Waiting period has expired. This means orderly_poweroff has -* not been able to shut off the system for some reason. -* Try to shut down the system immediately using kernel_power_off -* if populated -*/ - WARN(1, "Attempting kernel_power_off: Temperature too high\n"); - kernel_power_off(); - - /* -* Worst of the worst case trigger emergency restart -*/ - WARN(1, "Attempting emergency_restart: Temperature too high\n"); - emergency_restart(); -} - -static DECLARE_DELAYED_WORK(thermal_emergency_poweroff_work, - thermal_emergency_poweroff_func); - -/** - * thermal_emergency_poweroff - Trigger an emergency system poweroff - * - * This may be called from any critical situation to trigger a system shutdown - * after a known period of time. By default this is not scheduled. - */ -static void thermal_emergency_poweroff(void) +void thermal_zone_device_critical(struct thermal_zone_device *tz) { - int poweroff_delay_ms = CONFIG_THERMAL_EMERGENCY_POWEROFF_DELAY_MS; /* * poweroff_delay_ms must be a carefully profiled positive value. -* Its a must for thermal_emergency_poweroff_work to be scheduled +* Its a must for forced_emergency_poweroff_work to be scheduled. */ - if (poweroff_delay_ms <= 0) - return; - schedule_delayed_work(_emergency_poweroff_work, - msecs_to_jiffies(poweroff_delay_ms)); -} + int poweroff_delay_ms = CONFIG_THERMAL_EMERGENCY_POWEROFF_DELAY_MS; -void thermal_zone_device_critical(struct thermal_zone_device *tz) -{ dev_emerg(>device, "%s: critical temperature reached, " "shutting down\n", tz->type); - mutex_lock(_lock); - if (!power_off_triggered) { - /* -* Queue a backup emergency shutdown in the event of -* orderly_poweroff failure -*/ - thermal_emergency_poweroff(); - orderly_poweroff(true); - power_off_triggered = true; - } - mutex_unlock(_lock); + hw_protection_shutdown("Temperature too high", poweroff_delay_ms); } EXPORT_SYMBOL(thermal_zone_device_critical); @@ -1549,7 +1495,6 @@ static int __init thermal_init(void) ida_destroy(_cdev_ida); mutex_destroy(_list_lock); mutex_destroy(_governor_lock); - mutex_destroy(_lock); return result; } postcore_initcall(thermal_init); diff --git a/include/linux/reboot.h b/include/linux/reboot.h index 3734cd8f38a8..af907a3d68d1 100644 ---
[syzbot] KASAN: use-after-free Read in get_wchan
Hello, syzbot found the following issue on: HEAD commit:b2b3d18f riscv: Make NUMA depend on MMU git tree: git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git fixes console output: https://syzkaller.appspot.com/x/log.txt?x=12b59d16d0 kernel config: https://syzkaller.appspot.com/x/.config?x=81b3e7c68dad6e dashboard link: https://syzkaller.appspot.com/bug?extid=0806291048161061627c userspace arch: riscv64 Unfortunately, I don't have any reproducer for this issue yet. IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+08062910481610616...@syzkaller.appspotmail.com == BUG: KASAN: use-after-free in walk_stackframe arch/riscv/kernel/stacktrace.c:60 [inline] BUG: KASAN: use-after-free in get_wchan+0x156/0x196 arch/riscv/kernel/stacktrace.c:136 Read of size 8 at addr ffe0058e3d90 by task syz-executor.0/4667 CPU: 1 PID: 4667 Comm: syz-executor.0 Not tainted 5.12.0-rc5-syzkaller-00721-gb2b3d18fc20e #0 Hardware name: riscv-virtio,qemu (DT) Call Trace: [] walk_stackframe+0x0/0x23c arch/riscv/kernel/traps.c:201 [] dump_backtrace+0x40/0x4e arch/riscv/kernel/stacktrace.c:113 [] show_stack+0x22/0x2e arch/riscv/kernel/stacktrace.c:118 [] __dump_stack lib/dump_stack.c:79 [inline] [] dump_stack+0x148/0x1d8 lib/dump_stack.c:120 [] print_address_description.constprop.0+0x52/0x31e mm/kasan/report.c:232 [] __kasan_report mm/kasan/report.c:399 [inline] [] kasan_report+0x16e/0x18c mm/kasan/report.c:416 [] check_region_inline mm/kasan/generic.c:180 [inline] [] __asan_load8+0x6e/0x80 mm/kasan/generic.c:253 [] walk_stackframe arch/riscv/kernel/stacktrace.c:60 [inline] [] get_wchan+0x156/0x196 arch/riscv/kernel/stacktrace.c:136 [] proc_pid_wchan+0x48/0xa4 fs/proc/base.c:390 [] proc_single_show+0x9c/0x13c fs/proc/base.c:774 [] seq_read_iter+0x2e0/0x8f2 fs/seq_file.c:227 [] seq_read+0x200/0x298 fs/seq_file.c:159 [] vfs_read+0x108/0x2ac fs/read_write.c:494 [] ksys_read+0xb4/0x1b8 fs/read_write.c:634 [] __do_sys_read fs/read_write.c:644 [inline] [] sys_read+0x28/0x36 fs/read_write.c:642 [] ret_from_syscall+0x0/0x2 The buggy address belongs to the page: page:ffcf0216b8c0 refcount:0 mapcount:0 mapping: index:0x0 pfn:0x85ae3 flags: 0xffe() raw: 0ffe ffcf0216b8c8 ffcf0216b8c8 raw: page dumped because: kasan: bad access detected Memory state around the buggy address: ffe0058e3c80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffe0058e3d00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >ffe0058e3d80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffe0058e3e00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffe0058e3e80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff == --- This report is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkal...@googlegroups.com. syzbot will keep track of this issue. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
[PATCH v7 1/9] dt_bindings: Add protection limit properties
Support specifying protection/error/warning limits for regulator over current, over temperature and over/under voltage. Most of the PMICs support only "protection" feature but few setups do also support error/warning level indications. On many ICs most of the protection limits can't actually be set. But for example the ampere limit for over-current protection on ROHM BD9576 can be configured - or feature can be completely disabled. Provide limit setting for all protections/errors for the sake of the completeness and do that using own properties for all so that not all users would need to set all levels when only one or few are supported. Signed-off-by: Matti Vaittinen Reviewed-by: Rob Herring --- No changes since RFC-v2 --- .../bindings/regulator/regulator.yaml | 82 +++ 1 file changed, 82 insertions(+) diff --git a/Documentation/devicetree/bindings/regulator/regulator.yaml b/Documentation/devicetree/bindings/regulator/regulator.yaml index 6d0bc9cd4040..a6ae9ecae5cc 100644 --- a/Documentation/devicetree/bindings/regulator/regulator.yaml +++ b/Documentation/devicetree/bindings/regulator/regulator.yaml @@ -117,6 +117,88 @@ properties: description: Enable over current protection. type: boolean + regulator-oc-protection-microamp: +description: Set over current protection limit. This is a limit where + hardware performs emergency shutdown. Zero can be passed to disable + protection and value '1' indicates that protection should be enabled but + limit setting can be omitted. + + regulator-oc-error-microamp: +description: Set over current error limit. This is a limit where part of + the hardware propably is malfunctional and damage prevention is requested. + Zero can be passed to disable error detection and value '1' indicates + that detection should be enabled but limit setting can be omitted. + + regulator-oc-warn-microamp: +description: Set over current warning limit. This is a limit where hardware + is assumed still to be functional but approaching limit where it gets + damaged. Recovery actions should be initiated. Zero can be passed to + disable detection and value '1' indicates that detection should + be enabled but limit setting can be omitted. + + regulator-ov-protection-microvolt: +description: Set over voltage protection limit. This is a limit where + hardware performs emergency shutdown. Zero can be passed to disable + protection and value '1' indicates that protection should be enabled but + limit setting can be omitted. Limit is given as microvolt offset from + voltage set to regulator. + + regulator-ov-error-microvolt: +description: Set over voltage error limit. This is a limit where part of + the hardware propably is malfunctional and damage prevention is requested + Zero can be passed to disable error detection and value '1' indicates + that detection should be enabled but limit setting can be omitted. Limit + is given as microvolt offset from voltage set to regulator. + + regulator-ov-warn-microvolt: +description: Set over voltage warning limit. This is a limit where hardware + is assumed still to be functional but approaching limit where it gets + damaged. Recovery actions should be initiated. Zero can be passed to + disable detection and value '1' indicates that detection should + be enabled but limit setting can be omitted. Limit is given as microvolt + offset from voltage set to regulator. + + regulator-uv-protection-microvolt: +description: Set over under voltage protection limit. This is a limit where + hardware performs emergency shutdown. Zero can be passed to disable + protection and value '1' indicates that protection should be enabled but + limit setting can be omitted. Limit is given as microvolt offset from + voltage set to regulator. + + regulator-uv-error-microvolt: +description: Set under voltage error limit. This is a limit where part of + the hardware propably is malfunctional and damage prevention is requested + Zero can be passed to disable error detection and value '1' indicates + that detection should be enabled but limit setting can be omitted. Limit + is given as microvolt offset from voltage set to regulator. + + regulator-uv-warn-microvolt: +description: Set over under voltage warning limit. This is a limit where + hardware is assumed still to be functional but approaching limit where + it gets damaged. Recovery actions should be initiated. Zero can be passed + to disable detection and value '1' indicates that detection should + be enabled but limit setting can be omitted. Limit is given as microvolt + offset from voltage set to regulator. + + regulator-temp-protection-kelvin: +description: Set over temperature protection limit. This is a limit where + hardware performs emergency shutdown.
Re: [PATCH 1/2] kunit: Fix formatting of KUNIT tests to meet the standard
On Wed, Apr 14, 2021 at 12:33:02AM -0400, Nico Pache wrote: > There are few instances of KUNIT tests that are not properly defined. > This commit focuses on correcting these issues to match the standard > defined in the Documentation. > > Issues Fixed: > - Tests should default to KUNIT_ALL_TESTS > - Tests configs tristate should have `if !KUNIT_ALL_TESTS` > - Tests should end in KUNIT_TEST, some fixes have been applied to > correct issues were KUNIT_TESTS is used or KUNIT is not mentioned. You are changing lots of different things all in one patch, please break this up into "one config option change per patch" to make it easier to review and get merged through the proper subsystem. thanks, greg k-h
[PATCH v7 0/9] Extend regulator notification support
Extend regulator notification support This series extends the regulator notification and error flag support. Initial discussion on the topic can be found here: https://lore.kernel.org/lkml/6046836e22b8252983f08d5621c35ececb97820d.ca...@fi.rohmeurope.com/ This series is built on top of the BD9576MUF support patch series v9 which is currently in MFD tree at immutable branch ib-mfd-watchdog-5.13 https://lore.kernel.org/lkml/cover.1615219345.git.matti.vaitti...@fi.rohmeurope.com/ (The series should apply without those patches but there is compile time dependency to definitions brought in at the last patch of the BD9576 series. This should be Ok though as there is a Kconfig dependency in BD9576 regulator driver) In a nutshell - the series adds: 1. WARNING level events/error flags. (Patch 3) Current regulator 'ERROR' event notifications for over/under voltage, over current and over temperature are used to indicate condition where monitored entity is so badly "off" that it actually indicates a hardware error which can not be recovered. The most typical hanling for that is believed to be a (graceful) system-shutdown. Here we add set of 'WARNING' level flags to allow sending notifications to consumers before things are 'that badly off' so that consumer drivers can implement recovery-actions. 2. Device-tree properties for specifying limit values. (Patches 1, 5) Add limits for above mentioned 'ERROR' and 'WARNING' levels (which send notifications to consumers) and also for a 'PROTECTION' level (which will be used to immediately shut-down the regulator(s) W/O informing consumer drivers. Typically implemented by hardware). Property parsing is implemented in regulator core which then calls callback operations for limit setting from the IC drivers. A warning is emitted if protection is requested by device tree but the underlying IC does not support configuring requested protection. 3. Helpers which can be registered by IC. (Patch 4) Target is to avoid implementing IRQ handling and IRQ storm protection in each IC driver. (Many of the ICs implementin these IRQs do not allow masking or acking the IRQ but keep the IRQ asserted for the whole duration of problem keeping the processor in IRQ handling loop). 4. Emergency poweroff function (refactored out of the thermal_core to kernel/reboot.c) which is called if IC fires error IRQs but IC reading fails and given retry-count is exceeded. (Patches 2, 4) Please note that the mutex in the emergency shutdown was replaced by a spinlock to allow calls from any context. The helper was attempted to be done so it could be used to implement roughly same logic as is used in qcom-labibb regulator. This means amongst other things a safety shut-down if IC registers are not readable. Using these shut-down retry counters are optional. The idea is that the helper could be also used by simpler ICs which do not provide status register(s) which can be used to check if error is still active. ICs which do not have such status register can simply omit the 'renable' callback (and retry-counts etc) - and helper assumes the situation is Ok and re-enables IRQ after given time period. If problem persists the handler is ran again and another notification is sent - but at least the delay allows processor to avoid IRQ loop. Patch 7 takes this notification support in use at BD9576MUF. Patch 8 is related to MFD change which is not really related to the RFC here. It was added to this series in order to avoid potential conflicts. Patch 9 adds a maintainers entry. Changelog v7: general: - rebased on v5.12-rc7 - new patch for refactoring the hw-failure reboot logic out of thermal_core.c for others to use. notification helpers: - fix regulator error_flags query - grammar/typos - do not BUG() but attempt to shut-down the system - use BITS_PER_TYPE() Changelog v6: Add MAINTAINERS entry Changes to IRQ notifiers - move devm functions to drivers/regulator/devres.c - drop irq validity check - use devm_add_action_or_reset() - fix styling issues - fix kerneldocs Changelog v5: - Fix the badly formatted pr_emerg() call. Changelog v4: - rebased on v5.12-rc6 - dropped RFC - fix external FET DT-binding. - improve prints for cases when expecting HW failure. - styling and typos Changelog v3: Regulator core: - Fix dangling pointer access at regulator_irq_helper() stpmic1_regulator: - fix function prototype (compile error) bd9576-regulator: - Update over current limits to what was given in new data-sheet (REV00K) - Allow over-current monitoring without external FET. Set limits to values given in data-sheet (REV00K). Changelog v2: Generic: - rebase on v5.12-rc2 + BD9576 series - Split devm variant of delayed wq to own series Regulator framework: - Provide non devm variant of IRQ notification helpers - shorten dt-property names as suggested by Rob - unconditionally call
Re: [PATCH 1/2] kunit: Fix formatting of KUNIT tests to meet the standard
On Wed, Apr 14, 2021 at 12:33:02AM -0400, Nico Pache wrote: > There are few instances of KUNIT tests that are not properly defined. > This commit focuses on correcting these issues to match the standard > defined in the Documentation. > > Issues Fixed: > - Tests should default to KUNIT_ALL_TESTS > - Tests configs tristate should have `if !KUNIT_ALL_TESTS` > - Tests should end in KUNIT_TEST, some fixes have been applied to > correct issues were KUNIT_TESTS is used or KUNIT is not mentioned. > > No functional changes other than CONFIG name changes > Signed-off-by: Nico Pache Please put a blank line before your signed-off-by, as is required by the tools :( thanks, greg k-h
Re: [PATCH 3/4] mce/copyin: fix to not SIGBUS when copying from user hits poison
On Tue, 13 Apr 2021 12:07:22 +0200, Petkov, Borislav wrote: >> KVM apparently passes a machine check into the guest. > Ah, there it is: > static void kvm_send_hwpoison_signal(unsigned long address, struct > task_struct *tsk) > { > send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, PAGE_SHIFT, > tsk); > } This path is when EPT #PF finds accesses to a hwpoisoned page and sends SIGBUS to user space (KVM exits into user space) with the same semantic as if regular #PF found access to a hwpoisoned page. The KVM_X86_SET_MCE ioctl actually injects a machine check into the guest. We are in process to launch a product with MCE recovery capability in a KVM based virtualization product and plan to expand the scope of the application of it in the near future. > So what I'm missing with all this fun is, yeah, sure, we have this > facility out there but who's using it? Is anyone even using it at all? The in-memory database and analytical domain are definitely using it. A couple examples: SAP HANA - as we've tested and planned to launch as a strategic enterprise use case with MCE recovery capability in our product SQL server - https://support.microsoft.com/en-us/help/2967651/inf-sql-server-may-display-memory-corruption-and-recovery-errors Cheers, -Jue
Re: [syzbot] unexpected kernel reboot (4)
On Tue, Apr 13, 2021 at 11:27 PM syzbot wrote: > > Hello, > > syzbot found the following issue on: > > HEAD commit:89698bec Merge tag 'm68knommu-for-v5.12-rc7' of git://git... > git tree: upstream > console output: https://syzkaller.appspot.com/x/log.txt?x=1243fcfed0 > kernel config: https://syzkaller.appspot.com/x/.config?x=b234ddbbe2953747 > dashboard link: https://syzkaller.appspot.com/bug?extid=9ce030d4c89856b27619 > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=173e92fed0 > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1735da2ed0 > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > Reported-by: syzbot+9ce030d4c89856b27...@syzkaller.appspotmail.com > > output_len: 0x0e74eb68 > kernel_total_size: 0x0f226000 > needed_size: 0x0f40 > trampoline_32bit: 0x0009d000 > Decompressing Linux... Parsing ELF... done. > Booting the kernel. +linux-input The reproducer connects some USB HID device and communicates with the driver. Previously we observed reboots because HID devices can trigger reboot SYSRQ, but we disable it with "CONFIG_MAGIC_SYSRQ is not set". How else can a USB device reboot the machine? Is it possible to disable it? I don't see any direct includes of in drivers/usb/* r0 = syz_usb_connect$hid(0x0, 0x36, &(0x7f00)=ANY=[@ANYBLOB="12010e402609411b000109022400010904010301092110012201000905810308"], 0x0) syz_usb_control_io(r0, 0x0, 0x0) syz_usb_control_io$hid(r0, &(0x7f001440)={0x24, 0x0, 0x0, &(0x7f40)={0x0, 0x22, 0x1, {[@local]}}, 0x0}, 0x0) syz_usb_ep_write(r0, 0x0, 0xfd, &(0x7f000b80)="34981a23c3490d163907e65ff758478e74cd7dc073700ebf655f1ce3394018ade882075917a36a30ad3594f98282ea729f3620534fd655c69ebec66aa7397e843ee79879e825e6a31a189616c611912dee259ab9d8ff1566c90ae8985ec380bcab6b8265695f7b76654377adab6b1930de1f44060021f50f1dd3fff126f862f378ef2deb2d4331b9bcb3f394062133b4bb44a7f168473f7ca3d9945bfb4c456b22428a7a11d5d7df1fcc4f7ffad0e526d34321fb6aedfb5dd4fc6797cba2cf45369daea9f0953bf1a8343aa7548f3f9817c6a1bedde9dcaa4b8eed4a493828384fc9ccb7d230967ea0cb2003076ac7d9f386a5fbaec392") > --- > This report is generated by a bot. It may contain errors. > See https://goo.gl/tpsmEJ for more information about syzbot. > syzbot engineers can be reached at syzkal...@googlegroups.com. > > syzbot will keep track of this issue. See: > https://goo.gl/tpsmEJ#status for how to communicate with syzbot. > syzbot can test patches for this issue, for details see: > https://goo.gl/tpsmEJ#testing-patches
Re: [PATCH 1/5] mm/swapfile: add percpu_ref support for swap
Dennis Zhou writes: > On Wed, Apr 14, 2021 at 11:59:03AM +0800, Huang, Ying wrote: >> Dennis Zhou writes: >> >> > Hello, >> > >> > On Wed, Apr 14, 2021 at 10:06:48AM +0800, Huang, Ying wrote: >> >> Miaohe Lin writes: >> >> >> >> > On 2021/4/14 9:17, Huang, Ying wrote: >> >> >> Miaohe Lin writes: >> >> >> >> >> >>> On 2021/4/12 15:24, Huang, Ying wrote: >> >> "Huang, Ying" writes: >> >> >> >> > Miaohe Lin writes: >> >> > >> >> >> We will use percpu-refcount to serialize against concurrent >> >> >> swapoff. This >> >> >> patch adds the percpu_ref support for later fixup. >> >> >> >> >> >> Signed-off-by: Miaohe Lin >> >> >> --- >> >> >> include/linux/swap.h | 2 ++ >> >> >> mm/swapfile.c| 25 ++--- >> >> >> 2 files changed, 24 insertions(+), 3 deletions(-) >> >> >> >> >> >> diff --git a/include/linux/swap.h b/include/linux/swap.h >> >> >> index 144727041e78..849ba5265c11 100644 >> >> >> --- a/include/linux/swap.h >> >> >> +++ b/include/linux/swap.h >> >> >> @@ -240,6 +240,7 @@ struct swap_cluster_list { >> >> >> * The in-memory structure used to track swap areas. >> >> >> */ >> >> >> struct swap_info_struct { >> >> >> + struct percpu_ref users;/* serialization against >> >> >> concurrent swapoff */ >> >> >>unsigned long flags; /* SWP_USED etc: see above */ >> >> >>signed shortprio; /* swap priority of this type */ >> >> >>struct plist_node list; /* entry in swap_active_head */ >> >> >> @@ -260,6 +261,7 @@ struct swap_info_struct { >> >> >>struct block_device *bdev; /* swap device or bdev of swap >> >> >> file */ >> >> >>struct file *swap_file; /* seldom referenced */ >> >> >>unsigned int old_block_size;/* seldom referenced */ >> >> >> + struct completion comp; /* seldom referenced */ >> >> >> #ifdef CONFIG_FRONTSWAP >> >> >>unsigned long *frontswap_map; /* frontswap in-use, one bit >> >> >> per page */ >> >> >>atomic_t frontswap_pages; /* frontswap pages in-use >> >> >> counter */ >> >> >> diff --git a/mm/swapfile.c b/mm/swapfile.c >> >> >> index 149e77454e3c..724173cd7d0c 100644 >> >> >> --- a/mm/swapfile.c >> >> >> +++ b/mm/swapfile.c >> >> >> @@ -39,6 +39,7 @@ >> >> >> #include >> >> >> #include >> >> >> #include >> >> >> +#include >> >> >> >> >> >> #include >> >> >> #include >> >> >> @@ -511,6 +512,15 @@ static void swap_discard_work(struct >> >> >> work_struct *work) >> >> >>spin_unlock(>lock); >> >> >> } >> >> >> >> >> >> +static void swap_users_ref_free(struct percpu_ref *ref) >> >> >> +{ >> >> >> + struct swap_info_struct *si; >> >> >> + >> >> >> + si = container_of(ref, struct swap_info_struct, users); >> >> >> + complete(>comp); >> >> >> + percpu_ref_exit(>users); >> >> > >> >> > Because percpu_ref_exit() is used, we cannot use >> >> > percpu_ref_tryget() in >> >> > get_swap_device(), better to add comments there. >> >> >> >> I just noticed that the comments of percpu_ref_tryget_live() says, >> >> >> >> * This function is safe to call as long as @ref is between init and >> >> exit. >> >> >> >> While we need to call get_swap_device() almost at any time, so it's >> >> better to avoid to call percpu_ref_exit() at all. This will waste >> >> some >> >> memory, but we need to follow the API definition to avoid potential >> >> issues in the long term. >> >> >>> >> >> >>> I have to admit that I'am not really familiar with percpu_ref. So I >> >> >>> read the >> >> >>> implementation code of the percpu_ref and found >> >> >>> percpu_ref_tryget_live() could >> >> >>> be called after exit now. But you're right we need to follow the API >> >> >>> definition >> >> >>> to avoid potential issues in the long term. >> >> >>> >> >> >> >> And we need to call percpu_ref_init() before insert the >> >> swap_info_struct >> >> into the swap_info[]. >> >> >>> >> >> >>> If we remove the call to percpu_ref_exit(), we should not use >> >> >>> percpu_ref_init() >> >> >>> here because *percpu_ref->data is assumed to be NULL* in >> >> >>> percpu_ref_init() while >> >> >>> this is not the case as we do not call percpu_ref_exit(). Maybe >> >> >>> percpu_ref_reinit() >> >> >>> or percpu_ref_resurrect() will do the work. >> >> >>> >> >> >>> One more thing, how could I distinguish the killed percpu_ref from >> >> >>> newly allocated one? >> >> >>> It seems percpu_ref_is_dying is only safe to call when @ref is >> >> >>> between init and exit. >> >> >>> Maybe I could do this in alloc_swap_info()? >> >> >> >> >> >> Yes. In alloc_swap_info(), you can distinguish newly allocated and >> >> >> reused
Re: [PATCH v7 2/3] dt-bindings: touchscreen: Add HY46XX bindings
On Tue, Apr 13, 2021 at 04:44:45PM +0200, Giulio Benetti wrote: > This adds device tree bindings for the Hycon HY46XX touchscreen series. > > Signed-off-by: Giulio Benetti Applied, thank you. -- Dmitry
Re: [PATCH v7 1/3] dt-bindings: Add Hycon Technology vendor prefix
On Tue, Apr 13, 2021 at 04:44:44PM +0200, Giulio Benetti wrote: > Update Documentation/devicetree/bindings/vendor-prefixes.yaml to > include "hycon" as a vendor prefix for "Hycon Technology". > Company website: https://www.hycontek.com/ > > Signed-off-by: Giulio Benetti > Reviewed-by: Jonathan Neuschäfer > Acked-by: Rob Herring Applied, thank you. -- Dmitry
Re: [PATCH v7 3/3] Input: add driver for the Hycon HY46XX touchpanel series
Hi Giulio, On Tue, Apr 13, 2021 at 04:44:46PM +0200, Giulio Benetti wrote: > + > + input_mt_report_pointer_emulation(tsdata->input, true); For touchscreens it does not make much sense to report BTN_DOUBLETAP, BTN_TRIPLETAP, etc, events (they are really for touchpads), so I changed this to input_mt_report_pointer_emulation(tsdata->input, false); to only report ABS_X, ABS_Y, and BTN_TOUCH, and applied. Thanks. -- Dmitry
Re: [PATCH 5.4 v3 1/1] iommu/vt-d: Fix agaw for a supported 48 bit guest address width
On Tue, Apr 13, 2021 at 11:05:34AM -0700, Saeed Mirzamohammadi wrote: > Hi Greg, > > I don’t have any commit ID since the fix is not in mainline or any Linus’ > tree yet. The driver has completely changed for newer stable versions (and > also mainline) and the fix only applies for 5.4, 4.19, and 4.14 stable > kernels. Why can we not just take what is in mainline? And if not, then you need to document the heck out of this in the changelog text, and get all of the related maintainers in the area to sign off on this. Diverging from Linus's tree creates a big burden over time, you have to make this really really obvious why you are doing this, and what you are doing here so that everyone agrees with it. Remember, 90% of all of these types of "do it differently than Linus's tree" are buggy and cause problems, be very careful. Please fix up and resend. thanks, greg k-h
Re: [PATCH net-next v2 1/2] of: net: pass the dst buffer to of_get_mac_address()
Hi Michael, url: https://github.com/0day-ci/linux/commits/Michael-Walle/of-net-support-non-platform-devices-in-of_get_mac_address/20210406-234030 base: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git cc0626c2aaed8e475efdd85fa374b497a7192e35 config: x86_64-randconfig-m001-20210406 (attached as .config) compiler: gcc-9 (Debian 9.3.0-22) 9.3.0 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot Reported-by: Dan Carpenter smatch warnings: drivers/net/ethernet/xilinx/xilinx_axienet_main.c:2069 axienet_probe() warn: passing a valid pointer to 'PTR_ERR' vim +/PTR_ERR +2069 drivers/net/ethernet/xilinx/xilinx_axienet_main.c 522856cefaf09d Robert Hancock 2019-06-06 2060 /* Check for Ethernet core IRQ (optional) */ 522856cefaf09d Robert Hancock 2019-06-06 2061 if (lp->eth_irq <= 0) 522856cefaf09d Robert Hancock 2019-06-06 2062 dev_info(>dev, "Ethernet core IRQ not defined\n"); 522856cefaf09d Robert Hancock 2019-06-06 2063 8a3b7a252dca9f Daniel Borkmann 2012-01-19 2064 /* Retrieve the MAC address */ 411b125c6ace1f Michael Walle 2021-04-06 2065 ret = of_get_mac_address(pdev->dev.of_node, mac_addr); 411b125c6ace1f Michael Walle 2021-04-06 2066 if (!ret) { 411b125c6ace1f Michael Walle 2021-04-06 2067 axienet_set_mac_address(ndev, mac_addr); 411b125c6ace1f Michael Walle 2021-04-06 2068 } else { d05a9ed5c3a773 Robert Hancock 2019-06-06 @2069 dev_warn(>dev, "could not find MAC address property: %ld\n", d05a9ed5c3a773 Robert Hancock 2019-06-06 2070 PTR_ERR(mac_addr)); ^ This should print "ret". 411b125c6ace1f Michael Walle 2021-04-06 2071 axienet_set_mac_address(ndev, NULL); 8a3b7a252dca9f Daniel Borkmann 2012-01-19 2072 } 8a3b7a252dca9f Daniel Borkmann 2012-01-19 2073 8a3b7a252dca9f Daniel Borkmann 2012-01-19 2074 lp->coalesce_count_rx = XAXIDMA_DFT_RX_THRESHOLD; 8a3b7a252dca9f Daniel Borkmann 2012-01-19 2075 lp->coalesce_count_tx = XAXIDMA_DFT_TX_THRESHOLD; --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org .config.gz Description: application/gzip
Re: [PATCH][next] iommu/vt-d: Fix out-bounds-warning in intel_svm_page_response()
Hi Gustavo, On 4/14/21 3:54 AM, Gustavo A. R. Silva wrote: Replace call to memcpy() with just a couple of simple assignments in order to fix the following out-of-bounds warning: drivers/iommu/intel/svm.c:1198:4: warning: 'memcpy' offset [25, 32] from the object at 'desc' is out of the bounds of referenced subobject 'qw2' with type 'long long unsigned int' at offset 16 [-Warray-bounds] The problem is that the original code is trying to copy data into a couple of struct members adjacent to each other in a single call to memcpy(). This causes a legitimate compiler warning because memcpy() overruns the length of This helps with the ongoing efforts to globally enable -Warray-bounds and get us closer to being able to tighten the FORTIFY_SOURCE routines on memcpy(). Link: https://github.com/KSPP/linux/issues/109 Signed-off-by: Gustavo A. R. Silva --- drivers/iommu/intel/svm.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c index 5165cea90421..65909f504c50 100644 --- a/drivers/iommu/intel/svm.c +++ b/drivers/iommu/intel/svm.c @@ -1194,9 +1194,10 @@ int intel_svm_page_response(struct device *dev, desc.qw1 = QI_PGRP_IDX(prm->grpid) | QI_PGRP_LPIG(last_page); desc.qw2 = 0; desc.qw3 = 0; - if (private_present) - memcpy(, prm->private_data, - sizeof(prm->private_data)); The same memcpy() is used in multiple places in this file. Did they compile the same warnings? Or there are multiple patches to fix them one by one? Best regards, baolu + if (private_present) { + desc.qw2 = prm->private_data[0]; + desc.qw3 = prm->private_data[1]; + } qi_submit_sync(iommu, , 1, 0); }
Re: Question on KASAN calltrace record in RT
On Wed, Apr 14, 2021 at 6:00 AM Mike Galbraith wrote: > > On Tue, 2021-04-13 at 17:29 +0200, Dmitry Vyukov wrote: > > On Tue, Apr 6, 2021 at 10:26 AM Zhang, Qiang > > wrote: > > > > > > Hello everyone > > > > > > In RT system, after Andrew test, found the following calltrace , > > > in KASAN, we record callstack through stack_depot_save(), in this > > > function, may be call alloc_pages, but in RT, the spin_lock replace with > > > rt_mutex in alloc_pages(), if before call this function, the irq is > > > disabled, > > > will trigger following calltrace. > > > > > > maybe add array[KASAN_STACK_DEPTH] in struct kasan_track to record > > > callstack in RT system. > > > > > > Is there a better solution ? > > > > Hi Qiang, > > > > Adding 2 full stacks per heap object can increase memory usage too much. > > The stackdepot has a preallocation mechanism, I would start with > > adding interrupts check here: > > https://elixir.bootlin.com/linux/v5.12-rc7/source/lib/stackdepot.c#L294 > > and just not do preallocation in interrupt context. This will solve > > the problem, right? > > Hm, this thing might actually be (sorta?) working, modulo one startup > gripe. The CRASH_DUMP inspired gripe I get with !RT appeared (and shut > up when told I don't care given kdump has worked just fine for ages:), > but no more might_sleep() gripeage. > > > CONFIG_KASAN_SHADOW_OFFSET=0xdc00 > CONFIG_HAVE_ARCH_KASAN=y > CONFIG_HAVE_ARCH_KASAN_VMALLOC=y > CONFIG_CC_HAS_KASAN_GENERIC=y > CONFIG_KASAN=y > CONFIG_KASAN_GENERIC=y > CONFIG_KASAN_OUTLINE=y > # CONFIG_KASAN_INLINE is not set > CONFIG_KASAN_STACK=1 > CONFIG_KASAN_VMALLOC=y > # CONFIG_KASAN_MODULE_TEST is not set > > --- > lib/stackdepot.c | 10 +- > 1 file changed, 5 insertions(+), 5 deletions(-) > > --- a/lib/stackdepot.c > +++ b/lib/stackdepot.c > @@ -71,7 +71,7 @@ static void *stack_slabs[STACK_ALLOC_MAX > static int depot_index; > static int next_slab_inited; > static size_t depot_offset; > -static DEFINE_SPINLOCK(depot_lock); > +static DEFINE_RAW_SPINLOCK(depot_lock); > > static bool init_stack_slab(void **prealloc) > { > @@ -265,7 +265,7 @@ depot_stack_handle_t stack_depot_save(un > struct page *page = NULL; > void *prealloc = NULL; > unsigned long flags; > - u32 hash; > + u32 hash, may_prealloc = !IS_ENABLED(CONFIG_PREEMPT_RT) || > preemptible(); > > if (unlikely(nr_entries == 0) || stack_depot_disable) > goto fast_exit; > @@ -291,7 +291,7 @@ depot_stack_handle_t stack_depot_save(un > * The smp_load_acquire() here pairs with smp_store_release() to > * |next_slab_inited| in depot_alloc_stack() and init_stack_slab(). > */ > - if (unlikely(!smp_load_acquire(_slab_inited))) { > + if (unlikely(!smp_load_acquire(_slab_inited) && may_prealloc)) { > /* > * Zero out zone modifiers, as we don't have specific zone > * requirements. Keep the flags related to allocation in > atomic > @@ -305,7 +305,7 @@ depot_stack_handle_t stack_depot_save(un > prealloc = page_address(page); > } > > - spin_lock_irqsave(_lock, flags); > + raw_spin_lock_irqsave(_lock, flags); > > found = find_stack(*bucket, entries, nr_entries, hash); > if (!found) { > @@ -329,7 +329,7 @@ depot_stack_handle_t stack_depot_save(un > WARN_ON(!init_stack_slab()); > } > > - spin_unlock_irqrestore(_lock, flags); > + raw_spin_unlock_irqrestore(_lock, flags); > exit: > if (prealloc) { > /* Nobody used this memory, ok to free it. */ > > [0.692437] BUG: sleeping function called from invalid context at > kernel/locking/rtmutex.c:943 > [0.692439] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, > name: swapper/0 > [0.692442] Preemption disabled at: > [0.692443] [] on_each_cpu_cond_mask+0x30/0xb0 > [0.692451] CPU: 5 PID: 1 Comm: swapper/0 Not tainted > 5.12.0.g2afefec-tip-rt #5 > [0.692454] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C > 09/23/2013 > [0.692456] Call Trace: > [0.692458] ? on_each_cpu_cond_mask+0x30/0xb0 > [0.692462] dump_stack+0x8a/0xb5 > [0.692467] ___might_sleep.cold+0xfe/0x112 > [0.692471] rt_spin_lock+0x1c/0x60 HI Mike, If freeing pages from smp_call_function is not OK, then perhaps we need just to collect the objects to be freed to the task/CPU that executes kasan_quarantine_remove_cache and it will free them (we know it can free objects). > [0.692475] free_unref_page+0x117/0x3c0 > [0.692481] qlist_free_all+0x60/0xd0 > [0.692485] per_cpu_remove_cache+0x5b/0x70 > [0.692488] smp_call_function_many_cond+0x185/0x3d0 > [0.692492] ? qlist_move_cache+0xe0/0xe0 > [0.692495] ? qlist_move_cache+0xe0/0xe0 > [0.692497] on_each_cpu_cond_mask+0x44/0xb0 > [0.692501] kasan_quarantine_remove_cache+0x52/0xf0
Re: [PATCH] x86: Accelerate copy_page with non-temporal in X86
on 2021/4/13 22:53, Borislav Petkov wrote: > I thought "should be better" too last time when I measured rep; movs vs > NT stores but actual measurements showed no real difference. Mabye the NT stores make difference when store to slow dimms, like the persistent memory I just tested. Also, it likely reduces unnecessary cache load and flush, and benifits the running processes which have data cached. -- Best wishes Kemeng Shi
Re: [PATCH v2 3/4] powerpc: Rename probe_kernel_read_inst()
Christophe Leroy writes: > When probe_kernel_read_inst() was created, it was to mimic > probe_kernel_read() function. > > Since then, probe_kernel_read() has been renamed > copy_from_kernel_nofault(). > > Rename probe_kernel_read_inst() into copy_from_kernel_nofault_inst(). At first glance I read it as copy from kernel nofault instruction. How about copy_inst_from_kernel_nofault()? -aneesh
Re: [Outreachy kernel] [PATCH] :staging: rtl8723bs: Remove useless led_blink_hdl()
On Tue, Apr 13, 2021 at 10:08:32PM +0200, Fabio M. De Francesco wrote: > On Tuesday, April 13, 2021 9:48:44 PM CEST Matthew Wilcox wrote: > > On Tue, Apr 13, 2021 at 09:45:03PM +0200, Fabio M. De Francesco wrote: > > > 1) The driver doesn't call that function from anywhere else than the > > > macro. 2) You have explained that the macro add its symbol to a slot > > > in an array that would shift all the subsequent elements down if that > > > macro is not used exactly in the line where it is. > > > 3) Dan Carpenter said that that array is full of null functions (or > > > empty slots?). > > > > > > Unless that function is called anonymously dereferencing its address > > > from the position it occupies in the array, I'm not able to see what > > > else means can any caller use. > > > > > > I know I have much less experience than you with C: what can go wrong? > > > > Here's where the driver calls that function: > > > > $ git grep wlancmds drivers/staging/rtl8723bs/ > > drivers/staging/rtl8723bs/core/rtw_cmd.c:static struct cmd_hdl wlancmds[] > > = { drivers/staging/rtl8723bs/core/rtw_cmd.c: if > > (pcmd->cmdcode < ARRAY_SIZE(wlancmds)) { > > drivers/staging/rtl8723bs/core/rtw_cmd.c: cmd_hdl > > = wlancmds[pcmd->cmdcode].h2cfuns; > > > OK, I had imagined an anonymous call from its location in the array (as I > wrote in the last phrase of my message). However, I thought that it could > have been an improbable possibility, not a real one. > > Linux uses a lot of interesting ideas that newcomers like me should learn. > Things here are trickier than they appear at first sight. One trick would be to build the Smatch cross function database. https://www.spinics.net/lists/smatch/msg00568.html Then you could do: $ ~/path/to/smatch_data/db/smdb.py led_blink_hdl file | caller | function | type | parameter | key | value | drivers/staging/rtl8723bs/core/rtw_cmd.c | rtw_cmd_thread | rtw_cmd_thread ptr cmd_hdl | INTERNAL | -1 | | uchar(*)(struct adapter*, uchar*) drivers/staging/rtl8188eu/core/rtw_cmd.c | rtw_cmd_thread | rtw_cmd_thread ptr cmd_hdl | INTERNAL | -1 | | uchar(*)(struct adapter*, uchar*) drivers/staging/rtl8188eu/core/rtw_cmd.c | rtw_cmd_thread | rtw_cmd_thread ptr cmd_hdl | BUF_SIZE | 1 |pbuf | 1,4,6,8,12,14,16,19-20,23-24,48,740,884,892,900,960 Which says that led_blink_hdl() is called as a function pointer called "cmd_hdl" from rtw_cmd_thread(). Hm... It says it can be called from either rtw_cmd_thread() function (the rtl8723bs or rtl8188eu version) which is not ideal. But also basically harmless so whatever... regards, dan carpenter
Re: [PATCH] mm: Define ARCH_HAS_FIRST_USER_ADDRESS
Le 14/04/2021 à 04:54, Anshuman Khandual a écrit : Currently most platforms define FIRST_USER_ADDRESS as 0UL duplicating the same code all over. Instead define a new option ARCH_HAS_FIRST_USER_ADDRESS for those platforms which would override generic default FIRST_USER_ADDRESS value 0UL. This makes it much cleaner with reduced code. Cc: linux-al...@vger.kernel.org Cc: linux-snps-...@lists.infradead.org Cc: linux-arm-ker...@lists.infradead.org Cc: linux-c...@vger.kernel.org Cc: linux-hexa...@vger.kernel.org Cc: linux-i...@vger.kernel.org Cc: linux-m...@lists.linux-m68k.org Cc: linux-m...@vger.kernel.org Cc: openr...@lists.librecores.org Cc: linux-par...@vger.kernel.org Cc: linuxppc-...@lists.ozlabs.org Cc: linux-ri...@lists.infradead.org Cc: linux-s...@vger.kernel.org Cc: linux...@vger.kernel.org Cc: sparcli...@vger.kernel.org Cc: linux...@lists.infradead.org Cc: linux-xte...@linux-xtensa.org Cc: x...@kernel.org Cc: linux...@kvack.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual --- arch/alpha/include/asm/pgtable.h | 1 - arch/arc/include/asm/pgtable.h | 6 -- arch/arm/Kconfig | 1 + arch/arm64/include/asm/pgtable.h | 2 -- arch/csky/include/asm/pgtable.h | 1 - arch/hexagon/include/asm/pgtable.h | 3 --- arch/ia64/include/asm/pgtable.h | 1 - arch/m68k/include/asm/pgtable_mm.h | 1 - arch/microblaze/include/asm/pgtable.h| 2 -- arch/mips/include/asm/pgtable-32.h | 1 - arch/mips/include/asm/pgtable-64.h | 1 - arch/nds32/Kconfig | 1 + arch/nios2/include/asm/pgtable.h | 2 -- arch/openrisc/include/asm/pgtable.h | 1 - arch/parisc/include/asm/pgtable.h| 2 -- arch/powerpc/include/asm/book3s/pgtable.h| 1 - arch/powerpc/include/asm/nohash/32/pgtable.h | 1 - arch/powerpc/include/asm/nohash/64/pgtable.h | 2 -- arch/riscv/include/asm/pgtable.h | 2 -- arch/s390/include/asm/pgtable.h | 2 -- arch/sh/include/asm/pgtable.h| 2 -- arch/sparc/include/asm/pgtable_32.h | 1 - arch/sparc/include/asm/pgtable_64.h | 3 --- arch/um/include/asm/pgtable-2level.h | 1 - arch/um/include/asm/pgtable-3level.h | 1 - arch/x86/include/asm/pgtable_types.h | 2 -- arch/xtensa/include/asm/pgtable.h| 1 - include/linux/mm.h | 4 mm/Kconfig | 4 29 files changed, 10 insertions(+), 43 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 8ba434287387..47098ccd715e 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -46,6 +46,10 @@ extern int sysctl_page_lock_unfairness; void init_mm_internals(void); +#ifndef ARCH_HAS_FIRST_USER_ADDRESS I guess you didn't test it . :) should be #ifndef CONFIG_ARCH_HAS_FIRST_USER_ADDRESS +#define FIRST_USER_ADDRESS 0UL +#endif But why do we need a config option at all for that ? Why not just: #ifndef FIRST_USER_ADDRESS #define FIRST_USER_ADDRESS 0UL #endif + #ifndef CONFIG_NEED_MULTIPLE_NODES/* Don't use mapnrs, do it properly */ extern unsigned long max_mapnr; diff --git a/mm/Kconfig b/mm/Kconfig index 24c045b24b95..373fbe377075 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -806,6 +806,10 @@ config VMAP_PFN config ARCH_USES_HIGH_VMA_FLAGS bool + +config ARCH_HAS_FIRST_USER_ADDRESS + bool + config ARCH_HAS_PKEYS bool
Re: [RFC PATCH v8 11/19] virtio/vsock: dequeue callback for SOCK_SEQPACKET
I'll fix some issues of this patch found by kernel test robot On 13.04.2021 15:44, Arseny Krasnov wrote: > This adds transport callback and it's logic for SEQPACKET dequeue. > Callback fetches RW packets from rx queue of socket until whole record > is copied(if user's buffer is full, user is not woken up). This is done > to not stall sender, because if we wake up user and it leaves syscall, > nobody will send credit update for rest of record, and sender will wait > for next enter of read syscall at receiver's side. So if user buffer is > full, we just send credit update and drop data. > > Signed-off-by: Arseny Krasnov > --- > v7 -> v8: > - Things like SEQ_BEGIN, SEQ_END, 'msg_len' and 'msg_id' now removed. >This callback fetches and copies RW packets to user's buffer, until >last packet of message found(this packet is marked in 'flags' field >of header). > > include/linux/virtio_vsock.h| 5 ++ > net/vmw_vsock/virtio_transport_common.c | 73 + > 2 files changed, 78 insertions(+) > > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h > index dc636b727179..02acf6e9ae04 100644 > --- a/include/linux/virtio_vsock.h > +++ b/include/linux/virtio_vsock.h > @@ -80,6 +80,11 @@ virtio_transport_dgram_dequeue(struct vsock_sock *vsk, > struct msghdr *msg, > size_t len, int flags); > > +ssize_t > +virtio_transport_seqpacket_dequeue(struct vsock_sock *vsk, > +struct msghdr *msg, > +int flags, > +bool *msg_ready); > s64 virtio_transport_stream_has_data(struct vsock_sock *vsk); > s64 virtio_transport_stream_has_space(struct vsock_sock *vsk); > > diff --git a/net/vmw_vsock/virtio_transport_common.c > b/net/vmw_vsock/virtio_transport_common.c > index 833104b71a1c..8492b8bd5df5 100644 > --- a/net/vmw_vsock/virtio_transport_common.c > +++ b/net/vmw_vsock/virtio_transport_common.c > @@ -393,6 +393,67 @@ virtio_transport_stream_do_dequeue(struct vsock_sock > *vsk, > return err; > } > > +static int virtio_transport_seqpacket_do_dequeue(struct vsock_sock *vsk, > + struct msghdr *msg, > + int flags, > + bool *msg_ready) > +{ > + struct virtio_vsock_sock *vvs = vsk->trans; > + struct virtio_vsock_pkt *pkt; > + int err = 0; > + size_t user_buf_len = msg->msg_iter.count; > + > + *msg_ready = false; > + spin_lock_bh(>rx_lock); > + > + while (!*msg_ready && !list_empty(>rx_queue) && err >= 0) { > + pkt = list_first_entry(>rx_queue, struct virtio_vsock_pkt, > list); > + > + if (le16_to_cpu(pkt->hdr.op) == VIRTIO_VSOCK_OP_RW) { > + size_t bytes_to_copy; > + size_t pkt_len; > + > + pkt_len = (size_t)le32_to_cpu(pkt->hdr.len); > + bytes_to_copy = min(user_buf_len, pkt_len); > + > + /* sk_lock is held by caller so no one else can dequeue. > + * Unlock rx_lock since memcpy_to_msg() may sleep. > + */ > + spin_unlock_bh(>rx_lock); > + > + if (memcpy_to_msg(msg, pkt->buf, bytes_to_copy)) { > + err = -EINVAL; > + break; > + } > + > + spin_lock_bh(>rx_lock); > + > + /* If user sets 'MSG_TRUNC' we return real length > + * of message. > + */ > + if (flags & MSG_TRUNC) > + err += pkt_len; > + else > + err += bytes_to_copy; > + > + user_buf_len -= bytes_to_copy; > + > + if (pkt->hdr.flags & VIRTIO_VSOCK_SEQ_EOR) > + *msg_ready = true; > + } > + > + virtio_transport_dec_rx_pkt(vvs, pkt); > + list_del(>list); > + virtio_transport_free_pkt(pkt); > + } > + > + spin_unlock_bh(>rx_lock); > + > + virtio_transport_send_credit_update(vsk); > + > + return err; > +} > + > ssize_t > virtio_transport_stream_dequeue(struct vsock_sock *vsk, > struct msghdr *msg, > @@ -405,6 +466,18 @@ virtio_transport_stream_dequeue(struct vsock_sock *vsk, > } > EXPORT_SYMBOL_GPL(virtio_transport_stream_dequeue); > > +ssize_t > +virtio_transport_seqpacket_dequeue(struct vsock_sock *vsk, > +struct msghdr *msg, > +int flags, bool *msg_ready) > +{ > + if (flags & MSG_PEEK) > + return -EOPNOTSUPP; > + > + return virtio_transport_seqpacket_do_dequeue(vsk, msg, flags, > msg_ready); > +}
Re: [PATCH v2 1/2] perf/core: Share an event with multiple cgroups
Hi Namhyung, I love your patch! Perhaps something to improve: [auto build test WARNING on tip/perf/core] [also build test WARNING on tip/master linux/master linus/master v5.12-rc7 next-20210413] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Namhyung-Kim/perf-core-Sharing-events-with-multiple-cgroups/20210413-124251 base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git cface0326a6c2ae5c8f47bd466f07624b3e348a7 config: x86_64-allyesconfig (attached as .config) compiler: gcc-9 (Debian 9.3.0-22) 9.3.0 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot cocci warnings: (new ones prefixed by >>) >> kernel/events/core.c:5925:13-20: WARNING opportunity for memdup_user Please review and possibly fold the followup patch. --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org .config.gz Description: application/gzip
[PATCH] perf/core: fix memdup_user.cocci warnings
From: kernel test robot kernel/events/core.c:5925:13-20: WARNING opportunity for memdup_user Use memdup_user rather than duplicating its implementation This is a little bit restricted to reduce false positives Generated by: scripts/coccinelle/api/memdup_user.cocci CC: Namhyung Kim Reported-by: kernel test robot Signed-off-by: kernel test robot --- url: https://github.com/0day-ci/linux/commits/Namhyung-Kim/perf-core-Sharing-events-with-multiple-cgroups/20210413-124251 base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git cface0326a6c2ae5c8f47bd466f07624b3e348a7 core.c | 12 +++- 1 file changed, 3 insertions(+), 9 deletions(-) --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -5922,15 +5922,9 @@ static long _perf_ioctl(struct perf_even cgrp_bufsz = nr_cgrps * sizeof(*cgrp_buf); - cgrp_buf = kmalloc(cgrp_bufsz, GFP_KERNEL); - if (cgrp_buf == NULL) - return -ENOMEM; - - if (copy_from_user(cgrp_buf, (u64 __user *)(arg + 8), - cgrp_bufsz)) { - kfree(cgrp_buf); - return -EFAULT; - } + cgrp_buf = memdup_user((u64 __user *)(arg + 8), cgrp_bufsz); + if (IS_ERR(cgrp_buf)) + return PTR_ERR(cgrp_buf); ret = perf_event_attach_cgroup_node(event, nr_cgrps, cgrp_buf);
Re: [PATCH] USB: Don't set USB_PORT_FEAT_SUSPEND on WD19's Realtek Hub
On Tue, Apr 13, 2021 at 10:44 PM Alan Stern wrote: > > On Tue, Apr 13, 2021 at 03:52:14PM +0800, Chris Chiu wrote: > > On Mon, Apr 12, 2021 at 11:12 PM Alan Stern > > wrote: > > > > > > On Mon, Apr 12, 2021 at 11:00:06PM +0800, chris.c...@canonical.com wrote: > > > > The USB_PORT_FEAT_SUSPEND is not really necessary due to the > > > > "global suspend" in USB 2.0 spec. It's only for many hub devices > > > > which don't relay wakeup requests from the devices connected to > > > > downstream ports. For this realtek hub, there's no problem waking > > > > up the system from connected keyboard. > > > > > > What about runtime suspend? That _does_ require USB_PORT_FEAT_SUSPEND. > > > > It's hard to reproduce the same thing with runtime PM. I also don't > > know the aggressive > > way to trigger runtime suspend. So I'm assuming the same thing will happen > > in > > runtime PM case because they both go the same usb_port_resume path. Could > > you please suggest a better way to verify this for runtime PM? > > To put a USB device into runtime suspend, do this: > > echo 0 >/sys/bus/usb/devices/.../bConfigurationValue > echo auto >/sys/bus/usb/devices/.../power/control > > where ... is the pathname for the device you want to suspend. (Note > that this will unbind the device from its driver, so make sure there's > no possibility of data loss before you do it.) > > To resume the device, write "on" to the power/control file. You can > verify the runtime-PM status by reading the files in the power/ > subdirectory. > Thanks for the instructions. I can hit the same timeout problem with runtime PM. The fail rate seems the same as normal PM. (around 1/4 ~ 1/7) root@:/sys/bus/usb/devices/3-4.3# echo auto > power/control root@:/sys/bus/usb/devices/3-4.3# echo on > power/control root@:/sys/bus/usb/devices/3-4.3# dmesg -c [ 2789.679807] usb 3-4: kworker/7:0 timed out on ep0out len=0/0 [ 2789.679812] usb 3-4-port3: can't suspend, status -110 [ 2789.680078] usb 3-4.3: Failed to suspend device, error -110 > > > > This commit bypasses the USB_PORT_FEAT_SUSPEND for the quirky hub. > > > > > > > > Signed-off-by: Chris Chiu > > > > --- > > > > > > > > > > diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c > > > > index 7f71218cc1e5..8478d49bba77 100644 > > > > --- a/drivers/usb/core/hub.c > > > > +++ b/drivers/usb/core/hub.c > > > > @@ -3329,8 +3329,11 @@ int usb_port_suspend(struct usb_device *udev, > > > > pm_message_t msg) > > > >* descendants is enabled for remote wakeup. > > > >*/ > > > > else if (PMSG_IS_AUTO(msg) || > > > > usb_wakeup_enabled_descendants(udev) > 0) > > > > - status = set_port_feature(hub->hdev, port1, > > > > - USB_PORT_FEAT_SUSPEND); > > > > + if (udev->quirks & USB_QUIRK_NO_SET_FEAT_SUSPEND) > > > > > > You should test hub->hdev->quirks, here, not udev->quirks. The quirk > > > belongs to the Realtek hub, not to the device that's plugged into the > > > hub. > > > > > > > Thanks for pointing that out. I'll verify again and propose a V2 after > > it's done. > > Another thing to consider: You shouldn't return 0 from usb_port_suspend > if the port wasn't actually suspended. We don't want to kernel to have > a false idea of the hardware's current state. > So we still need the "really_suspend=false". What if I replace it with the following? It's a little verbose but expressive enough. Any suggestions? + else if (!(hub->hdev->quirks & USB_QUIRK_NO_SET_FEAT_SUSPEND) && + (PMSG_IS_AUTO(msg) || usb_wakeup_enabled_descendants(udev) > 0)) + status = set_port_feature(hub->hdev, port1, + USB_PORT_FEAT_SUSPEND); else { really_suspend = false; status = 0; Chris > Alan Stern
Re: [PATCH v2] firmware_loader: fix use-after-free in firmware_fallback_sysfs
On Tue, Apr 13, 2021 at 04:51:38PM +, Luis Chamberlain wrote: > On Tue, Apr 13, 2021 at 04:12:42PM +0530, Anirudh Rayabharam wrote: > > The use-after-free happens when a fw_priv object has been freed but > > hasn't been removed from the pending list (pending_fw_head). The next > > time fw_load_sysfs_fallback tries to insert into the list, it ends up > > accessing the pending_list member of the previoiusly freed fw_priv. > > > > In commit bcfbd3523f3c ("firmware: fix a double abort case with > > fw_load_sysfs_fallback"), fw_load_abort() is skipped if > > fw_sysfs_wait_timeout() returns -ENOENT. This causes the fw_priv to > > not be removed from the pending list. > > > > To fix this, delete the fw_priv from the pending list when retval > > is -ENOENT instead of skipping the entire block. > > > > Fixes: bcfbd3523f3c ("firmware: fix a double abort case with > > fw_load_sysfs_fallback") > > Reported-by: syzbot+de271708674e20930...@syzkaller.appspotmail.com > > Tested-by: syzbot+de271708674e20930...@syzkaller.appspotmail.com > > Signed-off-by: Anirudh Rayabharam > > Thanks for your patch Anirudh, but please also see this reply to the > issue: > > http://lkml.kernel.org/r/20210403013143.gv4...@42.do-not-panic.com Hi Luis! Thanks for pointing me to this. I completely forgot to check the existing discussion on this issue. > > The way you patched the issue is just a band-aid, meaning we keep on > moving the issue further and it seems that's just the wrong approach. > > Can you try the patch in that thread, to verify if the UAF goes away? The patch in that thread doesn't work. But I think I know what's missing. The root problem here is that all code paths that abort fw load don't remove it from the pending list. For example: _request_firmware() -> fw_abort_batch_reqs() -> fw_state_aborted() In the above code path, the fw_priv is aborted but not removed from pending list. So, the patch in the above thread fails because the load is being aborted after it has been added to the list. So, to fix the root cause of this issue we should make it so that all aborts remove the fw_priv from the pending list. Perhaps we should add a list_del_init in __fw_set_state() just before calling complete_all(). This way, all code paths that abort will delete the fw_priv from the list. The patch in the above thread also makes some changes to the error codes. Since it isn't directly related to the UAF, the error codes change should be a separate patch right? Thanks! - Anirudh.
Re: [PATCH v7 4/7] mm,hugetlb: Split prep_new_huge_page functionality
On Tue, Apr 13, 2021 at 02:33:41PM -0700, Mike Kravetz wrote: > > -static void prep_new_huge_page(struct hstate *h, struct page *page, int > > nid) > > +/* > > + * Must be called with the hugetlb lock held > > + */ > > +static void __prep_account_new_huge_page(struct hstate *h, int nid) > > +{ > > + h->nr_huge_pages++; > > + h->nr_huge_pages_node[nid]++; > > I would prefer if we also move setting the destructor to this routine. > set_compound_page_dtor(page, HUGETLB_PAGE_DTOR); Uhm, but that is the routine that does the accounting, it feels wrong here, plus... > > That way, PageHuge() will be false until it 'really' is a huge page. > If not, we could potentially go into that retry loop in > dissolve_free_huge_page or alloc_and_dissolve_huge_page in patch 5. ...I do not follow here, could you please elaborate some more? Unless I am missing something, behaviour should not be any different with this patch. Thanks -- Oscar Salvador SUSE L3
Re: [PATCH] staging: mt7621-pci: stop using of_pci_range_to_resource
Hi Illya, On Wed, Apr 14, 2021 at 6:10 AM Ilya Lipnitskiy wrote: > > Hi Sergio, > > Just as an aside, are you planning to move staging/mt7621-pci into > arch/mips/pci at some point? This driver seems more maintained (by > you!) than many in-tree drivers... Yes, I am planning to move it and maintain it. There is minor stuff that must be changed before that but I hope to give that all a try after the next merge window. Best regards, Sergio Paracuellos > Ilya > > On Sat, Apr 10, 2021 at 12:23 PM Sergio Paracuellos > wrote: > > > > Hi Ilya, > > > > On Sat, Apr 10, 2021 at 7:33 PM Ilya Lipnitskiy > > wrote: > > > > > > The logic here was already overriding the erroneous IO addresses > > > returned from of_pci_range_to_resource, which is the bulk of the logic. > > > > > > So stop using it altogether and initialize the fields explicitly, as > > > done in aeba3731b150 ("powerpc/pci: Fix IO space breakage after > > > of_pci_range_to_resource() change"). > > > > > > Signed-off-by: Ilya Lipnitskiy > > > Cc: Sergio Paracuellos > > > --- > > > drivers/staging/mt7621-pci/pci-mt7621.c | 11 ++- > > > 1 file changed, 6 insertions(+), 5 deletions(-) > > > > Looks good to me, thanks! I have also tested this in gnubee pc1 > > platform with no regressions at all when io bars are assigned: > > > > [ 16.378956] mt7621-pci 1e14.pcie: host bridge /pcie@1e14 ranges: > > [ 16.392405] mt7621-pci 1e14.pcie: MEM > > 0x006000..0x006fff -> 0x00 > > [ 16.408796] mt7621-pci 1e14.pcie: IO > > 0x001e16..0x001e16 -> 0x00 > > [ 16.425264] mt7621-pci-phy 1e149000.pcie-phy: PHY for 0xbe149000 > > (dual port = 1) > > [ 16.440452] mt7621-pci-phy 1e14a000.pcie-phy: PHY for 0xbe14a000 > > (dual port = 0) > > [ 16.678713] mt7621-pci 1e14.pcie: PCIE0 enabled > > [ 16.688435] mt7621-pci 1e14.pcie: PCIE1 enabled > > [ 16.698160] mt7621-pci 1e14.pcie: PCIE2 enabled > > [ 16.707886] mt7621-pci 1e14.pcie: PCI coherence region base: > > 0x6000, mask/settings: 0xf002 > > [ 16.726623] mt7621-pci 1e14.pcie: PCI host bridge to bus :00 > > [ 16.739309] pci_bus :00: root bus resource [io > > 0x1e16-0x1e16] > > [ 16.753008] pci_bus :00: root bus resource [mem > > 0x6000-0x6fff] > > [ 16.766709] pci_bus :00: root bus resource [bus 00-ff] > > [ 16.777649] pci_bus :00: root bus resource [mem > > 0x6000-0x6fff] (bus address [0x-0x0fff]) > > [ 16.797986] pci :00:00.0: [0e8d:0801] type 01 class 0x060400 > > [ 16.809973] pci :00:00.0: reg 0x10: [mem 0x-0x7fff] > > [ 16.822467] pci :00:00.0: reg 0x14: initial BAR value 0x > > invalid > > [ 16.836511] pci :00:00.0: reg 0x14: [mem size 0x0001] > > [ 16.848048] pci :00:00.0: supports D1 > > [ 16.856051] pci :00:00.0: PME# supported from D0 D1 D3hot > > [ 16.867943] pci :00:01.0: [0e8d:0801] type 01 class 0x060400 > > [ 16.879960] pci :00:01.0: reg 0x10: [mem 0x-0x7fff] > > [ 16.892454] pci :00:01.0: reg 0x14: initial BAR value 0x > > invalid > > [ 16.906497] pci :00:01.0: reg 0x14: [mem size 0x0001] > > [ 16.918040] pci :00:01.0: supports D1 > > [ 16.926031] pci :00:01.0: PME# supported from D0 D1 D3hot > > [ 16.937903] pci :00:02.0: [0e8d:0801] type 01 class 0x060400 > > [ 16.949915] pci :00:02.0: reg 0x10: [mem 0x-0x7fff] > > [ 16.962412] pci :00:02.0: reg 0x14: initial BAR value 0x > > invalid > > [ 16.976466] pci :00:02.0: reg 0x14: [mem size 0x0001] > > [ 16.987991] pci :00:02.0: supports D1 > > [ 16.995986] pci :00:02.0: PME# supported from D0 D1 D3hot > > [ 17.008716] pci :00:00.0: bridge configuration invalid ([bus > > 00-00]), reconfiguring > > [ 17.024695] pci :00:01.0: bridge configuration invalid ([bus > > 00-00]), reconfiguring > > [ 17.040654] pci :00:02.0: bridge configuration invalid ([bus > > 00-00]), reconfiguring > > [ 17.056868] pci :01:00.0: [1b21:0611] type 00 class 0x010185 > > [ 17.068882] pci :01:00.0: reg 0x10: [io 0x-0x0007] > > [ 17.080003] pci :01:00.0: reg 0x14: [io 0x-0x0003] > > [ 17.091115] pci :01:00.0: reg 0x18: [io 0x-0x0007] > > [ 17.102238] pci :01:00.0: reg 0x1c: [io 0x-0x0003] > > [ 17.113350] pci :01:00.0: reg 0x20: [io 0x-0x000f] > > [ 17.124463] pci :01:00.0: reg 0x24: initial BAR value 0x > > invalid > > [ 17.138508] pci :01:00.0: reg 0x24: [mem size 0x0200] > > [ 17.150115] pci :01:00.0: 2.000 Gb/s available PCIe bandwidth, > > limited by 2.5 GT/s PCIe x1 link at :00:00.0 (capable of 4.000 > > Gb/s with 5.0 GT/s PCIe x1 link) > > [ 17.204594] pci :00:00.0: PCI bridge to [bus 01-ff] > > [ 17.215109] pci :00:00.0: bridge window [io 0x-0x0fff] > > [ 17.227257] pci :00:00.0:
Re: [PATCH v7 5/7] mm: Make alloc_contig_range handle free hugetlb pages
On Tue, Apr 13, 2021 at 03:29:02PM -0700, Mike Kravetz wrote: > > spin_lock_irq(hugetlb_lock) > > 1) update_and_free_page > >PageHuge() == F > >__free_pages() > > 2) enqueue_huge_page > >SetPageHugeFreed() > > spin_unlock(_lock) > > Very small nit, the above should be spin_unlock_irq(_lock) Right, I missed it somehow. > > + /* > > +* The page might have been dissolved from under our feet, so make sure > > +* to carefully check the state under the lock. > > +* Return success when racing as if we dissolved the page ourselves. > > +*/ > > + spin_lock_irq(_lock); > > + if (PageHuge(page)) { > > + head = compound_head(page); > > + h = page_hstate(head); > > + } else { > > + spin_unlock(_lock); > > Should be be spin_unlock_irq(_lock); > > Other than that, it looks good. Yeah, I will amend it in the next version. Thanks Mike! -- Oscar Salvador SUSE L3
Re: [PATCH v7 6/7] mm: Make alloc_contig_range handle in-use hugetlb pages
On Tue, Apr 13, 2021 at 03:48:53PM -0700, Mike Kravetz wrote: > The label free_new is: > > free_new: > spin_unlock_irq(_lock); > __free_pages(new_page, huge_page_order(h)); > > return ret; > > So, we are locking and immediately unlocking without any code in > between. Usually, I don't like like multiple labels before return. > However, perhaps we should add another to avoid this unnecessary > cycle. On the other hand, this is an uncommon race condition so the > simple code may be acceptable. I guess we could have something like: free_new: spin_unlock_irq(_lock); free_new_nolock: __free_pages(new_page, huge_page_order(h)); return ret; And let the retry go to there without locking. But as you said, the racecondition is rare enough, so I am not sure if this buys us much. But I can certainly add it if you feel strong about it. -- Oscar Salvador SUSE L3
[char-misc-next] mei: me: add Alder Lake P device id.
Add Alder Lake P device ID. Cc: Signed-off-by: Tomas Winkler --- drivers/misc/mei/hw-me-regs.h | 1 + drivers/misc/mei/pci-me.c | 1 + 2 files changed, 2 insertions(+) diff --git a/drivers/misc/mei/hw-me-regs.h b/drivers/misc/mei/hw-me-regs.h index 14be76d4c2e6..cb34925e10f1 100644 --- a/drivers/misc/mei/hw-me-regs.h +++ b/drivers/misc/mei/hw-me-regs.h @@ -105,6 +105,7 @@ #define MEI_DEV_ID_ADP_S 0x7AE8 /* Alder Lake Point S */ #define MEI_DEV_ID_ADP_LP 0x7A60 /* Alder Lake Point LP */ +#define MEI_DEV_ID_ADP_P 0x51E0 /* Alder Lake Point P */ /* * MEI HW Section diff --git a/drivers/misc/mei/pci-me.c b/drivers/misc/mei/pci-me.c index a7e179626b63..c3393b383e59 100644 --- a/drivers/misc/mei/pci-me.c +++ b/drivers/misc/mei/pci-me.c @@ -111,6 +111,7 @@ static const struct pci_device_id mei_me_pci_tbl[] = { {MEI_PCI_DEVICE(MEI_DEV_ID_ADP_S, MEI_ME_PCH15_CFG)}, {MEI_PCI_DEVICE(MEI_DEV_ID_ADP_LP, MEI_ME_PCH15_CFG)}, + {MEI_PCI_DEVICE(MEI_DEV_ID_ADP_P, MEI_ME_PCH15_CFG)}, /* required last entry */ {0, } -- 2.26.3
Re: [PATCH v2 00/16] Multigenerational LRU Framework
On Tue, Apr 13, 2021 at 09:40:12PM -0600, Yu Zhao wrote: > On Tue, Apr 13, 2021 at 5:14 PM Dave Chinner wrote: > > On Tue, Apr 13, 2021 at 10:13:24AM -0600, Jens Axboe wrote: > > > On 4/13/21 1:51 AM, SeongJae Park wrote: > > > > From: SeongJae Park > > > > > > > > Hello, > > > > > > > > > > > > Very interesting work, thank you for sharing this :) > > > > > > > > On Tue, 13 Apr 2021 00:56:17 -0600 Yu Zhao wrote: > > > > > > > >> What's new in v2 > > > >> > > > >> Special thanks to Jens Axboe for reporting a regression in buffered > > > >> I/O and helping test the fix. > > > > > > > > Is the discussion open? If so, could you please give me a link? > > > > > > I wasn't on the initial post (or any of the lists it was posted to), but > > > it's on the google page reclaim list. Not sure if that is public or not. > > > > > > tldr is that I was pretty excited about this work, as buffered IO tends > > > to suck (a lot) for high throughput applications. My test case was > > > pretty simple: > > > > > > Randomly read a fast device, using 4k buffered IO, and watch what > > > happens when the page cache gets filled up. For this particular test, > > > we'll initially be doing 2.1GB/sec of IO, and then drop to 1.5-1.6GB/sec > > > with kswapd using a lot of CPU trying to keep up. That's mainline > > > behavior. > > > > I see this exact same behaviour here, too, but I RCA'd it to > > contention between the inode and memory reclaim for the mapping > > structure that indexes the page cache. Basically the mapping tree > > lock is the contention point here - you can either be adding pages > > to the mapping during IO, or memory reclaim can be removing pages > > from the mapping, but we can't do both at once. > > > > So we end up with kswapd spinning on the mapping tree lock like so > > when doing 1.6GB/s in 4kB buffered IO: > > > > - 20.06% 0.00% [kernel] [k] kswapd > > > >▒ > >- 20.06% kswapd > > > >▒ > > - 20.05% balance_pgdat > > > >▒ > > - 20.03% shrink_node > > > >▒ > > - 19.92% shrink_lruvec > > > >▒ > >- 19.91% shrink_inactive_list > > > >▒ > > - 19.22% shrink_page_list > > > >▒ > > - 17.51% __remove_mapping > > > >▒ > > - 14.16% _raw_spin_lock_irqsave > > > >▒ > >- 14.14% do_raw_spin_lock > > > >▒ > > __pv_queued_spin_lock_slowpath > > > >▒ > > - 1.56% __delete_from_page_cache > > > >▒ > > 0.63% xas_store > > > >▒ > > - 0.78% _raw_spin_unlock_irqrestore > > > >▒ > >- 0.69% do_raw_spin_unlock > > > >▒ > > __raw_callee_save___pv_queued_spin_unlock > > > >▒ > > - 0.82% free_unref_page_list > > > >▒ > > - 0.72% free_unref_page_commit > >
Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
On Tue, Apr 13, 2021 at 10:49:11PM +0200, Thomas Gleixner wrote: > Paul, > > On Mon, Apr 12 2021 at 16:18, Paul E. McKenney wrote: > > On Mon, Apr 12, 2021 at 10:37:10PM +0200, Thomas Gleixner wrote: > >> On Mon, Apr 12 2021 at 12:57, Paul E. McKenney wrote: > >> > On Mon, Apr 12, 2021 at 08:54:03PM +0200, Thomas Gleixner wrote: > >> >> > I will send a new series out later today, Pacific Time. > >> >> > >> >> Can you do me a favour and send it standalone and not as yet another > >> >> reply to this existing thread maze. A trivial lore link to the previous > >> >> version gives enough context. > >> > > >> > Will do! > >> > > >> > Of course, it turns out that lockdep also doesn't like waited-on > >> > smp_call_function_single() invocations from timer handlers, > >> > so I am currently looking at other options for dealing with that > >> > potential use-after-free. I am starting to like the looks of "only set > >> > CLOCK_SOURCE_VERIFY_PERCPU on statically allocated clocksource structures > >> > and let KASAN enforce this restriction", but I have not quite given up > >> > on making it more general. > >> > >> The simplest point is in the thread under the clocksource_mutex which > >> prevents anything from vanishing under your feet. > > > > And lockdep is -much- happier with the setup shown below, so thank > > you again! > > But it is too simple now :) ... > > > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c > > index f047c6cb056c..34dc38b6b923 100644 > > --- a/kernel/time/clocksource.c > > +++ b/kernel/time/clocksource.c > > @@ -519,6 +515,13 @@ static int __clocksource_watchdog_kthread(void) > > unsigned long flags; > > int select = 0; > > > > + /* Do any required per-CPU skew verification. */ > > + list_for_each_entry(cs, _list, wd_list) { > > + if ((cs->flags & (CLOCK_SOURCE_UNSTABLE | > > CLOCK_SOURCE_VERIFY_PERCPU)) == > > + (CLOCK_SOURCE_UNSTABLE | CLOCK_SOURCE_VERIFY_PERCPU)) > > + clocksource_verify_percpu(cs); > > + } > > because that list is _NOT_ protected by the clocksource_mutex as you > noticed yourself already. > > But you don't have to walk that list at all because the only interesting > thing is the currently active clocksource, which is about to be changed > in case the watchdog marked it unstable and cannot be changed by any > other code concurrently because clocksource_mutex is held. > > So all you need is: > > if (curr_clocksource && > curr_clocksource->flags & CLOCK_SOURCE_UNSTABLE && > curr_clocksource->flags & CLOCK_SOURCE_VERIFY_PERCPU) > clocksource_verify_percpu_wreckage(curr_clocksource); > > Hmm? With the addition of a clocksource=tsc boot parameter, this approach does appear to work, thank you! I sent out the updated series. Thanx, Paul
Re: [Outreachy kernel] [PATCH] staging: rtl8723bs: hal: Remove four set but not used variables
On Tue, Apr 13, 2021 at 08:42:22PM +0200, Fabio M. De Francesco wrote: > Removed four variables that were set but not used. > > Signed-off-by: Fabio M. De Francesco > --- > drivers/staging/rtl8723bs/hal/rtl8723b_hal_init.c | 7 +-- > 1 file changed, 1 insertion(+), 6 deletions(-) > > diff --git a/drivers/staging/rtl8723bs/hal/rtl8723b_hal_init.c > b/drivers/staging/rtl8723bs/hal/rtl8723b_hal_init.c > index 77f8353c5ce5..fad6a3bfe07c 100644 > --- a/drivers/staging/rtl8723bs/hal/rtl8723b_hal_init.c > +++ b/drivers/staging/rtl8723bs/hal/rtl8723b_hal_init.c > @@ -3199,12 +3199,10 @@ static void hw_var_set_mlme_join(struct adapter > *padapter, u8 variable, u8 *val) > > void CCX_FwC2HTxRpt_8723b(struct adapter *padapter, u8 *pdata, u8 len) > { > - u8 seq_no; > > #define GET_8723B_C2H_TX_RPT_LIFE_TIME_OVER(_Header) > LE_BITS_TO_1BYTE((_Header + 0), 6, 1) > #define GET_8723B_C2H_TX_RPT_RETRY_OVER(_Header) > LE_BITS_TO_1BYTE((_Header + 0), 7, 1) > > - seq_no = *(pdata+6); > > if (GET_8723B_C2H_TX_RPT_RETRY_OVER(pdata) | > GET_8723B_C2H_TX_RPT_LIFE_TIME_OVER(pdata)) { Now we have two blank lines in a row. Please delete one. regards, dan carpenter
Re: [PATCH] [v4] Input: Add "Select" button to Microsoft Xbox One controller.
That line was using tab + 4 spaces on the left and was reformatted to use 2 tabs. If you don't like it I've uploaded patch v5 not touching that line. On Tue, Apr 13, 2021 at 5:34 AM Bastien Nocera wrote: > > On Tue, 2021-04-13 at 01:02 +, Chris Ye wrote: > > Add "Select" button input capability and input event mapping for > > Microsoft Xbox One controller. From product site this is also > > referred as > > "Share" button. > > Fixed Microsoft Xbox One controller select button not working under > > USB > > connection. > > > > Signed-off-by: Chris Ye > > --- > > drivers/input/joystick/xpad.c | 8 +++- > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/input/joystick/xpad.c > > b/drivers/input/joystick/xpad.c > > index 9f0d07dcbf06..99cb8bb78570 100644 > > --- a/drivers/input/joystick/xpad.c > > +++ b/drivers/input/joystick/xpad.c > > @@ -79,6 +79,7 @@ > > #define MAP_DPAD_TO_BUTTONS(1 << 0) > > #define MAP_TRIGGERS_TO_BUTTONS(1 << 1) > > #define MAP_STICKS_TO_NULL (1 << 2) > > +#define MAP_SELECT_BUTTON (1 << 3) > > #define DANCEPAD_MAP_CONFIG(MAP_DPAD_TO_BUTTONS > > | \ > > MAP_TRIGGERS_TO_BUTTONS | > > MAP_STICKS_TO_NULL) > > > > @@ -130,6 +131,7 @@ static const struct xpad_device { > > { 0x045e, 0x02e3, "Microsoft X-Box One Elite pad", 0, > > XTYPE_XBOXONE }, > > { 0x045e, 0x02ea, "Microsoft X-Box One S pad", 0, > > XTYPE_XBOXONE }, > > { 0x045e, 0x0719, "Xbox 360 Wireless Receiver", > > MAP_DPAD_TO_BUTTONS, XTYPE_XBOX360W }, > > + { 0x045e, 0x0b12, "Microsoft Xbox One X pad", > > MAP_SELECT_BUTTON, XTYPE_XBOXONE }, > > { 0x046d, 0xc21d, "Logitech Gamepad F310", 0, XTYPE_XBOX360 > > }, > > { 0x046d, 0xc21e, "Logitech Gamepad F510", 0, XTYPE_XBOX360 > > }, > > { 0x046d, 0xc21f, "Logitech Gamepad F710", 0, XTYPE_XBOX360 > > }, > > @@ -862,6 +864,8 @@ static void xpadone_process_packet(struct > > usb_xpad *xpad, u16 cmd, unsigned char > > /* menu/view buttons */ > > input_report_key(dev, BTN_START, data[4] & 0x04); > > input_report_key(dev, BTN_SELECT, data[4] & 0x08); > > + if (xpad->mapping & MAP_SELECT_BUTTON) > > + input_report_key(dev, KEY_RECORD, data[22] & 0x01); > > > > /* buttons A,B,X,Y */ > > input_report_key(dev, BTN_A,data[4] & 0x10); > > @@ -1669,9 +1673,11 @@ static int xpad_init_input(struct usb_xpad > > *xpad) > > > > /* set up model-specific ones */ > > if (xpad->xtype == XTYPE_XBOX360 || xpad->xtype == > > XTYPE_XBOX360W || > > - xpad->xtype == XTYPE_XBOXONE) { > > + xpad->xtype == XTYPE_XBOXONE) { > > Why the indentation change here? > > > for (i = 0; xpad360_btn[i] >= 0; i++) > > input_set_capability(input_dev, EV_KEY, > > xpad360_btn[i]); > > + if (xpad->mapping & MAP_SELECT_BUTTON) > > + input_set_capability(input_dev, EV_KEY, > > KEY_RECORD); > > } else { > > for (i = 0; xpad_btn[i] >= 0; i++) > > input_set_capability(input_dev, EV_KEY, > > xpad_btn[i]); > >
Re: [PATCH] Documentation: syscalls: add a note about ABI-agnostic types
Ping? On Fri, Apr 09, 2021 at 01:43:04PM -0700, Yury Norov wrote: > Recently added memfd_secret() syscall had a flags parameter passed > as unsigned long, which requires creation of compat entry for it. > It was possible to change the type of flags to unsigned int and so > avoid bothering with compat layer. > > https://www.spinics.net/lists/linux-mm/msg251550.html > > Documentation/process/adding-syscalls.rst doesn't point clearly about > preference of ABI-agnostic types. This patch adds such notification. > > Signed-off-by: Yury Norov > --- > Documentation/process/adding-syscalls.rst | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/Documentation/process/adding-syscalls.rst > b/Documentation/process/adding-syscalls.rst > index 9af35f4ec728..46add16edf14 100644 > --- a/Documentation/process/adding-syscalls.rst > +++ b/Documentation/process/adding-syscalls.rst > @@ -172,6 +172,13 @@ arguments (i.e. parameter 1, 3, 5), to allow use of > contiguous pairs of 32-bit > registers. (This concern does not apply if the arguments are part of a > structure that's passed in by pointer.) > > +Whenever possible, try to use ABI-agnostic types for passing parameters to > +a syscall in order to avoid creating compat entry for it. Linux supports two > +ABI models - ILP32 and LP64. The types like ``void *``, ``long``, ``size_t``, > +``off_t`` have different size in those ABIs; types like ``char`` and ``int`` > +have the same size and don't require a compat layer support. For flags, it's > +always better to use ``unsigned int``. > + > > Proposing the API > - > -- > 2.25.1
Re: [syzbot] WARNING in unsafe_follow_pfn
On Tue, Apr 13, 2021 at 03:11:45PM -0300, Jason Gunthorpe wrote: > On Tue, Apr 13, 2021 at 07:20:12PM +0200, Dmitry Vyukov wrote: > > > > Plus users are going to be seeing this as well. According to the commit > > > > message for 69bacee7f9ad ("mm: Add unsafe_follow_pfn") "Unfortunately > > > > there's some users where this is not fixable (like v4l userptr of iomem > > > > mappings)". It sort of seems crazy to dump this giant splat and then > > > > tell users to ignore it forever because it can't be fixed... 0_0 > > > > > > I think the discussion conclusion was that this interface should not > > > be used by userspace anymore, it is obsolete by some new interface? > > > > > > It should be protected by some kconfig and the kconfig should be > > > turned off for syzkaller runs. > > > > If this is not a kernel bug, then it must not use WARN_ON[_ONCE]. It > > makes the kernel untestable for both automated systems and humans: > > It is a kernel security bug triggerable by userspace. > > > And if it's a kernel bug reachable from user-space, then I think this > > code should be removed entirely, not just on all testing systems. Or > > otherwise if we are not removing it for some reason, then it needs to > > be fixed. > > Legacy embedded systems apparently require it. Are legacy embedded systems ever going to update their kernel? It might be better to just remove it. (I don't really have any details outside of your email so I don't know). regards, dan carpenter
[PATCH v8 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
Some sorts of per-CPU clock sources have a history of going out of synchronization with each other. However, this problem has purportedy been solved in the past ten years. Except that it is all too possible that the problem has instead simply been made less likely, which might mean that some of the occasional "Marking clocksource 'tsc' as unstable" messages might be due to desynchronization. How would anyone know? Therefore apply CPU-to-CPU synchronization checking to newly unstable clocksource that are marked with the new CLOCK_SOURCE_VERIFY_PERCPU flag. Lists of desynchronized CPUs are printed, with the caveat that if it is the reporting CPU that is itself desynchronized, it will appear that all the other clocks are wrong. Just like in real life. Cc: John Stultz Cc: Thomas Gleixner Cc: Stephen Boyd Cc: Jonathan Corbet Cc: Mark Rutland Cc: Marc Zyngier Cc: Andi Kleen Reported-by: Chris Mason [ paulmck: Add "static" to clocksource_verify_one_cpu() per kernel test robot feedback. ] [ paulmck: Apply Thomas Gleixner feedback. ] Signed-off-by: Paul E. McKenney --- arch/x86/kernel/kvmclock.c | 2 +- arch/x86/kernel/tsc.c | 3 +- include/linux/clocksource.h | 2 +- kernel/time/clocksource.c | 63 + 4 files changed, 67 insertions(+), 3 deletions(-) diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index 1fc0962c89c0..97eeaf164296 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -169,7 +169,7 @@ struct clocksource kvm_clock = { .read = kvm_clock_get_cycles, .rating = 400, .mask = CLOCKSOURCE_MASK(64), - .flags = CLOCK_SOURCE_IS_CONTINUOUS, + .flags = CLOCK_SOURCE_IS_CONTINUOUS | CLOCK_SOURCE_VERIFY_PERCPU, .enable = kvm_cs_enable, }; EXPORT_SYMBOL_GPL(kvm_clock); diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c index f70dffc2771f..56289170753c 100644 --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -1151,7 +1151,8 @@ static struct clocksource clocksource_tsc = { .mask = CLOCKSOURCE_MASK(64), .flags = CLOCK_SOURCE_IS_CONTINUOUS | CLOCK_SOURCE_VALID_FOR_HRES | - CLOCK_SOURCE_MUST_VERIFY, + CLOCK_SOURCE_MUST_VERIFY | + CLOCK_SOURCE_VERIFY_PERCPU, .vdso_clock_mode= VDSO_CLOCKMODE_TSC, .enable = tsc_cs_enable, .resume = tsc_resume, diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index 86d143db6523..83a3ebff7456 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -131,7 +131,7 @@ struct clocksource { #define CLOCK_SOURCE_UNSTABLE 0x40 #define CLOCK_SOURCE_SUSPEND_NONSTOP 0x80 #define CLOCK_SOURCE_RESELECT 0x100 - +#define CLOCK_SOURCE_VERIFY_PERCPU 0x200 /* simplify initialization of mask field */ #define CLOCKSOURCE_MASK(bits) GENMASK_ULL((bits) - 1, 0) diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index 3ac19a7859f5..69311deab428 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -211,6 +211,63 @@ static void clocksource_watchdog_inject_delay(void) WARN_ON_ONCE(injectfail < 0); } +static u64 csnow_mid; +static cpumask_t cpus_ahead; +static cpumask_t cpus_behind; + +static void clocksource_verify_one_cpu(void *csin) +{ + struct clocksource *cs = (struct clocksource *)csin; + + csnow_mid = cs->read(cs); +} + +static void clocksource_verify_percpu(struct clocksource *cs) +{ + int64_t cs_nsec, cs_nsec_max, cs_nsec_min; + u64 csnow_begin, csnow_end; + bool firsttime = 1; + int testcpu; + s64 delta; + int cpu; + + cpumask_clear(_ahead); + cpumask_clear(_behind); + preempt_disable(); + testcpu = smp_processor_id(); + pr_warn("Checking clocksource %s synchronization from CPU %d.\n", cs->name, testcpu); + for_each_online_cpu(cpu) { + if (cpu == testcpu) + continue; + csnow_begin = cs->read(cs); + smp_call_function_single(cpu, clocksource_verify_one_cpu, cs, 1); + csnow_end = cs->read(cs); + delta = (s64)((csnow_mid - csnow_begin) & cs->mask); + if (delta < 0) + cpumask_set_cpu(cpu, _behind); + delta = (csnow_end - csnow_mid) & cs->mask; + if (delta < 0) + cpumask_set_cpu(cpu, _ahead); + delta = clocksource_delta(csnow_end, csnow_begin, cs->mask); + cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift); + if (firsttime || cs_nsec > cs_nsec_max) + cs_nsec_max = cs_nsec; + if (firsttime || cs_nsec <
[PATCH v8 clocksource 2/5] clocksource: Retry clock read if long delays detected
When the clocksource watchdog marks a clock as unstable, this might be due to that clock being unstable or it might be due to delays that happen to occur between the reads of the two clocks. Yes, interrupts are disabled across those two reads, but there are no shortage of things that can delay interrupts-disabled regions of code ranging from SMI handlers to vCPU preemption. It would be good to have some indication as to why the clock was marked unstable. Therefore, re-read the watchdog clock on either side of the read from the clock under test. If the watchdog clock shows an excessive time delta between its pair of reads, the reads are retried. The maximum number of retries is specified by a new kernel boot parameter clocksource.max_read_retries, which defaults to three, that is, up to four reads, one initial and up to three retries. If retries were required, a message is printed on the console. If the number of retries is exceeded, the clock under test will be marked unstable. However, the probability of this happening due to various sorts of delays is quite small. In addition, the reason (clock-read delays) for the unstable marking will be apparent. Cc: John Stultz Cc: Thomas Gleixner Cc: Stephen Boyd Cc: Jonathan Corbet Cc: Mark Rutland Cc: Marc Zyngier Cc: Andi Kleen Reported-by: Chris Mason [ paulmck: Per-clocksource retries per Neeraj Upadhyay feedback. ] [ paulmck: Don't reset injectfail per Neeraj Upadhyay feedback. ] [ paulmck: Apply Thomas Gleixner feedback. ] Signed-off-by: Paul E. McKenney --- kernel/time/clocksource.c | 24 1 file changed, 20 insertions(+), 4 deletions(-) diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index 4be4391aa72f..3ac19a7859f5 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -124,6 +124,7 @@ static void __clocksource_change_rating(struct clocksource *cs, int rating); */ #define WATCHDOG_INTERVAL (HZ >> 1) #define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4) +#define WATCHDOG_MAX_SKEW (NSEC_PER_SEC >> 6) static void clocksource_watchdog_work(struct work_struct *work) { @@ -213,9 +214,10 @@ static void clocksource_watchdog_inject_delay(void) static void clocksource_watchdog(struct timer_list *unused) { struct clocksource *cs; - u64 csnow, wdnow, cslast, wdlast, delta; - int64_t wd_nsec, cs_nsec; + u64 csnow, wdnow, wdagain, cslast, wdlast, delta; + int64_t wd_nsec, wdagain_delta, wderr_nsec = 0, cs_nsec; int next_cpu, reset_pending; + int nretries; spin_lock(_lock); if (!watchdog_running) @@ -224,6 +226,7 @@ static void clocksource_watchdog(struct timer_list *unused) reset_pending = atomic_read(_reset_pending); list_for_each_entry(cs, _list, wd_list) { + nretries = 0; /* Clocksource already marked unstable? */ if (cs->flags & CLOCK_SOURCE_UNSTABLE) { @@ -232,11 +235,24 @@ static void clocksource_watchdog(struct timer_list *unused) continue; } +retry: local_irq_disable(); - csnow = cs->read(cs); - clocksource_watchdog_inject_delay(); wdnow = watchdog->read(watchdog); + clocksource_watchdog_inject_delay(); + csnow = cs->read(cs); + wdagain = watchdog->read(watchdog); local_irq_enable(); + delta = clocksource_delta(wdagain, wdnow, watchdog->mask); + wdagain_delta = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift); + if (wdagain_delta > WATCHDOG_MAX_SKEW) { + wderr_nsec = wdagain_delta; + if (nretries++ < max_read_retries) + goto retry; + } + if (nretries) { + pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d\n", + smp_processor_id(), watchdog->name, wderr_nsec, nretries); + } /* Clocksource initialized ? */ if (!(cs->flags & CLOCK_SOURCE_WATCHDOG) || -- 2.25.1
[PATCH v8 clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog
When the clocksource watchdog marks a clock as unstable, this might be due to that clock being unstable or it might be due to delays that happen to occur between the reads of the two clocks. Yes, interrupts are disabled across those two reads, but there are no shortage of things that can delay interrupts-disabled regions of code ranging from SMI handlers to vCPU preemption. It would be good to have some indication as to why the clock was marked unstable. The first step is a way of injecting such delays. Therefore, provide clocksource.inject_delay_freq and clocksource.inject_delay_run kernel boot parameters that specify that sufficient delay be injected to cause the clocksource_watchdog() function to mark a clock unstable. This delay is injected every Nth set of M calls to clocksource_watchdog(), where N is the value specified for the inject_delay_freq boot parameter and M is the value specified for the inject_delay_run boot parameter. Values of zero or less for either parameter disable delay injection, and the default for clocksource.inject_delay_freq is zero, that is, disabled. The default for clocksource.inject_delay_run is the value one, that is single-call runs. This facility is intended for diagnostic use only, and should be avoided on production systems. Cc: John Stultz Cc: Thomas Gleixner Cc: Stephen Boyd Cc: Jonathan Corbet Cc: Mark Rutland Cc: Marc Zyngier Cc: Andi Kleen [ paulmck: Apply Rik van Riel feedback. ] Reported-by: Chris Mason Signed-off-by: Paul E. McKenney --- .../admin-guide/kernel-parameters.txt | 22 +++ kernel/time/clocksource.c | 27 +++ 2 files changed, 49 insertions(+) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 04545725f187..fc57952dcba0 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -583,6 +583,28 @@ loops can be debugged more effectively on production systems. + clocksource.inject_delay_freq= [KNL] + Number of runs of calls to clocksource_watchdog() + before delays are injected between reads from the + two clocksources. Values less than or equal to + zero disable this delay injection. These delays + can cause clocks to be marked unstable, so use + of this parameter should therefore be avoided on + production systems. Defaults to zero (disabled). + + clocksource.inject_delay_run= [KNL] + Run lengths of clocksource_watchdog() delay + injections. Specifying the value 8 will result + in eight consecutive delays followed by eight + times the value specified for inject_delay_freq + of consecutive non-delays. + + clocksource.max_read_retries= [KNL] + Number of clocksource_watchdog() retries due to + external delays before the clock will be marked + unstable. Defaults to three retries, that is, + four attempts to read the clock under test. + clearcpuid=BITNUM[,BITNUM...] [X86] Disable CPUID feature X for the kernel. See arch/x86/include/asm/cpufeatures.h for the valid bit diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index cce484a2cc7c..4be4391aa72f 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -14,6 +14,7 @@ #include /* for spin_unlock_irq() using preempt_count() m68k */ #include #include +#include #include "tick-internal.h" #include "timekeeping_internal.h" @@ -184,6 +185,31 @@ void clocksource_mark_unstable(struct clocksource *cs) spin_unlock_irqrestore(_lock, flags); } +static int inject_delay_freq; +module_param(inject_delay_freq, int, 0644); +static int inject_delay_run = 1; +module_param(inject_delay_run, int, 0644); +static int max_read_retries = 3; +module_param(max_read_retries, int, 0644); + +static void clocksource_watchdog_inject_delay(void) +{ + int i; + static int injectfail = -1; + + if (inject_delay_freq <= 0 || inject_delay_run <= 0) + return; + if (injectfail < 0 || injectfail > INT_MAX / 2) + injectfail = inject_delay_run; + if (!(++injectfail / inject_delay_run % inject_delay_freq)) { + pr_warn("%s(): Injecting delay.\n", __func__); + for (i = 0; i < 2 * WATCHDOG_THRESHOLD / NSEC_PER_MSEC; i++) + udelay(1000); + pr_warn("%s(): Done injecting delay.\n", __func__); + } + WARN_ON_ONCE(injectfail < 0); +} + static void clocksource_watchdog(struct timer_list *unused) {
[PATCH v8 clocksource 5/5] clocksource: Limit number of CPUs checked for clock synchronization
Currently, if skew is detected on a clock marked CLOCK_SOURCE_VERIFY_PERCPU, that clock is checked on all CPUs. This is thorough, but might not be what you want on a system with a few tens of CPUs, let alone a few hundred of them. Therefore, by default check only up to eight randomly chosen CPUs. Also provide a new clocksource.verify_n_cpus kernel boot parameter. A value of -1 says to check all of the CPUs, and a non-negative value says to randomly select that number of CPUs, without concern about selecting the same CPU multiple times. However, make use of a cpumask so that a given CPU will be checked at most once. Suggested-by: Thomas Gleixner # For verify_n_cpus=1. Cc: John Stultz Cc: Thomas Gleixner Cc: Stephen Boyd Cc: Jonathan Corbet Cc: Mark Rutland Cc: Marc Zyngier Cc: Andi Kleen Signed-off-by: Paul E. McKenney --- .../admin-guide/kernel-parameters.txt | 10 +++ kernel/time/clocksource.c | 74 ++- 2 files changed, 82 insertions(+), 2 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index f9da90f2791f..c09be8392bd6 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -621,6 +621,16 @@ unstable. Defaults to three retries, that is, four attempts to read the clock under test. + clocksource.verify_n_cpus= [KNL] + Limit the number of CPUs checked for clocksources + marked with CLOCK_SOURCE_VERIFY_PERCPU that + are marked unstable due to excessive skew. + A negative value says to check all CPUs, while + zero says not to check any. Values larger than + nr_cpu_ids are silently truncated to nr_cpu_ids. + The actual CPUs are chosen randomly, with + no replacement if the same CPU is chosen twice. + clearcpuid=BITNUM[,BITNUM...] [X86] Disable CPUID feature X for the kernel. See arch/x86/include/asm/cpufeatures.h for the valid bit diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index 5f641b15ec6c..8f4967e59b05 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -15,6 +15,8 @@ #include #include #include +#include +#include #include "tick-internal.h" #include "timekeeping_internal.h" @@ -194,6 +196,8 @@ static int inject_delay_shift_percpu = -1; module_param(inject_delay_shift_percpu, int, 0644); static int max_read_retries = 3; module_param(max_read_retries, int, 0644); +static int verify_n_cpus = 8; +module_param(verify_n_cpus, int, 0644); static void clocksource_watchdog_inject_delay(void) { @@ -216,6 +220,55 @@ static void clocksource_watchdog_inject_delay(void) static u64 csnow_mid; static cpumask_t cpus_ahead; static cpumask_t cpus_behind; +static cpumask_t cpus_chosen; + +static void clocksource_verify_choose_cpus(void) +{ + int cpu, i, n = verify_n_cpus; + + if (n < 0) { + /* Check all of the CPUs. */ + cpumask_copy(_chosen, cpu_online_mask); + cpumask_clear_cpu(smp_processor_id(), _chosen); + return; + } + + /* If no checking desired, or no other CPU to check, leave. */ + cpumask_clear(_chosen); + if (n == 0 || num_online_cpus() <= 1) + return; + + /* Make sure to select at least one CPU other than the current CPU. */ + cpu = cpumask_next(-1, cpu_online_mask); + if (cpu == smp_processor_id()) + cpu = cpumask_next(cpu, cpu_online_mask); + if (WARN_ON_ONCE(cpu >= nr_cpu_ids)) + return; + cpumask_set_cpu(cpu, _chosen); + + /* Force a sane value for the boot parameter. */ + if (n > nr_cpu_ids) + n = nr_cpu_ids - 1; + + /* +* Randomly select the specified number of CPUs. If the same +* CPU is selected multiple times, that CPU is checked only once, +* and no replacement CPU is selected. This gracefully handles +* situations where verify_n_cpus is greater than the number of +* CPUs that are currently online. +*/ + for (i = 1; i < n; i++) { + cpu = prandom_u32() % n; + cpu = cpumask_next(cpu - 1, cpu_online_mask); + if (cpu >= nr_cpu_ids) + cpu = cpumask_next(-1, cpu_online_mask); + if (!WARN_ON_ONCE(cpu >= nr_cpu_ids)) + cpumask_set_cpu(cpu, _chosen); + } + + /* Don't verify ourselves. */ + cpumask_clear_cpu(smp_processor_id(), _chosen); +} static void clocksource_verify_one_cpu(void *csin) { @@ -239,12 +292,22 @@ static void clocksource_verify_percpu(struct clocksource *cs) s64 delta;
[PATCH v8 clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking
Code that checks for clock desynchronization must itself be tested, so create a new clocksource.inject_delay_shift_percpu= kernel boot parameter that adds or subtracts a large value from the check read, using the specified bit of the CPU ID to determine whether to add or to subtract. Cc: John Stultz Cc: Thomas Gleixner Cc: Stephen Boyd Cc: Jonathan Corbet Cc: Mark Rutland Cc: Marc Zyngier Cc: Andi Kleen Reported-by: Chris Mason [ paulmck: Apply Randy Dunlap feedback. ] Signed-off-by: Paul E. McKenney --- Documentation/admin-guide/kernel-parameters.txt | 16 kernel/time/clocksource.c | 10 +- 2 files changed, 25 insertions(+), 1 deletion(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index fc57952dcba0..f9da90f2791f 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -599,6 +599,22 @@ times the value specified for inject_delay_freq of consecutive non-delays. + clocksource.inject_delay_shift_percpu= [KNL] + Clocksource delay injection partitions the CPUs + into two sets, one whose clocks are moved ahead + and the other whose clocks are moved behind. + This kernel parameter selects the CPU-number + bit that determines which of these two sets the + corresponding CPU is placed into. For example, + setting this parameter to the value 4 will result + in the first set containing alternating groups + of 16 CPUs whose clocks are moved ahead, while + the second set will contain the rest of the CPUs, + whose clocks are moved behind. + + The default value of -1 disables this type of + error injection. + clocksource.max_read_retries= [KNL] Number of clocksource_watchdog() retries due to external delays before the clock will be marked diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index 69311deab428..5f641b15ec6c 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -190,6 +190,8 @@ static int inject_delay_freq; module_param(inject_delay_freq, int, 0644); static int inject_delay_run = 1; module_param(inject_delay_run, int, 0644); +static int inject_delay_shift_percpu = -1; +module_param(inject_delay_shift_percpu, int, 0644); static int max_read_retries = 3; module_param(max_read_retries, int, 0644); @@ -218,8 +220,14 @@ static cpumask_t cpus_behind; static void clocksource_verify_one_cpu(void *csin) { struct clocksource *cs = (struct clocksource *)csin; + s64 delta = 0; + int sign; - csnow_mid = cs->read(cs); + if (inject_delay_shift_percpu >= 0) { + sign = ((smp_processor_id() >> inject_delay_shift_percpu) & 0x1) * 2 - 1; + delta = sign * NSEC_PER_SEC; + } + csnow_mid = cs->read(cs) + delta; } static void clocksource_verify_percpu(struct clocksource *cs) -- 2.25.1
[PATCH v8 clocksource] Do not mark clocks unstable due to delays for v5.13
Hello! If there is a sufficient delay between reading the watchdog clock and the clock under test, the clock under test will be marked unstable through no fault of its own. This series checks for this, doing limited retries to get a good set of clock reads. If the clock is marked unstable and is marked as being per-CPU, cross-CPU synchronization is checked. This series also provides delay injection, which may be enabled via kernel boot parameters to test the checking for delays. Note that "sufficient delay" can be provided by SMIs, NMIs, and of course vCPU preemption. 1. Provide module parameters to inject delays in watchdog. 2. Retry clock read if long delays detected. 3. Check per-CPU clock synchronization when marked unstable. 4. Provide a module parameter to fuzz per-CPU clock checking. 5. Limit number of CPUs checked for clock synchronization. Changes since v7, based on Thomas Gleixner feedback: o Fix embarrassing git-format-patch operator error. o Merge pairwise clock-desynchronization checking into the checking of per-CPU clock synchronization when marked unstable. o Do selective per-CPU checking rather than blindly checking all CPUs. Provide a clocksource.verify_n_cpus kernel boot parameter to control this behavior, with the value -1 choosing the old check-all-CPUs behavior. The default is to randomly check 8 CPUs. o Fix the clock-desynchronization checking to avoid a potential use-after-free error for dynamically allocated clocksource structures. o Remove redundance "wdagain_nsec < 0" from clocksource_watchdog() clocksource skew checking. o Update commit logs and do code-style updates. Changes since v5: o Rebased to v5.12-rc5. Changes since v4: o Rebased to v5.12-rc1. Changes since v3: o Rebased to v5.11. o Apply Randy Dunlap feedback. Changes since v2: o Rebased to v5.11-rc6. o Updated Cc: list. Changes since v1: o Applied feedback from Rik van Riel. o Rebased to v5.11-rc3. o Stripped "RFC" from the subject lines. Thanx, Paul Documentation/admin-guide/kernel-parameters.txt | 26 +++ b/Documentation/admin-guide/kernel-parameters.txt | 22 ++ b/arch/x86/kernel/kvmclock.c |2 b/arch/x86/kernel/tsc.c |3 b/include/linux/clocksource.h |2 b/kernel/time/clocksource.c | 27 +++ kernel/time/clocksource.c | 171 +- 7 files changed, 243 insertions(+), 10 deletions(-)
xilinx_axienet_main.c:undefined reference to `of_address_to_resource'
Hi Robert, FYI, the error/warning still remains. tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master head: 50987beca096a7ed4f453a6da245fd6a2fadedeb commit: 57baf8cc70ea4cf5503c9d42f31f6a86d7f5ff1a net: axienet: Handle deferred probe on clock properly date: 9 weeks ago config: s390-randconfig-c023-20210414 (attached as .config) compiler: s390-linux-gcc (GCC) 9.3.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=57baf8cc70ea4cf5503c9d42f31f6a86d7f5ff1a git remote add linus https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git git fetch --no-tags linus master git checkout 57baf8cc70ea4cf5503c9d42f31f6a86d7f5ff1a # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=s390 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All errors (new ones prefixed by >>): s390-linux-ld: drivers/net/arcnet/com90xx.o: in function `com90xx_exit': com90xx.c:(.exit.text+0xe6): undefined reference to `iounmap' s390-linux-ld: drivers/net/arcnet/com90xx.o: in function `check_mirror': com90xx.c:(.init.text+0x4c): undefined reference to `ioremap' s390-linux-ld: com90xx.c:(.init.text+0xa8): undefined reference to `iounmap' s390-linux-ld: drivers/net/arcnet/com90xx.o: in function `com90xx_probe': com90xx.c:(.init.text+0x1182): undefined reference to `ioremap' s390-linux-ld: com90xx.c:(.init.text+0x15dc): undefined reference to `iounmap' s390-linux-ld: com90xx.c:(.init.text+0x2234): undefined reference to `iounmap' s390-linux-ld: com90xx.c:(.init.text+0x23b2): undefined reference to `iounmap' s390-linux-ld: com90xx.c:(.init.text+0x26d6): undefined reference to `ioremap' s390-linux-ld: com90xx.c:(.init.text+0x29ae): undefined reference to `iounmap' s390-linux-ld: com90xx.c:(.init.text+0x2e28): undefined reference to `iounmap' s390-linux-ld: drivers/net/arcnet/arc-rimi.o: in function `arc_rimi_exit': arc-rimi.c:(.exit.text+0x50): undefined reference to `iounmap' s390-linux-ld: drivers/net/arcnet/arc-rimi.o: in function `arc_rimi_init': arc-rimi.c:(.init.text+0x5ca): undefined reference to `ioremap' s390-linux-ld: arc-rimi.c:(.init.text+0x6c8): undefined reference to `iounmap' s390-linux-ld: arc-rimi.c:(.init.text+0xac4): undefined reference to `iounmap' s390-linux-ld: arc-rimi.c:(.init.text+0xc10): undefined reference to `ioremap' s390-linux-ld: arc-rimi.c:(.init.text+0xeea): undefined reference to `iounmap' s390-linux-ld: drivers/net/arcnet/arc-rimi.o: in function `check_mirror': arc-rimi.c:(.text.unlikely+0x4c): undefined reference to `ioremap' s390-linux-ld: arc-rimi.c:(.text.unlikely+0xa8): undefined reference to `iounmap' s390-linux-ld: drivers/net/ethernet/altera/altera_tse_main.o: in function `request_and_map': altera_tse_main.c:(.text+0xe54): undefined reference to `devm_ioremap' s390-linux-ld: drivers/net/ethernet/fujitsu/fmvj18x_cs.o: in function `fmvj18x_release': fmvj18x_cs.c:(.text+0x10d2): undefined reference to `iounmap' s390-linux-ld: drivers/net/ethernet/fujitsu/fmvj18x_cs.o: in function `fmvj18x_config': fmvj18x_cs.c:(.text+0x3096): undefined reference to `ioremap' s390-linux-ld: fmvj18x_cs.c:(.text+0x3132): undefined reference to `iounmap' s390-linux-ld: fmvj18x_cs.c:(.text+0x3896): undefined reference to `ioremap' s390-linux-ld: fmvj18x_cs.c:(.text+0x3bba): undefined reference to `iounmap' s390-linux-ld: fmvj18x_cs.c:(.text+0x3f62): undefined reference to `iounmap' s390-linux-ld: drivers/net/ethernet/xilinx/ll_temac_main.o: in function `temac_probe': ll_temac_main.c:(.text+0x6044): undefined reference to `devm_platform_ioremap_resource_byname' s390-linux-ld: ll_temac_main.c:(.text+0x63fc): undefined reference to `devm_of_iomap' s390-linux-ld: ll_temac_main.c:(.text+0x65ee): undefined reference to `devm_platform_ioremap_resource' s390-linux-ld: drivers/net/ethernet/xilinx/ll_temac_mdio.o: in function `temac_mdio_setup': ll_temac_mdio.c:(.text+0x3b0): undefined reference to `of_address_to_resource' s390-linux-ld: drivers/net/ethernet/xilinx/xilinx_axienet_main.o: in function `axienet_probe': >> xilinx_axienet_main.c:(.text+0x59dc): undefined reference to >> `of_address_to_resource' s390-linux-ld: xilinx_axienet_main.c:(.text+0x5a18): undefined reference to `devm_ioremap_resource' s390-linux-ld: xilinx_axienet_main.c:(.text+0x5b78): undefined reference to `devm_ioremap_resource' s390-linux-ld: xilinx_axienet_main.c:(.text+0x6548): undefined reference to `devm_ioremap_resource' s390-linux-ld: drivers/net/ethernet/xircom/xirc2ps_cs.o: in function
[PATCH 2/2] m68k: update configs to match the proper KUNIT syntax
No functional changes other than CONFIG name changes Signed-off-by: Nico Pache --- arch/m68k/configs/amiga_defconfig| 6 +++--- arch/m68k/configs/apollo_defconfig | 6 +++--- arch/m68k/configs/atari_defconfig| 6 +++--- arch/m68k/configs/bvme6000_defconfig | 6 +++--- arch/m68k/configs/hp300_defconfig| 6 +++--- arch/m68k/configs/mac_defconfig | 6 +++--- arch/m68k/configs/multi_defconfig| 6 +++--- arch/m68k/configs/mvme147_defconfig | 6 +++--- arch/m68k/configs/mvme16x_defconfig | 6 +++--- arch/m68k/configs/q40_defconfig | 6 +++--- arch/m68k/configs/sun3_defconfig | 6 +++--- arch/m68k/configs/sun3x_defconfig| 6 +++--- 12 files changed, 36 insertions(+), 36 deletions(-) diff --git a/arch/m68k/configs/amiga_defconfig b/arch/m68k/configs/amiga_defconfig index 786656090c50..77cc4ff7ae3a 100644 --- a/arch/m68k/configs/amiga_defconfig +++ b/arch/m68k/configs/amiga_defconfig @@ -655,11 +655,11 @@ CONFIG_TEST_BLACKHOLE_DEV=m CONFIG_FIND_BIT_BENCHMARK=m CONFIG_TEST_FIRMWARE=m CONFIG_TEST_SYSCTL=m -CONFIG_BITFIELD_KUNIT=m +CONFIG_BITFIELD_KUNIT_TEST=m CONFIG_RESOURCE_KUNIT_TEST=m -CONFIG_LINEAR_RANGES_TEST=m +CONFIG_LINEAR_RANGES_KUNIT_TEST=m CONFIG_CMDLINE_KUNIT_TEST=m -CONFIG_BITS_TEST=m +CONFIG_BITS_KUNIT_TEST=m CONFIG_TEST_UDELAY=m CONFIG_TEST_STATIC_KEYS=m CONFIG_TEST_KMOD=m diff --git a/arch/m68k/configs/apollo_defconfig b/arch/m68k/configs/apollo_defconfig index 9bb12be4a38e..86913bdb265b 100644 --- a/arch/m68k/configs/apollo_defconfig +++ b/arch/m68k/configs/apollo_defconfig @@ -611,11 +611,11 @@ CONFIG_TEST_BLACKHOLE_DEV=m CONFIG_FIND_BIT_BENCHMARK=m CONFIG_TEST_FIRMWARE=m CONFIG_TEST_SYSCTL=m -CONFIG_BITFIELD_KUNIT=m +CONFIG_BITFIELD_KUNIT_TEST=m CONFIG_RESOURCE_KUNIT_TEST=m -CONFIG_LINEAR_RANGES_TEST=m +CONFIG_LINEAR_RANGES_KUNIT_TEST=m CONFIG_CMDLINE_KUNIT_TEST=m -CONFIG_BITS_TEST=m +CONFIG_BITS_KUNIT_TEST=m CONFIG_TEST_UDELAY=m CONFIG_TEST_STATIC_KEYS=m CONFIG_TEST_KMOD=m diff --git a/arch/m68k/configs/atari_defconfig b/arch/m68k/configs/atari_defconfig index 413232626d9d..6b5c35e7be44 100644 --- a/arch/m68k/configs/atari_defconfig +++ b/arch/m68k/configs/atari_defconfig @@ -633,11 +633,11 @@ CONFIG_TEST_BLACKHOLE_DEV=m CONFIG_FIND_BIT_BENCHMARK=m CONFIG_TEST_FIRMWARE=m CONFIG_TEST_SYSCTL=m -CONFIG_BITFIELD_KUNIT=m +CONFIG_BITFIELD_KUNIT_TEST=m CONFIG_RESOURCE_KUNIT_TEST=m -CONFIG_LINEAR_RANGES_TEST=m +CONFIG_LINEAR_RANGES_KUNIT_TEST=m CONFIG_CMDLINE_KUNIT_TEST=m -CONFIG_BITS_TEST=m +CONFIG_BITS_KUNIT_TEST=m CONFIG_TEST_UDELAY=m CONFIG_TEST_STATIC_KEYS=m CONFIG_TEST_KMOD=m diff --git a/arch/m68k/configs/bvme6000_defconfig b/arch/m68k/configs/bvme6000_defconfig index 819cc70b06d8..8fbd238d9d29 100644 --- a/arch/m68k/configs/bvme6000_defconfig +++ b/arch/m68k/configs/bvme6000_defconfig @@ -604,11 +604,11 @@ CONFIG_TEST_BLACKHOLE_DEV=m CONFIG_FIND_BIT_BENCHMARK=m CONFIG_TEST_FIRMWARE=m CONFIG_TEST_SYSCTL=m -CONFIG_BITFIELD_KUNIT=m +CONFIG_BITFIELD_KUNIT_TEST=m CONFIG_RESOURCE_KUNIT_TEST=m -CONFIG_LINEAR_RANGES_TEST=m +CONFIG_LINEAR_RANGES_KUNIT_TEST=m CONFIG_CMDLINE_KUNIT_TEST=m -CONFIG_BITS_TEST=m +CONFIG_BITS_KUNIT_TEST=m CONFIG_TEST_UDELAY=m CONFIG_TEST_STATIC_KEYS=m CONFIG_TEST_KMOD=m diff --git a/arch/m68k/configs/hp300_defconfig b/arch/m68k/configs/hp300_defconfig index 8f8d5968713b..dbebbc079611 100644 --- a/arch/m68k/configs/hp300_defconfig +++ b/arch/m68k/configs/hp300_defconfig @@ -613,11 +613,11 @@ CONFIG_TEST_BLACKHOLE_DEV=m CONFIG_FIND_BIT_BENCHMARK=m CONFIG_TEST_FIRMWARE=m CONFIG_TEST_SYSCTL=m -CONFIG_BITFIELD_KUNIT=m +CONFIG_BITFIELD_KUNIT_TEST=m CONFIG_RESOURCE_KUNIT_TEST=m -CONFIG_LINEAR_RANGES_TEST=m +CONFIG_LINEAR_RANGES_KUNIT_TEST=m CONFIG_CMDLINE_KUNIT_TEST=m -CONFIG_BITS_TEST=m +CONFIG_BITS_KUNIT_TEST=m CONFIG_TEST_UDELAY=m CONFIG_TEST_STATIC_KEYS=m CONFIG_TEST_KMOD=m diff --git a/arch/m68k/configs/mac_defconfig b/arch/m68k/configs/mac_defconfig index bf15e6c1c939..3ccafd1db067 100644 --- a/arch/m68k/configs/mac_defconfig +++ b/arch/m68k/configs/mac_defconfig @@ -636,11 +636,11 @@ CONFIG_TEST_BLACKHOLE_DEV=m CONFIG_FIND_BIT_BENCHMARK=m CONFIG_TEST_FIRMWARE=m CONFIG_TEST_SYSCTL=m -CONFIG_BITFIELD_KUNIT=m +CONFIG_BITFIELD_KUNIT_TEST=m CONFIG_RESOURCE_KUNIT_TEST=m -CONFIG_LINEAR_RANGES_TEST=m +CONFIG_LINEAR_RANGES_KUNIT_TEST=m CONFIG_CMDLINE_KUNIT_TEST=m -CONFIG_BITS_TEST=m +CONFIG_BITS_KUNIT_TEST=m CONFIG_TEST_UDELAY=m CONFIG_TEST_STATIC_KEYS=m CONFIG_TEST_KMOD=m diff --git a/arch/m68k/configs/multi_defconfig b/arch/m68k/configs/multi_defconfig index 5466d48fcd9d..572c95f1c8d7 100644 --- a/arch/m68k/configs/multi_defconfig +++ b/arch/m68k/configs/multi_defconfig @@ -722,11 +722,11 @@ CONFIG_TEST_BLACKHOLE_DEV=m CONFIG_FIND_BIT_BENCHMARK=m CONFIG_TEST_FIRMWARE=m CONFIG_TEST_SYSCTL=m -CONFIG_BITFIELD_KUNIT=m +CONFIG_BITFIELD_KUNIT_TEST=m CONFIG_RESOURCE_KUNIT_TEST=m -CONFIG_LINEAR_RANGES_TEST=m +CONFIG_LINEAR_RANGES_KUNIT_TEST=m CONFIG_CMDLINE_KUNIT_TEST=m -CONFIG_BITS_TEST=m
[PATCH 1/2] kunit: Fix formatting of KUNIT tests to meet the standard
There are few instances of KUNIT tests that are not properly defined. This commit focuses on correcting these issues to match the standard defined in the Documentation. Issues Fixed: - Tests should default to KUNIT_ALL_TESTS - Tests configs tristate should have `if !KUNIT_ALL_TESTS` - Tests should end in KUNIT_TEST, some fixes have been applied to correct issues were KUNIT_TESTS is used or KUNIT is not mentioned. No functional changes other than CONFIG name changes Signed-off-by: Nico Pache --- drivers/base/test/Kconfig | 2 +- drivers/base/test/Makefile | 2 +- fs/ext4/.kunitconfig | 2 +- fs/ext4/Kconfig| 2 +- fs/ext4/Makefile | 2 +- lib/Kconfig.debug | 21 + lib/Makefile | 6 +++--- net/mptcp/Kconfig | 2 +- net/mptcp/Makefile | 2 +- net/mptcp/crypto.c | 2 +- net/mptcp/token.c | 2 +- sound/soc/Kconfig | 2 +- sound/soc/Makefile | 4 ++-- 13 files changed, 28 insertions(+), 23 deletions(-) diff --git a/drivers/base/test/Kconfig b/drivers/base/test/Kconfig index ba225eb1b761..2f3fa31a948e 100644 --- a/drivers/base/test/Kconfig +++ b/drivers/base/test/Kconfig @@ -8,7 +8,7 @@ config TEST_ASYNC_DRIVER_PROBE The module name will be test_async_driver_probe.ko If unsure say N. -config KUNIT_DRIVER_PE_TEST +config DRIVER_PE_KUNIT_TEST bool "KUnit Tests for property entry API" if !KUNIT_ALL_TESTS depends on KUNIT=y default KUNIT_ALL_TESTS diff --git a/drivers/base/test/Makefile b/drivers/base/test/Makefile index 2f15fae8625f..64b2f3d744d5 100644 --- a/drivers/base/test/Makefile +++ b/drivers/base/test/Makefile @@ -1,5 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_TEST_ASYNC_DRIVER_PROBE) += test_async_driver_probe.o -obj-$(CONFIG_KUNIT_DRIVER_PE_TEST) += property-entry-test.o +obj-$(CONFIG_DRIVER_PE_KUNIT_TEST) += property-entry-test.o CFLAGS_REMOVE_property-entry-test.o += -fplugin-arg-structleak_plugin-byref -fplugin-arg-structleak_plugin-byref-all diff --git a/fs/ext4/.kunitconfig b/fs/ext4/.kunitconfig index bf51da7cd9fc..81d4da667740 100644 --- a/fs/ext4/.kunitconfig +++ b/fs/ext4/.kunitconfig @@ -1,3 +1,3 @@ CONFIG_KUNIT=y CONFIG_EXT4_FS=y -CONFIG_EXT4_KUNIT_TESTS=y +CONFIG_EXT4_KUNIT_TEST=y diff --git a/fs/ext4/Kconfig b/fs/ext4/Kconfig index 86699c8cab28..1569d3872136 100644 --- a/fs/ext4/Kconfig +++ b/fs/ext4/Kconfig @@ -101,7 +101,7 @@ config EXT4_DEBUG If you select Y here, then you will be able to turn on debugging using dynamic debug control for mb_debug() / ext_debug() msgs. -config EXT4_KUNIT_TESTS +config EXT4_KUNIT_TEST tristate "KUnit tests for ext4" if !KUNIT_ALL_TESTS depends on EXT4_FS && KUNIT default KUNIT_ALL_TESTS diff --git a/fs/ext4/Makefile b/fs/ext4/Makefile index 49e7af6cc93f..4e28e380d51b 100644 --- a/fs/ext4/Makefile +++ b/fs/ext4/Makefile @@ -15,5 +15,5 @@ ext4-y:= balloc.o bitmap.o block_validity.o dir.o ext4_jbd2.o extents.o \ ext4-$(CONFIG_EXT4_FS_POSIX_ACL) += acl.o ext4-$(CONFIG_EXT4_FS_SECURITY)+= xattr_security.o ext4-inode-test-objs += inode-test.o -obj-$(CONFIG_EXT4_KUNIT_TESTS) += ext4-inode-test.o +obj-$(CONFIG_EXT4_KUNIT_TEST) += ext4-inode-test.o ext4-$(CONFIG_FS_VERITY) += verity.o diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 417c3d3e521b..7ade91511da6 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -2279,9 +2279,10 @@ config TEST_SYSCTL If unsure, say N. -config BITFIELD_KUNIT - tristate "KUnit test bitfield functions at runtime" +config BITFIELD_KUNIT_TEST_TEST + tristate "KUnit test bitfield functions at runtime" if !KUNIT_ALL_TESTS depends on KUNIT + default KUNIT_ALL_TESTS help Enable this option to test the bitfield functions at boot. @@ -2296,8 +2297,9 @@ config BITFIELD_KUNIT If unsure, say N. config RESOURCE_KUNIT_TEST - tristate "KUnit test for resource API" + tristate "KUnit test for resource API" if !KUNIT_ALL_TESTS depends on KUNIT + default KUNIT_ALL_TESTS help This builds the resource API unit test. Tests the logic of API provided by resource.c and ioport.h. @@ -2337,9 +2339,10 @@ config LIST_KUNIT_TEST If unsure, say N. -config LINEAR_RANGES_TEST - tristate "KUnit test for linear_ranges" +config LINEAR_RANGES_KUNIT_TEST + tristate "KUnit test for linear_ranges" if !KUNIT_ALL_TESTS depends on KUNIT + default KUNIT_ALL_TESTS select LINEAR_RANGES help This builds the linear_ranges unit test, which runs on boot. @@ -2350,8 +2353,9 @@ config LINEAR_RANGES_TEST If unsure, say N. config CMDLINE_KUNIT_TEST - tristate "KUnit test for cmdline API" + tristate "KUnit test for
Re: [PATCH v3] Documentation: kunit: add tips for running KUnit
On Wed, Apr 14, 2021 at 8:45 AM Daniel Latypov wrote: > > This is long overdue. > > There are several things that aren't nailed down (in-tree > .kunitconfig's), or partially broken (GCOV on UML), but having them > documented, warts and all, is better than having nothing. > > This covers a bunch of the more recent features > * kunit_filter_glob > * kunit.py run --kunitconfig > * slightly more detail on building tests as modules > * CONFIG_KUNIT_DEBUGFS > > By my count, the only headline features now not mentioned are the KASAN > integration and KernelCI json output support (kunit.py run --json). > > And then it also discusses how to get code coverage reports under UML > and non-UML since this is a question people have repeatedly asked. > > Non-UML coverage collection is no different from normal, but we should > probably explicitly call this out. > > As for UML, I was able to get it working again with two small hacks.* > E.g. with CONFIG_KUNIT=y && CONFIG_KUNIT_ALL_TESTS=y > Overall coverage rate: > lines..: 15.1% (18294 of 120776 lines) > functions..: 16.8% (1860 of 11050 functions) > > Note: this doesn't document --alltests since this is not stable yet. > Hopefully being run more frequently as part of KernelCI will help... > > *Using gcc/gcov-6 and not using uml_abort() in os_dump_core(). > I've documented these hacks in "Notes" but left TODOs for > brendanhigg...@google.com who tracked down the runtime issue in GCC. > To be clear: these are not issues specific to KUnit, but rather to UML. > > Signed-off-by: Daniel Latypov > --- I'm very happy with this now: all my issues with the previous versions are addressed. I'm particularly excited to have code coverage documented somewhere. Assuming Brendan's happy with the TODOs being there, I think this is ready to go. I also built this with Sphinx and gave it a quick look, and it all looks good there as well. Therefore, this is: Reviewed-by: David Gow Cheers, -- David
Re: [PATCH v4 1/2] mfd: google,cros-ec: add DT bindings for a baseboard's switch device
On Tue, Apr 13, 2021 at 10:57 PM Rob Herring wrote: > > On Mon, Apr 12, 2021 at 07:30:19PM +0800, Ikjoon Jang wrote: > > This is for ChromeOS tablets which have a 'cros_cbas' switch device > > in the "Whiskers" base board. This device can be instantiated only by > > device tree on ARM platforms. ChromeOS EC doesn't provide a way to > > probe the device. > > > > Signed-off-by: Ikjoon Jang > > > > --- > > > > Changes in v4: > > Define cros-cbase bindings inside google,cros-ec.yaml instead of > > a seperated binding document. > > > > .../devicetree/bindings/mfd/google,cros-ec.yaml | 16 > > 1 file changed, 16 insertions(+) > > > > diff --git a/Documentation/devicetree/bindings/mfd/google,cros-ec.yaml > > b/Documentation/devicetree/bindings/mfd/google,cros-ec.yaml > > index 76bf16ee27ec..c76809cd9f7f 100644 > > --- a/Documentation/devicetree/bindings/mfd/google,cros-ec.yaml > > +++ b/Documentation/devicetree/bindings/mfd/google,cros-ec.yaml > > @@ -127,6 +127,18 @@ patternProperties: > > type: object > > $ref: "/schemas/extcon/extcon-usbc-cros-ec.yaml#" > > > > + "^cbas$": > > Not a pattern, put under 'properties'. > > > +type: object > > +properties: > > + compatible: > > +const: google,cros-cbas > > Blank line > > > +required: > > + - compatible > > Blank line > > > +additionalProperties: false > > +description: > > Make this 1st or at least before 'properties'. ACKed, thanks! > > > + This device is used to signal when a detachable base is attached > > + to a Chrome OS tablet. > > This can't happen at runtime? There is no way to detect the switch device's existence at runtime. I'll add a note to the description about this. > > > + > > required: > >- compatible > > > > @@ -180,6 +192,10 @@ examples: > > interrupts = <99 0>; > > interrupt-parent = <>; > > spi-max-frequency = <500>; > > + > > +base_detection: cbas { > > +compatible = "google,cros-cbas"; > > +}; > > }; > > }; > > > > -- > > 2.31.1.295.g9ea45b61b8-goog > >
Re: [PATCH 1/2] iommu/sva: Tighten SVA bind API with explicit flags
Hi Jacob, I love your patch! Yet something to improve: [auto build test ERROR on vkoul-dmaengine/next] [also build test ERROR on char-misc/char-misc-testing v5.12-rc7] [cannot apply to iommu/next next-20210413] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Jacob-Pan/iommu-sva-Tighten-SVA-bind-API-with-explicit-flags/20210409-094521 base: https://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine.git next config: x86_64-randconfig-a016-20210413 (attached as .config) compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 9829f5e6b1bca9b61efc629770d28bb9014dec45) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # install x86_64 cross compiling tool for clang build # apt-get install binutils-x86-64-linux-gnu # https://github.com/0day-ci/linux/commit/3b009446e2f451c401cff7a6d4766424acbcc890 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Jacob-Pan/iommu-sva-Tighten-SVA-bind-API-with-explicit-flags/20210409-094521 git checkout 3b009446e2f451c401cff7a6d4766424acbcc890 # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All errors (new ones prefixed by >>): >> drivers/dma/idxd/init.c:307:10: error: use of undeclared identifier >> 'IOMMU_SVA_BIND_SUPERVISOR' flags = IOMMU_SVA_BIND_SUPERVISOR; ^ drivers/dma/idxd/init.c:422:30: warning: shift count >= width of type [-Wshift-count-overflow] rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(64)); ^~~~ include/linux/dma-mapping.h:76:54: note: expanded from macro 'DMA_BIT_MASK' #define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL<<(n))-1)) ^ ~~~ drivers/dma/idxd/init.c:428:41: warning: shift count >= width of type [-Wshift-count-overflow] rc = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64)); ^~~~ include/linux/dma-mapping.h:76:54: note: expanded from macro 'DMA_BIT_MASK' #define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL<<(n))-1)) ^ ~~~ 2 warnings and 1 error generated. vim +/IOMMU_SVA_BIND_SUPERVISOR +307 drivers/dma/idxd/init.c 300 301 static int idxd_enable_system_pasid(struct idxd_device *idxd) 302 { 303 unsigned int flags; 304 unsigned int pasid; 305 struct iommu_sva *sva; 306 > 307 flags = IOMMU_SVA_BIND_SUPERVISOR; 308 309 sva = iommu_sva_bind_device(>pdev->dev, NULL, flags); 310 if (IS_ERR(sva)) { 311 dev_warn(>pdev->dev, 312 "iommu sva bind failed: %ld\n", PTR_ERR(sva)); 313 return PTR_ERR(sva); 314 } 315 316 pasid = iommu_sva_get_pasid(sva); 317 if (pasid == IOMMU_PASID_INVALID) { 318 iommu_sva_unbind_device(sva); 319 return -ENODEV; 320 } 321 322 idxd->sva = sva; 323 idxd->pasid = pasid; 324 dev_dbg(>pdev->dev, "system pasid: %u\n", pasid); 325 return 0; 326 } 327 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org .config.gz Description: application/gzip
Re: [PATCH] staging: mt7621-pci: stop using of_pci_range_to_resource
Hi Sergio, Just as an aside, are you planning to move staging/mt7621-pci into arch/mips/pci at some point? This driver seems more maintained (by you!) than many in-tree drivers... Ilya On Sat, Apr 10, 2021 at 12:23 PM Sergio Paracuellos wrote: > > Hi Ilya, > > On Sat, Apr 10, 2021 at 7:33 PM Ilya Lipnitskiy > wrote: > > > > The logic here was already overriding the erroneous IO addresses > > returned from of_pci_range_to_resource, which is the bulk of the logic. > > > > So stop using it altogether and initialize the fields explicitly, as > > done in aeba3731b150 ("powerpc/pci: Fix IO space breakage after > > of_pci_range_to_resource() change"). > > > > Signed-off-by: Ilya Lipnitskiy > > Cc: Sergio Paracuellos > > --- > > drivers/staging/mt7621-pci/pci-mt7621.c | 11 ++- > > 1 file changed, 6 insertions(+), 5 deletions(-) > > Looks good to me, thanks! I have also tested this in gnubee pc1 > platform with no regressions at all when io bars are assigned: > > [ 16.378956] mt7621-pci 1e14.pcie: host bridge /pcie@1e14 ranges: > [ 16.392405] mt7621-pci 1e14.pcie: MEM > 0x006000..0x006fff -> 0x00 > [ 16.408796] mt7621-pci 1e14.pcie: IO > 0x001e16..0x001e16 -> 0x00 > [ 16.425264] mt7621-pci-phy 1e149000.pcie-phy: PHY for 0xbe149000 > (dual port = 1) > [ 16.440452] mt7621-pci-phy 1e14a000.pcie-phy: PHY for 0xbe14a000 > (dual port = 0) > [ 16.678713] mt7621-pci 1e14.pcie: PCIE0 enabled > [ 16.688435] mt7621-pci 1e14.pcie: PCIE1 enabled > [ 16.698160] mt7621-pci 1e14.pcie: PCIE2 enabled > [ 16.707886] mt7621-pci 1e14.pcie: PCI coherence region base: > 0x6000, mask/settings: 0xf002 > [ 16.726623] mt7621-pci 1e14.pcie: PCI host bridge to bus :00 > [ 16.739309] pci_bus :00: root bus resource [io 0x1e16-0x1e16] > [ 16.753008] pci_bus :00: root bus resource [mem 0x6000-0x6fff] > [ 16.766709] pci_bus :00: root bus resource [bus 00-ff] > [ 16.777649] pci_bus :00: root bus resource [mem > 0x6000-0x6fff] (bus address [0x-0x0fff]) > [ 16.797986] pci :00:00.0: [0e8d:0801] type 01 class 0x060400 > [ 16.809973] pci :00:00.0: reg 0x10: [mem 0x-0x7fff] > [ 16.822467] pci :00:00.0: reg 0x14: initial BAR value 0x > invalid > [ 16.836511] pci :00:00.0: reg 0x14: [mem size 0x0001] > [ 16.848048] pci :00:00.0: supports D1 > [ 16.856051] pci :00:00.0: PME# supported from D0 D1 D3hot > [ 16.867943] pci :00:01.0: [0e8d:0801] type 01 class 0x060400 > [ 16.879960] pci :00:01.0: reg 0x10: [mem 0x-0x7fff] > [ 16.892454] pci :00:01.0: reg 0x14: initial BAR value 0x > invalid > [ 16.906497] pci :00:01.0: reg 0x14: [mem size 0x0001] > [ 16.918040] pci :00:01.0: supports D1 > [ 16.926031] pci :00:01.0: PME# supported from D0 D1 D3hot > [ 16.937903] pci :00:02.0: [0e8d:0801] type 01 class 0x060400 > [ 16.949915] pci :00:02.0: reg 0x10: [mem 0x-0x7fff] > [ 16.962412] pci :00:02.0: reg 0x14: initial BAR value 0x > invalid > [ 16.976466] pci :00:02.0: reg 0x14: [mem size 0x0001] > [ 16.987991] pci :00:02.0: supports D1 > [ 16.995986] pci :00:02.0: PME# supported from D0 D1 D3hot > [ 17.008716] pci :00:00.0: bridge configuration invalid ([bus > 00-00]), reconfiguring > [ 17.024695] pci :00:01.0: bridge configuration invalid ([bus > 00-00]), reconfiguring > [ 17.040654] pci :00:02.0: bridge configuration invalid ([bus > 00-00]), reconfiguring > [ 17.056868] pci :01:00.0: [1b21:0611] type 00 class 0x010185 > [ 17.068882] pci :01:00.0: reg 0x10: [io 0x-0x0007] > [ 17.080003] pci :01:00.0: reg 0x14: [io 0x-0x0003] > [ 17.091115] pci :01:00.0: reg 0x18: [io 0x-0x0007] > [ 17.102238] pci :01:00.0: reg 0x1c: [io 0x-0x0003] > [ 17.113350] pci :01:00.0: reg 0x20: [io 0x-0x000f] > [ 17.124463] pci :01:00.0: reg 0x24: initial BAR value 0x > invalid > [ 17.138508] pci :01:00.0: reg 0x24: [mem size 0x0200] > [ 17.150115] pci :01:00.0: 2.000 Gb/s available PCIe bandwidth, > limited by 2.5 GT/s PCIe x1 link at :00:00.0 (capable of 4.000 > Gb/s with 5.0 GT/s PCIe x1 link) > [ 17.204594] pci :00:00.0: PCI bridge to [bus 01-ff] > [ 17.215109] pci :00:00.0: bridge window [io 0x-0x0fff] > [ 17.227257] pci :00:00.0: bridge window [mem 0x6000-0x600f] > [ 17.240785] pci :00:00.0: bridge window [mem > 0x6000-0x600f pref] > [ 17.255183] pci_bus :01: busn_res: [bus 01-ff] end is updated to 01 > [ 17.268648] pci :02:00.0: [1b21:0611] type 00 class 0x010185 > [ 17.280671] pci :02:00.0: reg 0x10: [io 0x-0x0007] > [ 17.291785] pci :02:00.0: reg 0x14: [io 0x-0x0003] > [ 17.302898] pci :02:00.0: reg 0x18: [io
Re: [PATCH v2 2/3] soundwire: Intel: introduce DMI quirks for HP Spectre x360 Convertible
On 12-04-21, 14:37, Dave Hansen wrote: > On 3/1/21 11:51 PM, Bard Liao wrote: > > +++ b/drivers/soundwire/dmi-quirks.c > > @@ -0,0 +1,66 @@ > > +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) > > +// Copyright(c) 2021 Intel Corporation. > > It looks like this is already in intel-next, so this may be moot. But, > is there a specific reason this is dual licensed? If so, can you please > include information about the license choice in the cover letter of any > future version? The soundwire module from Intel and core soundwire core was always dual licensed, so it kind of followed that.. > If there is no specific reason for this contribution to be dual > licensed, please make it GPL-2.0 only. This module, I would say NO. Unless someone from Intel disagree.. Pierre/Bard..? If all agree I dont see a reason why this cant be updated to GPL only. Thanks -- ~Vinod
Re: [PATCH 1/5] mm/swapfile: add percpu_ref support for swap
On Wed, Apr 14, 2021 at 11:59:03AM +0800, Huang, Ying wrote: > Dennis Zhou writes: > > > Hello, > > > > On Wed, Apr 14, 2021 at 10:06:48AM +0800, Huang, Ying wrote: > >> Miaohe Lin writes: > >> > >> > On 2021/4/14 9:17, Huang, Ying wrote: > >> >> Miaohe Lin writes: > >> >> > >> >>> On 2021/4/12 15:24, Huang, Ying wrote: > >> "Huang, Ying" writes: > >> > >> > Miaohe Lin writes: > >> > > >> >> We will use percpu-refcount to serialize against concurrent > >> >> swapoff. This > >> >> patch adds the percpu_ref support for later fixup. > >> >> > >> >> Signed-off-by: Miaohe Lin > >> >> --- > >> >> include/linux/swap.h | 2 ++ > >> >> mm/swapfile.c| 25 ++--- > >> >> 2 files changed, 24 insertions(+), 3 deletions(-) > >> >> > >> >> diff --git a/include/linux/swap.h b/include/linux/swap.h > >> >> index 144727041e78..849ba5265c11 100644 > >> >> --- a/include/linux/swap.h > >> >> +++ b/include/linux/swap.h > >> >> @@ -240,6 +240,7 @@ struct swap_cluster_list { > >> >> * The in-memory structure used to track swap areas. > >> >> */ > >> >> struct swap_info_struct { > >> >> + struct percpu_ref users;/* serialization against > >> >> concurrent swapoff */ > >> >> unsigned long flags; /* SWP_USED etc: see above */ > >> >> signed shortprio; /* swap priority of this type */ > >> >> struct plist_node list; /* entry in swap_active_head */ > >> >> @@ -260,6 +261,7 @@ struct swap_info_struct { > >> >> struct block_device *bdev; /* swap device or bdev of swap > >> >> file */ > >> >> struct file *swap_file; /* seldom referenced */ > >> >> unsigned int old_block_size;/* seldom referenced */ > >> >> + struct completion comp; /* seldom referenced */ > >> >> #ifdef CONFIG_FRONTSWAP > >> >> unsigned long *frontswap_map; /* frontswap in-use, one bit > >> >> per page */ > >> >> atomic_t frontswap_pages; /* frontswap pages in-use > >> >> counter */ > >> >> diff --git a/mm/swapfile.c b/mm/swapfile.c > >> >> index 149e77454e3c..724173cd7d0c 100644 > >> >> --- a/mm/swapfile.c > >> >> +++ b/mm/swapfile.c > >> >> @@ -39,6 +39,7 @@ > >> >> #include > >> >> #include > >> >> #include > >> >> +#include > >> >> > >> >> #include > >> >> #include > >> >> @@ -511,6 +512,15 @@ static void swap_discard_work(struct > >> >> work_struct *work) > >> >> spin_unlock(>lock); > >> >> } > >> >> > >> >> +static void swap_users_ref_free(struct percpu_ref *ref) > >> >> +{ > >> >> + struct swap_info_struct *si; > >> >> + > >> >> + si = container_of(ref, struct swap_info_struct, users); > >> >> + complete(>comp); > >> >> + percpu_ref_exit(>users); > >> > > >> > Because percpu_ref_exit() is used, we cannot use percpu_ref_tryget() > >> > in > >> > get_swap_device(), better to add comments there. > >> > >> I just noticed that the comments of percpu_ref_tryget_live() says, > >> > >> * This function is safe to call as long as @ref is between init and > >> exit. > >> > >> While we need to call get_swap_device() almost at any time, so it's > >> better to avoid to call percpu_ref_exit() at all. This will waste > >> some > >> memory, but we need to follow the API definition to avoid potential > >> issues in the long term. > >> >>> > >> >>> I have to admit that I'am not really familiar with percpu_ref. So I > >> >>> read the > >> >>> implementation code of the percpu_ref and found > >> >>> percpu_ref_tryget_live() could > >> >>> be called after exit now. But you're right we need to follow the API > >> >>> definition > >> >>> to avoid potential issues in the long term. > >> >>> > >> > >> And we need to call percpu_ref_init() before insert the > >> swap_info_struct > >> into the swap_info[]. > >> >>> > >> >>> If we remove the call to percpu_ref_exit(), we should not use > >> >>> percpu_ref_init() > >> >>> here because *percpu_ref->data is assumed to be NULL* in > >> >>> percpu_ref_init() while > >> >>> this is not the case as we do not call percpu_ref_exit(). Maybe > >> >>> percpu_ref_reinit() > >> >>> or percpu_ref_resurrect() will do the work. > >> >>> > >> >>> One more thing, how could I distinguish the killed percpu_ref from > >> >>> newly allocated one? > >> >>> It seems percpu_ref_is_dying is only safe to call when @ref is between > >> >>> init and exit. > >> >>> Maybe I could do this in alloc_swap_info()? > >> >> > >> >> Yes. In alloc_swap_info(), you can distinguish newly allocated and > >> >> reused swap_info_struct. > >> >> > >> > >> >> +} > >> >> + > >> >> static void alloc_cluster(struct swap_info_struct *si,
Re: [PATCH v2 1/1] powerpc/iommu: Enable remaining IOMMU Pagesizes present in LoPAR
On Mon, 2021-04-12 at 17:21 -0500, Segher Boessenkool wrote: > On Fri, Apr 09, 2021 at 02:36:16PM +1000, Alexey Kardashevskiy wrote: > > On 08/04/2021 19:04, Michael Ellerman wrote: > > > > > > +#define QUERY_DDW_PGSIZE_4K0x01 > > > > > > +#define QUERY_DDW_PGSIZE_64K 0x02 > > > > > > +#define QUERY_DDW_PGSIZE_16M 0x04 > > > > > > +#define QUERY_DDW_PGSIZE_32M 0x08 > > > > > > +#define QUERY_DDW_PGSIZE_64M 0x10 > > > > > > +#define QUERY_DDW_PGSIZE_128M 0x20 > > > > > > +#define QUERY_DDW_PGSIZE_256M 0x40 > > > > > > +#define QUERY_DDW_PGSIZE_16G 0x80 > > > > > > > > > > I'm not sure the #defines really gain us much vs just putting the > > > > > literal values in the array below? > > > > > > > > Then someone says "u magic values" :) I do not mind either way. > > > > Thanks, > > > > > > Yeah that's true. But #defining them doesn't make them less magic, if > > > you only use them in one place :) > > > > Defining them with "QUERY_DDW" in the names kinda tells where they are > > from. Can also grep QEMU using these to see how the other side handles > > it. Dunno. > > And *not* defining anything reduces the mental load a lot. You can add > a comment at the single spot you use them, explaining what this is, in a > much better way! > > Comments are *good*. > > > Segher Thanks for the feedback Alexey, Michael and Segher! I have sent a v3 for this patch. http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210408201915.174217-1-leobra...@gmail.com/ Please let me know of your feedback in it. Best regards, Leonardo Bras
Re: Question on KASAN calltrace record in RT
On Tue, 2021-04-13 at 17:29 +0200, Dmitry Vyukov wrote: > On Tue, Apr 6, 2021 at 10:26 AM Zhang, Qiang > wrote: > > > > Hello everyone > > > > In RT system, after Andrew test, found the following calltrace , > > in KASAN, we record callstack through stack_depot_save(), in this function, > > may be call alloc_pages, but in RT, the spin_lock replace with > > rt_mutex in alloc_pages(), if before call this function, the irq is > > disabled, > > will trigger following calltrace. > > > > maybe add array[KASAN_STACK_DEPTH] in struct kasan_track to record > > callstack in RT system. > > > > Is there a better solution ? > > Hi Qiang, > > Adding 2 full stacks per heap object can increase memory usage too much. > The stackdepot has a preallocation mechanism, I would start with > adding interrupts check here: > https://elixir.bootlin.com/linux/v5.12-rc7/source/lib/stackdepot.c#L294 > and just not do preallocation in interrupt context. This will solve > the problem, right? Hm, this thing might actually be (sorta?) working, modulo one startup gripe. The CRASH_DUMP inspired gripe I get with !RT appeared (and shut up when told I don't care given kdump has worked just fine for ages:), but no more might_sleep() gripeage. CONFIG_KASAN_SHADOW_OFFSET=0xdc00 CONFIG_HAVE_ARCH_KASAN=y CONFIG_HAVE_ARCH_KASAN_VMALLOC=y CONFIG_CC_HAS_KASAN_GENERIC=y CONFIG_KASAN=y CONFIG_KASAN_GENERIC=y CONFIG_KASAN_OUTLINE=y # CONFIG_KASAN_INLINE is not set CONFIG_KASAN_STACK=1 CONFIG_KASAN_VMALLOC=y # CONFIG_KASAN_MODULE_TEST is not set --- lib/stackdepot.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) --- a/lib/stackdepot.c +++ b/lib/stackdepot.c @@ -71,7 +71,7 @@ static void *stack_slabs[STACK_ALLOC_MAX static int depot_index; static int next_slab_inited; static size_t depot_offset; -static DEFINE_SPINLOCK(depot_lock); +static DEFINE_RAW_SPINLOCK(depot_lock); static bool init_stack_slab(void **prealloc) { @@ -265,7 +265,7 @@ depot_stack_handle_t stack_depot_save(un struct page *page = NULL; void *prealloc = NULL; unsigned long flags; - u32 hash; + u32 hash, may_prealloc = !IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible(); if (unlikely(nr_entries == 0) || stack_depot_disable) goto fast_exit; @@ -291,7 +291,7 @@ depot_stack_handle_t stack_depot_save(un * The smp_load_acquire() here pairs with smp_store_release() to * |next_slab_inited| in depot_alloc_stack() and init_stack_slab(). */ - if (unlikely(!smp_load_acquire(_slab_inited))) { + if (unlikely(!smp_load_acquire(_slab_inited) && may_prealloc)) { /* * Zero out zone modifiers, as we don't have specific zone * requirements. Keep the flags related to allocation in atomic @@ -305,7 +305,7 @@ depot_stack_handle_t stack_depot_save(un prealloc = page_address(page); } - spin_lock_irqsave(_lock, flags); + raw_spin_lock_irqsave(_lock, flags); found = find_stack(*bucket, entries, nr_entries, hash); if (!found) { @@ -329,7 +329,7 @@ depot_stack_handle_t stack_depot_save(un WARN_ON(!init_stack_slab()); } - spin_unlock_irqrestore(_lock, flags); + raw_spin_unlock_irqrestore(_lock, flags); exit: if (prealloc) { /* Nobody used this memory, ok to free it. */ [0.692437] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:943 [0.692439] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0 [0.692442] Preemption disabled at: [0.692443] [] on_each_cpu_cond_mask+0x30/0xb0 [0.692451] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.12.0.g2afefec-tip-rt #5 [0.692454] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013 [0.692456] Call Trace: [0.692458] ? on_each_cpu_cond_mask+0x30/0xb0 [0.692462] dump_stack+0x8a/0xb5 [0.692467] ___might_sleep.cold+0xfe/0x112 [0.692471] rt_spin_lock+0x1c/0x60 [0.692475] free_unref_page+0x117/0x3c0 [0.692481] qlist_free_all+0x60/0xd0 [0.692485] per_cpu_remove_cache+0x5b/0x70 [0.692488] smp_call_function_many_cond+0x185/0x3d0 [0.692492] ? qlist_move_cache+0xe0/0xe0 [0.692495] ? qlist_move_cache+0xe0/0xe0 [0.692497] on_each_cpu_cond_mask+0x44/0xb0 [0.692501] kasan_quarantine_remove_cache+0x52/0xf0 [0.692505] ? acpi_bus_init+0x183/0x183 [0.692510] kmem_cache_shrink+0xe/0x20 [0.692513] acpi_os_purge_cache+0xa/0x10 [0.692517] acpi_purge_cached_objects+0x1d/0x68 [0.692522] acpi_initialize_objects+0x11/0x39 [0.692524] ? acpi_ev_install_xrupt_handlers+0x6f/0x7c [0.692529] acpi_bus_init+0x50/0x183 [0.692532] acpi_init+0xce/0x182 [0.692536] ? acpi_bus_init+0x183/0x183 [0.692539] ? intel_idle_init+0x36d/0x36d [0.692543] ? acpi_bus_init+0x183/0x183 [0.692546]
Re: [PATCH 1/5] mm/swapfile: add percpu_ref support for swap
Dennis Zhou writes: > Hello, > > On Wed, Apr 14, 2021 at 10:06:48AM +0800, Huang, Ying wrote: >> Miaohe Lin writes: >> >> > On 2021/4/14 9:17, Huang, Ying wrote: >> >> Miaohe Lin writes: >> >> >> >>> On 2021/4/12 15:24, Huang, Ying wrote: >> "Huang, Ying" writes: >> >> > Miaohe Lin writes: >> > >> >> We will use percpu-refcount to serialize against concurrent swapoff. >> >> This >> >> patch adds the percpu_ref support for later fixup. >> >> >> >> Signed-off-by: Miaohe Lin >> >> --- >> >> include/linux/swap.h | 2 ++ >> >> mm/swapfile.c| 25 ++--- >> >> 2 files changed, 24 insertions(+), 3 deletions(-) >> >> >> >> diff --git a/include/linux/swap.h b/include/linux/swap.h >> >> index 144727041e78..849ba5265c11 100644 >> >> --- a/include/linux/swap.h >> >> +++ b/include/linux/swap.h >> >> @@ -240,6 +240,7 @@ struct swap_cluster_list { >> >> * The in-memory structure used to track swap areas. >> >> */ >> >> struct swap_info_struct { >> >> + struct percpu_ref users;/* serialization against >> >> concurrent swapoff */ >> >> unsigned long flags; /* SWP_USED etc: see above */ >> >> signed shortprio; /* swap priority of this type */ >> >> struct plist_node list; /* entry in swap_active_head */ >> >> @@ -260,6 +261,7 @@ struct swap_info_struct { >> >> struct block_device *bdev; /* swap device or bdev of swap >> >> file */ >> >> struct file *swap_file; /* seldom referenced */ >> >> unsigned int old_block_size;/* seldom referenced */ >> >> + struct completion comp; /* seldom referenced */ >> >> #ifdef CONFIG_FRONTSWAP >> >> unsigned long *frontswap_map; /* frontswap in-use, one bit >> >> per page */ >> >> atomic_t frontswap_pages; /* frontswap pages in-use >> >> counter */ >> >> diff --git a/mm/swapfile.c b/mm/swapfile.c >> >> index 149e77454e3c..724173cd7d0c 100644 >> >> --- a/mm/swapfile.c >> >> +++ b/mm/swapfile.c >> >> @@ -39,6 +39,7 @@ >> >> #include >> >> #include >> >> #include >> >> +#include >> >> >> >> #include >> >> #include >> >> @@ -511,6 +512,15 @@ static void swap_discard_work(struct work_struct >> >> *work) >> >> spin_unlock(>lock); >> >> } >> >> >> >> +static void swap_users_ref_free(struct percpu_ref *ref) >> >> +{ >> >> + struct swap_info_struct *si; >> >> + >> >> + si = container_of(ref, struct swap_info_struct, users); >> >> + complete(>comp); >> >> + percpu_ref_exit(>users); >> > >> > Because percpu_ref_exit() is used, we cannot use percpu_ref_tryget() in >> > get_swap_device(), better to add comments there. >> >> I just noticed that the comments of percpu_ref_tryget_live() says, >> >> * This function is safe to call as long as @ref is between init and >> exit. >> >> While we need to call get_swap_device() almost at any time, so it's >> better to avoid to call percpu_ref_exit() at all. This will waste some >> memory, but we need to follow the API definition to avoid potential >> issues in the long term. >> >>> >> >>> I have to admit that I'am not really familiar with percpu_ref. So I read >> >>> the >> >>> implementation code of the percpu_ref and found percpu_ref_tryget_live() >> >>> could >> >>> be called after exit now. But you're right we need to follow the API >> >>> definition >> >>> to avoid potential issues in the long term. >> >>> >> >> And we need to call percpu_ref_init() before insert the swap_info_struct >> into the swap_info[]. >> >>> >> >>> If we remove the call to percpu_ref_exit(), we should not use >> >>> percpu_ref_init() >> >>> here because *percpu_ref->data is assumed to be NULL* in >> >>> percpu_ref_init() while >> >>> this is not the case as we do not call percpu_ref_exit(). Maybe >> >>> percpu_ref_reinit() >> >>> or percpu_ref_resurrect() will do the work. >> >>> >> >>> One more thing, how could I distinguish the killed percpu_ref from newly >> >>> allocated one? >> >>> It seems percpu_ref_is_dying is only safe to call when @ref is between >> >>> init and exit. >> >>> Maybe I could do this in alloc_swap_info()? >> >> >> >> Yes. In alloc_swap_info(), you can distinguish newly allocated and >> >> reused swap_info_struct. >> >> >> >> >> +} >> >> + >> >> static void alloc_cluster(struct swap_info_struct *si, unsigned long >> >> idx) >> >> { >> >> struct swap_cluster_info *ci = si->cluster_info; >> >> @@ -2500,7 +2510,7 @@ static void enable_swap_info(struct >> >> swap_info_struct *p, int prio, >> >>* Guarantee swap_map, cluster_info, etc. fields are valid >> >>
Re: [PATCH RFC 6/6] dcache: prevent flooding with negative dentries
On Thu, Jan 21, 2021 at 06:49:45PM +0530, Gautham Ananthakrishna wrote: > + spin_lock(>d_lock); > + parent = lock_parent(victim); > + > + rcu_read_unlock(); Similar story. As soon as you hit that rcu_read_unlock(), the memory pointed to by victim might be reused. If you have hit __lock_parent(), victim->d_lock had been dropped and regained. Which means that freeing might've been already scheduled. Unlike #1/6, here you won't get memory corruption in lock_parent() itself, but... > + > + if (d_count(victim) || !d_is_negative(victim) || > + (victim->d_flags & DCACHE_REFERENCED)) { > + if (parent) > + spin_unlock(>d_lock); > + spin_unlock(>d_lock); ... starting from here you just might.
the qemu-nbd process automatically exit with the commit 43347d56c 'livepatch: send a fake signal to all blocking tasks'
I found the qemu-nbd process(started with qemu-nbd -t -c /dev/nbd0 nbd.qcow2) will automatically exit when I patched for functions of the nbd with livepatch. The nbd relative source: static int nbd_start_device_ioctl(struct nbd_device *nbd, struct block_device *bdev) { struct nbd_config *config = nbd->config; int ret; ret = nbd_start_device(nbd); if (ret) return ret; if (max_part) bdev->bd_invalidated = 1; mutex_unlock(>config_lock); ret = wait_event_interruptible(config->recv_wq, atomic_read(>recv_threads) == 0); if (ret) sock_shutdown(nbd); flush_workqueue(nbd->recv_workq); mutex_lock(>config_lock); nbd_bdev_reset(bdev); /* user requested, ignore socket errors */ if (test_bit(NBD_RT_DISCONNECT_REQUESTED, >runtime_flags)) ret = 0; if (test_bit(NBD_RT_TIMEDOUT, >runtime_flags)) ret = -ETIMEDOUT; return ret; } When the nbd waits for atomic_read(>recv_threads) == 0, the klp will send a fake signal to it then the qemu-nbd process exits. And the signal of sysfs to control this action was removed in the commit 10b3d52790e 'livepatch: Remove signal sysfs attribute'. Are there other ways to control this action? How? Thanks very much.
[PATCH] netfilter: nf_conntrack: Add conntrack helper for ESP/IPsec
Introduce changes to add ESP connection tracking helper to netfilter conntrack. The connection tracking of ESP is based on IPsec SPIs. The underlying motivation for this patch was to allow multiple VPN ESP clients to be distinguished when using NAT. Added config flag CONFIG_NF_CT_PROTO_ESP to enable the ESP/IPsec conntrack helper. Signed-off-by: Cole Dishington --- .../linux/netfilter/nf_conntrack_proto_esp.h | 25 + .../net/netfilter/ipv4/nf_conntrack_ipv4.h| 3 + include/net/netfilter/nf_conntrack.h | 2 + include/net/netfilter/nf_conntrack_l4proto.h | 15 + include/net/netfilter/nf_conntrack_tuple.h| 3 + include/net/netns/conntrack.h | 24 + .../netfilter/nf_conntrack_tuple_common.h | 3 + .../linux/netfilter/nfnetlink_conntrack.h | 2 + net/netfilter/Kconfig | 10 + net/netfilter/Makefile| 1 + net/netfilter/nf_conntrack_core.c | 23 + net/netfilter/nf_conntrack_netlink.c | 4 +- net/netfilter/nf_conntrack_proto.c| 6 + net/netfilter/nf_conntrack_proto_esp.c| 535 ++ net/netfilter/nf_conntrack_standalone.c | 5 + net/netfilter/nf_internals.h | 4 +- 16 files changed, 663 insertions(+), 2 deletions(-) create mode 100644 include/linux/netfilter/nf_conntrack_proto_esp.h create mode 100644 net/netfilter/nf_conntrack_proto_esp.c diff --git a/include/linux/netfilter/nf_conntrack_proto_esp.h b/include/linux/netfilter/nf_conntrack_proto_esp.h new file mode 100644 index ..2441e031c68e --- /dev/null +++ b/include/linux/netfilter/nf_conntrack_proto_esp.h @@ -0,0 +1,25 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _CONNTRACK_PROTO_ESP_H +#define _CONNTRACK_PROTO_ESP_H +#include + +/* ESP PROTOCOL HEADER */ + +struct esphdr { + __u32 spi; +}; + +struct nf_ct_esp { + unsigned int stream_timeout; + unsigned int timeout; +}; + +#ifdef __KERNEL__ +#include + +void destroy_esp_conntrack_entry(struct nf_conn *ct); + +bool esp_pkt_to_tuple(const struct sk_buff *skb, unsigned int dataoff, + struct net *net, struct nf_conntrack_tuple *tuple); +#endif /* __KERNEL__ */ +#endif /* _CONNTRACK_PROTO_ESP_H */ diff --git a/include/net/netfilter/ipv4/nf_conntrack_ipv4.h b/include/net/netfilter/ipv4/nf_conntrack_ipv4.h index 2c8c2b023848..1aee91592639 100644 --- a/include/net/netfilter/ipv4/nf_conntrack_ipv4.h +++ b/include/net/netfilter/ipv4/nf_conntrack_ipv4.h @@ -25,5 +25,8 @@ extern const struct nf_conntrack_l4proto nf_conntrack_l4proto_udplite; #ifdef CONFIG_NF_CT_PROTO_GRE extern const struct nf_conntrack_l4proto nf_conntrack_l4proto_gre; #endif +#ifdef CONFIG_NF_CT_PROTO_ESP +extern const struct nf_conntrack_l4proto nf_conntrack_l4proto_esp; +#endif #endif /*_NF_CONNTRACK_IPV4_H*/ diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h index 439379ca9ffa..2bd1d94de138 100644 --- a/include/net/netfilter/nf_conntrack.h +++ b/include/net/netfilter/nf_conntrack.h @@ -21,6 +21,7 @@ #include #include #include +#include #include @@ -36,6 +37,7 @@ union nf_conntrack_proto { struct ip_ct_tcp tcp; struct nf_ct_udp udp; struct nf_ct_gre gre; + struct nf_ct_esp esp; unsigned int tmpl_padto; }; diff --git a/include/net/netfilter/nf_conntrack_l4proto.h b/include/net/netfilter/nf_conntrack_l4proto.h index 96f9cf81f46b..ec89e83ff20e 100644 --- a/include/net/netfilter/nf_conntrack_l4proto.h +++ b/include/net/netfilter/nf_conntrack_l4proto.h @@ -75,6 +75,8 @@ bool nf_conntrack_invert_icmp_tuple(struct nf_conntrack_tuple *tuple, const struct nf_conntrack_tuple *orig); bool nf_conntrack_invert_icmpv6_tuple(struct nf_conntrack_tuple *tuple, const struct nf_conntrack_tuple *orig); +bool nf_conntrack_invert_esp_tuple(struct nf_conntrack_tuple *tuple, + const struct nf_conntrack_tuple *orig); int nf_conntrack_inet_error(struct nf_conn *tmpl, struct sk_buff *skb, unsigned int dataoff, @@ -132,6 +134,11 @@ int nf_conntrack_gre_packet(struct nf_conn *ct, unsigned int dataoff, enum ip_conntrack_info ctinfo, const struct nf_hook_state *state); +int nf_conntrack_esp_packet(struct nf_conn *ct, + struct sk_buff *skb, + unsigned int dataoff, + enum ip_conntrack_info ctinfo, + const struct nf_hook_state *state); void nf_conntrack_generic_init_net(struct net *net); void nf_conntrack_tcp_init_net(struct net *net); @@ -141,6 +148,7 @@ void nf_conntrack_dccp_init_net(struct net *net); void nf_conntrack_sctp_init_net(struct net *net); void nf_conntrack_icmp_init_net(struct net
Re: [PATCH v10] i2c: virtio: add a virtio i2c frontend driver
On 14-04-21, 10:07, Jie Deng wrote: > Hi maintainers, > > What's the status of this patch based on the review comments you got ? I was expecting a new version to be honest.. > Is i2c/for-next the right tree to merge it > ? It should be. -- viresh
Re: [PATCH 1/5] mm/swapfile: add percpu_ref support for swap
Hello, On Wed, Apr 14, 2021 at 10:06:48AM +0800, Huang, Ying wrote: > Miaohe Lin writes: > > > On 2021/4/14 9:17, Huang, Ying wrote: > >> Miaohe Lin writes: > >> > >>> On 2021/4/12 15:24, Huang, Ying wrote: > "Huang, Ying" writes: > > > Miaohe Lin writes: > > > >> We will use percpu-refcount to serialize against concurrent swapoff. > >> This > >> patch adds the percpu_ref support for later fixup. > >> > >> Signed-off-by: Miaohe Lin > >> --- > >> include/linux/swap.h | 2 ++ > >> mm/swapfile.c| 25 ++--- > >> 2 files changed, 24 insertions(+), 3 deletions(-) > >> > >> diff --git a/include/linux/swap.h b/include/linux/swap.h > >> index 144727041e78..849ba5265c11 100644 > >> --- a/include/linux/swap.h > >> +++ b/include/linux/swap.h > >> @@ -240,6 +240,7 @@ struct swap_cluster_list { > >> * The in-memory structure used to track swap areas. > >> */ > >> struct swap_info_struct { > >> + struct percpu_ref users;/* serialization against > >> concurrent swapoff */ > >>unsigned long flags; /* SWP_USED etc: see above */ > >>signed shortprio; /* swap priority of this type */ > >>struct plist_node list; /* entry in swap_active_head */ > >> @@ -260,6 +261,7 @@ struct swap_info_struct { > >>struct block_device *bdev; /* swap device or bdev of swap > >> file */ > >>struct file *swap_file; /* seldom referenced */ > >>unsigned int old_block_size;/* seldom referenced */ > >> + struct completion comp; /* seldom referenced */ > >> #ifdef CONFIG_FRONTSWAP > >>unsigned long *frontswap_map; /* frontswap in-use, one bit > >> per page */ > >>atomic_t frontswap_pages; /* frontswap pages in-use > >> counter */ > >> diff --git a/mm/swapfile.c b/mm/swapfile.c > >> index 149e77454e3c..724173cd7d0c 100644 > >> --- a/mm/swapfile.c > >> +++ b/mm/swapfile.c > >> @@ -39,6 +39,7 @@ > >> #include > >> #include > >> #include > >> +#include > >> > >> #include > >> #include > >> @@ -511,6 +512,15 @@ static void swap_discard_work(struct work_struct > >> *work) > >>spin_unlock(>lock); > >> } > >> > >> +static void swap_users_ref_free(struct percpu_ref *ref) > >> +{ > >> + struct swap_info_struct *si; > >> + > >> + si = container_of(ref, struct swap_info_struct, users); > >> + complete(>comp); > >> + percpu_ref_exit(>users); > > > > Because percpu_ref_exit() is used, we cannot use percpu_ref_tryget() in > > get_swap_device(), better to add comments there. > > I just noticed that the comments of percpu_ref_tryget_live() says, > > * This function is safe to call as long as @ref is between init and > exit. > > While we need to call get_swap_device() almost at any time, so it's > better to avoid to call percpu_ref_exit() at all. This will waste some > memory, but we need to follow the API definition to avoid potential > issues in the long term. > >>> > >>> I have to admit that I'am not really familiar with percpu_ref. So I read > >>> the > >>> implementation code of the percpu_ref and found percpu_ref_tryget_live() > >>> could > >>> be called after exit now. But you're right we need to follow the API > >>> definition > >>> to avoid potential issues in the long term. > >>> > > And we need to call percpu_ref_init() before insert the swap_info_struct > into the swap_info[]. > >>> > >>> If we remove the call to percpu_ref_exit(), we should not use > >>> percpu_ref_init() > >>> here because *percpu_ref->data is assumed to be NULL* in > >>> percpu_ref_init() while > >>> this is not the case as we do not call percpu_ref_exit(). Maybe > >>> percpu_ref_reinit() > >>> or percpu_ref_resurrect() will do the work. > >>> > >>> One more thing, how could I distinguish the killed percpu_ref from newly > >>> allocated one? > >>> It seems percpu_ref_is_dying is only safe to call when @ref is between > >>> init and exit. > >>> Maybe I could do this in alloc_swap_info()? > >> > >> Yes. In alloc_swap_info(), you can distinguish newly allocated and > >> reused swap_info_struct. > >> > > >> +} > >> + > >> static void alloc_cluster(struct swap_info_struct *si, unsigned long > >> idx) > >> { > >>struct swap_cluster_info *ci = si->cluster_info; > >> @@ -2500,7 +2510,7 @@ static void enable_swap_info(struct > >> swap_info_struct *p, int prio, > >> * Guarantee swap_map, cluster_info, etc. fields are valid > >> * between get/put_swap_device() if SWP_VALID bit is set > >> */ > >> - synchronize_rcu(); > >> +
Re: [PATCH RFC 1/6] dcache: sweep cached negative dentries to the end of list of siblings
On Thu, Jan 21, 2021 at 06:49:40PM +0530, Gautham Ananthakrishna wrote: > +static void sweep_negative(struct dentry *dentry) > +{ > + struct dentry *parent; > + > + if (!d_is_tail_negative(dentry)) { > + parent = lock_parent(dentry); > + if (!parent) > + return; Wait a minute. It's not a good environment for calling lock_parent(). Who said that dentry won't get freed right under it? Right now callers of __lock_parent() either hold a reference to dentry *or* are called for a positive dentry, with inode->i_lock held. You are introducing something very different - > if (likely(retain_dentry(dentry))) { > + if (d_is_negative(dentry)) > + sweep_negative(dentry); > spin_unlock(>d_lock); Here we can be called for a negative dentry with refcount already *NOT* held by us. Look: static inline struct dentry *lock_parent(struct dentry *dentry) { struct dentry *parent = dentry->d_parent; if (IS_ROOT(dentry)) return NULL; isn't a root if (likely(spin_trylock(>d_lock))) return parent; no such luck - someone's already holding parent's ->d_lock return __lock_parent(dentry); and here we have static struct dentry *__lock_parent(struct dentry *dentry) { struct dentry *parent; rcu_read_lock(); OK, anything we see in its ->d_parent is guaranteed to stay allocated until we get to matching rcu_read_unlock() spin_unlock(>d_lock); dropped the spinlock, now it's fair game for d_move(), d_drop(), etc. again: parent = READ_ONCE(dentry->d_parent); dentry couldn't have been reused, so it's the last value stored there. Points to still allocated struct dentry instance, so we can... spin_lock(>d_lock); grab its ->d_lock. /* * We can't blindly lock dentry until we are sure * that we won't violate the locking order. * Any changes of dentry->d_parent must have * been done with parent->d_lock held, so * spin_lock() above is enough of a barrier * for checking if it's still our child. */ if (unlikely(parent != dentry->d_parent)) { spin_unlock(>d_lock); goto again; } Nevermind, it's still equal to our ->d_parent. So we have the last valid parent's ->d_lock held rcu_read_unlock(); What's to hold dentry allocated now? IF we held its refcount - no problem, it can't go away. If we held its ->d_inode->i_lock - ditto (it wouldn't get to __dentry_kill() until we drop that, since all callers do acquire that lock and it couldn't get scheduled for freeing until it gets through most of __dentry_kill()). IOW, we are free to grab dentry->d_lock again. if (parent != dentry) spin_lock_nested(>d_lock, DENTRY_D_LOCK_NESTED); else parent = NULL; return parent; } With your patch, though, you've got a call site where neither condition is guaranteed. Current kernel is fine - we are holding ->d_lock there, and we don't touch dentry after it gets dropped. Again, it can't get scheduled for freeing until after we drop ->d_lock, so we are safe. With that change, however, you've got a hard-to-hit memory corruptor there...
Re: [PATCH v2 00/16] Multigenerational LRU Framework
On Tue, Apr 13, 2021 at 5:14 PM Dave Chinner wrote: > > On Tue, Apr 13, 2021 at 10:13:24AM -0600, Jens Axboe wrote: > > On 4/13/21 1:51 AM, SeongJae Park wrote: > > > From: SeongJae Park > > > > > > Hello, > > > > > > > > > Very interesting work, thank you for sharing this :) > > > > > > On Tue, 13 Apr 2021 00:56:17 -0600 Yu Zhao wrote: > > > > > >> What's new in v2 > > >> > > >> Special thanks to Jens Axboe for reporting a regression in buffered > > >> I/O and helping test the fix. > > > > > > Is the discussion open? If so, could you please give me a link? > > > > I wasn't on the initial post (or any of the lists it was posted to), but > > it's on the google page reclaim list. Not sure if that is public or not. > > > > tldr is that I was pretty excited about this work, as buffered IO tends > > to suck (a lot) for high throughput applications. My test case was > > pretty simple: > > > > Randomly read a fast device, using 4k buffered IO, and watch what > > happens when the page cache gets filled up. For this particular test, > > we'll initially be doing 2.1GB/sec of IO, and then drop to 1.5-1.6GB/sec > > with kswapd using a lot of CPU trying to keep up. That's mainline > > behavior. > > I see this exact same behaviour here, too, but I RCA'd it to > contention between the inode and memory reclaim for the mapping > structure that indexes the page cache. Basically the mapping tree > lock is the contention point here - you can either be adding pages > to the mapping during IO, or memory reclaim can be removing pages > from the mapping, but we can't do both at once. > > So we end up with kswapd spinning on the mapping tree lock like so > when doing 1.6GB/s in 4kB buffered IO: > > - 20.06% 0.00% [kernel] [k] kswapd > >▒ >- 20.06% kswapd > >▒ > - 20.05% balance_pgdat > >▒ > - 20.03% shrink_node > >▒ > - 19.92% shrink_lruvec > >▒ >- 19.91% shrink_inactive_list > >▒ > - 19.22% shrink_page_list > >▒ > - 17.51% __remove_mapping > >▒ > - 14.16% _raw_spin_lock_irqsave > >▒ >- 14.14% do_raw_spin_lock > >▒ > __pv_queued_spin_lock_slowpath > >▒ > - 1.56% __delete_from_page_cache > >▒ > 0.63% xas_store > >▒ > - 0.78% _raw_spin_unlock_irqrestore > >▒ >- 0.69% do_raw_spin_unlock > >▒ > __raw_callee_save___pv_queued_spin_unlock > >▒ > - 0.82% free_unref_page_list > >▒ > - 0.72% free_unref_page_commit > >▒ > 0.57% free_pcppages_bulk > >▒ > > And these are the processes consuming CPU: > >5171 root
Re: [PATCH 2/5] swap: fix do_swap_page() race with swapoff
On 2021/4/14 11:07, Huang, Ying wrote: > Miaohe Lin writes: > >> On 2021/4/13 9:27, Huang, Ying wrote: >>> Miaohe Lin writes: >>> When I was investigating the swap code, I found the below possible race window: CPU 1 CPU 2 - - do_swap_page synchronous swap_readpage alloc_page_vma swapoff release swap_file, bdev, or ... swap_readpage check sis->flags is ok access swap_file, bdev...[oops!] si->flags = 0 Using current get/put_swap_device() to guard against concurrent swapoff for swap_readpage() looks terrible because swap_readpage() may take really long time. And this race may not be really pernicious because swapoff is usually done when system shutdown only. To reduce the performance overhead on the hot-path as much as possible, it appears we can use the percpu_ref to close this race window(as suggested by Huang, Ying). Fixes: 235b62176712 ("mm/swap: add cluster lock") >>> >>> This isn't the commit that introduces the race. You can use `git blame` >>> find out the correct commit. For this it's commit 0bcac06f27d7 "mm, >>> swap: skip swapcache for swapin of synchronous device". >>> >> >> Sorry about it! What I refer to is commit eb085574a752 ("mm, swap: fix race >> between >> swapoff and some swap operations"). And I think this commit does not fix the >> race >> condition completely, so I reuse the Fixes tag inside it. >> >>> And I suggest to merge 1/5 and 2/5 to make it easy to get the full >>> picture. >>> Signed-off-by: Miaohe Lin --- include/linux/swap.h | 2 +- mm/memory.c | 10 ++ mm/swapfile.c| 28 +++- 3 files changed, 22 insertions(+), 18 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index 849ba5265c11..9066addb57fd 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -513,7 +513,7 @@ sector_t swap_page_sector(struct page *page); static inline void put_swap_device(struct swap_info_struct *si) { - rcu_read_unlock(); + percpu_ref_put(>users); } #else /* CONFIG_SWAP */ diff --git a/mm/memory.c b/mm/memory.c index cc71a445c76c..8543c47b955c 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3311,6 +3311,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; struct page *page = NULL, *swapcache; + struct swap_info_struct *si = NULL; swp_entry_t entry; pte_t pte; int locked; @@ -3339,6 +3340,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) } >>> >>> I suggest to add comments here as follows (words copy from Matthew Wilcox) >>> >>> /* Prevent swapoff from happening to us */ >> >> Ok. >> >>> + si = get_swap_device(entry); + /* In case we raced with swapoff. */ + if (unlikely(!si)) + goto out; + >>> >>> Because we wrap the whole do_swap_page() with get/put_swap_device() >>> now. We can remove several get/put_swap_device() for function called by >>> do_swap_page(). That can be another optimization patch. >> >> I tried to remove several get/put_swap_device() for function called >> by do_swap_page() only before I send this series. But it seems they have >> other callers without proper get/put_swap_device(). > > Then we need to revise these callers instead. Anyway, can be another > series. Yes. can be another series. Thanks. > > Best Regards, > Huang, Ying > > . >
[PATCH] ASoC: codec: remove unused variable
Fix the following gcc warning: sound/soc/codecs/jz4760.c:201:6: warning: variable ‘ret’ set but not used [-Wunused-but-set-variable]. Reported-by: Abaci Robot Signed-off-by: Jiapeng Chong --- sound/soc/codecs/jz4760.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/sound/soc/codecs/jz4760.c b/sound/soc/codecs/jz4760.c index e8f28cc..ad5e859 100644 --- a/sound/soc/codecs/jz4760.c +++ b/sound/soc/codecs/jz4760.c @@ -198,15 +198,13 @@ static int jz4760_codec_startup(struct snd_pcm_substream *substream, { struct snd_soc_component *codec = dai->component; struct snd_soc_dapm_context *dapm = snd_soc_component_get_dapm(codec); - int ret; - /* * SYSCLK output from the codec to the AIC is required to keep the * DMA transfer going during playback when all audible outputs have * been disabled. */ if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK) - ret = snd_soc_dapm_force_enable_pin(dapm, "SYSCLK"); + snd_soc_dapm_force_enable_pin(dapm, "SYSCLK"); return 0; } -- 1.8.3.1
Re: [PATCH v2 2/3] drm/msm/dp: do not re initialize of audio_comp at display_disable()
Quoting Kuogee Hsieh (2021-04-13 16:11:30) > At dongle unplug, dp initializes audio_comp followed by sending disconnect > event notification to audio and to make sure audio had shutdown completely > by wait for audio completion notification at display_disable(). This patch Is this dp_display_disable()? Doubtful that display_disable() is the function we're talking about. > will not re initialize audio_comp at display_disable() if audio shutdown > is triggered by dongle unplugged. This commit text seems to say the why before the what, where why is "dp initializes audio_comp followed by sending disconnect.." and the what is "this patch will no re-initialized audio_comp...". Can you reorder this so the what comes before the why? > > Signed-off-by: Kuogee Hsieh > --- > drivers/gpu/drm/msm/dp/dp_display.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/msm/dp/dp_display.c > b/drivers/gpu/drm/msm/dp/dp_display.c > index 0ba71c7..1d71c95 100644 > --- a/drivers/gpu/drm/msm/dp/dp_display.c > +++ b/drivers/gpu/drm/msm/dp/dp_display.c > @@ -894,8 +894,10 @@ static int dp_display_disable(struct dp_display_private > *dp, u32 data) > /* wait only if audio was enabled */ > if (dp_display->audio_enabled) { > /* signal the disconnect event */ > - reinit_completion(>audio_comp); > - dp_display_handle_plugged_change(dp_display, false); > + if (dp->hpd_state != ST_DISCONNECT_PENDING) { > + reinit_completion(>audio_comp); Why is this reinitialized here at all? Wouldn't it make more sense to initialize the completion once at cable plug in and then not initialize the completion anywhere else? Or initialize the completion whenever dp_display->audio_enabled is set to true and then only wait for the completion here if that boolean is true? Or initialize the completion when dp_display_handle_plugged_change() is passed true for the 'plugged' argument? I started reading the code and quickly got lost figuring out how dp_display_handle_plugged_change() worked and the interaction between the dp display code and the audio codec embedded in here. There seem to be a couple of conditions that cut off things early, like dp_display->audio_enabled and audio->engine_on. Why? Why does dp_display_signal_audio_complete() call complete_all() vs. just complete()? Please help! :( > + dp_display_handle_plugged_change(dp_display, false); I think it's this way because dp_hpd_unplug_handle() is the function that sets the hpd_state to ST_DISCONNECT_PENDING and then reinitializes the completion (why?) and calls dp_display_handle_plugged_change(). So the commit text could say that reinitializing the completion again here at dp_display_disable() is racing with the audio code in the case that dp_hpd_unplug_handle() already called dp_display_handle_plugged_change() and it would make more sense. But the question still stands why that race even exists in the first place vs. initializing the completion variable in only one place unconditionally when the cable is connected, in dp_hpd_plug_handle() or dp_display_post_enable(). > + } > if (!wait_for_completion_timeout(>audio_comp, > HZ * 5)) > DRM_ERROR("audio comp timeout\n");
Re: [PATCH v2] drivers: pinctrl: qcom: fix Kconfig dependency on GPIOLIB
On Tue 13 Apr 21:51 CDT 2021, Julian Braha wrote: > When PINCTRL_MSM is enabled, and GPIOLIB is disabled, > Kbuild gives the following warning: > > WARNING: unmet direct dependencies detected for GPIOLIB_IRQCHIP > Depends on [n]: GPIOLIB [=n] > Selected by [y]: > - PINCTRL_MSM [=y] && PINCTRL [=y] && (ARCH_QCOM || COMPILE_TEST [=y]) > > This is because PINCTRL_MSM selects GPIOLIB_IRQCHIP, > without selecting or depending on GPIOLIB, despite > GPIOLIB_IRQCHIP depending on GPIOLIB. Having PINCTRL_MSM > select GPIOLIB will cause a recursive dependency error. > Reviewed-by: Bjorn Andersson > Signed-off-by: Julian Braha > --- > drivers/pinctrl/qcom/Kconfig | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/pinctrl/qcom/Kconfig b/drivers/pinctrl/qcom/Kconfig > index 6853a896c476..d42ac59875ab 100644 > --- a/drivers/pinctrl/qcom/Kconfig > +++ b/drivers/pinctrl/qcom/Kconfig > @@ -3,7 +3,7 @@ if (ARCH_QCOM || COMPILE_TEST) > > config PINCTRL_MSM > tristate "Qualcomm core pin controller driver" > - depends on QCOM_SCM || !QCOM_SCM #if QCOM_SCM=m this can't be =y > + depends on GPIOLIB && (QCOM_SCM || !QCOM_SCM) #if QCOM_SCM=m this can't > be =y > select PINMUX > select PINCONF > select GENERIC_PINCONF > -- > 2.27.0 >
Re: [PATCH 1/3] vfio/iommu_type1: Add HWDBM status maintanance
Hi Keqian, Thank you for the patch! Yet something to improve: [auto build test ERROR on vfio/next] [also build test ERROR on linux/master linus/master v5.12-rc7 next-20210413] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Keqian-Zhu/vfio-iommu_type1-Implement-dirty-log-tracking-based-on-IOMMU-HWDBM/20210413-171632 base: https://github.com/awilliam/linux-vfio.git next config: x86_64-randconfig-a013-20210413 (attached as .config) compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 9829f5e6b1bca9b61efc629770d28bb9014dec45) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # install x86_64 cross compiling tool for clang build # apt-get install binutils-x86-64-linux-gnu # https://github.com/0day-ci/linux/commit/3005ed21d06a3ed861847529f08c3d8814013399 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Keqian-Zhu/vfio-iommu_type1-Implement-dirty-log-tracking-based-on-IOMMU-HWDBM/20210413-171632 git checkout 3005ed21d06a3ed861847529f08c3d8814013399 # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All errors (new ones prefixed by >>): >> drivers/vfio/vfio_iommu_type1.c:2270:33: error: use of undeclared identifier >> 'IOMMU_DEV_FEAT_HWDBM' enum iommu_dev_features feat = IOMMU_DEV_FEAT_HWDBM; ^ 1 error generated. vim +/IOMMU_DEV_FEAT_HWDBM +2270 drivers/vfio/vfio_iommu_type1.c 2267 2268 static bool vfio_group_supports_hwdbm(struct vfio_group *group) 2269 { > 2270 enum iommu_dev_features feat = IOMMU_DEV_FEAT_HWDBM; 2271 2272 return !iommu_group_for_each_dev(group->iommu_group, , 2273 vfio_dev_enable_feature); 2274 } 2275 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org .config.gz Description: application/gzip
Re: [PATCH 1/2] dt-bindings: soc: qcom: Add bindings for Qualcomm Memshare service
On Sat 10 Apr 03:05 CDT 2021, Nikita Travkin wrote: > Hi, sorry for a late reply but I couldn't answer earlier. > > 30.03.2021 19:40, Rob Herring ??: > > On Fri, Mar 19, 2021 at 10:23:20PM +0500, nikitos...@gmail.com wrote: > >> From: Nikita Travkin > >> > >> Add DT bindings for memshare: QMI service that allocates > >> memory per remote processor request. > >> > >> Signed-off-by: Nikita Travkin > >> --- > >> .../bindings/soc/qcom/qcom,memshare.yaml | 109 ++ > >> include/dt-bindings/soc/qcom,memshare.h | 10 ++ > >> 2 files changed, 119 insertions(+) > >> create mode 100644 > >> Documentation/devicetree/bindings/soc/qcom/qcom,memshare.yaml > >> create mode 100644 include/dt-bindings/soc/qcom,memshare.h > >> > >> diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,memshare.yaml > >> b/Documentation/devicetree/bindings/soc/qcom/qcom,memshare.yaml > >> new file mode 100644 > >> index ..ebdf128b066c > >> --- /dev/null > >> +++ b/Documentation/devicetree/bindings/soc/qcom/qcom,memshare.yaml > >> @@ -0,0 +1,109 @@ > >> +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) > >> +%YAML 1.2 > >> +--- > >> +$id: "http://devicetree.org/schemas/soc/qcom/qcom,memshare.yaml#; > >> +$schema: "http://devicetree.org/meta-schemas/core.yaml#; > >> + > >> +title: Qualcomm QMI Shared Memory Service > > How many shared memory interfaces does Qcom have... > > > >> + > >> +description: | > >> + This driver provides a QMI service that allows remote processors (like > >> modem) > >> + to request additional memory. It is used for applications like GPS in > >> modem. > > If the memory region is defined in reserved-memory, how are you > > allocating additional memory? > > Initially remoteproc is loaded into it's own reserved-memory region > but qcom decided that they sometimes need more memory than that. > Memshare driver in msm8916 downstream tree seem to blindly allocate > DMA region for every request that it gets. Additionally for those > clients described in the DT, they do the DMA allocation on boot > time and never free the region. They call it "guaranteed" allocation. > > On msm8916 only one "guaranteed" client seem to be used so I decided > to implement it with reserved-memory node. On newer platforms they > seem to have more clients but I think that the driver can be easily > extended to support dynamic allocation if someone really needs it. > Is the "guaranteed" memory required to come from the reserved-memory part of memory, or could it simply be allocated on demand as well (or preallocated, but at a dynamic address)? If these allocations always came from a reserved-memory region, then adding a "qcom,memshare" compatible to the reserved-memory node itself seems like a reasonable approach. But if dma_alloc is sufficient, and there's cases where there's no "guaranteed" region, perhaps we should just describe this as part of the remoteproc node (i.e. essentially flipping the node/subnode in your current binding). E.g. can we get away with simply adding an optional qcom,memshare-node to the remoteproc binding and when that's present we make the Qualcomm remoteproc drivers spawn the memshare handler and listen for requests from that node? > I tried to explain that in the cover letter but I think I made some > mistake as I don't see it in the Patchwork. > > >> + > >> +maintainers: > >> + - Nikita Travkin > >> + > >> +properties: > >> + compatible: > >> +const: qcom,memshare > >> + > >> + qcom,legacy-client: > >> +$ref: /schemas/types.yaml#/definitions/phandle > >> +description: Phandle to a memshare client node used for legacy > >> requests. > >> + > >> + "#address-cells": > >> +const: 1 > >> + > >> + "#size-cells": > >> +const: 0 > >> + > >> +patternProperties: > >> + "^.*@[0-9]+$": > >> +type: object > >> + > >> +properties: > >> + reg: > >> +description: Proc-ID for clients in this node. > > What's Proc-ID? > > The requests from the remote nodes contain client-id and proc-id > that are supposed to differentiate the clients. It's possible to > find the values in downstream DT or by observing what messages > are received by the memshare service (I left dev_dbg logging in > the driver for that reason) > > I think I should reword it to make this more apparent, maybe > "Proc-ID that clients in this node send."? > If this is a constant for each remote and we make this a child thing of remoteproc perhaps encode the number in the remoteproc nodes? (We still need something in DT to state that we want a memshare for a given platform/remoteproc) > > > >> + > >> + qcom,qrtr-node: > >> +$ref: /schemas/types.yaml#/definitions/uint32 > >> +description: Node from which the requests are expected. > >> + > >> + "#address-cells": > >> +const: 1 > >> + > >> + "#size-cells": > >> +const: 0 > >> + > >> +patternProperties: > >> + "^.*@[0-9]+$": > >> +
[PATCH v2 6/8] MIPS: pci-legacy: remove redundant info messages
Remove the following pci-legacy message: PCI host bridge /pci@44/host-bridge ranges: MEM 0x2000..0x2fff IO 0x0046..0x0046 It is followed shortly by the same data from pci_register_host_bridge: PCI host bridge to bus :00 pci_bus :00: root bus resource [mem 0x2000-0x2fff] pci_bus :00: root bus resource [io 0x46-0x46] Signed-off-by: Ilya Lipnitskiy --- arch/mips/pci/pci-legacy.c | 7 --- 1 file changed, 7 deletions(-) diff --git a/arch/mips/pci/pci-legacy.c b/arch/mips/pci/pci-legacy.c index 3a909194284a..ec3f52ade72d 100644 --- a/arch/mips/pci/pci-legacy.c +++ b/arch/mips/pci/pci-legacy.c @@ -140,7 +140,6 @@ void pci_load_of_ranges(struct pci_controller *hose, struct device_node *node) struct of_pci_range range; struct of_pci_range_parser parser; - pr_info("PCI host bridge %pOF ranges:\n", node); hose->of_node = node; if (of_pci_range_parser_init(, node)) @@ -151,18 +150,12 @@ void pci_load_of_ranges(struct pci_controller *hose, struct device_node *node) switch (range.flags & IORESOURCE_TYPE_BITS) { case IORESOURCE_IO: - pr_info(" IO 0x%016llx..0x%016llx\n", - range.cpu_addr, - range.cpu_addr + range.size - 1); hose->io_map_base = (unsigned long)ioremap(range.cpu_addr, range.size); res = hose->io_resource; break; case IORESOURCE_MEM: - pr_info(" MEM 0x%016llx..0x%016llx\n", - range.cpu_addr, - range.cpu_addr + range.size - 1); res = hose->mem_resource; break; } -- 2.31.1
[PATCH v2 8/8] MIPS: pci-legacy: use generic pci_enable_resources
Follow the reasoning from commit 842de40d93e0 ("PCI: add generic pci_enable_resources()"): The only functional difference from the MIPS version is that the generic one uses "!r->parent" to check for resource collisions instead of "!r->start && r->end". That should have no effect on any pci-legacy driver. Suggested-by: Bjorn Helgaas Signed-off-by: Ilya Lipnitskiy --- arch/mips/pci/pci-legacy.c | 40 ++ 1 file changed, 2 insertions(+), 38 deletions(-) diff --git a/arch/mips/pci/pci-legacy.c b/arch/mips/pci/pci-legacy.c index 78c22987bef0..c24226ea0a6e 100644 --- a/arch/mips/pci/pci-legacy.c +++ b/arch/mips/pci/pci-legacy.c @@ -241,47 +241,11 @@ static int __init pcibios_init(void) subsys_initcall(pcibios_init); -static int pcibios_enable_resources(struct pci_dev *dev, int mask) -{ - u16 cmd, old_cmd; - int idx; - struct resource *r; - - pci_read_config_word(dev, PCI_COMMAND, ); - old_cmd = cmd; - for (idx=0; idx < PCI_NUM_RESOURCES; idx++) { - /* Only set up the requested stuff */ - if (!(mask & (1flags & (IORESOURCE_IO | IORESOURCE_MEM))) - continue; - if ((idx == PCI_ROM_RESOURCE) && - (!(r->flags & IORESOURCE_ROM_ENABLE))) - continue; - if (!r->start && r->end) { - pci_err(dev, - "can't enable device: resource collisions\n"); - return -EINVAL; - } - if (r->flags & IORESOURCE_IO) - cmd |= PCI_COMMAND_IO; - if (r->flags & IORESOURCE_MEM) - cmd |= PCI_COMMAND_MEMORY; - } - if (cmd != old_cmd) { - pci_info(dev, "enabling device (%04x -> %04x)\n", old_cmd, cmd); - pci_write_config_word(dev, PCI_COMMAND, cmd); - } - return 0; -} - int pcibios_enable_device(struct pci_dev *dev, int mask) { - int err; + int err = pci_enable_resources(dev, mask); - if ((err = pcibios_enable_resources(dev, mask)) < 0) + if (err < 0) return err; return pcibios_plat_dev_init(dev); -- 2.31.1
[PATCH v2 3/8] MIPS: pci-rt3883: trivial: remove unused variable
Fixes the following compiler warning: warning: unused variable 'flags' [-Wunused-variable] Fixes: e5067c718b3a ("MIPS: pci-rt3883: Remove odd locking in PCI config space access code") Signed-off-by: Ilya Lipnitskiy Acked-by: Sergey Ryazanov Cc: triv...@kernel.org --- arch/mips/pci/pci-rt3883.c | 4 1 file changed, 4 deletions(-) diff --git a/arch/mips/pci/pci-rt3883.c b/arch/mips/pci/pci-rt3883.c index 0ac6346026d0..e422f78db5bc 100644 --- a/arch/mips/pci/pci-rt3883.c +++ b/arch/mips/pci/pci-rt3883.c @@ -100,7 +100,6 @@ static u32 rt3883_pci_read_cfg32(struct rt3883_pci_controller *rpc, unsigned bus, unsigned slot, unsigned func, unsigned reg) { - unsigned long flags; u32 address; u32 ret; @@ -116,7 +115,6 @@ static void rt3883_pci_write_cfg32(struct rt3883_pci_controller *rpc, unsigned bus, unsigned slot, unsigned func, unsigned reg, u32 val) { - unsigned long flags; u32 address; address = rt3883_pci_get_cfgaddr(bus, slot, func, reg); @@ -229,7 +227,6 @@ static int rt3883_pci_config_read(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *val) { struct rt3883_pci_controller *rpc; - unsigned long flags; u32 address; u32 data; @@ -263,7 +260,6 @@ static int rt3883_pci_config_write(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 val) { struct rt3883_pci_controller *rpc; - unsigned long flags; u32 address; u32 data; -- 2.31.1
[PATCH v2 7/8] MIPS: pci-legacy: remove busn_resource field
No drivers set the busn_resource field in the pci_controller struct. Commit 7ee214b540d9 ("MIPS: PCI: Remove unused busn_offset") almost removed it over 3 years ago. Remove it for good to free up memory and eliminate messages like: pci_bus :00: root bus resource [??? 0x flags 0x0] Signed-off-by: Ilya Lipnitskiy Cc: Bjorn Helgaas --- arch/mips/include/asm/pci.h | 1 - arch/mips/pci/pci-legacy.c | 1 - 2 files changed, 2 deletions(-) diff --git a/arch/mips/include/asm/pci.h b/arch/mips/include/asm/pci.h index 6f48649201c5..9ffc8192adae 100644 --- a/arch/mips/include/asm/pci.h +++ b/arch/mips/include/asm/pci.h @@ -38,7 +38,6 @@ struct pci_controller { struct resource *io_resource; unsigned long io_offset; unsigned long io_map_base; - struct resource *busn_resource; #ifndef CONFIG_PCI_DOMAINS_GENERIC unsigned int index; diff --git a/arch/mips/pci/pci-legacy.c b/arch/mips/pci/pci-legacy.c index ec3f52ade72d..78c22987bef0 100644 --- a/arch/mips/pci/pci-legacy.c +++ b/arch/mips/pci/pci-legacy.c @@ -89,7 +89,6 @@ static void pcibios_scanbus(struct pci_controller *hose) hose->mem_resource, hose->mem_offset); pci_add_resource_offset(, hose->io_resource, hose->io_offset); - pci_add_resource(, hose->busn_resource); list_splice_init(, >windows); bridge->dev.parent = NULL; bridge->sysdata = hose; -- 2.31.1
[PATCH v2 5/8] MIPS: pci-legacy: stop using of_pci_range_to_resource
Mirror commit aeba3731b150 ("powerpc/pci: Fix IO space breakage after of_pci_range_to_resource() change"). Most MIPS platforms do not define PCI_IOBASE, nor implement pci_address_to_pio(). Moreover, IO_SPACE_LIMIT is 0x for most MIPS platforms. of_pci_range_to_resource passes the _start address_ of the IO range into pci_address_to_pio, which then checks it against IO_SPACE_LIMIT and fails, because for MIPS platforms that use pci-legacy (pci-lantiq, pci-rt3883, pci-mt7620), IO ranges start much higher than 0x. In fact, pci-mt7621 in staging already works around this problem, see commit 09dd629eeabb ("staging: mt7621-pci: fix io space and properly set resource limits") So just stop using of_pci_range_to_resource, which does not work for MIPS. Fixes PCI errors like: pci_bus :00: root bus resource [io 0x] Fixes: 0b0b0893d49b ("of/pci: Fix the conversion of IO ranges into IO resources") Signed-off-by: Ilya Lipnitskiy Cc: Liviu Dudau --- arch/mips/pci/pci-legacy.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/arch/mips/pci/pci-legacy.c b/arch/mips/pci/pci-legacy.c index 39052de915f3..3a909194284a 100644 --- a/arch/mips/pci/pci-legacy.c +++ b/arch/mips/pci/pci-legacy.c @@ -166,8 +166,13 @@ void pci_load_of_ranges(struct pci_controller *hose, struct device_node *node) res = hose->mem_resource; break; } - if (res != NULL) - of_pci_range_to_resource(, node, res); + if (res != NULL) { + res->name = node->full_name; + res->flags = range.flags; + res->start = range.cpu_addr; + res->end = range.cpu_addr + range.size - 1; + res->parent = res->child = res->sibling = NULL; + } } } -- 2.31.1
[PATCH v2 1/8] MIPS: pci-rt2880: fix slot 0 configuration
pci_fixup_irqs() used to call pcibios_map_irq on every PCI device, which for RT2880 included bus 0 slot 0. After pci_fixup_irqs() got removed, only slots/funcs with devices attached would be called. While arguably the right thing, that left no chance for this driver to ever initialize slot 0, effectively bricking PCI and USB on RT2880 devices such as the Belkin F5D8235-4 v1. Slot 0 configuration needs to happen after PCI bus enumeration, but before any device at slot 0x11 (func 0 or 1) is talked to. That was determined empirically by testing on a Belkin F5D8235-4 v1 device. A minimal BAR 0 config write followed by read, then setting slot 0 PCI_COMMAND to MASTER | IO | MEMORY is all that seems to be required for proper functionality. Tested by ensuring that full- and high-speed USB devices get enumerated on the Belkin F5D8235-4 v1 (with an out of tree DTS file from OpenWrt). Fixes: 04c81c7293df ("MIPS: PCI: Replace pci_fixup_irqs() call with host bridge IRQ mapping hooks") Signed-off-by: Ilya Lipnitskiy Cc: Lorenzo Pieralisi Cc: Tobias Wolf Cc: # v4.14+ --- arch/mips/pci/pci-rt2880.c | 37 - 1 file changed, 24 insertions(+), 13 deletions(-) diff --git a/arch/mips/pci/pci-rt2880.c b/arch/mips/pci/pci-rt2880.c index e1f12e398136..f1538d2be89e 100644 --- a/arch/mips/pci/pci-rt2880.c +++ b/arch/mips/pci/pci-rt2880.c @@ -180,7 +180,6 @@ static inline void rt2880_pci_write_u32(unsigned long reg, u32 val) int pcibios_map_irq(const struct pci_dev *dev, u8 slot, u8 pin) { - u16 cmd; int irq = -1; if (dev->bus->number != 0) @@ -188,8 +187,6 @@ int pcibios_map_irq(const struct pci_dev *dev, u8 slot, u8 pin) switch (PCI_SLOT(dev->devfn)) { case 0x00: - rt2880_pci_write_u32(PCI_BASE_ADDRESS_0, 0x0800); - (void) rt2880_pci_read_u32(PCI_BASE_ADDRESS_0); break; case 0x11: irq = RT288X_CPU_IRQ_PCI; @@ -201,16 +198,6 @@ int pcibios_map_irq(const struct pci_dev *dev, u8 slot, u8 pin) break; } - pci_write_config_byte((struct pci_dev *) dev, - PCI_CACHE_LINE_SIZE, 0x14); - pci_write_config_byte((struct pci_dev *) dev, PCI_LATENCY_TIMER, 0xFF); - pci_read_config_word((struct pci_dev *) dev, PCI_COMMAND, ); - cmd |= PCI_COMMAND_MASTER | PCI_COMMAND_IO | PCI_COMMAND_MEMORY | - PCI_COMMAND_INVALIDATE | PCI_COMMAND_FAST_BACK | - PCI_COMMAND_SERR | PCI_COMMAND_WAIT | PCI_COMMAND_PARITY; - pci_write_config_word((struct pci_dev *) dev, PCI_COMMAND, cmd); - pci_write_config_byte((struct pci_dev *) dev, PCI_INTERRUPT_LINE, - dev->irq); return irq; } @@ -251,6 +238,30 @@ static int rt288x_pci_probe(struct platform_device *pdev) int pcibios_plat_dev_init(struct pci_dev *dev) { + static bool slot0_init; + + /* +* Nobody seems to initialize slot 0, but this platform requires it, so +* do it once when some other slot is being enabled. The PCI subsystem +* should configure other slots properly, so no need to do anything +* special for those. +*/ + if (!slot0_init && dev->bus->number == 0) { + u16 cmd; + u32 bar0; + + slot0_init = true; + + pci_bus_write_config_dword(dev->bus, 0, PCI_BASE_ADDRESS_0, + 0x0800); + pci_bus_read_config_dword(dev->bus, 0, PCI_BASE_ADDRESS_0, + ); + + pci_bus_read_config_word(dev->bus, 0, PCI_COMMAND, ); + cmd |= PCI_COMMAND_MASTER | PCI_COMMAND_IO | PCI_COMMAND_MEMORY; + pci_bus_write_config_word(dev->bus, 0, PCI_COMMAND, cmd); + } + return 0; } -- 2.31.1
[PATCH v2 4/8] MIPS: pci-rt3883: more accurate DT error messages
Existing strings do not make sense: one is always NULL and the other refers to the wrong parent node. Signed-off-by: Ilya Lipnitskiy --- arch/mips/pci/pci-rt3883.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/arch/mips/pci/pci-rt3883.c b/arch/mips/pci/pci-rt3883.c index e422f78db5bc..aebd4964ea34 100644 --- a/arch/mips/pci/pci-rt3883.c +++ b/arch/mips/pci/pci-rt3883.c @@ -431,8 +431,7 @@ static int rt3883_pci_probe(struct platform_device *pdev) if (!rpc->intc_of_node) { dev_err(dev, "%pOF has no %s child node", - rpc->intc_of_node, - "interrupt controller"); + np, "interrupt controller"); return -EINVAL; } @@ -446,8 +445,7 @@ static int rt3883_pci_probe(struct platform_device *pdev) if (!rpc->pci_controller.of_node) { dev_err(dev, "%pOF has no %s child node", - rpc->intc_of_node, - "PCI host bridge"); + np, "PCI host bridge"); err = -EINVAL; goto err_put_intc_node; } -- 2.31.1
[PATCH v2 0/8] MIPS: fixes for PCI legacy drivers (rt2880, rt3883)
One major fix for rt2880-pci in the first patch - fixes breakage that existed since v4.14. Other more minor fixes, cleanups, and improvements that either free up memory, make dmesg messages clearer, or remove redundant dmesg output. v2: - Do not use internal pci-rt2880 config read and write functions after the device has been registered with the PCI subsystem to avoid races. Use safe pci_bus_{read,write}_config_{d}word wrappers instead. Ilya Lipnitskiy (8): MIPS: pci-rt2880: fix slot 0 configuration MIPS: pci-rt2880: remove unneeded locks MIPS: pci-rt3883: trivial: remove unused variable MIPS: pci-rt3883: more accurate DT error messages MIPS: pci-legacy: stop using of_pci_range_to_resource MIPS: pci-legacy: remove redundant info messages MIPS: pci-legacy: remove busn_resource field MIPS: pci-legacy: use generic pci_enable_resources arch/mips/include/asm/pci.h | 1 - arch/mips/pci/pci-legacy.c | 57 ++--- arch/mips/pci/pci-rt2880.c | 50 arch/mips/pci/pci-rt3883.c | 10 ++- 4 files changed, 35 insertions(+), 83 deletions(-) -- 2.31.1
[PATCH v2 2/8] MIPS: pci-rt2880: remove unneeded locks
Mirror pci-rt3883 fix from commit e5067c718b3a ("MIPS: pci-rt3883: Remove odd locking in PCI config space access code"). pci-rt2880 shares the driver layout with pci-rt3883 and the same reasons apply. Caller (generic PCI code) already does proper locking, so no need to add another one here. Local PCI read/write functions are never called simultaneously, also they do not require synchronization with the PCI controller ops, since they are used before the controller registration. Suggested-by: Sergey Ryazanov Signed-off-by: Ilya Lipnitskiy --- arch/mips/pci/pci-rt2880.c | 13 - 1 file changed, 13 deletions(-) diff --git a/arch/mips/pci/pci-rt2880.c b/arch/mips/pci/pci-rt2880.c index f1538d2be89e..e9dd01431f21 100644 --- a/arch/mips/pci/pci-rt2880.c +++ b/arch/mips/pci/pci-rt2880.c @@ -41,7 +41,6 @@ #define RT2880_PCI_REG_ARBCTL 0x80 static void __iomem *rt2880_pci_base; -static DEFINE_SPINLOCK(rt2880_pci_lock); static u32 rt2880_pci_reg_read(u32 reg) { @@ -63,17 +62,14 @@ static inline u32 rt2880_pci_get_cfgaddr(unsigned int bus, unsigned int slot, static int rt2880_pci_config_read(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *val) { - unsigned long flags; u32 address; u32 data; address = rt2880_pci_get_cfgaddr(bus->number, PCI_SLOT(devfn), PCI_FUNC(devfn), where); - spin_lock_irqsave(_pci_lock, flags); rt2880_pci_reg_write(address, RT2880_PCI_REG_CONFIG_ADDR); data = rt2880_pci_reg_read(RT2880_PCI_REG_CONFIG_DATA); - spin_unlock_irqrestore(_pci_lock, flags); switch (size) { case 1: @@ -93,14 +89,12 @@ static int rt2880_pci_config_read(struct pci_bus *bus, unsigned int devfn, static int rt2880_pci_config_write(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 val) { - unsigned long flags; u32 address; u32 data; address = rt2880_pci_get_cfgaddr(bus->number, PCI_SLOT(devfn), PCI_FUNC(devfn), where); - spin_lock_irqsave(_pci_lock, flags); rt2880_pci_reg_write(address, RT2880_PCI_REG_CONFIG_ADDR); data = rt2880_pci_reg_read(RT2880_PCI_REG_CONFIG_DATA); @@ -119,7 +113,6 @@ static int rt2880_pci_config_write(struct pci_bus *bus, unsigned int devfn, } rt2880_pci_reg_write(data, RT2880_PCI_REG_CONFIG_DATA); - spin_unlock_irqrestore(_pci_lock, flags); return PCIBIOS_SUCCESSFUL; } @@ -151,31 +144,25 @@ static struct pci_controller rt2880_pci_controller = { static inline u32 rt2880_pci_read_u32(unsigned long reg) { - unsigned long flags; u32 address; u32 ret; address = rt2880_pci_get_cfgaddr(0, 0, 0, reg); - spin_lock_irqsave(_pci_lock, flags); rt2880_pci_reg_write(address, RT2880_PCI_REG_CONFIG_ADDR); ret = rt2880_pci_reg_read(RT2880_PCI_REG_CONFIG_DATA); - spin_unlock_irqrestore(_pci_lock, flags); return ret; } static inline void rt2880_pci_write_u32(unsigned long reg, u32 val) { - unsigned long flags; u32 address; address = rt2880_pci_get_cfgaddr(0, 0, 0, reg); - spin_lock_irqsave(_pci_lock, flags); rt2880_pci_reg_write(address, RT2880_PCI_REG_CONFIG_ADDR); rt2880_pci_reg_write(val, RT2880_PCI_REG_CONFIG_DATA); - spin_unlock_irqrestore(_pci_lock, flags); } int pcibios_map_irq(const struct pci_dev *dev, u8 slot, u8 pin) -- 2.31.1
[PATCH v2] time: Fix overwrite err unexpected in clock_adjtime32
the correct error is covered by put_old_timex32. Fixes: 3a4d44b61625 ("ntp: Move adjtimex related compat syscalls to native counterparts") Signed-off-by: Chen Jun --- v2: Make "Fixes" tag correct kernel/time/posix-timers.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c index bf540f5a..dd5697d 100644 --- a/kernel/time/posix-timers.c +++ b/kernel/time/posix-timers.c @@ -1191,8 +1191,8 @@ SYSCALL_DEFINE2(clock_adjtime32, clockid_t, which_clock, err = do_clock_adjtime(which_clock, ); - if (err >= 0) - err = put_old_timex32(utp, ); + if (err >= 0 && put_old_timex32(utp, )) + return -EFAULT; return err; } -- 2.9.4
Re: [PATCH 2/5] swap: fix do_swap_page() race with swapoff
Miaohe Lin writes: > On 2021/4/13 9:27, Huang, Ying wrote: >> Miaohe Lin writes: >> >>> When I was investigating the swap code, I found the below possible race >>> window: >>> >>> CPU 1 CPU 2 >>> - - >>> do_swap_page >>> synchronous swap_readpage >>> alloc_page_vma >>> swapoff >>> release swap_file, bdev, or ... >>> swap_readpage >>> check sis->flags is ok >>> access swap_file, bdev...[oops!] >>> si->flags = 0 >>> >>> Using current get/put_swap_device() to guard against concurrent swapoff for >>> swap_readpage() looks terrible because swap_readpage() may take really long >>> time. And this race may not be really pernicious because swapoff is usually >>> done when system shutdown only. To reduce the performance overhead on the >>> hot-path as much as possible, it appears we can use the percpu_ref to close >>> this race window(as suggested by Huang, Ying). >>> >>> Fixes: 235b62176712 ("mm/swap: add cluster lock") >> >> This isn't the commit that introduces the race. You can use `git blame` >> find out the correct commit. For this it's commit 0bcac06f27d7 "mm, >> swap: skip swapcache for swapin of synchronous device". >> > > Sorry about it! What I refer to is commit eb085574a752 ("mm, swap: fix race > between > swapoff and some swap operations"). And I think this commit does not fix the > race > condition completely, so I reuse the Fixes tag inside it. > >> And I suggest to merge 1/5 and 2/5 to make it easy to get the full >> picture. >> >>> Signed-off-by: Miaohe Lin >>> --- >>> include/linux/swap.h | 2 +- >>> mm/memory.c | 10 ++ >>> mm/swapfile.c| 28 +++- >>> 3 files changed, 22 insertions(+), 18 deletions(-) >>> >>> diff --git a/include/linux/swap.h b/include/linux/swap.h >>> index 849ba5265c11..9066addb57fd 100644 >>> --- a/include/linux/swap.h >>> +++ b/include/linux/swap.h >>> @@ -513,7 +513,7 @@ sector_t swap_page_sector(struct page *page); >>> >>> static inline void put_swap_device(struct swap_info_struct *si) >>> { >>> - rcu_read_unlock(); >>> + percpu_ref_put(>users); >>> } >>> >>> #else /* CONFIG_SWAP */ >>> diff --git a/mm/memory.c b/mm/memory.c >>> index cc71a445c76c..8543c47b955c 100644 >>> --- a/mm/memory.c >>> +++ b/mm/memory.c >>> @@ -3311,6 +3311,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) >>> { >>> struct vm_area_struct *vma = vmf->vma; >>> struct page *page = NULL, *swapcache; >>> + struct swap_info_struct *si = NULL; >>> swp_entry_t entry; >>> pte_t pte; >>> int locked; >>> @@ -3339,6 +3340,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) >>> } >>> >>> >> >> I suggest to add comments here as follows (words copy from Matthew Wilcox) >> >> /* Prevent swapoff from happening to us */ > > Ok. > >> >>> + si = get_swap_device(entry); >>> + /* In case we raced with swapoff. */ >>> + if (unlikely(!si)) >>> + goto out; >>> + >> >> Because we wrap the whole do_swap_page() with get/put_swap_device() >> now. We can remove several get/put_swap_device() for function called by >> do_swap_page(). That can be another optimization patch. > > I tried to remove several get/put_swap_device() for function called > by do_swap_page() only before I send this series. But it seems they have > other callers without proper get/put_swap_device(). Then we need to revise these callers instead. Anyway, can be another series. Best Regards, Huang, Ying
Re: [PATCH v3] nbd_genl_status: null check for nla_nest_start
From: Michal Kubecek Hi, I found that CVE-2019-16089 still exist in upstream kernel. Does anyone know why this patch was not merged?
Re: [PATCH 6/7] i915: Convert to verify_page_range()
On Mon, Apr 12, 2021 at 10:00:18AM +0200, Peter Zijlstra wrote: > check_{present,absent}() only need R/O access, use verify_page_range() > instead to remove modular use of apply_to_page_range(). > > Signed-off-by: Peter Zijlstra (Intel) > --- > drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 16 > 1 file changed, 8 insertions(+), 8 deletions(-) > > --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c > +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c > @@ -1225,9 +1225,9 @@ static int igt_mmap_gpu(void *arg) > return 0; > } > > -static int check_present_pte(pte_t *pte, unsigned long addr, void *data) > +static int check_present_pte(pte_t pte, unsigned long addr, void *data) > { > - if (!pte_present(*pte) || pte_none(*pte)) { > + if (!pte_present(pte) || pte_none(pte)) { > pr_err("missing PTE:%lx\n", > (addr - (unsigned long)data) >> PAGE_SHIFT); > return -EINVAL; > @@ -1236,9 +1236,9 @@ static int check_present_pte(pte_t *pte, > return 0; > } > > -static int check_absent_pte(pte_t *pte, unsigned long addr, void *data) > +static int check_absent_pte(pte_t pte, unsigned long addr, void *data) > { > - if (pte_present(*pte) && !pte_none(*pte)) { > + if (pte_present(pte) && !pte_none(pte)) { > pr_err("present PTE:%lx; expected to be revoked\n", > (addr - (unsigned long)data) >> PAGE_SHIFT); > return -EINVAL; > @@ -1249,14 +1249,14 @@ static int check_absent_pte(pte_t *pte, > > static int check_present(unsigned long addr, unsigned long len) > { > - return apply_to_page_range(current->mm, addr, len, > -check_present_pte, (void *)addr); > + return verify_page_range(current->mm, addr, len, > + check_present_pte, (void *)addr); > } And this would be: static int check_present(unsigned long addr, unsigned long len) unsigned long fail; fail = verify_page_range(current->mm, addr, len, check_present_pte); if (fail) { pr_err("missing PTE:%lx\n", addr); return -EINVAL; } } (Oh, and I think I messed up the page shifting macro name in the earlier one...) -- Kees Cook
RE: [PATCH] arm64: dts: imx8mq-evk: add one regulator used to power up pcie phy
Hi Shawn: Regarding Lucas' advice, this patch should be split out and post for you to pick up into DT tree. Since the other two patches are accepted by PCIe tree now. Can you help to pick up this patch? Thanks in advanced. https://patchwork.kernel.org/project/linux-pci/patch/1616661882-26487-3-git-send-email-hongxing@nxp.com/ https://patchwork.kernel.org/project/linux-pci/patch/1617091701-6444-2-git-send-email-hongxing@nxp.com/ Best Regards Richard Zhu > -Original Message- > From: Richard Zhu > Sent: Wednesday, April 14, 2021 10:26 AM > To: shawn...@kernel.org > Cc: l.st...@pengutronix.de; dl-linux-imx ; > linux-arm-ker...@lists.infradead.org; linux-kernel@vger.kernel.org; Richard > Zhu > Subject: [PATCH] arm64: dts: imx8mq-evk: add one regulator used to power > up pcie phy > > Both 1.8v and 3.3v power supplies can be used by i.MX8MQ PCIe PHY. > In default, the PCIE_VPH voltage is suggested to be 1.8v refer to data sheet. > When PCIE_VPH is supplied by 3.3v in the HW schematic design, the > VREG_BYPASS bits of GPR registers should be cleared from default value 1b'1 > to 1b'0. Thus, the internal 3v3 to 1v8 translator would be turned on. > > Signed-off-by: Richard Zhu > Reviewed-by: Lucas Stach > --- > arch/arm64/boot/dts/freescale/imx8mq-evk.dts | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/arch/arm64/boot/dts/freescale/imx8mq-evk.dts > b/arch/arm64/boot/dts/freescale/imx8mq-evk.dts > index 85b045253a0e..4d2035e3dd7c 100644 > --- a/arch/arm64/boot/dts/freescale/imx8mq-evk.dts > +++ b/arch/arm64/boot/dts/freescale/imx8mq-evk.dts > @@ -318,6 +318,7 @@ >< IMX8MQ_CLK_PCIE1_PHY>, ><_refclk>; > clock-names = "pcie", "pcie_aux", "pcie_phy", "pcie_bus"; > + vph-supply = <_reg>; > status = "okay"; > }; > > -- > 2.17.1
Re: [PATCH 4/7] mm: Introduce verify_page_range()
On Tue, Apr 13, 2021 at 09:36:32AM +0200, Peter Zijlstra wrote: > On Mon, Apr 12, 2021 at 01:05:09PM -0700, Kees Cook wrote: > > On Mon, Apr 12, 2021 at 10:00:16AM +0200, Peter Zijlstra wrote: > > > +struct vpr_data { > > > + int (*fn)(pte_t pte, unsigned long addr, void *data); > > > + void *data; > > > +}; > > > > Eeerg. This is likely to become an attack target itself. Stored function > > pointer with stored (3rd) argument. > > > > This doesn't seem needed: only DRM uses it, and that's for error > > reporting. I'd rather plumb back errors in a way to not have to add > > another place in the kernel where we do func+arg stored calling. > > Is this any better? It does have the stored pointer, but not a stored > argument, assuming you don't count returns as arguments I suppose. It's better in the sense that it's not the func/arg pair that really bugs me, yes. :) > The alternative is refactoring apply_to_page_range() :-/ Yeah, I'm looking now, I see what you mean. > --- > > struct vpr_data { > bool (*fn)(pte_t pte, unsigned long addr); > unsigned long addr; > }; > > static int vpr_fn(pte_t *pte, unsigned long addr, void *data) > { > struct vpr_data *vpr = data; > if (!vpr->fn(*pte, addr)) { > vpr->addr = addr; > return -EINVAL; > } > return 0; > } My point about passing "addr" was that nothing in the callback actually needs it -- the top level can just as easily report the error. And that the helper is always vpr_fn(), so it doesn't need to be passed either. So the addr can just be encoded in "int", and no structure is needed at: typedef bool (*vpr_fn_t)(pte_t pte); static int vpr_fn(pte_t *pte, unsigned long addr, void *data) { vpr_fn_t callback = data; if (!callback(*pte)) return addr >> PAGE_SIZE; return 0; } unsigned long verify_page_range(struct mm_struct *mm, unsigned long addr, unsigned long size, vpr_fn_t callback) { return apply_to_page_range(mm, addr, size, vpr_fn, callback) << PAGE_SIZE; } But maybe I'm missing something? -- Kees Cook
Re: [v2 PATCH 6/7] mm: migrate: check mapcount for THP instead of ref count
Yang Shi writes: > The generic migration path will check refcount, so no need check refcount > here. > But the old code actually prevents from migrating shared THP (mapped by > multiple > processes), so bail out early if mapcount is > 1 to keep the behavior. What prevents us from migrating shared THP? If no, why not just remove the old refcount checking? Best Regards, Huang, Ying > Signed-off-by: Yang Shi > --- > mm/migrate.c | 16 > 1 file changed, 4 insertions(+), 12 deletions(-) > > diff --git a/mm/migrate.c b/mm/migrate.c > index a72994c68ec6..dc7cc7f3a124 100644 > --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -2067,6 +2067,10 @@ static int numamigrate_isolate_page(pg_data_t *pgdat, > struct page *page) > > VM_BUG_ON_PAGE(compound_order(page) && !PageTransHuge(page), page); > > + /* Do not migrate THP mapped by multiple processes */ > + if (PageTransHuge(page) && page_mapcount(page) > 1) > + return 0; > + > /* Avoid migrating to a node that is nearly full */ > if (!migrate_balanced_pgdat(pgdat, compound_nr(page))) > return 0; > @@ -2074,18 +2078,6 @@ static int numamigrate_isolate_page(pg_data_t *pgdat, > struct page *page) > if (isolate_lru_page(page)) > return 0; > > - /* > - * migrate_misplaced_transhuge_page() skips page migration's usual > - * check on page_count(), so we must do it here, now that the page > - * has been isolated: a GUP pin, or any other pin, prevents migration. > - * The expected page count is 3: 1 for page's mapcount and 1 for the > - * caller's pin and 1 for the reference taken by isolate_lru_page(). > - */ > - if (PageTransHuge(page) && page_count(page) != 3) { > - putback_lru_page(page); > - return 0; > - } > - > page_lru = page_is_file_lru(page); > mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + page_lru, > thp_nr_pages(page));
Re: [PATCH RFC 1/6] dcache: sweep cached negative dentries to the end of list of siblings
On Thu, Jan 21, 2021 at 06:49:40PM +0530, Gautham Ananthakrishna wrote: > From: Konstantin Khlebnikov > > For disk filesystems result of every negative lookup is cached, content of > directories is usually cached too. Production of negative dentries isn't > limited with disk speed. It's really easy to generate millions of them if > system has enough memory. Negative dentries are linked into siblings list > along with normal positive dentries. Some operations walks dcache tree but > looks only for positive dentries: most important is fsnotify/inotify. > > This patch moves negative dentries to the end of list at final dput() and > marks with flag which tells that all following dentries are negative too. > Reverse operation is required before instantiating negative dentry. > +static void sweep_negative(struct dentry *dentry) > +{ > + struct dentry *parent; > + > + if (!d_is_tail_negative(dentry)) { > + parent = lock_parent(dentry); > + if (!parent) > + return; > + > + if (!d_count(dentry) && d_is_negative(dentry) && > + !d_is_tail_negative(dentry)) { > + dentry->d_flags |= DCACHE_TAIL_NEGATIVE; > + list_move_tail(>d_child, >d_subdirs); > + } > + > + spin_unlock(>d_lock); > + } > +} Ugh... So when dput() drives the refcount down to 0 you hit lock_parent() and only then bother to check if the sucker had been negative in the first place? > @@ -1970,6 +2021,8 @@ void d_instantiate(struct dentry *entry, struct inode * > inode) > { > BUG_ON(!hlist_unhashed(>d_u.d_alias)); > if (inode) { > + if (d_is_tail_negative(entry)) > + recycle_negative(entry); > security_d_instantiate(entry, inode); > spin_lock(>i_lock); > __d_instantiate(entry, inode); Wait a bloody minute. What about d_instantiate_new() right next to it?
Re: [PATCH 2/5] swap: fix do_swap_page() race with swapoff
On 2021/4/13 9:27, Huang, Ying wrote: > Miaohe Lin writes: > >> When I was investigating the swap code, I found the below possible race >> window: >> >> CPU 1CPU 2 >> -- >> do_swap_page >> synchronous swap_readpage >> alloc_page_vma >> swapoff >>release swap_file, bdev, or ... >> swap_readpage >> check sis->flags is ok >>access swap_file, bdev...[oops!] >> si->flags = 0 >> >> Using current get/put_swap_device() to guard against concurrent swapoff for >> swap_readpage() looks terrible because swap_readpage() may take really long >> time. And this race may not be really pernicious because swapoff is usually >> done when system shutdown only. To reduce the performance overhead on the >> hot-path as much as possible, it appears we can use the percpu_ref to close >> this race window(as suggested by Huang, Ying). >> >> Fixes: 235b62176712 ("mm/swap: add cluster lock") > > This isn't the commit that introduces the race. You can use `git blame` > find out the correct commit. For this it's commit 0bcac06f27d7 "mm, > swap: skip swapcache for swapin of synchronous device". > Sorry about it! What I refer to is commit eb085574a752 ("mm, swap: fix race between swapoff and some swap operations"). And I think this commit does not fix the race condition completely, so I reuse the Fixes tag inside it. > And I suggest to merge 1/5 and 2/5 to make it easy to get the full > picture. > >> Signed-off-by: Miaohe Lin >> --- >> include/linux/swap.h | 2 +- >> mm/memory.c | 10 ++ >> mm/swapfile.c| 28 +++- >> 3 files changed, 22 insertions(+), 18 deletions(-) >> >> diff --git a/include/linux/swap.h b/include/linux/swap.h >> index 849ba5265c11..9066addb57fd 100644 >> --- a/include/linux/swap.h >> +++ b/include/linux/swap.h >> @@ -513,7 +513,7 @@ sector_t swap_page_sector(struct page *page); >> >> static inline void put_swap_device(struct swap_info_struct *si) >> { >> -rcu_read_unlock(); >> +percpu_ref_put(>users); >> } >> >> #else /* CONFIG_SWAP */ >> diff --git a/mm/memory.c b/mm/memory.c >> index cc71a445c76c..8543c47b955c 100644 >> --- a/mm/memory.c >> +++ b/mm/memory.c >> @@ -3311,6 +3311,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) >> { >> struct vm_area_struct *vma = vmf->vma; >> struct page *page = NULL, *swapcache; >> +struct swap_info_struct *si = NULL; >> swp_entry_t entry; >> pte_t pte; >> int locked; >> @@ -3339,6 +3340,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) >> } >> >> > > I suggest to add comments here as follows (words copy from Matthew Wilcox) > > /* Prevent swapoff from happening to us */ Ok. > >> +si = get_swap_device(entry); >> +/* In case we raced with swapoff. */ >> +if (unlikely(!si)) >> +goto out; >> + > > Because we wrap the whole do_swap_page() with get/put_swap_device() > now. We can remove several get/put_swap_device() for function called by > do_swap_page(). That can be another optimization patch. I tried to remove several get/put_swap_device() for function called by do_swap_page() only before I send this series. But it seems they have other callers without proper get/put_swap_device(). > >> delayacct_set_flag(DELAYACCT_PF_SWAPIN); >> page = lookup_swap_cache(entry, vma, vmf->address); >> swapcache = page; >> @@ -3514,6 +3520,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) >> unlock: >> pte_unmap_unlock(vmf->pte, vmf->ptl); >> out: >> +if (si) >> +put_swap_device(si); >> return ret; >> out_nomap: >> pte_unmap_unlock(vmf->pte, vmf->ptl); >> @@ -3525,6 +3533,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) >> unlock_page(swapcache); >> put_page(swapcache); >> } >> +if (si) >> +put_swap_device(si); >> return ret; >> } >> >> diff --git a/mm/swapfile.c b/mm/swapfile.c >> index 724173cd7d0c..01032c72ceae 100644 >> --- a/mm/swapfile.c >> +++ b/mm/swapfile.c >> @@ -1280,18 +1280,12 @@ static unsigned char __swap_entry_free_locked(struct >> swap_info_struct *p, >> * via preventing the swap device from being swapoff, until >> * put_swap_device() is called. Otherwise return NULL. >> * >> - * The entirety of the RCU read critical section must come before the >> - * return from or after the call to synchronize_rcu() in >> - * enable_swap_info() or swapoff(). So if "si->flags & SWP_VALID" is >> - * true, the si->map, si->cluster_info, etc. must be valid in the >> - * critical section. >> - * >> * Notice that swapoff or swapoff+swapon can still happen before the >> - * rcu_read_lock() in get_swap_device() or after the rcu_read_unlock() >> - * in put_swap_device() if there isn't any other way to prevent
[PATCH] mm: Define ARCH_HAS_FIRST_USER_ADDRESS
Currently most platforms define FIRST_USER_ADDRESS as 0UL duplicating the same code all over. Instead define a new option ARCH_HAS_FIRST_USER_ADDRESS for those platforms which would override generic default FIRST_USER_ADDRESS value 0UL. This makes it much cleaner with reduced code. Cc: linux-al...@vger.kernel.org Cc: linux-snps-...@lists.infradead.org Cc: linux-arm-ker...@lists.infradead.org Cc: linux-c...@vger.kernel.org Cc: linux-hexa...@vger.kernel.org Cc: linux-i...@vger.kernel.org Cc: linux-m...@lists.linux-m68k.org Cc: linux-m...@vger.kernel.org Cc: openr...@lists.librecores.org Cc: linux-par...@vger.kernel.org Cc: linuxppc-...@lists.ozlabs.org Cc: linux-ri...@lists.infradead.org Cc: linux-s...@vger.kernel.org Cc: linux...@vger.kernel.org Cc: sparcli...@vger.kernel.org Cc: linux...@lists.infradead.org Cc: linux-xte...@linux-xtensa.org Cc: x...@kernel.org Cc: linux...@kvack.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual --- arch/alpha/include/asm/pgtable.h | 1 - arch/arc/include/asm/pgtable.h | 6 -- arch/arm/Kconfig | 1 + arch/arm64/include/asm/pgtable.h | 2 -- arch/csky/include/asm/pgtable.h | 1 - arch/hexagon/include/asm/pgtable.h | 3 --- arch/ia64/include/asm/pgtable.h | 1 - arch/m68k/include/asm/pgtable_mm.h | 1 - arch/microblaze/include/asm/pgtable.h| 2 -- arch/mips/include/asm/pgtable-32.h | 1 - arch/mips/include/asm/pgtable-64.h | 1 - arch/nds32/Kconfig | 1 + arch/nios2/include/asm/pgtable.h | 2 -- arch/openrisc/include/asm/pgtable.h | 1 - arch/parisc/include/asm/pgtable.h| 2 -- arch/powerpc/include/asm/book3s/pgtable.h| 1 - arch/powerpc/include/asm/nohash/32/pgtable.h | 1 - arch/powerpc/include/asm/nohash/64/pgtable.h | 2 -- arch/riscv/include/asm/pgtable.h | 2 -- arch/s390/include/asm/pgtable.h | 2 -- arch/sh/include/asm/pgtable.h| 2 -- arch/sparc/include/asm/pgtable_32.h | 1 - arch/sparc/include/asm/pgtable_64.h | 3 --- arch/um/include/asm/pgtable-2level.h | 1 - arch/um/include/asm/pgtable-3level.h | 1 - arch/x86/include/asm/pgtable_types.h | 2 -- arch/xtensa/include/asm/pgtable.h| 1 - include/linux/mm.h | 4 mm/Kconfig | 4 29 files changed, 10 insertions(+), 43 deletions(-) diff --git a/arch/alpha/include/asm/pgtable.h b/arch/alpha/include/asm/pgtable.h index 8d856c62e22a..1a2fb0dc905b 100644 --- a/arch/alpha/include/asm/pgtable.h +++ b/arch/alpha/include/asm/pgtable.h @@ -46,7 +46,6 @@ struct vm_area_struct; #define PTRS_PER_PMD (1UL << (PAGE_SHIFT-3)) #define PTRS_PER_PGD (1UL << (PAGE_SHIFT-3)) #define USER_PTRS_PER_PGD (TASK_SIZE / PGDIR_SIZE) -#define FIRST_USER_ADDRESS 0UL /* Number of pointers that fit on a page: this will go away. */ #define PTRS_PER_PAGE (1UL << (PAGE_SHIFT-3)) diff --git a/arch/arc/include/asm/pgtable.h b/arch/arc/include/asm/pgtable.h index 163641726a2b..a9fabfb70664 100644 --- a/arch/arc/include/asm/pgtable.h +++ b/arch/arc/include/asm/pgtable.h @@ -228,12 +228,6 @@ */ #defineUSER_PTRS_PER_PGD (TASK_SIZE / PGDIR_SIZE) -/* - * No special requirements for lowest virtual address we permit any user space - * mapping to be mapped at. - */ -#define FIRST_USER_ADDRESS 0UL - / * Bucket load of VM Helpers diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 5da96f5df48f..ad086e6d7155 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -7,6 +7,7 @@ config ARM select ARCH_HAS_DEBUG_VIRTUAL if MMU select ARCH_HAS_DMA_WRITE_COMBINE if !ARM_DMA_MEM_BUFFERABLE select ARCH_HAS_ELF_RANDOMIZE + select ARCH_HAS_FIRST_USER_ADDRESS select ARCH_HAS_FORTIFY_SOURCE select ARCH_HAS_KEEPINITRD select ARCH_HAS_KCOV diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 47027796c2f9..f6ab8b64967e 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -26,8 +26,6 @@ #define vmemmap((struct page *)VMEMMAP_START - (memstart_addr >> PAGE_SHIFT)) -#define FIRST_USER_ADDRESS 0UL - #ifndef __ASSEMBLY__ #include diff --git a/arch/csky/include/asm/pgtable.h b/arch/csky/include/asm/pgtable.h index 0d60367b6bfa..151607ed5158 100644 --- a/arch/csky/include/asm/pgtable.h +++ b/arch/csky/include/asm/pgtable.h @@ -14,7 +14,6 @@ #define PGDIR_MASK (~(PGDIR_SIZE-1)) #define USER_PTRS_PER_PGD (PAGE_OFFSET/PGDIR_SIZE) -#define FIRST_USER_ADDRESS 0UL /* * C-SKY is two-level paging structure: diff --git a/arch/hexagon/include/asm/pgtable.h b/arch/hexagon/include/asm/pgtable.h index
[PATCH v2] drivers: pinctrl: qcom: fix Kconfig dependency on GPIOLIB
When PINCTRL_MSM is enabled, and GPIOLIB is disabled, Kbuild gives the following warning: WARNING: unmet direct dependencies detected for GPIOLIB_IRQCHIP Depends on [n]: GPIOLIB [=n] Selected by [y]: - PINCTRL_MSM [=y] && PINCTRL [=y] && (ARCH_QCOM || COMPILE_TEST [=y]) This is because PINCTRL_MSM selects GPIOLIB_IRQCHIP, without selecting or depending on GPIOLIB, despite GPIOLIB_IRQCHIP depending on GPIOLIB. Having PINCTRL_MSM select GPIOLIB will cause a recursive dependency error. Signed-off-by: Julian Braha --- drivers/pinctrl/qcom/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/pinctrl/qcom/Kconfig b/drivers/pinctrl/qcom/Kconfig index 6853a896c476..d42ac59875ab 100644 --- a/drivers/pinctrl/qcom/Kconfig +++ b/drivers/pinctrl/qcom/Kconfig @@ -3,7 +3,7 @@ if (ARCH_QCOM || COMPILE_TEST) config PINCTRL_MSM tristate "Qualcomm core pin controller driver" - depends on QCOM_SCM || !QCOM_SCM #if QCOM_SCM=m this can't be =y + depends on GPIOLIB && (QCOM_SCM || !QCOM_SCM) #if QCOM_SCM=m this can't be =y select PINMUX select PINCONF select GENERIC_PINCONF -- 2.27.0
Re: [PATCH 4.19 00/66] 4.19.187-rc1 review
On 2021/4/12 16:40, Greg Kroah-Hartman wrote: This is the start of the stable review cycle for the 4.19.187 release. There are 66 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know. Responses should be made by Wed, 14 Apr 2021 08:39:44 +. Anything received after that time might be too late. The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.187-rc1.gz or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y and the diffstat can be found below. thanks, greg k-h Tested on arm64 and x86 for 4.19.187-rc1, Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git Branch: linux-4.19.y Version: 4.19.187-rc1 Commit: 85bc28045cdbb9576907965c761445aaece4f5ad Compiler: gcc version 7.3.0 (GCC) arm64: Testcase Result Summary: total: 5223 passed: 5223 failed: 0 timeout: 0 x86: Testcase Result Summary: total: 5223 passed: 5223 failed: 0 timeout: 0 Tested-by: Hulk Robot
Re: [RFC PATCH v2 2/2] kvm/arm64: Try stage2 block mapping for host device MMIO
Hi Marc, I think I have fully tested this patch. The next step is to do some restriction on HVA in vfio module, so we can build block mapping for it with a higher probability. Is there anything to improve? If not, could you apply it? ^_^ Thanks, Keqian On 2021/4/7 21:18, Marc Zyngier wrote: > On Tue, 16 Mar 2021 13:43:38 +, > Keqian Zhu wrote: >> >> The MMIO region of a device maybe huge (GB level), try to use >> block mapping in stage2 to speedup both map and unmap. >> >> Compared to normal memory mapping, we should consider two more >> points when try block mapping for MMIO region: >> >> 1. For normal memory mapping, the PA(host physical address) and >> HVA have same alignment within PUD_SIZE or PMD_SIZE when we use >> the HVA to request hugepage, so we don't need to consider PA >> alignment when verifing block mapping. But for device memory >> mapping, the PA and HVA may have different alignment. >> >> 2. For normal memory mapping, we are sure hugepage size properly >> fit into vma, so we don't check whether the mapping size exceeds >> the boundary of vma. But for device memory mapping, we should pay >> attention to this. >> >> This adds device_rough_page_shift() to check these two points when >> selecting block mapping size. >> >> Signed-off-by: Keqian Zhu >> --- >> >> Mainly for RFC, not fully tested. I will fully test it when the >> code logic is well accepted. >> >> --- >> arch/arm64/kvm/mmu.c | 42 ++ >> 1 file changed, 38 insertions(+), 4 deletions(-) >> >> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c >> index c59af5ca01b0..224aa15eb4d9 100644 >> --- a/arch/arm64/kvm/mmu.c >> +++ b/arch/arm64/kvm/mmu.c >> @@ -624,6 +624,36 @@ static void kvm_send_hwpoison_signal(unsigned long >> address, short lsb) >> send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb, current); >> } >> >> +/* >> + * Find a mapping size that properly insides the intersection of vma and >> + * memslot. And hva and pa have the same alignment to this mapping size. >> + * It's rough because there are still other restrictions, which will be >> + * checked by the following fault_supports_stage2_huge_mapping(). > > I don't think these restrictions make complete sense to me. If this is > a PFNMAP VMA, we should use the biggest mapping size that covers the > VMA, and not more than the VMA. > >> + */ >> +static short device_rough_page_shift(struct kvm_memory_slot *memslot, >> + struct vm_area_struct *vma, >> + unsigned long hva) >> +{ >> +size_t size = memslot->npages * PAGE_SIZE; >> +hva_t sec_start = max(memslot->userspace_addr, vma->vm_start); >> +hva_t sec_end = min(memslot->userspace_addr + size, vma->vm_end); >> +phys_addr_t pa = (vma->vm_pgoff << PAGE_SHIFT) + (hva - vma->vm_start); >> + >> +#ifndef __PAGETABLE_PMD_FOLDED >> +if ((hva & (PUD_SIZE - 1)) == (pa & (PUD_SIZE - 1)) && >> +ALIGN_DOWN(hva, PUD_SIZE) >= sec_start && >> +ALIGN(hva, PUD_SIZE) <= sec_end) >> +return PUD_SHIFT; >> +#endif >> + >> +if ((hva & (PMD_SIZE - 1)) == (pa & (PMD_SIZE - 1)) && >> +ALIGN_DOWN(hva, PMD_SIZE) >= sec_start && >> +ALIGN(hva, PMD_SIZE) <= sec_end) >> +return PMD_SHIFT; >> + >> +return PAGE_SHIFT; >> +} >> + >> static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot >> *memslot, >> unsigned long hva, >> unsigned long map_size) >> @@ -769,7 +799,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, >> phys_addr_t fault_ipa, >> return -EFAULT; >> } >> >> -/* Let's check if we will get back a huge page backed by hugetlbfs */ >> +/* >> + * Let's check if we will get back a huge page backed by hugetlbfs, or >> + * get block mapping for device MMIO region. >> + */ >> mmap_read_lock(current->mm); >> vma = find_vma_intersection(current->mm, hva, hva + 1); >> if (unlikely(!vma)) { >> @@ -780,11 +813,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, >> phys_addr_t fault_ipa, >> >> if (is_vm_hugetlb_page(vma)) >> vma_shift = huge_page_shift(hstate_vma(vma)); >> +else if (vma->vm_flags & VM_PFNMAP) >> +vma_shift = device_rough_page_shift(memslot, vma, hva); >> else >> vma_shift = PAGE_SHIFT; >> >> -if (logging_active || >> -(vma->vm_flags & VM_PFNMAP)) { >> +if (logging_active) { >> force_pte = true; >> vma_shift = PAGE_SHIFT; > > But why should we downgrade to page-size mappings if logging? This is > a device, and you aren't moving the device around, are you? Or is your > device actually memory with a device mapping that you are trying to > migrate? > >> } >> @@ -855,7 +889,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, >> phys_addr_t fault_ipa, >>
Re: [PATCH] hrtimer: Update softirq_expires_next correctly after __hrtimer_get_next_event()
On Wed, Apr 14, 2021 at 2:14 AM Greg KH wrote: > To give context, the commit is now 46eb1701c046 ("hrtimer: Update > softirq_expires_next correctly after __hrtimer_get_next_event()") and is > attached below. > > The f_ncm.c driver is doing a lot of calls to hrtimer_start() with mode > HRTIMER_MODE_REL_SOFT for I think every packet it gets. If that should > not be happening, we can easily fix it but I guess no one has actually > had fast USB devices that noticed this until now :) AIUI the purpose of this timer is to support packet aggregation. USB transfers are relatively expensive/high latency. 6 Gbps is 500k 1500-byte packets per second, or one every 2us. So f_ncm buffers as many packets as will fit into 16k (usually, 10 1500-byte packets), and only initiates a USB transfer when those packets have arrived. That ends up doing only one transfer every 20us. It sets a 300us timer to ensure that if the 10 packets haven't arrived, it still sends out whatever it has when the timer fires. The timer is set/updated on every packet buffered by ncm. Is this somehow much more expensive in 5.10.24 than it was before? Even if this driver is somehow "holding it wrong", might there not be other workloads that have a similar problem? What about regressions on those workloads?