date:20210413

We should return an error code if the timing mode is not acknowledged
by the NAND chip.

Fixes: 415ae78ffb5d ("mtd: rawnand: check ONFI timings have been acked by the 
chip")
Signed-off-by: Dan Carpenter 
---
>From static analysis.  Not tested.

 drivers/mtd/nand/raw/nand_base.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c
index fb072c95..d83c0503f96f 100644
--- a/drivers/mtd/nand/raw/nand_base.c
+++ b/drivers/mtd/nand/raw/nand_base.c
@@ -880,6 +880,7 @@ static int nand_setup_interface(struct nand_chip *chip, int 
chipnr)
if (tmode_param[0] != chip->best_interface_config->timings.mode) {
pr_warn("timing mode %d not acknowledged by the NAND chip\n",
chip->best_interface_config->timings.mode);
+   ret = -EINVAL;
goto err_reset_chip;
}
 
-- 
2.30.2

Re: [syzbot] KASAN: use-after-free Read in get_wchan

2021-04-13 Thread Dmitry Vyukov

On Wed, Apr 14, 2021 at 7:52 AM syzbot
 wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:b2b3d18f riscv: Make NUMA depend on MMU
> git tree:   git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git 
> fixes
> console output: https://syzkaller.appspot.com/x/log.txt?x=12b59d16d0
> kernel config:  https://syzkaller.appspot.com/x/.config?x=81b3e7c68dad6e
> dashboard link: https://syzkaller.appspot.com/bug?extid=0806291048161061627c
> userspace arch: riscv64
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+08062910481610616...@syzkaller.appspotmail.com
>
> ==
> BUG: KASAN: use-after-free in walk_stackframe 
> arch/riscv/kernel/stacktrace.c:60 [inline]
> BUG: KASAN: use-after-free in get_wchan+0x156/0x196 
> arch/riscv/kernel/stacktrace.c:136
> Read of size 8 at addr ffe0058e3d90 by task syz-executor.0/4667
>
> CPU: 1 PID: 4667 Comm: syz-executor.0 Not tainted 
> 5.12.0-rc5-syzkaller-00721-gb2b3d18fc20e #0
> Hardware name: riscv-virtio,qemu (DT)
> Call Trace:
> [] walk_stackframe+0x0/0x23c arch/riscv/kernel/traps.c:201
> [] dump_backtrace+0x40/0x4e 
> arch/riscv/kernel/stacktrace.c:113
> [] show_stack+0x22/0x2e arch/riscv/kernel/stacktrace.c:118
> [] __dump_stack lib/dump_stack.c:79 [inline]
> [] dump_stack+0x148/0x1d8 lib/dump_stack.c:120
> [] print_address_description.constprop.0+0x52/0x31e 
> mm/kasan/report.c:232
> [] __kasan_report mm/kasan/report.c:399 [inline]
> [] kasan_report+0x16e/0x18c mm/kasan/report.c:416
> [] check_region_inline mm/kasan/generic.c:180 [inline]
> [] __asan_load8+0x6e/0x80 mm/kasan/generic.c:253
> [] walk_stackframe arch/riscv/kernel/stacktrace.c:60 
> [inline]

If it's walking the stack of another task, then it needs to use
READ_ONCE_NOCHECK. See x86/arm64/s390 for examples:
https://elixir.bootlin.com/linux/v5.12-rc7/A/ident/READ_ONCE_NOCHECK

> [] get_wchan+0x156/0x196 arch/riscv/kernel/stacktrace.c:136
> [] proc_pid_wchan+0x48/0xa4 fs/proc/base.c:390
> [] proc_single_show+0x9c/0x13c fs/proc/base.c:774
> [] seq_read_iter+0x2e0/0x8f2 fs/seq_file.c:227
> [] seq_read+0x200/0x298 fs/seq_file.c:159
> [] vfs_read+0x108/0x2ac fs/read_write.c:494
> [] ksys_read+0xb4/0x1b8 fs/read_write.c:634
> [] __do_sys_read fs/read_write.c:644 [inline]
> [] sys_read+0x28/0x36 fs/read_write.c:642
> [] ret_from_syscall+0x0/0x2
>
> The buggy address belongs to the page:
> page:ffcf0216b8c0 refcount:0 mapcount:0 mapping: 
> index:0x0 pfn:0x85ae3
> flags: 0xffe()
> raw: 0ffe ffcf0216b8c8 ffcf0216b8c8 
> raw:    
> page dumped because: kasan: bad access detected
>
> Memory state around the buggy address:
>  ffe0058e3c80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>  ffe0058e3d00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >ffe0058e3d80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>  ^
>  ffe0058e3e00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>  ffe0058e3e80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ==
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkal...@googlegroups.com.
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>
> --
> You received this message because you are subscribed to the Google Groups 
> "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to syzkaller-bugs+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/syzkaller-bugs/9862e005bfe859c8%40google.com.

[PATCH 4/4] selftests: add tests for process_vm_exec

Output:
 $ make run_tests
 TAP version 13
 1..4
 # selftests: process_vm_exec: process_vm_exec
 # 1..1
 # ok 1 275 ns/syscall
 # # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
 ok 1 selftests: process_vm_exec: process_vm_exec
 # selftests: process_vm_exec: process_vm_exec_fault
 # 1..1
 # ok 1 789 ns/signal
 # # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
 ok 2 selftests: process_vm_exec: process_vm_exec_fault
 # selftests: process_vm_exec: ptrace_vm_exec
 # 1..1
 # ok 1 1378 ns/syscall# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
 ok 3 selftests: process_vm_exec: ptrace_vm_exec
 # selftests: process_vm_exec: process_vm_exec_syscall
 # 1..1
 # ok 1 write works as expectd
 # # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
 ok 4 selftests: process_vm_exec: process_vm_exec_syscall

Signed-off-by: Andrei Vagin 
---
 .../selftests/process_vm_exec/Makefile|   7 ++
 tools/testing/selftests/process_vm_exec/log.h |  26 
 .../process_vm_exec/process_vm_exec.c | 105 +
 .../process_vm_exec/process_vm_exec_fault.c   | 111 ++
 .../process_vm_exec/process_vm_exec_syscall.c |  81 +
 .../process_vm_exec/ptrace_vm_exec.c  | 111 ++
 6 files changed, 441 insertions(+)
 create mode 100644 tools/testing/selftests/process_vm_exec/Makefile
 create mode 100644 tools/testing/selftests/process_vm_exec/log.h
 create mode 100644 tools/testing/selftests/process_vm_exec/process_vm_exec.c
 create mode 100644 
tools/testing/selftests/process_vm_exec/process_vm_exec_fault.c
 create mode 100644 
tools/testing/selftests/process_vm_exec/process_vm_exec_syscall.c
 create mode 100644 tools/testing/selftests/process_vm_exec/ptrace_vm_exec.c

diff --git a/tools/testing/selftests/process_vm_exec/Makefile 
b/tools/testing/selftests/process_vm_exec/Makefile
new file mode 100644
index ..bdf7fcf0fdd3
--- /dev/null
+++ b/tools/testing/selftests/process_vm_exec/Makefile
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0
+
+UNAME_M := $(shell uname -m)
+TEST_GEN_PROGS_x86_64 := process_vm_exec process_vm_exec_fault ptrace_vm_exec 
process_vm_exec_syscall
+TEST_GEN_PROGS += $(TEST_GEN_PROGS_$(UNAME_M))
+
+include ../lib.mk
diff --git a/tools/testing/selftests/process_vm_exec/log.h 
b/tools/testing/selftests/process_vm_exec/log.h
new file mode 100644
index ..ef268c2cf2b8
--- /dev/null
+++ b/tools/testing/selftests/process_vm_exec/log.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __SELFTEST_PROCESS_VM_EXEC_LOG_H__
+#define __SELFTEST_PROCESS_VM_EXEC_LOG_H__
+
+#define pr_msg(fmt, lvl, ...)  \
+   ksft_print_msg("[%s] (%s:%d)\t" fmt "\n",   \
+   lvl, __FILE__, __LINE__, ##__VA_ARGS__)
+
+#define pr_p(func, fmt, ...)   func(fmt ": %m", ##__VA_ARGS__)
+
+#define pr_err(fmt, ...)   \
+   ({  \
+   ksft_test_result_error(fmt "\n", ##__VA_ARGS__);
\
+   -1; \
+   })
+
+#define pr_fail(fmt, ...)  \
+   ({  \
+   ksft_test_result_fail(fmt "\n", ##__VA_ARGS__); \
+   -1; \
+   })
+
+#define pr_perror(fmt, ...)pr_p(pr_err, fmt, ##__VA_ARGS__)
+
+#endif
diff --git a/tools/testing/selftests/process_vm_exec/process_vm_exec.c 
b/tools/testing/selftests/process_vm_exec/process_vm_exec.c
new file mode 100644
index ..aa4009c43e01
--- /dev/null
+++ b/tools/testing/selftests/process_vm_exec/process_vm_exec.c
@@ -0,0 +1,105 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "asm/unistd.h"
+#include 
+#include 
+
+#include "../kselftest.h"
+#include "log.h"
+
+#ifndef __NR_process_vm_exec
+#define __NR_process_vm_exec 441
+#endif
+
+#define TEST_SYSCALL 123
+#define TEST_SYSCALL_RET 456
+#define TEST_MARKER 789
+#define TEST_TIMEOUT 5
+#define TEST_STACK_SIZE 65536
+
+static inline long __syscall1(long n, long a1)
+{
+   unsigned long ret;
+
+   __asm__ __volatile__ ("syscall" : "=a"(ret) : "a"(n), "D"(a1) : "rcx", 
"r11", "memory");
+
+   return ret;
+}
+
+int marker;
+
+static void guest(void)
+{
+   while (1)
+   if (__syscall1(TEST_SYSCALL, marker) != TEST_SYSCALL_RET)
+   abort();
+}
+
+int main(int argc, char **argv)
+{
+   struct sigcontext ctx = {};
+   struct timespec start, cur;
+   int status, ret;
+   pid_t pid;
+   long sysnr;
+   void *stack;
+
+   ksft_set_plan(1);
+
+   stack = mmap(NULL, TEST_STACK_SIZE, PROT_READ | PROT_WRITE,

[PATCH 3/4] arch/x86: allow to execute syscalls via process_vm_exec

process_vm_exec allows to execute code in an address space of another
process. It changes the current address space to the target address
space and resume the current process with registers from sigcontex that
is passed in the arguments.

This changes adds the PROCESS_VM_EXEC_SYSCALL flag and if it is set
process_vm_exec will execute a system call with arguments from sigcontext.

process_vm_exec retuns 0 if the system call has been executed and an error
code in other cases.

A return code of the system call can be found in a proper register in
sigcontext.

Signed-off-by: Andrei Vagin 
---
 arch/x86/entry/common.c  |  5 -
 arch/x86/kernel/process_vm_exec.c| 29 +++-
 include/linux/entry-common.h |  2 ++
 include/linux/process_vm_exec.h  |  2 ++
 include/uapi/linux/process_vm_exec.h |  8 
 kernel/entry/common.c|  2 +-
 6 files changed, 45 insertions(+), 3 deletions(-)
 create mode 100644 include/uapi/linux/process_vm_exec.h

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 42eac459b25b..8de02ca19aca 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -40,7 +40,10 @@
 __visible noinstr void do_syscall_64(unsigned long nr, struct pt_regs *regs)
 {
 #ifdef CONFIG_PROCESS_VM_EXEC
-   if (current->exec_mm && current->exec_mm->ctx) {
+   struct exec_mm *exec_mm = current->exec_mm;
+
+   if (exec_mm && exec_mm->ctx &&
+   !(exec_mm->flags & PROCESS_VM_EXEC_SYSCALL)) {
kernel_siginfo_t info = {
.si_signo = SIGSYS,
.si_call_addr = (void __user *)KSTK_EIP(current),
diff --git a/arch/x86/kernel/process_vm_exec.c 
b/arch/x86/kernel/process_vm_exec.c
index 28b32330f744..9124b23f1e9b 100644
--- a/arch/x86/kernel/process_vm_exec.c
+++ b/arch/x86/kernel/process_vm_exec.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 static void swap_mm(struct mm_struct *prev_mm, struct mm_struct *target_mm)
@@ -73,7 +74,7 @@ SYSCALL_DEFINE6(process_vm_exec, pid_t, pid, struct 
sigcontext __user *, uctx,
 
sigset_t mask;
 
-   if (flags)
+   if (flags & ~PROCESS_VM_EXEC_SYSCALL)
return -EINVAL;
 
if (sizemask != sizeof(sigset_t))
@@ -97,6 +98,9 @@ SYSCALL_DEFINE6(process_vm_exec, pid_t, pid, struct 
sigcontext __user *, uctx,
}
 
current_pt_regs()->ax = 0;
+   if (flags & PROCESS_VM_EXEC_SYSCALL)
+   syscall_exit_to_user_mode_prepare(current_pt_regs());
+
ret = swap_vm_exec_context(uctx);
if (ret < 0)
goto err_mm_put;
@@ -117,6 +121,29 @@ SYSCALL_DEFINE6(process_vm_exec, pid_t, pid, struct 
sigcontext __user *, uctx,
mmgrab(prev_mm);
swap_mm(prev_mm, mm);
 
+   if (flags & PROCESS_VM_EXEC_SYSCALL) {
+   struct pt_regs *regs = current_pt_regs();
+   kernel_siginfo_t info;
+   int sysno;
+
+   regs->orig_ax = regs->ax;
+   regs->ax = -ENOSYS;
+   sysno = syscall_get_nr(current, regs);
+
+   do_syscall_64(sysno, regs);
+
+   restore_vm_exec_context(regs);
+   info.si_signo = SIGSYS;
+   info.si_call_addr = (void __user *)KSTK_EIP(current);
+   info.si_arch = syscall_get_arch(current);
+   info.si_syscall = sysno;
+   ret = copy_siginfo_to_user(current->exec_mm->siginfo, );
+   current_pt_regs()->orig_ax = __NR_process_vm_exec;
+   current_pt_regs()->ax = -ENOSYS;
+   syscall_enter_from_user_mode_work(current_pt_regs(), 
current_pt_regs()->orig_ax);
+   return ret;
+   }
+
ret = current_pt_regs()->ax;
 
return ret;
diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index 474f29638d2c..d0ebbe9ca9e4 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -285,6 +285,8 @@ static inline void arch_syscall_exit_tracehook(struct 
pt_regs *regs, bool step)
 }
 #endif
 
+void syscall_exit_to_user_mode_prepare(struct pt_regs *regs);
+
 /**
  * syscall_exit_to_user_mode - Handle work before returning to user mode
  * @regs:  Pointer to currents pt_regs
diff --git a/include/linux/process_vm_exec.h b/include/linux/process_vm_exec.h
index a02535fbd5c8..2e04b4875a92 100644
--- a/include/linux/process_vm_exec.h
+++ b/include/linux/process_vm_exec.h
@@ -2,6 +2,8 @@
 #ifndef _LINUX_PROCESS_VM_EXEC_H
 #define _LINUX_PROCESS_VM_EXEC_H
 
+#include 
+
 struct exec_mm {
struct sigcontext *ctx;
struct mm_struct *mm;
diff --git a/include/uapi/linux/process_vm_exec.h 
b/include/uapi/linux/process_vm_exec.h
new file mode 100644
index ..35465b5d3ebf
--- /dev/null
+++ b/include/uapi/linux/process_vm_exec.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+#ifndef _UAPI_LINUX_PROCESS_VM_EXEC_H

[PATCH 1/4] signal: add a helper to restore a process state from sigcontex

It will be used to implement process_vm_exec.

Signed-off-by: Andrei Vagin 
---
 arch/x86/kernel/signal.c | 78 ++--
 1 file changed, 43 insertions(+), 35 deletions(-)

diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index be0d7d4152ec..cc269a20dd5f 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -79,51 +79,43 @@ static void force_valid_ss(struct pt_regs *regs)
 # define CONTEXT_COPY_SIZE sizeof(struct sigcontext)
 #endif
 
-static int restore_sigcontext(struct pt_regs *regs,
- struct sigcontext __user *usc,
+static int __restore_sigcontext(struct pt_regs *regs,
+ struct sigcontext __user *sc,
  unsigned long uc_flags)
 {
-   struct sigcontext sc;
-
-   /* Always make any pending restarted system calls return -EINTR */
-   current->restart_block.fn = do_no_restart_syscall;
-
-   if (copy_from_user(, usc, CONTEXT_COPY_SIZE))
-   return -EFAULT;
-
 #ifdef CONFIG_X86_32
-   set_user_gs(regs, sc.gs);
-   regs->fs = sc.fs;
-   regs->es = sc.es;
-   regs->ds = sc.ds;
+   set_user_gs(regs, sc->gs);
+   regs->fs = sc->fs;
+   regs->es = sc->es;
+   regs->ds = sc->ds;
 #endif /* CONFIG_X86_32 */
 
-   regs->bx = sc.bx;
-   regs->cx = sc.cx;
-   regs->dx = sc.dx;
-   regs->si = sc.si;
-   regs->di = sc.di;
-   regs->bp = sc.bp;
-   regs->ax = sc.ax;
-   regs->sp = sc.sp;
-   regs->ip = sc.ip;
+   regs->bx = sc->bx;
+   regs->cx = sc->cx;
+   regs->dx = sc->dx;
+   regs->si = sc->si;
+   regs->di = sc->di;
+   regs->bp = sc->bp;
+   regs->ax = sc->ax;
+   regs->sp = sc->sp;
+   regs->ip = sc->ip;
 
 #ifdef CONFIG_X86_64
-   regs->r8 = sc.r8;
-   regs->r9 = sc.r9;
-   regs->r10 = sc.r10;
-   regs->r11 = sc.r11;
-   regs->r12 = sc.r12;
-   regs->r13 = sc.r13;
-   regs->r14 = sc.r14;
-   regs->r15 = sc.r15;
+   regs->r8 = sc->r8;
+   regs->r9 = sc->r9;
+   regs->r10 = sc->r10;
+   regs->r11 = sc->r11;
+   regs->r12 = sc->r12;
+   regs->r13 = sc->r13;
+   regs->r14 = sc->r14;
+   regs->r15 = sc->r15;
 #endif /* CONFIG_X86_64 */
 
/* Get CS/SS and force CPL3 */
-   regs->cs = sc.cs | 0x03;
-   regs->ss = sc.ss | 0x03;
+   regs->cs = sc->cs | 0x03;
+   regs->ss = sc->ss | 0x03;
 
-   regs->flags = (regs->flags & ~FIX_EFLAGS) | (sc.flags & FIX_EFLAGS);
+   regs->flags = (regs->flags & ~FIX_EFLAGS) | (sc->flags & FIX_EFLAGS);
/* disable syscall checks */
regs->orig_ax = -1;
 
@@ -136,10 +128,26 @@ static int restore_sigcontext(struct pt_regs *regs,
force_valid_ss(regs);
 #endif
 
-   return fpu__restore_sig((void __user *)sc.fpstate,
+   return fpu__restore_sig((void __user *)sc->fpstate,
   IS_ENABLED(CONFIG_X86_32));
 }
 
+static int restore_sigcontext(struct pt_regs *regs,
+ struct sigcontext __user *usc,
+ unsigned long uc_flags)
+{
+   struct sigcontext sc;
+
+   /* Always make any pending restarted system calls return -EINTR */
+   current->restart_block.fn = do_no_restart_syscall;
+
+   if (copy_from_user(, usc, CONTEXT_COPY_SIZE))
+   return -EFAULT;
+
+   return __restore_sigcontext(regs, , uc_flags);
+}
+
+
 static __always_inline int
 __unsafe_setup_sigcontext(struct sigcontext __user *sc, void __user *fpstate,
 struct pt_regs *regs, unsigned long mask)
-- 
2.29.2

[PATCH 2/4] arch/x86: implement the process_vm_exec syscall

This change introduces the new system call:
process_vm_exec(pid_t pid, struct sigcontext *uctx, unsigned long flags,
siginfo_t * uinfo, sigset_t *sigmask, size_t sizemask)

process_vm_exec allows to execute the current process in an address
space of another process.

process_vm_exec swaps the current address space with an address space of
a specified process, sets a state from sigcontex and resumes the process.
When a process receives a signal or calls a system call,
process_vm_exec saves the process state back to sigcontext, restores the
origin address space, restores the origin process state, and returns to
userspace.

If it was interrupted by a signal and the signal is in the user_mask,
the signal is dequeued and information about it is saved in uinfo.
If process_vm_exec is interrupted by a system call, a synthetic siginfo
for the SIGSYS signal is generated.

The behavior of this system call is similar to PTRACE_SYSEMU but
everything is happing in the context of one process, so
process_vm_exec shows a better performance.

PTRACE_SYSEMU is primarily used to implement sandboxes (application
kernels) like User-mode Linux or gVisor. These type of sandboxes
intercepts applications system calls and acts as the guest kernel.
A simple benchmark, where a "tracee" process executes systems calls in a
loop and a "tracer" process traps syscalls and handles them just
incrementing the tracee instruction pointer to skip the syscall
instruction shows that process_vm_exec works more than 5 times faster
than PTRACE_SYSEMU.

Signed-off-by: Andrei Vagin 
---
 arch/Kconfig   |  15 +++
 arch/x86/Kconfig   |   1 +
 arch/x86/entry/common.c|  16 +++
 arch/x86/entry/syscalls/syscall_64.tbl |   1 +
 arch/x86/include/asm/sigcontext.h  |   2 +
 arch/x86/kernel/Makefile   |   1 +
 arch/x86/kernel/process_vm_exec.c  | 133 +
 arch/x86/kernel/signal.c   |  47 +
 include/linux/process_vm_exec.h|  15 +++
 include/linux/sched.h  |   7 ++
 include/linux/syscalls.h   |   6 ++
 include/uapi/asm-generic/unistd.h  |   4 +-
 kernel/fork.c  |   9 ++
 kernel/sys_ni.c|   2 +
 14 files changed, 258 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kernel/process_vm_exec.c
 create mode 100644 include/linux/process_vm_exec.h

diff --git a/arch/Kconfig b/arch/Kconfig
index ba4e966484ab..3ed9b8fb1727 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -514,6 +514,21 @@ config SECCOMP_FILTER
 
  See Documentation/userspace-api/seccomp_filter.rst for details.
 
+config HAVE_ARCH_PROCESS_VM_EXEC
+   bool
+   help
+ An arch should select this symbol to support the process_vm_exec 
system call.
+
+config PROCESS_VM_EXEC
+   prompt "Enable the process_vm_exec syscall"
+   def_bool y
+   depends on HAVE_ARCH_PROCESS_VM_EXEC
+   help
+ process_vm_exec allows executing code and system calls in a specified
+ address space.
+
+ If unsure, say Y.
+
 config HAVE_ARCH_STACKLEAK
bool
help
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index fbf26e0f7a6a..1c7ebb58865e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -27,6 +27,7 @@ config X86_64
select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
select ARCH_USE_CMPXCHG_LOCKREF
select HAVE_ARCH_SOFT_DIRTY
+   select HAVE_ARCH_PROCESS_VM_EXEC
select MODULES_USE_ELF_RELA
select NEED_DMA_MAP_STATE
select SWIOTLB
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 870efeec8bda..42eac459b25b 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_XEN_PV
 #include 
@@ -38,6 +39,21 @@
 #ifdef CONFIG_X86_64
 __visible noinstr void do_syscall_64(unsigned long nr, struct pt_regs *regs)
 {
+#ifdef CONFIG_PROCESS_VM_EXEC
+   if (current->exec_mm && current->exec_mm->ctx) {
+   kernel_siginfo_t info = {
+   .si_signo = SIGSYS,
+   .si_call_addr = (void __user *)KSTK_EIP(current),
+   .si_arch = syscall_get_arch(current),
+   .si_syscall = nr,
+   };
+   restore_vm_exec_context(regs);
+   regs->ax = copy_siginfo_to_user(current->exec_mm->siginfo, 
);
+   syscall_exit_to_user_mode(regs);
+   return;
+   }
+#endif
+
nr = syscall_enter_from_user_mode(regs, nr);
 
instrumentation_begin();
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl 
b/arch/x86/entry/syscalls/syscall_64.tbl
index 379819244b91..2a8e27b2d87e 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -362,6 +362,7 @@
 438common  pidfd_getfd sys_pidfd_getfd
 439common

[PATCH 0/4 POC] Allow executing code and syscalls in another address space

We already have process_vm_readv and process_vm_writev to read and write
to a process memory faster than we can do this with ptrace. And now it
is time for process_vm_exec that allows executing code in an address
space of another process. We can do this with ptrace but it is much
slower.

= Use-cases =

Here are two known use-cases. The first one is “application kernel”
sandboxes like User-mode Linux and gVisor. In this case, we have a
process that runs the sandbox kernel and a set of stub processes that
are used to manage guest address spaces. Guest code is executed in the
context of stub processes but all system calls are intercepted and
handled in the sandbox kernel. Right now, these sort of sandboxes use
PTRACE_SYSEMU to trap system calls, but the process_vm_exec can
significantly speed them up.

Another use-case is CRIU (Checkpoint/Restore in User-space). Several
process properties can be received only from the process itself. Right
now, we use a parasite code that is injected into the process. We do
this with ptrace but it is slow, unsafe, and tricky. process_vm_exec can
simplify the process of injecting a parasite code and it will allow
pre-dump memory without stopping processes. The pre-dump here is when we
enable a memory tracker and dump the memory while a process is continue
running. On each interaction we dump memory that has been changed from
the previous iteration. In the final step, we will stop processes and
dump their full state. Right now the most effective way to dump process
memory is to create a set of pipes and splice memory into these pipes
from the parasite code. With process_vm_exec, we will be able to call
vmsplice directly. It means that we will not need to stop a process to
inject the parasite code.

= How it works =

process_vm_exec has two modes:

* Execute code in an address space of a target process and stop on any
  signal or system call.

* Execute a system call in an address space of a target process.

int process_vm_exec(pid_t pid, struct sigcontext uctx,
unsigned long flags, siginfo_t siginfo,
sigset_t  *sigmask, size_t sizemask)

PID - target process identification. We can consider to use pidfd
instead of PID here.

sigcontext contains a process state with what the process will be
resumed after switching the address space and then when a process will
be stopped, its sate will be saved back to sigcontext.

siginfo is information about a signal that has interrupted the process.
If a process is interrupted by a system call, signfo will contain a
synthetic siginfo of the SIGSYS signal.

sigmask is a set of signals that process_vm_exec returns via signfo.

# How fast is it

In the fourth patch, you can find two benchmarks that execute a function
that calls system calls in a loop. ptrace_vm_exe uses ptrace to trap
system calls, proces_vm_exec uses the process_vm_exec syscall to do the
same thing.

ptrace_vm_exec:   1446 ns/syscall
ptrocess_vm_exec:  289 ns/syscall

PS: This version is just a prototype. Its goal is to collect the initial
feedback, to discuss the interfaces, and maybe to get some advice on
implementation..

Cc: Andrew Morton 
Cc: Andy Lutomirski 
Cc: Anton Ivanov 
Cc: Christian Brauner 
Cc: Dmitry Safonov <0x7f454...@gmail.com>
Cc: Ingo Molnar 
Cc: Jeff Dike 
Cc: Mike Rapoport 
Cc: Michael Kerrisk (man-pages) 
Cc: Oleg Nesterov 
Cc: Peter Zijlstra 
Cc: Richard Weinberger 
Cc: Thomas Gleixner 

Andrei Vagin (4):
  signal: add a helper to restore a process state from sigcontex
  arch/x86: implement the process_vm_exec syscall
  arch/x86: allow to execute syscalls via process_vm_exec
  selftests: add tests for process_vm_exec

 arch/Kconfig  |  15 ++
 arch/x86/Kconfig  |   1 +
 arch/x86/entry/common.c   |  19 +++
 arch/x86/entry/syscalls/syscall_64.tbl|   1 +
 arch/x86/include/asm/sigcontext.h |   2 +
 arch/x86/kernel/Makefile  |   1 +
 arch/x86/kernel/process_vm_exec.c | 160 ++
 arch/x86/kernel/signal.c  | 125 ++
 include/linux/entry-common.h  |   2 +
 include/linux/process_vm_exec.h   |  17 ++
 include/linux/sched.h |   7 +
 include/linux/syscalls.h  |   6 +
 include/uapi/asm-generic/unistd.h |   4 +-
 include/uapi/linux/process_vm_exec.h  |   8 +
 kernel/entry/common.c |   2 +-
 kernel/fork.c |   9 +
 kernel/sys_ni.c   |   2 +
 .../selftests/process_vm_exec/Makefile|   7 +
 tools/testing/selftests/process_vm_exec/log.h |  26 +++
 .../process_vm_exec/process_vm_exec.c | 105 
 .../process_vm_exec/process_vm_exec_fault.c   | 111 
 .../process_vm_exec/process_vm_exec_syscall.c |  81 +
 .../process_vm_exec/ptrace_vm_exec.c

Re: [PATCH] riscv: locks: introduce ticket-based spinlock implementation

2021-04-13 Thread Guo Ren

On Tue, Apr 13, 2021 at 6:54 PM David Laight  wrote:
>
> From: Catalin Marinas
> > Sent: 13 April 2021 11:45
> ...
> > This indeed needs some care. IIUC RISC-V has similar restrictions as arm
> > here, no load/store instructions are allowed between LR and SC. You
> > can't guarantee that the compiler won't spill some variable onto the
> > stack.
>
> You can probably never guarantee the compiler won't spill to stack.
> Especially if someone compiles with -O0.
>
> Which probably means that anything using LR/SC must be written in
> asm and the C wrappers disabled.
Agree, and cmpxchg has been widely used in Linux. I think it's the
last requirement for complex atomic API, although cmpxchg has ABA
problem:

CPU0
  CPU1
===
==
do {
  old32 = load32;

   *ptr32 = new32_tmp;

   *ptr32 = old32;
  load32 = cmpxchg(ptr32, old32, new32); //still success
} while (load32 != old32);

That means cmpxhg only cares about the result but not the middle
situation. It's different from LR/SC or AMO instructions.

>
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 
> 1PT, UK
> Registration No: 1397386 (Wales)
>


-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/

[PATCH v7 3/9] regulator: add warning flags

Add 'warning' level events and error flags to regulator core.
Current regulator core notifications are used to inform consumers
about errors where HW is misbehaving in such way it is assumed to
be broken/unrecoverable.

There are PMICs which are designed for system(s) that may have use
for regulator indications sent before HW is damaged so that some
board/consumer specific recovery-event can be performed while
continuing most of the normal operations.

Add new WARNING level events and notifications to be used for
that purpose.

Signed-off-by: Matti Vaittinen 
---
No changes since RFC-v2
---
 include/linux/regulator/consumer.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/include/linux/regulator/consumer.h 
b/include/linux/regulator/consumer.h
index 20e84a84fb77..f72ca73631be 100644
--- a/include/linux/regulator/consumer.h
+++ b/include/linux/regulator/consumer.h
@@ -119,6 +119,16 @@ struct regulator_dev;
 #define REGULATOR_EVENT_PRE_DISABLE0x400
 #define REGULATOR_EVENT_ABORT_DISABLE  0x800
 #define REGULATOR_EVENT_ENABLE 0x1000
+/*
+ * Following notifications should be emitted only if detected condition
+ * is such that the HW is likely to still be working but consumers should
+ * take a recovery action to prevent problems esacalating into errors.
+ */
+#define REGULATOR_EVENT_UNDER_VOLTAGE_WARN 0x2000
+#define REGULATOR_EVENT_OVER_CURRENT_WARN  0x4000
+#define REGULATOR_EVENT_OVER_VOLTAGE_WARN  0x8000
+#define REGULATOR_EVENT_OVER_TEMP_WARN 0x1
+#define REGULATOR_EVENT_WARN_MASK  0x1E000
 
 /*
  * Regulator errors that can be queried using regulator_get_error_flags
@@ -138,6 +148,10 @@ struct regulator_dev;
 #define REGULATOR_ERROR_FAIL   BIT(4)
 #define REGULATOR_ERROR_OVER_TEMP  BIT(5)
 
+#define REGULATOR_ERROR_UNDER_VOLTAGE_WARN BIT(6)
+#define REGULATOR_ERROR_OVER_CURRENT_WARN  BIT(7)
+#define REGULATOR_ERROR_OVER_VOLTAGE_WARN  BIT(8)
+#define REGULATOR_ERROR_OVER_TEMP_WARN BIT(9)
 
 /**
  * struct pre_voltage_change_data - Data sent with PRE_VOLTAGE_CHANGE event
-- 
2.25.4


-- 
Matti Vaittinen, Linux device drivers
ROHM Semiconductors, Finland SWDC
Kiviharjunlenkki 1E
90220 OULU
FINLAND

~~~ "I don't think so," said Rene Descartes. Just then he vanished ~~~
Simon says - in Latin please.
~~~ "non cogito me" dixit Rene Descarte, deinde evanescavit ~~~
Thanks to Simon Glass for the translation =]

Re: [PATCH net-next] net: Space: remove hp100 probe

2021-04-13 Thread Arnd Bergmann

On Wed, Apr 14, 2021, 00:42 Stephen Hemminger
 wrote:
>
> On Tue, 13 Apr 2021 16:16:17 +0200 Arnd Bergmann  wrote:
>
> >   */
> >  static struct devprobe2 isa_probes[] __initdata = {
> > -#if defined(CONFIG_HP100) && defined(CONFIG_ISA) /* ISA, EISA */
> > - {hp100_probe, 0},
> > -#endif
> >  #ifdef CONFIG_3C515
> >   {tc515_probe, 0},
> >  #endif
>
> Thanks, do we even need to have the static initialization anymore?

I actually did some more cleanups after I sent the above patch when
I found out that this code still exists. It turned out that above half of
the static initializations are completely pointless because the
drivers never rely on the netdev= command line arguments and
can simply be changed to always using module_init() instead of
relying on net_olddevs_init() for the built-in case.

The remaining ones are all ISA drivers: 3c515, Ultra, WD80x3,
NE2000, Lance, SMC9194, CS89x0, NI65 and COPS.

With my cleanups, I move the netdev_boot_setup infrastructure
into drivers/net/Space.c and only compile it when at least one of
these eight drivers is enabled.

All these drivers also support being built as loadable modules, but
in that configuration they only support a single device (back in the
day you could copy the module and just load it twice to support
more than one instance, not sure we still want to support that).

None of these drivers have a maintainer listed, but I suppose
there are still some PC/104 machines with NE2000 network
cards that could theoretically run a modern kernel.

Arnd

[PATCH v7 2/9] reboot: thermal: Export hardware protection shutdown

Thermal core contains a logic for safety shutdown. System is attempted to
be powered off if temperature exceeds safety limits.

Currently this can be also utilized by regulator subsystem as a final
protection measure if PMICs report dangerous over-voltage, over-current or
over-temperature and if per regulator counter measures fail or do not
exist.

Move this logic to kernel/reboot.c and export the functionality for other
subsystems to use. Also replace the mutex with a spinlock to allow using
the function from any context.

Also the EMIF bus code has implemented a safety shut-down. EMIF does not
attempt orderly_poweroff at all. Thus the EMIF code is not converted to use
this new function.

Signed-off-by: Matti Vaittinen 
---
Changelog
 v7:
  - new patch

Please note - this patch has received only a minimal amount of testing.
(The new API call was tested to shut-down my system at driver probe but
no odd corner-cases have been tested).

Any testing for thermal shutdown is appreciated.
---
 drivers/thermal/thermal_core.c | 63 ++---
 include/linux/reboot.h |  1 +
 kernel/reboot.c| 86 ++
 3 files changed, 91 insertions(+), 59 deletions(-)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 996c038f83a4..b1444845af38 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -36,10 +36,8 @@ static LIST_HEAD(thermal_governor_list);
 
 static DEFINE_MUTEX(thermal_list_lock);
 static DEFINE_MUTEX(thermal_governor_lock);
-static DEFINE_MUTEX(poweroff_lock);
 
 static atomic_t in_suspend;
-static bool power_off_triggered;
 
 static struct thermal_governor *def_governor;
 
@@ -327,70 +325,18 @@ static void handle_non_critical_trips(struct 
thermal_zone_device *tz, int trip)
   def_governor->throttle(tz, trip);
 }
 
-/**
- * thermal_emergency_poweroff_func - emergency poweroff work after a known 
delay
- * @work: work_struct associated with the emergency poweroff function
- *
- * This function is called in very critical situations to force
- * a kernel poweroff after a configurable timeout value.
- */
-static void thermal_emergency_poweroff_func(struct work_struct *work)
-{
-   /*
-* We have reached here after the emergency thermal shutdown
-* Waiting period has expired. This means orderly_poweroff has
-* not been able to shut off the system for some reason.
-* Try to shut down the system immediately using kernel_power_off
-* if populated
-*/
-   WARN(1, "Attempting kernel_power_off: Temperature too high\n");
-   kernel_power_off();
-
-   /*
-* Worst of the worst case trigger emergency restart
-*/
-   WARN(1, "Attempting emergency_restart: Temperature too high\n");
-   emergency_restart();
-}
-
-static DECLARE_DELAYED_WORK(thermal_emergency_poweroff_work,
-   thermal_emergency_poweroff_func);
-
-/**
- * thermal_emergency_poweroff - Trigger an emergency system poweroff
- *
- * This may be called from any critical situation to trigger a system shutdown
- * after a known period of time. By default this is not scheduled.
- */
-static void thermal_emergency_poweroff(void)
+void thermal_zone_device_critical(struct thermal_zone_device *tz)
 {
-   int poweroff_delay_ms = CONFIG_THERMAL_EMERGENCY_POWEROFF_DELAY_MS;
/*
 * poweroff_delay_ms must be a carefully profiled positive value.
-* Its a must for thermal_emergency_poweroff_work to be scheduled
+* Its a must for forced_emergency_poweroff_work to be scheduled.
 */
-   if (poweroff_delay_ms <= 0)
-   return;
-   schedule_delayed_work(_emergency_poweroff_work,
- msecs_to_jiffies(poweroff_delay_ms));
-}
+   int poweroff_delay_ms = CONFIG_THERMAL_EMERGENCY_POWEROFF_DELAY_MS;
 
-void thermal_zone_device_critical(struct thermal_zone_device *tz)
-{
dev_emerg(>device, "%s: critical temperature reached, "
  "shutting down\n", tz->type);
 
-   mutex_lock(_lock);
-   if (!power_off_triggered) {
-   /*
-* Queue a backup emergency shutdown in the event of
-* orderly_poweroff failure
-*/
-   thermal_emergency_poweroff();
-   orderly_poweroff(true);
-   power_off_triggered = true;
-   }
-   mutex_unlock(_lock);
+   hw_protection_shutdown("Temperature too high", poweroff_delay_ms);
 }
 EXPORT_SYMBOL(thermal_zone_device_critical);
 
@@ -1549,7 +1495,6 @@ static int __init thermal_init(void)
ida_destroy(_cdev_ida);
mutex_destroy(_list_lock);
mutex_destroy(_governor_lock);
-   mutex_destroy(_lock);
return result;
 }
 postcore_initcall(thermal_init);
diff --git a/include/linux/reboot.h b/include/linux/reboot.h
index 3734cd8f38a8..af907a3d68d1 100644
---

[syzbot] KASAN: use-after-free Read in get_wchan

2021-04-13 Thread syzbot

Hello,

syzbot found the following issue on:

HEAD commit:b2b3d18f riscv: Make NUMA depend on MMU
git tree:   git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git 
fixes
console output: https://syzkaller.appspot.com/x/log.txt?x=12b59d16d0
kernel config:  https://syzkaller.appspot.com/x/.config?x=81b3e7c68dad6e
dashboard link: https://syzkaller.appspot.com/bug?extid=0806291048161061627c
userspace arch: riscv64

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+08062910481610616...@syzkaller.appspotmail.com

==
BUG: KASAN: use-after-free in walk_stackframe arch/riscv/kernel/stacktrace.c:60 
[inline]
BUG: KASAN: use-after-free in get_wchan+0x156/0x196 
arch/riscv/kernel/stacktrace.c:136
Read of size 8 at addr ffe0058e3d90 by task syz-executor.0/4667

CPU: 1 PID: 4667 Comm: syz-executor.0 Not tainted 
5.12.0-rc5-syzkaller-00721-gb2b3d18fc20e #0
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[] walk_stackframe+0x0/0x23c arch/riscv/kernel/traps.c:201
[] dump_backtrace+0x40/0x4e arch/riscv/kernel/stacktrace.c:113
[] show_stack+0x22/0x2e arch/riscv/kernel/stacktrace.c:118
[] __dump_stack lib/dump_stack.c:79 [inline]
[] dump_stack+0x148/0x1d8 lib/dump_stack.c:120
[] print_address_description.constprop.0+0x52/0x31e 
mm/kasan/report.c:232
[] __kasan_report mm/kasan/report.c:399 [inline]
[] kasan_report+0x16e/0x18c mm/kasan/report.c:416
[] check_region_inline mm/kasan/generic.c:180 [inline]
[] __asan_load8+0x6e/0x80 mm/kasan/generic.c:253
[] walk_stackframe arch/riscv/kernel/stacktrace.c:60 [inline]
[] get_wchan+0x156/0x196 arch/riscv/kernel/stacktrace.c:136
[] proc_pid_wchan+0x48/0xa4 fs/proc/base.c:390
[] proc_single_show+0x9c/0x13c fs/proc/base.c:774
[] seq_read_iter+0x2e0/0x8f2 fs/seq_file.c:227
[] seq_read+0x200/0x298 fs/seq_file.c:159
[] vfs_read+0x108/0x2ac fs/read_write.c:494
[] ksys_read+0xb4/0x1b8 fs/read_write.c:634
[] __do_sys_read fs/read_write.c:644 [inline]
[] sys_read+0x28/0x36 fs/read_write.c:642
[] ret_from_syscall+0x0/0x2

The buggy address belongs to the page:
page:ffcf0216b8c0 refcount:0 mapcount:0 mapping: index:0x0 
pfn:0x85ae3
flags: 0xffe()
raw: 0ffe ffcf0216b8c8 ffcf0216b8c8 
raw:    
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffe0058e3c80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffe0058e3d00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>ffe0058e3d80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ^
 ffe0058e3e00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffe0058e3e80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

[PATCH v7 1/9] dt_bindings: Add protection limit properties

Support specifying protection/error/warning limits for regulator
over current, over temperature and over/under voltage.

Most of the PMICs support only "protection" feature but few
setups do also support error/warning level indications.

On many ICs most of the protection limits can't actually be set.
But for example the ampere limit for over-current protection on ROHM
BD9576 can be configured - or feature can be completely disabled.

Provide limit setting for all protections/errors for the sake of
the completeness and do that using own properties for all so that
not all users would need to set all levels when only one or few are
supported.

Signed-off-by: Matti Vaittinen 
Reviewed-by: Rob Herring 

---
No changes since RFC-v2
---
 .../bindings/regulator/regulator.yaml | 82 +++
 1 file changed, 82 insertions(+)

diff --git a/Documentation/devicetree/bindings/regulator/regulator.yaml 
b/Documentation/devicetree/bindings/regulator/regulator.yaml
index 6d0bc9cd4040..a6ae9ecae5cc 100644
--- a/Documentation/devicetree/bindings/regulator/regulator.yaml
+++ b/Documentation/devicetree/bindings/regulator/regulator.yaml
@@ -117,6 +117,88 @@ properties:
 description: Enable over current protection.
 type: boolean
 
+  regulator-oc-protection-microamp:
+description: Set over current protection limit. This is a limit where
+  hardware performs emergency shutdown. Zero can be passed to disable
+  protection and value '1' indicates that protection should be enabled but
+  limit setting can be omitted.
+
+  regulator-oc-error-microamp:
+description: Set over current error limit. This is a limit where part of
+  the hardware propably is malfunctional and damage prevention is 
requested.
+  Zero can be passed to disable error detection and value '1' indicates
+  that detection should be enabled but limit setting can be omitted.
+
+  regulator-oc-warn-microamp:
+description: Set over current warning limit. This is a limit where hardware
+  is assumed still to be functional but approaching limit where it gets
+  damaged. Recovery actions should be initiated. Zero can be passed to
+  disable detection and value '1' indicates that detection should
+  be enabled but limit setting can be omitted.
+
+  regulator-ov-protection-microvolt:
+description: Set over voltage protection limit. This is a limit where
+  hardware performs emergency shutdown. Zero can be passed to disable
+  protection and value '1' indicates that protection should be enabled but
+  limit setting can be omitted. Limit is given as microvolt offset from
+  voltage set to regulator.
+
+  regulator-ov-error-microvolt:
+description: Set over voltage error limit. This is a limit where part of
+  the hardware propably is malfunctional and damage prevention is requested
+  Zero can be passed to disable error detection and value '1' indicates
+  that detection should be enabled but limit setting can be omitted. Limit
+  is given as microvolt offset from voltage set to regulator.
+
+  regulator-ov-warn-microvolt:
+description: Set over voltage warning limit. This is a limit where hardware
+  is assumed still to be functional but approaching limit where it gets
+  damaged. Recovery actions should be initiated. Zero can be passed to
+  disable detection and value '1' indicates that detection should
+  be enabled but limit setting can be omitted. Limit is given as microvolt
+  offset from voltage set to regulator.
+
+  regulator-uv-protection-microvolt:
+description: Set over under voltage protection limit. This is a limit where
+  hardware performs emergency shutdown. Zero can be passed to disable
+  protection and value '1' indicates that protection should be enabled but
+  limit setting can be omitted. Limit is given as microvolt offset from
+  voltage set to regulator.
+
+  regulator-uv-error-microvolt:
+description: Set under voltage error limit. This is a limit where part of
+  the hardware propably is malfunctional and damage prevention is requested
+  Zero can be passed to disable error detection and value '1' indicates
+  that detection should be enabled but limit setting can be omitted. Limit
+  is given as microvolt offset from voltage set to regulator.
+
+  regulator-uv-warn-microvolt:
+description: Set over under voltage warning limit. This is a limit where
+  hardware is assumed still to be functional but approaching limit where
+  it gets damaged. Recovery actions should be initiated. Zero can be passed
+  to disable detection and value '1' indicates that detection should
+  be enabled but limit setting can be omitted. Limit is given as microvolt
+  offset from voltage set to regulator.
+
+  regulator-temp-protection-kelvin:
+description: Set over temperature protection limit. This is a limit where
+  hardware performs emergency shutdown.

Re: [PATCH 1/2] kunit: Fix formatting of KUNIT tests to meet the standard

2021-04-13 Thread Greg KH

On Wed, Apr 14, 2021 at 12:33:02AM -0400, Nico Pache wrote:
> There are few instances of KUNIT tests that are not properly defined.
> This commit focuses on correcting these issues to match the standard
> defined in the Documentation.
> 
> Issues Fixed:
>  - Tests should default to KUNIT_ALL_TESTS
>  - Tests configs tristate should have `if !KUNIT_ALL_TESTS`
>  - Tests should end in KUNIT_TEST, some fixes have been applied to
> correct issues were KUNIT_TESTS is used or KUNIT is not mentioned.

You are changing lots of different things all in one patch, please break
this up into "one config option change per patch" to make it easier to
review and get merged through the proper subsystem.

thanks,

greg k-h

[PATCH v7 0/9] Extend regulator notification support

Extend regulator notification support

This series extends the regulator notification and error flag support.
Initial discussion on the topic can be found here:
https://lore.kernel.org/lkml/6046836e22b8252983f08d5621c35ececb97820d.ca...@fi.rohmeurope.com/

This series is built on top of the BD9576MUF support patch series v9
which is currently in MFD tree at immutable branch ib-mfd-watchdog-5.13
https://lore.kernel.org/lkml/cover.1615219345.git.matti.vaitti...@fi.rohmeurope.com/
(The series should apply without those patches but there is compile time
dependency to definitions brought in at the last patch of the BD9576
series. This should be Ok though as there is a Kconfig dependency in
BD9576 regulator driver)

In a nutshell - the series adds:

1. WARNING level events/error flags. (Patch 3)
Current regulator 'ERROR' event notifications for over/under
voltage, over current and over temperature are used to indicate
condition where monitored entity is so badly "off" that it actually
indicates a hardware error which can not be recovered. The most
typical hanling for that is believed to be a (graceful)
system-shutdown. Here we add set of 'WARNING' level flags to allow
sending notifications to consumers before things are 'that badly off'
so that consumer drivers can implement recovery-actions.
2. Device-tree properties for specifying limit values. (Patches 1, 5)
Add limits for above mentioned 'ERROR' and 'WARNING' levels (which
send notifications to consumers) and also for a 'PROTECTION' level
(which will be used to immediately shut-down the regulator(s) W/O
informing consumer drivers. Typically implemented by hardware).
Property parsing is implemented in regulator core which then calls
callback operations for limit setting from the IC drivers. A
warning is emitted if protection is requested by device tree but the
underlying IC does not support configuring requested protection.
3. Helpers which can be registered by IC. (Patch 4)
Target is to avoid implementing IRQ handling and IRQ storm protection
in each IC driver. (Many of the ICs implementin these IRQs do not allow
masking or acking the IRQ but keep the IRQ asserted for the whole
duration of problem keeping the processor in IRQ handling loop).
4. Emergency poweroff function (refactored out of the thermal_core to
kernel/reboot.c) which is called if IC fires error IRQs but IC reading
fails and given retry-count is exceeded. (Patches 2, 4)
Please note that the mutex in the emergency shutdown was replaced by a
spinlock to allow calls from any context.

The helper was attempted to be done so it could be used to implement
roughly same logic as is used in qcom-labibb regulator. This means
amongst other things a safety shut-down if IC registers are not readable.
Using these shut-down retry counters are optional. The idea is that the
helper could be also used by simpler ICs which do not provide status
register(s) which can be used to check if error is still active.

ICs which do not have such status register can simply omit the 'renable'
callback (and retry-counts etc) - and helper assumes the situation is Ok
and re-enables IRQ after given time period. If problem persists the
handler is ran again and another notification is sent - but at least the
delay allows processor to avoid IRQ loop.

Patch 7 takes this notification support in use at BD9576MUF.
Patch 8 is related to MFD change which is not really related to the RFC
here. It was added to this series in order to avoid potential conflicts.
Patch 9 adds a maintainers entry.

Changelog v7:
general:
- rebased on v5.12-rc7
- new patch for refactoring the hw-failure reboot logic out of
thermal_core.c for others to use.
notification helpers:
- fix regulator error_flags query
- grammar/typos
- do not BUG() but attempt to shut-down the system
- use BITS_PER_TYPE()

Changelog v6:
Add MAINTAINERS entry
Changes to IRQ notifiers
- move devm functions to drivers/regulator/devres.c
- drop irq validity check
- use devm_add_action_or_reset()
- fix styling issues
- fix kerneldocs

Changelog v5:
- Fix the badly formatted pr_emerg() call.

Changelog v4:
- rebased on v5.12-rc6
- dropped RFC
- fix external FET DT-binding.
- improve prints for cases when expecting HW failure.
- styling and typos

Changelog v3:
Regulator core:
- Fix dangling pointer access at regulator_irq_helper()
stpmic1_regulator:
- fix function prototype (compile error)
bd9576-regulator:
- Update over current limits to what was given in new data-sheet
(REV00K)
- Allow over-current monitoring without external FET. Set limits to
values given in data-sheet (REV00K).

Changelog v2:
Generic:
- rebase on v5.12-rc2 + BD9576 series
- Split devm variant of delayed wq to own series
Regulator framework:
- Provide non devm variant of IRQ notification helpers
- shorten dt-property names as suggested by Rob
- unconditionally call

Re: [PATCH 1/2] kunit: Fix formatting of KUNIT tests to meet the standard

2021-04-13 Thread Greg KH

On Wed, Apr 14, 2021 at 12:33:02AM -0400, Nico Pache wrote:
> There are few instances of KUNIT tests that are not properly defined.
> This commit focuses on correcting these issues to match the standard
> defined in the Documentation.
> 
> Issues Fixed:
>  - Tests should default to KUNIT_ALL_TESTS
>  - Tests configs tristate should have `if !KUNIT_ALL_TESTS`
>  - Tests should end in KUNIT_TEST, some fixes have been applied to
> correct issues were KUNIT_TESTS is used or KUNIT is not mentioned.
> 
> No functional changes other than CONFIG name changes
> Signed-off-by: Nico Pache 

Please put a blank line before your signed-off-by, as is required by the
tools :(

thanks,

greg k-h

Re: [PATCH 3/4] mce/copyin: fix to not SIGBUS when copying from user hits poison

2021-04-13 Thread Jue Wang

On Tue, 13 Apr 2021 12:07:22 +0200, Petkov, Borislav wrote:

>> KVM apparently passes a machine check into the guest.

> Ah, there it is:

> static void kvm_send_hwpoison_signal(unsigned long address, struct 
> task_struct *tsk)
> {
> send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, PAGE_SHIFT, 
> tsk);
> }

This path is when EPT #PF finds accesses to a hwpoisoned page and
sends SIGBUS to user space (KVM exits into user space) with the same
semantic as if regular #PF found access to a hwpoisoned page.

The KVM_X86_SET_MCE ioctl actually injects a machine check into the guest.

We are in process to launch a product with MCE recovery capability in
a KVM based virtualization product and plan to expand the scope of the
application of it in the near future.

> So what I'm missing with all this fun is, yeah, sure, we have this
> facility out there but who's using it? Is anyone even using it at all?

The in-memory database and analytical domain are definitely using it.
A couple examples:
SAP HANA - as we've tested and planned to launch as a strategic
enterprise use case with MCE recovery capability in our product
SQL server - 
https://support.microsoft.com/en-us/help/2967651/inf-sql-server-may-display-memory-corruption-and-recovery-errors

Cheers,
-Jue

Re: [syzbot] unexpected kernel reboot (4)

2021-04-13 Thread Dmitry Vyukov

On Tue, Apr 13, 2021 at 11:27 PM syzbot
 wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:89698bec Merge tag 'm68knommu-for-v5.12-rc7' of git://git...
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1243fcfed0
> kernel config:  https://syzkaller.appspot.com/x/.config?x=b234ddbbe2953747
> dashboard link: https://syzkaller.appspot.com/bug?extid=9ce030d4c89856b27619
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=173e92fed0
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1735da2ed0
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+9ce030d4c89856b27...@syzkaller.appspotmail.com
>
> output_len: 0x0e74eb68
> kernel_total_size: 0x0f226000
> needed_size: 0x0f40
> trampoline_32bit: 0x0009d000
> Decompressing Linux... Parsing ELF... done.
> Booting the kernel.

+linux-input

The reproducer connects some USB HID device and communicates with the driver.
Previously we observed reboots because HID devices can trigger reboot
SYSRQ, but we disable it with "CONFIG_MAGIC_SYSRQ is not set".
How else can a USB device reboot the machine? Is it possible to disable it?
I don't see any direct includes of  in drivers/usb/*

r0 = syz_usb_connect$hid(0x0, 0x36,
&(0x7f00)=ANY=[@ANYBLOB="12010e402609411b000109022400010904010301092110012201000905810308"],
0x0)
syz_usb_control_io(r0, 0x0, 0x0)
syz_usb_control_io$hid(r0, &(0x7f001440)={0x24, 0x0, 0x0,
&(0x7f40)={0x0, 0x22, 0x1, {[@local]}}, 0x0}, 0x0)
syz_usb_ep_write(r0, 0x0, 0xfd,
&(0x7f000b80)="34981a23c3490d163907e65ff758478e74cd7dc073700ebf655f1ce3394018ade882075917a36a30ad3594f98282ea729f3620534fd655c69ebec66aa7397e843ee79879e825e6a31a189616c611912dee259ab9d8ff1566c90ae8985ec380bcab6b8265695f7b76654377adab6b1930de1f44060021f50f1dd3fff126f862f378ef2deb2d4331b9bcb3f394062133b4bb44a7f168473f7ca3d9945bfb4c456b22428a7a11d5d7df1fcc4f7ffad0e526d34321fb6aedfb5dd4fc6797cba2cf45369daea9f0953bf1a8343aa7548f3f9817c6a1bedde9dcaa4b8eed4a493828384fc9ccb7d230967ea0cb2003076ac7d9f386a5fbaec392")



> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkal...@googlegroups.com.
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> syzbot can test patches for this issue, for details see:
> https://goo.gl/tpsmEJ#testing-patches

Re: [PATCH 1/5] mm/swapfile: add percpu_ref support for swap

Dennis Zhou  writes:

> On Wed, Apr 14, 2021 at 11:59:03AM +0800, Huang, Ying wrote:
>> Dennis Zhou  writes:
>> 
>> > Hello,
>> >
>> > On Wed, Apr 14, 2021 at 10:06:48AM +0800, Huang, Ying wrote:
>> >> Miaohe Lin  writes:
>> >> 
>> >> > On 2021/4/14 9:17, Huang, Ying wrote:
>> >> >> Miaohe Lin  writes:
>> >> >> 
>> >> >>> On 2021/4/12 15:24, Huang, Ying wrote:
>> >>  "Huang, Ying"  writes:
>> >> 
>> >> > Miaohe Lin  writes:
>> >> >
>> >> >> We will use percpu-refcount to serialize against concurrent 
>> >> >> swapoff. This
>> >> >> patch adds the percpu_ref support for later fixup.
>> >> >>
>> >> >> Signed-off-by: Miaohe Lin 
>> >> >> ---
>> >> >>  include/linux/swap.h |  2 ++
>> >> >>  mm/swapfile.c| 25 ++---
>> >> >>  2 files changed, 24 insertions(+), 3 deletions(-)
>> >> >>
>> >> >> diff --git a/include/linux/swap.h b/include/linux/swap.h
>> >> >> index 144727041e78..849ba5265c11 100644
>> >> >> --- a/include/linux/swap.h
>> >> >> +++ b/include/linux/swap.h
>> >> >> @@ -240,6 +240,7 @@ struct swap_cluster_list {
>> >> >>   * The in-memory structure used to track swap areas.
>> >> >>   */
>> >> >>  struct swap_info_struct {
>> >> >> +  struct percpu_ref users;/* serialization against 
>> >> >> concurrent swapoff */
>> >> >>unsigned long   flags;  /* SWP_USED etc: see above */
>> >> >>signed shortprio;   /* swap priority of this type */
>> >> >>struct plist_node list; /* entry in swap_active_head */
>> >> >> @@ -260,6 +261,7 @@ struct swap_info_struct {
>> >> >>struct block_device *bdev;  /* swap device or bdev of swap 
>> >> >> file */
>> >> >>struct file *swap_file; /* seldom referenced */
>> >> >>unsigned int old_block_size;/* seldom referenced */
>> >> >> +  struct completion comp; /* seldom referenced */
>> >> >>  #ifdef CONFIG_FRONTSWAP
>> >> >>unsigned long *frontswap_map;   /* frontswap in-use, one bit 
>> >> >> per page */
>> >> >>atomic_t frontswap_pages;   /* frontswap pages in-use 
>> >> >> counter */
>> >> >> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> >> >> index 149e77454e3c..724173cd7d0c 100644
>> >> >> --- a/mm/swapfile.c
>> >> >> +++ b/mm/swapfile.c
>> >> >> @@ -39,6 +39,7 @@
>> >> >>  #include 
>> >> >>  #include 
>> >> >>  #include 
>> >> >> +#include 
>> >> >>  
>> >> >>  #include 
>> >> >>  #include 
>> >> >> @@ -511,6 +512,15 @@ static void swap_discard_work(struct 
>> >> >> work_struct *work)
>> >> >>spin_unlock(>lock);
>> >> >>  }
>> >> >>  
>> >> >> +static void swap_users_ref_free(struct percpu_ref *ref)
>> >> >> +{
>> >> >> +  struct swap_info_struct *si;
>> >> >> +
>> >> >> +  si = container_of(ref, struct swap_info_struct, users);
>> >> >> +  complete(>comp);
>> >> >> +  percpu_ref_exit(>users);
>> >> >
>> >> > Because percpu_ref_exit() is used, we cannot use 
>> >> > percpu_ref_tryget() in
>> >> > get_swap_device(), better to add comments there.
>> >> 
>> >>  I just noticed that the comments of percpu_ref_tryget_live() says,
>> >> 
>> >>   * This function is safe to call as long as @ref is between init and 
>> >>  exit.
>> >> 
>> >>  While we need to call get_swap_device() almost at any time, so it's
>> >>  better to avoid to call percpu_ref_exit() at all.  This will waste 
>> >>  some
>> >>  memory, but we need to follow the API definition to avoid potential
>> >>  issues in the long term.
>> >> >>>
>> >> >>> I have to admit that I'am not really familiar with percpu_ref. So I 
>> >> >>> read the
>> >> >>> implementation code of the percpu_ref and found 
>> >> >>> percpu_ref_tryget_live() could
>> >> >>> be called after exit now. But you're right we need to follow the API 
>> >> >>> definition
>> >> >>> to avoid potential issues in the long term.
>> >> >>>
>> >> 
>> >>  And we need to call percpu_ref_init() before insert the 
>> >>  swap_info_struct
>> >>  into the swap_info[].
>> >> >>>
>> >> >>> If we remove the call to percpu_ref_exit(), we should not use 
>> >> >>> percpu_ref_init()
>> >> >>> here because *percpu_ref->data is assumed to be NULL* in 
>> >> >>> percpu_ref_init() while
>> >> >>> this is not the case as we do not call percpu_ref_exit(). Maybe 
>> >> >>> percpu_ref_reinit()
>> >> >>> or percpu_ref_resurrect() will do the work.
>> >> >>>
>> >> >>> One more thing, how could I distinguish the killed percpu_ref from 
>> >> >>> newly allocated one?
>> >> >>> It seems percpu_ref_is_dying is only safe to call when @ref is 
>> >> >>> between init and exit.
>> >> >>> Maybe I could do this in alloc_swap_info()?
>> >> >> 
>> >> >> Yes.  In alloc_swap_info(), you can distinguish newly allocated and
>> >> >> reused

Re: [PATCH v7 2/3] dt-bindings: touchscreen: Add HY46XX bindings

2021-04-13 Thread Dmitry Torokhov

On Tue, Apr 13, 2021 at 04:44:45PM +0200, Giulio Benetti wrote:
> This adds device tree bindings for the Hycon HY46XX touchscreen series.
> 
> Signed-off-by: Giulio Benetti 

Applied, thank you.

-- 
Dmitry

Re: [PATCH v7 1/3] dt-bindings: Add Hycon Technology vendor prefix

2021-04-13 Thread Dmitry Torokhov

On Tue, Apr 13, 2021 at 04:44:44PM +0200, Giulio Benetti wrote:
> Update Documentation/devicetree/bindings/vendor-prefixes.yaml to
> include "hycon" as a vendor prefix for "Hycon Technology".
> Company website: https://www.hycontek.com/
> 
> Signed-off-by: Giulio Benetti 
> Reviewed-by: Jonathan Neuschäfer 
> Acked-by: Rob Herring 

Applied, thank you.

-- 
Dmitry

Re: [PATCH v7 3/3] Input: add driver for the Hycon HY46XX touchpanel series

2021-04-13 Thread Dmitry Torokhov

Hi Giulio,

On Tue, Apr 13, 2021 at 04:44:46PM +0200, Giulio Benetti wrote:
> +
> + input_mt_report_pointer_emulation(tsdata->input, true);

For touchscreens it does not make much sense to report BTN_DOUBLETAP,
BTN_TRIPLETAP, etc, events (they are really for touchpads), so I changed
this to

input_mt_report_pointer_emulation(tsdata->input, false);

to only report ABS_X, ABS_Y, and BTN_TOUCH, and applied.

Thanks.

-- 
Dmitry

Re: [PATCH 5.4 v3 1/1] iommu/vt-d: Fix agaw for a supported 48 bit guest address width

2021-04-13 Thread Greg KH

On Tue, Apr 13, 2021 at 11:05:34AM -0700, Saeed Mirzamohammadi wrote:
> Hi Greg,
> 
> I don’t have any commit ID since the fix is not in mainline or any Linus’ 
> tree yet. The driver has completely changed for newer stable versions (and 
> also mainline) and the fix only applies for 5.4, 4.19, and 4.14 stable 
> kernels.

Why can we not just take what is in mainline?

And if not, then you need to document the heck out of this in the
changelog text, and get all of the related maintainers in the area to
sign off on this.  Diverging from Linus's tree creates a big burden over
time, you have to make this really really obvious why you are doing
this, and what you are doing here so that everyone agrees with it.

Remember, 90% of all of these types of "do it differently than Linus's
tree" are buggy and cause problems, be very careful.

Please fix up and resend.

thanks,

greg k-h

Re: [PATCH net-next v2 1/2] of: net: pass the dst buffer to of_get_mac_address()

Hi Michael,

url:
https://github.com/0day-ci/linux/commits/Michael-Walle/of-net-support-non-platform-devices-in-of_get_mac_address/20210406-234030
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 
cc0626c2aaed8e475efdd85fa374b497a7192e35
config: x86_64-randconfig-m001-20210406 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 
Reported-by: Dan Carpenter 

smatch warnings:
drivers/net/ethernet/xilinx/xilinx_axienet_main.c:2069 axienet_probe() warn: 
passing a valid pointer to 'PTR_ERR'

vim +/PTR_ERR +2069 drivers/net/ethernet/xilinx/xilinx_axienet_main.c

522856cefaf09d Robert Hancock  2019-06-06  2060 /* Check for Ethernet 
core IRQ (optional) */
522856cefaf09d Robert Hancock  2019-06-06  2061 if (lp->eth_irq <= 0)
522856cefaf09d Robert Hancock  2019-06-06  2062 
dev_info(>dev, "Ethernet core IRQ not defined\n");
522856cefaf09d Robert Hancock  2019-06-06  2063  
8a3b7a252dca9f Daniel Borkmann 2012-01-19  2064 /* Retrieve the MAC 
address */
411b125c6ace1f Michael Walle   2021-04-06  2065 ret = 
of_get_mac_address(pdev->dev.of_node, mac_addr);
411b125c6ace1f Michael Walle   2021-04-06  2066 if (!ret) {
411b125c6ace1f Michael Walle   2021-04-06  2067 
axienet_set_mac_address(ndev, mac_addr);
411b125c6ace1f Michael Walle   2021-04-06  2068 } else {
d05a9ed5c3a773 Robert Hancock  2019-06-06 @2069 
dev_warn(>dev, "could not find MAC address property: %ld\n",
d05a9ed5c3a773 Robert Hancock  2019-06-06  2070  
PTR_ERR(mac_addr));
 
^
This should print "ret".

411b125c6ace1f Michael Walle   2021-04-06  2071 
axienet_set_mac_address(ndev, NULL);
8a3b7a252dca9f Daniel Borkmann 2012-01-19  2072 }
8a3b7a252dca9f Daniel Borkmann 2012-01-19  2073  
8a3b7a252dca9f Daniel Borkmann 2012-01-19  2074 lp->coalesce_count_rx = 
XAXIDMA_DFT_RX_THRESHOLD;
8a3b7a252dca9f Daniel Borkmann 2012-01-19  2075 lp->coalesce_count_tx = 
XAXIDMA_DFT_TX_THRESHOLD;

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip

Re: [PATCH][next] iommu/vt-d: Fix out-bounds-warning in intel_svm_page_response()

2021-04-13 Thread Lu Baolu


Hi Gustavo,

On 4/14/21 3:54 AM, Gustavo A. R. Silva wrote:

Replace call to memcpy() with just a couple of simple assignments in
order to fix the following out-of-bounds warning:

drivers/iommu/intel/svm.c:1198:4: warning: 'memcpy' offset [25, 32] from the 
object at 'desc' is out of the bounds of referenced subobject 'qw2' with type 
'long long unsigned int' at offset 16 [-Warray-bounds]

The problem is that the original code is trying to copy data into a
couple of struct members adjacent to each other in a single call to
memcpy(). This causes a legitimate compiler warning because memcpy()
overruns the length of 

This helps with the ongoing efforts to globally enable -Warray-bounds
and get us closer to being able to tighten the FORTIFY_SOURCE routines
on memcpy().

Link: https://github.com/KSPP/linux/issues/109
Signed-off-by: Gustavo A. R. Silva 
---
  drivers/iommu/intel/svm.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 5165cea90421..65909f504c50 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -1194,9 +1194,10 @@ int intel_svm_page_response(struct device *dev,
desc.qw1 = QI_PGRP_IDX(prm->grpid) | QI_PGRP_LPIG(last_page);
desc.qw2 = 0;
desc.qw3 = 0;
-   if (private_present)
-   memcpy(, prm->private_data,
-  sizeof(prm->private_data));


The same memcpy() is used in multiple places in this file. Did they
compile the same warnings? Or there are multiple patches to fix them
one by one?

Best regards,
baolu


+   if (private_present) {
+   desc.qw2 = prm->private_data[0];
+   desc.qw3 = prm->private_data[1];
+   }
  
  		qi_submit_sync(iommu, , 1, 0);

}

Re: Question on KASAN calltrace record in RT

2021-04-13 Thread Dmitry Vyukov

On Wed, Apr 14, 2021 at 6:00 AM Mike Galbraith  wrote:
>
> On Tue, 2021-04-13 at 17:29 +0200, Dmitry Vyukov wrote:
> > On Tue, Apr 6, 2021 at 10:26 AM Zhang, Qiang  
> > wrote:
> > >
> > > Hello everyone
> > >
> > > In RT system,   after  Andrew test,   found the following calltrace ,
> > > in KASAN, we record callstack through stack_depot_save(), in this 
> > > function, may be call alloc_pages,  but in RT, the spin_lock replace with
> > > rt_mutex in alloc_pages(), if before call this function, the irq is 
> > > disabled,
> > > will trigger following calltrace.
> > >
> > > maybe  add array[KASAN_STACK_DEPTH] in struct kasan_track to record 
> > > callstack  in RT system.
> > >
> > > Is there a better solution ？
> >
> > Hi Qiang,
> >
> > Adding 2 full stacks per heap object can increase memory usage too much.
> > The stackdepot has a preallocation mechanism, I would start with
> > adding interrupts check here:
> > https://elixir.bootlin.com/linux/v5.12-rc7/source/lib/stackdepot.c#L294
> > and just not do preallocation in interrupt context. This will solve
> > the problem, right?
>
> Hm, this thing might actually be (sorta?) working, modulo one startup
> gripe.  The CRASH_DUMP inspired gripe I get with !RT appeared (and shut
> up when told I don't care given kdump has worked just fine for ages:),
> but no more might_sleep() gripeage.
>
>
> CONFIG_KASAN_SHADOW_OFFSET=0xdc00
> CONFIG_HAVE_ARCH_KASAN=y
> CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
> CONFIG_CC_HAS_KASAN_GENERIC=y
> CONFIG_KASAN=y
> CONFIG_KASAN_GENERIC=y
> CONFIG_KASAN_OUTLINE=y
> # CONFIG_KASAN_INLINE is not set
> CONFIG_KASAN_STACK=1
> CONFIG_KASAN_VMALLOC=y
> # CONFIG_KASAN_MODULE_TEST is not set
>
> ---
>  lib/stackdepot.c |   10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -71,7 +71,7 @@ static void *stack_slabs[STACK_ALLOC_MAX
>  static int depot_index;
>  static int next_slab_inited;
>  static size_t depot_offset;
> -static DEFINE_SPINLOCK(depot_lock);
> +static DEFINE_RAW_SPINLOCK(depot_lock);
>
>  static bool init_stack_slab(void **prealloc)
>  {
> @@ -265,7 +265,7 @@ depot_stack_handle_t stack_depot_save(un
> struct page *page = NULL;
> void *prealloc = NULL;
> unsigned long flags;
> -   u32 hash;
> +   u32 hash, may_prealloc = !IS_ENABLED(CONFIG_PREEMPT_RT) || 
> preemptible();
>
> if (unlikely(nr_entries == 0) || stack_depot_disable)
> goto fast_exit;
> @@ -291,7 +291,7 @@ depot_stack_handle_t stack_depot_save(un
>  * The smp_load_acquire() here pairs with smp_store_release() to
>  * |next_slab_inited| in depot_alloc_stack() and init_stack_slab().
>  */
> -   if (unlikely(!smp_load_acquire(_slab_inited))) {
> +   if (unlikely(!smp_load_acquire(_slab_inited) && may_prealloc)) {
> /*
>  * Zero out zone modifiers, as we don't have specific zone
>  * requirements. Keep the flags related to allocation in 
> atomic
> @@ -305,7 +305,7 @@ depot_stack_handle_t stack_depot_save(un
> prealloc = page_address(page);
> }
>
> -   spin_lock_irqsave(_lock, flags);
> +   raw_spin_lock_irqsave(_lock, flags);
>
> found = find_stack(*bucket, entries, nr_entries, hash);
> if (!found) {
> @@ -329,7 +329,7 @@ depot_stack_handle_t stack_depot_save(un
> WARN_ON(!init_stack_slab());
> }
>
> -   spin_unlock_irqrestore(_lock, flags);
> +   raw_spin_unlock_irqrestore(_lock, flags);
>  exit:
> if (prealloc) {
> /* Nobody used this memory, ok to free it. */
>
> [0.692437] BUG: sleeping function called from invalid context at 
> kernel/locking/rtmutex.c:943
> [0.692439] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, 
> name: swapper/0
> [0.692442] Preemption disabled at:
> [0.692443] [] on_each_cpu_cond_mask+0x30/0xb0
> [0.692451] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 
> 5.12.0.g2afefec-tip-rt #5
> [0.692454] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 
> 09/23/2013
> [0.692456] Call Trace:
> [0.692458]  ? on_each_cpu_cond_mask+0x30/0xb0
> [0.692462]  dump_stack+0x8a/0xb5
> [0.692467]  ___might_sleep.cold+0xfe/0x112
> [0.692471]  rt_spin_lock+0x1c/0x60

HI Mike,

If freeing pages from smp_call_function is not OK, then perhaps we
need just to collect the objects to be freed to the task/CPU that
executes kasan_quarantine_remove_cache and it will free them (we know
it can free objects).

> [0.692475]  free_unref_page+0x117/0x3c0
> [0.692481]  qlist_free_all+0x60/0xd0
> [0.692485]  per_cpu_remove_cache+0x5b/0x70
> [0.692488]  smp_call_function_many_cond+0x185/0x3d0
> [0.692492]  ? qlist_move_cache+0xe0/0xe0
> [0.692495]  ? qlist_move_cache+0xe0/0xe0
> [0.692497]  on_each_cpu_cond_mask+0x44/0xb0
> [0.692501]  kasan_quarantine_remove_cache+0x52/0xf0

Re: [PATCH] x86: Accelerate copy_page with non-temporal in X86

2021-04-13 Thread Kemeng Shi




on 2021/4/13 22:53, Borislav Petkov wrote:
> I thought "should be better" too last time when I measured rep; movs vs
> NT stores but actual measurements showed no real difference.
Mabye the NT stores make difference when store to slow dimms, like the
persistent memory I just tested. Also, it likely reduces unnecessary cache
load and flush, and benifits the running processes which have data cached.

-- 
Best wishes
Kemeng Shi

Re: [PATCH v2 3/4] powerpc: Rename probe_kernel_read_inst()

2021-04-13 Thread Aneesh Kumar K.V

Christophe Leroy  writes:

> When probe_kernel_read_inst() was created, it was to mimic
> probe_kernel_read() function.
>
> Since then, probe_kernel_read() has been renamed
> copy_from_kernel_nofault().
>
> Rename probe_kernel_read_inst() into copy_from_kernel_nofault_inst().

At first glance I read it as copy from kernel nofault instruction.
How about copy_inst_from_kernel_nofault()? 

-aneesh

Re: [Outreachy kernel] [PATCH] :staging: rtl8723bs: Remove useless led_blink_hdl()

On Tue, Apr 13, 2021 at 10:08:32PM +0200, Fabio M. De Francesco wrote:
> On Tuesday, April 13, 2021 9:48:44 PM CEST Matthew Wilcox wrote:
> > On Tue, Apr 13, 2021 at 09:45:03PM +0200, Fabio M. De Francesco wrote:
> > > 1) The driver doesn't call that function from anywhere else than the
> > > macro. 2) You have explained that the macro add its symbol to a slot
> > > in an array that would shift all the subsequent elements down if that
> > > macro is not used exactly in the line where it is.
> > > 3) Dan Carpenter said that that array is full of null functions (or
> > > empty slots?).
> > > 
> > > Unless that function is called anonymously dereferencing its address
> > > from the position it occupies in the array, I'm not able to see what
> > > else means can any caller use.
> > > 
> > > I know I have much less experience than you with C: what can go wrong?
> > 
> > Here's where the driver calls that function:
> > 
> > $ git grep wlancmds drivers/staging/rtl8723bs/
> > drivers/staging/rtl8723bs/core/rtw_cmd.c:static struct cmd_hdl wlancmds[]
> > = { drivers/staging/rtl8723bs/core/rtw_cmd.c:   if
> > (pcmd->cmdcode < ARRAY_SIZE(wlancmds)) {
> > drivers/staging/rtl8723bs/core/rtw_cmd.c:   cmd_hdl
> > = wlancmds[pcmd->cmdcode].h2cfuns;
> >
> OK, I had imagined an anonymous call from its location in the array (as I 
> wrote in the last phrase of my message). However, I thought that it could 
> have been an improbable possibility, not a real one.
> 
> Linux uses a lot of interesting ideas that newcomers like me should learn. 
> Things here are trickier than they appear at first sight.

One trick would be to build the Smatch cross function database.

https://www.spinics.net/lists/smatch/msg00568.html

Then you could do:

$ ~/path/to/smatch_data/db/smdb.py led_blink_hdl
file | caller | function | type | parameter | key | value |
drivers/staging/rtl8723bs/core/rtw_cmd.c |   rtw_cmd_thread | 
rtw_cmd_thread ptr cmd_hdl |   INTERNAL | -1 | | 
uchar(*)(struct adapter*, uchar*)
drivers/staging/rtl8188eu/core/rtw_cmd.c |   rtw_cmd_thread | 
rtw_cmd_thread ptr cmd_hdl |   INTERNAL | -1 | | 
uchar(*)(struct adapter*, uchar*)
drivers/staging/rtl8188eu/core/rtw_cmd.c |   rtw_cmd_thread | 
rtw_cmd_thread ptr cmd_hdl |   BUF_SIZE |  1 |pbuf | 
1,4,6,8,12,14,16,19-20,23-24,48,740,884,892,900,960


Which says that led_blink_hdl() is called as a function pointer called
"cmd_hdl" from rtw_cmd_thread().

Hm...  It says it can be called from either rtw_cmd_thread() function
(the rtl8723bs or rtl8188eu version) which is not ideal.  But also
basically harmless so whatever...

regards,
dan carpenter

Re: [PATCH] mm: Define ARCH_HAS_FIRST_USER_ADDRESS

2021-04-13 Thread Christophe Leroy





Le 14/04/2021 à 04:54, Anshuman Khandual a écrit :

Currently most platforms define FIRST_USER_ADDRESS as 0UL duplicating the
same code all over. Instead define a new option ARCH_HAS_FIRST_USER_ADDRESS
for those platforms which would override generic default FIRST_USER_ADDRESS
value 0UL. This makes it much cleaner with reduced code.

Cc: linux-al...@vger.kernel.org
Cc: linux-snps-...@lists.infradead.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-c...@vger.kernel.org
Cc: linux-hexa...@vger.kernel.org
Cc: linux-i...@vger.kernel.org
Cc: linux-m...@lists.linux-m68k.org
Cc: linux-m...@vger.kernel.org
Cc: openr...@lists.librecores.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-ri...@lists.infradead.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linux...@lists.infradead.org
Cc: linux-xte...@linux-xtensa.org
Cc: x...@kernel.org
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual 
---
  arch/alpha/include/asm/pgtable.h | 1 -
  arch/arc/include/asm/pgtable.h   | 6 --
  arch/arm/Kconfig | 1 +
  arch/arm64/include/asm/pgtable.h | 2 --
  arch/csky/include/asm/pgtable.h  | 1 -
  arch/hexagon/include/asm/pgtable.h   | 3 ---
  arch/ia64/include/asm/pgtable.h  | 1 -
  arch/m68k/include/asm/pgtable_mm.h   | 1 -
  arch/microblaze/include/asm/pgtable.h| 2 --
  arch/mips/include/asm/pgtable-32.h   | 1 -
  arch/mips/include/asm/pgtable-64.h   | 1 -
  arch/nds32/Kconfig   | 1 +
  arch/nios2/include/asm/pgtable.h | 2 --
  arch/openrisc/include/asm/pgtable.h  | 1 -
  arch/parisc/include/asm/pgtable.h| 2 --
  arch/powerpc/include/asm/book3s/pgtable.h| 1 -
  arch/powerpc/include/asm/nohash/32/pgtable.h | 1 -
  arch/powerpc/include/asm/nohash/64/pgtable.h | 2 --
  arch/riscv/include/asm/pgtable.h | 2 --
  arch/s390/include/asm/pgtable.h  | 2 --
  arch/sh/include/asm/pgtable.h| 2 --
  arch/sparc/include/asm/pgtable_32.h  | 1 -
  arch/sparc/include/asm/pgtable_64.h  | 3 ---
  arch/um/include/asm/pgtable-2level.h | 1 -
  arch/um/include/asm/pgtable-3level.h | 1 -
  arch/x86/include/asm/pgtable_types.h | 2 --
  arch/xtensa/include/asm/pgtable.h| 1 -
  include/linux/mm.h   | 4 
  mm/Kconfig   | 4 
  29 files changed, 10 insertions(+), 43 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8ba434287387..47098ccd715e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -46,6 +46,10 @@ extern int sysctl_page_lock_unfairness;
  
  void init_mm_internals(void);
  
+#ifndef ARCH_HAS_FIRST_USER_ADDRESS


I guess you didn't test it . :)

should be #ifndef CONFIG_ARCH_HAS_FIRST_USER_ADDRESS


+#define FIRST_USER_ADDRESS 0UL
+#endif


But why do we need a config option at all for that ?

Why not just:

#ifndef FIRST_USER_ADDRESS
#define FIRST_USER_ADDRESS  0UL
#endif


+
  #ifndef CONFIG_NEED_MULTIPLE_NODES/* Don't use mapnrs, do it properly */
  extern unsigned long max_mapnr;
  
diff --git a/mm/Kconfig b/mm/Kconfig

index 24c045b24b95..373fbe377075 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -806,6 +806,10 @@ config VMAP_PFN
  
  config ARCH_USES_HIGH_VMA_FLAGS

bool
+
+config ARCH_HAS_FIRST_USER_ADDRESS
+   bool
+
  config ARCH_HAS_PKEYS
bool

Re: [RFC PATCH v8 11/19] virtio/vsock: dequeue callback for SOCK_SEQPACKET

2021-04-13 Thread Arseny Krasnov

I'll fix some issues of this patch found by kernel test robot

On 13.04.2021 15:44, Arseny Krasnov wrote:
> This adds transport callback and it's logic for SEQPACKET dequeue.
> Callback fetches RW packets from rx queue of socket until whole record
> is copied(if user's buffer is full, user is not woken up). This is done
> to not stall sender, because if we wake up user and it leaves syscall,
> nobody will send credit update for rest of record, and sender will wait
> for next enter of read syscall at receiver's side. So if user buffer is
> full, we just send credit update and drop data.
>
> Signed-off-by: Arseny Krasnov 
> ---
> v7 -> v8:
>  - Things like SEQ_BEGIN, SEQ_END, 'msg_len' and 'msg_id' now removed.
>This callback fetches and copies RW packets to user's buffer, until
>last packet of message found(this packet is marked in 'flags' field
>of header).
>
>  include/linux/virtio_vsock.h|  5 ++
>  net/vmw_vsock/virtio_transport_common.c | 73 +
>  2 files changed, 78 insertions(+)
>
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index dc636b727179..02acf6e9ae04 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -80,6 +80,11 @@ virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
>  struct msghdr *msg,
>  size_t len, int flags);
>  
> +ssize_t
> +virtio_transport_seqpacket_dequeue(struct vsock_sock *vsk,
> +struct msghdr *msg,
> +int flags,
> +bool *msg_ready);
>  s64 virtio_transport_stream_has_data(struct vsock_sock *vsk);
>  s64 virtio_transport_stream_has_space(struct vsock_sock *vsk);
>  
> diff --git a/net/vmw_vsock/virtio_transport_common.c 
> b/net/vmw_vsock/virtio_transport_common.c
> index 833104b71a1c..8492b8bd5df5 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -393,6 +393,67 @@ virtio_transport_stream_do_dequeue(struct vsock_sock 
> *vsk,
>   return err;
>  }
>  
> +static int virtio_transport_seqpacket_do_dequeue(struct vsock_sock *vsk,
> +  struct msghdr *msg,
> +  int flags,
> +  bool *msg_ready)
> +{
> + struct virtio_vsock_sock *vvs = vsk->trans;
> + struct virtio_vsock_pkt *pkt;
> + int err = 0;
> + size_t user_buf_len = msg->msg_iter.count;
> +
> + *msg_ready = false;
> + spin_lock_bh(>rx_lock);
> +
> + while (!*msg_ready && !list_empty(>rx_queue) && err >= 0) {
> + pkt = list_first_entry(>rx_queue, struct virtio_vsock_pkt, 
> list);
> +
> + if (le16_to_cpu(pkt->hdr.op) == VIRTIO_VSOCK_OP_RW) {
> + size_t bytes_to_copy;
> + size_t pkt_len;
> +
> + pkt_len = (size_t)le32_to_cpu(pkt->hdr.len);
> + bytes_to_copy = min(user_buf_len, pkt_len);
> +
> + /* sk_lock is held by caller so no one else can dequeue.
> +  * Unlock rx_lock since memcpy_to_msg() may sleep.
> +  */
> + spin_unlock_bh(>rx_lock);
> +
> + if (memcpy_to_msg(msg, pkt->buf, bytes_to_copy)) {
> + err = -EINVAL;
> + break;
> + }
> +
> + spin_lock_bh(>rx_lock);
> +
> + /* If user sets 'MSG_TRUNC' we return real length
> +  * of message.
> +  */
> + if (flags & MSG_TRUNC)
> + err += pkt_len;
> + else
> + err += bytes_to_copy;
> +
> + user_buf_len -= bytes_to_copy;
> +
> + if (pkt->hdr.flags & VIRTIO_VSOCK_SEQ_EOR)
> + *msg_ready = true;
> + }
> +
> + virtio_transport_dec_rx_pkt(vvs, pkt);
> + list_del(>list);
> + virtio_transport_free_pkt(pkt);
> + }
> +
> + spin_unlock_bh(>rx_lock);
> +
> + virtio_transport_send_credit_update(vsk);
> +
> + return err;
> +}
> +
>  ssize_t
>  virtio_transport_stream_dequeue(struct vsock_sock *vsk,
>   struct msghdr *msg,
> @@ -405,6 +466,18 @@ virtio_transport_stream_dequeue(struct vsock_sock *vsk,
>  }
>  EXPORT_SYMBOL_GPL(virtio_transport_stream_dequeue);
>  
> +ssize_t
> +virtio_transport_seqpacket_dequeue(struct vsock_sock *vsk,
> +struct msghdr *msg,
> +int flags, bool *msg_ready)
> +{
> + if (flags & MSG_PEEK)
> + return -EOPNOTSUPP;
> +
> + return virtio_transport_seqpacket_do_dequeue(vsk, msg, flags, 
> msg_ready);
> +}

Re: [PATCH v2 1/2] perf/core: Share an event with multiple cgroups

Hi Namhyung,

I love your patch! Perhaps something to improve:

[auto build test WARNING on tip/perf/core]
[also build test WARNING on tip/master linux/master linus/master v5.12-rc7 
next-20210413]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Namhyung-Kim/perf-core-Sharing-events-with-multiple-cgroups/20210413-124251
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
cface0326a6c2ae5c8f47bd466f07624b3e348a7
config: x86_64-allyesconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 


cocci warnings: (new ones prefixed by >>)
>> kernel/events/core.c:5925:13-20: WARNING opportunity for memdup_user

Please review and possibly fold the followup patch.

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip

[PATCH] perf/core: fix memdup_user.cocci warnings

From: kernel test robot 

kernel/events/core.c:5925:13-20: WARNING opportunity for memdup_user

 Use memdup_user rather than duplicating its implementation
 This is a little bit restricted to reduce false positives

Generated by: scripts/coccinelle/api/memdup_user.cocci

CC: Namhyung Kim 
Reported-by: kernel test robot 
Signed-off-by: kernel test robot 
---

url:
https://github.com/0day-ci/linux/commits/Namhyung-Kim/perf-core-Sharing-events-with-multiple-cgroups/20210413-124251
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
cface0326a6c2ae5c8f47bd466f07624b3e348a7

 core.c |   12 +++-
 1 file changed, 3 insertions(+), 9 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5922,15 +5922,9 @@ static long _perf_ioctl(struct perf_even
 
cgrp_bufsz = nr_cgrps * sizeof(*cgrp_buf);
 
-   cgrp_buf = kmalloc(cgrp_bufsz, GFP_KERNEL);
-   if (cgrp_buf == NULL)
-   return -ENOMEM;
-
-   if (copy_from_user(cgrp_buf, (u64 __user *)(arg + 8),
-  cgrp_bufsz)) {
-   kfree(cgrp_buf);
-   return -EFAULT;
-   }
+   cgrp_buf = memdup_user((u64 __user *)(arg + 8), cgrp_bufsz);
+   if (IS_ERR(cgrp_buf))
+   return PTR_ERR(cgrp_buf);
 
ret = perf_event_attach_cgroup_node(event, nr_cgrps, cgrp_buf);

Re: [PATCH] USB: Don't set USB_PORT_FEAT_SUSPEND on WD19's Realtek Hub

2021-04-13 Thread Chris Chiu

On Tue, Apr 13, 2021 at 10:44 PM Alan Stern  wrote:
>
> On Tue, Apr 13, 2021 at 03:52:14PM +0800, Chris Chiu wrote:
> > On Mon, Apr 12, 2021 at 11:12 PM Alan Stern  
> > wrote:
> > >
> > > On Mon, Apr 12, 2021 at 11:00:06PM +0800, chris.c...@canonical.com wrote:
> > > > The USB_PORT_FEAT_SUSPEND is not really necessary due to the
> > > > "global suspend" in USB 2.0 spec. It's only for many hub devices
> > > > which don't relay wakeup requests from the devices connected to
> > > > downstream ports. For this realtek hub, there's no problem waking
> > > > up the system from connected keyboard.
> > >
> > > What about runtime suspend?  That _does_ require USB_PORT_FEAT_SUSPEND.
> >
> > It's hard to reproduce the same thing with runtime PM. I also don't
> > know the aggressive
> > way to trigger runtime suspend. So I'm assuming the same thing will happen 
> > in
> > runtime PM case because they both go the same usb_port_resume path. Could
> > you please suggest a better way to verify this for runtime PM?
>
> To put a USB device into runtime suspend, do this:
>
> echo 0 >/sys/bus/usb/devices/.../bConfigurationValue
> echo auto >/sys/bus/usb/devices/.../power/control
>
> where ... is the pathname for the device you want to suspend.  (Note
> that this will unbind the device from its driver, so make sure there's
> no possibility of data loss before you do it.)
>
> To resume the device, write "on" to the power/control file.  You can
> verify the runtime-PM status by reading the files in the power/
> subdirectory.
>
Thanks for the instructions. I can hit the same timeout problem with
runtime PM. The
fail rate seems the same as normal PM. (around 1/4 ~ 1/7)
root@:/sys/bus/usb/devices/3-4.3# echo auto > power/control
root@:/sys/bus/usb/devices/3-4.3# echo on > power/control
root@:/sys/bus/usb/devices/3-4.3# dmesg -c
[ 2789.679807] usb 3-4: kworker/7:0 timed out on ep0out len=0/0
[ 2789.679812] usb 3-4-port3: can't suspend, status -110
[ 2789.680078] usb 3-4.3: Failed to suspend device, error -110


> > > > This commit bypasses the USB_PORT_FEAT_SUSPEND for the quirky hub.
> > > >
> > > > Signed-off-by: Chris Chiu 
> > > > ---
> > >
> > >
> > > > diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> > > > index 7f71218cc1e5..8478d49bba77 100644
> > > > --- a/drivers/usb/core/hub.c
> > > > +++ b/drivers/usb/core/hub.c
> > > > @@ -3329,8 +3329,11 @@ int usb_port_suspend(struct usb_device *udev, 
> > > > pm_message_t msg)
> > > >* descendants is enabled for remote wakeup.
> > > >*/
> > > >   else if (PMSG_IS_AUTO(msg) || 
> > > > usb_wakeup_enabled_descendants(udev) > 0)
> > > > - status = set_port_feature(hub->hdev, port1,
> > > > - USB_PORT_FEAT_SUSPEND);
> > > > + if (udev->quirks & USB_QUIRK_NO_SET_FEAT_SUSPEND)
> > >
> > > You should test hub->hdev->quirks, here, not udev->quirks.  The quirk
> > > belongs to the Realtek hub, not to the device that's plugged into the
> > > hub.
> > >
> >
> > Thanks for pointing that out. I'll verify again and propose a V2 after
> > it's done.
>
> Another thing to consider: You shouldn't return 0 from usb_port_suspend
> if the port wasn't actually suspended.  We don't want to kernel to have
> a false idea of the hardware's current state.
>
So we still need the "really_suspend=false". What if I replace it with
the following?
It's a little verbose but expressive enough. Any suggestions?

+   else if (!(hub->hdev->quirks & USB_QUIRK_NO_SET_FEAT_SUSPEND) &&
+   (PMSG_IS_AUTO(msg) || usb_wakeup_enabled_descendants(udev) > 0))
+   status = set_port_feature(hub->hdev, port1,
+   USB_PORT_FEAT_SUSPEND);
else {
really_suspend = false;
status = 0;

Chris

> Alan Stern

Re: [PATCH v2] firmware_loader: fix use-after-free in firmware_fallback_sysfs

2021-04-13 Thread Anirudh Rayabharam

On Tue, Apr 13, 2021 at 04:51:38PM +, Luis Chamberlain wrote:
> On Tue, Apr 13, 2021 at 04:12:42PM +0530, Anirudh Rayabharam wrote:
> > The use-after-free happens when a fw_priv object has been freed but
> > hasn't been removed from the pending list (pending_fw_head). The next
> > time fw_load_sysfs_fallback tries to insert into the list, it ends up
> > accessing the pending_list member of the previoiusly freed fw_priv.
> > 
> > In commit bcfbd3523f3c ("firmware: fix a double abort case with
> > fw_load_sysfs_fallback"), fw_load_abort() is skipped if
> > fw_sysfs_wait_timeout() returns -ENOENT. This causes the fw_priv to
> > not be removed from the pending list.
> > 
> > To fix this, delete the fw_priv from the pending list when retval
> > is -ENOENT instead of skipping the entire block.
> > 
> > Fixes: bcfbd3523f3c ("firmware: fix a double abort case with 
> > fw_load_sysfs_fallback")
> > Reported-by: syzbot+de271708674e20930...@syzkaller.appspotmail.com
> > Tested-by: syzbot+de271708674e20930...@syzkaller.appspotmail.com
> > Signed-off-by: Anirudh Rayabharam 
> 
> Thanks for your patch Anirudh, but please also see this reply to the
> issue:
> 
> http://lkml.kernel.org/r/20210403013143.gv4...@42.do-not-panic.com

Hi Luis! Thanks for pointing me to this. I completely forgot to check
the existing discussion on this issue.

> 
> The way you patched the issue is just a band-aid, meaning we keep on
> moving the issue further and it seems that's just the wrong approach.
> 
> Can you try the patch in that thread, to verify if the UAF goes away?

The patch in that thread doesn't work. But I think I know what's
missing. The root problem here is that all code paths that abort fw load
don't remove it from the pending list. For example:

_request_firmware()
 -> fw_abort_batch_reqs()
   -> fw_state_aborted()

In the above code path, the fw_priv is aborted but not removed from
pending list. So, the patch in the above thread fails because the load
is being aborted after it has been added to the list.

So, to fix the root cause of this issue we should make it so that all
aborts remove the fw_priv from the pending list. Perhaps we should add
a list_del_init in __fw_set_state() just before calling complete_all().
This way, all code paths that abort will delete the fw_priv from the
list.

The patch in the above thread also makes some changes to the error
codes. Since it isn't directly related to the UAF, the error codes
change should be a separate patch right?

Thanks!

- Anirudh.

Re: [PATCH v7 4/7] mm,hugetlb: Split prep_new_huge_page functionality

2021-04-13 Thread Oscar Salvador

On Tue, Apr 13, 2021 at 02:33:41PM -0700, Mike Kravetz wrote:
> > -static void prep_new_huge_page(struct hstate *h, struct page *page, int 
> > nid)
> > +/*
> > + * Must be called with the hugetlb lock held
> > + */
> > +static void __prep_account_new_huge_page(struct hstate *h, int nid)
> > +{
> > +   h->nr_huge_pages++;
> > +   h->nr_huge_pages_node[nid]++;
> 
> I would prefer if we also move setting the destructor to this routine.
>   set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);

Uhm, but that is the routine that does the accounting, it feels wrong
here, plus...

> 
> That way, PageHuge() will be false until it 'really' is a huge page.
> If not, we could potentially go into that retry loop in
> dissolve_free_huge_page or alloc_and_dissolve_huge_page in patch 5.

...I do not follow here, could you please elaborate some more?
Unless I am missing something, behaviour should not be any different with this
patch.

Thanks


-- 
Oscar Salvador
SUSE L3

Re: [PATCH] staging: mt7621-pci: stop using of_pci_range_to_resource

2021-04-13 Thread Sergio Paracuellos

Hi Illya,

On Wed, Apr 14, 2021 at 6:10 AM Ilya Lipnitskiy
 wrote:
>
> Hi Sergio,
>
> Just as an aside, are you planning to move staging/mt7621-pci into
> arch/mips/pci at some point? This driver seems more maintained (by
> you!) than many in-tree drivers...

Yes, I am planning to move it and maintain it. There is minor stuff
that must be changed before that but I hope to give that all a try
after the next merge window.

Best regards,
Sergio Paracuellos
> Ilya
>
> On Sat, Apr 10, 2021 at 12:23 PM Sergio Paracuellos
>  wrote:
> >
> > Hi Ilya,
> >
> > On Sat, Apr 10, 2021 at 7:33 PM Ilya Lipnitskiy
> >  wrote:
> > >
> > > The logic here was already overriding the erroneous IO addresses
> > > returned from of_pci_range_to_resource, which is the bulk of the logic.
> > >
> > > So stop using it altogether and initialize the fields explicitly, as
> > > done in aeba3731b150 ("powerpc/pci: Fix IO space breakage after
> > > of_pci_range_to_resource() change").
> > >
> > > Signed-off-by: Ilya Lipnitskiy 
> > > Cc: Sergio Paracuellos 
> > > ---
> > >  drivers/staging/mt7621-pci/pci-mt7621.c | 11 ++-
> > >  1 file changed, 6 insertions(+), 5 deletions(-)
> >
> > Looks good to me, thanks! I have also tested this in gnubee pc1
> > platform with no regressions at all when io bars are assigned:
> >
> > [   16.378956] mt7621-pci 1e14.pcie: host bridge /pcie@1e14 ranges:
> > [   16.392405] mt7621-pci 1e14.pcie:  MEM
> > 0x006000..0x006fff -> 0x00
> > [   16.408796] mt7621-pci 1e14.pcie:   IO
> > 0x001e16..0x001e16 -> 0x00
> > [   16.425264] mt7621-pci-phy 1e149000.pcie-phy: PHY for 0xbe149000
> > (dual port = 1)
> > [   16.440452] mt7621-pci-phy 1e14a000.pcie-phy: PHY for 0xbe14a000
> > (dual port = 0)
> > [   16.678713] mt7621-pci 1e14.pcie: PCIE0 enabled
> > [   16.688435] mt7621-pci 1e14.pcie: PCIE1 enabled
> > [   16.698160] mt7621-pci 1e14.pcie: PCIE2 enabled
> > [   16.707886] mt7621-pci 1e14.pcie: PCI coherence region base:
> > 0x6000, mask/settings: 0xf002
> > [   16.726623] mt7621-pci 1e14.pcie: PCI host bridge to bus :00
> > [   16.739309] pci_bus :00: root bus resource [io  
> > 0x1e16-0x1e16]
> > [   16.753008] pci_bus :00: root bus resource [mem 
> > 0x6000-0x6fff]
> > [   16.766709] pci_bus :00: root bus resource [bus 00-ff]
> > [   16.777649] pci_bus :00: root bus resource [mem
> > 0x6000-0x6fff] (bus address [0x-0x0fff])
> > [   16.797986] pci :00:00.0: [0e8d:0801] type 01 class 0x060400
> > [   16.809973] pci :00:00.0: reg 0x10: [mem 0x-0x7fff]
> > [   16.822467] pci :00:00.0: reg 0x14: initial BAR value 0x 
> > invalid
> > [   16.836511] pci :00:00.0: reg 0x14: [mem size 0x0001]
> > [   16.848048] pci :00:00.0: supports D1
> > [   16.856051] pci :00:00.0: PME# supported from D0 D1 D3hot
> > [   16.867943] pci :00:01.0: [0e8d:0801] type 01 class 0x060400
> > [   16.879960] pci :00:01.0: reg 0x10: [mem 0x-0x7fff]
> > [   16.892454] pci :00:01.0: reg 0x14: initial BAR value 0x 
> > invalid
> > [   16.906497] pci :00:01.0: reg 0x14: [mem size 0x0001]
> > [   16.918040] pci :00:01.0: supports D1
> > [   16.926031] pci :00:01.0: PME# supported from D0 D1 D3hot
> > [   16.937903] pci :00:02.0: [0e8d:0801] type 01 class 0x060400
> > [   16.949915] pci :00:02.0: reg 0x10: [mem 0x-0x7fff]
> > [   16.962412] pci :00:02.0: reg 0x14: initial BAR value 0x 
> > invalid
> > [   16.976466] pci :00:02.0: reg 0x14: [mem size 0x0001]
> > [   16.987991] pci :00:02.0: supports D1
> > [   16.995986] pci :00:02.0: PME# supported from D0 D1 D3hot
> > [   17.008716] pci :00:00.0: bridge configuration invalid ([bus
> > 00-00]), reconfiguring
> > [   17.024695] pci :00:01.0: bridge configuration invalid ([bus
> > 00-00]), reconfiguring
> > [   17.040654] pci :00:02.0: bridge configuration invalid ([bus
> > 00-00]), reconfiguring
> > [   17.056868] pci :01:00.0: [1b21:0611] type 00 class 0x010185
> > [   17.068882] pci :01:00.0: reg 0x10: [io  0x-0x0007]
> > [   17.080003] pci :01:00.0: reg 0x14: [io  0x-0x0003]
> > [   17.091115] pci :01:00.0: reg 0x18: [io  0x-0x0007]
> > [   17.102238] pci :01:00.0: reg 0x1c: [io  0x-0x0003]
> > [   17.113350] pci :01:00.0: reg 0x20: [io  0x-0x000f]
> > [   17.124463] pci :01:00.0: reg 0x24: initial BAR value 0x 
> > invalid
> > [   17.138508] pci :01:00.0: reg 0x24: [mem size 0x0200]
> > [   17.150115] pci :01:00.0: 2.000 Gb/s available PCIe bandwidth,
> > limited by 2.5 GT/s PCIe x1 link at :00:00.0 (capable of 4.000
> > Gb/s with 5.0 GT/s PCIe x1 link)
> > [   17.204594] pci :00:00.0: PCI bridge to [bus 01-ff]
> > [   17.215109] pci :00:00.0:   bridge window [io  0x-0x0fff]
> > [   17.227257] pci :00:00.0:

Re: [PATCH v7 5/7] mm: Make alloc_contig_range handle free hugetlb pages

2021-04-13 Thread Oscar Salvador

On Tue, Apr 13, 2021 at 03:29:02PM -0700, Mike Kravetz wrote:
> >   spin_lock_irq(hugetlb_lock)
> >   1) update_and_free_page
> >PageHuge() == F
> >__free_pages()
> >   2) enqueue_huge_page
> >SetPageHugeFreed()
> >   spin_unlock(_lock)
> 
> Very small nit, the above should be spin_unlock_irq(_lock)

Right, I missed it somehow.

> > +   /*
> > +* The page might have been dissolved from under our feet, so make sure
> > +* to carefully check the state under the lock.
> > +* Return success when racing as if we dissolved the page ourselves.
> > +*/
> > +   spin_lock_irq(_lock);
> > +   if (PageHuge(page)) {
> > +   head = compound_head(page);
> > +   h = page_hstate(head);
> > +   } else {
> > +   spin_unlock(_lock);
> 
> Should be be spin_unlock_irq(_lock);
> 
> Other than that, it looks good.

Yeah, I will amend it in the next version.

Thanks Mike!


-- 
Oscar Salvador
SUSE L3

Re: [PATCH v7 6/7] mm: Make alloc_contig_range handle in-use hugetlb pages

2021-04-13 Thread Oscar Salvador

On Tue, Apr 13, 2021 at 03:48:53PM -0700, Mike Kravetz wrote:
> The label free_new is:
> 
> free_new:
> spin_unlock_irq(_lock);
> __free_pages(new_page, huge_page_order(h));
> 
> return ret;
> 
> So, we are locking and immediately unlocking without any code in
> between.  Usually, I don't like like multiple labels before return.
> However, perhaps we should add another to avoid this unnecessary
> cycle.  On the other hand, this is an uncommon race condition so the
> simple code may be acceptable.

I guess we could have something like:

 free_new:
 spin_unlock_irq(_lock);
 free_new_nolock:
 __free_pages(new_page, huge_page_order(h));
 
 return ret;

And let the retry go to there without locking. But as you said, the
racecondition is rare enough, so I am not sure if this buys us much.
But I can certainly add it if you feel strong about it.


-- 
Oscar Salvador
SUSE L3

[char-misc-next] mei: me: add Alder Lake P device id.

2021-04-13 Thread Tomas Winkler

Add Alder Lake P device ID.

Cc: 
Signed-off-by: Tomas Winkler 
---
 drivers/misc/mei/hw-me-regs.h | 1 +
 drivers/misc/mei/pci-me.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/misc/mei/hw-me-regs.h b/drivers/misc/mei/hw-me-regs.h
index 14be76d4c2e6..cb34925e10f1 100644
--- a/drivers/misc/mei/hw-me-regs.h
+++ b/drivers/misc/mei/hw-me-regs.h
@@ -105,6 +105,7 @@
 
 #define MEI_DEV_ID_ADP_S  0x7AE8  /* Alder Lake Point S */
 #define MEI_DEV_ID_ADP_LP 0x7A60  /* Alder Lake Point LP */
+#define MEI_DEV_ID_ADP_P  0x51E0  /* Alder Lake Point P */
 
 /*
  * MEI HW Section
diff --git a/drivers/misc/mei/pci-me.c b/drivers/misc/mei/pci-me.c
index a7e179626b63..c3393b383e59 100644
--- a/drivers/misc/mei/pci-me.c
+++ b/drivers/misc/mei/pci-me.c
@@ -111,6 +111,7 @@ static const struct pci_device_id mei_me_pci_tbl[] = {
 
{MEI_PCI_DEVICE(MEI_DEV_ID_ADP_S, MEI_ME_PCH15_CFG)},
{MEI_PCI_DEVICE(MEI_DEV_ID_ADP_LP, MEI_ME_PCH15_CFG)},
+   {MEI_PCI_DEVICE(MEI_DEV_ID_ADP_P, MEI_ME_PCH15_CFG)},
 
/* required last entry */
{0, }
-- 
2.26.3

Re: [PATCH v2 00/16] Multigenerational LRU Framework

2021-04-13 Thread Dave Chinner

On Tue, Apr 13, 2021 at 09:40:12PM -0600, Yu Zhao wrote:
> On Tue, Apr 13, 2021 at 5:14 PM Dave Chinner  wrote:
> > On Tue, Apr 13, 2021 at 10:13:24AM -0600, Jens Axboe wrote:
> > > On 4/13/21 1:51 AM, SeongJae Park wrote:
> > > > From: SeongJae Park 
> > > >
> > > > Hello,
> > > >
> > > >
> > > > Very interesting work, thank you for sharing this :)
> > > >
> > > > On Tue, 13 Apr 2021 00:56:17 -0600 Yu Zhao  wrote:
> > > >
> > > >> What's new in v2
> > > >> 
> > > >> Special thanks to Jens Axboe for reporting a regression in buffered
> > > >> I/O and helping test the fix.
> > > >
> > > > Is the discussion open?  If so, could you please give me a link?
> > >
> > > I wasn't on the initial post (or any of the lists it was posted to), but
> > > it's on the google page reclaim list. Not sure if that is public or not.
> > >
> > > tldr is that I was pretty excited about this work, as buffered IO tends
> > > to suck (a lot) for high throughput applications. My test case was
> > > pretty simple:
> > >
> > > Randomly read a fast device, using 4k buffered IO, and watch what
> > > happens when the page cache gets filled up. For this particular test,
> > > we'll initially be doing 2.1GB/sec of IO, and then drop to 1.5-1.6GB/sec
> > > with kswapd using a lot of CPU trying to keep up. That's mainline
> > > behavior.
> >
> > I see this exact same behaviour here, too, but I RCA'd it to
> > contention between the inode and memory reclaim for the mapping
> > structure that indexes the page cache. Basically the mapping tree
> > lock is the contention point here - you can either be adding pages
> > to the mapping during IO, or memory reclaim can be removing pages
> > from the mapping, but we can't do both at once.
> >
> > So we end up with kswapd spinning on the mapping tree lock like so
> > when doing 1.6GB/s in 4kB buffered IO:
> >
> > -   20.06% 0.00%  [kernel]   [k] kswapd 
> > 
> >▒
> >- 20.06% kswapd  
> > 
> >▒
> >   - 20.05% balance_pgdat
> > 
> >▒
> >  - 20.03% shrink_node   
> > 
> >▒
> > - 19.92% shrink_lruvec  
> > 
> >▒
> >- 19.91% shrink_inactive_list
> > 
> >▒
> >   - 19.22% shrink_page_list 
> > 
> >▒
> >  - 17.51% __remove_mapping  
> > 
> >▒
> > - 14.16% _raw_spin_lock_irqsave 
> > 
> >▒
> >- 14.14% do_raw_spin_lock
> > 
> >▒
> > __pv_queued_spin_lock_slowpath  
> > 
> >▒
> > - 1.56% __delete_from_page_cache
> > 
> >▒
> >  0.63% xas_store
> > 
> >▒
> > - 0.78% _raw_spin_unlock_irqrestore 
> > 
> >▒
> >- 0.69% do_raw_spin_unlock   
> > 
> >▒
> > __raw_callee_save___pv_queued_spin_unlock   
> > 
> >▒
> >  - 0.82% free_unref_page_list   
> > 
> >▒
> > - 0.72% free_unref_page_commit  
> >

Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable

On Tue, Apr 13, 2021 at 10:49:11PM +0200, Thomas Gleixner wrote:
> Paul,
> 
> On Mon, Apr 12 2021 at 16:18, Paul E. McKenney wrote:
> > On Mon, Apr 12, 2021 at 10:37:10PM +0200, Thomas Gleixner wrote:
> >> On Mon, Apr 12 2021 at 12:57, Paul E. McKenney wrote:
> >> > On Mon, Apr 12, 2021 at 08:54:03PM +0200, Thomas Gleixner wrote:
> >> >> > I will send a new series out later today, Pacific Time.
> >> >> 
> >> >> Can you do me a favour and send it standalone and not as yet another
> >> >> reply to this existing thread maze. A trivial lore link to the previous
> >> >> version gives enough context.
> >> >
> >> > Will do!
> >> >
> >> > Of course, it turns out that lockdep also doesn't like waited-on
> >> > smp_call_function_single() invocations from timer handlers,
> >> > so I am currently looking at other options for dealing with that
> >> > potential use-after-free.  I am starting to like the looks of "only set
> >> > CLOCK_SOURCE_VERIFY_PERCPU on statically allocated clocksource structures
> >> > and let KASAN enforce this restriction", but I have not quite given up
> >> > on making it more general.
> >> 
> >> The simplest point is in the thread under the clocksource_mutex which
> >> prevents anything from vanishing under your feet.
> >
> > And lockdep is -much- happier with the setup shown below, so thank
> > you again!
> 
> But it is too simple now :) ...
> 
> > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> > index f047c6cb056c..34dc38b6b923 100644
> > --- a/kernel/time/clocksource.c
> > +++ b/kernel/time/clocksource.c
> > @@ -519,6 +515,13 @@ static int __clocksource_watchdog_kthread(void)
> > unsigned long flags;
> > int select = 0;
> >  
> > +   /* Do any required per-CPU skew verification. */
> > +   list_for_each_entry(cs, _list, wd_list) {
> > +   if ((cs->flags & (CLOCK_SOURCE_UNSTABLE | 
> > CLOCK_SOURCE_VERIFY_PERCPU)) ==
> > +   (CLOCK_SOURCE_UNSTABLE | CLOCK_SOURCE_VERIFY_PERCPU))
> > +   clocksource_verify_percpu(cs);
> > +   }
> 
> because that list is _NOT_ protected by the clocksource_mutex as you
> noticed yourself already.
> 
> But you don't have to walk that list at all because the only interesting
> thing is the currently active clocksource, which is about to be changed
> in case the watchdog marked it unstable and cannot be changed by any
> other code concurrently because clocksource_mutex is held.
> 
> So all you need is:
> 
>   if (curr_clocksource &&
>   curr_clocksource->flags & CLOCK_SOURCE_UNSTABLE &&
>   curr_clocksource->flags & CLOCK_SOURCE_VERIFY_PERCPU)
>   clocksource_verify_percpu_wreckage(curr_clocksource);
> 
> Hmm?

With the addition of a clocksource=tsc boot parameter, this approach
does appear to work, thank you!  I sent out the updated series.

Thanx, Paul

Re: [Outreachy kernel] [PATCH] staging: rtl8723bs: hal: Remove four set but not used variables

On Tue, Apr 13, 2021 at 08:42:22PM +0200, Fabio M. De Francesco wrote:
> Removed four variables that were set but not used.
> 
> Signed-off-by: Fabio M. De Francesco 
> ---
>  drivers/staging/rtl8723bs/hal/rtl8723b_hal_init.c | 7 +--
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/drivers/staging/rtl8723bs/hal/rtl8723b_hal_init.c 
> b/drivers/staging/rtl8723bs/hal/rtl8723b_hal_init.c
> index 77f8353c5ce5..fad6a3bfe07c 100644
> --- a/drivers/staging/rtl8723bs/hal/rtl8723b_hal_init.c
> +++ b/drivers/staging/rtl8723bs/hal/rtl8723b_hal_init.c
> @@ -3199,12 +3199,10 @@ static void hw_var_set_mlme_join(struct adapter 
> *padapter, u8 variable, u8 *val)
>  
>  void CCX_FwC2HTxRpt_8723b(struct adapter *padapter, u8 *pdata, u8 len)
>  {
> - u8 seq_no;
>  
>  #define  GET_8723B_C2H_TX_RPT_LIFE_TIME_OVER(_Header)
> LE_BITS_TO_1BYTE((_Header + 0), 6, 1)
>  #define  GET_8723B_C2H_TX_RPT_RETRY_OVER(_Header)
> LE_BITS_TO_1BYTE((_Header + 0), 7, 1)
>  
> - seq_no = *(pdata+6);
>  
>   if (GET_8723B_C2H_TX_RPT_RETRY_OVER(pdata) | 
> GET_8723B_C2H_TX_RPT_LIFE_TIME_OVER(pdata)) {

Now we have two blank lines in a row.  Please delete one.

regards,
dan carpenter

Re: [PATCH] [v4] Input: Add "Select" button to Microsoft Xbox One controller.

2021-04-13 Thread Chris Ye

That line was using tab + 4 spaces on the left and was reformatted to
use 2 tabs.
If you don't like it I've uploaded patch v5 not touching that line.


On Tue, Apr 13, 2021 at 5:34 AM Bastien Nocera  wrote:
>
> On Tue, 2021-04-13 at 01:02 +, Chris Ye wrote:
> > Add "Select" button input capability and input event mapping for
> > Microsoft Xbox One controller. From product site this is also
> > referred as
> > "Share" button.
> > Fixed Microsoft Xbox One controller select button not working under
> > USB
> > connection.
> >
> > Signed-off-by: Chris Ye 
> > ---
> >  drivers/input/joystick/xpad.c | 8 +++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/input/joystick/xpad.c
> > b/drivers/input/joystick/xpad.c
> > index 9f0d07dcbf06..99cb8bb78570 100644
> > --- a/drivers/input/joystick/xpad.c
> > +++ b/drivers/input/joystick/xpad.c
> > @@ -79,6 +79,7 @@
> >  #define MAP_DPAD_TO_BUTTONS(1 << 0)
> >  #define MAP_TRIGGERS_TO_BUTTONS(1 << 1)
> >  #define MAP_STICKS_TO_NULL (1 << 2)
> > +#define MAP_SELECT_BUTTON  (1 << 3)
> >  #define DANCEPAD_MAP_CONFIG(MAP_DPAD_TO_BUTTONS
> > |  \
> > MAP_TRIGGERS_TO_BUTTONS |
> > MAP_STICKS_TO_NULL)
> >
> > @@ -130,6 +131,7 @@ static const struct xpad_device {
> > { 0x045e, 0x02e3, "Microsoft X-Box One Elite pad", 0,
> > XTYPE_XBOXONE },
> > { 0x045e, 0x02ea, "Microsoft X-Box One S pad", 0,
> > XTYPE_XBOXONE },
> > { 0x045e, 0x0719, "Xbox 360 Wireless Receiver",
> > MAP_DPAD_TO_BUTTONS, XTYPE_XBOX360W },
> > +   { 0x045e, 0x0b12, "Microsoft Xbox One X pad",
> > MAP_SELECT_BUTTON, XTYPE_XBOXONE },
> > { 0x046d, 0xc21d, "Logitech Gamepad F310", 0, XTYPE_XBOX360
> > },
> > { 0x046d, 0xc21e, "Logitech Gamepad F510", 0, XTYPE_XBOX360
> > },
> > { 0x046d, 0xc21f, "Logitech Gamepad F710", 0, XTYPE_XBOX360
> > },
> > @@ -862,6 +864,8 @@ static void xpadone_process_packet(struct
> > usb_xpad *xpad, u16 cmd, unsigned char
> > /* menu/view buttons */
> > input_report_key(dev, BTN_START,  data[4] & 0x04);
> > input_report_key(dev, BTN_SELECT, data[4] & 0x08);
> > +   if (xpad->mapping & MAP_SELECT_BUTTON)
> > +   input_report_key(dev, KEY_RECORD, data[22] & 0x01);
> >
> > /* buttons A,B,X,Y */
> > input_report_key(dev, BTN_A,data[4] & 0x10);
> > @@ -1669,9 +1673,11 @@ static int xpad_init_input(struct usb_xpad
> > *xpad)
> >
> > /* set up model-specific ones */
> > if (xpad->xtype == XTYPE_XBOX360 || xpad->xtype ==
> > XTYPE_XBOX360W ||
> > -   xpad->xtype == XTYPE_XBOXONE) {
> > +   xpad->xtype == XTYPE_XBOXONE) {
>
> Why the indentation change here?
>
> > for (i = 0; xpad360_btn[i] >= 0; i++)
> > input_set_capability(input_dev, EV_KEY,
> > xpad360_btn[i]);
> > +   if (xpad->mapping & MAP_SELECT_BUTTON)
> > +   input_set_capability(input_dev, EV_KEY,
> > KEY_RECORD);
> > } else {
> > for (i = 0; xpad_btn[i] >= 0; i++)
> > input_set_capability(input_dev, EV_KEY,
> > xpad_btn[i]);
>
>

Re: [PATCH] Documentation: syscalls: add a note about ABI-agnostic types

2021-04-13 Thread Yury Norov

Ping?

On Fri, Apr 09, 2021 at 01:43:04PM -0700, Yury Norov wrote:
> Recently added memfd_secret() syscall had a flags parameter passed
> as unsigned long, which requires creation of compat entry for it.
> It was possible to change the type of flags to unsigned int and so
> avoid bothering with compat layer.
> 
> https://www.spinics.net/lists/linux-mm/msg251550.html
> 
> Documentation/process/adding-syscalls.rst doesn't point clearly about
> preference of ABI-agnostic types. This patch adds such notification.
> 
> Signed-off-by: Yury Norov 
> ---
>  Documentation/process/adding-syscalls.rst | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/Documentation/process/adding-syscalls.rst 
> b/Documentation/process/adding-syscalls.rst
> index 9af35f4ec728..46add16edf14 100644
> --- a/Documentation/process/adding-syscalls.rst
> +++ b/Documentation/process/adding-syscalls.rst
> @@ -172,6 +172,13 @@ arguments (i.e. parameter 1, 3, 5), to allow use of 
> contiguous pairs of 32-bit
>  registers.  (This concern does not apply if the arguments are part of a
>  structure that's passed in by pointer.)
>  
> +Whenever possible, try to use ABI-agnostic types for passing parameters to
> +a syscall in order to avoid creating compat entry for it. Linux supports two
> +ABI models - ILP32 and LP64. The types like ``void *``, ``long``, ``size_t``,
> +``off_t`` have different size in those ABIs; types like ``char`` and  ``int``
> +have the same size and don't require a compat layer support. For flags, it's
> +always better to use ``unsigned int``.
> +
>  
>  Proposing the API
>  -
> -- 
> 2.25.1

Re: [syzbot] WARNING in unsafe_follow_pfn

On Tue, Apr 13, 2021 at 03:11:45PM -0300, Jason Gunthorpe wrote:
> On Tue, Apr 13, 2021 at 07:20:12PM +0200, Dmitry Vyukov wrote:
> > > > Plus users are going to be seeing this as well.  According to the commit
> > > > message for 69bacee7f9ad ("mm: Add unsafe_follow_pfn") "Unfortunately
> > > > there's some users where this is not fixable (like v4l userptr of iomem
> > > > mappings)".  It sort of seems crazy to dump this giant splat and then
> > > > tell users to ignore it forever because it can't be fixed...  0_0
> > >
> > > I think the discussion conclusion was that this interface should not
> > > be used by userspace anymore, it is obsolete by some new interface?
> > >
> > > It should be protected by some kconfig and the kconfig should be
> > > turned off for syzkaller runs.
> > 
> > If this is not a kernel bug, then it must not use WARN_ON[_ONCE]. It
> > makes the kernel untestable for both automated systems and humans:
> 
> It is a kernel security bug triggerable by userspace.
> 
> > And if it's a kernel bug reachable from user-space, then I think this
> > code should be removed entirely, not just on all testing systems. Or
> > otherwise if we are not removing it for some reason, then it needs to
> > be fixed.
> 
> Legacy embedded systems apparently require it.

Are legacy embedded systems ever going to update their kernel?  It might
be better to just remove it.  (I don't really have any details outside
of your email so I don't know).

regards,
dan carpenter

[PATCH v8 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable

Some sorts of per-CPU clock sources have a history of going out of
synchronization with each other.  However, this problem has purportedy
been solved in the past ten years.  Except that it is all too possible
that the problem has instead simply been made less likely, which might
mean that some of the occasional "Marking clocksource 'tsc' as unstable"
messages might be due to desynchronization.  How would anyone know?

Therefore apply CPU-to-CPU synchronization checking to newly unstable
clocksource that are marked with the new CLOCK_SOURCE_VERIFY_PERCPU flag.
Lists of desynchronized CPUs are printed, with the caveat that if it
is the reporting CPU that is itself desynchronized, it will appear that
all the other clocks are wrong.  Just like in real life.

Cc: John Stultz 
Cc: Thomas Gleixner 
Cc: Stephen Boyd 
Cc: Jonathan Corbet 
Cc: Mark Rutland 
Cc: Marc Zyngier 
Cc: Andi Kleen 
Reported-by: Chris Mason 
[ paulmck: Add "static" to clocksource_verify_one_cpu() per kernel test robot 
feedback. ]
[ paulmck: Apply Thomas Gleixner feedback. ]
Signed-off-by: Paul E. McKenney 
---
 arch/x86/kernel/kvmclock.c  |  2 +-
 arch/x86/kernel/tsc.c   |  3 +-
 include/linux/clocksource.h |  2 +-
 kernel/time/clocksource.c   | 63 +
 4 files changed, 67 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 1fc0962c89c0..97eeaf164296 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -169,7 +169,7 @@ struct clocksource kvm_clock = {
.read   = kvm_clock_get_cycles,
.rating = 400,
.mask   = CLOCKSOURCE_MASK(64),
-   .flags  = CLOCK_SOURCE_IS_CONTINUOUS,
+   .flags  = CLOCK_SOURCE_IS_CONTINUOUS | CLOCK_SOURCE_VERIFY_PERCPU,
.enable = kvm_cs_enable,
 };
 EXPORT_SYMBOL_GPL(kvm_clock);
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index f70dffc2771f..56289170753c 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1151,7 +1151,8 @@ static struct clocksource clocksource_tsc = {
.mask   = CLOCKSOURCE_MASK(64),
.flags  = CLOCK_SOURCE_IS_CONTINUOUS |
  CLOCK_SOURCE_VALID_FOR_HRES |
- CLOCK_SOURCE_MUST_VERIFY,
+ CLOCK_SOURCE_MUST_VERIFY |
+ CLOCK_SOURCE_VERIFY_PERCPU,
.vdso_clock_mode= VDSO_CLOCKMODE_TSC,
.enable = tsc_cs_enable,
.resume = tsc_resume,
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 86d143db6523..83a3ebff7456 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -131,7 +131,7 @@ struct clocksource {
 #define CLOCK_SOURCE_UNSTABLE  0x40
 #define CLOCK_SOURCE_SUSPEND_NONSTOP   0x80
 #define CLOCK_SOURCE_RESELECT  0x100
-
+#define CLOCK_SOURCE_VERIFY_PERCPU 0x200
 /* simplify initialization of mask field */
 #define CLOCKSOURCE_MASK(bits) GENMASK_ULL((bits) - 1, 0)
 
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 3ac19a7859f5..69311deab428 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -211,6 +211,63 @@ static void clocksource_watchdog_inject_delay(void)
WARN_ON_ONCE(injectfail < 0);
 }
 
+static u64 csnow_mid;
+static cpumask_t cpus_ahead;
+static cpumask_t cpus_behind;
+
+static void clocksource_verify_one_cpu(void *csin)
+{
+   struct clocksource *cs = (struct clocksource *)csin;
+
+   csnow_mid = cs->read(cs);
+}
+
+static void clocksource_verify_percpu(struct clocksource *cs)
+{
+   int64_t cs_nsec, cs_nsec_max, cs_nsec_min;
+   u64 csnow_begin, csnow_end;
+   bool firsttime = 1;
+   int testcpu;
+   s64 delta;
+   int cpu;
+
+   cpumask_clear(_ahead);
+   cpumask_clear(_behind);
+   preempt_disable();
+   testcpu = smp_processor_id();
+   pr_warn("Checking clocksource %s synchronization from CPU %d.\n", 
cs->name, testcpu);
+   for_each_online_cpu(cpu) {
+   if (cpu == testcpu)
+   continue;
+   csnow_begin = cs->read(cs);
+   smp_call_function_single(cpu, clocksource_verify_one_cpu, cs, 
1);
+   csnow_end = cs->read(cs);
+   delta = (s64)((csnow_mid - csnow_begin) & cs->mask);
+   if (delta < 0)
+   cpumask_set_cpu(cpu, _behind);
+   delta = (csnow_end - csnow_mid) & cs->mask;
+   if (delta < 0)
+   cpumask_set_cpu(cpu, _ahead);
+   delta = clocksource_delta(csnow_end, csnow_begin, cs->mask);
+   cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
+   if (firsttime || cs_nsec > cs_nsec_max)
+   cs_nsec_max = cs_nsec;
+   if (firsttime || cs_nsec <

[PATCH v8 clocksource 2/5] clocksource: Retry clock read if long delays detected

When the clocksource watchdog marks a clock as unstable, this might
be due to that clock being unstable or it might be due to delays that
happen to occur between the reads of the two clocks.  Yes, interrupts are
disabled across those two reads, but there are no shortage of things that
can delay interrupts-disabled regions of code ranging from SMI handlers
to vCPU preemption.  It would be good to have some indication as to why
the clock was marked unstable.

Therefore, re-read the watchdog clock on either side of the read from
the clock under test.  If the watchdog clock shows an excessive time
delta between its pair of reads, the reads are retried.  The maximum
number of retries is specified by a new kernel boot parameter
clocksource.max_read_retries, which defaults to three, that is,
up to four reads, one initial and up to three retries.  If retries
were required, a message is printed on the console.  If the number of
retries is exceeded, the clock under test will be marked unstable.
However, the probability of this happening due to various sorts of
delays is quite small.  In addition, the reason (clock-read delays)
for the unstable marking will be apparent.

Cc: John Stultz 
Cc: Thomas Gleixner 
Cc: Stephen Boyd 
Cc: Jonathan Corbet 
Cc: Mark Rutland 
Cc: Marc Zyngier 
Cc: Andi Kleen 
Reported-by: Chris Mason 
[ paulmck: Per-clocksource retries per Neeraj Upadhyay feedback. ]
[ paulmck: Don't reset injectfail per Neeraj Upadhyay feedback. ]
[ paulmck: Apply Thomas Gleixner feedback. ]
Signed-off-by: Paul E. McKenney 
---
 kernel/time/clocksource.c | 24 
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 4be4391aa72f..3ac19a7859f5 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -124,6 +124,7 @@ static void __clocksource_change_rating(struct clocksource 
*cs, int rating);
  */
 #define WATCHDOG_INTERVAL (HZ >> 1)
 #define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)
+#define WATCHDOG_MAX_SKEW (NSEC_PER_SEC >> 6)
 
 static void clocksource_watchdog_work(struct work_struct *work)
 {
@@ -213,9 +214,10 @@ static void clocksource_watchdog_inject_delay(void)
 static void clocksource_watchdog(struct timer_list *unused)
 {
struct clocksource *cs;
-   u64 csnow, wdnow, cslast, wdlast, delta;
-   int64_t wd_nsec, cs_nsec;
+   u64 csnow, wdnow, wdagain, cslast, wdlast, delta;
+   int64_t wd_nsec, wdagain_delta, wderr_nsec = 0, cs_nsec;
int next_cpu, reset_pending;
+   int nretries;
 
spin_lock(_lock);
if (!watchdog_running)
@@ -224,6 +226,7 @@ static void clocksource_watchdog(struct timer_list *unused)
reset_pending = atomic_read(_reset_pending);
 
list_for_each_entry(cs, _list, wd_list) {
+   nretries = 0;
 
/* Clocksource already marked unstable? */
if (cs->flags & CLOCK_SOURCE_UNSTABLE) {
@@ -232,11 +235,24 @@ static void clocksource_watchdog(struct timer_list 
*unused)
continue;
}
 
+retry:
local_irq_disable();
-   csnow = cs->read(cs);
-   clocksource_watchdog_inject_delay();
wdnow = watchdog->read(watchdog);
+   clocksource_watchdog_inject_delay();
+   csnow = cs->read(cs);
+   wdagain = watchdog->read(watchdog);
local_irq_enable();
+   delta = clocksource_delta(wdagain, wdnow, watchdog->mask);
+   wdagain_delta = clocksource_cyc2ns(delta, watchdog->mult, 
watchdog->shift);
+   if (wdagain_delta > WATCHDOG_MAX_SKEW) {
+   wderr_nsec = wdagain_delta;
+   if (nretries++ < max_read_retries)
+   goto retry;
+   }
+   if (nretries) {
+   pr_warn("timekeeping watchdog on CPU%d: %s read-back 
delay of %lldns, attempt %d\n",
+   smp_processor_id(), watchdog->name, wderr_nsec, 
nretries);
+   }
 
/* Clocksource initialized ? */
if (!(cs->flags & CLOCK_SOURCE_WATCHDOG) ||
-- 
2.25.1

[PATCH v8 clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog

When the clocksource watchdog marks a clock as unstable, this might be due
to that clock being unstable or it might be due to delays that happen to
occur between the reads of the two clocks.  Yes, interrupts are disabled
across those two reads, but there are no shortage of things that can
delay interrupts-disabled regions of code ranging from SMI handlers to
vCPU preemption.  It would be good to have some indication as to why
the clock was marked unstable.

The first step is a way of injecting such delays.  Therefore, provide
clocksource.inject_delay_freq and clocksource.inject_delay_run kernel boot
parameters that specify that sufficient delay be injected to cause the
clocksource_watchdog() function to mark a clock unstable.  This delay is
injected every Nth set of M calls to clocksource_watchdog(), where N is
the value specified for the inject_delay_freq boot parameter and M is the
value specified for the inject_delay_run boot parameter.  Values of zero
or less for either parameter disable delay injection, and the default for
clocksource.inject_delay_freq is zero, that is, disabled.  The default for
clocksource.inject_delay_run is the value one, that is single-call runs.

This facility is intended for diagnostic use only, and should be avoided
on production systems.

Cc: John Stultz 
Cc: Thomas Gleixner 
Cc: Stephen Boyd 
Cc: Jonathan Corbet 
Cc: Mark Rutland 
Cc: Marc Zyngier 
Cc: Andi Kleen 
[ paulmck: Apply Rik van Riel feedback. ]
Reported-by: Chris Mason 
Signed-off-by: Paul E. McKenney 
---
 .../admin-guide/kernel-parameters.txt | 22 +++
 kernel/time/clocksource.c | 27 +++
 2 files changed, 49 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 04545725f187..fc57952dcba0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -583,6 +583,28 @@
loops can be debugged more effectively on production
systems.
 
+   clocksource.inject_delay_freq= [KNL]
+   Number of runs of calls to clocksource_watchdog()
+   before delays are injected between reads from the
+   two clocksources.  Values less than or equal to
+   zero disable this delay injection.  These delays
+   can cause clocks to be marked unstable, so use
+   of this parameter should therefore be avoided on
+   production systems.  Defaults to zero (disabled).
+
+   clocksource.inject_delay_run= [KNL]
+   Run lengths of clocksource_watchdog() delay
+   injections.  Specifying the value 8 will result
+   in eight consecutive delays followed by eight
+   times the value specified for inject_delay_freq
+   of consecutive non-delays.
+
+   clocksource.max_read_retries= [KNL]
+   Number of clocksource_watchdog() retries due to
+   external delays before the clock will be marked
+   unstable.  Defaults to three retries, that is,
+   four attempts to read the clock under test.
+
clearcpuid=BITNUM[,BITNUM...] [X86]
Disable CPUID feature X for the kernel. See
arch/x86/include/asm/cpufeatures.h for the valid bit
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index cce484a2cc7c..4be4391aa72f 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -14,6 +14,7 @@
 #include  /* for spin_unlock_irq() using preempt_count() m68k */
 #include 
 #include 
+#include 
 
 #include "tick-internal.h"
 #include "timekeeping_internal.h"
@@ -184,6 +185,31 @@ void clocksource_mark_unstable(struct clocksource *cs)
spin_unlock_irqrestore(_lock, flags);
 }
 
+static int inject_delay_freq;
+module_param(inject_delay_freq, int, 0644);
+static int inject_delay_run = 1;
+module_param(inject_delay_run, int, 0644);
+static int max_read_retries = 3;
+module_param(max_read_retries, int, 0644);
+
+static void clocksource_watchdog_inject_delay(void)
+{
+   int i;
+   static int injectfail = -1;
+
+   if (inject_delay_freq <= 0 || inject_delay_run <= 0)
+   return;
+   if (injectfail < 0 || injectfail > INT_MAX / 2)
+   injectfail = inject_delay_run;
+   if (!(++injectfail / inject_delay_run % inject_delay_freq)) {
+   pr_warn("%s(): Injecting delay.\n", __func__);
+   for (i = 0; i < 2 * WATCHDOG_THRESHOLD / NSEC_PER_MSEC; i++)
+   udelay(1000);
+   pr_warn("%s(): Done injecting delay.\n", __func__);
+   }
+   WARN_ON_ONCE(injectfail < 0);
+}
+
 static void clocksource_watchdog(struct timer_list *unused)
 {

[PATCH v8 clocksource 5/5] clocksource: Limit number of CPUs checked for clock synchronization

Currently, if skew is detected on a clock marked CLOCK_SOURCE_VERIFY_PERCPU,
that clock is checked on all CPUs.  This is thorough, but might not be
what you want on a system with a few tens of CPUs, let alone a few hundred
of them.

Therefore, by default check only up to eight randomly chosen CPUs.
Also provide a new clocksource.verify_n_cpus kernel boot parameter.
A value of -1 says to check all of the CPUs, and a non-negative value says
to randomly select that number of CPUs, without concern about selecting
the same CPU multiple times.  However, make use of a cpumask so that a
given CPU will be checked at most once.

Suggested-by: Thomas Gleixner  # For verify_n_cpus=1.
Cc: John Stultz 
Cc: Thomas Gleixner 
Cc: Stephen Boyd 
Cc: Jonathan Corbet 
Cc: Mark Rutland 
Cc: Marc Zyngier 
Cc: Andi Kleen 
Signed-off-by: Paul E. McKenney 
---
 .../admin-guide/kernel-parameters.txt | 10 +++
 kernel/time/clocksource.c | 74 ++-
 2 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index f9da90f2791f..c09be8392bd6 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -621,6 +621,16 @@
unstable.  Defaults to three retries, that is,
four attempts to read the clock under test.
 
+   clocksource.verify_n_cpus= [KNL]
+   Limit the number of CPUs checked for clocksources
+   marked with CLOCK_SOURCE_VERIFY_PERCPU that
+   are marked unstable due to excessive skew.
+   A negative value says to check all CPUs, while
+   zero says not to check any.  Values larger than
+   nr_cpu_ids are silently truncated to nr_cpu_ids.
+   The actual CPUs are chosen randomly, with
+   no replacement if the same CPU is chosen twice.
+
clearcpuid=BITNUM[,BITNUM...] [X86]
Disable CPUID feature X for the kernel. See
arch/x86/include/asm/cpufeatures.h for the valid bit
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 5f641b15ec6c..8f4967e59b05 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -15,6 +15,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "tick-internal.h"
 #include "timekeeping_internal.h"
@@ -194,6 +196,8 @@ static int inject_delay_shift_percpu = -1;
 module_param(inject_delay_shift_percpu, int, 0644);
 static int max_read_retries = 3;
 module_param(max_read_retries, int, 0644);
+static int verify_n_cpus = 8;
+module_param(verify_n_cpus, int, 0644);
 
 static void clocksource_watchdog_inject_delay(void)
 {
@@ -216,6 +220,55 @@ static void clocksource_watchdog_inject_delay(void)
 static u64 csnow_mid;
 static cpumask_t cpus_ahead;
 static cpumask_t cpus_behind;
+static cpumask_t cpus_chosen;
+
+static void clocksource_verify_choose_cpus(void)
+{
+   int cpu, i, n = verify_n_cpus;
+
+   if (n < 0) {
+   /* Check all of the CPUs. */
+   cpumask_copy(_chosen, cpu_online_mask);
+   cpumask_clear_cpu(smp_processor_id(), _chosen);
+   return;
+   }
+
+   /* If no checking desired, or no other CPU to check, leave. */
+   cpumask_clear(_chosen);
+   if (n == 0 || num_online_cpus() <= 1)
+   return;
+
+   /* Make sure to select at least one CPU other than the current CPU. */
+   cpu = cpumask_next(-1, cpu_online_mask);
+   if (cpu == smp_processor_id())
+   cpu = cpumask_next(cpu, cpu_online_mask);
+   if (WARN_ON_ONCE(cpu >= nr_cpu_ids))
+   return;
+   cpumask_set_cpu(cpu, _chosen);
+
+   /* Force a sane value for the boot parameter. */
+   if (n > nr_cpu_ids)
+   n = nr_cpu_ids - 1;
+
+   /*
+* Randomly select the specified number of CPUs.  If the same
+* CPU is selected multiple times, that CPU is checked only once,
+* and no replacement CPU is selected.  This gracefully handles
+* situations where verify_n_cpus is greater than the number of
+* CPUs that are currently online.
+*/
+   for (i = 1; i < n; i++) {
+   cpu = prandom_u32() % n;
+   cpu = cpumask_next(cpu - 1, cpu_online_mask);
+   if (cpu >= nr_cpu_ids)
+   cpu = cpumask_next(-1, cpu_online_mask);
+   if (!WARN_ON_ONCE(cpu >= nr_cpu_ids))
+   cpumask_set_cpu(cpu, _chosen);
+   }
+
+   /* Don't verify ourselves. */
+   cpumask_clear_cpu(smp_processor_id(), _chosen);
+}
 
 static void clocksource_verify_one_cpu(void *csin)
 {
@@ -239,12 +292,22 @@ static void clocksource_verify_percpu(struct clocksource 
*cs)
s64 delta;

[PATCH v8 clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking

Code that checks for clock desynchronization must itself be tested, so
create a new clocksource.inject_delay_shift_percpu= kernel boot parameter
that adds or subtracts a large value from the check read, using the
specified bit of the CPU ID to determine whether to add or to subtract.

Cc: John Stultz 
Cc: Thomas Gleixner 
Cc: Stephen Boyd 
Cc: Jonathan Corbet 
Cc: Mark Rutland 
Cc: Marc Zyngier 
Cc: Andi Kleen 
Reported-by: Chris Mason 
[ paulmck: Apply Randy Dunlap feedback. ]
Signed-off-by: Paul E. McKenney 
---
 Documentation/admin-guide/kernel-parameters.txt | 16 
 kernel/time/clocksource.c   | 10 +-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index fc57952dcba0..f9da90f2791f 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -599,6 +599,22 @@
times the value specified for inject_delay_freq
of consecutive non-delays.
 
+   clocksource.inject_delay_shift_percpu= [KNL]
+   Clocksource delay injection partitions the CPUs
+   into two sets, one whose clocks are moved ahead
+   and the other whose clocks are moved behind.
+   This kernel parameter selects the CPU-number
+   bit that determines which of these two sets the
+   corresponding CPU is placed into.  For example,
+   setting this parameter to the value 4 will result
+   in the first set containing alternating groups
+   of 16 CPUs whose clocks are moved ahead, while
+   the second set will contain the rest of the CPUs,
+   whose clocks are moved behind.
+
+   The default value of -1 disables this type of
+   error injection.
+
clocksource.max_read_retries= [KNL]
Number of clocksource_watchdog() retries due to
external delays before the clock will be marked
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 69311deab428..5f641b15ec6c 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -190,6 +190,8 @@ static int inject_delay_freq;
 module_param(inject_delay_freq, int, 0644);
 static int inject_delay_run = 1;
 module_param(inject_delay_run, int, 0644);
+static int inject_delay_shift_percpu = -1;
+module_param(inject_delay_shift_percpu, int, 0644);
 static int max_read_retries = 3;
 module_param(max_read_retries, int, 0644);
 
@@ -218,8 +220,14 @@ static cpumask_t cpus_behind;
 static void clocksource_verify_one_cpu(void *csin)
 {
struct clocksource *cs = (struct clocksource *)csin;
+   s64 delta = 0;
+   int sign;
 
-   csnow_mid = cs->read(cs);
+   if (inject_delay_shift_percpu >= 0) {
+   sign = ((smp_processor_id() >> inject_delay_shift_percpu) & 
0x1) * 2 - 1;
+   delta = sign * NSEC_PER_SEC;
+   }
+   csnow_mid = cs->read(cs) + delta;
 }
 
 static void clocksource_verify_percpu(struct clocksource *cs)
-- 
2.25.1

[PATCH v8 clocksource] Do not mark clocks unstable due to delays for v5.13

Hello!

If there is a sufficient delay between reading the watchdog clock and the
clock under test, the clock under test will be marked unstable through no
fault of its own.  This series checks for this, doing limited retries
to get a good set of clock reads.  If the clock is marked unstable
and is marked as being per-CPU, cross-CPU synchronization is checked.
This series also provides delay injection, which may be enabled via
kernel boot parameters to test the checking for delays.

Note that "sufficient delay" can be provided by SMIs, NMIs, and of course
vCPU preemption.

1.  Provide module parameters to inject delays in watchdog.

2.  Retry clock read if long delays detected.

3.  Check per-CPU clock synchronization when marked unstable.

4.  Provide a module parameter to fuzz per-CPU clock checking.

5.  Limit number of CPUs checked for clock synchronization.

Changes since v7, based on Thomas Gleixner feedback:

o   Fix embarrassing git-format-patch operator error.

o   Merge pairwise clock-desynchronization checking into the checking
of per-CPU clock synchronization when marked unstable.

o   Do selective per-CPU checking rather than blindly checking all
CPUs.  Provide a clocksource.verify_n_cpus kernel boot parameter
to control this behavior, with the value -1 choosing the old
check-all-CPUs behavior.  The default is to randomly check 8 CPUs.

o   Fix the clock-desynchronization checking to avoid a potential
use-after-free error for dynamically allocated clocksource
structures.

o   Remove redundance "wdagain_nsec < 0" from clocksource_watchdog()
clocksource skew checking.

o   Update commit logs and do code-style updates.

Changes since v5:

o   Rebased to v5.12-rc5.

Changes since v4:

o   Rebased to v5.12-rc1.

Changes since v3:

o   Rebased to v5.11.

o   Apply Randy Dunlap feedback.

Changes since v2:

o   Rebased to v5.11-rc6.

o   Updated Cc: list.

Changes since v1:

o   Applied feedback from Rik van Riel.

o   Rebased to v5.11-rc3.

o   Stripped "RFC" from the subject lines.

Thanx, Paul



 Documentation/admin-guide/kernel-parameters.txt   |   26 +++
 b/Documentation/admin-guide/kernel-parameters.txt |   22 ++
 b/arch/x86/kernel/kvmclock.c  |2 
 b/arch/x86/kernel/tsc.c   |3 
 b/include/linux/clocksource.h |2 
 b/kernel/time/clocksource.c   |   27 +++
 kernel/time/clocksource.c |  171 +-
 7 files changed, 243 insertions(+), 10 deletions(-)

xilinx_axienet_main.c:undefined reference to `of_address_to_resource'

Hi Robert,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   50987beca096a7ed4f453a6da245fd6a2fadedeb
commit: 57baf8cc70ea4cf5503c9d42f31f6a86d7f5ff1a net: axienet: Handle deferred 
probe on clock properly
date:   9 weeks ago
config: s390-randconfig-c023-20210414 (attached as .config)
compiler: s390-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=57baf8cc70ea4cf5503c9d42f31f6a86d7f5ff1a
git remote add linus 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git fetch --no-tags linus master
git checkout 57baf8cc70ea4cf5503c9d42f31f6a86d7f5ff1a
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross 
ARCH=s390 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   s390-linux-ld: drivers/net/arcnet/com90xx.o: in function `com90xx_exit':
   com90xx.c:(.exit.text+0xe6): undefined reference to `iounmap'
   s390-linux-ld: drivers/net/arcnet/com90xx.o: in function `check_mirror':
   com90xx.c:(.init.text+0x4c): undefined reference to `ioremap'
   s390-linux-ld: com90xx.c:(.init.text+0xa8): undefined reference to `iounmap'
   s390-linux-ld: drivers/net/arcnet/com90xx.o: in function `com90xx_probe':
   com90xx.c:(.init.text+0x1182): undefined reference to `ioremap'
   s390-linux-ld: com90xx.c:(.init.text+0x15dc): undefined reference to 
`iounmap'
   s390-linux-ld: com90xx.c:(.init.text+0x2234): undefined reference to 
`iounmap'
   s390-linux-ld: com90xx.c:(.init.text+0x23b2): undefined reference to 
`iounmap'
   s390-linux-ld: com90xx.c:(.init.text+0x26d6): undefined reference to 
`ioremap'
   s390-linux-ld: com90xx.c:(.init.text+0x29ae): undefined reference to 
`iounmap'
   s390-linux-ld: com90xx.c:(.init.text+0x2e28): undefined reference to 
`iounmap'
   s390-linux-ld: drivers/net/arcnet/arc-rimi.o: in function `arc_rimi_exit':
   arc-rimi.c:(.exit.text+0x50): undefined reference to `iounmap'
   s390-linux-ld: drivers/net/arcnet/arc-rimi.o: in function `arc_rimi_init':
   arc-rimi.c:(.init.text+0x5ca): undefined reference to `ioremap'
   s390-linux-ld: arc-rimi.c:(.init.text+0x6c8): undefined reference to 
`iounmap'
   s390-linux-ld: arc-rimi.c:(.init.text+0xac4): undefined reference to 
`iounmap'
   s390-linux-ld: arc-rimi.c:(.init.text+0xc10): undefined reference to 
`ioremap'
   s390-linux-ld: arc-rimi.c:(.init.text+0xeea): undefined reference to 
`iounmap'
   s390-linux-ld: drivers/net/arcnet/arc-rimi.o: in function `check_mirror':
   arc-rimi.c:(.text.unlikely+0x4c): undefined reference to `ioremap'
   s390-linux-ld: arc-rimi.c:(.text.unlikely+0xa8): undefined reference to 
`iounmap'
   s390-linux-ld: drivers/net/ethernet/altera/altera_tse_main.o: in function 
`request_and_map':
   altera_tse_main.c:(.text+0xe54): undefined reference to `devm_ioremap'
   s390-linux-ld: drivers/net/ethernet/fujitsu/fmvj18x_cs.o: in function 
`fmvj18x_release':
   fmvj18x_cs.c:(.text+0x10d2): undefined reference to `iounmap'
   s390-linux-ld: drivers/net/ethernet/fujitsu/fmvj18x_cs.o: in function 
`fmvj18x_config':
   fmvj18x_cs.c:(.text+0x3096): undefined reference to `ioremap'
   s390-linux-ld: fmvj18x_cs.c:(.text+0x3132): undefined reference to `iounmap'
   s390-linux-ld: fmvj18x_cs.c:(.text+0x3896): undefined reference to `ioremap'
   s390-linux-ld: fmvj18x_cs.c:(.text+0x3bba): undefined reference to `iounmap'
   s390-linux-ld: fmvj18x_cs.c:(.text+0x3f62): undefined reference to `iounmap'
   s390-linux-ld: drivers/net/ethernet/xilinx/ll_temac_main.o: in function 
`temac_probe':
   ll_temac_main.c:(.text+0x6044): undefined reference to 
`devm_platform_ioremap_resource_byname'
   s390-linux-ld: ll_temac_main.c:(.text+0x63fc): undefined reference to 
`devm_of_iomap'
   s390-linux-ld: ll_temac_main.c:(.text+0x65ee): undefined reference to 
`devm_platform_ioremap_resource'
   s390-linux-ld: drivers/net/ethernet/xilinx/ll_temac_mdio.o: in function 
`temac_mdio_setup':
   ll_temac_mdio.c:(.text+0x3b0): undefined reference to 
`of_address_to_resource'
   s390-linux-ld: drivers/net/ethernet/xilinx/xilinx_axienet_main.o: in 
function `axienet_probe':
>> xilinx_axienet_main.c:(.text+0x59dc): undefined reference to 
>> `of_address_to_resource'
   s390-linux-ld: xilinx_axienet_main.c:(.text+0x5a18): undefined reference to 
`devm_ioremap_resource'
   s390-linux-ld: xilinx_axienet_main.c:(.text+0x5b78): undefined reference to 
`devm_ioremap_resource'
   s390-linux-ld: xilinx_axienet_main.c:(.text+0x6548): undefined reference to 
`devm_ioremap_resource'
   s390-linux-ld: drivers/net/ethernet/xircom/xirc2ps_cs.o: in function

[PATCH 2/2] m68k: update configs to match the proper KUNIT syntax

2021-04-13 Thread Nico Pache

No functional changes other than CONFIG name changes
Signed-off-by: Nico Pache 
---
 arch/m68k/configs/amiga_defconfig| 6 +++---
 arch/m68k/configs/apollo_defconfig   | 6 +++---
 arch/m68k/configs/atari_defconfig| 6 +++---
 arch/m68k/configs/bvme6000_defconfig | 6 +++---
 arch/m68k/configs/hp300_defconfig| 6 +++---
 arch/m68k/configs/mac_defconfig  | 6 +++---
 arch/m68k/configs/multi_defconfig| 6 +++---
 arch/m68k/configs/mvme147_defconfig  | 6 +++---
 arch/m68k/configs/mvme16x_defconfig  | 6 +++---
 arch/m68k/configs/q40_defconfig  | 6 +++---
 arch/m68k/configs/sun3_defconfig | 6 +++---
 arch/m68k/configs/sun3x_defconfig| 6 +++---
 12 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/arch/m68k/configs/amiga_defconfig 
b/arch/m68k/configs/amiga_defconfig
index 786656090c50..77cc4ff7ae3a 100644
--- a/arch/m68k/configs/amiga_defconfig
+++ b/arch/m68k/configs/amiga_defconfig
@@ -655,11 +655,11 @@ CONFIG_TEST_BLACKHOLE_DEV=m
 CONFIG_FIND_BIT_BENCHMARK=m
 CONFIG_TEST_FIRMWARE=m
 CONFIG_TEST_SYSCTL=m
-CONFIG_BITFIELD_KUNIT=m
+CONFIG_BITFIELD_KUNIT_TEST=m
 CONFIG_RESOURCE_KUNIT_TEST=m
-CONFIG_LINEAR_RANGES_TEST=m
+CONFIG_LINEAR_RANGES_KUNIT_TEST=m
 CONFIG_CMDLINE_KUNIT_TEST=m
-CONFIG_BITS_TEST=m
+CONFIG_BITS_KUNIT_TEST=m
 CONFIG_TEST_UDELAY=m
 CONFIG_TEST_STATIC_KEYS=m
 CONFIG_TEST_KMOD=m
diff --git a/arch/m68k/configs/apollo_defconfig 
b/arch/m68k/configs/apollo_defconfig
index 9bb12be4a38e..86913bdb265b 100644
--- a/arch/m68k/configs/apollo_defconfig
+++ b/arch/m68k/configs/apollo_defconfig
@@ -611,11 +611,11 @@ CONFIG_TEST_BLACKHOLE_DEV=m
 CONFIG_FIND_BIT_BENCHMARK=m
 CONFIG_TEST_FIRMWARE=m
 CONFIG_TEST_SYSCTL=m
-CONFIG_BITFIELD_KUNIT=m
+CONFIG_BITFIELD_KUNIT_TEST=m
 CONFIG_RESOURCE_KUNIT_TEST=m
-CONFIG_LINEAR_RANGES_TEST=m
+CONFIG_LINEAR_RANGES_KUNIT_TEST=m
 CONFIG_CMDLINE_KUNIT_TEST=m
-CONFIG_BITS_TEST=m
+CONFIG_BITS_KUNIT_TEST=m
 CONFIG_TEST_UDELAY=m
 CONFIG_TEST_STATIC_KEYS=m
 CONFIG_TEST_KMOD=m
diff --git a/arch/m68k/configs/atari_defconfig 
b/arch/m68k/configs/atari_defconfig
index 413232626d9d..6b5c35e7be44 100644
--- a/arch/m68k/configs/atari_defconfig
+++ b/arch/m68k/configs/atari_defconfig
@@ -633,11 +633,11 @@ CONFIG_TEST_BLACKHOLE_DEV=m
 CONFIG_FIND_BIT_BENCHMARK=m
 CONFIG_TEST_FIRMWARE=m
 CONFIG_TEST_SYSCTL=m
-CONFIG_BITFIELD_KUNIT=m
+CONFIG_BITFIELD_KUNIT_TEST=m
 CONFIG_RESOURCE_KUNIT_TEST=m
-CONFIG_LINEAR_RANGES_TEST=m
+CONFIG_LINEAR_RANGES_KUNIT_TEST=m
 CONFIG_CMDLINE_KUNIT_TEST=m
-CONFIG_BITS_TEST=m
+CONFIG_BITS_KUNIT_TEST=m
 CONFIG_TEST_UDELAY=m
 CONFIG_TEST_STATIC_KEYS=m
 CONFIG_TEST_KMOD=m
diff --git a/arch/m68k/configs/bvme6000_defconfig 
b/arch/m68k/configs/bvme6000_defconfig
index 819cc70b06d8..8fbd238d9d29 100644
--- a/arch/m68k/configs/bvme6000_defconfig
+++ b/arch/m68k/configs/bvme6000_defconfig
@@ -604,11 +604,11 @@ CONFIG_TEST_BLACKHOLE_DEV=m
 CONFIG_FIND_BIT_BENCHMARK=m
 CONFIG_TEST_FIRMWARE=m
 CONFIG_TEST_SYSCTL=m
-CONFIG_BITFIELD_KUNIT=m
+CONFIG_BITFIELD_KUNIT_TEST=m
 CONFIG_RESOURCE_KUNIT_TEST=m
-CONFIG_LINEAR_RANGES_TEST=m
+CONFIG_LINEAR_RANGES_KUNIT_TEST=m
 CONFIG_CMDLINE_KUNIT_TEST=m
-CONFIG_BITS_TEST=m
+CONFIG_BITS_KUNIT_TEST=m
 CONFIG_TEST_UDELAY=m
 CONFIG_TEST_STATIC_KEYS=m
 CONFIG_TEST_KMOD=m
diff --git a/arch/m68k/configs/hp300_defconfig 
b/arch/m68k/configs/hp300_defconfig
index 8f8d5968713b..dbebbc079611 100644
--- a/arch/m68k/configs/hp300_defconfig
+++ b/arch/m68k/configs/hp300_defconfig
@@ -613,11 +613,11 @@ CONFIG_TEST_BLACKHOLE_DEV=m
 CONFIG_FIND_BIT_BENCHMARK=m
 CONFIG_TEST_FIRMWARE=m
 CONFIG_TEST_SYSCTL=m
-CONFIG_BITFIELD_KUNIT=m
+CONFIG_BITFIELD_KUNIT_TEST=m
 CONFIG_RESOURCE_KUNIT_TEST=m
-CONFIG_LINEAR_RANGES_TEST=m
+CONFIG_LINEAR_RANGES_KUNIT_TEST=m
 CONFIG_CMDLINE_KUNIT_TEST=m
-CONFIG_BITS_TEST=m
+CONFIG_BITS_KUNIT_TEST=m
 CONFIG_TEST_UDELAY=m
 CONFIG_TEST_STATIC_KEYS=m
 CONFIG_TEST_KMOD=m
diff --git a/arch/m68k/configs/mac_defconfig b/arch/m68k/configs/mac_defconfig
index bf15e6c1c939..3ccafd1db067 100644
--- a/arch/m68k/configs/mac_defconfig
+++ b/arch/m68k/configs/mac_defconfig
@@ -636,11 +636,11 @@ CONFIG_TEST_BLACKHOLE_DEV=m
 CONFIG_FIND_BIT_BENCHMARK=m
 CONFIG_TEST_FIRMWARE=m
 CONFIG_TEST_SYSCTL=m
-CONFIG_BITFIELD_KUNIT=m
+CONFIG_BITFIELD_KUNIT_TEST=m
 CONFIG_RESOURCE_KUNIT_TEST=m
-CONFIG_LINEAR_RANGES_TEST=m
+CONFIG_LINEAR_RANGES_KUNIT_TEST=m
 CONFIG_CMDLINE_KUNIT_TEST=m
-CONFIG_BITS_TEST=m
+CONFIG_BITS_KUNIT_TEST=m
 CONFIG_TEST_UDELAY=m
 CONFIG_TEST_STATIC_KEYS=m
 CONFIG_TEST_KMOD=m
diff --git a/arch/m68k/configs/multi_defconfig 
b/arch/m68k/configs/multi_defconfig
index 5466d48fcd9d..572c95f1c8d7 100644
--- a/arch/m68k/configs/multi_defconfig
+++ b/arch/m68k/configs/multi_defconfig
@@ -722,11 +722,11 @@ CONFIG_TEST_BLACKHOLE_DEV=m
 CONFIG_FIND_BIT_BENCHMARK=m
 CONFIG_TEST_FIRMWARE=m
 CONFIG_TEST_SYSCTL=m
-CONFIG_BITFIELD_KUNIT=m
+CONFIG_BITFIELD_KUNIT_TEST=m
 CONFIG_RESOURCE_KUNIT_TEST=m
-CONFIG_LINEAR_RANGES_TEST=m
+CONFIG_LINEAR_RANGES_KUNIT_TEST=m
 CONFIG_CMDLINE_KUNIT_TEST=m
-CONFIG_BITS_TEST=m

[PATCH 1/2] kunit: Fix formatting of KUNIT tests to meet the standard

2021-04-13 Thread Nico Pache

There are few instances of KUNIT tests that are not properly defined.
This commit focuses on correcting these issues to match the standard
defined in the Documentation.

Issues Fixed:
 - Tests should default to KUNIT_ALL_TESTS
 - Tests configs tristate should have `if !KUNIT_ALL_TESTS`
 - Tests should end in KUNIT_TEST, some fixes have been applied to
correct issues were KUNIT_TESTS is used or KUNIT is not mentioned.

No functional changes other than CONFIG name changes
Signed-off-by: Nico Pache 
---
 drivers/base/test/Kconfig  |  2 +-
 drivers/base/test/Makefile |  2 +-
 fs/ext4/.kunitconfig   |  2 +-
 fs/ext4/Kconfig|  2 +-
 fs/ext4/Makefile   |  2 +-
 lib/Kconfig.debug  | 21 +
 lib/Makefile   |  6 +++---
 net/mptcp/Kconfig  |  2 +-
 net/mptcp/Makefile |  2 +-
 net/mptcp/crypto.c |  2 +-
 net/mptcp/token.c  |  2 +-
 sound/soc/Kconfig  |  2 +-
 sound/soc/Makefile |  4 ++--
 13 files changed, 28 insertions(+), 23 deletions(-)

diff --git a/drivers/base/test/Kconfig b/drivers/base/test/Kconfig
index ba225eb1b761..2f3fa31a948e 100644
--- a/drivers/base/test/Kconfig
+++ b/drivers/base/test/Kconfig
@@ -8,7 +8,7 @@ config TEST_ASYNC_DRIVER_PROBE
  The module name will be test_async_driver_probe.ko
 
  If unsure say N.
-config KUNIT_DRIVER_PE_TEST
+config DRIVER_PE_KUNIT_TEST
bool "KUnit Tests for property entry API" if !KUNIT_ALL_TESTS
depends on KUNIT=y
default KUNIT_ALL_TESTS
diff --git a/drivers/base/test/Makefile b/drivers/base/test/Makefile
index 2f15fae8625f..64b2f3d744d5 100644
--- a/drivers/base/test/Makefile
+++ b/drivers/base/test/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_TEST_ASYNC_DRIVER_PROBE)  += test_async_driver_probe.o
 
-obj-$(CONFIG_KUNIT_DRIVER_PE_TEST) += property-entry-test.o
+obj-$(CONFIG_DRIVER_PE_KUNIT_TEST) += property-entry-test.o
 CFLAGS_REMOVE_property-entry-test.o += -fplugin-arg-structleak_plugin-byref 
-fplugin-arg-structleak_plugin-byref-all
diff --git a/fs/ext4/.kunitconfig b/fs/ext4/.kunitconfig
index bf51da7cd9fc..81d4da667740 100644
--- a/fs/ext4/.kunitconfig
+++ b/fs/ext4/.kunitconfig
@@ -1,3 +1,3 @@
 CONFIG_KUNIT=y
 CONFIG_EXT4_FS=y
-CONFIG_EXT4_KUNIT_TESTS=y
+CONFIG_EXT4_KUNIT_TEST=y
diff --git a/fs/ext4/Kconfig b/fs/ext4/Kconfig
index 86699c8cab28..1569d3872136 100644
--- a/fs/ext4/Kconfig
+++ b/fs/ext4/Kconfig
@@ -101,7 +101,7 @@ config EXT4_DEBUG
  If you select Y here, then you will be able to turn on debugging
  using dynamic debug control for mb_debug() / ext_debug() msgs.
 
-config EXT4_KUNIT_TESTS
+config EXT4_KUNIT_TEST
tristate "KUnit tests for ext4" if !KUNIT_ALL_TESTS
depends on EXT4_FS && KUNIT
default KUNIT_ALL_TESTS
diff --git a/fs/ext4/Makefile b/fs/ext4/Makefile
index 49e7af6cc93f..4e28e380d51b 100644
--- a/fs/ext4/Makefile
+++ b/fs/ext4/Makefile
@@ -15,5 +15,5 @@ ext4-y:= balloc.o bitmap.o block_validity.o dir.o 
ext4_jbd2.o extents.o \
 ext4-$(CONFIG_EXT4_FS_POSIX_ACL)   += acl.o
 ext4-$(CONFIG_EXT4_FS_SECURITY)+= xattr_security.o
 ext4-inode-test-objs   += inode-test.o
-obj-$(CONFIG_EXT4_KUNIT_TESTS) += ext4-inode-test.o
+obj-$(CONFIG_EXT4_KUNIT_TEST)  += ext4-inode-test.o
 ext4-$(CONFIG_FS_VERITY)   += verity.o
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 417c3d3e521b..7ade91511da6 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2279,9 +2279,10 @@ config TEST_SYSCTL
 
  If unsure, say N.
 
-config BITFIELD_KUNIT
-   tristate "KUnit test bitfield functions at runtime"
+config BITFIELD_KUNIT_TEST_TEST
+   tristate "KUnit test bitfield functions at runtime" if !KUNIT_ALL_TESTS
depends on KUNIT
+   default KUNIT_ALL_TESTS
help
  Enable this option to test the bitfield functions at boot.
 
@@ -2296,8 +2297,9 @@ config BITFIELD_KUNIT
  If unsure, say N.
 
 config RESOURCE_KUNIT_TEST
-   tristate "KUnit test for resource API"
+   tristate "KUnit test for resource API" if !KUNIT_ALL_TESTS
depends on KUNIT
+   default KUNIT_ALL_TESTS
help
  This builds the resource API unit test.
  Tests the logic of API provided by resource.c and ioport.h.
@@ -2337,9 +2339,10 @@ config LIST_KUNIT_TEST
 
  If unsure, say N.
 
-config LINEAR_RANGES_TEST
-   tristate "KUnit test for linear_ranges"
+config LINEAR_RANGES_KUNIT_TEST
+   tristate "KUnit test for linear_ranges" if !KUNIT_ALL_TESTS
depends on KUNIT
+   default KUNIT_ALL_TESTS
select LINEAR_RANGES
help
  This builds the linear_ranges unit test, which runs on boot.
@@ -2350,8 +2353,9 @@ config LINEAR_RANGES_TEST
  If unsure, say N.
 
 config CMDLINE_KUNIT_TEST
-   tristate "KUnit test for cmdline API"
+   tristate "KUnit test for

Re: [PATCH v3] Documentation: kunit: add tips for running KUnit

2021-04-13 Thread David Gow

On Wed, Apr 14, 2021 at 8:45 AM Daniel Latypov  wrote:
>
> This is long overdue.
>
> There are several things that aren't nailed down (in-tree
> .kunitconfig's), or partially broken (GCOV on UML), but having them
> documented, warts and all, is better than having nothing.
>
> This covers a bunch of the more recent features
> * kunit_filter_glob
> * kunit.py run --kunitconfig
> * slightly more detail on building tests as modules
> * CONFIG_KUNIT_DEBUGFS
>
> By my count, the only headline features now not mentioned are the KASAN
> integration and KernelCI json output support (kunit.py run --json).
>
> And then it also discusses how to get code coverage reports under UML
> and non-UML since this is a question people have repeatedly asked.
>
> Non-UML coverage collection is no different from normal, but we should
> probably explicitly call this out.
>
> As for UML, I was able to get it working again with two small hacks.*
> E.g. with CONFIG_KUNIT=y && CONFIG_KUNIT_ALL_TESTS=y
>   Overall coverage rate:
> lines..: 15.1% (18294 of 120776 lines)
> functions..: 16.8% (1860 of 11050 functions)
>
> Note: this doesn't document --alltests since this is not stable yet.
> Hopefully being run more frequently as part of KernelCI will help...
>
> *Using gcc/gcov-6 and not using uml_abort() in os_dump_core().
> I've documented these hacks in "Notes" but left TODOs for
> brendanhigg...@google.com who tracked down the runtime issue in GCC.
> To be clear: these are not issues specific to KUnit, but rather to UML.
>
> Signed-off-by: Daniel Latypov 
> ---

I'm very happy with this now: all my issues with the previous versions
are addressed. I'm particularly excited to have code coverage
documented somewhere.

Assuming Brendan's happy with the TODOs being there, I think this is
ready to go.

I also built this with Sphinx and gave it a quick look, and it all
looks good there as well.

Therefore, this is:

Reviewed-by: David Gow 

Cheers,
-- David

Re: [PATCH v4 1/2] mfd: google,cros-ec: add DT bindings for a baseboard's switch device

2021-04-13 Thread Ikjoon Jang

On Tue, Apr 13, 2021 at 10:57 PM Rob Herring  wrote:
>
> On Mon, Apr 12, 2021 at 07:30:19PM +0800, Ikjoon Jang wrote:
> > This is for ChromeOS tablets which have a 'cros_cbas' switch device
> > in the "Whiskers" base board. This device can be instantiated only by
> > device tree on ARM platforms. ChromeOS EC doesn't provide a way to
> > probe the device.
> >
> > Signed-off-by: Ikjoon Jang 
> >
> > ---
> >
> > Changes in v4:
> > Define cros-cbase bindings inside google,cros-ec.yaml instead of
> > a seperated binding document.
> >
> >  .../devicetree/bindings/mfd/google,cros-ec.yaml  | 16 
> >  1 file changed, 16 insertions(+)
> >
> > diff --git a/Documentation/devicetree/bindings/mfd/google,cros-ec.yaml 
> > b/Documentation/devicetree/bindings/mfd/google,cros-ec.yaml
> > index 76bf16ee27ec..c76809cd9f7f 100644
> > --- a/Documentation/devicetree/bindings/mfd/google,cros-ec.yaml
> > +++ b/Documentation/devicetree/bindings/mfd/google,cros-ec.yaml
> > @@ -127,6 +127,18 @@ patternProperties:
> >  type: object
> >  $ref: "/schemas/extcon/extcon-usbc-cros-ec.yaml#"
> >
> > +  "^cbas$":
>
> Not a pattern, put under 'properties'.
>
> > +type: object
> > +properties:
> > +  compatible:
> > +const: google,cros-cbas
>
> Blank line
>
> > +required:
> > +  - compatible
>
> Blank line
>
> > +additionalProperties: false
> > +description:
>
> Make this 1st or at least before 'properties'.

ACKed, thanks!

>
> > +  This device is used to signal when a detachable base is attached
> > +  to a Chrome OS tablet.
>
> This can't happen at runtime?

There is no way to detect the switch device's existence at runtime.
I'll add a note to the description about this.

>
> > +
> >  required:
> >- compatible
> >
> > @@ -180,6 +192,10 @@ examples:
> >  interrupts = <99 0>;
> >  interrupt-parent = <>;
> >  spi-max-frequency = <500>;
> > +
> > +base_detection: cbas {
> > +compatible = "google,cros-cbas";
> > +};
> >  };
> >  };
> >
> > --
> > 2.31.1.295.g9ea45b61b8-goog
> >

Re: [PATCH 1/2] iommu/sva: Tighten SVA bind API with explicit flags

Hi Jacob,

I love your patch! Yet something to improve:

[auto build test ERROR on vkoul-dmaengine/next]
[also build test ERROR on char-misc/char-misc-testing v5.12-rc7]
[cannot apply to iommu/next next-20210413]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Jacob-Pan/iommu-sva-Tighten-SVA-bind-API-with-explicit-flags/20210409-094521
base:   https://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine.git next
config: x86_64-randconfig-a016-20210413 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 
9829f5e6b1bca9b61efc629770d28bb9014dec45)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install x86_64 cross compiling tool for clang build
# apt-get install binutils-x86-64-linux-gnu
# 
https://github.com/0day-ci/linux/commit/3b009446e2f451c401cff7a6d4766424acbcc890
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Jacob-Pan/iommu-sva-Tighten-SVA-bind-API-with-explicit-flags/20210409-094521
git checkout 3b009446e2f451c401cff7a6d4766424acbcc890
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

>> drivers/dma/idxd/init.c:307:10: error: use of undeclared identifier 
>> 'IOMMU_SVA_BIND_SUPERVISOR'
   flags = IOMMU_SVA_BIND_SUPERVISOR;
   ^
   drivers/dma/idxd/init.c:422:30: warning: shift count >= width of type 
[-Wshift-count-overflow]
   rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
   ^~~~
   include/linux/dma-mapping.h:76:54: note: expanded from macro 'DMA_BIT_MASK'
   #define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL<<(n))-1))
^ ~~~
   drivers/dma/idxd/init.c:428:41: warning: shift count >= width of type 
[-Wshift-count-overflow]
   rc = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
  ^~~~
   include/linux/dma-mapping.h:76:54: note: expanded from macro 'DMA_BIT_MASK'
   #define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL<<(n))-1))
^ ~~~
   2 warnings and 1 error generated.


vim +/IOMMU_SVA_BIND_SUPERVISOR +307 drivers/dma/idxd/init.c

   300  
   301  static int idxd_enable_system_pasid(struct idxd_device *idxd)
   302  {
   303  unsigned int flags;
   304  unsigned int pasid;
   305  struct iommu_sva *sva;
   306  
 > 307  flags = IOMMU_SVA_BIND_SUPERVISOR;
   308  
   309  sva = iommu_sva_bind_device(>pdev->dev, NULL, flags);
   310  if (IS_ERR(sva)) {
   311  dev_warn(>pdev->dev,
   312   "iommu sva bind failed: %ld\n", PTR_ERR(sva));
   313  return PTR_ERR(sva);
   314  }
   315  
   316  pasid = iommu_sva_get_pasid(sva);
   317  if (pasid == IOMMU_PASID_INVALID) {
   318  iommu_sva_unbind_device(sva);
   319  return -ENODEV;
   320  }
   321  
   322  idxd->sva = sva;
   323  idxd->pasid = pasid;
   324  dev_dbg(>pdev->dev, "system pasid: %u\n", pasid);
   325  return 0;
   326  }
   327  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip

Re: [PATCH] staging: mt7621-pci: stop using of_pci_range_to_resource

Hi Sergio,

Just as an aside, are you planning to move staging/mt7621-pci into
arch/mips/pci at some point? This driver seems more maintained (by
you!) than many in-tree drivers...

Ilya

On Sat, Apr 10, 2021 at 12:23 PM Sergio Paracuellos
 wrote:
>
> Hi Ilya,
>
> On Sat, Apr 10, 2021 at 7:33 PM Ilya Lipnitskiy
>  wrote:
> >
> > The logic here was already overriding the erroneous IO addresses
> > returned from of_pci_range_to_resource, which is the bulk of the logic.
> >
> > So stop using it altogether and initialize the fields explicitly, as
> > done in aeba3731b150 ("powerpc/pci: Fix IO space breakage after
> > of_pci_range_to_resource() change").
> >
> > Signed-off-by: Ilya Lipnitskiy 
> > Cc: Sergio Paracuellos 
> > ---
> >  drivers/staging/mt7621-pci/pci-mt7621.c | 11 ++-
> >  1 file changed, 6 insertions(+), 5 deletions(-)
>
> Looks good to me, thanks! I have also tested this in gnubee pc1
> platform with no regressions at all when io bars are assigned:
>
> [   16.378956] mt7621-pci 1e14.pcie: host bridge /pcie@1e14 ranges:
> [   16.392405] mt7621-pci 1e14.pcie:  MEM
> 0x006000..0x006fff -> 0x00
> [   16.408796] mt7621-pci 1e14.pcie:   IO
> 0x001e16..0x001e16 -> 0x00
> [   16.425264] mt7621-pci-phy 1e149000.pcie-phy: PHY for 0xbe149000
> (dual port = 1)
> [   16.440452] mt7621-pci-phy 1e14a000.pcie-phy: PHY for 0xbe14a000
> (dual port = 0)
> [   16.678713] mt7621-pci 1e14.pcie: PCIE0 enabled
> [   16.688435] mt7621-pci 1e14.pcie: PCIE1 enabled
> [   16.698160] mt7621-pci 1e14.pcie: PCIE2 enabled
> [   16.707886] mt7621-pci 1e14.pcie: PCI coherence region base:
> 0x6000, mask/settings: 0xf002
> [   16.726623] mt7621-pci 1e14.pcie: PCI host bridge to bus :00
> [   16.739309] pci_bus :00: root bus resource [io  0x1e16-0x1e16]
> [   16.753008] pci_bus :00: root bus resource [mem 0x6000-0x6fff]
> [   16.766709] pci_bus :00: root bus resource [bus 00-ff]
> [   16.777649] pci_bus :00: root bus resource [mem
> 0x6000-0x6fff] (bus address [0x-0x0fff])
> [   16.797986] pci :00:00.0: [0e8d:0801] type 01 class 0x060400
> [   16.809973] pci :00:00.0: reg 0x10: [mem 0x-0x7fff]
> [   16.822467] pci :00:00.0: reg 0x14: initial BAR value 0x 
> invalid
> [   16.836511] pci :00:00.0: reg 0x14: [mem size 0x0001]
> [   16.848048] pci :00:00.0: supports D1
> [   16.856051] pci :00:00.0: PME# supported from D0 D1 D3hot
> [   16.867943] pci :00:01.0: [0e8d:0801] type 01 class 0x060400
> [   16.879960] pci :00:01.0: reg 0x10: [mem 0x-0x7fff]
> [   16.892454] pci :00:01.0: reg 0x14: initial BAR value 0x 
> invalid
> [   16.906497] pci :00:01.0: reg 0x14: [mem size 0x0001]
> [   16.918040] pci :00:01.0: supports D1
> [   16.926031] pci :00:01.0: PME# supported from D0 D1 D3hot
> [   16.937903] pci :00:02.0: [0e8d:0801] type 01 class 0x060400
> [   16.949915] pci :00:02.0: reg 0x10: [mem 0x-0x7fff]
> [   16.962412] pci :00:02.0: reg 0x14: initial BAR value 0x 
> invalid
> [   16.976466] pci :00:02.0: reg 0x14: [mem size 0x0001]
> [   16.987991] pci :00:02.0: supports D1
> [   16.995986] pci :00:02.0: PME# supported from D0 D1 D3hot
> [   17.008716] pci :00:00.0: bridge configuration invalid ([bus
> 00-00]), reconfiguring
> [   17.024695] pci :00:01.0: bridge configuration invalid ([bus
> 00-00]), reconfiguring
> [   17.040654] pci :00:02.0: bridge configuration invalid ([bus
> 00-00]), reconfiguring
> [   17.056868] pci :01:00.0: [1b21:0611] type 00 class 0x010185
> [   17.068882] pci :01:00.0: reg 0x10: [io  0x-0x0007]
> [   17.080003] pci :01:00.0: reg 0x14: [io  0x-0x0003]
> [   17.091115] pci :01:00.0: reg 0x18: [io  0x-0x0007]
> [   17.102238] pci :01:00.0: reg 0x1c: [io  0x-0x0003]
> [   17.113350] pci :01:00.0: reg 0x20: [io  0x-0x000f]
> [   17.124463] pci :01:00.0: reg 0x24: initial BAR value 0x 
> invalid
> [   17.138508] pci :01:00.0: reg 0x24: [mem size 0x0200]
> [   17.150115] pci :01:00.0: 2.000 Gb/s available PCIe bandwidth,
> limited by 2.5 GT/s PCIe x1 link at :00:00.0 (capable of 4.000
> Gb/s with 5.0 GT/s PCIe x1 link)
> [   17.204594] pci :00:00.0: PCI bridge to [bus 01-ff]
> [   17.215109] pci :00:00.0:   bridge window [io  0x-0x0fff]
> [   17.227257] pci :00:00.0:   bridge window [mem 0x6000-0x600f]
> [   17.240785] pci :00:00.0:   bridge window [mem
> 0x6000-0x600f pref]
> [   17.255183] pci_bus :01: busn_res: [bus 01-ff] end is updated to 01
> [   17.268648] pci :02:00.0: [1b21:0611] type 00 class 0x010185
> [   17.280671] pci :02:00.0: reg 0x10: [io  0x-0x0007]
> [   17.291785] pci :02:00.0: reg 0x14: [io  0x-0x0003]
> [   17.302898] pci :02:00.0: reg 0x18: [io

Re: [PATCH v2 2/3] soundwire: Intel: introduce DMI quirks for HP Spectre x360 Convertible

2021-04-13 Thread Vinod Koul

On 12-04-21, 14:37, Dave Hansen wrote:
> On 3/1/21 11:51 PM, Bard Liao wrote:
> > +++ b/drivers/soundwire/dmi-quirks.c
> > @@ -0,0 +1,66 @@
> > +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
> > +// Copyright(c) 2021 Intel Corporation.
> 
> It looks like this is already in intel-next, so this may be moot.  But,
> is there a specific reason this is dual licensed?  If so, can you please
> include information about the license choice in the cover letter of any
> future version?

The soundwire module from Intel and core soundwire core was always dual
licensed, so it kind of followed that..

> If there is no specific reason for this contribution to be dual
> licensed, please make it GPL-2.0 only.

This module, I would say NO. Unless someone from Intel disagree..
Pierre/Bard..?

If all agree I dont see a reason why this cant be updated to GPL only.

Thanks
-- 
~Vinod

Re: [PATCH 1/5] mm/swapfile: add percpu_ref support for swap

2021-04-13 Thread Dennis Zhou

On Wed, Apr 14, 2021 at 11:59:03AM +0800, Huang, Ying wrote:
> Dennis Zhou  writes:
> 
> > Hello,
> >
> > On Wed, Apr 14, 2021 at 10:06:48AM +0800, Huang, Ying wrote:
> >> Miaohe Lin  writes:
> >> 
> >> > On 2021/4/14 9:17, Huang, Ying wrote:
> >> >> Miaohe Lin  writes:
> >> >> 
> >> >>> On 2021/4/12 15:24, Huang, Ying wrote:
> >>  "Huang, Ying"  writes:
> >> 
> >> > Miaohe Lin  writes:
> >> >
> >> >> We will use percpu-refcount to serialize against concurrent 
> >> >> swapoff. This
> >> >> patch adds the percpu_ref support for later fixup.
> >> >>
> >> >> Signed-off-by: Miaohe Lin 
> >> >> ---
> >> >>  include/linux/swap.h |  2 ++
> >> >>  mm/swapfile.c| 25 ++---
> >> >>  2 files changed, 24 insertions(+), 3 deletions(-)
> >> >>
> >> >> diff --git a/include/linux/swap.h b/include/linux/swap.h
> >> >> index 144727041e78..849ba5265c11 100644
> >> >> --- a/include/linux/swap.h
> >> >> +++ b/include/linux/swap.h
> >> >> @@ -240,6 +240,7 @@ struct swap_cluster_list {
> >> >>   * The in-memory structure used to track swap areas.
> >> >>   */
> >> >>  struct swap_info_struct {
> >> >> +   struct percpu_ref users;/* serialization against 
> >> >> concurrent swapoff */
> >> >> unsigned long   flags;  /* SWP_USED etc: see above */
> >> >> signed shortprio;   /* swap priority of this type */
> >> >> struct plist_node list; /* entry in swap_active_head */
> >> >> @@ -260,6 +261,7 @@ struct swap_info_struct {
> >> >> struct block_device *bdev;  /* swap device or bdev of swap 
> >> >> file */
> >> >> struct file *swap_file; /* seldom referenced */
> >> >> unsigned int old_block_size;/* seldom referenced */
> >> >> +   struct completion comp; /* seldom referenced */
> >> >>  #ifdef CONFIG_FRONTSWAP
> >> >> unsigned long *frontswap_map;   /* frontswap in-use, one bit 
> >> >> per page */
> >> >> atomic_t frontswap_pages;   /* frontswap pages in-use 
> >> >> counter */
> >> >> diff --git a/mm/swapfile.c b/mm/swapfile.c
> >> >> index 149e77454e3c..724173cd7d0c 100644
> >> >> --- a/mm/swapfile.c
> >> >> +++ b/mm/swapfile.c
> >> >> @@ -39,6 +39,7 @@
> >> >>  #include 
> >> >>  #include 
> >> >>  #include 
> >> >> +#include 
> >> >>  
> >> >>  #include 
> >> >>  #include 
> >> >> @@ -511,6 +512,15 @@ static void swap_discard_work(struct 
> >> >> work_struct *work)
> >> >> spin_unlock(>lock);
> >> >>  }
> >> >>  
> >> >> +static void swap_users_ref_free(struct percpu_ref *ref)
> >> >> +{
> >> >> +   struct swap_info_struct *si;
> >> >> +
> >> >> +   si = container_of(ref, struct swap_info_struct, users);
> >> >> +   complete(>comp);
> >> >> +   percpu_ref_exit(>users);
> >> >
> >> > Because percpu_ref_exit() is used, we cannot use percpu_ref_tryget() 
> >> > in
> >> > get_swap_device(), better to add comments there.
> >> 
> >>  I just noticed that the comments of percpu_ref_tryget_live() says,
> >> 
> >>   * This function is safe to call as long as @ref is between init and 
> >>  exit.
> >> 
> >>  While we need to call get_swap_device() almost at any time, so it's
> >>  better to avoid to call percpu_ref_exit() at all.  This will waste 
> >>  some
> >>  memory, but we need to follow the API definition to avoid potential
> >>  issues in the long term.
> >> >>>
> >> >>> I have to admit that I'am not really familiar with percpu_ref. So I 
> >> >>> read the
> >> >>> implementation code of the percpu_ref and found 
> >> >>> percpu_ref_tryget_live() could
> >> >>> be called after exit now. But you're right we need to follow the API 
> >> >>> definition
> >> >>> to avoid potential issues in the long term.
> >> >>>
> >> 
> >>  And we need to call percpu_ref_init() before insert the 
> >>  swap_info_struct
> >>  into the swap_info[].
> >> >>>
> >> >>> If we remove the call to percpu_ref_exit(), we should not use 
> >> >>> percpu_ref_init()
> >> >>> here because *percpu_ref->data is assumed to be NULL* in 
> >> >>> percpu_ref_init() while
> >> >>> this is not the case as we do not call percpu_ref_exit(). Maybe 
> >> >>> percpu_ref_reinit()
> >> >>> or percpu_ref_resurrect() will do the work.
> >> >>>
> >> >>> One more thing, how could I distinguish the killed percpu_ref from 
> >> >>> newly allocated one?
> >> >>> It seems percpu_ref_is_dying is only safe to call when @ref is between 
> >> >>> init and exit.
> >> >>> Maybe I could do this in alloc_swap_info()?
> >> >> 
> >> >> Yes.  In alloc_swap_info(), you can distinguish newly allocated and
> >> >> reused swap_info_struct.
> >> >> 
> >> 
> >> >> +}
> >> >> +
> >> >>  static void alloc_cluster(struct swap_info_struct *si,

Re: [PATCH v2 1/1] powerpc/iommu: Enable remaining IOMMU Pagesizes present in LoPAR

2021-04-13 Thread Leonardo Bras

On Mon, 2021-04-12 at 17:21 -0500, Segher Boessenkool wrote:
> On Fri, Apr 09, 2021 at 02:36:16PM +1000, Alexey Kardashevskiy wrote:
> > On 08/04/2021 19:04, Michael Ellerman wrote:
> > > > > > +#define QUERY_DDW_PGSIZE_4K0x01
> > > > > > +#define QUERY_DDW_PGSIZE_64K   0x02
> > > > > > +#define QUERY_DDW_PGSIZE_16M   0x04
> > > > > > +#define QUERY_DDW_PGSIZE_32M   0x08
> > > > > > +#define QUERY_DDW_PGSIZE_64M   0x10
> > > > > > +#define QUERY_DDW_PGSIZE_128M  0x20
> > > > > > +#define QUERY_DDW_PGSIZE_256M  0x40
> > > > > > +#define QUERY_DDW_PGSIZE_16G   0x80
> > > > > 
> > > > > I'm not sure the #defines really gain us much vs just putting the
> > > > > literal values in the array below?
> > > > 
> > > > Then someone says "u magic values" :) I do not mind either way. 
> > > > Thanks,
> > > 
> > > Yeah that's true. But #defining them doesn't make them less magic, if
> > > you only use them in one place :)
> > 
> > Defining them with "QUERY_DDW" in the names kinda tells where they are 
> > from. Can also grep QEMU using these to see how the other side handles 
> > it. Dunno.
> 
> And *not* defining anything reduces the mental load a lot.  You can add
> a comment at the single spot you use them, explaining what this is, in a
> much better way!
> 
> Comments are *good*.
> 
> 
> Segher

Thanks for the feedback Alexey, Michael and Segher!

I have sent a v3 for this patch. 
http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210408201915.174217-1-leobra...@gmail.com/

Please let me know of your feedback in it.

Best regards,
Leonardo Bras

Re: Question on KASAN calltrace record in RT

2021-04-13 Thread Mike Galbraith

On Tue, 2021-04-13 at 17:29 +0200, Dmitry Vyukov wrote:
> On Tue, Apr 6, 2021 at 10:26 AM Zhang, Qiang  
> wrote:
> >
> > Hello everyone
> >
> > In RT system,   after  Andrew test,   found the following calltrace ,
> > in KASAN, we record callstack through stack_depot_save(), in this function, 
> > may be call alloc_pages,  but in RT, the spin_lock replace with
> > rt_mutex in alloc_pages(), if before call this function, the irq is 
> > disabled,
> > will trigger following calltrace.
> >
> > maybe  add array[KASAN_STACK_DEPTH] in struct kasan_track to record 
> > callstack  in RT system.
> >
> > Is there a better solution ？
>
> Hi Qiang,
>
> Adding 2 full stacks per heap object can increase memory usage too much.
> The stackdepot has a preallocation mechanism, I would start with
> adding interrupts check here:
> https://elixir.bootlin.com/linux/v5.12-rc7/source/lib/stackdepot.c#L294
> and just not do preallocation in interrupt context. This will solve
> the problem, right?

Hm, this thing might actually be (sorta?) working, modulo one startup
gripe.  The CRASH_DUMP inspired gripe I get with !RT appeared (and shut
up when told I don't care given kdump has worked just fine for ages:),
but no more might_sleep() gripeage.


CONFIG_KASAN_SHADOW_OFFSET=0xdc00
CONFIG_HAVE_ARCH_KASAN=y
CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
CONFIG_KASAN_OUTLINE=y
# CONFIG_KASAN_INLINE is not set
CONFIG_KASAN_STACK=1
CONFIG_KASAN_VMALLOC=y
# CONFIG_KASAN_MODULE_TEST is not set

---
 lib/stackdepot.c |   10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -71,7 +71,7 @@ static void *stack_slabs[STACK_ALLOC_MAX
 static int depot_index;
 static int next_slab_inited;
 static size_t depot_offset;
-static DEFINE_SPINLOCK(depot_lock);
+static DEFINE_RAW_SPINLOCK(depot_lock);

 static bool init_stack_slab(void **prealloc)
 {
@@ -265,7 +265,7 @@ depot_stack_handle_t stack_depot_save(un
struct page *page = NULL;
void *prealloc = NULL;
unsigned long flags;
-   u32 hash;
+   u32 hash, may_prealloc = !IS_ENABLED(CONFIG_PREEMPT_RT) || 
preemptible();

if (unlikely(nr_entries == 0) || stack_depot_disable)
goto fast_exit;
@@ -291,7 +291,7 @@ depot_stack_handle_t stack_depot_save(un
 * The smp_load_acquire() here pairs with smp_store_release() to
 * |next_slab_inited| in depot_alloc_stack() and init_stack_slab().
 */
-   if (unlikely(!smp_load_acquire(_slab_inited))) {
+   if (unlikely(!smp_load_acquire(_slab_inited) && may_prealloc)) {
/*
 * Zero out zone modifiers, as we don't have specific zone
 * requirements. Keep the flags related to allocation in atomic
@@ -305,7 +305,7 @@ depot_stack_handle_t stack_depot_save(un
prealloc = page_address(page);
}

-   spin_lock_irqsave(_lock, flags);
+   raw_spin_lock_irqsave(_lock, flags);

found = find_stack(*bucket, entries, nr_entries, hash);
if (!found) {
@@ -329,7 +329,7 @@ depot_stack_handle_t stack_depot_save(un
WARN_ON(!init_stack_slab());
}

-   spin_unlock_irqrestore(_lock, flags);
+   raw_spin_unlock_irqrestore(_lock, flags);
 exit:
if (prealloc) {
/* Nobody used this memory, ok to free it. */

[0.692437] BUG: sleeping function called from invalid context at 
kernel/locking/rtmutex.c:943
[0.692439] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: 
swapper/0
[0.692442] Preemption disabled at:
[0.692443] [] on_each_cpu_cond_mask+0x30/0xb0
[0.692451] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.12.0.g2afefec-tip-rt 
#5
[0.692454] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 
09/23/2013
[0.692456] Call Trace:
[0.692458]  ? on_each_cpu_cond_mask+0x30/0xb0
[0.692462]  dump_stack+0x8a/0xb5
[0.692467]  ___might_sleep.cold+0xfe/0x112
[0.692471]  rt_spin_lock+0x1c/0x60
[0.692475]  free_unref_page+0x117/0x3c0
[0.692481]  qlist_free_all+0x60/0xd0
[0.692485]  per_cpu_remove_cache+0x5b/0x70
[0.692488]  smp_call_function_many_cond+0x185/0x3d0
[0.692492]  ? qlist_move_cache+0xe0/0xe0
[0.692495]  ? qlist_move_cache+0xe0/0xe0
[0.692497]  on_each_cpu_cond_mask+0x44/0xb0
[0.692501]  kasan_quarantine_remove_cache+0x52/0xf0
[0.692505]  ? acpi_bus_init+0x183/0x183
[0.692510]  kmem_cache_shrink+0xe/0x20
[0.692513]  acpi_os_purge_cache+0xa/0x10
[0.692517]  acpi_purge_cached_objects+0x1d/0x68
[0.692522]  acpi_initialize_objects+0x11/0x39
[0.692524]  ? acpi_ev_install_xrupt_handlers+0x6f/0x7c
[0.692529]  acpi_bus_init+0x50/0x183
[0.692532]  acpi_init+0xce/0x182
[0.692536]  ? acpi_bus_init+0x183/0x183
[0.692539]  ? intel_idle_init+0x36d/0x36d
[0.692543]  ? acpi_bus_init+0x183/0x183
[0.692546]

Re: [PATCH 1/5] mm/swapfile: add percpu_ref support for swap

Dennis Zhou  writes:

> Hello,
>
> On Wed, Apr 14, 2021 at 10:06:48AM +0800, Huang, Ying wrote:
>> Miaohe Lin  writes:
>> 
>> > On 2021/4/14 9:17, Huang, Ying wrote:
>> >> Miaohe Lin  writes:
>> >> 
>> >>> On 2021/4/12 15:24, Huang, Ying wrote:
>>  "Huang, Ying"  writes:
>> 
>> > Miaohe Lin  writes:
>> >
>> >> We will use percpu-refcount to serialize against concurrent swapoff. 
>> >> This
>> >> patch adds the percpu_ref support for later fixup.
>> >>
>> >> Signed-off-by: Miaohe Lin 
>> >> ---
>> >>  include/linux/swap.h |  2 ++
>> >>  mm/swapfile.c| 25 ++---
>> >>  2 files changed, 24 insertions(+), 3 deletions(-)
>> >>
>> >> diff --git a/include/linux/swap.h b/include/linux/swap.h
>> >> index 144727041e78..849ba5265c11 100644
>> >> --- a/include/linux/swap.h
>> >> +++ b/include/linux/swap.h
>> >> @@ -240,6 +240,7 @@ struct swap_cluster_list {
>> >>   * The in-memory structure used to track swap areas.
>> >>   */
>> >>  struct swap_info_struct {
>> >> + struct percpu_ref users;/* serialization against 
>> >> concurrent swapoff */
>> >>   unsigned long   flags;  /* SWP_USED etc: see above */
>> >>   signed shortprio;   /* swap priority of this type */
>> >>   struct plist_node list; /* entry in swap_active_head */
>> >> @@ -260,6 +261,7 @@ struct swap_info_struct {
>> >>   struct block_device *bdev;  /* swap device or bdev of swap 
>> >> file */
>> >>   struct file *swap_file; /* seldom referenced */
>> >>   unsigned int old_block_size;/* seldom referenced */
>> >> + struct completion comp; /* seldom referenced */
>> >>  #ifdef CONFIG_FRONTSWAP
>> >>   unsigned long *frontswap_map;   /* frontswap in-use, one bit 
>> >> per page */
>> >>   atomic_t frontswap_pages;   /* frontswap pages in-use 
>> >> counter */
>> >> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> >> index 149e77454e3c..724173cd7d0c 100644
>> >> --- a/mm/swapfile.c
>> >> +++ b/mm/swapfile.c
>> >> @@ -39,6 +39,7 @@
>> >>  #include 
>> >>  #include 
>> >>  #include 
>> >> +#include 
>> >>  
>> >>  #include 
>> >>  #include 
>> >> @@ -511,6 +512,15 @@ static void swap_discard_work(struct work_struct 
>> >> *work)
>> >>   spin_unlock(>lock);
>> >>  }
>> >>  
>> >> +static void swap_users_ref_free(struct percpu_ref *ref)
>> >> +{
>> >> + struct swap_info_struct *si;
>> >> +
>> >> + si = container_of(ref, struct swap_info_struct, users);
>> >> + complete(>comp);
>> >> + percpu_ref_exit(>users);
>> >
>> > Because percpu_ref_exit() is used, we cannot use percpu_ref_tryget() in
>> > get_swap_device(), better to add comments there.
>> 
>>  I just noticed that the comments of percpu_ref_tryget_live() says,
>> 
>>   * This function is safe to call as long as @ref is between init and 
>>  exit.
>> 
>>  While we need to call get_swap_device() almost at any time, so it's
>>  better to avoid to call percpu_ref_exit() at all.  This will waste some
>>  memory, but we need to follow the API definition to avoid potential
>>  issues in the long term.
>> >>>
>> >>> I have to admit that I'am not really familiar with percpu_ref. So I read 
>> >>> the
>> >>> implementation code of the percpu_ref and found percpu_ref_tryget_live() 
>> >>> could
>> >>> be called after exit now. But you're right we need to follow the API 
>> >>> definition
>> >>> to avoid potential issues in the long term.
>> >>>
>> 
>>  And we need to call percpu_ref_init() before insert the swap_info_struct
>>  into the swap_info[].
>> >>>
>> >>> If we remove the call to percpu_ref_exit(), we should not use 
>> >>> percpu_ref_init()
>> >>> here because *percpu_ref->data is assumed to be NULL* in 
>> >>> percpu_ref_init() while
>> >>> this is not the case as we do not call percpu_ref_exit(). Maybe 
>> >>> percpu_ref_reinit()
>> >>> or percpu_ref_resurrect() will do the work.
>> >>>
>> >>> One more thing, how could I distinguish the killed percpu_ref from newly 
>> >>> allocated one?
>> >>> It seems percpu_ref_is_dying is only safe to call when @ref is between 
>> >>> init and exit.
>> >>> Maybe I could do this in alloc_swap_info()?
>> >> 
>> >> Yes.  In alloc_swap_info(), you can distinguish newly allocated and
>> >> reused swap_info_struct.
>> >> 
>> 
>> >> +}
>> >> +
>> >>  static void alloc_cluster(struct swap_info_struct *si, unsigned long 
>> >> idx)
>> >>  {
>> >>   struct swap_cluster_info *ci = si->cluster_info;
>> >> @@ -2500,7 +2510,7 @@ static void enable_swap_info(struct 
>> >> swap_info_struct *p, int prio,
>> >>* Guarantee swap_map, cluster_info, etc. fields are valid
>> >>

Re: [PATCH RFC 6/6] dcache: prevent flooding with negative dentries

2021-04-13 Thread Al Viro

On Thu, Jan 21, 2021 at 06:49:45PM +0530, Gautham Ananthakrishna wrote:

> + spin_lock(>d_lock);
> + parent = lock_parent(victim);
> +
> + rcu_read_unlock();

Similar story.  As soon as you hit that rcu_read_unlock(), the memory
pointed to by victim might be reused.  If you have hit __lock_parent(),
victim->d_lock had been dropped and regained.  Which means that freeing
might've been already scheduled.  Unlike #1/6, here you won't get
memory corruption in lock_parent() itself, but...

> +
> + if (d_count(victim) || !d_is_negative(victim) ||
> + (victim->d_flags & DCACHE_REFERENCED)) {
> + if (parent)
> + spin_unlock(>d_lock);
> + spin_unlock(>d_lock);

... starting from here you just might.

the qemu-nbd process automatically exit with the commit 43347d56c 'livepatch: send a fake signal to all blocking tasks'

2021-04-13 Thread xiaojun . zhao141

I found the qemu-nbd process(started with qemu-nbd -t -c /dev/nbd0
nbd.qcow2) will automatically exit when I patched for functions of
the nbd with livepatch.

The nbd relative source:
static int nbd_start_device_ioctl(struct nbd_device *nbd, struct block_device 
*bdev)
{   
struct nbd_config *config = nbd->config;
int ret;

ret = nbd_start_device(nbd);
if (ret)
return ret; 

if (max_part)   
bdev->bd_invalidated = 1;   
mutex_unlock(>config_lock);
ret = wait_event_interruptible(config->recv_wq, 
 atomic_read(>recv_threads) == 
0);
if (ret)
sock_shutdown(nbd); 
flush_workqueue(nbd->recv_workq);   

mutex_lock(>config_lock);  
nbd_bdev_reset(bdev);   
/* user requested, ignore socket errors */  
if (test_bit(NBD_RT_DISCONNECT_REQUESTED, >runtime_flags))  
ret = 0;
if (test_bit(NBD_RT_TIMEDOUT, >runtime_flags))  
ret = -ETIMEDOUT;   
return ret; 
}

When the nbd waits for atomic_read(>recv_threads) == 0, the klp
will send a fake signal to it then the qemu-nbd process exits. And the
signal of sysfs to control this action was removed in the commit
10b3d52790e 'livepatch: Remove signal sysfs attribute'. Are there other
ways to control this action? How?

Thanks very much.

[PATCH] netfilter: nf_conntrack: Add conntrack helper for ESP/IPsec

2021-04-13 Thread Cole Dishington

Introduce changes to add ESP connection tracking helper to netfilter
conntrack. The connection tracking of ESP is based on IPsec SPIs. The
underlying motivation for this patch was to allow multiple VPN ESP
clients to be distinguished when using NAT.

Added config flag CONFIG_NF_CT_PROTO_ESP to enable the ESP/IPsec
conntrack helper.

Signed-off-by: Cole Dishington 
---
 .../linux/netfilter/nf_conntrack_proto_esp.h  |  25 +
 .../net/netfilter/ipv4/nf_conntrack_ipv4.h|   3 +
 include/net/netfilter/nf_conntrack.h  |   2 +
 include/net/netfilter/nf_conntrack_l4proto.h  |  15 +
 include/net/netfilter/nf_conntrack_tuple.h|   3 +
 include/net/netns/conntrack.h |  24 +
 .../netfilter/nf_conntrack_tuple_common.h |   3 +
 .../linux/netfilter/nfnetlink_conntrack.h |   2 +
 net/netfilter/Kconfig |  10 +
 net/netfilter/Makefile|   1 +
 net/netfilter/nf_conntrack_core.c |  23 +
 net/netfilter/nf_conntrack_netlink.c  |   4 +-
 net/netfilter/nf_conntrack_proto.c|   6 +
 net/netfilter/nf_conntrack_proto_esp.c| 535 ++
 net/netfilter/nf_conntrack_standalone.c   |   5 +
 net/netfilter/nf_internals.h  |   4 +-
 16 files changed, 663 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/netfilter/nf_conntrack_proto_esp.h
 create mode 100644 net/netfilter/nf_conntrack_proto_esp.c

diff --git a/include/linux/netfilter/nf_conntrack_proto_esp.h 
b/include/linux/netfilter/nf_conntrack_proto_esp.h
new file mode 100644
index ..2441e031c68e
--- /dev/null
+++ b/include/linux/netfilter/nf_conntrack_proto_esp.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _CONNTRACK_PROTO_ESP_H
+#define _CONNTRACK_PROTO_ESP_H
+#include 
+
+/* ESP PROTOCOL HEADER */
+
+struct esphdr {
+   __u32 spi;
+};
+
+struct nf_ct_esp {
+   unsigned int stream_timeout;
+   unsigned int timeout;
+};
+
+#ifdef __KERNEL__
+#include 
+
+void destroy_esp_conntrack_entry(struct nf_conn *ct);
+
+bool esp_pkt_to_tuple(const struct sk_buff *skb, unsigned int dataoff,
+ struct net *net, struct nf_conntrack_tuple *tuple);
+#endif /* __KERNEL__ */
+#endif /* _CONNTRACK_PROTO_ESP_H */
diff --git a/include/net/netfilter/ipv4/nf_conntrack_ipv4.h 
b/include/net/netfilter/ipv4/nf_conntrack_ipv4.h
index 2c8c2b023848..1aee91592639 100644
--- a/include/net/netfilter/ipv4/nf_conntrack_ipv4.h
+++ b/include/net/netfilter/ipv4/nf_conntrack_ipv4.h
@@ -25,5 +25,8 @@ extern const struct nf_conntrack_l4proto 
nf_conntrack_l4proto_udplite;
 #ifdef CONFIG_NF_CT_PROTO_GRE
 extern const struct nf_conntrack_l4proto nf_conntrack_l4proto_gre;
 #endif
+#ifdef CONFIG_NF_CT_PROTO_ESP
+extern const struct nf_conntrack_l4proto nf_conntrack_l4proto_esp;
+#endif
 
 #endif /*_NF_CONNTRACK_IPV4_H*/
diff --git a/include/net/netfilter/nf_conntrack.h 
b/include/net/netfilter/nf_conntrack.h
index 439379ca9ffa..2bd1d94de138 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -36,6 +37,7 @@ union nf_conntrack_proto {
struct ip_ct_tcp tcp;
struct nf_ct_udp udp;
struct nf_ct_gre gre;
+   struct nf_ct_esp esp;
unsigned int tmpl_padto;
 };
 
diff --git a/include/net/netfilter/nf_conntrack_l4proto.h 
b/include/net/netfilter/nf_conntrack_l4proto.h
index 96f9cf81f46b..ec89e83ff20e 100644
--- a/include/net/netfilter/nf_conntrack_l4proto.h
+++ b/include/net/netfilter/nf_conntrack_l4proto.h
@@ -75,6 +75,8 @@ bool nf_conntrack_invert_icmp_tuple(struct nf_conntrack_tuple 
*tuple,
const struct nf_conntrack_tuple *orig);
 bool nf_conntrack_invert_icmpv6_tuple(struct nf_conntrack_tuple *tuple,
  const struct nf_conntrack_tuple *orig);
+bool nf_conntrack_invert_esp_tuple(struct nf_conntrack_tuple *tuple,
+  const struct nf_conntrack_tuple *orig);
 
 int nf_conntrack_inet_error(struct nf_conn *tmpl, struct sk_buff *skb,
unsigned int dataoff,
@@ -132,6 +134,11 @@ int nf_conntrack_gre_packet(struct nf_conn *ct,
unsigned int dataoff,
enum ip_conntrack_info ctinfo,
const struct nf_hook_state *state);
+int nf_conntrack_esp_packet(struct nf_conn *ct,
+   struct sk_buff *skb,
+   unsigned int dataoff,
+   enum ip_conntrack_info ctinfo,
+   const struct nf_hook_state *state);
 
 void nf_conntrack_generic_init_net(struct net *net);
 void nf_conntrack_tcp_init_net(struct net *net);
@@ -141,6 +148,7 @@ void nf_conntrack_dccp_init_net(struct net *net);
 void nf_conntrack_sctp_init_net(struct net *net);
 void nf_conntrack_icmp_init_net(struct net

Re: [PATCH v10] i2c: virtio: add a virtio i2c frontend driver

2021-04-13 Thread Viresh Kumar

On 14-04-21, 10:07, Jie Deng wrote:
> Hi maintainers,
> 
> What's the status of this patch based on the review comments you got ?

I was expecting a new version to be honest..

> Is i2c/for-next the right tree to merge it
> ?

It should be.

-- 
viresh

Re: [PATCH 1/5] mm/swapfile: add percpu_ref support for swap

2021-04-13 Thread Dennis Zhou

Hello,

On Wed, Apr 14, 2021 at 10:06:48AM +0800, Huang, Ying wrote:
> Miaohe Lin  writes:
> 
> > On 2021/4/14 9:17, Huang, Ying wrote:
> >> Miaohe Lin  writes:
> >> 
> >>> On 2021/4/12 15:24, Huang, Ying wrote:
>  "Huang, Ying"  writes:
> 
> > Miaohe Lin  writes:
> >
> >> We will use percpu-refcount to serialize against concurrent swapoff. 
> >> This
> >> patch adds the percpu_ref support for later fixup.
> >>
> >> Signed-off-by: Miaohe Lin 
> >> ---
> >>  include/linux/swap.h |  2 ++
> >>  mm/swapfile.c| 25 ++---
> >>  2 files changed, 24 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/include/linux/swap.h b/include/linux/swap.h
> >> index 144727041e78..849ba5265c11 100644
> >> --- a/include/linux/swap.h
> >> +++ b/include/linux/swap.h
> >> @@ -240,6 +240,7 @@ struct swap_cluster_list {
> >>   * The in-memory structure used to track swap areas.
> >>   */
> >>  struct swap_info_struct {
> >> +  struct percpu_ref users;/* serialization against 
> >> concurrent swapoff */
> >>unsigned long   flags;  /* SWP_USED etc: see above */
> >>signed shortprio;   /* swap priority of this type */
> >>struct plist_node list; /* entry in swap_active_head */
> >> @@ -260,6 +261,7 @@ struct swap_info_struct {
> >>struct block_device *bdev;  /* swap device or bdev of swap 
> >> file */
> >>struct file *swap_file; /* seldom referenced */
> >>unsigned int old_block_size;/* seldom referenced */
> >> +  struct completion comp; /* seldom referenced */
> >>  #ifdef CONFIG_FRONTSWAP
> >>unsigned long *frontswap_map;   /* frontswap in-use, one bit 
> >> per page */
> >>atomic_t frontswap_pages;   /* frontswap pages in-use 
> >> counter */
> >> diff --git a/mm/swapfile.c b/mm/swapfile.c
> >> index 149e77454e3c..724173cd7d0c 100644
> >> --- a/mm/swapfile.c
> >> +++ b/mm/swapfile.c
> >> @@ -39,6 +39,7 @@
> >>  #include 
> >>  #include 
> >>  #include 
> >> +#include 
> >>  
> >>  #include 
> >>  #include 
> >> @@ -511,6 +512,15 @@ static void swap_discard_work(struct work_struct 
> >> *work)
> >>spin_unlock(>lock);
> >>  }
> >>  
> >> +static void swap_users_ref_free(struct percpu_ref *ref)
> >> +{
> >> +  struct swap_info_struct *si;
> >> +
> >> +  si = container_of(ref, struct swap_info_struct, users);
> >> +  complete(>comp);
> >> +  percpu_ref_exit(>users);
> >
> > Because percpu_ref_exit() is used, we cannot use percpu_ref_tryget() in
> > get_swap_device(), better to add comments there.
> 
>  I just noticed that the comments of percpu_ref_tryget_live() says,
> 
>   * This function is safe to call as long as @ref is between init and 
>  exit.
> 
>  While we need to call get_swap_device() almost at any time, so it's
>  better to avoid to call percpu_ref_exit() at all.  This will waste some
>  memory, but we need to follow the API definition to avoid potential
>  issues in the long term.
> >>>
> >>> I have to admit that I'am not really familiar with percpu_ref. So I read 
> >>> the
> >>> implementation code of the percpu_ref and found percpu_ref_tryget_live() 
> >>> could
> >>> be called after exit now. But you're right we need to follow the API 
> >>> definition
> >>> to avoid potential issues in the long term.
> >>>
> 
>  And we need to call percpu_ref_init() before insert the swap_info_struct
>  into the swap_info[].
> >>>
> >>> If we remove the call to percpu_ref_exit(), we should not use 
> >>> percpu_ref_init()
> >>> here because *percpu_ref->data is assumed to be NULL* in 
> >>> percpu_ref_init() while
> >>> this is not the case as we do not call percpu_ref_exit(). Maybe 
> >>> percpu_ref_reinit()
> >>> or percpu_ref_resurrect() will do the work.
> >>>
> >>> One more thing, how could I distinguish the killed percpu_ref from newly 
> >>> allocated one?
> >>> It seems percpu_ref_is_dying is only safe to call when @ref is between 
> >>> init and exit.
> >>> Maybe I could do this in alloc_swap_info()?
> >> 
> >> Yes.  In alloc_swap_info(), you can distinguish newly allocated and
> >> reused swap_info_struct.
> >> 
> 
> >> +}
> >> +
> >>  static void alloc_cluster(struct swap_info_struct *si, unsigned long 
> >> idx)
> >>  {
> >>struct swap_cluster_info *ci = si->cluster_info;
> >> @@ -2500,7 +2510,7 @@ static void enable_swap_info(struct 
> >> swap_info_struct *p, int prio,
> >> * Guarantee swap_map, cluster_info, etc. fields are valid
> >> * between get/put_swap_device() if SWP_VALID bit is set
> >> */
> >> -  synchronize_rcu();
> >> +

Re: [PATCH RFC 1/6] dcache: sweep cached negative dentries to the end of list of siblings

2021-04-13 Thread Al Viro

On Thu, Jan 21, 2021 at 06:49:40PM +0530, Gautham Ananthakrishna wrote:

> +static void sweep_negative(struct dentry *dentry)
> +{
> + struct dentry *parent;
> +
> + if (!d_is_tail_negative(dentry)) {
> + parent = lock_parent(dentry);
> + if (!parent)
> + return;

Wait a minute.  It's not a good environment for calling lock_parent().
Who said that dentry won't get freed right under it?

Right now callers of __lock_parent() either hold a reference to dentry
*or* are called for a positive dentry, with inode->i_lock held.
You are introducing something very different - 

>   if (likely(retain_dentry(dentry))) {
> + if (d_is_negative(dentry))
> + sweep_negative(dentry);
>   spin_unlock(>d_lock);

Here we can be called for a negative dentry with refcount already *NOT*
held by us.  Look:

static inline struct dentry *lock_parent(struct dentry *dentry)
{
struct dentry *parent = dentry->d_parent;
if (IS_ROOT(dentry))
return NULL;
isn't a root

if (likely(spin_trylock(>d_lock)))
return parent;

no such luck - someone's already holding parent's ->d_lock

return __lock_parent(dentry);
and here we have
static struct dentry *__lock_parent(struct dentry *dentry)
{
struct dentry *parent;
rcu_read_lock();  

OK, anything we see in its ->d_parent is guaranteed to stay
allocated until we get to matching rcu_read_unlock()

spin_unlock(>d_lock);
dropped the spinlock, now it's fair game for d_move(), d_drop(), etc.

again:
parent = READ_ONCE(dentry->d_parent);
dentry couldn't have been reused, so it's the last value stored there.
Points to still allocated struct dentry instance, so we can...

spin_lock(>d_lock);
grab its ->d_lock.

/*
 * We can't blindly lock dentry until we are sure
 * that we won't violate the locking order.
 * Any changes of dentry->d_parent must have
 * been done with parent->d_lock held, so
 * spin_lock() above is enough of a barrier
 * for checking if it's still our child.
 */
if (unlikely(parent != dentry->d_parent)) {
spin_unlock(>d_lock);
goto again;
}
Nevermind, it's still equal to our ->d_parent.  So we have
the last valid parent's ->d_lock held

rcu_read_unlock();
What's to hold dentry allocated now?  IF we held its refcount - no
problem, it can't go away.  If we held its ->d_inode->i_lock - ditto
(it wouldn't get to __dentry_kill() until we drop that, since all
callers do acquire that lock and it couldn't get scheduled for
freeing until it gets through most of __dentry_kill()).

IOW, we are free to grab dentry->d_lock again.
if (parent != dentry)
spin_lock_nested(>d_lock, DENTRY_D_LOCK_NESTED);
else
parent = NULL;
return parent;
}

With your patch, though, you've got a call site where neither condition
is guaranteed.  Current kernel is fine - we are holding ->d_lock there,
and we don't touch dentry after it gets dropped.  Again, it can't get
scheduled for freeing until after we drop ->d_lock, so we are safe.
With that change, however, you've got a hard-to-hit memory corruptor
there...

Re: [PATCH v2 00/16] Multigenerational LRU Framework

2021-04-13 Thread Yu Zhao

On Tue, Apr 13, 2021 at 5:14 PM Dave Chinner  wrote:
>
> On Tue, Apr 13, 2021 at 10:13:24AM -0600, Jens Axboe wrote:
> > On 4/13/21 1:51 AM, SeongJae Park wrote:
> > > From: SeongJae Park 
> > >
> > > Hello,
> > >
> > >
> > > Very interesting work, thank you for sharing this :)
> > >
> > > On Tue, 13 Apr 2021 00:56:17 -0600 Yu Zhao  wrote:
> > >
> > >> What's new in v2
> > >> 
> > >> Special thanks to Jens Axboe for reporting a regression in buffered
> > >> I/O and helping test the fix.
> > >
> > > Is the discussion open?  If so, could you please give me a link?
> >
> > I wasn't on the initial post (or any of the lists it was posted to), but
> > it's on the google page reclaim list. Not sure if that is public or not.
> >
> > tldr is that I was pretty excited about this work, as buffered IO tends
> > to suck (a lot) for high throughput applications. My test case was
> > pretty simple:
> >
> > Randomly read a fast device, using 4k buffered IO, and watch what
> > happens when the page cache gets filled up. For this particular test,
> > we'll initially be doing 2.1GB/sec of IO, and then drop to 1.5-1.6GB/sec
> > with kswapd using a lot of CPU trying to keep up. That's mainline
> > behavior.
>
> I see this exact same behaviour here, too, but I RCA'd it to
> contention between the inode and memory reclaim for the mapping
> structure that indexes the page cache. Basically the mapping tree
> lock is the contention point here - you can either be adding pages
> to the mapping during IO, or memory reclaim can be removing pages
> from the mapping, but we can't do both at once.
>
> So we end up with kswapd spinning on the mapping tree lock like so
> when doing 1.6GB/s in 4kB buffered IO:
>
> -   20.06% 0.00%  [kernel]   [k] kswapd   
>   
>▒
>- 20.06% kswapd
>   
>▒
>   - 20.05% balance_pgdat  
>   
>▒
>  - 20.03% shrink_node 
>   
>▒
> - 19.92% shrink_lruvec
>   
>▒
>- 19.91% shrink_inactive_list  
>   
>▒
>   - 19.22% shrink_page_list   
>   
>▒
>  - 17.51% __remove_mapping
>   
>▒
> - 14.16% _raw_spin_lock_irqsave   
>   
>▒
>- 14.14% do_raw_spin_lock  
>   
>▒
> __pv_queued_spin_lock_slowpath
>   
>▒
> - 1.56% __delete_from_page_cache  
>   
>▒
>  0.63% xas_store  
>   
>▒
> - 0.78% _raw_spin_unlock_irqrestore   
>   
>▒
>- 0.69% do_raw_spin_unlock 
>   
>▒
> __raw_callee_save___pv_queued_spin_unlock 
>   
>▒
>  - 0.82% free_unref_page_list 
>   
>▒
> - 0.72% free_unref_page_commit
>   
>▒
>  0.57% free_pcppages_bulk 
>   
>▒
>
> And these are the processes consuming CPU:
>
>5171 root

Re: [PATCH 2/5] swap: fix do_swap_page() race with swapoff

2021-04-13 Thread Miaohe Lin

On 2021/4/14 11:07, Huang, Ying wrote:
> Miaohe Lin  writes:
> 
>> On 2021/4/13 9:27, Huang, Ying wrote:
>>> Miaohe Lin  writes:
>>>
 When I was investigating the swap code, I found the below possible race
 window:

 CPU 1  CPU 2
 -  -
 do_swap_page
   synchronous swap_readpage
 alloc_page_vma
swapoff
  release swap_file, bdev, or ...
   swap_readpage
check sis->flags is ok
  access swap_file, bdev...[oops!]
si->flags = 0

 Using current get/put_swap_device() to guard against concurrent swapoff for
 swap_readpage() looks terrible because swap_readpage() may take really long
 time. And this race may not be really pernicious because swapoff is usually
 done when system shutdown only. To reduce the performance overhead on the
 hot-path as much as possible, it appears we can use the percpu_ref to close
 this race window(as suggested by Huang, Ying).

 Fixes: 235b62176712 ("mm/swap: add cluster lock")
>>>
>>> This isn't the commit that introduces the race.  You can use `git blame`
>>> find out the correct commit.  For this it's commit 0bcac06f27d7 "mm,
>>> swap: skip swapcache for swapin of synchronous device".
>>>
>>
>> Sorry about it! What I refer to is commit eb085574a752 ("mm, swap: fix race 
>> between
>> swapoff and some swap operations"). And I think this commit does not fix the 
>> race
>> condition completely, so I reuse the Fixes tag inside it.
>>
>>> And I suggest to merge 1/5 and 2/5 to make it easy to get the full
>>> picture.
>>>
 Signed-off-by: Miaohe Lin 
 ---
  include/linux/swap.h |  2 +-
  mm/memory.c  | 10 ++
  mm/swapfile.c| 28 +++-
  3 files changed, 22 insertions(+), 18 deletions(-)

 diff --git a/include/linux/swap.h b/include/linux/swap.h
 index 849ba5265c11..9066addb57fd 100644
 --- a/include/linux/swap.h
 +++ b/include/linux/swap.h
 @@ -513,7 +513,7 @@ sector_t swap_page_sector(struct page *page);
  
  static inline void put_swap_device(struct swap_info_struct *si)
  {
 -  rcu_read_unlock();
 +  percpu_ref_put(>users);
  }
  
  #else /* CONFIG_SWAP */
 diff --git a/mm/memory.c b/mm/memory.c
 index cc71a445c76c..8543c47b955c 100644
 --- a/mm/memory.c
 +++ b/mm/memory.c
 @@ -3311,6 +3311,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
  {
struct vm_area_struct *vma = vmf->vma;
struct page *page = NULL, *swapcache;
 +  struct swap_info_struct *si = NULL;
swp_entry_t entry;
pte_t pte;
int locked;
 @@ -3339,6 +3340,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
}
  

>>>
>>> I suggest to add comments here as follows (words copy from Matthew Wilcox)
>>>
>>> /* Prevent swapoff from happening to us */
>>
>> Ok.
>>
>>>
 +  si = get_swap_device(entry);
 +  /* In case we raced with swapoff. */
 +  if (unlikely(!si))
 +  goto out;
 +
>>>
>>> Because we wrap the whole do_swap_page() with get/put_swap_device()
>>> now.  We can remove several get/put_swap_device() for function called by
>>> do_swap_page().  That can be another optimization patch.
>>
>> I tried to remove several get/put_swap_device() for function called
>> by do_swap_page() only before I send this series. But it seems they have
>> other callers without proper get/put_swap_device().
> 
> Then we need to revise these callers instead.  Anyway, can be another
> series.

Yes. can be another series.
Thanks.

> 
> Best Regards,
> Huang, Ying
> 
> .
>

[PATCH] ASoC: codec: remove unused variable

2021-04-13 Thread Jiapeng Chong

Fix the following gcc warning:

sound/soc/codecs/jz4760.c:201:6: warning: variable ‘ret’ set but not
used [-Wunused-but-set-variable].

Reported-by: Abaci Robot 
Signed-off-by: Jiapeng Chong 
---
 sound/soc/codecs/jz4760.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/sound/soc/codecs/jz4760.c b/sound/soc/codecs/jz4760.c
index e8f28cc..ad5e859 100644
--- a/sound/soc/codecs/jz4760.c
+++ b/sound/soc/codecs/jz4760.c
@@ -198,15 +198,13 @@ static int jz4760_codec_startup(struct snd_pcm_substream 
*substream,
 {
struct snd_soc_component *codec = dai->component;
struct snd_soc_dapm_context *dapm = snd_soc_component_get_dapm(codec);
-   int ret;
-
/*
 * SYSCLK output from the codec to the AIC is required to keep the
 * DMA transfer going during playback when all audible outputs have
 * been disabled.
 */
if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK)
-   ret = snd_soc_dapm_force_enable_pin(dapm, "SYSCLK");
+   snd_soc_dapm_force_enable_pin(dapm, "SYSCLK");
return 0;
 }
 
-- 
1.8.3.1

Re: [PATCH v2 2/3] drm/msm/dp: do not re initialize of audio_comp at display_disable()

2021-04-13 Thread Stephen Boyd

Quoting Kuogee Hsieh (2021-04-13 16:11:30)
> At dongle unplug, dp initializes audio_comp followed by sending disconnect
> event notification to audio and to make sure audio had shutdown completely
> by wait for audio completion notification at display_disable(). This patch

Is this dp_display_disable()? Doubtful that display_disable() is the
function we're talking about.

> will not re initialize audio_comp at display_disable() if audio shutdown
> is triggered by dongle unplugged.

This commit text seems to say the why before the what, where why is "dp
initializes audio_comp followed by sending disconnect.." and the what is
"this patch will no re-initialized audio_comp...". Can you reorder this
so the what comes before the why?

> 
> Signed-off-by: Kuogee Hsieh 
> ---
>  drivers/gpu/drm/msm/dp/dp_display.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/dp/dp_display.c 
> b/drivers/gpu/drm/msm/dp/dp_display.c
> index 0ba71c7..1d71c95 100644
> --- a/drivers/gpu/drm/msm/dp/dp_display.c
> +++ b/drivers/gpu/drm/msm/dp/dp_display.c
> @@ -894,8 +894,10 @@ static int dp_display_disable(struct dp_display_private 
> *dp, u32 data)
> /* wait only if audio was enabled */
> if (dp_display->audio_enabled) {
> /* signal the disconnect event */
> -   reinit_completion(>audio_comp);
> -   dp_display_handle_plugged_change(dp_display, false);
> +   if (dp->hpd_state != ST_DISCONNECT_PENDING) {
> +   reinit_completion(>audio_comp);

Why is this reinitialized here at all? Wouldn't it make more sense to
initialize the completion once at cable plug in and then not initialize
the completion anywhere else? Or initialize the completion whenever
dp_display->audio_enabled is set to true and then only wait for the
completion here if that boolean is true? Or initialize the completion
when dp_display_handle_plugged_change() is passed true for the 'plugged'
argument?

I started reading the code and quickly got lost figuring out how
dp_display_handle_plugged_change() worked and the interaction between
the dp display code and the audio codec embedded in here. There seem to
be a couple of conditions that cut off things early, like
dp_display->audio_enabled and audio->engine_on. Why? Why does
dp_display_signal_audio_complete() call complete_all() vs. just
complete()? Please help! :(

> +   dp_display_handle_plugged_change(dp_display, false);

I think it's this way because dp_hpd_unplug_handle() is the function
that sets the hpd_state to ST_DISCONNECT_PENDING and then reinitializes
the completion (why?) and calls dp_display_handle_plugged_change(). So
the commit text could say that reinitializing the completion again here
at dp_display_disable() is racing with the audio code in the case that
dp_hpd_unplug_handle() already called
dp_display_handle_plugged_change() and it would make more sense. But the
question still stands why that race even exists in the first place vs.
initializing the completion variable in only one place unconditionally
when the cable is connected, in dp_hpd_plug_handle() or
dp_display_post_enable().

> +   }
> if (!wait_for_completion_timeout(>audio_comp,
> HZ * 5))
> DRM_ERROR("audio comp timeout\n");

Re: [PATCH v2] drivers: pinctrl: qcom: fix Kconfig dependency on GPIOLIB

2021-04-13 Thread Bjorn Andersson

On Tue 13 Apr 21:51 CDT 2021, Julian Braha wrote:

> When PINCTRL_MSM is enabled, and GPIOLIB is disabled,
> Kbuild gives the following warning:
> 
> WARNING: unmet direct dependencies detected for GPIOLIB_IRQCHIP
>   Depends on [n]: GPIOLIB [=n]
>   Selected by [y]:
>   - PINCTRL_MSM [=y] && PINCTRL [=y] && (ARCH_QCOM || COMPILE_TEST [=y])
> 
> This is because PINCTRL_MSM selects GPIOLIB_IRQCHIP,
> without selecting or depending on GPIOLIB, despite
> GPIOLIB_IRQCHIP depending on GPIOLIB. Having PINCTRL_MSM
> select GPIOLIB will cause a recursive dependency error.
> 

Reviewed-by: Bjorn Andersson 

> Signed-off-by: Julian Braha 
> ---
>  drivers/pinctrl/qcom/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pinctrl/qcom/Kconfig b/drivers/pinctrl/qcom/Kconfig
> index 6853a896c476..d42ac59875ab 100644
> --- a/drivers/pinctrl/qcom/Kconfig
> +++ b/drivers/pinctrl/qcom/Kconfig
> @@ -3,7 +3,7 @@ if (ARCH_QCOM || COMPILE_TEST)
>  
>  config PINCTRL_MSM
>   tristate "Qualcomm core pin controller driver"
> - depends on QCOM_SCM || !QCOM_SCM #if QCOM_SCM=m this can't be =y
> + depends on GPIOLIB && (QCOM_SCM || !QCOM_SCM) #if QCOM_SCM=m this can't 
> be =y
>   select PINMUX
>   select PINCONF
>   select GENERIC_PINCONF
> -- 
> 2.27.0
>

Re: [PATCH 1/3] vfio/iommu_type1: Add HWDBM status maintanance

Hi Keqian,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on vfio/next]
[also build test ERROR on linux/master linus/master v5.12-rc7 next-20210413]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Keqian-Zhu/vfio-iommu_type1-Implement-dirty-log-tracking-based-on-IOMMU-HWDBM/20210413-171632
base:   https://github.com/awilliam/linux-vfio.git next
config: x86_64-randconfig-a013-20210413 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 
9829f5e6b1bca9b61efc629770d28bb9014dec45)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install x86_64 cross compiling tool for clang build
# apt-get install binutils-x86-64-linux-gnu
# 
https://github.com/0day-ci/linux/commit/3005ed21d06a3ed861847529f08c3d8814013399
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Keqian-Zhu/vfio-iommu_type1-Implement-dirty-log-tracking-based-on-IOMMU-HWDBM/20210413-171632
git checkout 3005ed21d06a3ed861847529f08c3d8814013399
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

>> drivers/vfio/vfio_iommu_type1.c:2270:33: error: use of undeclared identifier 
>> 'IOMMU_DEV_FEAT_HWDBM'
   enum iommu_dev_features feat = IOMMU_DEV_FEAT_HWDBM;
  ^
   1 error generated.


vim +/IOMMU_DEV_FEAT_HWDBM +2270 drivers/vfio/vfio_iommu_type1.c

  2267  
  2268  static bool vfio_group_supports_hwdbm(struct vfio_group *group)
  2269  {
> 2270  enum iommu_dev_features feat = IOMMU_DEV_FEAT_HWDBM;
  2271  
  2272  return !iommu_group_for_each_dev(group->iommu_group, ,
  2273   vfio_dev_enable_feature);
  2274  }
  2275  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip

Re: [PATCH 1/2] dt-bindings: soc: qcom: Add bindings for Qualcomm Memshare service

2021-04-13 Thread Bjorn Andersson

On Sat 10 Apr 03:05 CDT 2021, Nikita Travkin wrote:

> Hi, sorry for a late reply but I couldn't answer earlier.
> 
> 30.03.2021 19:40, Rob Herring ??:
> > On Fri, Mar 19, 2021 at 10:23:20PM +0500, nikitos...@gmail.com wrote:
> >> From: Nikita Travkin 
> >>
> >> Add DT bindings for memshare: QMI service that allocates
> >> memory per remote processor request.
> >>
> >> Signed-off-by: Nikita Travkin 
> >> ---
> >>  .../bindings/soc/qcom/qcom,memshare.yaml  | 109 ++
> >>  include/dt-bindings/soc/qcom,memshare.h   |  10 ++
> >>  2 files changed, 119 insertions(+)
> >>  create mode 100644 
> >> Documentation/devicetree/bindings/soc/qcom/qcom,memshare.yaml
> >>  create mode 100644 include/dt-bindings/soc/qcom,memshare.h
> >>
> >> diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,memshare.yaml 
> >> b/Documentation/devicetree/bindings/soc/qcom/qcom,memshare.yaml
> >> new file mode 100644
> >> index ..ebdf128b066c
> >> --- /dev/null
> >> +++ b/Documentation/devicetree/bindings/soc/qcom/qcom,memshare.yaml
> >> @@ -0,0 +1,109 @@
> >> +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
> >> +%YAML 1.2
> >> +---
> >> +$id: "http://devicetree.org/schemas/soc/qcom/qcom,memshare.yaml#;
> >> +$schema: "http://devicetree.org/meta-schemas/core.yaml#;
> >> +
> >> +title: Qualcomm QMI Shared Memory Service
> > How many shared memory interfaces does Qcom have...
> >
> >> +
> >> +description: |
> >> +  This driver provides a QMI service that allows remote processors (like 
> >> modem)
> >> +  to request additional memory. It is used for applications like GPS in 
> >> modem.
> > If the memory region is defined in reserved-memory, how are you 
> > allocating additional memory? 
> 
> Initially remoteproc is loaded into it's own reserved-memory region
> but qcom decided that they sometimes need more memory than that.
> Memshare driver in msm8916 downstream tree seem to blindly allocate
> DMA region for every request that it gets. Additionally for those
> clients described in the DT, they do the DMA allocation on boot
> time and never free the region. They call it "guaranteed" allocation.
> 
> On msm8916 only one "guaranteed" client seem to be used so I decided
> to implement it with reserved-memory node. On newer platforms they
> seem to have more clients but I think that the driver can be easily
> extended to support dynamic allocation if someone really needs it.
> 

Is the "guaranteed" memory required to come from the reserved-memory
part of memory, or could it simply be allocated on demand as well (or
preallocated, but at a dynamic address)?

If these allocations always came from a reserved-memory region, then
adding a "qcom,memshare" compatible to the reserved-memory node itself
seems like a reasonable approach. But if dma_alloc is sufficient, and
there's cases where there's no "guaranteed" region, perhaps we should
just describe this as part of the remoteproc node (i.e. essentially
flipping the node/subnode in your current binding).


E.g. can we get away with simply adding an optional qcom,memshare-node
to the remoteproc binding and when that's present we make the Qualcomm
remoteproc drivers spawn the memshare handler and listen for requests
from that node?

> I tried to explain that in the cover letter but I think I made some
> mistake as I don't see it in the Patchwork.
> 
> >> +
> >> +maintainers:
> >> +  - Nikita Travkin 
> >> +
> >> +properties:
> >> +  compatible:
> >> +const: qcom,memshare
> >> +
> >> +  qcom,legacy-client:
> >> +$ref: /schemas/types.yaml#/definitions/phandle
> >> +description: Phandle to a memshare client node used for legacy 
> >> requests.
> >> +
> >> +  "#address-cells":
> >> +const: 1
> >> +
> >> +  "#size-cells":
> >> +const: 0
> >> +
> >> +patternProperties:
> >> +  "^.*@[0-9]+$":
> >> +type: object
> >> +
> >> +properties:
> >> +  reg:
> >> +description: Proc-ID for clients in this node.
> > What's Proc-ID?
> 
> The requests from the remote nodes contain client-id and proc-id
> that are supposed to differentiate the clients. It's possible to
> find the values in downstream DT or by observing what messages
> are received by the memshare service (I left dev_dbg logging in
> the driver for that reason)
> 
> I think I should reword it to make this more apparent, maybe
> "Proc-ID that clients in this node send."?
> 

If this is a constant for each remote and we make this a child thing of
remoteproc perhaps encode the number in the remoteproc nodes?

(We still need something in DT to state that we want a memshare for
a given platform/remoteproc)

> >
> >> +
> >> +  qcom,qrtr-node:
> >> +$ref: /schemas/types.yaml#/definitions/uint32
> >> +description: Node from which the requests are expected.
> >> +
> >> +  "#address-cells":
> >> +const: 1
> >> +
> >> +  "#size-cells":
> >> +const: 0
> >> +
> >> +patternProperties:
> >> +  "^.*@[0-9]+$":
> >> +

[PATCH v2 6/8] MIPS: pci-legacy: remove redundant info messages

Remove the following pci-legacy message:
  PCI host bridge /pci@44/host-bridge ranges:
   MEM 0x2000..0x2fff
IO 0x0046..0x0046

It is followed shortly by the same data from pci_register_host_bridge:
  PCI host bridge to bus :00
  pci_bus :00: root bus resource [mem 0x2000-0x2fff]
  pci_bus :00: root bus resource [io  0x46-0x46]

Signed-off-by: Ilya Lipnitskiy 
---
 arch/mips/pci/pci-legacy.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/arch/mips/pci/pci-legacy.c b/arch/mips/pci/pci-legacy.c
index 3a909194284a..ec3f52ade72d 100644
--- a/arch/mips/pci/pci-legacy.c
+++ b/arch/mips/pci/pci-legacy.c
@@ -140,7 +140,6 @@ void pci_load_of_ranges(struct pci_controller *hose, struct 
device_node *node)
struct of_pci_range range;
struct of_pci_range_parser parser;
 
-   pr_info("PCI host bridge %pOF ranges:\n", node);
hose->of_node = node;
 
if (of_pci_range_parser_init(, node))
@@ -151,18 +150,12 @@ void pci_load_of_ranges(struct pci_controller *hose, 
struct device_node *node)
 
switch (range.flags & IORESOURCE_TYPE_BITS) {
case IORESOURCE_IO:
-   pr_info("  IO 0x%016llx..0x%016llx\n",
-   range.cpu_addr,
-   range.cpu_addr + range.size - 1);
hose->io_map_base =
(unsigned long)ioremap(range.cpu_addr,
   range.size);
res = hose->io_resource;
break;
case IORESOURCE_MEM:
-   pr_info(" MEM 0x%016llx..0x%016llx\n",
-   range.cpu_addr,
-   range.cpu_addr + range.size - 1);
res = hose->mem_resource;
break;
}
-- 
2.31.1

[PATCH v2 8/8] MIPS: pci-legacy: use generic pci_enable_resources

Follow the reasoning from commit 842de40d93e0 ("PCI: add generic
pci_enable_resources()"):

  The only functional difference from the MIPS version is that the
  generic one uses "!r->parent" to check for resource collisions
  instead of "!r->start && r->end".

That should have no effect on any pci-legacy driver.

Suggested-by: Bjorn Helgaas 
Signed-off-by: Ilya Lipnitskiy 
---
 arch/mips/pci/pci-legacy.c | 40 ++
 1 file changed, 2 insertions(+), 38 deletions(-)

diff --git a/arch/mips/pci/pci-legacy.c b/arch/mips/pci/pci-legacy.c
index 78c22987bef0..c24226ea0a6e 100644
--- a/arch/mips/pci/pci-legacy.c
+++ b/arch/mips/pci/pci-legacy.c
@@ -241,47 +241,11 @@ static int __init pcibios_init(void)
 
 subsys_initcall(pcibios_init);
 
-static int pcibios_enable_resources(struct pci_dev *dev, int mask)
-{
-   u16 cmd, old_cmd;
-   int idx;
-   struct resource *r;
-
-   pci_read_config_word(dev, PCI_COMMAND, );
-   old_cmd = cmd;
-   for (idx=0; idx < PCI_NUM_RESOURCES; idx++) {
-   /* Only set up the requested stuff */
-   if (!(mask & (1flags & (IORESOURCE_IO | IORESOURCE_MEM)))
-   continue;
-   if ((idx == PCI_ROM_RESOURCE) &&
-   (!(r->flags & IORESOURCE_ROM_ENABLE)))
-   continue;
-   if (!r->start && r->end) {
-   pci_err(dev,
-   "can't enable device: resource collisions\n");
-   return -EINVAL;
-   }
-   if (r->flags & IORESOURCE_IO)
-   cmd |= PCI_COMMAND_IO;
-   if (r->flags & IORESOURCE_MEM)
-   cmd |= PCI_COMMAND_MEMORY;
-   }
-   if (cmd != old_cmd) {
-   pci_info(dev, "enabling device (%04x -> %04x)\n", old_cmd, cmd);
-   pci_write_config_word(dev, PCI_COMMAND, cmd);
-   }
-   return 0;
-}
-
 int pcibios_enable_device(struct pci_dev *dev, int mask)
 {
-   int err;
+   int err = pci_enable_resources(dev, mask);
 
-   if ((err = pcibios_enable_resources(dev, mask)) < 0)
+   if (err < 0)
return err;
 
return pcibios_plat_dev_init(dev);
-- 
2.31.1

[PATCH v2 3/8] MIPS: pci-rt3883: trivial: remove unused variable

Fixes the following compiler warning:
  warning: unused variable 'flags' [-Wunused-variable]

Fixes: e5067c718b3a ("MIPS: pci-rt3883: Remove odd locking in PCI config space 
access code")
Signed-off-by: Ilya Lipnitskiy 
Acked-by: Sergey Ryazanov 
Cc: triv...@kernel.org
---
 arch/mips/pci/pci-rt3883.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/arch/mips/pci/pci-rt3883.c b/arch/mips/pci/pci-rt3883.c
index 0ac6346026d0..e422f78db5bc 100644
--- a/arch/mips/pci/pci-rt3883.c
+++ b/arch/mips/pci/pci-rt3883.c
@@ -100,7 +100,6 @@ static u32 rt3883_pci_read_cfg32(struct 
rt3883_pci_controller *rpc,
   unsigned bus, unsigned slot,
   unsigned func, unsigned reg)
 {
-   unsigned long flags;
u32 address;
u32 ret;
 
@@ -116,7 +115,6 @@ static void rt3883_pci_write_cfg32(struct 
rt3883_pci_controller *rpc,
 unsigned bus, unsigned slot,
 unsigned func, unsigned reg, u32 val)
 {
-   unsigned long flags;
u32 address;
 
address = rt3883_pci_get_cfgaddr(bus, slot, func, reg);
@@ -229,7 +227,6 @@ static int rt3883_pci_config_read(struct pci_bus *bus, 
unsigned int devfn,
  int where, int size, u32 *val)
 {
struct rt3883_pci_controller *rpc;
-   unsigned long flags;
u32 address;
u32 data;
 
@@ -263,7 +260,6 @@ static int rt3883_pci_config_write(struct pci_bus *bus, 
unsigned int devfn,
   int where, int size, u32 val)
 {
struct rt3883_pci_controller *rpc;
-   unsigned long flags;
u32 address;
u32 data;
 
-- 
2.31.1

[PATCH v2 7/8] MIPS: pci-legacy: remove busn_resource field

No drivers set the busn_resource field in the pci_controller struct.
Commit 7ee214b540d9 ("MIPS: PCI: Remove unused busn_offset") almost
removed it over 3 years ago. Remove it for good to free up memory and
eliminate messages like:
  pci_bus :00: root bus resource [??? 0x flags 0x0]

Signed-off-by: Ilya Lipnitskiy 
Cc: Bjorn Helgaas 
---
 arch/mips/include/asm/pci.h | 1 -
 arch/mips/pci/pci-legacy.c  | 1 -
 2 files changed, 2 deletions(-)

diff --git a/arch/mips/include/asm/pci.h b/arch/mips/include/asm/pci.h
index 6f48649201c5..9ffc8192adae 100644
--- a/arch/mips/include/asm/pci.h
+++ b/arch/mips/include/asm/pci.h
@@ -38,7 +38,6 @@ struct pci_controller {
struct resource *io_resource;
unsigned long io_offset;
unsigned long io_map_base;
-   struct resource *busn_resource;
 
 #ifndef CONFIG_PCI_DOMAINS_GENERIC
unsigned int index;
diff --git a/arch/mips/pci/pci-legacy.c b/arch/mips/pci/pci-legacy.c
index ec3f52ade72d..78c22987bef0 100644
--- a/arch/mips/pci/pci-legacy.c
+++ b/arch/mips/pci/pci-legacy.c
@@ -89,7 +89,6 @@ static void pcibios_scanbus(struct pci_controller *hose)
hose->mem_resource, hose->mem_offset);
pci_add_resource_offset(,
hose->io_resource, hose->io_offset);
-   pci_add_resource(, hose->busn_resource);
list_splice_init(, >windows);
bridge->dev.parent = NULL;
bridge->sysdata = hose;
-- 
2.31.1

[PATCH v2 5/8] MIPS: pci-legacy: stop using of_pci_range_to_resource

Mirror commit aeba3731b150 ("powerpc/pci: Fix IO space breakage after
of_pci_range_to_resource() change").

Most MIPS platforms do not define PCI_IOBASE, nor implement
pci_address_to_pio(). Moreover, IO_SPACE_LIMIT is 0x for most MIPS
platforms. of_pci_range_to_resource passes the _start address_ of the IO
range into pci_address_to_pio, which then checks it against
IO_SPACE_LIMIT and fails, because for MIPS platforms that use
pci-legacy (pci-lantiq, pci-rt3883, pci-mt7620), IO ranges start much
higher than 0x.

In fact, pci-mt7621 in staging already works around this problem, see
commit 09dd629eeabb ("staging: mt7621-pci: fix io space and properly set
resource limits")

So just stop using of_pci_range_to_resource, which does not work for
MIPS.

Fixes PCI errors like:
  pci_bus :00: root bus resource [io  0x]

Fixes: 0b0b0893d49b ("of/pci: Fix the conversion of IO ranges into IO 
resources")
Signed-off-by: Ilya Lipnitskiy 
Cc: Liviu Dudau 
---
 arch/mips/pci/pci-legacy.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/mips/pci/pci-legacy.c b/arch/mips/pci/pci-legacy.c
index 39052de915f3..3a909194284a 100644
--- a/arch/mips/pci/pci-legacy.c
+++ b/arch/mips/pci/pci-legacy.c
@@ -166,8 +166,13 @@ void pci_load_of_ranges(struct pci_controller *hose, 
struct device_node *node)
res = hose->mem_resource;
break;
}
-   if (res != NULL)
-   of_pci_range_to_resource(, node, res);
+   if (res != NULL) {
+   res->name = node->full_name;
+   res->flags = range.flags;
+   res->start = range.cpu_addr;
+   res->end = range.cpu_addr + range.size - 1;
+   res->parent = res->child = res->sibling = NULL;
+   }
}
 }
 
-- 
2.31.1

[PATCH v2 1/8] MIPS: pci-rt2880: fix slot 0 configuration

pci_fixup_irqs() used to call pcibios_map_irq on every PCI device, which
for RT2880 included bus 0 slot 0. After pci_fixup_irqs() got removed,
only slots/funcs with devices attached would be called. While arguably
the right thing, that left no chance for this driver to ever initialize
slot 0, effectively bricking PCI and USB on RT2880 devices such as the
Belkin F5D8235-4 v1.

Slot 0 configuration needs to happen after PCI bus enumeration, but
before any device at slot 0x11 (func 0 or 1) is talked to. That was
determined empirically by testing on a Belkin F5D8235-4 v1 device. A
minimal BAR 0 config write followed by read, then setting slot 0
PCI_COMMAND to MASTER | IO | MEMORY is all that seems to be required for
proper functionality.

Tested by ensuring that full- and high-speed USB devices get enumerated
on the Belkin F5D8235-4 v1 (with an out of tree DTS file from OpenWrt).

Fixes: 04c81c7293df ("MIPS: PCI: Replace pci_fixup_irqs() call with host bridge 
IRQ mapping hooks")
Signed-off-by: Ilya Lipnitskiy 
Cc: Lorenzo Pieralisi 
Cc: Tobias Wolf 
Cc:  # v4.14+
---
 arch/mips/pci/pci-rt2880.c | 37 -
 1 file changed, 24 insertions(+), 13 deletions(-)

diff --git a/arch/mips/pci/pci-rt2880.c b/arch/mips/pci/pci-rt2880.c
index e1f12e398136..f1538d2be89e 100644
--- a/arch/mips/pci/pci-rt2880.c
+++ b/arch/mips/pci/pci-rt2880.c
@@ -180,7 +180,6 @@ static inline void rt2880_pci_write_u32(unsigned long reg, 
u32 val)
 
 int pcibios_map_irq(const struct pci_dev *dev, u8 slot, u8 pin)
 {
-   u16 cmd;
int irq = -1;
 
if (dev->bus->number != 0)
@@ -188,8 +187,6 @@ int pcibios_map_irq(const struct pci_dev *dev, u8 slot, u8 
pin)
 
switch (PCI_SLOT(dev->devfn)) {
case 0x00:
-   rt2880_pci_write_u32(PCI_BASE_ADDRESS_0, 0x0800);
-   (void) rt2880_pci_read_u32(PCI_BASE_ADDRESS_0);
break;
case 0x11:
irq = RT288X_CPU_IRQ_PCI;
@@ -201,16 +198,6 @@ int pcibios_map_irq(const struct pci_dev *dev, u8 slot, u8 
pin)
break;
}
 
-   pci_write_config_byte((struct pci_dev *) dev,
-   PCI_CACHE_LINE_SIZE, 0x14);
-   pci_write_config_byte((struct pci_dev *) dev, PCI_LATENCY_TIMER, 0xFF);
-   pci_read_config_word((struct pci_dev *) dev, PCI_COMMAND, );
-   cmd |= PCI_COMMAND_MASTER | PCI_COMMAND_IO | PCI_COMMAND_MEMORY |
-   PCI_COMMAND_INVALIDATE | PCI_COMMAND_FAST_BACK |
-   PCI_COMMAND_SERR | PCI_COMMAND_WAIT | PCI_COMMAND_PARITY;
-   pci_write_config_word((struct pci_dev *) dev, PCI_COMMAND, cmd);
-   pci_write_config_byte((struct pci_dev *) dev, PCI_INTERRUPT_LINE,
- dev->irq);
return irq;
 }
 
@@ -251,6 +238,30 @@ static int rt288x_pci_probe(struct platform_device *pdev)
 
 int pcibios_plat_dev_init(struct pci_dev *dev)
 {
+   static bool slot0_init;
+
+   /*
+* Nobody seems to initialize slot 0, but this platform requires it, so
+* do it once when some other slot is being enabled. The PCI subsystem
+* should configure other slots properly, so no need to do anything
+* special for those.
+*/
+   if (!slot0_init && dev->bus->number == 0) {
+   u16 cmd;
+   u32 bar0;
+
+   slot0_init = true;
+
+   pci_bus_write_config_dword(dev->bus, 0, PCI_BASE_ADDRESS_0,
+  0x0800);
+   pci_bus_read_config_dword(dev->bus, 0, PCI_BASE_ADDRESS_0,
+ );
+
+   pci_bus_read_config_word(dev->bus, 0, PCI_COMMAND, );
+   cmd |= PCI_COMMAND_MASTER | PCI_COMMAND_IO | PCI_COMMAND_MEMORY;
+   pci_bus_write_config_word(dev->bus, 0, PCI_COMMAND, cmd);
+   }
+
return 0;
 }
 
-- 
2.31.1

[PATCH v2 4/8] MIPS: pci-rt3883: more accurate DT error messages

Existing strings do not make sense: one is always NULL and the other
refers to the wrong parent node.

Signed-off-by: Ilya Lipnitskiy 
---
 arch/mips/pci/pci-rt3883.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/mips/pci/pci-rt3883.c b/arch/mips/pci/pci-rt3883.c
index e422f78db5bc..aebd4964ea34 100644
--- a/arch/mips/pci/pci-rt3883.c
+++ b/arch/mips/pci/pci-rt3883.c
@@ -431,8 +431,7 @@ static int rt3883_pci_probe(struct platform_device *pdev)
 
if (!rpc->intc_of_node) {
dev_err(dev, "%pOF has no %s child node",
-   rpc->intc_of_node,
-   "interrupt controller");
+   np, "interrupt controller");
return -EINVAL;
}
 
@@ -446,8 +445,7 @@ static int rt3883_pci_probe(struct platform_device *pdev)
 
if (!rpc->pci_controller.of_node) {
dev_err(dev, "%pOF has no %s child node",
-   rpc->intc_of_node,
-   "PCI host bridge");
+   np, "PCI host bridge");
err = -EINVAL;
goto err_put_intc_node;
}
-- 
2.31.1

[PATCH v2 0/8] MIPS: fixes for PCI legacy drivers (rt2880, rt3883)

One major fix for rt2880-pci in the first patch - fixes breakage that
existed since v4.14.

Other more minor fixes, cleanups, and improvements that either free up
memory, make dmesg messages clearer, or remove redundant dmesg output.

v2:
- Do not use internal pci-rt2880 config read and write functions after
  the device has been registered with the PCI subsystem to avoid races.
  Use safe pci_bus_{read,write}_config_{d}word wrappers instead.

Ilya Lipnitskiy (8):
  MIPS: pci-rt2880: fix slot 0 configuration
  MIPS: pci-rt2880: remove unneeded locks
  MIPS: pci-rt3883: trivial: remove unused variable
  MIPS: pci-rt3883: more accurate DT error messages
  MIPS: pci-legacy: stop using of_pci_range_to_resource
  MIPS: pci-legacy: remove redundant info messages
  MIPS: pci-legacy: remove busn_resource field
  MIPS: pci-legacy: use generic pci_enable_resources

 arch/mips/include/asm/pci.h |  1 -
 arch/mips/pci/pci-legacy.c  | 57 ++---
 arch/mips/pci/pci-rt2880.c  | 50 
 arch/mips/pci/pci-rt3883.c  | 10 ++-
 4 files changed, 35 insertions(+), 83 deletions(-)

-- 
2.31.1

[PATCH v2 2/8] MIPS: pci-rt2880: remove unneeded locks

Mirror pci-rt3883 fix from commit e5067c718b3a ("MIPS: pci-rt3883:
Remove odd locking in PCI config space access code"). pci-rt2880 shares
the driver layout with pci-rt3883 and the same reasons apply.

Caller (generic PCI code) already does proper locking, so no need to add
another one here. Local PCI read/write functions are never called
simultaneously, also they do not require synchronization with the PCI
controller ops, since they are used before the controller registration.

Suggested-by: Sergey Ryazanov 
Signed-off-by: Ilya Lipnitskiy 
---
 arch/mips/pci/pci-rt2880.c | 13 -
 1 file changed, 13 deletions(-)

diff --git a/arch/mips/pci/pci-rt2880.c b/arch/mips/pci/pci-rt2880.c
index f1538d2be89e..e9dd01431f21 100644
--- a/arch/mips/pci/pci-rt2880.c
+++ b/arch/mips/pci/pci-rt2880.c
@@ -41,7 +41,6 @@
 #define RT2880_PCI_REG_ARBCTL  0x80
 
 static void __iomem *rt2880_pci_base;
-static DEFINE_SPINLOCK(rt2880_pci_lock);
 
 static u32 rt2880_pci_reg_read(u32 reg)
 {
@@ -63,17 +62,14 @@ static inline u32 rt2880_pci_get_cfgaddr(unsigned int bus, 
unsigned int slot,
 static int rt2880_pci_config_read(struct pci_bus *bus, unsigned int devfn,
  int where, int size, u32 *val)
 {
-   unsigned long flags;
u32 address;
u32 data;
 
address = rt2880_pci_get_cfgaddr(bus->number, PCI_SLOT(devfn),
 PCI_FUNC(devfn), where);
 
-   spin_lock_irqsave(_pci_lock, flags);
rt2880_pci_reg_write(address, RT2880_PCI_REG_CONFIG_ADDR);
data = rt2880_pci_reg_read(RT2880_PCI_REG_CONFIG_DATA);
-   spin_unlock_irqrestore(_pci_lock, flags);
 
switch (size) {
case 1:
@@ -93,14 +89,12 @@ static int rt2880_pci_config_read(struct pci_bus *bus, 
unsigned int devfn,
 static int rt2880_pci_config_write(struct pci_bus *bus, unsigned int devfn,
   int where, int size, u32 val)
 {
-   unsigned long flags;
u32 address;
u32 data;
 
address = rt2880_pci_get_cfgaddr(bus->number, PCI_SLOT(devfn),
 PCI_FUNC(devfn), where);
 
-   spin_lock_irqsave(_pci_lock, flags);
rt2880_pci_reg_write(address, RT2880_PCI_REG_CONFIG_ADDR);
data = rt2880_pci_reg_read(RT2880_PCI_REG_CONFIG_DATA);
 
@@ -119,7 +113,6 @@ static int rt2880_pci_config_write(struct pci_bus *bus, 
unsigned int devfn,
}
 
rt2880_pci_reg_write(data, RT2880_PCI_REG_CONFIG_DATA);
-   spin_unlock_irqrestore(_pci_lock, flags);
 
return PCIBIOS_SUCCESSFUL;
 }
@@ -151,31 +144,25 @@ static struct pci_controller rt2880_pci_controller = {
 
 static inline u32 rt2880_pci_read_u32(unsigned long reg)
 {
-   unsigned long flags;
u32 address;
u32 ret;
 
address = rt2880_pci_get_cfgaddr(0, 0, 0, reg);
 
-   spin_lock_irqsave(_pci_lock, flags);
rt2880_pci_reg_write(address, RT2880_PCI_REG_CONFIG_ADDR);
ret = rt2880_pci_reg_read(RT2880_PCI_REG_CONFIG_DATA);
-   spin_unlock_irqrestore(_pci_lock, flags);
 
return ret;
 }
 
 static inline void rt2880_pci_write_u32(unsigned long reg, u32 val)
 {
-   unsigned long flags;
u32 address;
 
address = rt2880_pci_get_cfgaddr(0, 0, 0, reg);
 
-   spin_lock_irqsave(_pci_lock, flags);
rt2880_pci_reg_write(address, RT2880_PCI_REG_CONFIG_ADDR);
rt2880_pci_reg_write(val, RT2880_PCI_REG_CONFIG_DATA);
-   spin_unlock_irqrestore(_pci_lock, flags);
 }
 
 int pcibios_map_irq(const struct pci_dev *dev, u8 slot, u8 pin)
-- 
2.31.1

[PATCH v2] time: Fix overwrite err unexpected in clock_adjtime32

2021-04-13 Thread Chen Jun

the correct error is covered by put_old_timex32.

Fixes: 3a4d44b61625 ("ntp: Move adjtimex related compat syscalls to native 
counterparts")
Signed-off-by: Chen Jun 
---

v2: Make "Fixes" tag correct

 kernel/time/posix-timers.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index bf540f5a..dd5697d 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -1191,8 +1191,8 @@ SYSCALL_DEFINE2(clock_adjtime32, clockid_t, which_clock,
 
err = do_clock_adjtime(which_clock, );
 
-   if (err >= 0)
-   err = put_old_timex32(utp, );
+   if (err >= 0 && put_old_timex32(utp, ))
+   return -EFAULT;
 
return err;
 }
-- 
2.9.4

Re: [PATCH 2/5] swap: fix do_swap_page() race with swapoff

Miaohe Lin  writes:

> On 2021/4/13 9:27, Huang, Ying wrote:
>> Miaohe Lin  writes:
>> 
>>> When I was investigating the swap code, I found the below possible race
>>> window:
>>>
>>> CPU 1   CPU 2
>>> -   -
>>> do_swap_page
>>>   synchronous swap_readpage
>>> alloc_page_vma
>>> swapoff
>>>   release swap_file, bdev, or ...
>>>   swap_readpage
>>> check sis->flags is ok
>>>   access swap_file, bdev...[oops!]
>>> si->flags = 0
>>>
>>> Using current get/put_swap_device() to guard against concurrent swapoff for
>>> swap_readpage() looks terrible because swap_readpage() may take really long
>>> time. And this race may not be really pernicious because swapoff is usually
>>> done when system shutdown only. To reduce the performance overhead on the
>>> hot-path as much as possible, it appears we can use the percpu_ref to close
>>> this race window(as suggested by Huang, Ying).
>>>
>>> Fixes: 235b62176712 ("mm/swap: add cluster lock")
>> 
>> This isn't the commit that introduces the race.  You can use `git blame`
>> find out the correct commit.  For this it's commit 0bcac06f27d7 "mm,
>> swap: skip swapcache for swapin of synchronous device".
>> 
>
> Sorry about it! What I refer to is commit eb085574a752 ("mm, swap: fix race 
> between
> swapoff and some swap operations"). And I think this commit does not fix the 
> race
> condition completely, so I reuse the Fixes tag inside it.
>
>> And I suggest to merge 1/5 and 2/5 to make it easy to get the full
>> picture.
>> 
>>> Signed-off-by: Miaohe Lin 
>>> ---
>>>  include/linux/swap.h |  2 +-
>>>  mm/memory.c  | 10 ++
>>>  mm/swapfile.c| 28 +++-
>>>  3 files changed, 22 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>>> index 849ba5265c11..9066addb57fd 100644
>>> --- a/include/linux/swap.h
>>> +++ b/include/linux/swap.h
>>> @@ -513,7 +513,7 @@ sector_t swap_page_sector(struct page *page);
>>>  
>>>  static inline void put_swap_device(struct swap_info_struct *si)
>>>  {
>>> -   rcu_read_unlock();
>>> +   percpu_ref_put(>users);
>>>  }
>>>  
>>>  #else /* CONFIG_SWAP */
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index cc71a445c76c..8543c47b955c 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -3311,6 +3311,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>>>  {
>>> struct vm_area_struct *vma = vmf->vma;
>>> struct page *page = NULL, *swapcache;
>>> +   struct swap_info_struct *si = NULL;
>>> swp_entry_t entry;
>>> pte_t pte;
>>> int locked;
>>> @@ -3339,6 +3340,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>>> }
>>>  
>>>
>> 
>> I suggest to add comments here as follows (words copy from Matthew Wilcox)
>> 
>>  /* Prevent swapoff from happening to us */
>
> Ok.
>
>> 
>>> +   si = get_swap_device(entry);
>>> +   /* In case we raced with swapoff. */
>>> +   if (unlikely(!si))
>>> +   goto out;
>>> +
>> 
>> Because we wrap the whole do_swap_page() with get/put_swap_device()
>> now.  We can remove several get/put_swap_device() for function called by
>> do_swap_page().  That can be another optimization patch.
>
> I tried to remove several get/put_swap_device() for function called
> by do_swap_page() only before I send this series. But it seems they have
> other callers without proper get/put_swap_device().

Then we need to revise these callers instead.  Anyway, can be another
series.

Best Regards,
Huang, Ying

Re: [PATCH v3] nbd_genl_status: null check for nla_nest_start

2021-04-13 Thread Mark-PK Tsai

From: Michal Kubecek 

Hi,

I found that CVE-2019-16089 still exist in upstream kernel.
Does anyone know why this patch was not merged?

Re: [PATCH 6/7] i915: Convert to verify_page_range()

2021-04-13 Thread Kees Cook

On Mon, Apr 12, 2021 at 10:00:18AM +0200, Peter Zijlstra wrote:
> check_{present,absent}() only need R/O access, use verify_page_range()
> instead to remove modular use of apply_to_page_range().
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> ---
>  drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c |   16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> @@ -1225,9 +1225,9 @@ static int igt_mmap_gpu(void *arg)
>   return 0;
>  }
>  
> -static int check_present_pte(pte_t *pte, unsigned long addr, void *data)
> +static int check_present_pte(pte_t pte, unsigned long addr, void *data)
>  {
> - if (!pte_present(*pte) || pte_none(*pte)) {
> + if (!pte_present(pte) || pte_none(pte)) {
>   pr_err("missing PTE:%lx\n",
>  (addr - (unsigned long)data) >> PAGE_SHIFT);
>   return -EINVAL;
> @@ -1236,9 +1236,9 @@ static int check_present_pte(pte_t *pte,
>   return 0;
>  }
>  
> -static int check_absent_pte(pte_t *pte, unsigned long addr, void *data)
> +static int check_absent_pte(pte_t pte, unsigned long addr, void *data)
>  {
> - if (pte_present(*pte) && !pte_none(*pte)) {
> + if (pte_present(pte) && !pte_none(pte)) {
>   pr_err("present PTE:%lx; expected to be revoked\n",
>  (addr - (unsigned long)data) >> PAGE_SHIFT);
>   return -EINVAL;
> @@ -1249,14 +1249,14 @@ static int check_absent_pte(pte_t *pte,
>  
>  static int check_present(unsigned long addr, unsigned long len)
>  {
> - return apply_to_page_range(current->mm, addr, len,
> -check_present_pte, (void *)addr);
> + return verify_page_range(current->mm, addr, len,
> +  check_present_pte, (void *)addr);
>  }

And this would be:

static int check_present(unsigned long addr, unsigned long len)
unsigned long fail;

fail = verify_page_range(current->mm, addr, len, check_present_pte);
if (fail) {
pr_err("missing PTE:%lx\n", addr);
return -EINVAL;
}
}

(Oh, and I think I messed up the page shifting macro name in the earlier
one...)


-- 
Kees Cook

RE: [PATCH] arm64: dts: imx8mq-evk: add one regulator used to power up pcie phy

2021-04-13 Thread Richard Zhu

Hi Shawn:
Regarding Lucas' advice, this patch should be split out and post for you to 
pick up into DT tree.
Since the other two patches are accepted by PCIe tree now.
Can you help to pick up this patch?
Thanks in advanced.
https://patchwork.kernel.org/project/linux-pci/patch/1616661882-26487-3-git-send-email-hongxing@nxp.com/
https://patchwork.kernel.org/project/linux-pci/patch/1617091701-6444-2-git-send-email-hongxing@nxp.com/

Best Regards
Richard Zhu

> -Original Message-
> From: Richard Zhu 
> Sent: Wednesday, April 14, 2021 10:26 AM
> To: shawn...@kernel.org
> Cc: l.st...@pengutronix.de; dl-linux-imx ;
> linux-arm-ker...@lists.infradead.org; linux-kernel@vger.kernel.org; Richard
> Zhu 
> Subject: [PATCH] arm64: dts: imx8mq-evk: add one regulator used to power
> up pcie phy
> 
> Both 1.8v and 3.3v power supplies can be used by i.MX8MQ PCIe PHY.
> In default, the PCIE_VPH voltage is suggested to be 1.8v refer to data sheet.
> When PCIE_VPH is supplied by 3.3v in the HW schematic design, the
> VREG_BYPASS bits of GPR registers should be cleared from default value 1b'1
> to 1b'0. Thus, the internal 3v3 to 1v8 translator would be turned on.
> 
> Signed-off-by: Richard Zhu 
> Reviewed-by: Lucas Stach 
> ---
>  arch/arm64/boot/dts/freescale/imx8mq-evk.dts | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm64/boot/dts/freescale/imx8mq-evk.dts
> b/arch/arm64/boot/dts/freescale/imx8mq-evk.dts
> index 85b045253a0e..4d2035e3dd7c 100644
> --- a/arch/arm64/boot/dts/freescale/imx8mq-evk.dts
> +++ b/arch/arm64/boot/dts/freescale/imx8mq-evk.dts
> @@ -318,6 +318,7 @@
>< IMX8MQ_CLK_PCIE1_PHY>,
><_refclk>;
>   clock-names = "pcie", "pcie_aux", "pcie_phy", "pcie_bus";
> + vph-supply = <_reg>;
>   status = "okay";
>  };
> 
> --
> 2.17.1

Re: [PATCH 4/7] mm: Introduce verify_page_range()

2021-04-13 Thread Kees Cook

On Tue, Apr 13, 2021 at 09:36:32AM +0200, Peter Zijlstra wrote:
> On Mon, Apr 12, 2021 at 01:05:09PM -0700, Kees Cook wrote:
> > On Mon, Apr 12, 2021 at 10:00:16AM +0200, Peter Zijlstra wrote:
> > > +struct vpr_data {
> > > + int (*fn)(pte_t pte, unsigned long addr, void *data);
> > > + void *data;
> > > +};
> > 
> > Eeerg. This is likely to become an attack target itself. Stored function
> > pointer with stored (3rd) argument.
> > 
> > This doesn't seem needed: only DRM uses it, and that's for error
> > reporting. I'd rather plumb back errors in a way to not have to add
> > another place in the kernel where we do func+arg stored calling.
> 
> Is this any better? It does have the stored pointer, but not a stored
> argument, assuming you don't count returns as arguments I suppose.

It's better in the sense that it's not the func/arg pair that really
bugs me, yes. :)

> The alternative is refactoring apply_to_page_range() :-/

Yeah, I'm looking now, I see what you mean.

> ---
> 
> struct vpr_data {
>   bool (*fn)(pte_t pte, unsigned long addr);
>   unsigned long addr;
> };
> 
> static int vpr_fn(pte_t *pte, unsigned long addr, void *data)
> {
>   struct vpr_data *vpr = data;
>   if (!vpr->fn(*pte, addr)) {
>   vpr->addr = addr;
>   return -EINVAL;
>   }
>   return 0;
> }

My point about passing "addr" was that nothing in the callback actually
needs it -- the top level can just as easily report the error. And that
the helper is always vpr_fn(), so it doesn't need to be passed either.

So the addr can just be encoded in "int", and no structure is needed at:

typedef bool (*vpr_fn_t)(pte_t pte);

static int vpr_fn(pte_t *pte, unsigned long addr, void *data)
{
vpr_fn_t callback = data;

if (!callback(*pte))
return addr >> PAGE_SIZE;
return 0;
}

unsigned long verify_page_range(struct mm_struct *mm,
unsigned long addr, unsigned long size,
vpr_fn_t callback)
{
return apply_to_page_range(mm, addr, size, vpr_fn, callback) << 
PAGE_SIZE;
}

But maybe I'm missing something?

-- 
Kees Cook

Re: [v2 PATCH 6/7] mm: migrate: check mapcount for THP instead of ref count