Re: [PATCH] gpu: drm: i915: fix error return code of igt_buddy_alloc_smoke()

2021-03-08 Thread Chris Wilson
Quoting Jia-Ju Bai (2021-03-08 08:59:52)
> When i915_random_order() returns NULL to order, no error return code of
> igt_buddy_alloc_smoke() is assigned.
> To fix this bug, err is assigned with -EINVAL in this case.

It would not be EINVAL since that is used for a reference failure, but
in this case the idea was to return 0 as no testing was done and the
ENOMEM was raised before testing began i.e. not an internal and
unexpected driver allocation failure.
-Chris


Re: [PATCH] gpu: drm: i915: fix error return code of igt_threaded_blt()

2021-03-08 Thread Chris Wilson
Quoting Jia-Ju Bai (2021-03-08 09:07:22)
> When kcalloc() returns NULL to tsk or thread, no error code of 
> igt_threaded_blt() is returned.
> To fix this bug, -ENOMEM is returned as error code.

Because we decided to skip the test if it could not be run due to
insufficient memory, as opposed to an internal allocation failure from
the driver.
-Chris


[PATCH] rtc: cmos: Disable irq around direct invocation of cmos_interrupt()

2021-03-05 Thread Chris Wilson
9b/0xaa0
<4>[  254.194066]  pm_suspend.cold.8+0x301/0x34a
<4>[  254.194094]  state_store+0x7b/0xe0
<4>[  254.194124]  kernfs_fop_write_iter+0x11d/0x1c0
<4>[  254.194151]  new_sync_write+0x11d/0x1b0
<4>[  254.194183]  vfs_write+0x265/0x390
<4>[  254.194207]  ksys_write+0x5a/0xd0
<4>[  254.194232]  do_syscall_64+0x33/0x80
<4>[  254.194251]  entry_SYSCALL_64_after_hwframe+0x44/0xae
<4>[  254.194274] RIP: 0033:0x7f07d79691e7
<4>[  254.194293] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 
00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 
00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
<4>[  254.194312] RSP: 002b:7ffd9cc2c768 EFLAGS: 0246 ORIG_RAX: 
0001
<4>[  254.194337] RAX: ffda RBX: 0004 RCX: 
7f07d79691e7
<4>[  254.194352] RDX: 0004 RSI: 556ebfc63590 RDI: 
000b
<4>[  254.194366] RBP: 556ebfc63590 R08:  R09: 
0004
<4>[  254.194379] R10: 556ebf0ec2a6 R11: 0246 R12: 
0004

which breaks S3-resume on fi-kbl-soraka presumably as that's slow enough
to trigger the alarm during the suspend.

Fixes: 6950d046eb6e ("rtc: cmos: Replace spin_lock_irqsave with spin_lock in 
hard IRQ")
References: 66e4f4a9cc38 ("rtc: cmos: Use spin_lock_irqsave() in 
cmos_interrupt()"):
Signed-off-by: Chris Wilson 
Cc: Xiaofei Tan 
Cc: Alexandre Belloni 
Cc: Alessandro Zummo 
Cc: Ville Syrjälä 
---
 drivers/rtc/rtc-cmos.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
index 670fd8a2970e..6545afb2f20e 100644
--- a/drivers/rtc/rtc-cmos.c
+++ b/drivers/rtc/rtc-cmos.c
@@ -1053,7 +1053,9 @@ static void cmos_check_wkalrm(struct device *dev)
 * ACK the rtc irq here
 */
if (t_now >= cmos->alarm_expires && cmos_use_acpi_alarm()) {
+   local_irq_disable();
cmos_interrupt(0, (void *)cmos->rtc);
+   local_irq_enable();
return;
}
 
-- 
2.20.1



Re: [PATCH] drm/i915: Enable -Wuninitialized

2021-02-17 Thread Chris Wilson
Quoting Nathan Chancellor (2021-02-16 21:29:54)
> -Wunintialized was disabled in commit c5627461490e ("drm/i915: Disable
> -Wuninitialized") because there were two warnings that were false
> positives. The first was due to DECLARE_WAIT_QUEUE_HEAD_ONSTACK, which
> was fixed in LLVM 9.0.0. The second was in busywait_stop, which was
> fixed in LLVM 10.0.0 (issue 415). The kernel's minimum version for LLVM
> is 10.0.1 so this warning can be safely enabled, where it has already
> caught a couple bugs.
> 
> Link: https://github.com/ClangBuiltLinux/linux/issues/220
> Link: https://github.com/ClangBuiltLinux/linux/issues/415
> Link: https://github.com/ClangBuiltLinux/linux/issues/499
> Link: 
> https://github.com/llvm/llvm-project/commit/2e040398f8d691cc378c1abb098824ff49f3f28f
> Link: 
> https://github.com/llvm/llvm-project/commit/c667cdc850c2aa821ffeedbc08c24bc985c59edd
> Fixes: c5627461490e ("drm/i915: Disable -Wuninitialized")
> References: 2ea4a7ba9bf6 ("drm/i915/gt: Avoid uninitialized use of rpcurupei 
> in frequency_show")
> References: 2034c2129bc4 ("drm/i915/display: Ensure that ret is always 
> initialized in icl_combo_phy_verify_state")
> Reported-by: Arnd Bergmann 
> Signed-off-by: Nathan Chancellor 

make CC=clang-11 now compiles cleanly for me as well,
Reviewed-by: Chris Wilson 
-Chris


[PATCH v3] kcmp: Support selection of SYS_kcmp without CHECKPOINT_RESTORE

2021-02-05 Thread Chris Wilson
Userspace has discovered the functionality offered by SYS_kcmp and has
started to depend upon it. In particular, Mesa uses SYS_kcmp for
os_same_file_description() in order to identify when two fd (e.g. device
or dmabuf) point to the same struct file. Since they depend on it for
core functionality, lift SYS_kcmp out of the non-default
CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.

Rasmus Villemoes also pointed out that systemd uses SYS_kcmp to
deduplicate the per-service file descriptor store.

Note that some distributions such as Ubuntu are already enabling
CHECKPOINT_RESTORE in their configs and so, by extension, SYS_kcmp.

References: https://gitlab.freedesktop.org/drm/intel/-/issues/3046
Signed-off-by: Chris Wilson 
Cc: Kees Cook 
Cc: Andy Lutomirski 
Cc: Will Drewry 
Cc: Andrew Morton 
Cc: Dave Airlie 
Cc: Daniel Vetter 
Cc: Lucas Stach 
Cc: Rasmus Villemoes 
Cc: Cyrill Gorcunov 
Cc: sta...@vger.kernel.org
Acked-by: Daniel Vetter  # DRM depends on kcmp
Acked-by: Rasmus Villemoes  # systemd uses kcmp

---
v2:
  - Default n.
  - Borrrow help message from man kcmp.
  - Export get_epoll_tfile_raw_ptr() for CONFIG_KCMP
v3:
  - Select KCMP for CONFIG_DRM
---
 drivers/gpu/drm/Kconfig   |  3 +++
 fs/eventpoll.c|  4 ++--
 include/linux/eventpoll.h |  2 +-
 init/Kconfig  | 11 +++
 kernel/Makefile   |  2 +-
 tools/testing/selftests/seccomp/seccomp_bpf.c |  2 +-
 6 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 0973f408d75f..af6c6d214d91 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -15,6 +15,9 @@ menuconfig DRM
select I2C_ALGOBIT
select DMA_SHARED_BUFFER
select SYNC_FILE
+# gallium uses SYS_kcmp for os_same_file_description() to de-duplicate
+# device and dmabuf fd. Let's make sure that is available for our userspace.
+   select KCMP
help
  Kernel-level support for the Direct Rendering Infrastructure (DRI)
  introduced in XFree86 4.0. If you say Y here, you need to select
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index a829af074eb5..3196474cbe24 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -979,7 +979,7 @@ static struct epitem *ep_find(struct eventpoll *ep, struct 
file *file, int fd)
return epir;
 }
 
-#ifdef CONFIG_CHECKPOINT_RESTORE
+#ifdef CONFIG_KCMP
 static struct epitem *ep_find_tfd(struct eventpoll *ep, int tfd, unsigned long 
toff)
 {
struct rb_node *rbp;
@@ -1021,7 +1021,7 @@ struct file *get_epoll_tfile_raw_ptr(struct file *file, 
int tfd,
 
return file_raw;
 }
-#endif /* CONFIG_CHECKPOINT_RESTORE */
+#endif /* CONFIG_KCMP */
 
 /**
  * Adds a new entry to the tail of the list in a lockless way, i.e.
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 0350393465d4..593322c946e6 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -18,7 +18,7 @@ struct file;
 
 #ifdef CONFIG_EPOLL
 
-#ifdef CONFIG_CHECKPOINT_RESTORE
+#ifdef CONFIG_KCMP
 struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd, unsigned long 
toff);
 #endif
 
diff --git a/init/Kconfig b/init/Kconfig
index b77c60f8b963..9cc7436b2f73 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1194,6 +1194,7 @@ endif # NAMESPACES
 config CHECKPOINT_RESTORE
bool "Checkpoint/restore support"
select PROC_CHILDREN
+   select KCMP
default n
help
  Enables additional kernel features in a sake of checkpoint/restore.
@@ -1737,6 +1738,16 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS
 config ARCH_HAS_MEMBARRIER_SYNC_CORE
bool
 
+config KCMP
+   bool "Enable kcmp() system call" if EXPERT
+   help
+ Enable the kernel resource comparison system call. It provides
+ user-space with the ability to compare two processes to see if they
+ share a common resource, such as a file descriptor or even virtual
+ memory space.
+
+ If unsure, say N.
+
 config RSEQ
bool "Enable rseq() system call" if EXPERT
default y
diff --git a/kernel/Makefile b/kernel/Makefile
index aa7368c7eabf..320f1f3941b7 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -51,7 +51,7 @@ obj-y += livepatch/
 obj-y += dma/
 obj-y += entry/
 
-obj-$(CONFIG_CHECKPOINT_RESTORE) += kcmp.o
+obj-$(CONFIG_KCMP) += kcmp.o
 obj-$(CONFIG_FREEZER) += freezer.o
 obj-$(CONFIG_PROFILING) += profile.o
 obj-$(CONFIG_STACKTRACE) += stacktrace.o
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c 
b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 26c72f2b61b1..1b6c7d33c4ff 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -315,7 +315,7 @@ TEST(kcmp)
ret = __filecmp(getpid(), getpid(), 1, 1);
EXPECT_EQ(ret, 0);

Re: [PATCH v2] kernel: Expose SYS_kcmp by default

2021-02-05 Thread Chris Wilson
Quoting Kees Cook (2021-02-05 21:20:33)
> On Fri, Feb 05, 2021 at 09:16:01PM +0000, Chris Wilson wrote:
> > The subject should of course be changed, as it is no longer being
> > enabled by default.
> 
> "default n" is redundant.

I thought being explicit would be preferred. There are a few other
default n, so at least it's not the odd-one-out!

> I thought Daniel said CONFIG_DRM needed to
> "select" it too, though?

Yes. We will need to select it for any DRM driver so that the Vulkan/GL
stacks can rely on having SYS_kcmp. That deserves to be handled and
explain within drm/Kconfig, and as they are already shipping with calls
to SYS_kcmp we may have to ask for a stable backport.

> Otherwise, yeah, this looks good. Was the
> export due to the 0-day bot failure reports?

Yes.
-Chris


[PATCH] kernel: Expose SYS_kcmp by default

2021-02-05 Thread Chris Wilson
Userspace has discovered the functionality offered by SYS_kcmp and has
started to depend upon it. In particular, Mesa uses SYS_kcmp for
os_same_file_description() in order to identify when two fd (e.g. device
or dmabuf) point to the same struct file. Since they depend on it for
core functionality, lift SYS_kcmp out of the non-default
CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.

Signed-off-by: Chris Wilson 
Cc: Kees Cook 
Cc: Andy Lutomirski 
Cc: Will Drewry 
Cc: Andrew Morton 
Cc: Dave Airlie 
Cc: Daniel Vetter 
Cc: Lucas Stach 
---
 init/Kconfig  | 11 +++
 kernel/Makefile   |  2 +-
 tools/testing/selftests/seccomp/seccomp_bpf.c |  2 +-
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index b77c60f8b963..f62fca13ac5b 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1194,6 +1194,7 @@ endif # NAMESPACES
 config CHECKPOINT_RESTORE
bool "Checkpoint/restore support"
select PROC_CHILDREN
+   select KCMP
default n
help
  Enables additional kernel features in a sake of checkpoint/restore.
@@ -1737,6 +1738,16 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS
 config ARCH_HAS_MEMBARRIER_SYNC_CORE
bool
 
+config KCMP
+   bool "Enable kcmp() system call" if EXPERT
+   default y
+   help
+ Enable the file descriptor comparison system call. It provides
+ user-space with the ability to compare two fd to see if they
+ point to the same file, and check other attributes.
+
+ If unsure, say Y.
+
 config RSEQ
bool "Enable rseq() system call" if EXPERT
default y
diff --git a/kernel/Makefile b/kernel/Makefile
index aa7368c7eabf..320f1f3941b7 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -51,7 +51,7 @@ obj-y += livepatch/
 obj-y += dma/
 obj-y += entry/
 
-obj-$(CONFIG_CHECKPOINT_RESTORE) += kcmp.o
+obj-$(CONFIG_KCMP) += kcmp.o
 obj-$(CONFIG_FREEZER) += freezer.o
 obj-$(CONFIG_PROFILING) += profile.o
 obj-$(CONFIG_STACKTRACE) += stacktrace.o
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c 
b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 26c72f2b61b1..1b6c7d33c4ff 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -315,7 +315,7 @@ TEST(kcmp)
ret = __filecmp(getpid(), getpid(), 1, 1);
EXPECT_EQ(ret, 0);
if (ret != 0 && errno == ENOSYS)
-   SKIP(return, "Kernel does not support kcmp() (missing 
CONFIG_CHECKPOINT_RESTORE?)");
+   SKIP(return, "Kernel does not support kcmp() (missing 
CONFIG_KCMP?)");
 }
 
 TEST(mode_strict_support)
-- 
2.20.1



Re: [PATCH v2] kernel: Expose SYS_kcmp by default

2021-02-05 Thread Chris Wilson
The subject should of course be changed, as it is no longer being
enabled by default.

Something like

kcmp: Support selection of SYS_kcmp without CHECKPOINT_RESTORE

Quoting Chris Wilson (2021-02-05 21:06:10)
> Userspace has discovered the functionality offered by SYS_kcmp and has
> started to depend upon it. In particular, Mesa uses SYS_kcmp for
> os_same_file_description() in order to identify when two fd (e.g. device
> or dmabuf) point to the same struct file. Since they depend on it for
> core functionality, lift SYS_kcmp out of the non-default
> CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.
> 
> Note that some distributions such as Ubuntu are already enabling
> CHECKPOINT_RESTORE in their configs and so, by extension, SYS_kcmp.
> 
> References: https://gitlab.freedesktop.org/drm/intel/-/issues/3046
> Signed-off-by: Chris Wilson 
> Cc: Kees Cook 
> Cc: Andy Lutomirski 
> Cc: Will Drewry 
> Cc: Andrew Morton 
> Cc: Dave Airlie 
> Cc: Daniel Vetter 
> Cc: Lucas Stach 
> Acked-by: Daniel Vetter  # DRM depends on SYS_kcmp
> 
> ---
> v2:
>   - Default n.
>   - Borrrow help message from man kcmp.
>   - Export get_epoll_tfile_raw_ptr() for CONFIG_KCMP
> ---
>  fs/eventpoll.c|  4 ++--
>  include/linux/eventpoll.h |  2 +-
>  init/Kconfig  | 12 
>  kernel/Makefile   |  2 +-
>  tools/testing/selftests/seccomp/seccomp_bpf.c |  2 +-
>  5 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index a829af074eb5..3196474cbe24 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -979,7 +979,7 @@ static struct epitem *ep_find(struct eventpoll *ep, 
> struct file *file, int fd)
> return epir;
>  }
>  
> -#ifdef CONFIG_CHECKPOINT_RESTORE
> +#ifdef CONFIG_KCMP
>  static struct epitem *ep_find_tfd(struct eventpoll *ep, int tfd, unsigned 
> long toff)
>  {
> struct rb_node *rbp;
> @@ -1021,7 +1021,7 @@ struct file *get_epoll_tfile_raw_ptr(struct file *file, 
> int tfd,
>  
> return file_raw;
>  }
> -#endif /* CONFIG_CHECKPOINT_RESTORE */
> +#endif /* CONFIG_KCMP */
>  
>  /**
>   * Adds a new entry to the tail of the list in a lockless way, i.e.
> diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
> index 0350393465d4..593322c946e6 100644
> --- a/include/linux/eventpoll.h
> +++ b/include/linux/eventpoll.h
> @@ -18,7 +18,7 @@ struct file;
>  
>  #ifdef CONFIG_EPOLL
>  
> -#ifdef CONFIG_CHECKPOINT_RESTORE
> +#ifdef CONFIG_KCMP
>  struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd, unsigned 
> long toff);
>  #endif
>  
> diff --git a/init/Kconfig b/init/Kconfig
> index b77c60f8b963..1b75141bc18b 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1194,6 +1194,7 @@ endif # NAMESPACES
>  config CHECKPOINT_RESTORE
> bool "Checkpoint/restore support"
> select PROC_CHILDREN
> +   select KCMP
> default n
> help
>   Enables additional kernel features in a sake of checkpoint/restore.
> @@ -1737,6 +1738,17 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS
>  config ARCH_HAS_MEMBARRIER_SYNC_CORE
> bool
>  
> +config KCMP
> +   bool "Enable kcmp() system call" if EXPERT
> +   default n
> +   help
> + Enable the kernel resource comparison system call. It provides
> + user-space with the ability to compare two processes to see if they
> + share a common resource, such as a file descriptor or even virtual
> + memory space.
> +
> + If unsure, say N.
> +
>  config RSEQ
> bool "Enable rseq() system call" if EXPERT
> default y
> diff --git a/kernel/Makefile b/kernel/Makefile
> index aa7368c7eabf..320f1f3941b7 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -51,7 +51,7 @@ obj-y += livepatch/
>  obj-y += dma/
>  obj-y += entry/
>  
> -obj-$(CONFIG_CHECKPOINT_RESTORE) += kcmp.o
> +obj-$(CONFIG_KCMP) += kcmp.o
>  obj-$(CONFIG_FREEZER) += freezer.o
>  obj-$(CONFIG_PROFILING) += profile.o
>  obj-$(CONFIG_STACKTRACE) += stacktrace.o
> diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c 
> b/tools/testing/selftests/seccomp/seccomp_bpf.c
> index 26c72f2b61b1..1b6c7d33c4ff 100644
> --- a/tools/testing/selftests/seccomp/seccomp_bpf.c
> +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
> @@ -315,7 +315,7 @@ TEST(kcmp)
> ret = __filecmp(getpid(), getpid(), 1, 1);
> EXPECT_EQ(ret, 0);
> if (ret != 0 && errno == ENOSYS)
> -   SKIP(return, "Kernel does not support kcmp() (missing 
> CONFIG_CHECKPOINT_RESTORE?)");
> +   SKIP(return, "Kernel does not support kcmp() (missing 
> CONFIG_KCMP?)");
>  }
>  
>  TEST(mode_strict_support)
> -- 
> 2.20.1
>


[PATCH v2] kernel: Expose SYS_kcmp by default

2021-02-05 Thread Chris Wilson
Userspace has discovered the functionality offered by SYS_kcmp and has
started to depend upon it. In particular, Mesa uses SYS_kcmp for
os_same_file_description() in order to identify when two fd (e.g. device
or dmabuf) point to the same struct file. Since they depend on it for
core functionality, lift SYS_kcmp out of the non-default
CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.

Note that some distributions such as Ubuntu are already enabling
CHECKPOINT_RESTORE in their configs and so, by extension, SYS_kcmp.

References: https://gitlab.freedesktop.org/drm/intel/-/issues/3046
Signed-off-by: Chris Wilson 
Cc: Kees Cook 
Cc: Andy Lutomirski 
Cc: Will Drewry 
Cc: Andrew Morton 
Cc: Dave Airlie 
Cc: Daniel Vetter 
Cc: Lucas Stach 
Acked-by: Daniel Vetter  # DRM depends on SYS_kcmp

---
v2:
  - Default n.
  - Borrrow help message from man kcmp.
  - Export get_epoll_tfile_raw_ptr() for CONFIG_KCMP
---
 fs/eventpoll.c|  4 ++--
 include/linux/eventpoll.h |  2 +-
 init/Kconfig  | 12 
 kernel/Makefile   |  2 +-
 tools/testing/selftests/seccomp/seccomp_bpf.c |  2 +-
 5 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index a829af074eb5..3196474cbe24 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -979,7 +979,7 @@ static struct epitem *ep_find(struct eventpoll *ep, struct 
file *file, int fd)
return epir;
 }
 
-#ifdef CONFIG_CHECKPOINT_RESTORE
+#ifdef CONFIG_KCMP
 static struct epitem *ep_find_tfd(struct eventpoll *ep, int tfd, unsigned long 
toff)
 {
struct rb_node *rbp;
@@ -1021,7 +1021,7 @@ struct file *get_epoll_tfile_raw_ptr(struct file *file, 
int tfd,
 
return file_raw;
 }
-#endif /* CONFIG_CHECKPOINT_RESTORE */
+#endif /* CONFIG_KCMP */
 
 /**
  * Adds a new entry to the tail of the list in a lockless way, i.e.
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 0350393465d4..593322c946e6 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -18,7 +18,7 @@ struct file;
 
 #ifdef CONFIG_EPOLL
 
-#ifdef CONFIG_CHECKPOINT_RESTORE
+#ifdef CONFIG_KCMP
 struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd, unsigned long 
toff);
 #endif
 
diff --git a/init/Kconfig b/init/Kconfig
index b77c60f8b963..1b75141bc18b 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1194,6 +1194,7 @@ endif # NAMESPACES
 config CHECKPOINT_RESTORE
bool "Checkpoint/restore support"
select PROC_CHILDREN
+   select KCMP
default n
help
  Enables additional kernel features in a sake of checkpoint/restore.
@@ -1737,6 +1738,17 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS
 config ARCH_HAS_MEMBARRIER_SYNC_CORE
bool
 
+config KCMP
+   bool "Enable kcmp() system call" if EXPERT
+   default n
+   help
+ Enable the kernel resource comparison system call. It provides
+ user-space with the ability to compare two processes to see if they
+ share a common resource, such as a file descriptor or even virtual
+ memory space.
+
+ If unsure, say N.
+
 config RSEQ
bool "Enable rseq() system call" if EXPERT
default y
diff --git a/kernel/Makefile b/kernel/Makefile
index aa7368c7eabf..320f1f3941b7 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -51,7 +51,7 @@ obj-y += livepatch/
 obj-y += dma/
 obj-y += entry/
 
-obj-$(CONFIG_CHECKPOINT_RESTORE) += kcmp.o
+obj-$(CONFIG_KCMP) += kcmp.o
 obj-$(CONFIG_FREEZER) += freezer.o
 obj-$(CONFIG_PROFILING) += profile.o
 obj-$(CONFIG_STACKTRACE) += stacktrace.o
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c 
b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 26c72f2b61b1..1b6c7d33c4ff 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -315,7 +315,7 @@ TEST(kcmp)
ret = __filecmp(getpid(), getpid(), 1, 1);
EXPECT_EQ(ret, 0);
if (ret != 0 && errno == ENOSYS)
-   SKIP(return, "Kernel does not support kcmp() (missing 
CONFIG_CHECKPOINT_RESTORE?)");
+   SKIP(return, "Kernel does not support kcmp() (missing 
CONFIG_KCMP?)");
 }
 
 TEST(mode_strict_support)
-- 
2.20.1



Re: [Intel-gfx] v5.11-rc5 BUG kmalloc-1k (Not tainted): Redzone overwritten

2021-02-01 Thread Chris Wilson
Quoting Jani Nikula (2021-01-28 13:23:48)
> 
> A number of our CI systems are hitting redzone overwritten errors after
> s2idle, with the errors introduced between v5.11-rc4 and v5.11-rc5. See
> snippet below, full logs for one affected machine at [1].
> 
> Known issue?

Fwiw, I think this should be fixed by

commit 08d60e5999540110576e7c1346d486220751b7f9
Author: John Ogness 
Date:   Sun Jan 24 21:33:28 2021 +0106

printk: fix string termination for record_print_text()

Commit f0e386ee0c0b ("printk: fix buffer overflow potential for
print_text()") added string termination in record_print_text().
However it used the wrong base pointer for adding the terminator.
This led to a 0-byte being written somewhere beyond the buffer.

Use the correct base pointer when adding the terminator.

Fixes: f0e386ee0c0b ("printk: fix buffer overflow potential for 
print_text()")
Reported-by: Sven Schnelle 
Signed-off-by: John Ogness 
Signed-off-by: Petr Mladek 
Link: 
https://lore.kernel.org/r/20210124202728.4718-1-john.ogn...@linutronix.de

din should be rolled forward, but there's yet another regression in rc6
breaking suspend on all machines.
-Chris


Re: [PATCH] drm/i915: Remove unreachable code

2021-01-30 Thread Chris Wilson
Quoting Vinicius Tinti (2021-01-30 12:34:11)
> On Fri, Jan 29, 2021 at 08:55:54PM +0000, Chris Wilson wrote:
> > Quoting Vinicius Tinti (2021-01-29 18:15:19)
> > > By enabling -Wunreachable-code-aggressive on Clang the following code
> > > paths are unreachable.
> > 
> > That code exists as commentary and, especially for sdvo, library
> > functions that we may need in future.
> 
> I would argue that this code could be removed since it is in git history.
> It can be restored when needed.
> 
> This will make the code cleaner.

It doesn't change the control flow, so no complexity argument. It
removes documentation from the code, so I have the opposite opinion.

> > The ivb-gt1 case => as we now set the gt level for ivb, should we not
> > enable the optimisation for ivb unaffected by the w/a? Just no one has
> > taken the time to see if it causes a regression.
> 
> I don't know. I just found out that the code is unreachable.
> 
> > For error state, the question remains whether we should revert to
> > uncompressed data if the compressed stream is larger than the original.
> 
> I don't know too.
> 
> In this last two cases the code could be commented and the decisions
> and problems explained in the comment section.

They already are, that is the point.
-Chris


Re: [PATCH] drm/i915: Remove unreachable code

2021-01-29 Thread Chris Wilson
Quoting Vinicius Tinti (2021-01-29 18:15:19)
> By enabling -Wunreachable-code-aggressive on Clang the following code
> paths are unreachable.

That code exists as commentary and, especially for sdvo, library
functions that we may need in future.

The ivb-gt1 case => as we now set the gt level for ivb, should we not
enable the optimisation for ivb unaffected by the w/a? Just no one has
taken the time to see if it causes a regression.

For error state, the question remains whether we should revert to
uncompressed data if the compressed stream is larger than the original.
-Chris


Re: [Intel-gfx] linux-next: Tree for Jan 27 (drm/i915)

2021-01-27 Thread Chris Wilson
Quoting Randy Dunlap (2021-01-27 20:28:05)
> On 1/27/21 11:30 AM, Randy Dunlap wrote:
> > On 1/27/21 11:08 AM, Randy Dunlap wrote:
> >> On 1/27/21 6:44 AM, Stephen Rothwell wrote:
> >>> Hi all,
> >>>
> >>> Note: the patch file has failed to upload :-(
> >>>
> >>> Changes since 20210125:
> >>>
> >>
> >> on x86_64:
> >>
> >> ../drivers/gpu/drm/i915/i915_gem.c: In function ‘i915_gem_freeze_late’:
> >> ../drivers/gpu/drm/i915/i915_gem.c:1182:2: error: implicit declaration of 
> >> function ‘wbinvd_on_all_cpus’; did you mean ‘wrmsr_on_cpus’? 
> >> [-Werror=implicit-function-declaration]
> >>   wbinvd_on_all_cpus();
> >>
> > 
> > My apologies: this was in Andrew's mmotm 2021-01-25.
> > Sorry about that.
> 
> 
> and now I see that it also happens in today's linux-next.

The fix is in the tree that should be feeding into linux-next, so I
trust it will resolve itself shortly. Along with the WERROR depends
snafu.
-Chris


Re: Linux 5.11-rc5

2021-01-26 Thread Chris Wilson
Quoting Chris Wilson (2021-01-25 21:46:19)
> Quoting Mike Rapoport (2021-01-25 21:33:48)
> > On Mon, Jan 25, 2021 at 12:49:39PM -0800, Linus Torvalds wrote:
> > > Mike: should we perhaps revert the first patch too (commit
> > > bde9cfa3afe4: "x86/setup: don't remove E820_TYPE_RAM for pfn 0")?
> > 
> > I wonder, maybe actually this one is causing troubles?
> > 
> > Chris, would it be possible to check what happens if you revert only
> > bde9cfa3afe4?
> 
> Queued for CI, will be run in about an hour.

I ran just the revert of bde9cfa3afe4 through CI twice, on both occasions
all machines failed to boot. 
-Chris


Re: [PATCH v2] drm/i915/gt: use new tasklet API for execution list

2021-01-26 Thread Chris Wilson
Quoting Emil Renner Berthing (2021-01-26 15:01:55)
> This converts the driver to use the new tasklet API introduced in
> commit 12cc923f1ccc ("tasklet: Introduce new initialization API")
> 
> Signed-off-by: Emil Renner Berthing 
> 
> ---
> v2: Rebased on drm-intel-next

Ta. Saves me having to do the fixup.

Reviewed-by: Chris Wilson 

Will be applied to drm-intel-gt-next which is scheduled for inclusion in
5.13. It should apply against the 5.12 merge window if there's a tree
through which you want to migrate the tasklet API faster.
-Chris


Re: [PATCH] i915: Fix DRM_I915_WERROR dependencies

2021-01-25 Thread Chris Wilson
Quoting Arnd Bergmann (2021-01-25 12:26:44)
> From: Arnd Bergmann 
> 
> CONFIG_DRM_I915_DEBUG now selects CONFIG_DRM_I915_WERROR, but fails
> to honor its dependencies:
> 
> WARNING: unmet direct dependencies detected for DRM_I915_WERROR
>   Depends on [n]: HAS_IOMEM [=y] && DRM_I915 [=m] && EXPERT [=y] && 
> !COMPILE_TEST [=y]
>   Selected by [m]:
>   - DRM_I915_DEBUG [=y] && HAS_IOMEM [=y] && EXPERT [=y] && DRM_I915 [=m]

DRM_I915_DEBUG now depends on !COMPILE_TEST and EXPERT.
-Chris


Re: Linux 5.11-rc5

2021-01-25 Thread Chris Wilson
Quoting Mike Rapoport (2021-01-25 21:33:48)
> On Mon, Jan 25, 2021 at 12:49:39PM -0800, Linus Torvalds wrote:
> > On Mon, Jan 25, 2021 at 12:35 PM Chris Wilson  
> > wrote:
> > >
> > > Quoting Linus Torvalds (2021-01-25 01:06:40)
> > > > Mike Rapoport (3):
> > > ...
> > > >   mm: fix initialization of struct page for holes in memory layout
> > >
> > > We have half a dozen or so different machines in CI that are silently
> > > failing to boot, that we believe is bisected to this patch.
> > 
> > That commit reverts cleanly - so if you can verify that reverting it
> > fixes your CI machines, I think that that's the right thing to do for
> > now, unless Mike can figure out some obvious "Duh!" moment from your
> > working dmesg.
> 
> Unfortunately not, at least at 11pm :(
> Maybe tomorrow I'll have something smarter to say.

CI does confirm that the revert of d3921cb8be29 brings the machines back
to life.
  
> > Mike: should we perhaps revert the first patch too (commit
> > bde9cfa3afe4: "x86/setup: don't remove E820_TYPE_RAM for pfn 0")?
> 
> I wonder, maybe actually this one is causing troubles?
> 
> Chris, would it be possible to check what happens if you revert only
> bde9cfa3afe4?

Queued for CI, will be run in about an hour.
-Chris


Re: Linux 5.11-rc5

2021-01-25 Thread Chris Wilson
Quoting Mike Rapoport (2021-01-25 21:04:56)
> On Mon, Jan 25, 2021 at 08:34:34PM +0000, Chris Wilson wrote:
> > Quoting Linus Torvalds (2021-01-25 01:06:40)
> > > Mike Rapoport (3):
> > ...
> > >   mm: fix initialization of struct page for holes in memory layout
> > 
> > We have half a dozen or so different machines in CI that are silently
> > failing to boot, that we believe is bisected to this patch.
> > 
> > 17:56  tsa : ickle: dolphin: I hit the following patch in my 
> > bisection, and the hang is also dependent on kconfig
> > 17:56  tsa : first bad commit: 
> > [d3921cb8be29ce5668c64e23ffdaeec5f8c69399] mm: fix initialization of struct 
> > page for holes in
> >  memory layout
> > 17:57  tsa : couldn't reproduce on older CI kconfig, current 
> > one does it
> >  
> > https://gitlab.freedesktop.org/gfx-ci/i915-infra/-/blob/master/kconfig/debug
> > 
> > Here's a boot dmesg from some affected machines from just before the merge
> > with rc5:
> > https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9676/shard-skl1/boot18.txt
> > https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9676/fi-skl-6600u/boot.html
> > https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9676/fi-bsw-cyan/boot.html
> > https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9676/fi-bdw-samus/boot.html
> 
> Is there any way to get early console from these machines?

12:16 tsa : none of those have good hook for serial

Nothing on the console and no serial console option, and panics before
netconsole.

Maybe some early_printk and boot_delay if you think there's something to
see with those, but I'll have to ask Tomi nicely tomorrow.
-Chris


Re: Linux 5.11-rc5

2021-01-25 Thread Chris Wilson
Quoting Linus Torvalds (2021-01-25 01:06:40)
> Mike Rapoport (3):
...
>   mm: fix initialization of struct page for holes in memory layout

We have half a dozen or so different machines in CI that are silently
failing to boot, that we believe is bisected to this patch.

17:56  tsa : ickle: dolphin: I hit the following patch in my 
bisection, and the hang is also dependent on kconfig
17:56  tsa : first bad commit: 
[d3921cb8be29ce5668c64e23ffdaeec5f8c69399] mm: fix initialization of struct 
page for holes in
 memory layout
17:57  tsa : couldn't reproduce on older CI kconfig, current one 
does it
 
https://gitlab.freedesktop.org/gfx-ci/i915-infra/-/blob/master/kconfig/debug

Here's a boot dmesg from some affected machines from just before the merge
with rc5:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9676/shard-skl1/boot18.txt
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9676/fi-skl-6600u/boot.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9676/fi-bsw-cyan/boot.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9676/fi-bdw-samus/boot.html
-Chris


Re: [PATCH] drm/i915/gem: fix non-SMP build failure

2021-01-25 Thread Chris Wilson
Quoting Arnd Bergmann (2021-01-25 12:25:34)
> From: Arnd Bergmann 
> 
> The x86-specific wbinvd_on_all_cpus() function is exported
> through asm/smp.h, causing a build failure in the i915 driver
> when SMP is disabled:
> 
> drivers/gpu/drm/i915/i915_gem.c:1182:2: error: implicit declaration of 
> function 'wbinvd_on_all_cpus' [-Werror,-Wimplicit-function-declaration]

I thought the code was already in i915_gem_pm.c (which included smp.h);
it is now.
-Chris


Re: [PATCH] drm/i915/gvt: fix uninitialized return in intel_gvt_update_reg_whitelist()

2021-01-25 Thread Chris Wilson
Quoting Dan Carpenter (2021-01-25 08:48:30)
> Smatch found an uninitialized variable bug in this code:
> 
> drivers/gpu/drm/i915/gvt/cmd_parser.c:3191 
> intel_gvt_update_reg_whitelist()
> error: uninitialized symbol 'ret'.
> 
> The first thing that Smatch complains about is that "ret" isn't set if
> we don't enter the "for_each_engine(engine, _priv->gt, id) {" loop.
> Presumably we always have at least one engine so that's a false
> positive.
> 
> But it's definitely a bug to not set "ret" if i915_gem_object_pin_map()
> fails.

True.
 
> Let's fix the bug and silence the false positive.
> 
> Fixes: 493f30cd086e ("drm/i915/gvt: parse init context to update cmd 
> accessible reg whitelist")
> Signed-off-by: Dan Carpenter 
Reviewed-by: Chris Wilson 
-Chris


Re: [PATCH] drm/i915/selftest: Fix potential memory leak

2021-01-22 Thread Chris Wilson
Quoting Pan Bian (2021-01-22 01:56:40)
> Object out is not released on path that no VMA instance found. The root
> cause is jumping to an unexpected label on the error path.

Wouldn't the root cause be whatever caused the allocation to fail?

Language notwithstanding,
Reviewed-by: Chris Wilson 
-Chris


Re: [Intel-gfx] [BUG] on reboot: bisected to: drm/i915: Shut down displays gracefully on reboot

2021-01-14 Thread Chris Wilson
Quoting Steven Rostedt (2021-01-14 21:32:06)
> On reboot, one of my test boxes now triggers the following warning:

057fe3535eb3 ("drm/i915: Disable RPM wakeref assertions during driver shutdown")
is included with the drm-intel-fixes PR.
-Chris


Re: [PATCH] [v2] i915: fix shift warning

2021-01-03 Thread Chris Wilson
Quoting Arnd Bergmann (2021-01-03 13:51:44)
> From: Arnd Bergmann 
> 
> Randconfig builds on 32-bit machines show lots of warnings for
> the i915 driver for passing a 32-bit value into __const_hweight64():
> 
> drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:2584:9: error: shift count >= 
> width of type [-Werror,-Wshift-count-overflow]
> return hweight64(VDBOX_MASK(>gt));
>^~~~
> include/asm-generic/bitops/const_hweight.h:29:49: note: expanded from macro 
> 'hweight64'
>  #define hweight64(w) (__builtin_constant_p(w) ? __const_hweight64(w) : 
> __arch_hweight64(w))
> 
> Change it to hweight_long() to avoid the warning.
> 
> Signed-off-by: Arnd Bergmann 
Reviewed-by:: Chris Wilson 
-Chris


Re: [PATCH] drm/i915: remove h from printk format specifier

2020-12-15 Thread Chris Wilson
Quoting t...@redhat.com (2020-12-15 14:41:01)
> From: Tom Rix 
> 
> See Documentation/core-api/printk-formats.rst.
> h should no longer be used in the format specifier for printk.

It's understood by format_decode().
* 'h', 'l', or 'L' for integer fields

At least reference commit cbacb5ab0aa0 ("docs: printk-formats: Stop
encouraging use of unnecessary %h[xudi] and %hh[xudi]") as to why the
printk-formats.rst was altered so we know the code is merely in bad
taste and not using undefined behaviour of printk.
-Chris


Re: [PATCH] drm/i915: remove trailing semicolon in macro definition

2020-11-27 Thread Chris Wilson
Quoting t...@redhat.com (2020-11-27 16:28:28)
> From: Tom Rix 
> 
> The macro use will already have a semicolon.
> 
> Signed-off-by: Tom Rix 
> ---
>  drivers/gpu/drm/i915/intel_device_info.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_device_info.c 
> b/drivers/gpu/drm/i915/intel_device_info.c
> index e67cec8fa2aa..ef767f04c37c 100644
> --- a/drivers/gpu/drm/i915/intel_device_info.c
> +++ b/drivers/gpu/drm/i915/intel_device_info.c
> @@ -104,7 +104,7 @@ void intel_device_info_print_static(const struct 
> intel_device_info *info,
> drm_printf(p, "ppgtt-type: %d\n", info->ppgtt_type);
> drm_printf(p, "dma_mask_size: %u\n", info->dma_mask_size);
>  
> -#define PRINT_FLAG(name) drm_printf(p, "%s: %s\n", #name, yesno(info->name));
> +#define PRINT_FLAG(name) drm_printf(p, "%s: %s\n", #name, yesno(info->name))
> DEV_INFO_FOR_EACH_FLAG(PRINT_FLAG);

I thought that this was a macro that avoided adding the ';' to each
invocation. Perhaps another time.

Reviewed-by: Chris Wilson 
-Chris


Re: [Intel-gfx] [drm/i915/gem] 59dd13ad31: phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second -54.0% regression

2020-11-27 Thread Chris Wilson
Quoting Xing Zhengjun (2020-11-27 01:51:41)
> 
> 
> On 11/27/2020 5:34 AM, Chris Wilson wrote:
> > Quoting Xing Zhengjun (2020-11-26 01:44:55)
> >>
> >>
> >> On 11/25/2020 4:47 AM, Chris Wilson wrote:
> >>> Quoting Oliver Sang (2020-11-19 07:20:18)
> >>>> On Fri, Nov 13, 2020 at 04:27:13PM +0200, Joonas Lahtinen wrote:
> >>>>> Hi,
> >>>>>
> >>>>> Could you add intel-...@lists.freedesktop.org into reports going
> >>>>> forward.
> >>>>>
> >>>>> Quoting kernel test robot (2020-11-11 17:58:11)
> >>>>>>
> >>>>>> Greeting,
> >>>>>>
> >>>>>> FYI, we noticed a -54.0% regression of 
> >>>>>> phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second
> >>>>>>  due to commit:
> >>>>>
> >>>>> How many runs are there on the bad version to ensure the bisect is
> >>>>> repeatable?
> >>>>
> >>>> test 4 times.
> >>>> zxing@inn:/result/phoronix-test-suite/performance-true-Radial_Gradient_Paint-1024x1024-jxrendermark-1.2.4-ucode=0xd6-monitor=da39a3ee/lkp-cfl-d1/debian-x86_64-phoronix/x86_64-rhel-8.3/gcc-9/59dd13ad310793757e34afa489dd6fc8544fc3da$
> >>>>  grep -r "operations_per_second" */stats.json
> >>>> 0/stats.json: 
> >>>> "phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second":
> >>>>  4133.487932,
> >>>> 1/stats.json: 
> >>>> "phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second":
> >>>>  4120.421503,
> >>>> 2/stats.json: 
> >>>> "phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second":
> >>>>  4188.414835,
> >>>> 3/stats.json: 
> >>>> "phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second":
> >>>>  4068.549514,
> >>>
> >>> a w/o revert (drm-tip)
> >>> b w/ revert
> >>> +mB+
> >>> | ..b 
> >>>  |
> >>> | ..b.aa  
> >>>  |
> >>> | a.a 
> >>>  |
> >>> | a.a 
> >>>  |
> >>> |  b  b  a
> >>>  |
> >>> |   b  b  b b. a  
> >>>  |
> >>> |   b  bb bbb...  
> >>>  |
> >>> |b   ab bbab.bb.bba b a aab   
> >>> a|
> >>> | |__A__| 
> >>>  |
> >>> | |MA_|   
> >>>  |
> >>> +--+
> >>>   NMin   MaxMedian   Avg  
> >>>   Stddev
> >>> a 120  3621.8761 7356.4442 4606.7895 4607.9132 
> >>> 156.17693
> >>> b 120  2664.0563 6359.9686 4519.5036 4534.4463 
> >>> 95.471121
> >>>
> >>> The patch is not expected to have any impact on the machine you are 
> >>> testing on.
> >>> -Chris
> >>>
> >>
> >> What's your code base?
> >> For my side:
> >> 1) sync the code to the head of Linux mainline
> >> 2) git reset --hard 59dd13ad31
> >> 3) git revert 59dd13ad3107
> >> We compare the test result of commit 59dd13ad3107 (step 2) and
> >> 2052847b06f8 (step 3, revert 59dd13ad3107), the regression should
> >> related with 59dd13ad3107. Each test case we run 5 times.
> > 
> > a 59dd13ad31
> > b revert
> > +mB+
> > |a  

Re: [Intel-gfx] [drm/i915/gem] 59dd13ad31: phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second -54.0% regression

2020-11-26 Thread Chris Wilson
Quoting Xing Zhengjun (2020-11-26 01:44:55)
> 
> 
> On 11/25/2020 4:47 AM, Chris Wilson wrote:
> > Quoting Oliver Sang (2020-11-19 07:20:18)
> >> On Fri, Nov 13, 2020 at 04:27:13PM +0200, Joonas Lahtinen wrote:
> >>> Hi,
> >>>
> >>> Could you add intel-...@lists.freedesktop.org into reports going
> >>> forward.
> >>>
> >>> Quoting kernel test robot (2020-11-11 17:58:11)
> >>>>
> >>>> Greeting,
> >>>>
> >>>> FYI, we noticed a -54.0% regression of 
> >>>> phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second
> >>>>  due to commit:
> >>>
> >>> How many runs are there on the bad version to ensure the bisect is
> >>> repeatable?
> >>
> >> test 4 times.
> >> zxing@inn:/result/phoronix-test-suite/performance-true-Radial_Gradient_Paint-1024x1024-jxrendermark-1.2.4-ucode=0xd6-monitor=da39a3ee/lkp-cfl-d1/debian-x86_64-phoronix/x86_64-rhel-8.3/gcc-9/59dd13ad310793757e34afa489dd6fc8544fc3da$
> >>  grep -r "operations_per_second" */stats.json
> >> 0/stats.json: 
> >> "phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second":
> >>  4133.487932,
> >> 1/stats.json: 
> >> "phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second":
> >>  4120.421503,
> >> 2/stats.json: 
> >> "phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second":
> >>  4188.414835,
> >> 3/stats.json: 
> >> "phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second":
> >>  4068.549514,
> > 
> > a w/o revert (drm-tip)
> > b w/ revert
> > +mB+
> > | ..b   
> >|
> > | ..b.aa
> >|
> > | a.a   
> >|
> > | a.a   
> >|
> > |  b  b  a  
> >|
> > |   b  b  b b. a
> >|
> > |   b  bb bbb...
> >|
> > |b   ab bbab.bb.bba b a aab 
> >   a|
> > | |__A__|   
> >|
> > | |MA_| 
> >|
> > +--+
> >  NMin   MaxMedian   Avg
> > Stddev
> > a 120  3621.8761 7356.4442 4606.7895 4607.9132 
> > 156.17693
> > b 120  2664.0563 6359.9686 4519.5036 4534.4463 
> > 95.471121
> > 
> > The patch is not expected to have any impact on the machine you are testing 
> > on.
> > -Chris
> > 
> 
> What's your code base?
> For my side:
> 1) sync the code to the head of Linux mainline
> 2) git reset --hard 59dd13ad31
> 3) git revert 59dd13ad3107
> We compare the test result of commit 59dd13ad3107 (step 2) and 
> 2052847b06f8 (step 3, revert 59dd13ad3107), the regression should 
> related with 59dd13ad3107. Each test case we run 5 times.

a 59dd13ad31
b revert
+mB+
|a |
|   aa |
| .bba |
| .bbaab   |
| .b . b   b   |
|a   b.. ..bb  bb  |
|  b a   b.b.a bb  |
|aa  b..aaa..b.b..bab   b a   .|
|  |__A__| |
|  |___A_| |
+--+
NMin   MaxMedian   AvgStddev
a 120  3658.3435 6363.7812 4527.4406  4536.612 86.095459
b 120  3928.9643  6375.829 4576.0482 4585.4224  157.284


Re: [Intel-gfx] [drm/i915/gem] 59dd13ad31: phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second -54.0% regression

2020-11-24 Thread Chris Wilson
Quoting Oliver Sang (2020-11-19 07:20:18)
> On Fri, Nov 13, 2020 at 04:27:13PM +0200, Joonas Lahtinen wrote:
> > Hi,
> > 
> > Could you add intel-...@lists.freedesktop.org into reports going
> > forward.
> > 
> > Quoting kernel test robot (2020-11-11 17:58:11)
> > > 
> > > Greeting,
> > > 
> > > FYI, we noticed a -54.0% regression of 
> > > phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second
> > >  due to commit:
> > 
> > How many runs are there on the bad version to ensure the bisect is
> > repeatable?
> 
> test 4 times.
> zxing@inn:/result/phoronix-test-suite/performance-true-Radial_Gradient_Paint-1024x1024-jxrendermark-1.2.4-ucode=0xd6-monitor=da39a3ee/lkp-cfl-d1/debian-x86_64-phoronix/x86_64-rhel-8.3/gcc-9/59dd13ad310793757e34afa489dd6fc8544fc3da$
>  grep -r "operations_per_second" */stats.json
> 0/stats.json: 
> "phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second":
>  4133.487932,
> 1/stats.json: 
> "phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second":
>  4120.421503,
> 2/stats.json: 
> "phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second":
>  4188.414835,
> 3/stats.json: 
> "phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second":
>  4068.549514,

a w/o revert (drm-tip)
b w/ revert
+mB+
| ..b  |
| ..b.aa   |
| a.a  |
| a.a  |
|  b  b  a |
|   b  b  b b. a   |
|   b  bb bbb...   |
|b   ab bbab.bb.bba b a aab   a|
| |__A__|  |
| |MA_||
+--+
NMin   MaxMedian   AvgStddev
a 120  3621.8761 7356.4442 4606.7895 4607.9132 156.17693
b 120  2664.0563 6359.9686 4519.5036 4534.4463 95.471121

The patch is not expected to have any impact on the machine you are testing on.
-Chris


Re: [PATCH v5 0/7] Convert the intel iommu driver to the dma-iommu api

2020-11-20 Thread Chris Wilson
Quoting Lu Baolu (2020-11-20 10:17:12)
> Lu Baolu (3):
>   iommu: Add quirk for Intel graphic devices in map_sg
>   iommu/vt-d: Update domain geometry in iommu_ops.at(de)tach_dev
>   iommu/vt-d: Cleanup after converting to dma-iommu ops
> 
> Tom Murphy (4):
>   iommu: Handle freelists when using deferred flushing in iommu drivers
>   iommu: Add iommu_dma_free_cpu_cached_iovas()
>   iommu: Allow the dma-iommu api to use bounce buffers
>   iommu/vt-d: Convert intel iommu driver to the iommu ops

Something that may be of interest is that we encounter problems with
using intel-iommu across a PCI remove event. All HW generations fail
with faults like:

DMAR: DRHD: handling fault status reg 3
DMAR: [DMA Write] Request device [00:02.0] PASID  fault addr 4b822000 
[fault reason 02] Present bit in context entry is clear

i.e. they all report missing present bit after re-adding the device to the
iommu group. Forcing an identity map (or disabling iommu) works fine.

I applied this series just on the off-chance it changed the symptoms; it
does not. If you have any ideas on how to chase down this fault, that
would be very useful. We have a few other DMAR faults visible on many
platforms, all "[fault reason 07] Next page table ptr is invalid" that
are again not affected by this series, that we also need to resolve.
-Chris


Re: Deadlock cpuctx_mutex / pmus_lock / >mmap_lock#2

2020-11-19 Thread Chris Wilson
Quoting Peter Zijlstra (2020-11-19 13:02:44)
> 
> Chris, I suspect this is due to i915 calling stop machine with all sorts
> of locks held. Is there anything to be done about this? stop_machine()
> is really nasty to begin with.
> 
> What problem is it typing to solve?

If there is any concurrent access through a PCI bar (that is exported to
userspace via mmap) as the GTT is updated, results in undefined HW
behaviour (where that is not limited to users writing to other system
pages).

stop_machine() is the most foolproof method we know that works.

This particular cycle is easy to break by moving the copy_to_user to
after releasing perf_event_ctx_unlock in perf_read().
-Chris


Re: drivers/gpu/drm/i915/gem/i915_gem_throttle.c:59 i915_gem_throttle_ioctl() error: double locked 'ctx->engines_mutex' (orig line 59)

2020-11-16 Thread Chris Wilson
Quoting Dan Carpenter (2020-11-16 10:08:38)
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
> master
> head:   0062442ecfef0d82cd69e3e600d5006357f8d8e4
> commit: 27a5dcfe73f4b696b3de8c23a560199bb1c193a4 drm/i915/gem: Remove 
> disordered per-file request list for throttling
> config: i386-randconfig-m021-20201115 (attached as .config)
> compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot 
> Reported-by: Dan Carpenter 
> 
> smatch warnings:
> drivers/gpu/drm/i915/gem/i915_gem_throttle.c:59 i915_gem_throttle_ioctl() 
> error: double locked 'ctx->engines_mutex' (orig line 59)
> 
> vim +59 drivers/gpu/drm/i915/gem/i915_gem_throttle.c
> 
> 35  int
> 36  i915_gem_throttle_ioctl(struct drm_device *dev, void *data,
> 37  struct drm_file *file)
> 38  {
> 39  const unsigned long recent_enough = jiffies - 
> DRM_I915_THROTTLE_JIFFIES;
> 40  struct drm_i915_file_private *file_priv = file->driver_priv;
> 41  struct i915_gem_context *ctx;
> 42  unsigned long idx;
> 43  long ret;
> 44  
> 45  /* ABI: return -EIO if already wedged */
> 46  ret = intel_gt_terminally_wedged(_i915(dev)->gt);
> 47  if (ret)
> 48  return ret;
> 49  
> 50  rcu_read_lock();
> 51  xa_for_each(_priv->context_xa, idx, ctx) {
> 52  struct i915_gem_engines_iter it;
> 53  struct intel_context *ce;
> 54  
> 55  if (!kref_get_unless_zero(>ref))
> 56  continue;
> 57  rcu_read_unlock();
> 58  
> 59  for_each_gem_engine(ce,
> 60  
> i915_gem_context_lock_engines(ctx),
> ^^
> I don't understand why this takes the lock every iteration through the
> loop

It doesn't.

static inline struct i915_gem_engines *
i915_gem_context_lock_engines(struct i915_gem_context *ctx)
__acquires(>engines_mutex)
{
mutex_lock(>engines_mutex);
return i915_gem_context_engines(ctx);
}

static inline void
i915_gem_context_unlock_engines(struct i915_gem_context *ctx)
__releases(>engines_mutex)
{
mutex_unlock(>engines_mutex);
}

with the i915_gem_engines stored as a local in the iterator at the start
of the for loop.
-Chris


Re: [tip: locking/core] lockdep: Fix usage_traceoverflow

2020-10-28 Thread Chris Wilson
Quoting Chris Wilson (2020-10-28 17:40:48)
> Quoting Chris Wilson (2020-10-27 16:34:53)
> > Quoting Peter Zijlstra (2020-10-27 15:45:33)
> > > On Tue, Oct 27, 2020 at 01:29:10PM +, Chris Wilson wrote:
> > > 
> > > > <4> [304.908891] hm#2, depth: 6 [6], 3425cfea6ff31f7f != 
> > > > 547d92e9ec2ab9af
> > > > <4> [304.908897] WARNING: CPU: 0 PID: 5658 at 
> > > > kernel/locking/lockdep.c:3679 check_chain_key+0x1a4/0x1f0
> > > 
> > > Urgh, I don't think I've _ever_ seen that warning trigger.
> > > 
> > > The comments that go with it suggest memory corruption is the most
> > > likely trigger of it. Is it easy to trigger?
> > 
> > For the automated CI, yes, the few machines that run that particular HW
> > test seem to hit it regularly. I have not yet reproduced it for myself.
> > I thought it looked like something kasan would provide some insight for
> > and we should get a kasan run through CI over the w/e. I suspect we've
> > feed in some garbage and called it a lock.
> 
> I tracked it down to a second invocation of lock_acquire_shared_recursive()
> intermingled with some other regular mutexes (in this case ww_mutex).
> 
> We hit this path in validate_chain():
> /*
>  * Mark recursive read, as we jump over it when
>  * building dependencies (just like we jump over
>  * trylock entries):
>  */
> if (ret == 2)
> hlock->read = 2;
> 
> and that is modifying hlock_id() and so the chain-key, after it has
> already been computed.
> 
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 035f81b1cc87..f193f756e1e3 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -4831,7 +4831,7 @@ static int __lock_acquire(struct lockdep_map *lock, 
> unsigned int subclass,
> if (!validate_chain(curr, hlock, chain_head, chain_key))
> return 0;
> 
> -   curr->curr_chain_key = chain_key;
> +   curr->curr_chain_key = iterate_chain_key(chain_key, hlock_id(hlock));
> curr->lockdep_depth++;
> check_chain_key(curr);

No, chain_key should not be chained onto itself again, but I hope it
makes the issue clear nevertheless.
-Chris


Re: [tip: locking/core] lockdep: Fix usage_traceoverflow

2020-10-28 Thread Chris Wilson
Quoting Chris Wilson (2020-10-27 16:34:53)
> Quoting Peter Zijlstra (2020-10-27 15:45:33)
> > On Tue, Oct 27, 2020 at 01:29:10PM +0000, Chris Wilson wrote:
> > 
> > > <4> [304.908891] hm#2, depth: 6 [6], 3425cfea6ff31f7f != 547d92e9ec2ab9af
> > > <4> [304.908897] WARNING: CPU: 0 PID: 5658 at 
> > > kernel/locking/lockdep.c:3679 check_chain_key+0x1a4/0x1f0
> > 
> > Urgh, I don't think I've _ever_ seen that warning trigger.
> > 
> > The comments that go with it suggest memory corruption is the most
> > likely trigger of it. Is it easy to trigger?
> 
> For the automated CI, yes, the few machines that run that particular HW
> test seem to hit it regularly. I have not yet reproduced it for myself.
> I thought it looked like something kasan would provide some insight for
> and we should get a kasan run through CI over the w/e. I suspect we've
> feed in some garbage and called it a lock.

I tracked it down to a second invocation of lock_acquire_shared_recursive()
intermingled with some other regular mutexes (in this case ww_mutex).

We hit this path in validate_chain():
/*
 * Mark recursive read, as we jump over it when
 * building dependencies (just like we jump over
 * trylock entries):
 */
if (ret == 2)
hlock->read = 2;

and that is modifying hlock_id() and so the chain-key, after it has
already been computed.

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 035f81b1cc87..f193f756e1e3 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -4831,7 +4831,7 @@ static int __lock_acquire(struct lockdep_map *lock, 
unsigned int subclass,
if (!validate_chain(curr, hlock, chain_head, chain_key))
return 0;

-   curr->curr_chain_key = chain_key;
+   curr->curr_chain_key = iterate_chain_key(chain_key, hlock_id(hlock));
curr->lockdep_depth++;
check_chain_key(curr);

works as a heavy hammer.
-Chris


Re: linux-next: Signed-off-by missing for commit in the drm-intel-fixes tree

2020-10-28 Thread Chris Wilson
Quoting Stephen Rothwell (2020-10-28 21:28:23)
> Hi all,
> 
> Commit
> 
>   d13208a88f41 ("lockdep: Fix nr_unused_locks")
> 
> is missing a Signed-off-by from its author.
> 
> Also, the author's email name is missing the leading 'P'.

And it shouldn't be in the drm-intel-fixes tree.
-Chris


Re: [tip: locking/core] lockdep: Fix usage_traceoverflow

2020-10-27 Thread Chris Wilson
Quoting Peter Zijlstra (2020-10-27 12:48:34)
> On Tue, Oct 27, 2020 at 01:30:56PM +0100, Peter Zijlstra wrote:
> > This seems to make it happy. Not quite sure that's the best solution.
> > 
> > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> > index 3e99dfef8408..81295bc760fe 100644
> > --- a/kernel/locking/lockdep.c
> > +++ b/kernel/locking/lockdep.c
> > @@ -4411,7 +4405,9 @@ static int mark_lock(struct task_struct *curr, struct 
> > held_lock *this,
> >   break;
> >  
> >   case LOCK_USED:
> > - debug_atomic_dec(nr_unused_locks);
> > + case LOCK_USED_READ:
> > + if ((hlock_class(this)->usage_mask & 
> > (LOCKF_USED|LOCKF_USED_READ)) == new_mask)
> > + debug_atomic_dec(nr_unused_locks);
> >   break;
> >  
> >   default:
> 
> This also works, and I think I likes it better.. anyone?
> 
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 3e99dfef8408..e603e86c0227 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -4396,6 +4390,9 @@ static int mark_lock(struct task_struct *curr, struct 
> held_lock *this,
> if (unlikely(hlock_class(this)->usage_mask & new_mask))
> goto unlock;
>  
> +   if (!hlock_class(this)->usage_mask)
> +   debug_atomic_dec(nr_unused_locks);
> +

>From an outside perspective, this is much easier for me to match with
the assertion in lockdep_proc.

Our CI confirms this works, and we are just left with the new issue of

<4> [260.903453] hm#2, depth: 6 [6], eb18a85a2df37d3d != a6ee4649c0022599
<4> [260.903458] WARNING: CPU: 7 PID: 5515 at kernel/locking/lockdep.c:3679 
check_chain_key+0x1a4/0x1f0

Thanks,
-Chris


Re: [tip: locking/core] lockdep: Fix usage_traceoverflow

2020-10-27 Thread Chris Wilson
Quoting Peter Zijlstra (2020-10-27 15:45:33)
> On Tue, Oct 27, 2020 at 01:29:10PM +0000, Chris Wilson wrote:
> 
> > <4> [304.908891] hm#2, depth: 6 [6], 3425cfea6ff31f7f != 547d92e9ec2ab9af
> > <4> [304.908897] WARNING: CPU: 0 PID: 5658 at kernel/locking/lockdep.c:3679 
> > check_chain_key+0x1a4/0x1f0
> 
> Urgh, I don't think I've _ever_ seen that warning trigger.
> 
> The comments that go with it suggest memory corruption is the most
> likely trigger of it. Is it easy to trigger?

For the automated CI, yes, the few machines that run that particular HW
test seem to hit it regularly. I have not yet reproduced it for myself.
I thought it looked like something kasan would provide some insight for
and we should get a kasan run through CI over the w/e. I suspect we've
feed in some garbage and called it a lock.
-Chris


Re: [tip: locking/core] lockdep: Fix usage_traceoverflow

2020-10-27 Thread Chris Wilson
Quoting Peter Zijlstra (2020-10-27 12:30:56)
> On Tue, Oct 27, 2020 at 12:59:55PM +0100, Peter Zijlstra wrote:
> > On Tue, Oct 27, 2020 at 11:29:35AM +0000, Chris Wilson wrote:
> > > Quoting tip-bot2 for Peter Zijlstra (2020-10-07 17:20:13)
> > > > The following commit has been merged into the locking/core branch of 
> > > > tip:
> > > > 
> > > > Commit-ID: 24d5a3bffef117ed90685f285c6c9d2faa3a02b4
> > > > Gitweb:
> > > > https://git.kernel.org/tip/24d5a3bffef117ed90685f285c6c9d2faa3a02b4
> > > > Author:Peter Zijlstra 
> > > > AuthorDate:Wed, 30 Sep 2020 11:49:37 +02:00
> > > > Committer: Peter Zijlstra 
> > > > CommitterDate: Wed, 07 Oct 2020 18:14:17 +02:00
> > > > 
> > > > lockdep: Fix usage_traceoverflow
> > > > 
> > > > Basically print_lock_class_header()'s for loop is out of sync with the
> > > > the size of of ->usage_traces[].
> > > 
> > > We're hitting a problem,
> > > 
> > > $ cat /proc/lockdep_stats
> > > 
> > > upon boot generates:
> > > 
> > > [   29.465702] DEBUG_LOCKS_WARN_ON(debug_atomic_read(nr_unused_locks) != 
> > > nr_unused)
> > > [   29.465716] WARNING: CPU: 0 PID: 488 at 
> > > kernel/locking/lockdep_proc.c:256 lockdep_stats_show+0xa33/0xac0
> > > 
> > > that bisected to this patch. Only just completed the bisection and
> > > thought you would like a heads up.
> > 
> > Oh hey, that's 'curious'... it does indeed trivially reproduce, let me
> > have a poke.
> 
> This seems to make it happy. Not quite sure that's the best solution.

Finished the first round of testing on this patch (will try the second
in a second). It solves the nr_unused_locks issue, but we find something
else:

<4> [304.908891] hm#2, depth: 6 [6], 3425cfea6ff31f7f != 547d92e9ec2ab9af
<4> [304.908897] WARNING: CPU: 0 PID: 5658 at kernel/locking/lockdep.c:3679 
check_chain_key+0x1a4/0x1f0
<4> [304.908898] Modules linked in: vgem snd_hda_codec_hdmi 
snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio i915 mei_hdcp 
x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_hda_intel btusb 
snd_intel_dspcfg btrtl snd_hda_codec btbcm btintel ghash_clmulni_intel 
snd_hwdep bluetooth snd_hda_core e1000e snd_pcm cdc_ether ptp usbnet mei_me mii 
pps_core mei ecdh_generic ecc intel_lpss_pci prime_numbers
<4> [304.908920] CPU: 0 PID: 5658 Comm: kms_psr Not tainted 
5.10.0-rc1-CI-Trybot_7174+ #1
<4> [304.908922] Hardware name: Intel Corporation Ice Lake Client 
Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS 
ICLSFWR1.R00.3183.A00.1905020411 05/02/2019
<4> [304.908923] RIP: 0010:check_chain_key+0x1a4/0x1f0
<4> [304.908925] Code: a5 d8 08 00 00 74 e7 e8 7a eb 96 00 8b b5 e0 08 00 00 4c 
89 e1 89 da 4c 8b 85 d8 08 00 00 48 c7 c7 d0 8f 30 82 e8 5f 2c 92 00 <0f> 0b 5b 
5d 41 5c 41 5d c3 49 89 d5 49 c7 c4 ff ff ff ff 31 db e8
<4> [304.908926] RSP: 0018:c9ba7af0 EFLAGS: 00010086
<4> [304.908928] RAX:  RBX: 0006 RCX: 
0002
<4> [304.908929] RDX: 8002 RSI: 82348c47 RDI: 

<4> [304.908930] RBP: 88812d7dc040 R08:  R09: 
c0012c92
<4> [304.908931] R10: 003b5380 R11: c9ba7900 R12: 
3425cfea6ff31f7f
<4> [304.908931] R13: 88812d7dc9f0 R14: 0003 R15: 
88812d7dc9f0
<4> [304.908933] FS:  7f51722bb300() GS:88849fa0() 
knlGS:
<4> [304.908934] CS:  0010 DS:  ES:  CR0: 80050033
<4> [304.908935] CR2: 7ffd197adff0 CR3: 00011d9ee004 CR4: 
00770ef0
<4> [304.908935] PKRU: 5554
<4> [304.908936] Call Trace:
<4> [304.908939]  __lock_acquire+0x5d0/0x2740
<4> [304.908941]  lock_acquire+0xdc/0x3c0
<4> [304.908944]  ? drm_modeset_lock+0xf6/0x110
<4> [304.908947]  __ww_mutex_lock.constprop.18+0xd0/0x1010
<4> [304.908949]  ? drm_modeset_lock+0xf6/0x110
<4> [304.908951]  ? drm_modeset_lock+0xf6/0x110
<4> [304.908953]  ? ww_mutex_lock_interruptible+0x39/0xa0
<4> [304.908954]  ww_mutex_lock_interruptible+0x39/0xa0
<4> [304.908956]  drm_modeset_lock+0xf6/0x110
<4> [304.908958]  drm_atomic_get_connector_state+0x28/0x180
<4> [304.909003]  intel_psr_fastset_force+0x76/0x170 [i915]
<4> [304.909034]  i915_edp_psr_debug_set+0x53/0x70 [i915]
<4> [304.909037]  simple_attr_write+0xb1/0xd0
<4> [304.909040]  full_proxy_write+0x51/0x80
<4> [304.909042]  vfs_write+0xc4/0x230
<4> [304.909043]  ksys_write+0x5a/0xd0
<4> [304.909045]  do_syscall_64+0x33/0x80
<4> [3

Re: [tip: locking/core] lockdep: Fix usage_traceoverflow

2020-10-27 Thread Chris Wilson
Quoting tip-bot2 for Peter Zijlstra (2020-10-07 17:20:13)
> The following commit has been merged into the locking/core branch of tip:
> 
> Commit-ID: 24d5a3bffef117ed90685f285c6c9d2faa3a02b4
> Gitweb:
> https://git.kernel.org/tip/24d5a3bffef117ed90685f285c6c9d2faa3a02b4
> Author:Peter Zijlstra 
> AuthorDate:Wed, 30 Sep 2020 11:49:37 +02:00
> Committer: Peter Zijlstra 
> CommitterDate: Wed, 07 Oct 2020 18:14:17 +02:00
> 
> lockdep: Fix usage_traceoverflow
> 
> Basically print_lock_class_header()'s for loop is out of sync with the
> the size of of ->usage_traces[].

We're hitting a problem,

$ cat /proc/lockdep_stats

upon boot generates:

[   29.465702] DEBUG_LOCKS_WARN_ON(debug_atomic_read(nr_unused_locks) != 
nr_unused)
[   29.465716] WARNING: CPU: 0 PID: 488 at kernel/locking/lockdep_proc.c:256 
lockdep_stats_show+0xa33/0xac0

that bisected to this patch. Only just completed the bisection and
thought you would like a heads up.
-Chris


Re: [drm/i915/gt] 98479ada42: phoronix-test-suite.supertuxkart.1024x768.Windowed.Basic.1.OldMine.frames_per_second -36.0% regression

2020-10-02 Thread Chris Wilson
Quoting kernel test robot (2020-10-02 09:27:40)
> Greeting,
> 
> FYI, we noticed a -36.0% regression of 
> phoronix-test-suite.supertuxkart.1024x768.Windowed.Basic.1.OldMine.frames_per_second
>  due to commit:

Where's the power consumption graph? You have a benchmark that although
is running faster than the display is failing to keep the GPU busy. That
alone says there is a latency issue along the GL paths. Is papering over
an issue in command submission worth the power cost?
-Chris


Re: REGRESSION: in intel video driver following introduction of mm_struct.has_pinned

2020-09-29 Thread Chris Wilson
Quoting Joonas Lahtinen (2020-09-29 09:18:34)
> (+ intel-gfx for being i915 related)
> (+ Chris who has looked into the issue)
> 
> Hi,
> 
> Thanks for reporting!

Fixed in commit a4d63c3732f1a0c91abcf5b7f32b4ef7dcd82025
Author: Jason A. Donenfeld 
Date:   Mon Sep 28 12:35:07 2020 +0200

mm: do not rely on mm == current->mm in __get_user_pages_locked
-Chris


Re: [PATCH v4 07/15] iommu/vt-d: Delegate the dma domain to upper layer

2020-08-24 Thread Chris Wilson
Quoting Lu Baolu (2020-08-24 07:31:23)
> Hi Chris,
> 
> On 2020/8/22 2:33, Chris Wilson wrote:
> > Quoting Lu Baolu (2019-05-25 06:41:28)
> >> This allows the iommu generic layer to allocate a dma domain and
> >> attach it to a device through the iommu api's. With all types of
> >> domains being delegated to upper layer, we can remove an internal
> >> flag which was used to distinguish domains mananged internally or
> >> externally.
> > 
> > I'm seeing some really strange behaviour with this patch on a 32b
> > Skylake system (and still present on mainline). Before this patch
> > everything is peaceful and appears to work correctly. Applying this patch,
> > and we fail to initialise the GPU with a few DMAR errors reported, e.g.
> > 
> > [   20.279445] DMAR: DRHD: handling fault status reg 3
> > [   20.279508] DMAR: [DMA Read] Request device [00:02.0] fault addr 
> > 8900a000 [fault reason 05] PTE Write access is not set
> > 
> > Setting an identity map for the igfx made the DMAR errors disappear, but
> > the GPU still failed to initialise.
> > 
> > There's no difference in the DMAR configuration dmesg between working and
> > the upset patch. And the really strange part is that switching to a 64b
> > kernel with this patch, it's working.
> > 
> > Any suggestions on what I should look for?
> 
> Can the patch titled "[PATCH] iommu/intel: Handle 36b addressing for
> x86-32" solve this problem?

It does. Not sure why, but that mystery I can leave for others.
-Chris


Re: [PATCH v2] mm: Track page table modifications in __apply_to_page_range()

2020-08-22 Thread Chris Wilson
Quoting Chris Wilson (2020-08-22 00:39:09)
> Quoting Andrew Morton (2020-08-21 23:34:12)
> > On Fri, 21 Aug 2020 14:37:46 +0200 Joerg Roedel  wrote:
> > 
> > > The __apply_to_page_range() function is also used to change and/or
> > > allocate page-table pages in the vmalloc area of the address space.
> > > Make sure these changes get synchronized to other page-tables in the
> > > system by calling arch_sync_kernel_mappings() when necessary.
> > > 
> > > Tested-by: Chris Wilson  #x86-32
> > > Cc:  # v5.8+
> > 
> > I'm trying to figure out how you figured out that this is 5.8+.  Has a
> > particular misbehaving commit been identified?
> 
> The two commits of relevance, in my eyes, were
> 
>   2ba3e6947aed ("mm/vmalloc: track which page-table levels were modified")
>   86cf69f1d893 ("x86/mm/32: implement arch_sync_kernel_mappings()")
> 
> I can reproduce the failure on v5.8, but not on v5.7. A bisect would
> seem to be plausible.

The active ingredient was

7f0a002b5a21 ("x86/mm: remove vmalloc faulting")

which explains a lot.
-Chris


Re: [PATCH v2] mm: Track page table modifications in __apply_to_page_range()

2020-08-21 Thread Chris Wilson
Quoting Andrew Morton (2020-08-21 23:34:12)
> On Fri, 21 Aug 2020 14:37:46 +0200 Joerg Roedel  wrote:
> 
> > The __apply_to_page_range() function is also used to change and/or
> > allocate page-table pages in the vmalloc area of the address space.
> > Make sure these changes get synchronized to other page-tables in the
> > system by calling arch_sync_kernel_mappings() when necessary.
> > 
> > Tested-by: Chris Wilson  #x86-32
> > Cc:  # v5.8+
> 
> I'm trying to figure out how you figured out that this is 5.8+.  Has a
> particular misbehaving commit been identified?

The two commits of relevance, in my eyes, were

  2ba3e6947aed ("mm/vmalloc: track which page-table levels were modified")
  86cf69f1d893 ("x86/mm/32: implement arch_sync_kernel_mappings()")

I can reproduce the failure on v5.8, but not on v5.7. A bisect would
seem to be plausible.
-Chris


Re: [PATCH v2] mm: Track page table modifications in __apply_to_page_range()

2020-08-21 Thread Chris Wilson
Quoting Andrew Morton (2020-08-21 21:35:48)
> On Fri, 21 Aug 2020 14:37:46 +0200 Joerg Roedel  wrote:
> 
> > The __apply_to_page_range() function is also used to change and/or
> > allocate page-table pages in the vmalloc area of the address space.
> > Make sure these changes get synchronized to other page-tables in the
> > system by calling arch_sync_kernel_mappings() when necessary.
> 
> There's no description here of the user-visible effects of the bug. 
> Please always provide this, especially when proposing a -stable
> backport.  Take pity upon all the downstream kernel maintainers who are
> staring at this wondering whether they should risk adding it to their
> kernels.

The impact appears limited to x86-32, where apply_to_page_range may miss
updating the PMD. That leads to explosions in drivers like

[   24.227844] BUG: unable to handle page fault for address: fe036000
[   24.228076] #PF: supervisor write access in kernel mode
[   24.228294] #PF: error_code(0x0002) - not-present page
[   24.228494] *pde = 
[   24.228640] Oops: 0002 [#1] SMP
[   24.228788] CPU: 3 PID: 1300 Comm: gem_concurrent_ Not tainted 5.9.0-rc1+ #16
[   24.228957] Hardware name:  /NUC6i3SYB, BIOS 
SYSKLi35.86A.0024.2015.1027.2142 10/27/2015
[   24.229297] EIP: __execlists_context_alloc+0x132/0x2d0 [i915]
[   24.229462] Code: 31 d2 89 f0 e8 2f 55 02 00 89 45 e8 3d 00 f0 ff ff 0f 87 
11 01 00 00 8b 4d e8 03 4b 30 b8 5a 5a 5a 5a ba 01 00 00 00 8d 79 04  01 5a 
5a 5a 5a c7 81 fc 0f 00 00 5a 5a 5a 5a 83 e7 fc 29 f9 81
[   24.229759] EAX: 5a5a5a5a EBX: f60ca000 ECX: fe036000 EDX: 0001
[   24.229915] ESI: f43b7340 EDI: fe036004 EBP: f6389cb8 ESP: f6389c9c
[   24.230072] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010286
[   24.230229] CR0: 80050033 CR2: fe036000 CR3: 2d361000 CR4: 001506d0
[   24.230385] DR0:  DR1:  DR2:  DR3: 
[   24.230539] DR6: fffe0ff0 DR7: 0400
[   24.230675] Call Trace:
[   24.230957]  execlists_context_alloc+0x10/0x20 [i915]
[   24.231266]  intel_context_alloc_state+0x3f/0x70 [i915]
[   24.231547]  __intel_context_do_pin+0x117/0x170 [i915]
[   24.231850]  i915_gem_do_execbuffer+0xcc7/0x2500 [i915]
[   24.232024]  ? __kmalloc_track_caller+0x54/0x230
[   24.232181]  ? ktime_get+0x3e/0x120
[   24.232333]  ? dma_fence_signal+0x34/0x50
[   24.232617]  i915_gem_execbuffer2_ioctl+0xcd/0x1f0 [i915]
[   24.232912]  ? i915_gem_execbuffer_ioctl+0x2e0/0x2e0 [i915]
[   24.233084]  drm_ioctl_kernel+0x8f/0xd0
[   24.233236]  drm_ioctl+0x223/0x3d0
[   24.233505]  ? i915_gem_execbuffer_ioctl+0x2e0/0x2e0 [i915]
[   24.233684]  ? pick_next_task_fair+0x1b5/0x3d0
[   24.233873]  ? __switch_to_asm+0x36/0x50
[   24.234021]  ? drm_ioctl_kernel+0xd0/0xd0
[   24.234167]  __ia32_sys_ioctl+0x1ab/0x760
[   24.234313]  ? exit_to_user_mode_prepare+0xe5/0x110
[   24.234453]  ? syscall_exit_to_user_mode+0x23/0x130
[   24.234601]  __do_fast_syscall_32+0x3f/0x70
[   24.234744]  do_fast_syscall_32+0x29/0x60
[   24.234885]  do_SYSENTER_32+0x15/0x20
[   24.235021]  entry_SYSENTER_32+0x9f/0xf2
[   24.235157] EIP: 0xb7f28559
[   24.235288] Code: 03 74 c0 01 10 05 03 74 b8 01 10 06 03 74 b4 01 10 07 03 
74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 
c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
[   24.235576] EAX: ffda EBX: 0005 ECX: c0406469 EDX: bf95556c
[   24.235722] ESI: b7e68000 EDI: c0406469 EBP: 0005 ESP: bf9554d8
[   24.235869] DS: 007b ES: 007b FS:  GS: 0033 SS: 007b EFLAGS: 0296
[   24.236018] Modules linked in: i915 x86_pkg_temp_thermal intel_powerclamp 
crc32_pclmul crc32c_intel intel_cstate intel_uncore intel_gtt drm_kms_helper 
intel_pch_thermal video button autofs4 i2c_i801 i2c_smbus fan
[   24.236336] CR2: fe036000

It looks like kasan, xen and i915 are vulnerable.
-Chris


Re: [PATCH v2] mm: Track page table modifications in __apply_to_page_range()

2020-08-21 Thread Chris Wilson
Quoting Joerg Roedel (2020-08-21 13:37:46)
> From: Joerg Roedel 
> 
> The __apply_to_page_range() function is also used to change and/or
> allocate page-table pages in the vmalloc area of the address space.
> Make sure these changes get synchronized to other page-tables in the
> system by calling arch_sync_kernel_mappings() when necessary.
> 
> Tested-by: Chris Wilson  #x86-32
> Cc:  # v5.8+
> Signed-off-by: Joerg Roedel 

I've doubled check that this patch by itself fixes our x86-32 vmapping
issue. Thanks,
-Chris


Re: [PATCH v4 07/15] iommu/vt-d: Delegate the dma domain to upper layer

2020-08-21 Thread Chris Wilson
Quoting Lu Baolu (2019-05-25 06:41:28)
> This allows the iommu generic layer to allocate a dma domain and
> attach it to a device through the iommu api's. With all types of
> domains being delegated to upper layer, we can remove an internal
> flag which was used to distinguish domains mananged internally or
> externally.

I'm seeing some really strange behaviour with this patch on a 32b
Skylake system (and still present on mainline). Before this patch
everything is peaceful and appears to work correctly. Applying this patch,
and we fail to initialise the GPU with a few DMAR errors reported, e.g.

[   20.279445] DMAR: DRHD: handling fault status reg 3
[   20.279508] DMAR: [DMA Read] Request device [00:02.0] fault addr 8900a000 
[fault reason 05] PTE Write access is not set

Setting an identity map for the igfx made the DMAR errors disappear, but
the GPU still failed to initialise.

There's no difference in the DMAR configuration dmesg between working and
the upset patch. And the really strange part is that switching to a 64b
kernel with this patch, it's working.

Any suggestions on what I should look for?
-Chris


Re: [PATCH 2/4] drm/i915/gem: Sync the vmap PTEs upon construction

2020-08-21 Thread Chris Wilson
Quoting Linus Torvalds (2020-08-21 13:41:03)
> On Fri, Aug 21, 2020 at 1:50 AM Chris Wilson  wrote:
> >
> > Since synchronising the PTE after assignment is a manual step, use the
> > newly exported interface to flush the PTE after assigning via
> > alloc_vm_area().
> 
> This commit message doesn't make much sense to me.
> 
> Are you talking about synchronizing the page directory structure
> across processes after possibly creating new kernel page tables?
> 
> Because that has nothing to do with the PTE. It's all about making
> sure the _upper_ layers of the page directories are populated
> everywhere..
> 
> The name seems off to me too - what are you "flushing"? (And yes, I
> know about the flush_cache_vmap(), but that looks just bogus, since
> any non-mapped area shouldn't have any virtual caches to begin with,
> so I suspect that is just the crazy architectures being confused -
> flush_cache_vmap() is a no-op on any sane architecture - and powerpc
> that mis-uses it for other things).

I was trying to mimic map_kernel_range() which does the
arch_sync_kernel_mappings and flush_cache_vmap on top of the
apply_to_page_range performed by alloc_vm_area, because buried away in
there, on x86-32, is a set_pmd(). Since map_kernel_range() wrapped
map_kernel_range_noflush(), flush seemed like the right verb.

Joerg pointed out that the sync belonged to __apply_to_page_range and
fixed it in situ.
-Chris


Re: [PATCH] mm: Track page table modifications in __apply_to_page_range() construction

2020-08-21 Thread Chris Wilson
Quoting Chris Wilson (2020-08-21 11:39:19)
> Quoting Joerg Roedel (2020-08-21 11:23:43)
> > On Fri, Aug 21, 2020 at 11:13:36AM +0100, Chris Wilson wrote:
> > > We need to store the initial addr, as here addr == end [or earlier on
> > > earlier], so range is (start, addr).
> > 
> > Right, I missed that, thanks for pointing it out.
> 
> And with that (start, addr)
> 
> Tested-by: Chris Wilson  #x86-32

In the version I tested, I also had

@@ -2216,7 +2216,7 @@ static int apply_to_pte_range(struct mm_struct *mm, pmd_t 
*pmd,

if (create) {
pte = (mm == _mm) ?
-   pte_alloc_kernel(pmd, addr) :
+   pte_alloc_kernel_track(pmd, addr, mask) :
pte_alloc_map_lock(mm, pmd, addr, );
if (!pte)
return -ENOMEM;

And that PGTBL_PMD_MODIFIED makes a difference.
-Chris


Re: [PATCH] mm: Track page table modifications in __apply_to_page_range() construction

2020-08-21 Thread Chris Wilson
Quoting Joerg Roedel (2020-08-21 11:23:43)
> On Fri, Aug 21, 2020 at 11:13:36AM +0100, Chris Wilson wrote:
> > We need to store the initial addr, as here addr == end [or earlier on
> > earlier], so range is (start, addr).
> 
> Right, I missed that, thanks for pointing it out.

And with that (start, addr)

Tested-by: Chris Wilson  #x86-32
-Chris


Re: [PATCH 1/4] mm: Export flush_vm_area() to sync the PTEs upon construction

2020-08-21 Thread Chris Wilson
Quoting Joerg Roedel (2020-08-21 11:22:04)
> On Fri, Aug 21, 2020 at 10:54:22AM +0100, Chris Wilson wrote:
> > Ok. I thought it had to be after assigning the *ptep. If we apply the
> > sync first, do not have to worry about PGTBL_PTE_MODIFIED from the
> > *ptep?
> 
> Hmm, if I understand the code correctly, you are re-implementing some
> generic ioremap/vmalloc mapping logic in the i915 driver. I don't know
> the reason, but if it is valid you need to manually call
> arch_sync_kernel_mappings() from your driver like this to be correct:
> 
> if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_PTE_MODIFIED)
> arch_sync_kernel_mappings();
> 
> In practice this is a no-op, because nobody sets PGTBL_PTE_MODIFIED in
> ARCH_PAGE_TABLE_SYNC_MASK, so the above code would be optimized away.
> 
> But what you really care about is the tracking in apply_to_page_range(),
> as that allocates the !pte levels of your page-table, which needs
> synchronization on x86-32.
> 
> Btw, what are the reasons you can't use generic vmalloc/ioremap
> interfaces to map the range?

ioremap_prot and ioremap_page_range assume a contiguous IO address. So
we needed to allocate the vmalloc area [and would then need to iterate
over the discontiguous iomem chunks with ioremap_page_range], and since
alloc_vm_area returned the ptep, it looked clearer to then assign those
according to whether we wanted ioremapping or a plain page. So we ended
up with one call to the core to return us a vm_struct and a pte array
that worked for either backing store.
-Chris


Re: [PATCH] mm: Track page table modifications in __apply_to_page_range() construction

2020-08-21 Thread Chris Wilson
Quoting Joerg Roedel (2020-08-21 11:09:02)
> @@ -2333,6 +2339,7 @@ static int __apply_to_page_range(struct mm_struct *mm, 
> unsigned long addr,
> pgd_t *pgd;
> unsigned long next;
> unsigned long end = addr + size;
> +   pgtbl_mod_mask mask = 0;
> int err = 0;
>  
> if (WARN_ON(addr >= end))
> @@ -2343,11 +2350,14 @@ static int __apply_to_page_range(struct mm_struct 
> *mm, unsigned long addr,
> next = pgd_addr_end(addr, end);
> if (!create && pgd_none_or_clear_bad(pgd))
> continue;
> -   err = apply_to_p4d_range(mm, pgd, addr, next, fn, data, 
> create);
> +   err = apply_to_p4d_range(mm, pgd, addr, next, fn, data, 
> create, );
> if (err)
> break;
> } while (pgd++, addr = next, addr != end);
>  
> +   if (mask & ARCH_PAGE_TABLE_SYNC_MASK)
> +   arch_sync_kernel_mappings(addr, addr + size);

We need to store the initial addr, as here addr == end [or earlier on
earlier], so range is (start, addr).
-Chris


Re: [PATCH 1/4] mm: Export flush_vm_area() to sync the PTEs upon construction

2020-08-21 Thread Chris Wilson
Quoting Joerg Roedel (2020-08-21 10:51:29)
> On Fri, Aug 21, 2020 at 09:50:08AM +0100, Chris Wilson wrote:
> > The alloc_vm_area() is another method for drivers to
> > vmap/map_kernel_range that uses apply_to_page_range() rather than the
> > direct vmalloc walkers. This is missing the page table modification
> > tracking, and the ability to synchronize the PTE updates afterwards.
> > Provide flush_vm_area() for the users of alloc_vm_area() that assumes
> > the worst and ensures that the page directories are correctly flushed
> > upon construction.
> > 
> > The impact is most pronounced on x86_32 due to the delayed set_pmd().
> > 
> > Reported-by: Pavel Machek 
> > References: 2ba3e6947aed ("mm/vmalloc: track which page-table levels were 
> > modified")
> > References: 86cf69f1d893 ("x86/mm/32: implement 
> > arch_sync_kernel_mappings()")
> > Signed-off-by: Chris Wilson 
> > Cc: Andrew Morton 
> > Cc: Joerg Roedel 
> > Cc: Linus Torvalds 
> > Cc: Dave Airlie 
> > Cc: Joonas Lahtinen 
> > Cc: Rodrigo Vivi 
> > Cc: Pavel Machek 
> > Cc: David Vrabel 
> > Cc:  # v5.8+
> > ---
> >  include/linux/vmalloc.h |  1 +
> >  mm/vmalloc.c| 16 
> >  2 files changed, 17 insertions(+)
> > 
> > diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> > index 0221f852a7e1..a253b27df0ac 100644
> > --- a/include/linux/vmalloc.h
> > +++ b/include/linux/vmalloc.h
> > @@ -204,6 +204,7 @@ static inline void set_vm_flush_reset_perms(void *addr)
> >  
> >  /* Allocate/destroy a 'vmalloc' VM area. */
> >  extern struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes);
> > +extern void flush_vm_area(struct vm_struct *area);
> >  extern void free_vm_area(struct vm_struct *area);
> >  
> >  /* for /dev/kmem */
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index b482d240f9a2..c41934486031 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -3078,6 +3078,22 @@ struct vm_struct *alloc_vm_area(size_t size, pte_t 
> > **ptes)
> >  }
> >  EXPORT_SYMBOL_GPL(alloc_vm_area);
> >  
> > +void flush_vm_area(struct vm_struct *area)
> > +{
> > + unsigned long addr = (unsigned long)area->addr;
> > +
> > + /* apply_to_page_range() doesn't track the damage, assume the worst */
> > + if (ARCH_PAGE_TABLE_SYNC_MASK & (PGTBL_PTE_MODIFIED |
> > +  PGTBL_PMD_MODIFIED |
> > +  PGTBL_PUD_MODIFIED |
> > +  PGTBL_P4D_MODIFIED |
> > +  PGTBL_PGD_MODIFIED))
> > + arch_sync_kernel_mappings(addr, addr + area->size);
> 
> This should happen in __apply_to_page_range() directly and look like
> this:

Ok. I thought it had to be after assigning the *ptep. If we apply the
sync first, do not have to worry about PGTBL_PTE_MODIFIED from the
*ptep?
-Chris


[PATCH 2/4] drm/i915/gem: Sync the vmap PTEs upon construction

2020-08-21 Thread Chris Wilson
Since synchronising the PTE after assignment is a manual step, use the
newly exported interface to flush the PTE after assigning via
alloc_vm_area().

Reported-by: Pavel Machek 
References: 2ba3e6947aed ("mm/vmalloc: track which page-table levels were 
modified")
Signed-off-by: Chris Wilson 
Cc: Andrew Morton 
Cc: Joerg Roedel 
Cc: Linus Torvalds 
Cc: Dave Airlie 
Cc: Joonas Lahtinen 
Cc: Rodrigo Vivi 
Cc: Pavel Machek 
Cc:  # v5.8+
---
 drivers/gpu/drm/i915/gem/i915_gem_pages.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c 
b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index 7050519c87a4..0fee67f34d74 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -304,6 +304,7 @@ static void *i915_gem_object_map(struct drm_i915_gem_object 
*obj,
for_each_sgt_daddr(addr, iter, sgt)
**ptes++ = iomap_pte(iomap, addr, pgprot);
}
+   flush_vm_area(area);
 
if (mem != stack)
kvfree(mem);
-- 
2.20.1



[PATCH 4/4] drm/i915/gem: Replace reloc chain with terminator on error unwind

2020-08-21 Thread Chris Wilson
If we hit an error during construction of the reloc chain, we need to
replace the chain into the next batch with the terminator so that upon
flushing the relocations so far, we do not execute a hanging batch.

Reported-by: Pavel Machek 
Fixes: 964a9b0f611e ("drm/i915/gem: Use chained reloc batches")
Signed-off-by: Chris Wilson 
Cc: Joonas Lahtinen 
Cc: Pavel Machek 
Cc:  # v5.8+
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 31 ++-
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 24a1486d2dc5..a09f04eee417 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -972,21 +972,6 @@ static int reloc_gpu_chain(struct reloc_cache *cache)
if (err)
goto out_pool;
 
-   GEM_BUG_ON(cache->rq_size + RELOC_TAIL > PAGE_SIZE  / sizeof(u32));
-   cmd = cache->rq_cmd + cache->rq_size;
-   *cmd++ = MI_ARB_CHECK;
-   if (cache->gen >= 8)
-   *cmd++ = MI_BATCH_BUFFER_START_GEN8;
-   else if (cache->gen >= 6)
-   *cmd++ = MI_BATCH_BUFFER_START;
-   else
-   *cmd++ = MI_BATCH_BUFFER_START | MI_BATCH_GTT;
-   *cmd++ = lower_32_bits(batch->node.start);
-   *cmd++ = upper_32_bits(batch->node.start); /* Always 0 for gen<8 */
-   i915_gem_object_flush_map(cache->rq_vma->obj);
-   i915_gem_object_unpin_map(cache->rq_vma->obj);
-   cache->rq_vma = NULL;
-
err = intel_gt_buffer_pool_mark_active(pool, rq);
if (err == 0) {
i915_vma_lock(batch);
@@ -999,15 +984,31 @@ static int reloc_gpu_chain(struct reloc_cache *cache)
if (err)
goto out_pool;
 
+   GEM_BUG_ON(cache->rq_size + RELOC_TAIL > PAGE_SIZE  / sizeof(u32));
+   cmd = cache->rq_cmd + cache->rq_size;
+   *cmd++ = MI_ARB_CHECK;
+   if (cache->gen >= 8)
+   *cmd++ = MI_BATCH_BUFFER_START_GEN8;
+   else if (cache->gen >= 6)
+   *cmd++ = MI_BATCH_BUFFER_START;
+   else
+   *cmd++ = MI_BATCH_BUFFER_START | MI_BATCH_GTT;
+   *cmd++ = lower_32_bits(batch->node.start);
+   *cmd++ = upper_32_bits(batch->node.start); /* Always 0 for gen<8 */
+
cmd = i915_gem_object_pin_map(batch->obj,
  cache->has_llc ?
  I915_MAP_FORCE_WB :
  I915_MAP_FORCE_WC);
if (IS_ERR(cmd)) {
+   /* We will replace the BBS with BBE upon flushing the rq */
err = PTR_ERR(cmd);
goto out_pool;
}
 
+   i915_gem_object_flush_map(cache->rq_vma->obj);
+   i915_gem_object_unpin_map(cache->rq_vma->obj);
+
/* Return with batch mapping (cmd) still pinned */
cache->rq_cmd = cmd;
cache->rq_size = 0;
-- 
2.20.1



[PATCH 3/4] drm/i915/gem: Use set_pte_at() for assigning the vmapped PTE

2020-08-21 Thread Chris Wilson
Use set_pte_at() to assign the PTE pointer returned by alloc_vm_area(),
rather than a direct assignment.

Fixes: 6056e50033d9 ("drm/i915/gem: Support discontiguous lmem object maps")
Signed-off-by: Chris Wilson 
Cc: Matthew Auld 
---
 drivers/gpu/drm/i915/gem/i915_gem_pages.c | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c 
b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index 0fee67f34d74..6838cf9bdae6 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -286,23 +286,34 @@ static void *i915_gem_object_map(struct 
drm_i915_gem_object *obj,
}
 
if (i915_gem_object_has_struct_page(obj)) {
+   unsigned long addr = (unsigned long)area->addr;
struct sgt_iter iter;
struct page *page;
pte_t **ptes = mem;
 
-   for_each_sgt_page(page, iter, sgt)
-   **ptes++ = mk_pte(page, pgprot);
+   for_each_sgt_page(page, iter, sgt) {
+   set_pte_at(_mm, addr, *ptes, mk_pte(page, pgprot));
+   addr += PAGE_SIZE;
+   ptes++;
+   }
+   GEM_BUG_ON(addr != (unsigned long)area->addr + obj->base.size);
} else {
+   unsigned long addr = (unsigned long)area->addr;
resource_size_t iomap;
struct sgt_iter iter;
pte_t **ptes = mem;
-   dma_addr_t addr;
+   dma_addr_t offset;
 
iomap = obj->mm.region->iomap.base;
iomap -= obj->mm.region->region.start;
 
-   for_each_sgt_daddr(addr, iter, sgt)
-   **ptes++ = iomap_pte(iomap, addr, pgprot);
+   for_each_sgt_daddr(offset, iter, sgt) {
+   set_pte_at(_mm, addr, *ptes,
+  iomap_pte(iomap, offset, pgprot));
+   addr += PAGE_SIZE;
+   ptes++;
+   }
+   GEM_BUG_ON(addr != (unsigned long)area->addr + obj->base.size);
}
flush_vm_area(area);
 
-- 
2.20.1



[PATCH 1/4] mm: Export flush_vm_area() to sync the PTEs upon construction

2020-08-21 Thread Chris Wilson
The alloc_vm_area() is another method for drivers to
vmap/map_kernel_range that uses apply_to_page_range() rather than the
direct vmalloc walkers. This is missing the page table modification
tracking, and the ability to synchronize the PTE updates afterwards.
Provide flush_vm_area() for the users of alloc_vm_area() that assumes
the worst and ensures that the page directories are correctly flushed
upon construction.

The impact is most pronounced on x86_32 due to the delayed set_pmd().

Reported-by: Pavel Machek 
References: 2ba3e6947aed ("mm/vmalloc: track which page-table levels were 
modified")
References: 86cf69f1d893 ("x86/mm/32: implement arch_sync_kernel_mappings()")
Signed-off-by: Chris Wilson 
Cc: Andrew Morton 
Cc: Joerg Roedel 
Cc: Linus Torvalds 
Cc: Dave Airlie 
Cc: Joonas Lahtinen 
Cc: Rodrigo Vivi 
Cc: Pavel Machek 
Cc: David Vrabel 
Cc:  # v5.8+
---
 include/linux/vmalloc.h |  1 +
 mm/vmalloc.c| 16 
 2 files changed, 17 insertions(+)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 0221f852a7e1..a253b27df0ac 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -204,6 +204,7 @@ static inline void set_vm_flush_reset_perms(void *addr)
 
 /* Allocate/destroy a 'vmalloc' VM area. */
 extern struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes);
+extern void flush_vm_area(struct vm_struct *area);
 extern void free_vm_area(struct vm_struct *area);
 
 /* for /dev/kmem */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index b482d240f9a2..c41934486031 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3078,6 +3078,22 @@ struct vm_struct *alloc_vm_area(size_t size, pte_t 
**ptes)
 }
 EXPORT_SYMBOL_GPL(alloc_vm_area);
 
+void flush_vm_area(struct vm_struct *area)
+{
+   unsigned long addr = (unsigned long)area->addr;
+
+   /* apply_to_page_range() doesn't track the damage, assume the worst */
+   if (ARCH_PAGE_TABLE_SYNC_MASK & (PGTBL_PTE_MODIFIED |
+PGTBL_PMD_MODIFIED |
+PGTBL_PUD_MODIFIED |
+PGTBL_P4D_MODIFIED |
+PGTBL_PGD_MODIFIED))
+   arch_sync_kernel_mappings(addr, addr + area->size);
+
+   flush_cache_vmap(addr, area->size);
+}
+EXPORT_SYMBOL_GPL(flush_vm_area);
+
 void free_vm_area(struct vm_struct *area)
 {
struct vm_struct *ret;
-- 
2.20.1



[PATCH] perf tools: Use %zd for size_t printf formats on 32b

2020-08-20 Thread Chris Wilson
A couple of trivial fixes for using %zd for size_t.

Signed-off-by: Chris Wilson 
Cc: Peter Zijlstra 
Cc: Ingo Molnar 
cc: Arnaldo Carvalho de Melo 
---
 tools/perf/util/session.c | 2 +-
 tools/perf/util/zstd.c| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index ffbc9d35a383..7a5f03764702 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -87,7 +87,7 @@ static int perf_session__process_compressed_event(struct 
perf_session *session,
session->decomp_last = decomp;
}
 
-   pr_debug("decomp (B): %ld to %ld\n", src_size, decomp_size);
+   pr_debug("decomp (B): %zd to %zd\n", src_size, decomp_size);
 
return 0;
 }
diff --git a/tools/perf/util/zstd.c b/tools/perf/util/zstd.c
index d2202392ffdb..48dd2b018c47 100644
--- a/tools/perf/util/zstd.c
+++ b/tools/perf/util/zstd.c
@@ -99,7 +99,7 @@ size_t zstd_decompress_stream(struct zstd_data *data, void 
*src, size_t src_size
while (input.pos < input.size) {
ret = ZSTD_decompressStream(data->dstream, , );
if (ZSTD_isError(ret)) {
-   pr_err("failed to decompress (B): %ld -> %ld, dst_size 
%ld : %s\n",
+   pr_err("failed to decompress (B): %zd -> %zd, dst_size 
%zd : %s\n",
   src_size, output.size, dst_size, 
ZSTD_getErrorName(ret));
break;
}
-- 
2.20.1



Re: [PATCH] kernel/relay.c: fix memleak on destroy relay channel

2020-08-17 Thread Chris Wilson
Quoting Wei Yongjun (2020-08-17 13:28:26)
> kmemleak report memory leak as follows:
> 
> unreferenced object 0x607ee4e5f948 (size 8):
> comm "syz-executor.1", pid 2098, jiffies 4295031601 (age 288.468s)
> hex dump (first 8 bytes):
> 00 00 00 00 00 00 00 00 
> backtrace:
> [<ca1de2fa>] relay_open kernel/relay.c:583 [inline]
> [<ca1de2fa>] relay_open+0xb6/0x970 kernel/relay.c:563
> [<38ae5a4b>] do_blk_trace_setup+0x4a8/0xb20 
> kernel/trace/blktrace.c:557
> [<d5e778e9>] __blk_trace_setup+0xb6/0x150 kernel/trace/blktrace.c:597
> [<38fdf803>] blk_trace_ioctl+0x146/0x280 kernel/trace/blktrace.c:738
> [<ce25a0ca>] blkdev_ioctl+0xb2/0x6a0 block/ioctl.c:613
> [<579e47e0>] block_ioctl+0xe5/0x120 fs/block_dev.c:1871
> [<b1588c11>] vfs_ioctl fs/ioctl.c:48 [inline]
> [<b1588c11>] __do_sys_ioctl fs/ioctl.c:753 [inline]
> [<b1588c11>] __se_sys_ioctl fs/ioctl.c:739 [inline]
> [<b1588c11>] __x64_sys_ioctl+0x170/0x1ce fs/ioctl.c:739
> [<88fc9942>] do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
> [<4f6dd57a>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> 'chan->buf' is malloced in relay_open() by alloc_percpu() but not free
> while destroy the relay channel. Fix it by adding free_percpu() before
> return from relay_destroy_channel().
> 
> Fixes: 017c59c042d0 ("relay: Use per CPU constructs for the relay channel 
> buffer pointers")
> Reported-by: Hulk Robot 
> Signed-off-by: Wei Yongjun 
> ---
>  kernel/relay.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/relay.c b/kernel/relay.c
> index 72fe443ea78f..fb4e0c530c08 100644
> --- a/kernel/relay.c
> +++ b/kernel/relay.c
> @@ -197,6 +197,7 @@ static struct rchan_buf *relay_create_buf(struct rchan 
> *chan)
>  static void relay_destroy_channel(struct kref *kref)
>  {
> struct rchan *chan = container_of(kref, struct rchan, kref);
> +   free_percpu(chan->buf);
> kfree(chan);

That catches the error path for relay_open as well, and the fixes is
correct. Worth a cc:stable, #v4.9+?

Reviewed-by: Chris Wilson 
-Chris


Re: [Intel-gfx] [PATCH] drm/i915: Fix wrong return value

2020-08-02 Thread Chris Wilson
Quoting Andi Shyti (2020-08-02 12:40:44)
> Hi Tianjia,
> 
> > diff --git a/drivers/gpu/drm/i915/i915_active.c 
> > b/drivers/gpu/drm/i915/i915_active.c
> > index d960d0be5bd2..cc017e3cc9c5 100644
> > --- a/drivers/gpu/drm/i915/i915_active.c
> > +++ b/drivers/gpu/drm/i915/i915_active.c
> > @@ -758,7 +758,7 @@ int i915_active_acquire_preallocate_barrier(struct 
> > i915_active *ref,
> >   intel_engine_mask_t tmp, mask = engine->mask;
> >   struct llist_node *first = NULL, *last = NULL;
> >   struct intel_gt *gt = engine->gt;
> > - int err;
> > + int err = 0;
> 
> you don't need the initialization here.

But it's close enough that I can munge the patch inline.
Reviewed-by: Chris Wilson 
-Chris


Re: Poor windows VFIO performance, GPU stalls (bisected)

2020-07-26 Thread Chris Wilson
Quoting Alex Williamson (2020-07-26 14:30:52)
> On Sun, 26 Jul 2020 17:49:07 +1000
> Geoffrey McRae  wrote:
> 
> > Hi All,
> > 
> > The commit 22540ca3d00d2990a4148a13b92209c3dc5422db causes a Windows KVM 
> > guest running under QEMU with a VFIO passthrough GPU to randomly stall 
> > when using the GPU leading to the guest assuming that the driver has 
> > hung. Reverting this commit resolves the problem.
> 
> Please double check this commit ID, I can't find it in mainline or
> linux-next.  Thanks,

See commit aa202f1f5696 ("workqueue: don't use wq_select_unbound_cpu()
for bound works"). 22540ca3 is the cherry-pick into v5.4.26
-Chris


[tip: locking/urgent] locking/lockdep: Fix overflow in presentation of average lock-time

2020-07-25 Thread tip-bot2 for Chris Wilson
The following commit has been merged into the locking/urgent branch of tip:

Commit-ID: a7ef9b28aa8d72a1656fa6f0a01bbd1493886317
Gitweb:
https://git.kernel.org/tip/a7ef9b28aa8d72a1656fa6f0a01bbd1493886317
Author:Chris Wilson 
AuthorDate:Sat, 25 Jul 2020 19:51:10 +01:00
Committer: Ingo Molnar 
CommitterDate: Sat, 25 Jul 2020 21:47:42 +02:00

locking/lockdep: Fix overflow in presentation of average lock-time

Though the number of lock-acquisitions is tracked as unsigned long, this
is passed as the divisor to div_s64() which interprets it as a s32,
giving nonsense values with more than 2 billion acquisitons. E.g.

  acquisitions   holdtime-min   holdtime-max holdtime-total   holdtime-avg
  -
2350439395   0.07 353.38   649647067.36  0.-32

Signed-off-by: Chris Wilson 
Signed-off-by: Ingo Molnar 
Cc: Peter Zijlstra 
Link: https://lore.kernel.org/r/20200725185110.11588-1-ch...@chris-wilson.co.uk
---
 kernel/locking/lockdep_proc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/locking/lockdep_proc.c b/kernel/locking/lockdep_proc.c
index 5525cd3..02ef87f 100644
--- a/kernel/locking/lockdep_proc.c
+++ b/kernel/locking/lockdep_proc.c
@@ -423,7 +423,7 @@ static void seq_lock_time(struct seq_file *m, struct 
lock_time *lt)
seq_time(m, lt->min);
seq_time(m, lt->max);
seq_time(m, lt->total);
-   seq_time(m, lt->nr ? div_s64(lt->total, lt->nr) : 0);
+   seq_time(m, lt->nr ? div64_u64(lt->total, lt->nr) : 0);
 }
 
 static void seq_stats(struct seq_file *m, struct lock_stat_data *data)


[PATCH] locking/lockdep: Fix overflow in presentation of average lock-time

2020-07-25 Thread Chris Wilson
Though the number of lock-acquisitions is tracked as unsigned long, this
is passed as the divisor to div_s64() which interprets it as a s32,
giving nonsense values with more than 2 billion acquisitons. E.g.

  acquisitions   holdtime-min   holdtime-max holdtime-total   holdtime-avg
  -
2350439395   0.07 353.38   649647067.36  0.-32

Signed-off-by: Chris Wilson 
Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Cc: Will Deacon 
---
 kernel/locking/lockdep_proc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/locking/lockdep_proc.c b/kernel/locking/lockdep_proc.c
index 5525cd3ba0c8..02ef87f50df2 100644
--- a/kernel/locking/lockdep_proc.c
+++ b/kernel/locking/lockdep_proc.c
@@ -423,7 +423,7 @@ static void seq_lock_time(struct seq_file *m, struct 
lock_time *lt)
seq_time(m, lt->min);
seq_time(m, lt->max);
seq_time(m, lt->total);
-   seq_time(m, lt->nr ? div_s64(lt->total, lt->nr) : 0);
+   seq_time(m, lt->nr ? div64_u64(lt->total, lt->nr) : 0);
 }
 
 static void seq_stats(struct seq_file *m, struct lock_stat_data *data)
-- 
2.20.1



[tip: sched/urgent] sched: Warn if garbage is passed to default_wake_function()

2020-07-24 Thread tip-bot2 for Chris Wilson
The following commit has been merged into the sched/urgent branch of tip:

Commit-ID: 062d3f95b630113e1156a31f376ad36e25da29a7
Gitweb:
https://git.kernel.org/tip/062d3f95b630113e1156a31f376ad36e25da29a7
Author:Chris Wilson 
AuthorDate:Thu, 23 Jul 2020 21:10:42 +01:00
Committer: Ingo Molnar 
CommitterDate: Fri, 24 Jul 2020 14:40:25 +02:00

sched: Warn if garbage is passed to default_wake_function()

Since the default_wake_function() passes its flags onto
try_to_wake_up(), warn if those flags collide with internal values.

Given that the supplied flags are garbage, no repair can be done but at
least alert the user to the damage they are causing.

In the belief that these errors should be picked up during testing, the
warning is only compiled in under CONFIG_SCHED_DEBUG.

Signed-off-by: Chris Wilson 
Signed-off-by: Ingo Molnar 
Acked-by: Peter Zijlstra 
Link: https://lore.kernel.org/r/20200723201042.18861-1-ch...@chris-wilson.co.uk
---
 kernel/sched/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5dece9b..2142c67 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4485,6 +4485,7 @@ asmlinkage __visible void __sched 
preempt_schedule_irq(void)
 int default_wake_function(wait_queue_entry_t *curr, unsigned mode, int 
wake_flags,
  void *key)
 {
+   WARN_ON_ONCE(IS_ENABLED(CONFIG_SCHED_DEBUG) && wake_flags & ~WF_SYNC);
return try_to_wake_up(curr->private, mode, wake_flags);
 }
 EXPORT_SYMBOL(default_wake_function);


[tip: sched/core] sched: Warn if garbage is passed to default_wake_function()

2020-07-24 Thread tip-bot2 for Chris Wilson
The following commit has been merged into the sched/core branch of tip:

Commit-ID: da3520d750e36051ecd5847ef712659b9c68ce20
Gitweb:
https://git.kernel.org/tip/da3520d750e36051ecd5847ef712659b9c68ce20
Author:Chris Wilson 
AuthorDate:Thu, 23 Jul 2020 21:10:42 +01:00
Committer: Ingo Molnar 
CommitterDate: Fri, 24 Jul 2020 13:47:12 +02:00

sched: Warn if garbage is passed to default_wake_function()

Since the default_wake_function() passes its flags onto
try_to_wake_up(), warn if those flags collide with internal values.

Given that the supplied flags are garbage, no repair can be done but at
least alert the user to the damage they are causing.

In the belief that these errors should be picked up during testing, the
warning is only compiled in under CONFIG_SCHED_DEBUG.

Signed-off-by: Chris Wilson 
Signed-off-by: Ingo Molnar 
Acked-by: Peter Zijlstra 
Link: https://lore.kernel.org/r/20200723201042.18861-1-ch...@chris-wilson.co.uk
---
 kernel/sched/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bd8e521..637365e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4686,6 +4686,7 @@ asmlinkage __visible void __sched 
preempt_schedule_irq(void)
 int default_wake_function(wait_queue_entry_t *curr, unsigned mode, int 
wake_flags,
  void *key)
 {
+   WARN_ON_ONCE(IS_ENABLED(CONFIG_SCHED_DEBUG) && wake_flags & ~WF_SYNC);
return try_to_wake_up(curr->private, mode, wake_flags);
 }
 EXPORT_SYMBOL(default_wake_function);


[PATCH] sched: Warn if garbage is passed to default_wake_function()

2020-07-23 Thread Chris Wilson
Since the default_wake_function() passes its flags onto
try_to_wake_up(), warn if those flags collide with internal values.
Given that the supplied flags are garbage, no repair can be done but at
least alert the user to the damage they are causing.

In the belief that these errors should be picked up during testing, the
warning is only compiled in under CONFIG_SCHED_DEBUG.

Signed-off-by: Chris Wilson 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Juri Lelli 
Cc: Vincent Guittot 
---
 kernel/sched/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7b918059332b..4be209266d8a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4481,6 +4481,7 @@ asmlinkage __visible void __sched 
preempt_schedule_irq(void)
 int default_wake_function(wait_queue_entry_t *curr, unsigned mode, int 
wake_flags,
  void *key)
 {
+   WARN_ON_ONCE(IS_ENABLED(CONFIG_SCHED_DEBUG) && wake_flags & ~WF_SYNC);
return try_to_wake_up(curr->private, mode, wake_flags);
 }
 EXPORT_SYMBOL(default_wake_function);
-- 
2.20.1



Re: [PATCH -v2 1/5] sched: Fix ttwu() race

2020-07-23 Thread Chris Wilson
Quoting Peter Zijlstra (2020-07-23 19:28:41)
> On Wed, Jul 22, 2020 at 10:57:56AM +0100, Chris Wilson wrote:
> 
> > Perhaps more damning is that I can replace WF_ON_CPU with p->on_cpu to
> > suppress the warning:
> 
> *argh*, I'm starting to go mad...
> 
> Chris, could you please try the below patch?

 ttwu-IPI-self: 1==1, p->on_cpu=0;0, task_cpu(p)=1;1
 ttwu-IPI-self: 1==1, p->on_cpu=0;0, task_cpu(p)=1;1
 ttwu-IPI-self: 0==0, p->on_cpu=0;0, task_cpu(p)=0;0
 ttwu-IPI-self: 3==3, p->on_cpu=0;0, task_cpu(p)=3;3
 ttwu-IPI-self: 2==2, p->on_cpu=0;0, task_cpu(p)=2;2
 ttwu-IPI-self: 1==1, p->on_cpu=0;0, task_cpu(p)=1;1
 ttwu-IPI-self: 2==2, p->on_cpu=0;0, task_cpu(p)=2;2
 ttwu-IPI-self: 2==2, p->on_cpu=0;0, task_cpu(p)=2;2
 ttwu-IPI-self: 2==2, p->on_cpu=0;0, task_cpu(p)=2;2

> Can you also confirm that if you do:
> 
> $ echo NO_TTWU_QUEUE_ON_CPU > /debug/sched_features

With,

 sched_feat_disable(10):TTWU_QUEUE_ON_CPU

the pr_alert is still being hit

 ttwu-IPI-self: 3==3, p->on_cpu=0;0, task_cpu(p)=3;3

At which point, it darns on me. Mea culpa, stray bits being passed into
default_wake_function.

I am very sorry for the wild goose chase.
-Chris


Re: [PATCH -v2 1/5] sched: Fix ttwu() race

2020-07-22 Thread Chris Wilson
Quoting pet...@infradead.org (2020-07-21 12:37:19)
> On Tue, Jul 21, 2020 at 11:49:05AM +0100, Chris Wilson wrote:
> > Quoting Peter Zijlstra (2020-06-22 11:01:23)
> > > @@ -2378,6 +2385,9 @@ static inline bool ttwu_queue_cond(int c
> > >  static bool ttwu_queue_wakelist(struct task_struct *p, int cpu, int 
> > > wake_flags)
> > >  {
> > > if (sched_feat(TTWU_QUEUE) && ttwu_queue_cond(cpu, wake_flags)) {
> > > +   if (WARN_ON_ONCE(cpu == smp_processor_id()))
> > > +   return false;
> > > +
> > > sched_clock_cpu(cpu); /* Sync clocks across CPUs */
> > > __ttwu_queue_wakelist(p, cpu, wake_flags);
> > > return true;
> > 
> > We've been hitting this warning frequently, but have never seen the
> > rcu-torture-esque oops ourselves.
> 
> How easy is it to hit this? What, if anything, can I do to make my own
> computer go bang?

I tried reproducing it in a mockup, hrtimer + irq_work + waitqueue, but
it remains elusive. It pops up in an obscure HW tests where we are
exercising timeout handling for rogue HW.
> 
> > <4> [181.766705] RIP: 0010:ttwu_queue_wakelist+0xbc/0xd0
> > <4> [181.766710] Code: 00 00 00 5b 5d 41 5c 41 5d c3 31 c0 5b 5d 41 5c 41 
> > 5d c3 31 c0 f6 c3 08 74 f2 48 c7 c2 00 ad 03 00 83 7c 11 40 01 77 e4 eb 80 
> > <0f> 0b 31 c0 eb dc 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 bf 17
> > <4> [181.766726] RSP: 0018:c9003e08 EFLAGS: 00010046
> > <4> [181.766733] RAX:  RBX:  RCX: 
> > 888276a0
> > <4> [181.766740] RDX: 0003ad00 RSI: 8232045b RDI: 
> > 8233103e
> > <4> [181.766747] RBP:  R08:  R09: 
> > 0001
> > <4> [181.766754] R10: d3fa25c3 R11: 53712267 R12: 
> > 88825b912940
> > <4> [181.766761] R13:  R14: 0087 R15: 
> > 0003ad00
> > <4> [181.766769] FS:  () GS:888276a0() 
> > knlGS:
> > <4> [181.766777] CS:  0010 DS:  ES:  CR0: 80050033
> > <4> [181.766783] CR2: 55b8245814e0 CR3: 05610003 CR4: 
> > 003606f0
> > <4> [181.766790] Call Trace:
> > <4> [181.766794]  
> > <4> [181.766798]  try_to_wake_up+0x21b/0x690
> > <4> [181.766805]  autoremove_wake_function+0xc/0x50
> > <4> [181.766858]  __i915_sw_fence_complete+0x1ee/0x250 [i915]
> > <4> [181.766912]  dma_i915_sw_fence_wake+0x2d/0x40 [i915]
> 
> Please, don't trim oopses..
> 
> > We are seeing this on the ttwu_queue() path, so with p->on_cpu=0, and the
> > warning is cleared up by
> > 
> > -   if (WARN_ON_ONCE(cpu == smp_processor_id()))
> > +   if (WARN_ON_ONCE(p->on_cpu && cpu == smp_processor_id()))
> > 
> > which would appear to restore the old behaviour for ttwu_queue() and
> > seem to be consistent with the intent of this patch. Hopefully this
> > helps identify the problem correctly.
> 
> Hurmph, that's actively wrong. We should never queue to self, as that
> would result in self-IPI, which is not possible on a bunch of archs. It
> works for you because x86 can in fact do that.
> 
> So ttwu_queue_cond() will only return true when:
> 
>  - target-cpu and current-cpu do not share cache;
>so it cannot be this condition, because you _always_
>share cache with yourself.
> 
>  - when WF_ON_CPU and target-cpu has nr_running <= 1;
>which means p->on_cpu == true.
> 
> So now you have cpu == smp_processor_id() && p->on_cpu == 1, however
> your modified WARN contradicts that.
> 
> *puzzle*

Perhaps more damning is that I can replace WF_ON_CPU with p->on_cpu to
suppress the warning:

-static inline bool ttwu_queue_cond(int cpu, int wake_flags)
+static inline bool ttwu_queue_cond(struct task_struct *p, int cpu, int 
wake_flags)
 {
/*
 * If the CPU does not share cache, then queue the task on the
@@ -2370,7 +2370,7 @@ static inline bool ttwu_queue_cond(int cpu, int 
wake_flags)
 * the soon-to-be-idle CPU as the current CPU is likely busy.
 * nr_running is checked to avoid unnecessary task stacking.
 */
-   if ((wake_flags & WF_ON_CPU) && cpu_rq(cpu)->nr_running <= 1)
+   if (p->on_cpu && cpu_rq(cpu)->nr_running <= 1)
return true;

return false;
@@ -2378,7 +2378,7 @@ static inline bool ttwu_queue_cond(int cpu, int 
wake_flags)

 static bool ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags)
 {
-   if (sched_feat(TTWU_QUEUE) && ttwu_queue_cond(cpu, wake_flags)) {
+   if (sched_feat(TTWU_QUEUE) && ttwu_queue_cond(p, cpu, wake_flags)) {
if (WARN_ON_ONCE(cpu == smp_processor_id()))
return false;


Re: [PATCH -v2 1/5] sched: Fix ttwu() race

2020-07-21 Thread Chris Wilson
Quoting Peter Zijlstra (2020-06-22 11:01:23)
> @@ -2378,6 +2385,9 @@ static inline bool ttwu_queue_cond(int c
>  static bool ttwu_queue_wakelist(struct task_struct *p, int cpu, int 
> wake_flags)
>  {
> if (sched_feat(TTWU_QUEUE) && ttwu_queue_cond(cpu, wake_flags)) {
> +   if (WARN_ON_ONCE(cpu == smp_processor_id()))
> +   return false;
> +
> sched_clock_cpu(cpu); /* Sync clocks across CPUs */
> __ttwu_queue_wakelist(p, cpu, wake_flags);
> return true;

We've been hitting this warning frequently, but have never seen the
rcu-torture-esque oops ourselves.

<4> [181.766705] RIP: 0010:ttwu_queue_wakelist+0xbc/0xd0
<4> [181.766710] Code: 00 00 00 5b 5d 41 5c 41 5d c3 31 c0 5b 5d 41 5c 41 5d c3 
31 c0 f6 c3 08 74 f2 48 c7 c2 00 ad 03 00 83 7c 11 40 01 77 e4 eb 80 <0f> 0b 31 
c0 eb dc 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 bf 17
<4> [181.766726] RSP: 0018:c9003e08 EFLAGS: 00010046
<4> [181.766733] RAX:  RBX:  RCX: 
888276a0
<4> [181.766740] RDX: 0003ad00 RSI: 8232045b RDI: 
8233103e
<4> [181.766747] RBP:  R08:  R09: 
0001
<4> [181.766754] R10: d3fa25c3 R11: 53712267 R12: 
88825b912940
<4> [181.766761] R13:  R14: 0087 R15: 
0003ad00
<4> [181.766769] FS:  () GS:888276a0() 
knlGS:
<4> [181.766777] CS:  0010 DS:  ES:  CR0: 80050033
<4> [181.766783] CR2: 55b8245814e0 CR3: 05610003 CR4: 
003606f0
<4> [181.766790] Call Trace:
<4> [181.766794]  
<4> [181.766798]  try_to_wake_up+0x21b/0x690
<4> [181.766805]  autoremove_wake_function+0xc/0x50
<4> [181.766858]  __i915_sw_fence_complete+0x1ee/0x250 [i915]
<4> [181.766912]  dma_i915_sw_fence_wake+0x2d/0x40 [i915]

We are seeing this on the ttwu_queue() path, so with p->on_cpu=0, and the
warning is cleared up by

-   if (WARN_ON_ONCE(cpu == smp_processor_id()))
+   if (WARN_ON_ONCE(p->on_cpu && cpu == smp_processor_id()))

which would appear to restore the old behaviour for ttwu_queue() and
seem to be consistent with the intent of this patch. Hopefully this
helps identify the problem correctly.
-Chris


Re: [PATCH] drm/i915: Don't force IOSF_MBI

2020-07-17 Thread Chris Wilson
Quoting Jisheng Zhang (2020-07-17 07:11:38)
> The i915 doesn't depend on IOSF_MBI, asm/iosf_mbi.h already defines
> isof_mbi_* APIs when ISOF_MBI is disabled.
> 
> Don't force IOSF_MBI to allow disabling IOSF_MBI for non SoC platforms.

But it is required for Valleyview/Cherryview and we want to support
those by default. Tricky.
-Chris


Re: [Intel-gfx] [PATCH -next] drm/i915: Remove unused inline function drain_delayed_work()

2020-07-15 Thread Chris Wilson
Quoting YueHaibing (2020-07-15 04:21:04)
> It is not used since commit 058179e72e09 ("drm/i915/gt: Replace
> hangcheck by heartbeats")
> 
> Signed-off-by: YueHaibing 

Indeed, it is no more.
Reviewed-by: Chris Wilson 
-Chris


Re: [PATCH][next] drm/i915/selftest: fix an error return path where err is not being set

2020-07-13 Thread Chris Wilson
Quoting Colin King (2020-07-13 15:25:51)
> From: Colin Ian King 
> 
> There is an error condition where err is not being set and an uninitialized
> garbage value in err is being returned.  Fix this by assigning err to an
> appropriate error return value before taking the error exit path.
> 
> Addresses-Coverity: ("Uninitialized scalar value")
> Fixes: ed2690a9ca89 ("drm/i915/selftest: Check that GPR are restored across 
> noa_wait")
> Signed-off-by: Colin Ian King 

Thanks, pushed.
-Chris


Re: [PATCH] drm/i915: Fix spelling mistake in i915_reg.h

2020-07-06 Thread Chris Wilson
Quoting Flavio Suligoi (2020-07-03 13:50:46)
> Fix typo: "TRIGER" --> "TRIGGER"
> 
> The two misplelled macros:
> 
> 1) OAREPORTTRIG1_EDGE_LEVEL_TRIGER_SELECT_MASK
> 2) OAREPORTTRIG5_EDGE_LEVEL_TRIGER_SELECT_MASK
> 
> are not used in any other sources of the kernel,
> so this change can be consider only a local change
> for the i915_reg.h file.
> 
> Signed-off-by: Flavio Suligoi 
> ---
>  drivers/gpu/drm/i915/i915_reg.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 9d6536afc94b..c2153364724a 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -868,7 +868,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
>  
>  #define OAREPORTTRIG1 _MMIO(0x2740)
>  #define OAREPORTTRIG1_THRESHOLD_MASK 0x
> -#define OAREPORTTRIG1_EDGE_LEVEL_TRIGER_SELECT_MASK 0x /* 0=level */
> +#define OAREPORTTRIG1_EDGE_LEVEL_TRIGGER_SELECT_MASK 0x /* 0=level */
>  
>  #define OAREPORTTRIG2 _MMIO(0x2744)
>  #define OAREPORTTRIG2_INVERT_A_0  (1 << 0)
> @@ -921,7 +921,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
>  
>  #define OAREPORTTRIG5 _MMIO(0x2750)
>  #define OAREPORTTRIG5_THRESHOLD_MASK 0x
> -#define OAREPORTTRIG5_EDGE_LEVEL_TRIGER_SELECT_MASK 0x /* 0=level */
> +#define OAREPORTTRIG5_EDGE_LEVEL_TRIGGER_SELECT_MASK 0x /* 0=level */

Ok, these names are not copied straight from the docs, so renaming them
will not hinder us when finding the relevant fields.

Reviewed-by: Chris Wilson 
-Chris


[PATCH] mm: Skip opportunistic reclaim for dma pinned pages

2020-06-24 Thread Chris Wilson
A general rule of thumb is that shrinkers should be fast and effective.
They are called from direct reclaim at the most incovenient of times when
the caller is waiting for a page. If we attempt to reclaim a page being
pinned for active dma [pin_user_pages()], we will incur far greater
latency than a normal anonymous page mapped multiple times. Worse the
page may be in use indefinitely by the HW and unable to be reclaimed
in a timely manner.

A side effect of the LRU shrinker not being dma aware is that we will
often attempt to perform direct reclaim on the persistent group of dma
pages while continuing to use the dma HW (an issue as the HW may already
be actively waiting for the next user request), and even attempt to
reclaim a partially allocated dma object in order to satisfy pinning
the next user page for that object.

It is to be expected that such pages are made available for reclaim at
the end of the dma operation [unpin_user_pages()], and for truly
longterm pins to be proactively recovered via device specific shrinkers
[i.e. stop the HW, allow the pages to be returned to the system, and
then compete again for the memory].

Signed-off-by: Chris Wilson 
Cc: Andrew Morton 
Cc: Jan Kara 
Cc: Jérôme Glisse 
Cc: John Hubbard 
Cc: Claudio Imbrenda 
Cc: Jan Kara 
Cc: Kirill A. Shutemov 
Cc: Jason Gunthorpe 
---
This seems perhaps a little devious and overzealous. Is there a more
appropriate TTU flag? Would there be a way to limit its effect to say
FOLL_LONGTERM? Doing the migration first would seem to be sensible if
we disable opportunistic migration for the duration of the pin.
---
 mm/rmap.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/mm/rmap.c b/mm/rmap.c
index 5fe2dedce1fc..374c6e65551b 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1393,6 +1393,22 @@ static bool try_to_unmap_one(struct page *page, struct 
vm_area_struct *vma,
is_zone_device_page(page) && !is_device_private_page(page))
return true;
 
+   /*
+* Try and fail early to revoke a costly DMA pinned page.
+*
+* Reclaiming an active DMA page requires stopping the hardware
+* and flushing access. [Hardware that does support pagefaulting,
+* and so can quickly revoke DMA pages at any time, does not need
+* to pin the DMA page.] At worst, the page may be indefinitely in
+* use by the hardware. Even at best it will take far longer to
+* revoke the access via the mmu notifier, forcing that latency
+* onto our callers rather than the consumer of the HW. As we are
+* called during opportunistic direct reclaim, declare the
+* opportunity cost too high and ignore the page.
+*/
+   if (page_maybe_dma_pinned(page))
+   return true;
+
if (flags & TTU_SPLIT_HUGE_PMD) {
split_huge_pmd_address(vma, address,
flags & TTU_SPLIT_FREEZE, page);
-- 
2.20.1



Re: [PATCH 1/2] mm/mmu_notifier: Mark up direct reclaim paths with MAYFAIL

2020-06-24 Thread Chris Wilson
Quoting Jason Gunthorpe (2020-06-24 17:50:57)
> On Wed, Jun 24, 2020 at 03:37:32PM +0100, Chris Wilson wrote:
> > Quoting Jason Gunthorpe (2020-06-24 15:25:44)
> > > On Wed, Jun 24, 2020 at 03:21:49PM +0100, Chris Wilson wrote:
> > > > Quoting Jason Gunthorpe (2020-06-24 15:16:04)
> > > > > On Wed, Jun 24, 2020 at 03:12:42PM +0100, Chris Wilson wrote:
> > > > > > Quoting Jason Gunthorpe (2020-06-24 13:39:10)
> > > > > > > On Wed, Jun 24, 2020 at 01:21:03PM +0100, Chris Wilson wrote:
> > > > > > > > Quoting Jason Gunthorpe (2020-06-24 13:10:53)
> > > > > > > > > On Wed, Jun 24, 2020 at 09:02:47AM +0100, Chris Wilson wrote:
> > > > > > > > > > When direct reclaim enters the shrinker and tries to 
> > > > > > > > > > reclaim pages, it
> > > > > > > > > > has to opportunitically unmap them [try_to_unmap_one]. For 
> > > > > > > > > > direct
> > > > > > > > > > reclaim, the calling context is unknown and may include 
> > > > > > > > > > attempts to
> > > > > > > > > > unmap one page of a dma object while attempting to allocate 
> > > > > > > > > > more pages
> > > > > > > > > > for that object. Pass the information along that we are 
> > > > > > > > > > inside an
> > > > > > > > > > opportunistic unmap that can allow that page to remain 
> > > > > > > > > > referenced and
> > > > > > > > > > mapped, and let the callback opt in to avoiding a recursive 
> > > > > > > > > > wait.
> > > > > > > > > 
> > > > > > > > > i915 should already not be holding locks shared with the 
> > > > > > > > > notifiers
> > > > > > > > > across allocations that can trigger reclaim. This is already 
> > > > > > > > > required
> > > > > > > > > to use notifiers correctly anyhow - why do we need something 
> > > > > > > > > in the
> > > > > > > > > notifiers?
> > > > > > > > 
> > > > > > > > for (n = 0; n < num_pages; n++)
> > > > > > > >   pin_user_page()
> > > > > > > > 
> > > > > > > > may call try_to_unmap_page from the lru shrinker for [0, n-1].
> > > > > > > 
> > > > > > > Yes, of course you can't hold any locks that intersect with 
> > > > > > > notifiers
> > > > > > > across pin_user_page()/get_user_page()
> > > > > > 
> > > > > > What lock though? It's just the page refcount, shrinker asks us to 
> > > > > > drop
> > > > > > it [via mmu], we reply we would like to keep using that page as 
> > > > > > freeing
> > > > > > it for the current allocation is "robbing Peter to pay Paul".
> > > > > 
> > > > > Maybe I'm unclear what this series is actually trying to fix? 
> > > > > 
> > > > > You said "avoiding a recursive wait" which sounds like some locking
> > > > > deadlock to me.
> > > > 
> > > > It's the shrinker being called while we are allocating for/on behalf of
> > > > the object. As we are actively using the object, we don't want to free
> > > > it -- the partial object allocation being the clearest, if the object
> > > > consists of 2 pages, trying to free page 0 in order to allocate page 1
> > > > has to fail (and the shrinker should find another candidate to reclaim,
> > > > or fail the allocation).
> > > 
> > > mmu notifiers are not for influencing policy of the mm.
> > 
> > It's policy is "this may fail" regardless of the mmu notifier at this
> > point. That is not changed.
> 
> MMU notifiers are for tracking updates, they are not allowed to fail.
> The one slightly weird case of non-blocking is the only exception.
> 
> > Your suggestion is that we move the pages to the unevictable mapping so
> > that the shrinker LRU is never invoked on pages we have grabbed with
> > pin_user_page. Does that work with the rest of the mmu notifiers?
> 
> That is beyond what I'm familiar with - but generally - if you want to
> influence decisions the MM is making then it needs to be at the
> front of the process and not inside notifiers. 
> 
> So what you describe seems broadly appropriate to me.

Sadly, it's a mlock_vma_page problem all over again.
 
> I'm still a little unclear on what you are trying to fix - pinned
> pages are definitely not freed, do you have some case where pages
> which are pinned are being cleaned out from the MM despite being
> pinned? Sounds a bit strange, maybe that is worth adressing directly?

It suffices to say that pin_user_pages does not prevent try_to_unmap_one
from trying to revoke the page. But we could perhaps slip a
page_maybe_dma_pinned() in around there and see what happens.
-Chris


Re: [PATCH 1/2] mm/mmu_notifier: Mark up direct reclaim paths with MAYFAIL

2020-06-24 Thread Chris Wilson
Quoting Jason Gunthorpe (2020-06-24 15:25:44)
> On Wed, Jun 24, 2020 at 03:21:49PM +0100, Chris Wilson wrote:
> > Quoting Jason Gunthorpe (2020-06-24 15:16:04)
> > > On Wed, Jun 24, 2020 at 03:12:42PM +0100, Chris Wilson wrote:
> > > > Quoting Jason Gunthorpe (2020-06-24 13:39:10)
> > > > > On Wed, Jun 24, 2020 at 01:21:03PM +0100, Chris Wilson wrote:
> > > > > > Quoting Jason Gunthorpe (2020-06-24 13:10:53)
> > > > > > > On Wed, Jun 24, 2020 at 09:02:47AM +0100, Chris Wilson wrote:
> > > > > > > > When direct reclaim enters the shrinker and tries to reclaim 
> > > > > > > > pages, it
> > > > > > > > has to opportunitically unmap them [try_to_unmap_one]. For 
> > > > > > > > direct
> > > > > > > > reclaim, the calling context is unknown and may include 
> > > > > > > > attempts to
> > > > > > > > unmap one page of a dma object while attempting to allocate 
> > > > > > > > more pages
> > > > > > > > for that object. Pass the information along that we are inside 
> > > > > > > > an
> > > > > > > > opportunistic unmap that can allow that page to remain 
> > > > > > > > referenced and
> > > > > > > > mapped, and let the callback opt in to avoiding a recursive 
> > > > > > > > wait.
> > > > > > > 
> > > > > > > i915 should already not be holding locks shared with the notifiers
> > > > > > > across allocations that can trigger reclaim. This is already 
> > > > > > > required
> > > > > > > to use notifiers correctly anyhow - why do we need something in 
> > > > > > > the
> > > > > > > notifiers?
> > > > > > 
> > > > > > for (n = 0; n < num_pages; n++)
> > > > > >   pin_user_page()
> > > > > > 
> > > > > > may call try_to_unmap_page from the lru shrinker for [0, n-1].
> > > > > 
> > > > > Yes, of course you can't hold any locks that intersect with notifiers
> > > > > across pin_user_page()/get_user_page()
> > > > 
> > > > What lock though? It's just the page refcount, shrinker asks us to drop
> > > > it [via mmu], we reply we would like to keep using that page as freeing
> > > > it for the current allocation is "robbing Peter to pay Paul".
> > > 
> > > Maybe I'm unclear what this series is actually trying to fix? 
> > > 
> > > You said "avoiding a recursive wait" which sounds like some locking
> > > deadlock to me.
> > 
> > It's the shrinker being called while we are allocating for/on behalf of
> > the object. As we are actively using the object, we don't want to free
> > it -- the partial object allocation being the clearest, if the object
> > consists of 2 pages, trying to free page 0 in order to allocate page 1
> > has to fail (and the shrinker should find another candidate to reclaim,
> > or fail the allocation).
> 
> mmu notifiers are not for influencing policy of the mm.

It's policy is "this may fail" regardless of the mmu notifier at this
point. That is not changed.

Your suggestion is that we move the pages to the unevictable mapping so
that the shrinker LRU is never invoked on pages we have grabbed with
pin_user_page. Does that work with the rest of the mmu notifiers?
-Chris


Re: [PATCH 1/2] mm/mmu_notifier: Mark up direct reclaim paths with MAYFAIL

2020-06-24 Thread Chris Wilson
Quoting Jason Gunthorpe (2020-06-24 15:16:04)
> On Wed, Jun 24, 2020 at 03:12:42PM +0100, Chris Wilson wrote:
> > Quoting Jason Gunthorpe (2020-06-24 13:39:10)
> > > On Wed, Jun 24, 2020 at 01:21:03PM +0100, Chris Wilson wrote:
> > > > Quoting Jason Gunthorpe (2020-06-24 13:10:53)
> > > > > On Wed, Jun 24, 2020 at 09:02:47AM +0100, Chris Wilson wrote:
> > > > > > When direct reclaim enters the shrinker and tries to reclaim pages, 
> > > > > > it
> > > > > > has to opportunitically unmap them [try_to_unmap_one]. For direct
> > > > > > reclaim, the calling context is unknown and may include attempts to
> > > > > > unmap one page of a dma object while attempting to allocate more 
> > > > > > pages
> > > > > > for that object. Pass the information along that we are inside an
> > > > > > opportunistic unmap that can allow that page to remain referenced 
> > > > > > and
> > > > > > mapped, and let the callback opt in to avoiding a recursive wait.
> > > > > 
> > > > > i915 should already not be holding locks shared with the notifiers
> > > > > across allocations that can trigger reclaim. This is already required
> > > > > to use notifiers correctly anyhow - why do we need something in the
> > > > > notifiers?
> > > > 
> > > > for (n = 0; n < num_pages; n++)
> > > >   pin_user_page()
> > > > 
> > > > may call try_to_unmap_page from the lru shrinker for [0, n-1].
> > > 
> > > Yes, of course you can't hold any locks that intersect with notifiers
> > > across pin_user_page()/get_user_page()
> > 
> > What lock though? It's just the page refcount, shrinker asks us to drop
> > it [via mmu], we reply we would like to keep using that page as freeing
> > it for the current allocation is "robbing Peter to pay Paul".
> 
> Maybe I'm unclear what this series is actually trying to fix? 
> 
> You said "avoiding a recursive wait" which sounds like some locking
> deadlock to me.

It's the shrinker being called while we are allocating for/on behalf of
the object. As we are actively using the object, we don't want to free
it -- the partial object allocation being the clearest, if the object
consists of 2 pages, trying to free page 0 in order to allocate page 1
has to fail (and the shrinker should find another candidate to reclaim,
or fail the allocation).
-Chris


Re: [PATCH 1/2] mm/mmu_notifier: Mark up direct reclaim paths with MAYFAIL

2020-06-24 Thread Chris Wilson
Quoting Jason Gunthorpe (2020-06-24 13:39:10)
> On Wed, Jun 24, 2020 at 01:21:03PM +0100, Chris Wilson wrote:
> > Quoting Jason Gunthorpe (2020-06-24 13:10:53)
> > > On Wed, Jun 24, 2020 at 09:02:47AM +0100, Chris Wilson wrote:
> > > > When direct reclaim enters the shrinker and tries to reclaim pages, it
> > > > has to opportunitically unmap them [try_to_unmap_one]. For direct
> > > > reclaim, the calling context is unknown and may include attempts to
> > > > unmap one page of a dma object while attempting to allocate more pages
> > > > for that object. Pass the information along that we are inside an
> > > > opportunistic unmap that can allow that page to remain referenced and
> > > > mapped, and let the callback opt in to avoiding a recursive wait.
> > > 
> > > i915 should already not be holding locks shared with the notifiers
> > > across allocations that can trigger reclaim. This is already required
> > > to use notifiers correctly anyhow - why do we need something in the
> > > notifiers?
> > 
> > for (n = 0; n < num_pages; n++)
> >   pin_user_page()
> > 
> > may call try_to_unmap_page from the lru shrinker for [0, n-1].
> 
> Yes, of course you can't hold any locks that intersect with notifiers
> across pin_user_page()/get_user_page()

What lock though? It's just the page refcount, shrinker asks us to drop
it [via mmu], we reply we would like to keep using that page as freeing
it for the current allocation is "robbing Peter to pay Paul".
-Chris


Re: [PATCH 1/2] mm/mmu_notifier: Mark up direct reclaim paths with MAYFAIL

2020-06-24 Thread Chris Wilson
Quoting Jason Gunthorpe (2020-06-24 13:10:53)
> On Wed, Jun 24, 2020 at 09:02:47AM +0100, Chris Wilson wrote:
> > When direct reclaim enters the shrinker and tries to reclaim pages, it
> > has to opportunitically unmap them [try_to_unmap_one]. For direct
> > reclaim, the calling context is unknown and may include attempts to
> > unmap one page of a dma object while attempting to allocate more pages
> > for that object. Pass the information along that we are inside an
> > opportunistic unmap that can allow that page to remain referenced and
> > mapped, and let the callback opt in to avoiding a recursive wait.
> 
> i915 should already not be holding locks shared with the notifiers
> across allocations that can trigger reclaim. This is already required
> to use notifiers correctly anyhow - why do we need something in the
> notifiers?

for (n = 0; n < num_pages; n++)
pin_user_page()

may call try_to_unmap_page from the lru shrinker for [0, n-1].

We're in the middle of allocating the object, how are we best to untangle
that?

Or during allocation of something that is using the pages pinned by that
object, how are we best to not to shrink that object as there is a
mutual dependency?
-Chris


[PATCH 1/2] mm/mmu_notifier: Mark up direct reclaim paths with MAYFAIL

2020-06-24 Thread Chris Wilson
When direct reclaim enters the shrinker and tries to reclaim pages, it
has to opportunitically unmap them [try_to_unmap_one]. For direct
reclaim, the calling context is unknown and may include attempts to
unmap one page of a dma object while attempting to allocate more pages
for that object. Pass the information along that we are inside an
opportunistic unmap that can allow that page to remain referenced and
mapped, and let the callback opt in to avoiding a recursive wait.

Signed-off-by: Chris Wilson 
Cc: Andrew Morton 
CC: Jason Gunthorpe 
---
 include/linux/mmu_notifier.h | 15 ++-
 mm/mmu_notifier.c|  3 +++
 mm/rmap.c|  5 +++--
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index fc68f3570e19..ee1ad008951c 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -48,7 +48,8 @@ enum mmu_notifier_event {
MMU_NOTIFY_RELEASE,
 };
 
-#define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0)
+#define MMU_NOTIFIER_RANGE_BLOCKABLE   BIT(0)
+#define MMU_NOTIFIER_RANGE_MAYFAIL BIT(1)
 
 struct mmu_notifier_ops {
/*
@@ -169,6 +170,12 @@ struct mmu_notifier_ops {
 * a non-blocking behavior then the same applies to
 * invalidate_range_end.
 *
+* If mayfail is set then the callback may return -EAGAIN while still
+* holding its page references. This flag is set inside direct
+* reclaim paths that are opportunistically trying to unmap pages
+* from unknown contexts. The callback must be prepared to handle
+* the matching invalidate_range_end even after failing the
+* invalidate_range_start.
 */
int (*invalidate_range_start)(struct mmu_notifier *subscription,
  const struct mmu_notifier_range *range);
@@ -397,6 +404,12 @@ mmu_notifier_range_blockable(const struct 
mmu_notifier_range *range)
return (range->flags & MMU_NOTIFIER_RANGE_BLOCKABLE);
 }
 
+static inline bool
+mmu_notifier_range_mayfail(const struct mmu_notifier_range *range)
+{
+   return (range->flags & MMU_NOTIFIER_RANGE_MAYFAIL);
+}
+
 static inline void mmu_notifier_release(struct mm_struct *mm)
 {
if (mm_has_notifiers(mm))
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 352bb9f3ecc0..95b89cee7af4 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -493,6 +493,9 @@ static int mn_hlist_invalidate_range_start(
_ret = ops->invalidate_range_start(subscription, range);
if (!mmu_notifier_range_blockable(range))
non_block_end();
+   if (_ret == -EAGAIN &&
+   mmu_notifier_range_mayfail(range))
+   _ret = 0;
if (_ret) {
pr_info("%pS callback failed with %d in 
%sblockable context.\n",
ops->invalidate_range_start, _ret,
diff --git a/mm/rmap.c b/mm/rmap.c
index 5fe2dedce1fc..912b737a3353 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1406,8 +1406,9 @@ static bool try_to_unmap_one(struct page *page, struct 
vm_area_struct *vma,
 * Note that the page can not be free in this function as call of
 * try_to_unmap() must hold a reference on the page.
 */
-   mmu_notifier_range_init(, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
-   address,
+   mmu_notifier_range_init(,
+   MMU_NOTIFY_CLEAR, MMU_NOTIFIER_RANGE_MAYFAIL,
+   vma, vma->vm_mm, address,
min(vma->vm_end, address + page_size(page)));
if (PageHuge(page)) {
/*
-- 
2.20.1



[PATCH 2/2] drm/i915/gem: Use mmu_notifier_range_mayfail() to avoid waiting inside reclaim

2020-06-24 Thread Chris Wilson
The direct reclaim path may trigger the mmu_notifier callback as part of
try_to_unmap_one. As this is purely an opportunitistic attempt to
reclaim pages, and will be called from any allocation context under
unknown conditions (that include attempting to allocate pages for the
userptr object itself and subsequently trying to reclaim parts of the
partially acquired object) we have to be careful never to wait on
anything being held by the calling context. Since that is unknown, we
have to avoid waiting from inside direct reclaim.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c 
b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index 9c53eb883400..72cfb91230ea 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -103,6 +103,7 @@ userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
it = interval_tree_iter_first(>objects, range->start, end);
while (it) {
struct drm_i915_gem_object *obj;
+   unsigned int flags;
 
if (!mmu_notifier_range_blockable(range)) {
ret = -EAGAIN;
@@ -126,9 +127,12 @@ userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
}
spin_unlock(>lock);
 
-   ret = i915_gem_object_unbind(obj,
-I915_GEM_OBJECT_UNBIND_ACTIVE |
-I915_GEM_OBJECT_UNBIND_BARRIER);
+   flags = (I915_GEM_OBJECT_UNBIND_ACTIVE |
+I915_GEM_OBJECT_UNBIND_BARRIER);
+   if (mmu_notifier_range_mayfail(range))
+   flags = 0;
+
+   ret = i915_gem_object_unbind(obj, flags);
if (ret == 0)
ret = __i915_gem_object_put_pages(obj);
i915_gem_object_put(obj);
-- 
2.20.1



Re: [Intel-gfx] v5.8-rc1 on thinkpad x220, intel graphics: interface frozen, can still switch to text console

2020-06-22 Thread Chris Wilson
Quoting Pavel Machek (2020-06-22 09:52:59)
> Hi!
> 
> Linux duo 5.8.0-rc1+ #117 SMP PREEMPT Mon Jun 15 16:13:54 CEST 2020 x86_64 
> GNU/Linux
> 
> [133747.719711] [  17456] 0 17456 4166  271655360 
> 0 sshd
> [133747.719718] [  17466]  1000 17466 4166  289655360 
> 0 sshd
> [133747.719724] [  17468]  1000 17468   433587   303033  25886720 
> 0 unison
> [133747.719730] [  18023]  1000 18023 1316   16409600 
> 0 sleep
> [133747.719737] 
> oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),task=chromium,pid=27368,uid=1000
> [133747.719795] Out of memory: Killed process 27368 (chromium) 
> total-vm:6686908kB, anon-rss:647056kB, file-rss:0kB, shmem-rss:7452kB, 
> UID:1000 pgtables:5304kB oom_score_adj:300
> [133747.799893] oom_reaper: reaped process 27368 (chromium), now 
> anon-rss:0kB, file-rss:0kB, shmem-rss:6836kB
> [136841.820558] i915 :00:02.0: [drm] Resetting chip for stopped heartbeat 
> on rcs0
> [136841.924333] i915 :00:02.0: [drm] Xorg[3016] context reset due
> to GPU hang

If that was the first occurrence it would have pointed to the error
state containing more information on the cause of the hang.
Attach /sys/class/drm/card0/error
-Chris


Re: [PATCH][next] drm/i915/selftests: Fix inconsistent IS_ERR and PTR_ERR

2020-06-16 Thread Chris Wilson
Quoting Gustavo A. R. Silva (2020-06-16 15:54:52)
> Fix inconsistent IS_ERR and PTR_ERR in live_timeslice_nopreempt().
> 
> The proper pointer to be passed as argument to PTR_ERR() is ce.
> 
> This bug was detected with the help of Coccinelle.
> 
> Fixes: b72f02d78e4f ("drm/i915/gt: Prevent timeslicing into unpreemptable 
> requests")
> Signed-off-by: Gustavo A. R. Silva 

Fair enough, I had sent out a patch that included this, but never split
it out for early adoption.

Reviewed-by: Chris Wilson 
-Chris


Re: [PATCH v2] dma-buf: Move dma_buf_release() from fops to dentry_ops

2020-06-16 Thread Chris Wilson
Quoting Sumit Semwal (2020-06-16 13:42:13)
> Hello,
> 
> If there are no objections to this, I will plan to merge it soon.

I was going to suggest running it against our CI, but that's unavailable
at the moment.

There's a particularly nasty BUG_ON() in dma_buf_release that we hit
irregularly, that this might help with.
-Chris


Re: [Intel-gfx] [PATCH] drm/i915: Fix comments mentioning typo in IS_ENABLED()

2020-06-05 Thread Chris Wilson
Quoting Kees Cook (2020-06-05 15:19:53)
> This has no code changes, but the typo is clearly getting copy/pasted,
> so better to avoid this now and fix the typo. IS_ENABLED() takes full
> names, and must have the "CONFIG_" prefix.
> 
> Reported-by: Joe Perches 
> Link: 
> https://lore.kernel.org/lkml/b08611018fdb6d88757c6008a5c02fa0e07b32fb.ca...@perches.com
> Signed-off-by: Kees Cook 
Reviewed-by: Chris Wilson 
-Chris


Re: [PATCH 3/3] drm/i915/selftests: avoid bogus maybe-uninitialized warning

2020-05-27 Thread Chris Wilson
Quoting Arnd Bergmann (2020-05-27 15:05:10)
> gcc has a problem following values through IS_ERR/PTR_ERR macros,
> leading to a false-positive warning in allmodconfig, with any
> compiler version:
> 
> In file included from drivers/gpu/drm/i915/gt/intel_lrc.c:5892:
> drivers/gpu/drm/i915/gt/selftest_lrc.c: In function 
> 'create_gpr_client.isra.0':
> drivers/gpu/drm/i915/gt/selftest_lrc.c:2902:23: error: 'rq' may be used 
> uninitialized in this function [-Werror=maybe-uninitialized]
> 
> This one is hard to avoid without impairing readability or adding a
> bugus NULL pointer.
> 
> Fixes: c92724de6db1 ("drm/i915/selftests: Try to detect rollback during 
> batchbuffer preemption")
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/gpu/drm/i915/gt/selftest_lrc.c | 21 +
>  1 file changed, 13 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c 
> b/drivers/gpu/drm/i915/gt/selftest_lrc.c
> index 824f99c4cc7c..933c3f5adf0a 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
> @@ -2908,23 +2908,25 @@ create_gpr_client(struct intel_engine_cs *engine,
>  
> vma = i915_vma_instance(global->obj, ce->vm, NULL);
> if (IS_ERR(vma)) {
> -   err = PTR_ERR(vma);
> +   rq = ERR_CAST(vma);
> goto out_ce;
> }
>  
> err = i915_vma_pin(vma, 0, 0, PIN_USER);
> -   if (err)
> +   if (err) {
> +   rq = ERR_PTR(err);
> goto out_ce;
> +   }
>  
> batch = create_gpr_user(engine, vma, offset);
> if (IS_ERR(batch)) {
> -   err = PTR_ERR(batch);
> +   rq = ERR_CAST(batch);
> goto out_vma;
> }
>  
> rq = intel_context_create_request(ce);
> if (IS_ERR(rq)) {
> -   err = PTR_ERR(rq);
> +   rq = ERR_CAST(rq);
> goto out_batch;
> }
>  
> @@ -2946,17 +2948,20 @@ create_gpr_client(struct intel_engine_cs *engine,
> i915_vma_unlock(batch);
> i915_vma_unpin(batch);
>  
> -   if (!err)
> +   if (!err) {
> i915_request_get(rq);
> -   i915_request_add(rq);
> -
> +   i915_request_add(rq);
> +   } else {
> +   i915_request_add(rq);
> +   rq = ERR_PTR(err);
> +   }
>  out_batch:
> i915_vma_put(batch);
>  out_vma:
> i915_vma_unpin(vma);
>  out_ce:
> intel_context_put(ce);
> -   return err ? ERR_PTR(err) : rq;
> +   return rq;

Hmm, I've used this style a few times, so could do with some syntactic
refinement.

drivers/gpu/drm/i915/gem/i915_gem_userptr.c:return err ? ERR_PTR(err) : 
mm->mn;
drivers/gpu/drm/i915/gt/selftest_hangcheck.c:   return err ? ERR_PTR(err) : rq;
drivers/gpu/drm/i915/gt/selftest_lrc.c: return err ? ERR_PTR(err) : rq;
drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:  return err ? ERR_PTR(err) : rq;
drivers/gpu/drm/i915/selftests/i915_request.c:  return err ? ERR_PTR(err) : 
request;
drivers/gpu/drm/i915/selftests/igt_spinner.c:   return err ? ERR_PTR(err) : rq;
-Chris



Re: [PATCH 2/3] drm/i915: work around false-positive maybe-uninitialized warning

2020-05-27 Thread Chris Wilson
Quoting Arnd Bergmann (2020-05-27 15:05:09)
> gcc-9 gets confused by the code flow in check_dirty_whitelist:
> 
> drivers/gpu/drm/i915/gt/selftest_workarounds.c: In function 
> 'check_dirty_whitelist':
> drivers/gpu/drm/i915/gt/selftest_workarounds.c:492:17: error: 'rsvd' may be 
> used uninitialized in this function [-Werror=maybe-uninitialized]
> 
> I could not figure out a good way to do this in a way that gcc
> understands better, so initialize the variable to zero, as last
> resort.

Does it look neater if we initialise it as a local? No.
Reviewed-by: Chris Wilson 
-Chris


Re: [PATCH v2 0/4] mm/gup, drm/i915: refactor gup_fast, convert to pin_user_pages()

2020-05-23 Thread Chris Wilson
Quoting John Hubbard (2020-05-22 06:19:27)
> The purpose of posting this series is to launch a test in the
> intel-gfx-ci tree. (The patches have already been merged into Andrew's
> linux-mm tree.)
> 
> This applies to today's linux.git (note the base-commit tag at the
> bottom).
> 
> Changes since V1:
> 
> * Fixed a bug in the refactoring patch: added FOLL_FAST_ONLY to the
>   list of gup_flags *not* to WARN() on. This lead to a failure in the
>   first intel-gfx-ci test run [1].
> 
> [1] 
> https://lore.kernel.org/r/159008745422.32320.5724805750977048...@build.alporthouse.com

Ran this through our CI, warn and subsequent lockup were gone. That
lockup is worrying me now, but that doesn't seem to be an issue from
this series.

The i915 changes were simple enough, I would have computed the pin flags
just once (since the readonly bit is static, that would be interesting
if that was allowed to change mid gup :)
Reviewed-by: Chris Wilson 
-Chris


Re: [PATCH] mm/gup: fixup gup.c for "mm/gup: refactor and de-duplicate gup_fast() code"

2020-05-21 Thread Chris Wilson
Quoting John Hubbard (2020-05-22 00:38:41)
> Include FOLL_FAST_ONLY in the list of flags to *not* WARN()
> on, in internal_get_user_pages_fast().
> 
> Cc: Chris Wilson 
> Cc: Daniel Vetter 
> Cc: David Airlie 
> Cc: Jani Nikula 
> Cc: "Joonas Lahtinen" 
> Cc: Matthew Auld 
> Cc: Matthew Wilcox 
> Cc: Rodrigo Vivi 
> Cc: Souptick Joarder 
> Cc: Tvrtko Ursulin 
> Signed-off-by: John Hubbard 
> ---
> 
> Hi Andrew, Chris,
> 
> Andrew: This is a fixup that applies to today's (20200521) linux-next.
> In that tree, this fixes up:
> 
> commit dfb8dfe80808 ("mm/gup: refactor and de-duplicate gup_fast() code")
> 
> Chris: I'd like to request another CI run for the drm/i915 changes, so
> for that, would you prefer that I post a v2 of the series [1], or
> is it easier for you to just apply this patch here, on top of [2]?

If you post your series again with this patch included to intel-gfx, CI
will pick it up. Or I'll do that in the morning.
-Chris


Re: [PATCH 0/4] mm/gup, drm/i915: refactor gup_fast, convert to pin_user_pages()

2020-05-21 Thread Chris Wilson
Quoting John Hubbard (2020-05-19 01:21:20)
> This needs to go through Andrew's -mm tree, due to adding a new gup.c
> routine. However, I would really love to have some testing from the
> drm/i915 folks, because I haven't been able to run-time test that part
> of it.

CI hit

<4> [185.667750] WARNING: CPU: 0 PID: 1387 at mm/gup.c:2699 
internal_get_user_pages_fast+0x63a/0xac0
<4> [185.667752] Modules linked in: vgem snd_hda_codec_hdmi 
snd_hda_codec_realtek snd_hda_codec_generic i915 mei_hdcp x86_pkg_temp_thermal 
coretemp snd_hda_intel snd_intel_dspcfg crct10dif_pclmul snd_hda_codec 
crc32_pclmul snd_hwdep snd_hda_core ghash_clmulni_intel cdc_ether usbnet mii 
snd_pcm e1000e mei_me ptp pps_core mei intel_lpss_pci prime_numbers
<4> [185.667774] CPU: 0 PID: 1387 Comm: gem_userptr_bli Tainted: G U
5.7.0-rc5-CI-Patchwork_17704+ #1
<4> [185.66] Hardware name: Intel Corporation Ice Lake Client 
Platform/IceLake U DDR4 SODIMM PD RVP, BIOS ICLSFWR1.R00.3234.A01.1906141750 
06/14/2019
<4> [185.667782] RIP: 0010:internal_get_user_pages_fast+0x63a/0xac0
<4> [185.667785] Code: 24 40 08 48 39 5c 24 38 49 89 df 0f 85 74 fc ff ff 48 83 
44 24 50 08 48 39 5c 24 58 49 89 dc 0f 85 e0 fb ff ff e9 14 fe ff ff <0f> 0b b8 
ea ff ff ff e9 36 fb ff ff 4c 89 e8 48 21 e8 48 39 e8 0f
<4> [185.667789] RSP: 0018:c90001133c38 EFLAGS: 00010206
<4> [185.667792] RAX:  RBX:  RCX: 
8884999ee800
<4> [185.667795] RDX: 000c0001 RSI: 0100 RDI: 
7f419e774000
<4> [185.667798] RBP: 888453dbf040 R08:  R09: 
0001
<4> [185.667800] R10:  R11:  R12: 
888453dbf380
<4> [185.667803] R13: 8884999ee800 R14: 888453dbf3e8 R15: 
0040
<4> [185.667806] FS:  7f419e875e40() GS:88849fe0() 
knlGS:
<4> [185.667808] CS:  0010 DS:  ES:  CR0: 80050033
<4> [185.667811] CR2: 7f419e873000 CR3: 000458bd2004 CR4: 
00760ef0
<4> [185.667814] PKRU: 5554
<4> [185.667816] Call Trace:
<4> [185.667912]  ? i915_gem_userptr_get_pages+0x1c6/0x290 [i915]
<4> [185.667918]  ? mark_held_locks+0x49/0x70
<4> [185.667998]  ? i915_gem_userptr_get_pages+0x1c6/0x290 [i915]
<4> [185.668073]  ? i915_gem_userptr_get_pages+0x1c6/0x290 [i915]

and then panicked, across a range of systems.
-Chris


Re: [PATCH] drm/i915/gvt: Use ARRAY_SIZE for vgpu_types

2020-05-18 Thread Chris Wilson
Quoting Aishwarya Ramakrishnan (2020-05-18 16:03:36)
> Prefer ARRAY_SIZE instead of using sizeof
> 
> Fixes coccicheck warning: Use ARRAY_SIZE
> 
> Signed-off-by: Aishwarya Ramakrishnan 
Reviewed-by: Chris Wilson 
-Chris


Re: [PATCH] drm/mm: Fix an error handling path in '__igt_once()'

2020-05-17 Thread Chris Wilson
Quoting Christophe JAILLET (2020-05-17 09:50:49)
> The label of the last gotos must be switched for the error handling code to
> work as expected.
> 
> Fixes: 83bc4ec37210 ("drm/mm: Add a search-by-address variant to only inspect 
> a single hole")
> Signed-off-by: Christophe JAILLET 
> ---
>  drivers/gpu/drm/selftests/test-drm_mm.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/selftests/test-drm_mm.c 
> b/drivers/gpu/drm/selftests/test-drm_mm.c
> index 9aabe82dcd3a..af38c4fa4db5 100644
> --- a/drivers/gpu/drm/selftests/test-drm_mm.c
> +++ b/drivers/gpu/drm/selftests/test-drm_mm.c
> @@ -1864,14 +1864,14 @@ static int __igt_once(unsigned int mode)
> pr_err("Unexpectedly inserted the node into the wrong hole: 
> node.start=%llx\n",
>node.start);
> err = -EINVAL;
> -   goto err_node;
> +   goto err_hi;

This needs to call drm_mm_remove_node() [err_node:] as it accidentally inserted 
the
node.

> }
>  
> err = drm_mm_insert_node_generic(, , 2, 0, 0, mode);
> if (err) {
> pr_err("Could not insert the node into the available 
> hole!\n");
> err = -EINVAL;
> -   goto err_hi;
> +   goto err_node;

And this must *not* call drm_mm_remove_node.
-Chris


Re: [PATCH] drm/i915: Remove duplicate inline specifier on write_pte

2020-05-13 Thread Chris Wilson
Quoting Nathan Chancellor (2020-05-13 19:23:40)
> When building with clang:
> 
>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c:392:24: warning: duplicate
>  'inline' declaration specifier [-Wduplicate-decl-specifier]
>  declaration specifier [-Wduplicate-decl-specifier]
>  static __always_inline inline void
> ^
>  include/linux/compiler_types.h:138:16: note: expanded from macro
>  'inline'
>  #define inline inline __gnu_inline __inline_maybe_unused notrace
> ^
>  1 warning generated.
> 
> __always_inline is defined as 'inline __attribute__((__always_inline))'
> so we do not need to specify it twice.
> 
> Fixes: 84eac0c65940 ("drm/i915/gt: Force pte cacheline to main memory")
> Link: https://github.com/ClangBuiltLinux/linux/issues/1024
> Reported-by: kbuild test robot 
> Signed-off-by: Nathan Chancellor 

I forgot to ping Mika about this,
Reviewed-by: Chris Wilson 
-Chris


Re: [PATCH 2/3] dma-fence: use default wait function for mock fences

2020-05-11 Thread Chris Wilson
Quoting Daniel Vetter (2020-05-11 10:11:41)
> No need to micro-optmize when we're waiting in a mocked object ...

It's setting up the expected return values for the test.
-Chris


Re: [PATCH] drm/i915: check to see if SIMD registers are available before using SIMD

2020-05-03 Thread Chris Wilson
Quoting Jason A. Donenfeld (2020-04-30 23:10:16)
> Sometimes it's not okay to use SIMD registers, the conditions for which
> have changed subtly from kernel release to kernel release. Usually the
> pattern is to check for may_use_simd() and then fallback to using
> something slower in the unlikely case SIMD registers aren't available.
> So, this patch fixes up i915's accelerated memcpy routines to fallback
> to boring memcpy if may_use_simd() is false.
> 
> Cc: sta...@vger.kernel.org

The same argument as on the previous submission is that we return to the
caller if we can't use movntqda as their fallback path should be faster
than uncached memcpy.
-Chris


Re: [PATCH] drm/i915: check to see if SIMD registers are available before using SIMD

2020-05-03 Thread Chris Wilson
Quoting Christoph Hellwig (2020-05-01 19:07:31)
> On Thu, Apr 30, 2020 at 04:10:16PM -0600, Jason A. Donenfeld wrote:
> > Sometimes it's not okay to use SIMD registers, the conditions for which
> > have changed subtly from kernel release to kernel release. Usually the
> > pattern is to check for may_use_simd() and then fallback to using
> > something slower in the unlikely case SIMD registers aren't available.
> > So, this patch fixes up i915's accelerated memcpy routines to fallback
> > to boring memcpy if may_use_simd() is false.
> 
> Err, why does i915 implements its own uncached memcpy instead of relying
> on core functionality to start with?

What is this core functionality that provides movntqda?
-Chris


Re: [PATCH] drm/i915/gt: Avoid uninitialized use of rpcurupei in frequency_show

2020-04-29 Thread Chris Wilson
Quoting Nathan Chancellor (2020-04-29 04:00:52)
> When building with clang + -Wuninitialized:
> 
> drivers/gpu/drm/i915/gt/debugfs_gt_pm.c:407:7: warning: variable
> 'rpcurupei' is uninitialized when used here [-Wuninitialized]
>rpcurupei,
>^
> drivers/gpu/drm/i915/gt/debugfs_gt_pm.c:304:16: note: initialize the
> variable 'rpcurupei' to silence this warning
> u32 rpcurupei, rpcurup, rpprevup;
>  ^
>   = 0
> 1 warning generated.
> 
> rpupei is assigned twice; based on the second argument to
> intel_uncore_read, it seems this one should have been assigned to
> rpcurupei.
> 
> Fixes: 9c878557b1eb ("drm/i915/gt: Use the RPM config register to determine 
> clk frequencies")
> Link: https://github.com/ClangBuiltLinux/linux/issues/1016
> Signed-off-by: Nathan Chancellor 

Thanks.
Reviewed-by: Chris Wilson 
-Chris


Re: [PATCH][next] drm/i915: remove redundant variable err

2019-10-09 Thread Chris Wilson
Quoting Colin King (2019-10-09 10:39:35)
> From: Colin Ian King 
> 
> An earlier commit removed any error assignments to err and we
> are now left with a zero assignment to err and a check that is
> always false. Clean this up by removing the redundant variable
> err and the error check.

Oh, we add one again shortly. Might as well wait.
-Chris


Re: [PATCH] drm/i810: Prevent underflow in ioctl

2019-10-04 Thread Chris Wilson
Quoting Chris Wilson (2019-10-04 15:08:57)
> Quoting Dan Carpenter (2019-10-04 11:22:51)
> > The "used" variables here come from the user in the ioctl and it can be
> > negative.  It could result in an out of bounds write.
> > 
> > Signed-off-by: Dan Carpenter 
> > ---
> >  drivers/gpu/drm/i810/i810_dma.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i810/i810_dma.c 
> > b/drivers/gpu/drm/i810/i810_dma.c
> > index 2a77823b8e9a..e66c38332df4 100644
> > --- a/drivers/gpu/drm/i810/i810_dma.c
> > +++ b/drivers/gpu/drm/i810/i810_dma.c
> > @@ -728,7 +728,7 @@ static void i810_dma_dispatch_vertex(struct drm_device 
> > *dev,
> > if (nbox > I810_NR_SAREA_CLIPRECTS)
> > nbox = I810_NR_SAREA_CLIPRECTS;
> >  
> > -   if (used > 4 * 1024)
> > +   if (used < 0 || used > 4 * 1024)
> > used = 0;
> 
> Yes, as passed to the GPU instruction, negative used is invalid.
> 
> Then it is used as an offset into a memblock, where a negative offset
> would be very bad.
> 
> Reviewed-by: Chris Wilson 

Applied to drm-misc-next with cc'ed stable.
-Chris


Re: Linux 5.3-rc7

2019-09-07 Thread Chris Wilson
Quoting Thomas Gleixner (2019-09-07 16:00:17)
> Does this only happen with that CPU0 hotplug stuff enabled or on CPUs other
> than CPU0 as well? That hotplug CPU0 stuff is a bandaid so I wouldn't be
> surprised if we broke that somehow.

If I ignore cpu0 in that test and so use

[  133.847187] smpboot: CPU 1 is now offline
[  134.861861] x86: Booting SMP configuration:
[  134.861875] smpboot: Booting Node 0 Processor 1 APIC 0x2
[  134.880218] smpboot: CPU 2 is now offline
[  135.893806] smpboot: Booting Node 0 Processor 2 APIC 0x1
[  135.935115] smpboot: CPU 3 is now offline
[  136.949760] smpboot: Booting Node 0 Processor 3 APIC 0x3

that has run for 10 minutes without failure, so it seems confined to
cpu0 hotplugging. All we are doing in the test to generate the hotplugs
is:

for (int cpu = 0;; cpu++) {
char name[128];
int cpufd;

snprintf(name, sizeof(name),
 "/sys/devices/system/cpu/cpu%d/online",
 cpu), sizeof(name));
cpufd = open(name, O_WRONLY);
if (cpufd < 0)
break;

write(cpufd, "0", 2);
usleep(1e6);
write(cpufd, "1", 2);

close(cpufd);
}

-Chris


  1   2   3   4   5   6   7   8   9   10   >