On 15.07.2013 19:02, Jonathan Gray wrote:
Here is a diff that is a bit of a guess. We don't have a way
to produce something similar to the i915_error_state files
at the moment.
https://bugs.freedesktop.org/show_bug.cgi?id=55984#c143
commit 262b6d363fcff16359c93bd58c297f961f6e6273
Author: Chris Wilson <[email protected]>
Date: Tue Jan 15 16:17:54 2013 +0000
drm/i915: Invalidate the relocation presumed_offsets along the slow path
In the slow path, we are forced to copy the relocations prior to
acquiring the struct mutex in order to handle pagefaults. We forgo
copying the new offsets back into the relocation entries in order to
prevent a recursive locking bug should we trigger a pagefault whilst
holding the mutex for the reservations of the execbuffer. Therefore, we
need to reset the presumed_offsets just in case the objects are rebound
back into their old locations after relocating for this execbuffer - if
that were to happen we would assume the relocations were valid and leave
the actual pointers to the kernels dangling, instant hang.
Fixes regression from commit bcf50e2775bbc3101932d8e4ab8c7902aa4163b4
Author: Chris Wilson <[email protected]>
Date: Sun Nov 21 22:07:12 2010 +0000
drm/i915: Handle pagefaults in execbuffer user relocations
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55984
Signed-off-by: Chris Wilson <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: [email protected]
Signed-off-by: Daniel Vetter <[email protected]>
Index: i915_drv.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/drm/i915/i915_drv.c,v
retrieving revision 1.35
diff -u -p -r1.35 i915_drv.c
--- i915_drv.c 5 Jul 2013 07:20:27 -0000 1.35
+++ i915_drv.c 15 Jul 2013 15:45:04 -0000
@@ -1726,6 +1726,12 @@ i915_gem_get_relocs_from_user(struct drm
return (EINVAL);
*relocs = drm_alloc(reloc_count * sizeof(**relocs));
for (i = 0; i < buffer_count; i++) {
+ struct drm_i915_gem_relocation_entry *user_relocs;
+ u64 invalid_offset = (u64)-1;
+ int j;
+
+ user_relocs = (void *)(uintptr_t)exec_list[i].relocs_ptr;
+
if ((ret = copyin((void *)(uintptr_t)exec_list[i].relocs_ptr,
&(*relocs)[reloc_index], exec_list[i].relocation_count *
sizeof(**relocs))) != 0) {
@@ -1733,6 +1739,26 @@ i915_gem_get_relocs_from_user(struct drm
*relocs = NULL;
return (ret);
}
+
+ /* As we do not update the known relocation offsets after
+ * relocating (due to the complexities in lock handling),
+ * we need to mark them as invalid now so that we force the
+ * relocation processing next time. Just in case the target
+ * object is evicted and then rebound into its old
+ * presumed_offset before the next execbuffer - if that
+ * happened we would make the mistake of assuming that the
+ * relocations were valid.
+ */
+ for (j = 0; j < exec_list[i].relocation_count; j++) {
+ if (DRM_COPY_TO_USER(&user_relocs[j].presumed_offset,
+ &invalid_offset,
+ sizeof(invalid_offset))) {
+ drm_free(*relocs);
+ *relocs = NULL;
+ return (EFAULT);
+ }
+ }
+
reloc_index += exec_list[i].relocation_count;
}
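
The failure mode described in the commit message and in the comment in
the diff can be illustrated with a small userspace model. This is only
a sketch under stated assumptions: every sketch_* name, the
INVALID_OFFSET macro and the 0x1000/0x2000 offsets are hypothetical and
are not part of the driver. It models a single relocation whose target
object moves, gets relocated on the slow path without the new offset
being written back, and is then rebound at its old offset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical marker, mirroring the (u64)-1 value used in the diff. */
#define INVALID_OFFSET ((uint64_t)-1)

/* Hypothetical stand-in for the user-visible relocation entry. */
struct sketch_reloc {
	uint64_t presumed_offset;	/* where userspace believes the target is */
};

/* Hypothetical stand-in for the pointer stored inside the batch buffer. */
struct sketch_batch {
	uint64_t pointer;
};

/*
 * Slow path: work on a kernel copy of the entry and never write the
 * new offset back to userspace. With "invalidate" set, the user copy
 * is marked invalid instead, as the diff does.
 */
static void
sketch_slow_path(struct sketch_reloc *user, struct sketch_batch *batch,
    uint64_t target_offset, bool invalidate)
{
	struct sketch_reloc copy = *user;

	if (invalidate)
		user->presumed_offset = INVALID_OFFSET;
	if (copy.presumed_offset != target_offset)
		batch->pointer = target_offset;
}

/*
 * Fast path: skip the relocation when the presumed offset already
 * matches the target. Returns true if the batch was rewritten.
 */
static bool
sketch_fast_path(struct sketch_reloc *user, struct sketch_batch *batch,
    uint64_t target_offset)
{
	if (user->presumed_offset == target_offset)
		return false;
	batch->pointer = target_offset;
	user->presumed_offset = target_offset;
	return true;
}

/* Returns 1 if the batch ends up pointing at the object's final offset. */
static int
sketch_scenario(bool invalidate)
{
	struct sketch_reloc user = { .presumed_offset = 0x1000 };
	struct sketch_batch batch = { .pointer = 0x1000 };

	/* First execbuffer: the object is actually bound at 0x2000. */
	sketch_slow_path(&user, &batch, 0x2000, invalidate);
	/* The object is evicted and rebound at its old offset, 0x1000. */
	sketch_fast_path(&user, &batch, 0x1000);
	return batch.pointer == 0x1000;
}
```

In this model sketch_scenario(false) leaves the batch pointing at the
stale 0x2000 offset, because the second pass sees a matching
presumed_offset and skips the relocation, while sketch_scenario(true)
forces the relocation and the batch ends up pointing at 0x1000.
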
I just applied this diff and quickly got another hang. It may help with
diagnosis to mention that I get visual glitches in the YouTube player
area, and sometimes across the whole tab where youtube.com is open.
After the GPU hangs, a chrome.core file is left in my home directory.
Everything else I use, including mplayer -vo gl,xv,x11, works without
problems. It looks like only the YouTube player has this nasty effect
on the GPU. Is there a way to trace this to see what causes it?