[Bug 99312] Long-running OpenCL kernels cause ring stalls and GPU lockups on Kabini when radeon.lockup_timeout is enabled

2019-09-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=99312

GitLab Migration User  changed:

       What|Removed |Added

 Resolution|---     |MOVED
     Status|NEW     |RESOLVED

--- Comment #3 from GitLab Migration User  ---
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been
closed from further activity.

You can subscribe to and participate further in the new bug via this link
to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1246.

-- 
You are receiving this mail because:
You are the assignee for the bug.
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

[Bug 99312] Long-running OpenCL kernels cause ring stalls and GPU lockups on Kabini when radeon.lockup_timeout is enabled

2019-05-11 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=99312

Jan Vesely  changed:

   What|Removed |Added

 Blocks|        |99553


Referenced Bugs:

https://bugs.freedesktop.org/show_bug.cgi?id=99553
[Bug 99553] Tracker bug for running OpenCL applications on Clover

[Bug 99312] Long-running OpenCL kernels cause ring stalls and GPU lockups on Kabini when radeon.lockup_timeout is enabled

2017-01-09 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=99312

Vedran Miletić  changed:

   What|Removed                      |Added

Summary|Long-running OpenCL kernels  |Long-running OpenCL kernels
       |cause ring stalls and GPU    |cause ring stalls and GPU
       |lockups on Kabini            |lockups on Kabini when
       |                             |radeon.lockup_timeout is
       |                             |enabled

--- Comment #2 from Vedran Miletić  ---
(In reply to John Bridgman from comment #1)
> If you have not already done so, try disabling the watchdog timer:
> 
> 
> MODULE_PARM_DESC(lockup_timeout, "GPU lockup timeout in ms (default 10000 = 10 seconds, 0 = disable)");
> module_param_named(lockup_timeout, radeon_lockup_timeout, int, 0444);
> 

Yup, that works around the problem.

> As part of HSA/ROC development we dropped the priority of compute work
> relative to graphics which improved interactivity and *almost* eliminated
> timeouts without having to disable the timer  - when I get back in the
> office I'll dig up the changes. In the meantime, I think disabling the timer
> will do what you need although you will still have sluggish graphics while
> long-running kernels are active.
> 

Eager to hear the details.

[Bug 99312] Long-running OpenCL kernels cause ring stalls and GPU lockups on Kabini

2017-01-07 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=99312

--- Comment #1 from John Bridgman  ---
If you have not already done so, try disabling the watchdog timer:


MODULE_PARM_DESC(lockup_timeout, "GPU lockup timeout in ms (default 10000 = 10 seconds, 0 = disable)");
module_param_named(lockup_timeout, radeon_lockup_timeout, int, 0444);

As part of HSA/ROC development we dropped the priority of compute work relative
to graphics which improved interactivity and *almost* eliminated timeouts
without having to disable the timer  - when I get back in the office I'll dig
up the changes. In the meantime, I think disabling the timer will do what you
need although you will still have sluggish graphics while long-running kernels
are active.

Lowering the priority of compute waves across the board won't be a fully
general solution, because there will be some cases (e.g. Valve's recent work
on using high-priority compute to improve VR smoothness) where compute needs
*higher* priority than graphics. It should, however, cover most cases other
than "simultaneously running GROMACS and VR".

[Bug 99312] Long-running OpenCL kernels cause ring stalls and GPU lockups on Kabini

2017-01-07 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=99312

Vedran Miletić  changed:

    What|Removed |Added

Hardware|Other   |x86-64 (AMD64)
Severity|normal  |major
 Version|13.0    |git
      OS|All     |Linux (All)

[Bug 99312] Long-running OpenCL kernels cause ring stalls and GPU lockups on Kabini

2017-01-07 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=99312

    Bug ID: 99312
   Summary: Long-running OpenCL kernels cause ring stalls and GPU
            lockups on Kabini
   Product: Mesa
   Version: 13.0
  Hardware: Other
        OS: All
    Status: NEW
  Severity: normal
  Priority: medium
 Component: Drivers/Gallium/radeonsi
  Assignee: dri-devel at lists.freedesktop.org
  Reporter: vedran at miletic.net
QA Contact: dri-devel at lists.freedesktop.org

Running long-lasting OpenCL kernels (e.g. GROMACS with a system of many atoms)
using kernel 4.8.15, Mesa git, and LLVM git on a Kabini APU:

vendor_id   : AuthenticAMD
cpu family  : 22
model       : 0
model name  : AMD Athlon(tm) 5350 APU with Radeon(tm) R3
stepping    : 1
microcode   : 0x700010b

with the GPU:

00:01.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Kabini [Radeon HD 8400 / R3 Series] [1002:9830]

causes GPU lockups such as:

[338584.980657] radeon :00:01.0: ring 0 stalled for more than 10351msec
[338584.980811] radeon :00:01.0: GPU lockup (current fence id 0x000827c1 last fence id 0x000827c2 on ring 0)
[338585.484633] radeon :00:01.0: ring 0 stalled for more than 10855msec
[338585.484789] radeon :00:01.0: GPU lockup (current fence id 0x000827c1 last fence id 0x000827c2 on ring 0)
[338585.988632] radeon :00:01.0: ring 0 stalled for more than 11359msec
[338585.988787] radeon :00:01.0: GPU lockup (current fence id 0x000827c1 last fence id 0x000827c2 on ring 0)

The machine does not hang. This is reliably reproducible. Is there any other
information I can provide?
