Hey Oded,

Sorry to be a nuisance, but if you still have everything set up, could you give this fix a quick go?

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 5321d18..9f70ee0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -667,7 +667,7 @@ static int set_sched_resources(struct device_queue_manager *dqm)
                /* This situation may be hit in the future if a new HW
                 * generation exposes more than 64 queues. If so, the
                 * definition of res.queue_mask needs updating */
-               if (WARN_ON(i > sizeof(res.queue_mask))) {
+               if (WARN_ON(i >= (sizeof(res.queue_mask)*8))) {
                        pr_err("Invalid queue enabled by amdgpu: %d\n", i);
                        break;
                }

John/Felix,

Any chance I could borrow a carrizo/kaveri for a few days? Or maybe you could help me run some final tests on this patch series?

- Andres


On 2017-02-09 03:11 PM, Oded Gabbay wrote:
  Andres,

I tried your patches on Kaveri with airlied's drm-next branch.
I used radeon+amdkfd

The following test failed: KFDQMTest.CreateMultipleCpQueues
However, I can't debug it because I don't have the sources of kfdtest.

In dmesg, I saw the following warning during boot:
WARNING: CPU: 0 PID: 150 at
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c:670
start_cpsch+0xc5/0x220 [amdkfd]
[    4.393796] Modules linked in: hid_logitech_hidpp hid_logitech_dj
hid_generic usbhid hid uas usb_storage amdkfd amd_iommu_v2 radeon(+)
i2c_algo_bit ttm drm_kms_helper syscopyarea ahci sysfillrect sysimgblt
libahci fb_sys_fops drm r8169 mii fjes video
[    4.393811] CPU: 0 PID: 150 Comm: systemd-udevd Not tainted 4.10.0-rc5+ #1
[    4.393811] Hardware name: Gigabyte Technology Co., Ltd. To be
filled by O.E.M./F2A88XM-D3H, BIOS F5 01/09/2014
[    4.393812] Call Trace:
[    4.393818]  dump_stack+0x63/0x90
[    4.393822]  __warn+0xcb/0xf0
[    4.393823]  warn_slowpath_null+0x1d/0x20
[    4.393830]  start_cpsch+0xc5/0x220 [amdkfd]
[    4.393836]  ? initialize_cpsch+0xa0/0xb0 [amdkfd]
[    4.393841]  kgd2kfd_device_init+0x375/0x490 [amdkfd]
[    4.393883]  radeon_kfd_device_init+0xaf/0xd0 [radeon]
[    4.393911]  radeon_driver_load_kms+0x11e/0x1f0 [radeon]
[    4.393933]  drm_dev_register+0x14a/0x200 [drm]
[    4.393946]  drm_get_pci_dev+0x9d/0x160 [drm]
[    4.393974]  radeon_pci_probe+0xb8/0xe0 [radeon]
[    4.393976]  local_pci_probe+0x45/0xa0
[    4.393978]  pci_device_probe+0x103/0x150
[    4.393981]  driver_probe_device+0x2bf/0x460
[    4.393982]  __driver_attach+0xdf/0xf0
[    4.393984]  ? driver_probe_device+0x460/0x460
[    4.393985]  bus_for_each_dev+0x6c/0xc0
[    4.393987]  driver_attach+0x1e/0x20
[    4.393988]  bus_add_driver+0x1fd/0x270
[    4.393989]  ? 0xffffffffc05c8000
[    4.393991]  driver_register+0x60/0xe0
[    4.393992]  ? 0xffffffffc05c8000
[    4.393993]  __pci_register_driver+0x4c/0x50
[    4.394007]  drm_pci_init+0xeb/0x100 [drm]
[    4.394008]  ? 0xffffffffc05c8000
[    4.394031]  radeon_init+0x98/0xb6 [radeon]
[    4.394034]  do_one_initcall+0x53/0x1a0
[    4.394037]  ? __vunmap+0x81/0xd0
[    4.394039]  ? kmem_cache_alloc_trace+0x152/0x1c0
[    4.394041]  ? vfree+0x2e/0x70
[    4.394044]  do_init_module+0x5f/0x1ff
[    4.394046]  load_module+0x24cc/0x29f0
[    4.394047]  ? __symbol_put+0x60/0x60
[    4.394050]  ? security_kernel_post_read_file+0x6b/0x80
[    4.394052]  SYSC_finit_module+0xdf/0x110
[    4.394054]  SyS_finit_module+0xe/0x10
[    4.394056]  entry_SYSCALL_64_fastpath+0x1e/0xad
[    4.394058] RIP: 0033:0x7f9cda77c8e9
[    4.394059] RSP: 002b:00007ffe195d3378 EFLAGS: 00000246 ORIG_RAX:
0000000000000139
[    4.394060] RAX: ffffffffffffffda RBX: 00007f9cdb8dda7e RCX: 00007f9cda77c8e9
[    4.394061] RDX: 0000000000000000 RSI: 00007f9cdac7ce2a RDI: 0000000000000013
[    4.394062] RBP: 00007ffe195d2450 R08: 0000000000000000 R09: 0000000000000000
[    4.394063] R10: 0000000000000013 R11: 0000000000000246 R12: 00007ffe195d245a
[    4.394063] R13: 00007ffe195d1378 R14: 0000563f70cc93b0 R15: 0000563f70cba4d0
[    4.394091] ---[ end trace 9c5af17304d998bb ]---
[    4.394092] Invalid queue enabled by amdgpu: 9

I suggest you get a Kaveri/Carrizo machine to debug these issues.

Until then, I don't think we should merge this patch set.

Oded

On Wed, Feb 8, 2017 at 9:47 PM, Andres Rodriguez <andre...@gmail.com> wrote:
Thank you Oded.

- Andres


On 2017-02-08 02:32 PM, Oded Gabbay wrote:
On Wed, Feb 8, 2017 at 6:23 PM, Andres Rodriguez <andre...@gmail.com>
wrote:
Hey Felix,

Thanks for the pointer to the ROCm mqd commit. I like that the workarounds
are easy to spot. I'll add that to a new patch series I'm working on with
some bug fixes for performance being lower on pipes other than pipe 0.

I haven't tested this yet on Kaveri/Carrizo. I'm hoping someone with the
HW will be able to give it a go. I put in a few small hacks to get KFD to
boot but do nothing on Polaris10.

Regards,
Andres


On 2017-02-06 03:20 PM, Felix Kuehling wrote:
Hi Andres,

Thank you for tackling this task. It's more involved than I expected,
mostly because I didn't have much awareness of the MQD management in
amdgpu.

I made one comment in a separate message about the unified MQD commit
function, if you want to bring that more in line with our latest ROCm
release on github.

Also, were you able to test the upstream KFD with your changes on a
Kaveri or Carrizo?

Regards,
    Felix


On 17-02-03 11:51 PM, Andres Rodriguez wrote:
The current queue/pipe split policy is for amdgpu to take the first pipe
of MEC0 and leave the rest for amdkfd to use. This policy is taken as an
assumption in a few areas of the implementation.

This patch series aims to allow flexible/tunable queue/pipe split
policies between kgd and kfd. It also updates the queue/pipe split
policy to one that allows better compute app concurrency for both
drivers.

In the process, some duplicate code and hardcoded constants were
removed.

Any suggestions or feedback on improvements welcome.

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Hi Andres,
I will try to find some time to test it on my Kaveri machine.

Oded


