On 11/10/2017 07:17 AM, Christian König wrote:
Series is Acked-by: Christian König <christian.koe...@amd.com>.

Please note that I think your OOM killer test exposes a real bug we currently have in the kernel driver.

A single allocation of 1TB shouldn't trigger the OOM killer, but rather be rejected immediately.

Maybe we should add a second test which does incremental 1GB allocations, but still keep this test? With this test I get the call stack below, followed by a crash of the test suite with a general protection fault. As normal behavior I would have expected amdgpu_bo_alloc to simply return an errno, which we could check in the test.

Thanks,
Andrey

[169053.128981 <72032.811683>] ------------[ cut here ]------------
[169053.129006 <    0.000025>] WARNING: CPU: 0 PID: 22883 at mm/page_alloc.c:3883 __alloc_pages_slowpath+0xf03/0x14e0
[169053.129007 <    0.000001>] Modules linked in: amdgpu chash ttm drm_kms_helper drm i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt edac_mce_amd kvm_amd kvm irqbypass crct10dif_pclmul crc32_pclmul snd_hda_codec_realtek ghash_clmulni_intel snd_hda_codec_generic pcbc snd_hda_codec_hdmi snd_hda_intel aesni_intel snd_hda_codec aes_x86_64 snd_hda_core crypto_simd glue_helper snd_hwdep rfkill_gpio cryptd snd_pcm snd_seq_midi snd_seq_midi_event serio_raw snd_rawmidi snd_seq cdc_ether usbnet snd_seq_device joydev fam15h_power k10temp r8152 snd_timer mii i2c_piix4 rtsx_pci_ms snd memstick soundcore shpchp 8250_dw i2c_designware_platform i2c_designware_core mac_hid binfmt_misc nfsd auth_rpcgss nfs_acl lockd grace sunrpc parport_pc ppdev lp parport autofs4 rtsx_pci_sdmmc psmouse rtsx_pci sdhci_pci ahci sdhci libahci
[169053.129084 <    0.000077>]  video i2c_hid hid_generic usbhid hid
[169053.129096 <    0.000012>] CPU: 0 PID: 22883 Comm: lt-amdgpu_test Tainted: G W 4.14.0-rc3+ #1
[169053.129097 <    0.000001>] Hardware name: AMD Gardenia/Gardenia, BIOS RGA1101C 07/20/2015
[169053.129099 <    0.000002>] task: ffff880048803d80 task.stack: ffff880064688000
[169053.129103 <    0.000004>] RIP: 0010:__alloc_pages_slowpath+0xf03/0x14e0
[169053.129105 <    0.000002>] RSP: 0018:ffff88006468f108 EFLAGS: 00010246
[169053.129108 <    0.000003>] RAX: 0000000000000000 RBX: 00000000014000c0 RCX: ffffffff81279065
[169053.129109 <    0.000001>] RDX: dffffc0000000000 RSI: 000000000000000f RDI: ffffffff82609000
[169053.129111 <    0.000002>] RBP: ffff88006468f328 R08: 0000000000000000 R09: ffffffffffff8576
[169053.129113 <    0.000002>] R10: 000000005c2044e7 R11: 0000000000000000 R12: ffff88006468f3d8
[169053.129114 <    0.000001>] R13: ffff880048803d80 R14: 000000000140c0c0 R15: 000000000000000f
[169053.129117 <    0.000003>] FS: 00007f707863b700(0000) GS:ffff88006ce00000(0000) knlGS:0000000000000000
[169053.129119 <    0.000002>] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[169053.129120 <    0.000001>] CR2: 0000000001250000 CR3: 00000000644cf000 CR4: 00000000001406f0
[169053.129122 <    0.000002>] Call Trace:
[169053.129131 <    0.000009>]  ? __module_address+0x145/0x190
[169053.129135 <    0.000004>]  ? is_bpf_text_address+0xe/0x20
[169053.129140 <    0.000005>]  ? __kernel_text_address+0x12/0x40
[169053.129144 <    0.000004>]  ? unwind_get_return_address+0x36/0x50
[169053.129150 <    0.000006>]  ? memcmp+0x5b/0x90
[169053.129152 <    0.000002>]  ? warn_alloc+0x250/0x250
[169053.129156 <    0.000004>]  ? get_page_from_freelist+0x147/0x10f0
[169053.129160 <    0.000004>]  ? save_stack_trace+0x1b/0x20
[169053.129164 <    0.000004>]  ? kasan_kmalloc+0xad/0xe0
[169053.129186 <    0.000022>]  ? ttm_bo_mem_space+0x79/0x6b0 [ttm]
[169053.129196 <    0.000010>]  ? ttm_bo_validate+0x178/0x220 [ttm]
[169053.129200 <    0.000004>]  __alloc_pages_nodemask+0x3c4/0x400
[169053.129203 <    0.000003>]  ? __alloc_pages_slowpath+0x14e0/0x14e0
[169053.129205 <    0.000002>]  ? __save_stack_trace+0x66/0xd0
[169053.129209 <    0.000004>]  ? rb_insert_color+0x32/0x3e0
[169053.129213 <    0.000004>]  ? do_syscall_64+0xea/0x280
[169053.129217 <    0.000004>]  alloc_pages_current+0x75/0x110
[169053.129221 <    0.000004>]  kmalloc_order+0x1f/0x80
[169053.129223 <    0.000002>]  kmalloc_order_trace+0x24/0xa0
[169053.129226 <    0.000003>]  __kmalloc+0x264/0x280
[169053.129383 <    0.000157>]  amdgpu_vram_mgr_new+0x11b/0x3b0 [amdgpu]
[169053.129391 <    0.000008>]  ? reservation_object_reserve_shared+0x64/0xf0
[169053.129401 <    0.000010>]  ttm_bo_mem_space+0x196/0x6b0 [ttm]
[169053.129478 <    0.000077>]  ? add_hole+0x20a/0x220 [drm]
[169053.129489 <    0.000011>]  ttm_bo_validate+0x178/0x220 [ttm]
[169053.129498 <    0.000009>]  ? ttm_bo_evict_mm+0x70/0x70 [ttm]
[169053.129508 <    0.000010>]  ? ttm_check_swapping+0xf6/0x110 [ttm]
[169053.129541 <    0.000033>]  ? drm_vma_offset_add+0x5b/0x80 [drm]
[169053.129572 <    0.000031>]  ? drm_vma_offset_add+0x68/0x80 [drm]
[169053.129584 <    0.000012>]  ttm_bo_init_reserved+0x546/0x630 [ttm]
[169053.129716 <    0.000132>]  amdgpu_bo_do_create+0x28b/0x630 [amdgpu]
[169053.129816 <    0.000100>]  ? amdgpu_fill_buffer+0x580/0x580 [amdgpu]
[169053.129952 <    0.000136>]  ? amdgpu_ttm_placement_from_domain+0x320/0x320 [amdgpu]
[169053.129956 <    0.000004>]  ? try_to_wake_up+0xbe/0x720
[169053.130054 <    0.000098>]  amdgpu_bo_create+0x85/0x400 [amdgpu]
[169053.130153 <    0.000099>]  ? amdgpu_bo_do_create+0x630/0x630 [amdgpu]
[169053.130155 <    0.000002>]  ? wake_up_process+0x15/0x20
[169053.130158 <    0.000003>]  ? insert_work+0xf3/0x110
[169053.130257 <    0.000099>]  amdgpu_gem_object_create+0x101/0x190 [amdgpu]
[169053.130356 <    0.000099>]  ? amdgpu_gem_object_free+0xe0/0xe0 [amdgpu]
[169053.130360 <    0.000004>]  ? tty_insert_flip_string_fixed_flag+0xab/0x110
[169053.130468 <    0.000108>]  amdgpu_gem_create_ioctl+0x364/0x460 [amdgpu]
[169053.130695 <    0.000227>]  ? amdgpu_gem_object_close+0x320/0x320 [amdgpu]
[169053.130767 <    0.000072>]  ? drm_dev_printk+0x120/0x120 [drm]
[169053.130840 <    0.000073>]  ? __wake_up_common_lock+0xe9/0x170
[169053.130989 <    0.000149>]  ? amdgpu_gem_object_close+0x320/0x320 [amdgpu]
[169053.131061 <    0.000072>]  drm_ioctl_kernel+0xae/0xf0 [drm]
[169053.131115 <    0.000054>]  drm_ioctl+0x466/0x520 [drm]
[169053.131238 <    0.000123>]  ? amdgpu_gem_object_close+0x320/0x320 [amdgpu]
[169053.131291 <    0.000053>]  ? drm_getunique+0xf0/0xf0 [drm]
[169053.131426 <    0.000135>]  amdgpu_drm_ioctl+0x78/0xd0 [amdgpu]
[169053.131451 <    0.000025>]  do_vfs_ioctl+0x12e/0x860
[169053.131466 <    0.000015>]  ? apparmor_file_permission+0x1a/0x20
[169053.131489 <    0.000023>]  ? ioctl_preallocate+0x130/0x130
[169053.131503 <    0.000014>]  ? rw_verify_area+0x78/0x140
[169053.131520 <    0.000017>]  ? vfs_write+0x1a2/0x260
[169053.131544 <    0.000024>]  ? syscall_trace_enter+0x1fd/0x520
[169053.131568 <    0.000024>]  ? sched_clock+0x9/0x10
[169053.131584 <    0.000016>]  ? exit_to_usermode_loop+0xc0/0xc0
[169053.131607 <    0.000023>]  ? __fget_light+0xa7/0xc0
[169053.131631 <    0.000024>]  SyS_ioctl+0x79/0x90
[169053.131651 <    0.000020>]  ? __context_tracking_exit.part.4+0x53/0xc0
[169053.131672 <    0.000021>]  ? do_vfs_ioctl+0x860/0x860
[169053.131683 <    0.000011>]  do_syscall_64+0xea/0x280
[169053.131708 <    0.000025>]  entry_SYSCALL64_slow_path+0x25/0x25
[169053.131720 <    0.000012>] RIP: 0033:0x7f70778eef07
[169053.131740 <    0.000020>] RSP: 002b:00007ffc509d13d8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[169053.131756 <    0.000016>] RAX: ffffffffffffffda RBX: 000000000000001e RCX: 00007f70778eef07
[169053.131778 <    0.000022>] RDX: 00007ffc509d1490 RSI: 00000000c0206440 RDI: 0000000000000004
[169053.131798 <    0.000020>] RBP: 00007ffc509d1410 R08: 000000000124c660 R09: 0000000000000000
[169053.131815 <    0.000017>] R10: 000000000000006e R11: 0000000000000202 R12: 000000000124b530
[169053.131835 <    0.000020>] R13: 00007ffc509d1800 R14: 0000000000000000 R15: 0000000000000000
[169053.131854 <    0.000019>] Code: 89 85 c8 fe ff ff e9 5d fc ff ff 8d 42 ff 45 31 f6 c6 85 d0 fe ff ff 01 89 85 c8 fe ff ff e9 45 fc ff ff 41 89 c5 e9 10 fc ff ff <0f> ff e9 ba f1 ff ff 0f ff 89 d8 25 ff ff f7 ff 89 85 8c fe ff
[169053.131933 <    0.000079>] ---[ end trace 8253dc1e92579724 ]---
[169053.132622 <    0.000689>] [drm:amdgpu_gem_object_create [amdgpu]] *ERROR* Failed to allocate GEM object (1000000000000, 6, 4096, -12)
[169053.132877 <    0.000255>] traps: lt-amdgpu_test[22883] general protection ip:7f7077ff6007 sp:7ffc509d13e0 error:0 in libdrm_amdgpu.so.1.0.0[7f7077ff2000+b000]



Instead, I expected that we would need multiple 1GB allocations to trigger the next problem: our TTM code doesn't impose a global limit.

Regards,
Christian.

Am 10.11.2017 um 05:29 schrieb Andrey Grodzovsky:
The following patch series introduces dynamic disabling/enabling of tests
in the amdgpu tester using the CUnit API. Today, test suites that
don't apply to specific HW just return success without executing, while
single tests that can't be executed properly are commented out.

Suites are disabled based on hooks they provide (e.g. incompatible
ASIC or missing blocks), while single tests are disabled explicitly, since
this is usually due to some bug that prevents the tester or the system from
handling the test without crashing or killing the tester.

This series also includes a minor cleanup and a new test for memory over-allocation.

Andrey Grodzovsky (4):
   amdgpu: Add functions to disable suites and tests.
   amdgpu: Use new suite/test disabling functionality.
   amdgpu: Move memory alloc tests in bo suite.
   amdgpu: Add memory over allocation test.

 tests/amdgpu/amdgpu_test.c    | 169 +++++++++++++++++++++++++++++++++++++-----
 tests/amdgpu/amdgpu_test.h    |  46 ++++++++++++
 tests/amdgpu/basic_tests.c    |  49 ------------
 tests/amdgpu/bo_tests.c       |  69 +++++++++++++++++
 tests/amdgpu/deadlock_tests.c |   8 +-
 tests/amdgpu/uvd_enc_tests.c  |  81 ++++++++------------
 tests/amdgpu/vce_tests.c      |  65 ++++++++--------
 tests/amdgpu/vcn_tests.c      |  74 ++++++++----------
 8 files changed, 363 insertions(+), 198 deletions(-)



_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
