Hi Luis,

Well, what "drm/amdgpu: defer test IBs on the rings at boot (V3)" does is delay the IB test a bit and run it asynchronously to the rest of the bootup.

So what most likely happens is that some hardware feature (like power or clock gating) which doesn't work correctly on your system kicks in and causes the IB test to fail.

It's rather likely that this problem is also responsible for the crashes you experience later on. So I think we should concentrate on fixing that.

Regards,
Christian.

On 11.07.2018 at 23:27, Luís Mendes wrote:
Hi Jim,

I followed your suggestion and was able to bisect the kernel patches.
The offending patch is: drm/amdgpu: defer test IBs on the rings at boot (V3)
commit:

2c773de2ecb8c327f2448bd1eecad224e9227087 <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.18-rc4&id=2c773de2ecb8c327f2448bd1eecad224e9227087>


After reverting this patch the IB test succeeded with kernel v4.18-rc4 on both systems and the amdgpu driver was correctly loaded both on SAPPHIRE RX550 4GB and on SAPPHIRE RX460 2GB.

The GPU hang remains, however.
I will try to configure a remote IPMI connection, or set up a serial console for the kernel, to see what is happening during boot.

Thanks & Regards,
Luís

On Wed, Jul 11, 2018 at 10:56 AM, jimqu <ji...@amd.com> wrote:

    Hi Luis,


    Let us track down the issues one by one.


    IB test failure:

    This should be a regression in 4.18; you can bisect the kernel
    patches.

    GPU hang:

    Fix the IB test failure first.


    Thanks

    JimQu



    On 2018-07-11 17:34, Luís Mendes wrote:
    Hi Jim,

    Thanks for your interest in this issue. Actually there are
    multiple issues, not only the failing IB ring test, as I am
    having quite some trouble getting the SAPPHIRE RX 550 4GB on a
    Tyan S7025 and the SAPPHIRE RX 460 2GB on a TYAN S7002 to work,
    both systems using the same Ubuntu 18.04 with a vanilla kernel.

    *1. Could you also test an earlier kernel, v4.17 or v4.16?*
    I've tested kernels v4.17.5 and v4.16.6 on the same system and both
    pass the IB ring test, and the system boots into X using the
    NVIDIA card as the display-connected card.
    dmesg log attached for kernel 4.17.5, file
    TYAN_S7025_kernelv4.17.5_amdgpu_IB_ring_test_OK.txt.

    *2. Could you test the issue with only amdgpu?*
    - I've tested on a TYAN S7002 system with a single SAPPHIRE RX
    460 2GB, with the on-board VGA enabled and used as primary display.
    Kernel v4.18-rc4 fails the IB ring test, but the system is able to
    enter X through the on-board VGA.
    dmesg log attached for kernel 4.18-rc4, file
    TYAN_S7002_kernel_v4.18-rc4_IB_ring_test_fail.txt.

    - Same TYAN S7002 system, but now with the on-board VGA disabled and
    the RX 460 used as the display-connected card.
    Kernels v4.17.5 and v4.16.6 pass the IB ring test, but the GPU
    hangs before entering X. I don't have logs for these yet.

    Regards,
    Luís Mendes
    Aparapi contributor and MSc Researcher





    On Wed, Jul 11, 2018 at 3:49 AM, Qu, Jim <jim...@amd.com> wrote:

        Hi Luis,

        1. Could you also test an earlier kernel, v4.17 or v4.16?
        2. Could you test the issue with only amdgpu?

        Thanks
        JimQu

        ________________________________________
        From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> on behalf of
        Luís Mendes <luis.p.men...@gmail.com>
        Sent: July 11, 2018 6:04:00
        To: Michel Dänzer; Koenig, Christian; amd-gfx list
        Subject: Re: Regression with kernel 4.18 - AMD RX 550 fails IB
        ring test on power-up

        Hi,

        The issue remains in kernel 4.18-rc4 using the SAPPHIRE RX 550 4GB.

        Logs are attached.

        Regards,
        Luis

        On Tue, Jun 26, 2018 at 10:08 AM, Luís Mendes
        <luis.p.men...@gmail.com> wrote:
        Hi,

        I've tried kernel 4.18-rc2 on a system with an NVIDIA GTX 1050
        Ti and an AMD RX 550 4GB and the RX 550 card is failing the
        IB ring test.

        [    5.033217] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR*
        amdgpu: ib test failed (scratch(0xC040)=0xFFFFFFFF)
        [    5.033264] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR*
        amdgpu: failed testing IB on ring 6 (-22).

        Please see the attached log.

        Regards,
        Luís
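
For anyone comparing the attached dmesg logs across kernels, the two error lines quoted in the original report have a regular shape, so they can be picked out automatically. What follows is only a minimal sketch, not anything used in this thread: the function name find_ib_test_failures and the two regular expressions are assumptions modelled on the lines quoted above, and other kernel versions may word the messages differently.

```python
import re

# Patterns modelled on the two amdgpu error lines quoted in this thread.
# They are assumptions about the message format, not a kernel interface.
SCRATCH_RE = re.compile(
    r"ib test failed \(scratch\((0x[0-9A-Fa-f]+)\)=(0x[0-9A-Fa-f]+)\)"
)
RING_RE = re.compile(r"failed testing IB on ring (\d+) \((-?\d+)\)")


def find_ib_test_failures(dmesg_text):
    """Return the scratch-register and ring failures found in dmesg text."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(dmesg_text.split())
    scratch = SCRATCH_RE.findall(flat)  # list of (register, value) strings
    rings = [(int(ring), int(err)) for ring, err in RING_RE.findall(flat)]
    return {"scratch": scratch, "rings": rings}


if __name__ == "__main__":
    sample = """
    [    5.033217] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR*
    amdgpu: ib test failed (scratch(0xC040)=0xFFFFFFFF)
    [    5.033264] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR*
    amdgpu: failed testing IB on ring 6 (-22).
    """
    print(find_ib_test_failures(sample))
```

Run against the text of a saved dmesg log, an empty result for both keys would suggest that the boot passed the IB ring test, while a ring entry such as (6, -22) matches the failure reported above (-22 is -EINVAL on Linux).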





_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
