Re: [Mesa-dev] [PATCH 2/2] ac, radeonsi: reduce optimizations for complex compute shaders on older APUs

Dave Airlie Wed, 01 Aug 2018 13:12:01 -0700

Sounds like a major project for someone to fix LLVM. Doesn't AMD have
compiler devs?


Acked-by: Dave Airlie <[email protected]>

Dave.

On Thu., 2 Aug. 2018, 04:43 Marek Olšák, <[email protected]> wrote:

> On Mon, Jul 23, 2018 at 11:33 PM, Timothy Arceri <[email protected]>
> wrote:
> > On 24/07/18 11:15, Marek Olšák wrote:
> >>
> >> On Fri, Jul 20, 2018 at 12:53 AM, Dave Airlie <[email protected]> wrote:
> >>>
> >>> On 20 July 2018 at 13:12, Marek Olšák <[email protected]> wrote:
> >>>>
> >>>> From: Marek Olšák <[email protected]>
> >>>>
> >>>> To make dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23
> >>>> finish sooner on the older CPUs. (otherwise it gets killed and we fail
> >>>> the test)
> >>>
> >>>
> >>> I think this is possibly a bad idea, since it's clear LLVM has some
> >>> pathological behaviour in the AMDGPU backend for this shader and we
> >>> are just papering over it.
> >>>
> >>> A quick dig into LLVM shows horrible misuse of a SmallVector data
> >>> structure for what ends up having 2000 entries in it.
> >>>
> >>> I'm not going to outright NAK this, but it would be nice to have it
> >>> accompanied by a pointer to an LLVM bug against the AMDGPU backend
> >>> for the pathological case.
> >>
> >>
> >> Even if I comment out the push_back call in LLVM, it's still too slow
> >> (the dEQP test times out and fails). LLVMCodeGenLevelLess is faster,
> >> but I don't know yet if it's enough for the test.
> >
> >
> > I hard-coded the second buffer block to column_major rather than
> > row_major, which reduced total run time from 15 -> 9 seconds on my
> > machine. So it seems temps would definitely help. Proper packing
> > support would also likely help a little more but not as much.
>
> 15 -> 9 is not enough. We need to decrease the compile time by 60% or more.
>
> For Dave: Commenting out the "push_back" call in LLVM is also not enough.
>
> Only LLVMCodeGenLevelLess gives the desired improvement (~60%), though
> the test is dangerously close to timing out and getting killed.
> LLVMCodeGenLevelNone is fastest, but the bytecode is horrible (live
> variables between blocks are always spilled).
>
> If there is no straightforward way to improve compile times (I think
> there isn't), I'll have to push this.
>
> Marek
>

_______________________________________________
mesa-dev mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
