On Sun, 2018-02-04 at 10:58 -0800, Matt Turner wrote:
> On Wed, Dec 13, 2017 at 11:21 PM, Iago Toral <ito...@igalia.com>
> wrote:
> > On Tue, 2017-12-12 at 08:20 +0100, Iago Toral wrote:
> >
> > On Mon, 2017-12-11 at 08:01 -0800, Jason Ekstrand wrote:
> >
> > On Mon, Dec 11, 2017 at 12:55 AM, Iago Toral <ito...@igalia.com>
> > wrote:
> >
> > This didn't get any reviews yet. Any takers?
> >
> > On Fri, 2017-12-01 at 13:46 +0100, Iago Toral Quiroga wrote:
> > > Otherwise loop unrolling will fail to see the actual cost of
> > > the unrolling operations when the loop body contains 64-bit
> > > integer instructions, and especially when the divmod64 lowering
> > > applies, since its lowering is quite expensive.
> > >
> > > Without this change, some in-development CTS tests for int64
> > > get stuck forever trying to register allocate a shader with
> > > over 50K SSA values. The large number of SSA values is the result
> > > of NIR first unrolling multiple seemingly simple loops that involve
> > > int64 instructions, only to then lower these instructions to produce
> > > a massive pile of code (due to the divmod64 lowering in the unrolled
> > > instructions).
> > >
> > > With this change, loop unrolling will see the loops with the int64
> > > code already lowered and will realize that it is too expensive to
> > > unroll.
> >
> > Hrm... I'm not quite sure what I think of this. I put it after
> > nir_optimize because I wanted opt_algebraic to be able to work its
> > magic and hopefully remove a bunch of int64 ops before we lower them.
> > In particular, we have optimizations to remove integer division and
> > replace it with shifts. However, loop unrolling does need to happen
> > before lower_indirect_derefs so that lower_indirect_derefs will do as
> > little work as possible.
> >
> > This is a bit of a pickle... I don't really want to add a third
> > brw_nir_optimize call. It probably wouldn't be the end of the world,
> > but it does add compile time.
> >
> > One crazy idea which I don't think I like would be to have a quick
> > pass that walks the IR and sees if there are any 64-bit SSA values.
> > If there are, we run brw_nir_optimize without loop unrolling, then
> > 64-bit lowering, and then we go into the normal brw_nir_optimize.
> >
> > With the constraints you mention above, I am not sure that we have
> > many more options... what if we always run opt_algebraic first,
> > followed by int64 lowering, before the first nir_optimize? That would
> > only add an extra opt_algebraic instead of a full nir_optimize. Would
> > that be better than adding that 64-bit SSA scan pre-pass?
> >
> > We still need to make a decision on this: does my proposal sound
> > better than the other options on the table? If not, I guess we should
> > go with the 64-bit SSA scan pre-pass.
>
> Realized I never responded to this -- sorry.
>
> Yes, I think your proposal sounds good.
Thanks, just sent a v2.

Iago

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev