Hi!

On Tue, Nov 30, 2021 at 01:05:48PM +0800, Kewen.Lin wrote:
> on 2021/11/30 6:06 AM, Segher Boessenkool wrote:
> > On Tue, Sep 28, 2021 at 04:16:04PM +0800, Kewen.Lin wrote:
> >>       unsigned adjusted_cost = (nunits == 2) ? 2 : 1;
> >>       unsigned extra_cost = nunits * adjusted_cost;
> >
> >> For V2DI/V2DF, it uses 2 penalized cost for each scalar load
> >> while for the other modes, it uses 1.
> >
> > So for V2D[IF] we get 4, for V4S[IF] we get 4, for V8HI it's 8, and
> > for V16QI it is 16?  Pretty terrible as well, heh (I would expect all
> > vector ops to be similar cost).
>
> But for different vector units it has different number of loads, it seems
> reasonable to have more costs when it has more loads to be fed into those
> limited number of load/store units.

More expensive, yes.  This expensive?  That doesn't look optimal :-)

> > This also suggests we should cost vector construction separately, which
> > would pretty obviously be a good thing anyway (it happens often, it has
> > a quite different cost structure).
>
> vectorizer does model vector construction separately, there is an enum
> vect_cost_for_stmt *vec_construct*, normally it works well.  But for this
> bwaves hotspot, it requires us to do some more penalization as evaluated,
> so we put the penalized cost onto this special vector construction when
> some heuristic thresholds are met.

Ah, heuristics.  We can adjust them forever :-)


Segher
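
For reference, the per-mode numbers discussed above (4, 4, 8, 16) can be
reproduced with a small standalone sketch.  The two assignments are taken
from the quoted patch hunk; everything else (the function names, the mode
table, the printing) is hypothetical scaffolding added for illustration
and is not the actual GCC code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: wraps the two quoted lines so they can be run
   in isolation.  NUNITS is the number of elements in the vector.  */
unsigned
penalized_extra_cost (unsigned nunits)
{
  /* Two-element vectors (V2DI/V2DF) are charged 2 per scalar load,
     every other element count is charged 1.  */
  unsigned adjusted_cost = (nunits == 2) ? 2 : 1;
  unsigned extra_cost = nunits * adjusted_cost;
  return extra_cost;
}

/* Check the values listed in the thread: V2D[IF] and V4S[IF] both
   come out at 4, V8HI at 8, V16QI at 16.  */
void
check_quoted_costs (void)
{
  assert (penalized_extra_cost (2) == 4);
  assert (penalized_extra_cost (4) == 4);
  assert (penalized_extra_cost (8) == 8);
  assert (penalized_extra_cost (16) == 16);
}

/* Print one line per mode, so the jump between modes is easy to see.  */
void
print_extra_costs (void)
{
  static const struct { const char *mode; unsigned nunits; } modes[] = {
    { "V2DI/V2DF", 2 },
    { "V4SI/V4SF", 4 },
    { "V8HI", 8 },
    { "V16QI", 16 },
  };

  for (size_t i = 0; i < sizeof modes / sizeof modes[0]; i++)
    printf ("%-10s nunits=%2u extra_cost=%2u\n", modes[i].mode,
            modes[i].nunits, penalized_extra_cost (modes[i].nunits));
}
```

Calling check_quoted_costs () and then print_extra_costs () from a main ()
shows the shape of the heuristic: the extra cost is flat at 4 for the two
widest element sizes and then grows linearly with the element count.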