https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #33 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #32)
> Note I don't think the unrolling is excessive - store motion then applying
> to all count[] and all computations hoisted out of the loop may be a bit
> too much for register pressure though, especially since we're using
> flag-based store-motion. But it causes the stores to be materialized
> on all exits of the loop which means we end up with N*N conditional stores :/

In general, param_max_peel_branches = 31 and param_max_completely_peel_times = 16
may not be very aggressive. For in_pack_i4.c, the loop iterates at most 13+1
times, so it is then unrolled. For this loop, though, unrolling increases code
size and does not help performance.

>
> I guess SM could be improved here.

Thanks all!
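
To make the "stores materialized on all exits" point concrete, here is a minimal hand-written sketch in C of what flag-based store motion roughly does to a loop with an early exit. It is not taken from in_pack_i4.c and is not GCC's actual output: the function names `count_orig` and `count_sm`, the loop body, and the `stored` flag are all hypothetical, and the sketch assumes `count` always points to valid memory so that the hoisted load is safe.

```c
#include <stddef.h>

/* Original shape: a conditional store to *count inside a loop that has
   an early exit as well as the normal exit.  (Hypothetical example.) */
void count_orig(const int *a, size_t n, int *count)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] < 0)
            return;              /* early exit */
        if (a[i] > 0)
            *count += 1;         /* conditional store in the loop */
    }
}

/* Rough hand-written equivalent after flag-based store motion: the value
   lives in a local, a flag records whether a store would have happened,
   and the store is materialized conditionally on EVERY loop exit. */
void count_sm(const int *a, size_t n, int *count)
{
    int tmp = *count;            /* load hoisted out of the loop
                                    (assumes count is valid) */
    int stored = 0;              /* flag: did the loop write *count? */

    for (size_t i = 0; i < n; i++) {
        if (a[i] < 0) {
            if (stored)
                *count = tmp;    /* conditional store on the early exit */
            return;
        }
        if (a[i] > 0) {
            tmp += 1;
            stored = 1;
        }
    }
    if (stored)
        *count = tmp;            /* conditional store on the normal exit */
}
```

With one moved variable and two exits this gives two conditional stores. If unrolling leaves N exits and store motion is applied to N elements of count[], each exit needs a conditional store per element, which is where the N*N figure in the quoted comment comes from.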