https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122277

            Bug ID: 122277
           Summary: Costing VF 1 instead of 4 since
                    r16-4411-gb6e802fd55d37e
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rdapp at gcc dot gnu.org
  Target Milestone: ---
            Target: riscv

Since r16-4411-gb6e802fd55d37e I'm seeing different vectorization for x264
SATD.

Source is in gcc.target/riscv/rvv/autovec/pr118019-2.c.

Built with -O3 -march=rv64gcv_zvl512b x264-pixel-satd-8x4.c
-mno-vector-strict-align

We now choose a VLA mode (RVVM1SI) for vectorizing the second loop

  for (int i = 0; i < 4; i++)
    {
      HADAMARD4 (a0, a1, a2, a3, tmp[0][i], tmp[1][i], tmp[2][i], tmp[3][i]);
      sum += abs2 (a0) + abs2 (a1) + abs2 (a2) + abs2 (a3);
    }

instead of V4QI.

Costing prefers RVVM1SI due to its VF={16, 16} (= 4 via estimate in
compare_inside_loop_cost) while V4QI is now considered to have VF=1.
Before the change, V4QI also had VF=4.

Thus, we scale the V4QI loop costs by 4 and the RVVM1SI costs by 1, making
RVVM1SI appear cheaper.
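
To illustrate the scaling, here is a minimal sketch of the cross-multiplied
comparison as I understand it. This is not the vectorizer's code: struct
candidate and cheaper_p are hypothetical names, and the costs in the note
below are made up purely to show the effect of the VF.

```c
/* Hypothetical stand-in for one vectorization candidate: the cost of one
   vector loop iteration and the VF the cost model attributes to it.  */
struct candidate
{
  const char *mode;
  unsigned inside_cost;
  unsigned estimated_vf;
};

/* Return nonzero if A is considered cheaper than B.  Each inside cost is
   multiplied by the *other* candidate's VF so that both sides refer to the
   same number of scalar iterations.  */
static int
cheaper_p (const struct candidate *a, const struct candidate *b)
{
  unsigned long rel_a = (unsigned long) a->inside_cost * b->estimated_vf;
  unsigned long rel_b = (unsigned long) b->inside_cost * a->estimated_vf;
  return rel_a < rel_b;
}
```

With assumed inside costs of 30 for RVVM1SI (VF=4) and 10 for V4QI, a V4QI
VF of 1 gives 30 * 1 = 30 vs. 10 * 4 = 40, so RVVM1SI wins.  With a V4QI VF
of 4 the comparison is 30 * 4 = 120 vs. 10 * 4 = 40 and V4QI wins.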

I can add more later, just dumping the initial info I have available right now,
am a bit distracted today.
