https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125540

            Bug ID: 125540
           Summary: [15 Regression] Vectorizer regression:
                    *std::max_element falls back to scalar cmovl argmax
                    loop
           Product: gcc
           Version: 15.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: torsten.mandel at sap dot com
  Target Milestone: ---

Keywords: missed-optimization
Known to work: 13.2.1
Known to fail: 15.2.0

--- Description ---

GCC 13 vectorizes `*std::max_element(begin, end)` to a SIMD vpmaxsd reduction
when the returned iterator is only dereferenced (never escapes). GCC 15
regresses to a scalar cmovl-based argmax loop for the same code.

The regression affects both `std::vector<int>::iterator` and raw pointer
iterators (via `std::span<const int>`), and reproduces at every x86 ISA level
tested (SSE4.1, AVX2, AVX512F). Semantically equivalent value-recurrence
loops and `std::ranges::max` continue to vectorize on GCC 15.

--- Reproducer (argmax_repro.cpp) ---

#include <algorithm>
#include <ranges>
#include <span>
#include <vector>

int iter_vec(const std::vector<int>& v) {
    return *std::max_element(v.begin(), v.end());
}

int iter_span(std::span<const int> s) {
    return *std::max_element(s.begin(), s.end());
}

int loop_vec(const std::vector<int>& v) {
    int m = v.front();
    for (int x : v) if (x > m) m = x;
    return m;
}

int loop_span(std::span<const int> s) {
    int m = s.front();
    for (int x : s) if (x > m) m = x;
    return m;
}

int ranges_vec(const std::vector<int>& v) {
    return std::ranges::max(v);
}

int ranges_span(std::span<const int> s) {
    return std::ranges::max(s);
}

--- Command to reproduce ---

  g++ -std=c++20 -O2 -mavx2 -S -o argmax_repro.s argmax_repro.cpp

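Then check for vpmaxsd. Note that grep -c counts matching lines across the whole
file, so a nonzero count alone does not show that iter_vec / iter_span were
vectorized; inspect those two function bodies individually:

  grep -c vpmaxsd argmax_repro.s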

--- Expected result ---

All six functions emit vpmaxsd instructions (SIMD max reduction), as GCC 13
does. The iterator in std::max_element is only dereferenced at the call site
and never escapes, so the pointer recurrence is dead and the loop is
semantically a value max.
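
The equivalence claimed here can be sketched at source level. The helpers below
are hypothetical illustrations, not libstdc++ code: max_via_pointer mirrors the
pointer-carrying shape a typical std::max_element implementation has, and
max_via_value is the value-carrying form. Both assume a non-empty range.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: the loop-carried state is a POINTER to the best
// element so far. Assumes first != last.
int max_via_pointer(const int* first, const int* last) {
    const int* best = first;
    for (const int* it = first + 1; it != last; ++it)
        if (*best < *it) best = it;   // keeps the first maximum on ties
    return *best;                     // the pointer's only use is this load
}

// Hypothetical helper: the loop-carried state is the VALUE itself.
// Assumes first != last.
int max_via_value(const int* first, const int* last) {
    int m = *first;
    for (const int* it = first + 1; it != last; ++it)
        if (m < *it) m = *it;
    return m;
}

// Checks that both forms return the same value for a non-empty vector.
bool forms_agree(const std::vector<int>& v) {
    assert(!v.empty());
    const int* b = v.data();
    const int* e = b + v.size();
    return max_via_pointer(b, e) == max_via_value(b, e);
}
```

Because only the pointed-to value is observed, which of several equal maxima
the pointer refers to does not affect the result, so the two forms should be
interchangeable whenever the iterator does not escape.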

--- Actual result (GCC 15) ---

iter_vec and iter_span emit a scalar cmovl argmax loop with zero vpmaxsd
instructions. The remaining four functions (loop_vec, loop_span, ranges_vec,
ranges_span) continue to vectorize correctly.

--- Assembly analysis (AVX2, -O2) ---

iter_vec on GCC 13.2.1 — SIMD reduction via vpmaxsd:

  _Z8iter_vecRKSt6vectorIiSaIiEE:
      movq    8(%rdi), %rdx
      movq    (%rdi), %rsi
      cmpq    %rdx, %rsi
      je      .L2
      vmovd   (%rsi), %xmm0
      leaq    4(%rsi), %rax
      vmovd   %xmm0, %ecx
      cmpq    %rax, %rdx
      je      .L1
      movq    %rdx, %rcx
      subq    %rax, %rcx
      andl    $4, %ecx
      je      .L4
      vmovd   (%rax), %xmm1
      leaq    8(%rsi), %rax
      vpmaxsd %xmm1, %xmm0, %xmm0       ; <-- SIMD integer max
      vmovd   %xmm0, %ecx
      cmpq    %rax, %rdx
      je      .L1
  .L4:
      vmovd   (%rax), %xmm1
      addq    $8, %rax
      vpmaxsd %xmm1, %xmm0, %xmm0       ; <-- SIMD integer max
      vmovd   -4(%rax), %xmm1
      vpmaxsd %xmm1, %xmm0, %xmm0       ; <-- SIMD integer max
      vmovd   %xmm0, %ecx
      cmpq    %rax, %rdx
      jne     .L4
  .L1:
      movl    %ecx, %eax
      ret

iter_vec on GCC 15.2.0 — scalar cmovl argmax (REGRESSION):

  _Z8iter_vecRKSt6vectorIiSaIiEE:
      movq    8(%rdi), %rcx
      movq    (%rdi), %rdx
      cmpq    %rcx, %rdx
      je      .L2
      leaq    4(%rdx), %rax
      cmpq    %rax, %rcx
      je      .L2
  .L4:
      movl    (%rax), %esi
      cmpl    %esi, (%rdx)
      cmovl   %rax, %rdx                 ; carries the POINTER across iterations
      addq    $4, %rax
      cmpq    %rax, %rcx
      jne     .L4
  .L2:
      movl    (%rdx), %eax
      ret

The iter_span function compiles to the same pattern as iter_vec under each
compiler, confirming that the iterator type (vector::iterator vs raw
const int*) is not the deciding factor.
