https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102435
Bug ID: 102435
Summary: gcc 9: aarch64 -ftree-loop-vectorize results in wrong code
Product: gcc
Version: 9.4.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: dimi...@unified-streaming.com
Target Milestone: ---
We noticed a problem with a loop optimization enabled by -O3 on a program
targeting AArch64. It turns out that this problem is specifically caused by
-ftree-loop-vectorize, and has actually been fixed by (or as a side-effect of)
commit https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=c89366b12ff4f362
("[AArch64] Support vectorising with multiple vector sizes") by Richard
Sandiford.
However, this commit landed on master during gcc 10 development, so while the
problem does not occur with gcc 10.x and 11.x, it *does* occur with 9.x. This
matters to us because gcc 9 is the default compiler on Ubuntu 20.04 for arm64,
e.g. gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04).
Reduced test case:
// g++ -std=c++17 -O2 -ftree-loop-vectorize testcase.cpp
// or
// g++ -std=c++17 -O3 testcase.cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <utility> // std::move
#include <vector>

struct sample_t
{
  sample_t(uint64_t dts, uint32_t duration)
  : dts_(dts)
  , duration_(duration)
  , cto_(0)
  , sample_description_index_(0)
  , pos_(0)
  , size_(0)
  , flags_(0)
  , aux_pos_(0)
  , aux_size_(0)
  {
  }

  uint64_t dts_;
  uint32_t duration_;
  int32_t cto_;
  uint32_t sample_description_index_;
  uint64_t pos_;
  uint32_t size_;
  uint32_t flags_;
  uint64_t aux_pos_;
  uint32_t aux_size_;
};

typedef std::vector<sample_t> samples_t;

__attribute__((__noinline__))
samples_t get_result(samples_t&& samples)
{
  uint64_t base_media_decode_time = ~0;
  auto first = samples.begin();
  auto last = samples.end();
  if(first != last)
  {
    base_media_decode_time = first->dts_;
    uint32_t duration = 0;
    // Each sample's duration is the distance to the next sample's dts.
    for(--last; first != last; ++first)
    {
      duration = static_cast<uint32_t>(first[1].dts_ - first->dts_);
      first->duration_ = duration;
    }
    // The last sample has no successor; it reuses the previous duration.
    first->duration_ = duration;
  }
  return samples;
}

int main(void)
{
  samples_t samples_in = { {0, 3}, {3, 3}, {6, 3}, {9, 1}, {10, 2} };
  samples_t samples_out = get_result(std::move(samples_in));
  for(sample_t sample : samples_out)
  {
    std::cout << sample.dts_ << ", " << sample.duration_ << '\n';
  }
  // Expected output:
  // 0, 3
  // 3, 3
  // 6, 3
  // 9, 1
  // 10, 1
  //
  // Bad output:
  // 0, 3
  // 3, 0
  // 6, 0
  // 9, 0
  // 10, 0
  return 0;
}
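For bisecting or scripted checks, the same loop can be made self-checking
instead of relying on reading the printed output. The following is only a
sketch: count_mismatches() is a hypothetical helper that is not part of the
original test case, the expected durations are copied from the "Expected
output" comment in the test case, the unused base_media_decode_time variable
is dropped, and default member initializers stand in for the constructor's
init list. It has not been verified that this variant still triggers the
miscompile on the affected gcc 9 builds.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Same members, in the same order, as sample_t in the test case.
struct sample_t
{
  sample_t(uint64_t dts, uint32_t duration)
  : dts_(dts)
  , duration_(duration)
  {
  }

  uint64_t dts_;
  uint32_t duration_;
  int32_t cto_ = 0;
  uint32_t sample_description_index_ = 0;
  uint64_t pos_ = 0;
  uint32_t size_ = 0;
  uint32_t flags_ = 0;
  uint64_t aux_pos_ = 0;
  uint32_t aux_size_ = 0;
};

typedef std::vector<sample_t> samples_t;

// The loop under test, kept out of line as in the test case.
__attribute__((__noinline__))
samples_t get_result(samples_t&& samples)
{
  auto first = samples.begin();
  auto last = samples.end();
  if(first != last)
  {
    uint32_t duration = 0;
    for(--last; first != last; ++first)
    {
      duration = static_cast<uint32_t>(first[1].dts_ - first->dts_);
      first->duration_ = duration;
    }
    first->duration_ = duration;
  }
  return samples;
}

// Hypothetical helper: returns how many durations differ from the
// values listed under "Expected output" in the test case.
int count_mismatches()
{
  samples_t samples_in = { {0, 3}, {3, 3}, {6, 3}, {9, 1}, {10, 2} };
  const uint32_t expected[] = { 3, 3, 3, 1, 1 };
  samples_t samples_out = get_result(std::move(samples_in));
  int mismatches = 0;
  for(std::size_t i = 0; i != samples_out.size(); ++i)
  {
    if(samples_out[i].duration_ != expected[i])
    {
      ++mismatches;
    }
  }
  return mismatches;
}
```

Returning count_mismatches() != 0 from main would give a non-zero exit status
on wrong code, which is the form a scripted bisection typically expects.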
Note that it appears vital that the struct sample_t is fairly large: removing
all of the members after the first two makes the output correct, even with
gcc 9 and -ftree-loop-vectorize. I have not determined precisely what the
cutoff size is.
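To help narrow down the cutoff, the element stride that the loop steps over
can be computed directly. The sketch below assumes the usual LP64 layout rules
(8-byte alignment for uint64_t, as on AArch64 Linux). sample_full_t,
sample_small_t, padded_sample_t and stride_of are hypothetical names
introduced for illustration and are not part of the test case, and it is only
an assumption that the stride is the property that matters.

```cpp
#include <cstddef>
#include <cstdint>

// Member layout of sample_t from the test case (constructor omitted).
struct sample_full_t
{
  uint64_t dts_;
  uint32_t duration_;
  int32_t cto_;
  uint32_t sample_description_index_;
  uint64_t pos_;
  uint32_t size_;
  uint32_t flags_;
  uint64_t aux_pos_;
  uint32_t aux_size_;
};

// Variant with only the first two members, reported to give correct output.
struct sample_small_t
{
  uint64_t dts_;
  uint32_t duration_;
};

// Hypothetical probe: the first two members plus N extra 32-bit words
// (N >= 1), for stepping the struct size between the two cases above.
template <std::size_t N>
struct padded_sample_t
{
  uint64_t dts_;
  uint32_t duration_;
  uint32_t pad_[N];
};

// Distance in bytes between consecutive elements of a std::vector<T>.
template <typename T>
constexpr std::size_t stride_of()
{
  return sizeof(T);
}
```

Under these assumptions the full struct occupies 56 bytes and the two-member
variant 16, so substituting padded_sample_t<N> with odd N from 1 to 11 into
the test case would cover strides from 16 to 56 bytes in 8-byte steps.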