https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114596

            Bug ID: 114596
           Summary: [OpenMP] "declare variant" scoring seems incorrect for
                    construct selectors
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: sandra at gcc dot gnu.org
  Target Milestone: ---

Created attachment 57882
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57882&action=edit
reduced test case

Background: I've been working on integrating the dynamic selector support added
for "metadirective" (patches previously submitted but not yet approved) with
the existing support for "declare variant", rewriting a bunch of code so that
they both share the same representations and functions for matching, scoring,
and sorting.  Those patches aren't yet ready to submit and are stage 1
material, but in comparing differences in test results between my new
implementation and the previous behavior, it looks to me like the scoring for
construct selectors in "declare variant" is pretty borked on mainline.

The attached testcase is a reduced version of
c-c++-common/gomp/declare-variant-12.c.  The "16+8+4" comment on f14 agrees
with what the code is doing, but it seems incorrect according to the spec.  By
my reading, the OpenMP context at the point of call to f17 is {teams, parallel,
for, parallel, for} with associated scores {1, 2, 4, 8, 16}, per sections 7.1
and 7.3 of the 5.2 spec, respectively.  My understanding of the latter part of
item (1) in 7.3

"if the traits that correspond to the construct selector set appear multiple
times in the OpenMP context, the highest valued subset of context traits that
contains all selectors in the same order are used"

is that the matching selectors don't have to appear contiguously, so the score
would be 1+8+16.  In discussion with Tobias, he thinks there is still some
ambiguity about which of the duplicates is preferred for the match, though.

I added some instrumentation to the code to try to figure out how it was coming
up with "16+8+4"; the patch is attached, and produced output 

$ install/bin/x86_64-linux-gnu-gcc declare-variant-12.c -fopenmp
-foffload=disable -S
<snip>
codes[0] = omp_for
codes[1] = omp_parallel
codes[2] = omp_for
codes[3] = omp_parallel
codes[4] = omp_teams
constructs[0] = omp_for
constructs[1] = omp_parallel
constructs[2] = omp_teams
omp_teams, i = 2, j = 4, position = 4
omp_parallel, i = 1, j = 3, position = 3
omp_for, i = 0, j = 2, position = 2
constructs[0] = omp_for, scores[0] = 4, score = 16
constructs[1] = omp_parallel, scores[1] = 3, score = 8
constructs[2] = omp_teams, scores[2] = 2, score = 4

For selector construct = {teams, parallel, for}
  score1 = 29   score2 = 29

The "codes" array is the OpenMP context, "constructs" are the traits in the
construct selector, and the "scores" array does not contain scores, but rather
the positions of the elements of the "constructs" array in the "codes" array.

I believe there are at least three bugs here:

(1) The constructs array is being processed backwards from n-1 to 0, but the
scores array is indexed from 0 to n-1, and later it's assumed both arrays are
in the same order.

(2) The codes array is backwards and so are the position values, and there's no
adjustment to subtract the position from the length of the "codes" array in
computing the corresponding score.

(3) It's choosing the wrong subset from the duplicate traits, preferring the
should-be-lower-valued outer ones over the higher-valued inner ones.

I don't know if anybody wants to tackle a bug fix for this code for GCC 14.  As
I've said, I've got a complete rewrite I'm planning to submit early in stage 1,
which I hope will be an improvement from an engineering perspective in terms of
readability/maintainability, with cleaner and better-documented interfaces and
references to the spec to explain what it is doing.

Reply via email to