https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114596
Bug ID: 114596 Summary: [OpenMP] "declare variant" scoring seems incorrect for construct selectors Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: sandra at gcc dot gnu.org Target Milestone: --- Created attachment 57882 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57882&action=edit reduced test case Background: I've been working on integrating the dynamic selector support added for "metadirective" (patches previously submitted but not yet approved) with the existing support for "declare variant", rewriting a bunch of code so that they both share the same representations and functions for matching, scoring, and sorting. Those patches aren't yet ready to submit and are stage 1 material, but in comparing differences in test results between my new implementation and the previous behavior, it looks to me like the scoring for construct selectors in "declare variant" is pretty borked on mainline. The attached testcase is a reduced version of c-c++-common/gomp/declare-variant-12.c. The "16+8+4" comment on f14 agrees with what the code is doing, but it seems incorrect according to the spec. By my reading, the OpenMP context at the point of call to f17 is {teams, parallel, for, parallel, for} with associated scores {1, 2, 4, 8, 16}, per sections 7.1 and 7.3 of the 5.2 spec, respectively. My understanding of the latter part of item (1) in 7.3 "if the traits that correspond to the construct selector set appear multiple times in the OpenMP context, the highest valued subset of context traits that contains all selectors in the same order are used" is that the matching selectors don't have to appear contiguously, so the score would be 1+8+16. In discussion with Tobias, he thinks there is still some ambiguity about which of the duplicates is preferred for the match, though. I added some instrumentation to the code to try to figure out how it was coming up with "16+8+4"; the patch is attached, and produced output $ install/bin/x86_64-linux-gnu-gcc declare-variant-12.c -fopenmp -foffload=disable -S <snip> codes[0] = omp_for codes[1] = omp_parallel codes[2] = omp_for codes[3] = omp_parallel codes[4] = omp_teams constructs[0] = omp_for constructs[1] = omp_parallel constructs[2] = omp_teams omp_teams, i = 2, j = 4, position = 4 omp_parallel, i = 1, j = 3, position = 3 omp_for, i = 0, j = 2, position = 2 constructs[0] = omp_for, scores[0] = 4, score = 16 constructs[1] = omp_parallel, scores[1] = 3, score = 8 constructs[2] = omp_teams, scores[2] = 2, score = 4 For selector construct = {teams, parallel, for} score1 = 29 score2 = 29 The "codes" array is the OpenMP context, "constructs" are the traits in the construct selector, and the "scores" array does not contain scores, but rather the positions of the elements of the "constructs" array in the "codes" array. I believe there are at least three bugs here: (1) The constructs array is being processed backwards from n-1 to 0, but the scores array is indexed from 0 to n-1, and later it's assumed both arrays are in the same order. (2) The codes array is backwards and so are the position values, and there's no adjustment to subtract the position from the length of the "codes" array in computing the corresponding score. (3) It's choosing the wrong subset from the duplicate traits, preferring the should-be-lower-valued outer ones over the higher-valued inner ones. I don't know if anybody wants to tackle a bug fix for this code for GCC 14. As I've said, I've got a complete rewrite I'm planning to submit early in stage 1, which I hope will be an improvement from an engineering perspective in terms of readability/maintainability, with cleaner and better-documented interfaces and references to the spec to explain what it is doing.