[I intent to commit this patch later today or probably tomorrow, unless there are comments, questions or concerns.]
The issue showed up for SPEC HPC's 634.hpgmgfv_s benchmark, but only when running with multiple MPI processes. The reason is that in that case, the work is distributed over multiple processed and for some, the code #pragma omp target enter data map(to:level->vectors[i][:num_my_boxes*box_volume]) happens to have the value num_my_boxes == 0. While for map(ptr[:0]) the pointer attach was permitted to fail and for map(ptr[:5]) both mapping and pointer attach happened, it failed for map(ptr[:n]) if n == 0 at runtime In this case, it is simple to check whether a previous item - e.g. the one just before - is now a zero-sized item (which has an extra map type - updated for non-constant values at runtime). However, with 'target enter data' items get split - and while that tries to keep items together, for more complex code like in the second test case, only the ATTACH remains. The third testcase is even weirder as the preceding item is unrelated to the attach and just happens to sit there - as with struct mappings, the attach are all clustered at the end. An example is code like 'map(var->array[i][:n])' and possibly combined with mapping multiple members in of the same 'var', some with n > 0 and others with n == 0. Solution: For 'target' an attempt is made to check whether this is just a broken attachment or valid. But for 'target data', the code now assumes that not finding a pointer target is fine as it comes from some zero-sized attachment. Thus, while loosing some diagnostic checks, it at least has no false positives for valid real-world code. Any comment before I apply the patch? Tobias PS: I excluded OpenACC - but I think it will have similar issues. However, I have not tried to identify them.
OpenMP: Fix mapping of zero-sized arrays with non-literal size: map(var[:n]), n = 0 For map(ptr[:0]), the used map kind is GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION and it is permitted that 'ptr' does not exist. 'ptr' is set to the device pointee if it exists or to the host value otherwise. For map(ptr[:3]), the variable is first mapped and then ptr is updated to point to the just-mapped device data; the attachment uses GOMP_MAP_ATTACH. For map(ptr[:n]), generates always a GOMP_MAP_ATTACH, but when n == 0, it was failing with: "pointer target not mapped for attach" The solution is not to fail but first to check whether it was mapped before. It turned out that for the mapping part, GCC adds a run-time check whether n == 0 - and uses GOMP_MAP_ZERO_LEN_ARRAY_SECTION for the mapping. Thus, we just have to check whether there such a mapping for the address for which the GOMP_MAP_ATTACH. was requested. And, if there was, the error diagnostic can be skipped. Unsurprisingly, this issue occurs in real-world code; it was detected in a code that distributes work via MPI and for some processes, some bounds ended up to be zero. libgomp/ChangeLog: * target.c (gomp_attach_pointer): Return bool; accept additional bool to optionally silence the fatal pointee-not-found error. (gomp_map_vars_internal): If the pointee could not be found, check whether it was mapped as GOMP_MAP_ZERO_LEN_ARRAY_SECTION. * libgomp.h (gomp_attach_pointer): Update prototype. * oacc-mem.c (acc_attach_async, goacc_enter_data_internal): Update calls. * testsuite/libgomp.c/target-map-zero-sized.c: New test. * testsuite/libgomp.c/target-map-zero-sized-2.c: New test. * testsuite/libgomp.c/target-map-zero-sized-3.c: New test. libgomp/libgomp.h | 4 +- libgomp/oacc-mem.c | 6 +- libgomp/target.c | 64 +++++++++--- .../testsuite/libgomp.c/target-map-zero-sized-2.c | 74 ++++++++++++++ .../testsuite/libgomp.c/target-map-zero-sized-3.c | 49 ++++++++++ .../testsuite/libgomp.c/target-map-zero-sized.c | 107 +++++++++++++++++++++ 6 files changed, 288 insertions(+), 16 deletions(-) diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h index d97768f5125..6030f9d0a2c 100644 --- a/libgomp/libgomp.h +++ b/libgomp/libgomp.h @@ -1468,10 +1468,10 @@ extern void gomp_copy_dev2host (struct gomp_device_descr *, struct goacc_asyncqueue *, void *, const void *, size_t); extern uintptr_t gomp_map_val (struct target_mem_desc *, void **, size_t); -extern void gomp_attach_pointer (struct gomp_device_descr *, +extern bool gomp_attach_pointer (struct gomp_device_descr *, struct goacc_asyncqueue *, splay_tree, splay_tree_key, uintptr_t, size_t, - struct gomp_coalesce_buf *, bool); + struct gomp_coalesce_buf *, bool, bool); extern void gomp_detach_pointer (struct gomp_device_descr *, struct goacc_asyncqueue *, splay_tree_key, uintptr_t, bool, struct gomp_coalesce_buf *); diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c index 718252b44ba..0482ed37d95 100644 --- a/libgomp/oacc-mem.c +++ b/libgomp/oacc-mem.c @@ -951,7 +951,7 @@ acc_attach_async (void **hostaddr, int async) } gomp_attach_pointer (acc_dev, aq, &acc_dev->mem_map, n, (uintptr_t) hostaddr, - 0, NULL, false); + 0, NULL, false, true); gomp_mutex_unlock (&acc_dev->lock); } @@ -1158,7 +1158,7 @@ goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum, if ((kinds[i] & 0xff) == GOMP_MAP_ATTACH) { gomp_attach_pointer (acc_dev, aq, &acc_dev->mem_map, n, - (uintptr_t) h, s, NULL, false); + (uintptr_t) h, s, NULL, false, true); /* OpenACC 'attach'/'detach' doesn't affect structured/dynamic reference counts ('n->refcount', 'n->dynamic_refcount'). */ } @@ -1176,7 +1176,7 @@ goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum, = lookup_host (acc_dev, hostaddrs[j], sizeof (void *)); gomp_attach_pointer (acc_dev, aq, &acc_dev->mem_map, m, (uintptr_t) hostaddrs[j], sizes[j], NULL, - false); + false, true); } bool processed = false; diff --git a/libgomp/target.c b/libgomp/target.c index a64ee96af2a..9674ff4c9c0 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -800,12 +800,22 @@ gomp_map_fields_existing (struct target_mem_desc *tgt, (void *) cur_node.host_end); } -attribute_hidden void +/* Update the devptr by setting it to the device address of the host pointee + 'attach_to'; devptr is obtained from the splay_tree_key n. + When the pointer is already attached or the host pointee is either + NULL or in memory map, this function returns true. + Otherwise, the device pointer is set to point to the host pointee and: + - If allow_zero_length_array_sections is set, true is returned. + - Else, if fail_if_not_found is set, a fatal error is issued. + - Otherwise, false is returned. */ + +attribute_hidden bool gomp_attach_pointer (struct gomp_device_descr *devicep, struct goacc_asyncqueue *aq, splay_tree mem_map, splay_tree_key n, uintptr_t attach_to, size_t bias, struct gomp_coalesce_buf *cbufp, - bool allow_zero_length_array_sections) + bool allow_zero_length_array_sections, + bool fail_if_not_found) { struct splay_tree_key_s s; size_t size, idx; @@ -860,7 +870,7 @@ gomp_attach_pointer (struct gomp_device_descr *devicep, gomp_copy_host2dev (devicep, aq, (void *) devptr, (void *) &data, sizeof (void *), true, cbufp); - return; + return true; } s.host_start = target + bias; @@ -869,15 +879,16 @@ gomp_attach_pointer (struct gomp_device_descr *devicep, if (!tn) { - if (allow_zero_length_array_sections) - /* When allowing attachment to zero-length array sections, we - copy the host pointer when the target region is not mapped. */ - data = target; - else + /* We copy the host pointer when the target region is not mapped; + for allow_zero_length_array_sections, that's permitted. + Otherwise, it depends on the context. Return false in that + case, unless fail_if_not_found. */ + if (!allow_zero_length_array_sections && fail_if_not_found) { gomp_mutex_unlock (&devicep->lock); gomp_fatal ("pointer target not mapped for attach"); } + data = target; } else data = tn->tgt->tgt_start + tn->tgt_offset + target - tn->host_start; @@ -889,10 +900,13 @@ gomp_attach_pointer (struct gomp_device_descr *devicep, gomp_copy_host2dev (devicep, aq, (void *) devptr, (void *) &data, sizeof (void *), true, cbufp); + if (!tn && !allow_zero_length_array_sections) + return false; } else gomp_debug (1, "%s: attach count for %p -> %u\n", __FUNCTION__, (void *) attach_to, (int) n->aux->attach_count[idx]); + return true; } attribute_hidden void @@ -1587,9 +1601,37 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, bool zlas = ((kind & typemask) == GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION); - gomp_attach_pointer (devicep, aq, mem_map, n, - (uintptr_t) hostaddrs[i], sizes[i], - cbufp, zlas); + /* For 'target enter data', the map clauses are split; + however, for more complex code with struct and + pointer members, the mapping and the attach can end up + in different sets; or the wrong mapping with the + attach. As there is no way to know whether a size + zero like 'var->ptr[i][:0]' happend in the same + directive or not, the not-attached check is now + fully silenced for 'enter data'. */ + if (openmp_p && (pragma_kind & GOMP_MAP_VARS_ENTER_DATA)) + zlas = true; + if (!gomp_attach_pointer (devicep, aq, mem_map, n, + (uintptr_t) hostaddrs[i], sizes[i], + cbufp, zlas, !openmp_p)) + { + /* Pointee not found; that's an error except for + map(var[:n]) with n == 0; the compiler adds a + runtime condition such that for those the kind is + always GOMP_MAP_ZERO_LEN_ARRAY_SECTION. */ + for (j = i; j > 0; j--) + if (*(void**) hostaddrs[i] == hostaddrs[j-1] - sizes[i] + && sizes[j-1] == 0 + && (GOMP_MAP_ZERO_LEN_ARRAY_SECTION + == (get_kind (short_mapkind, kinds, j-1) + & typemask))) + break; + if (j == 0) + { + gomp_mutex_unlock (&devicep->lock); + gomp_fatal ("pointer target not mapped for attach"); + } + } } else if ((pragma_kind & GOMP_MAP_VARS_OPENACC) != 0) { diff --git a/libgomp/testsuite/libgomp.c/target-map-zero-sized-2.c b/libgomp/testsuite/libgomp.c/target-map-zero-sized-2.c new file mode 100644 index 00000000000..3220828efd0 --- /dev/null +++ b/libgomp/testsuite/libgomp.c/target-map-zero-sized-2.c @@ -0,0 +1,74 @@ +int +main () +{ + int i, n; + int data[] = {1,2}; + struct S { int **ptrset; }; + +// ----------------------------------- + +/* The produced mapping for sptr1->ptrset[i][:n] + + GOMP_MAP_STRUCT (size = 1) + GOMP_MAP_ZERO_LEN_ARRAY_SECTION + GOMP_MAP_ZERO_LEN_ARRAY_SECTION + GOMP_MAP_ATTACH + GOMP_MAP_ATTACH -> attaching to 2nd GOMP_MAP_ZERO_LEN_ARRAY_SECTION + +which get split into 3 separate map_vars call; in particular, +the latter is separate and points to an unmpapped variable. + +Thus, it failed with: + libgomp: pointer target not mapped for attach */ + + struct S s1, *sptr1; + s1.ptrset = (int **) __builtin_malloc (sizeof(void*) * 3); + s1.ptrset[0] = data; + s1.ptrset[1] = data; + s1.ptrset[2] = data; + sptr1 = &s1; + + i = 1; + n = 0; + #pragma omp target enter data map(sptr1[:1], sptr1->ptrset[:3]) + #pragma omp target enter data map(sptr1->ptrset[i][:n]) + + #pragma omp target exit data map(sptr1->ptrset[i][:n]) + #pragma omp target exit data map(sptr1[:1], sptr1->ptrset[:3]) + + __builtin_free (s1.ptrset); + +// ----------------------------------- + +/* The produced mapping for sptr2->ptrset[i][:n] is similar: + + GOMP_MAP_STRUCT (size = 1) + GOMP_MAP_ZERO_LEN_ARRAY_SECTION + GOMP_MAP_TO ! this one has now a finite size + GOMP_MAP_ATTACH + GOMP_MAP_ATTACH -> attach to the GOMP_MAP_TO + +As the latter GOMP_MAP_ATTACH has now a pointer target, +the attachment worked. */ + + struct S s2, *sptr2; + s2.ptrset = (int **) __builtin_malloc (sizeof(void*) * 3); + s2.ptrset[0] = data; + s2.ptrset[1] = data; + s2.ptrset[2] = data; + sptr2 = &s2; + + i = 1; + n = 2; + #pragma omp target enter data map(sptr2[:1], sptr2->ptrset[:3]) + #pragma omp target enter data map(sptr2->ptrset[i][:n]) + + #pragma omp target + if (sptr2->ptrset[1][0] != 1 || sptr2->ptrset[1][1] != 2) + __builtin_abort (); + + #pragma omp target exit data map(sptr2->ptrset[i][:n]) + #pragma omp target exit data map(sptr2[:1], sptr2->ptrset[:3]) + + __builtin_free (s2.ptrset); +} diff --git a/libgomp/testsuite/libgomp.c/target-map-zero-sized-3.c b/libgomp/testsuite/libgomp.c/target-map-zero-sized-3.c new file mode 100644 index 00000000000..f968bd377c2 --- /dev/null +++ b/libgomp/testsuite/libgomp.c/target-map-zero-sized-3.c @@ -0,0 +1,49 @@ +int +main () +{ + int i, n, n2; + int data[] = {1,2}; + struct S { + int **ptrset; + int **ptrset2; + }; + + /* This is the same as target-map-zero-sized-3.c, but by mixing + mapped and non-mapped items, the mapping before the ATTACH + might (or here: is) not actually associated with the the + pointer used for attaching. Thus, if one does a simple + + if (openmp_p + && (pragma_kind & GOMP_MAP_VARS_ENTER_DATA) + && mapnum == 1) + check in target.c's gomp_map_vars_internal will fail + as mapnum > 1 but still the map associated with this + ATTACH is in a different set. */ + + struct S s1, *sptr1; + s1.ptrset = (int **) __builtin_malloc (sizeof(void*) * 3); + s1.ptrset2 = (int **) __builtin_malloc (sizeof(void*) * 3); + s1.ptrset[0] = data; + s1.ptrset[1] = data; + s1.ptrset[2] = data; + s1.ptrset2[0] = data; + s1.ptrset2[1] = data; + s1.ptrset2[2] = data; + sptr1 = &s1; + + i = 1; + n = 0; + n2 = 2; + #pragma omp target enter data map(sptr1[:1], sptr1->ptrset[:3], sptr1->ptrset2[:3]) + #pragma omp target enter data map(sptr1->ptrset[i][:n], sptr1->ptrset2[i][:n]) + + #pragma omp target + if (sptr1->ptrset2[1][0] != 1 || sptr1->ptrset2[1][1] != 2) + __builtin_abort (); + + #pragma omp target exit data map(sptr1->ptrset[i][:n], sptr1->ptrset2[i][:n]) + #pragma omp target exit data map(sptr1[:1], sptr1->ptrset[:3], sptr1->ptrset2[:3]) + + __builtin_free (s1.ptrset); + __builtin_free (s1.ptrset2); +} diff --git a/libgomp/testsuite/libgomp.c/target-map-zero-sized.c b/libgomp/testsuite/libgomp.c/target-map-zero-sized.c new file mode 100644 index 00000000000..7c4ab80bc1a --- /dev/null +++ b/libgomp/testsuite/libgomp.c/target-map-zero-sized.c @@ -0,0 +1,107 @@ +/* { dg-do run } */ +/* { dg-additional-options "-O0" } */ + +/* Issue showed up in the real world when large data was distributed + over multiple MPI progresses - such that for one process n == 0 + happend at run time. + + Before map(var[:0]) and map(var[:n]) with n > 0 was handled, + this patch now also handles map(var[:n]) with n == 0. + + Failed before with "libgomp: pointer target not mapped for attach". */ + +/* Here, the base address is shifted - which should have no effect, + but must work as well. */ +void +with_offset () +{ + struct S { + int *ptr1, *ptr2; + }; + struct S s1, s2; + int *a, *b, *c, *d; + s1.ptr1 = (int *) 0L; + s1.ptr2 = (int *) 0xdeedbeef; + s2.ptr1 = (int *) 0L; + s2.ptr2 = (int *) 0xdeedbeef; + a = (int *) 0L; + b = (int *) 0xdeedbeef; + c = (int *) 0L; + d = (int *) 0xdeedbeef; + + int n1, n2, n3, n4; + n1 = n2 = n3 = n4 = 0; + + #pragma omp target enter data map(s1.ptr1[4:n1], s1.ptr2[6:n2], a[3:n3], b[2:n4]) + + #pragma omp target map(s2.ptr1[4:n1], s2.ptr2[2:n2], c[6:n3], d[9:n4]) + { + if (s2.ptr1 != (void *) 0L || s2.ptr2 != (void *) 0xdeedbeef + || c != (void *) 0L || d != (void *) 0xdeedbeef) + __builtin_abort (); + } + + #pragma omp target map(s1.ptr1[4:n1], s1.ptr2[6:n2], a[3:n3], b[2:n4]) + { + if (s1.ptr1 != (void *) 0L || s1.ptr2 != (void *) 0xdeedbeef + || a != (void *) 0L || b != (void *) 0xdeedbeef) + __builtin_abort (); + } + + #pragma omp target + { + if (s1.ptr1 != (void *) 0L || s1.ptr2 != (void *) 0xdeedbeef + || a != (void *) 0L || b != (void *) 0xdeedbeef) + __builtin_abort (); + } + + #pragma omp target exit data map(s1.ptr1[4:n1], s1.ptr2[6:n2], a[3:n3], b[2:n4]) +} + +int +main () +{ + struct S { + int *ptr1, *ptr2; + }; + struct S s1, s2; + int *a, *b, *c, *d; + s1.ptr1 = (int *) 0L; + s1.ptr2 = (int *) 0xdeedbeef; + s2.ptr1 = (int *) 0L; + s2.ptr2 = (int *) 0xdeedbeef; + a = (int *) 0L; + b = (int *) 0xdeedbeef; + c = (int *) 0L; + d = (int *) 0xdeedbeef; + + int n1, n2, n3, n4; + n1 = n2 = n3 = n4 = 0; + + #pragma omp target enter data map(s1.ptr1[:n1], s1.ptr2[:n2], a[:n3], b[:n4]) + + #pragma omp target map(s2.ptr1[:n1], s2.ptr2[:n2], c[:n3], d[:n4]) + { + if (s2.ptr1 != (void *) 0L || s2.ptr2 != (void *) 0xdeedbeef + || c != (void *) 0L || d != (void *) 0xdeedbeef) + __builtin_abort (); + } + + #pragma omp target map(s1.ptr1[:n1], s1.ptr2[:n2], a[:n3], b[:n4]) + { + if (s1.ptr1 != (void *) 0L || s1.ptr2 != (void *) 0xdeedbeef + || a != (void *) 0L || b != (void *) 0xdeedbeef) + __builtin_abort (); + } + + #pragma omp target + { + if (s1.ptr1 != (void *) 0L || s1.ptr2 != (void *) 0xdeedbeef + || a != (void *) 0L || b != (void *) 0xdeedbeef) + __builtin_abort (); + } + + #pragma omp target exit data map(s1.ptr1[:n1], s1.ptr2[:n2], a[:n3], b[:n4]) + + with_offset (); +}