This patch refactors 'hashtab.h' from a prime-sized double-hash map to a power of two sized Robin Hood hashing regime. This is done to improve cache locality, alongside allowing for the use of masking rather than modulo computation. Additional changes were also made with regards to reducing the number of branches when possible, such as splitting the INSERT and NO_INSERT options into separate functions. The API additionally now makes some stronger promises, such as actually inserting when the insert path is taken, rather than relying on the caller to do the insertion. Integration with these new semantics is also handled.
The data structure was microbenchmarked by testing insertion of 20,000,000 elements, erasing 1,000 elements while the table had 20,000,000 elements, lookup of 1,000 elements while the table had 20,000,000 elements, both in the case of the elements existing and not existing, and lookup of if different slots exist 1,000 times in a table of 20,000,000 elements. Tests were performed on an AMD Ryzen 9 9950X3D. With the previous implementation, we see a time of 1537ms for the insertion test, and 1140ms for this implementation. We additionally see improvements in lookups for elements that exist (15508 -> 9303 ns, 'htab_find') and slots that exist (23339 -> 9339 ns, 'htab_find_slot_no_insert'). However, there are some trade-offs: we perform worse in deletion due to the extra logic for Robin Hood maintenance (26 -> 43 us) and in lookups where the element doesn't exist (1648 -> 6912 ns, 'htab_find'). I conjecture the latter case is due to RHH being less forgiving of clustering, and thus we encounter longer probe chains looking for elements that don't exist. This is further discussed below. For the former case, this seems to only be encountered when clearing a task dependency list, and as such, should not be as frequently encountered. All times provided are averages over 50 runs. Relating to the probe chain issue, a better hash function than the one inherited from the current implementation will probably alleviate it, however, I was unable to devise a better one without significant trade offs. Approaches borrowing from SplitMix64-style design, PCGs, and TGFSRs did not yield better distributions. Xor-Shift LFSRs followed by a left rotation did, however, made the hash function a bit unwieldy. This is left open in this patch as a fixme. Built/tested x86_64-pc-linux-gnu.
From 3b873821fb9e9c5ef6467ff617a028685d3ce59e Mon Sep 17 00:00:00 2001 From: supers1ngular <[email protected]> Date: Fri, 26 Jun 2026 09:10:02 -0700 Subject: [PATCH] libgomp: Refactor Hashtab This patch refactors the libgomp hashmap 'hashtab.h'. This work is done primarily to improve performance. We switch the structure from a prime sized map with double hashing to a power of two sized map with Robin Hood Hashing. This serves to improve cache locality in the case of collisions. It additionally enables the use of a bitwise AND rather than performing modulo operations. We are also able to separate the INSERT and NO_INSERT paths, which experimentally showed improved inlining behavior alongside reducing encountered branches, and thus potential mispredicts. libgomp/ChangeLog: * config/accel/target-indirect.c (build_indirect_map): Change function calls to reflect new API and structure. * hashtab.h (HTAB_DELETED_ENTRY): Remove. (struct htab): Redefine. (enum insert_option): Remove. (struct prime_ent): Ditto. (struct rh_entry): Define. (RH_EMPTY): Define as 0. (higher_prime_index): Remove. (htab_expand): Refactor to be more concise. (htab_size): Remove extra logic. (htab_mod_1): Remove. (htab_mod): Ditto. (htab_mod_m2): Ditto. (rup2): Round up to nearest power of 2 helper function. (htab_clear): Refactor to account for 'rh_entry'. (htab_create): Refactor for new table sizes and members. (find_empty_slot_for_expand): Remove. (htab_find): Refactor to use RH logic. (htab_find_slot_insert): New function for insertion. (htab_find_slot): Remove. (htab_find_slot_no_insert): New function for slot lookup. (htab_clear_slot): Refactor to support RH invariant. * plugin/build-target-indirect-htab.h (create_target_indirect_map): Change function calls to reflect new API and structure. * target.c (gomp_increment_refcount): Ditto. (gomp_decrement_refcount): Ditto. * task.c (gomp_task_handle_depend): Ditto. (gomp_task_run_post_handle_depend_hash): Ditto. (gomp_task_maybe_wait_for_dependencies): Ditto. (gomp_reduction_register): Ditto. Co-Authored-By: Waffl3x <[email protected]> --- libgomp/config/accel/target-indirect.c | 4 +- libgomp/hashtab.h | 562 ++++++++------------ libgomp/plugin/build-target-indirect-htab.h | 6 +- libgomp/target.c | 6 +- libgomp/task.c | 98 ++-- 5 files changed, 293 insertions(+), 383 deletions(-) diff --git a/libgomp/config/accel/target-indirect.c b/libgomp/config/accel/target-indirect.c index 475cd528a49..38f50d9a13a 100644 --- a/libgomp/config/accel/target-indirect.c +++ b/libgomp/config/accel/target-indirect.c @@ -113,9 +113,7 @@ build_indirect_map (void) { hash_entry_type element; SET_INDIRECT_ADDRS (element, *map_entry, *(map_entry + 1)); - hash_entry_type *slot = htab_find_slot (&indirect_htab, element, - INSERT); - *slot = element; + htab_find_slot_insert (&indirect_htab, element); } } diff --git a/libgomp/hashtab.h b/libgomp/hashtab.h index b3462aa09e8..1792ef58ab3 100644 --- a/libgomp/hashtab.h +++ b/libgomp/hashtab.h @@ -24,428 +24,310 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see <http://www.gnu.org/licenses/>. */ -/* The hash table code copied from include/hashtab.[hc] and adjusted, - so that the hash table entries are in the flexible array at the end - of the control structure, no callbacks are used and the elements in the - table are of the hash_entry_type type. - Before including this file, define hash_entry_type type and - htab_alloc and htab_free functions. After including it, define - htab_hash and htab_eq inline functions. */ +/* The hash map here is an adaption of the previous libgomp version, itself + an iteration of an implementation from libiberty, made to retain a + similar interface and modernize the internals. Previously used was a + prime-sized double hash table. Now, we use a power of two sized table + using linear probing and Robin Hood hashing. -/* This package implements basic hash table functionality. It is possible - to search for an entry, create an entry and destroy an entry. + This header implements basic hash table functionality. One can search + for an entry, create an entry, and delete an entry. Elements in the table are generic pointers. - The size of the table is not fixed; if the occupancy of the table - grows too high the hash table will be expanded. + The size of the table is variant, but will always be a power of two. + The table will appropriately grow according to the defined policy of: - The abstract data implementation is based on generalized Algorithm D - from Knuth's book "The art of computer programming". Hash table is - expanded by creation of new hash table and transferring elements from - the old table to the new table. */ + If the table is at over half occupancy, expand. -/* The type for a hash code. */ -typedef unsigned int hashval_t; - -static inline hashval_t htab_hash (hash_entry_type); -static inline bool htab_eq (hash_entry_type, hash_entry_type); - -/* This macro defines reserved value for empty table entry. */ - -#define HTAB_EMPTY_ENTRY ((hash_entry_type) 0) + The table will always maintain the "Robin Hood" invariant, that is: -/* This macro defines reserved value for table entry which contained - a deleted element. */ + If a slot is found during probing that has a smaller distance from + home than the current probe, it is safe to return, since if the key + did exist, it would have a probe distance of at most the probed + entries. -#define HTAB_DELETED_ENTRY ((hash_entry_type) 1) + This was done to minimize probe chains and enforce consistency in + lookups. Further, it improves expected probe chain lengths in the + worst case to theta (logn) when paired with a distance check. -/* Hash tables are of the following type. The structure - (implementation) of this type is not needed for using the hash - tables. All work with hash table should be executed only through - functions mentioned below. The size of this structure is subject to - change. */ + Do note, however, that we do not necessarily always observe this + early exit in practice, as the additional branch can impact performance + negatively. As such, we have a hybrid design here, in which we preserve + the Robin Hood invariant, but rather use typical linear probing without + checking for the case of an early return. This, in theory, returns us + back to the typical worst case, but in practice, we still usually get the + benefit. */ -struct htab { - /* Current size (in entries) of the hash table. */ - size_t size; +/* In order to properly use the map, we inherit the same requirements + as the previous map. - /* Current number of elements including also deleted elements. */ - size_t n_elements; + Prior to including the header: + 1. You must define the type of hash_entry_type + 2. You must define the htab_alloc inline function + 3. You must define the htab_free inline function - /* Current number of deleted elements in the table. */ - size_t n_deleted; + After including the header: + 1. You must define the htab_hash inline function + 2. You must define the htab_eq inline function. */ - /* Current size (in entries) of the hash table, as an index into the - table of primes. */ - unsigned int size_prime_index; - - /* Table itself. */ - hash_entry_type entries[]; -}; - -typedef struct htab *htab_t; - -/* An enum saying whether we insert into the hash table or not. */ -enum insert_option {NO_INSERT, INSERT}; +/* The type for the hash code. */ +typedef unsigned int hashval_t; -/* Table of primes and multiplicative inverses. - Note that these are not minimally reduced inverses. Unlike when generating - code to divide by a constant, we want to be able to use the same algorithm - all the time. All of these inverses (are implied to) have bit 32 set. +static inline hashval_t htab_hash (hash_entry_type); +static inline bool htab_eq (hash_entry_type, hash_entry_type); - For the record, the function that computed the table is in - libiberty/hashtab.c. */ +/* Reserves 0 to define empty table entry. */ +#define HTAB_EMPTY_ENTRY ((hash_entry_type) 0) -struct prime_ent +/* Struct that contains the actual entry and Robin Hood metadata. */ +struct rh_entry { - hashval_t prime; - hashval_t inv; - hashval_t inv_m2; /* inverse of prime-2 */ - hashval_t shift; -}; - -static struct prime_ent const prime_tab[] = { - { 7, 0x24924925, 0x9999999b, 2 }, - { 13, 0x3b13b13c, 0x745d1747, 3 }, - { 31, 0x08421085, 0x1a7b9612, 4 }, - { 61, 0x0c9714fc, 0x15b1e5f8, 5 }, - { 127, 0x02040811, 0x0624dd30, 6 }, - { 251, 0x05197f7e, 0x073260a5, 7 }, - { 509, 0x01824366, 0x02864fc8, 8 }, - { 1021, 0x00c0906d, 0x014191f7, 9 }, - { 2039, 0x0121456f, 0x0161e69e, 10 }, - { 4093, 0x00300902, 0x00501908, 11 }, - { 8191, 0x00080041, 0x00180241, 12 }, - { 16381, 0x000c0091, 0x00140191, 13 }, - { 32749, 0x002605a5, 0x002a06e6, 14 }, - { 65521, 0x000f00e2, 0x00110122, 15 }, - { 131071, 0x00008001, 0x00018003, 16 }, - { 262139, 0x00014002, 0x0001c004, 17 }, - { 524287, 0x00002001, 0x00006001, 18 }, - { 1048573, 0x00003001, 0x00005001, 19 }, - { 2097143, 0x00004801, 0x00005801, 20 }, - { 4194301, 0x00000c01, 0x00001401, 21 }, - { 8388593, 0x00001e01, 0x00002201, 22 }, - { 16777213, 0x00000301, 0x00000501, 23 }, - { 33554393, 0x00001381, 0x00001481, 24 }, - { 67108859, 0x00000141, 0x000001c1, 25 }, - { 134217689, 0x000004e1, 0x00000521, 26 }, - { 268435399, 0x00000391, 0x000003b1, 27 }, - { 536870909, 0x00000019, 0x00000029, 28 }, - { 1073741789, 0x0000008d, 0x00000095, 29 }, - { 2147483647, 0x00000003, 0x00000007, 30 }, - /* Avoid "decimal constant so large it is unsigned" for 4294967291. */ - { 0xfffffffb, 0x00000006, 0x00000008, 31 } + hash_entry_type entry; + uint32_t dist; }; -/* The following function returns an index into the above table of the - nearest prime number which is greater than N, and near a power of two. */ +/* Defines RH_EMPTY to 0. */ +#define RH_EMPTY HTAB_EMPTY_ENTRY -static unsigned int -higher_prime_index (unsigned long n) -{ - unsigned int low = 0; - unsigned int high = sizeof(prime_tab) / sizeof(prime_tab[0]); +/* Struct containing table and associated metadata. + size: Current table size. + n_elements: Number of elements that have been inserted. + entries: Table. */ +struct htab + { + size_t size; + size_t n_elements; + struct rh_entry entries[]; + }; - while (low != high) - { - unsigned int mid = low + (high - low) / 2; - if (n > prime_tab[mid].prime) - low = mid + 1; - else - high = mid; - } - - /* If we've run out of primes, abort. */ - if (n > prime_tab[low].prime) - abort (); - - return low; -} -/* Return the current size of given hash table. */ +typedef struct htab *htab_t; +/* Forward declaration. */ +static htab_t htab_expand (htab_t htab); +/* Return current table size. */ static inline size_t htab_size (htab_t htab) { return htab->size; } -/* Return the current number of elements in given hash table. */ - +/* Return current number of elements in the table. */ static inline size_t htab_elements (htab_t htab) { - return htab->n_elements - htab->n_deleted; -} - -/* Return X % Y. */ - -static inline hashval_t -htab_mod_1 (hashval_t x, hashval_t y, hashval_t inv, int shift) -{ - /* The multiplicative inverses computed above are for 32-bit types, and - requires that we be able to compute a highpart multiply. */ - if (sizeof (hashval_t) * __CHAR_BIT__ <= 32) - { - hashval_t t1, t2, t3, t4, q, r; - - t1 = ((unsigned long long)x * inv) >> 32; - t2 = x - t1; - t3 = t2 >> 1; - t4 = t1 + t3; - q = t4 >> shift; - r = x - (q * y); - - return r; - } - - /* Otherwise just use the native division routines. */ - return x % y; -} - -/* Compute the primary hash for HASH given HTAB's current size. */ - -static inline hashval_t -htab_mod (hashval_t hash, htab_t htab) -{ - const struct prime_ent *p = &prime_tab[htab->size_prime_index]; - return htab_mod_1 (hash, p->prime, p->inv, p->shift); + return htab->n_elements; } -/* Compute the secondary hash for HASH given HTAB's current size. */ - -static inline hashval_t -htab_mod_m2 (hashval_t hash, htab_t htab) +/* Helper function to determine the next largest power of two + from an unknown, non-zero value of an unsigned integral type. */ +static inline size_t +rup2 (size_t num) { - const struct prime_ent *p = &prime_tab[htab->size_prime_index]; - return 1 + htab_mod_1 (hash, p->prime - 2, p->inv_m2, p->shift); + num -= 1; + num |= (num >> 1); + num |= (num >> 2); + num |= (num >> 4); + num |= (num >> 8); + num |= (num >> 16); + if (sizeof (size_t) == 8) + num |= (num >> 32); + return num + 1; } +/* Clears the table. */ static inline htab_t htab_clear (htab_t htab) { htab->n_elements = 0; - htab->n_deleted = 0; - memset (htab->entries, 0, htab->size * sizeof (hash_entry_type)); + memset (htab->entries, 0, htab->size * sizeof (struct rh_entry)); return htab; } -/* Create hash table of size SIZE. */ - +/* Creates a new table of power of two size not less than 32. + This was done to preclude very early grow calls, while still + choosing a number that allows quick traversals. */ static htab_t htab_create (size_t size) { - htab_t result; - unsigned int size_prime_index; - - size_prime_index = higher_prime_index (size); - size = prime_tab[size_prime_index].prime; - - result = (htab_t) htab_alloc (sizeof (struct htab) - + size * sizeof (hash_entry_type)); + size = (size < 32) ? 32 : rup2 (size); + htab_t result = (htab_t) htab_alloc (sizeof (struct htab) + + (size * sizeof (struct rh_entry))); result->size = size; - result->size_prime_index = size_prime_index; return htab_clear (result); } -/* Similar to htab_find_slot, but without several unwanted side effects: - - Does not call htab_eq when it finds an existing entry. - - Does not change the count of elements in the hash table. - This function also assumes there are no deleted entries in the table. - HASH is the hash value for the element to be inserted. */ - -static hash_entry_type * -find_empty_slot_for_expand (htab_t htab, hashval_t hash) +/* Find an element, if it is in the table. */ +static hash_entry_type +htab_find (htab_t htab, const hash_entry_type elem) { - hashval_t index = htab_mod (hash, htab); - size_t size = htab_size (htab); - hash_entry_type *slot = htab->entries + index; - hashval_t hash2; - - if (*slot == HTAB_EMPTY_ENTRY) - return slot; - else if (*slot == HTAB_DELETED_ENTRY) - abort (); + size_t mask = htab->size - 1; + hashval_t idx = htab_hash (elem) & mask; + struct rh_entry *slot = &htab->entries[idx]; + size_t dist = 0; - hash2 = htab_mod_m2 (hash, htab); - for (;;) + while (1) { - index += hash2; - if (index >= size) - index -= size; - - slot = htab->entries + index; - if (*slot == HTAB_EMPTY_ENTRY) - return slot; - else if (*slot == HTAB_DELETED_ENTRY) - abort (); + if (!slot->entry || htab_eq (slot->entry, elem)) + return slot->entry; + if (slot->entry && dist > slot->dist) + return HTAB_EMPTY_ENTRY; + idx = (idx + 1) & mask; + slot = &htab->entries[idx]; + dist += 1; } } -/* The following function changes size of memory allocated for the - entries and repeatedly inserts the table elements. The occupancy - of the table after the call will be about 50%. Naturally the hash - table must already exist. Remember also that the place of the - table entries is changed. */ - -static htab_t -htab_expand (htab_t htab) -{ - htab_t nhtab; - hash_entry_type *olimit; - hash_entry_type *p; - size_t osize, elts; - - osize = htab->size; - olimit = htab->entries + osize; - elts = htab_elements (htab); - - /* Resize only when table after removal of unused elements is either - too full or too empty. */ - if (elts * 2 > osize || (elts * 8 < osize && osize > 32)) - nhtab = htab_create (elts * 2); - else - nhtab = htab_create (osize - 1); - nhtab->n_elements = htab->n_elements - htab->n_deleted; - - p = htab->entries; - do - { - hash_entry_type x = *p; - - if (x != HTAB_EMPTY_ENTRY && x != HTAB_DELETED_ENTRY) - *find_empty_slot_for_expand (nhtab, htab_hash (x)) = x; - - p++; - } - while (p < olimit); - - htab_free (htab); - return nhtab; -} - -/* This function searches for a hash table entry equal to the given - element. It cannot be used to insert or delete an element. */ - -static hash_entry_type -htab_find (htab_t htab, const hash_entry_type element) +/* The insertion codepath. Inserts an element while maintaining + the Robin Hood invariant. */ +static struct rh_entry * +htab_find_slot_insert (htab_t *htabp, const hash_entry_type elem) { - hashval_t index, hash2, hash = htab_hash (element); - size_t size; - hash_entry_type entry; - - size = htab_size (htab); - index = htab_mod (hash, htab); + htab_t htab = *htabp; + if (htab->n_elements * 2 >= htab->size) + htab = *htabp = htab_expand (htab); + const size_t mask = htab->size - 1; + hashval_t idx = htab_hash (elem) & mask; - entry = htab->entries[index]; - if (entry == HTAB_EMPTY_ENTRY - || (entry != HTAB_DELETED_ENTRY && htab_eq (entry, element))) - return entry; + struct rh_entry cur; + cur.entry = elem; + cur.dist = 0; - hash2 = htab_mod_m2 (hash, htab); - for (;;) + while (1) { - index += hash2; - if (index >= size) - index -= size; - - entry = htab->entries[index]; - if (entry == HTAB_EMPTY_ENTRY - || (entry != HTAB_DELETED_ENTRY && htab_eq (entry, element))) - return entry; + struct rh_entry *slot = &htab->entries[idx]; + uint32_t slot_dist = slot->dist; + + if (!slot->entry) + { + *slot = cur; + htab->n_elements++; + return slot; + } + if (htab_eq (slot->entry, elem)) + return slot; + if (slot_dist < cur.dist) + { + struct rh_entry tmp = *slot; + *slot = cur; + cur = tmp; + struct rh_entry *ret = slot; + idx = (idx + 1) & mask; + ++cur.dist; + while (1) + { + slot = &htab->entries[idx]; + if (slot->entry == RH_EMPTY) + { + *slot = cur; + htab->n_elements++; + return ret; + } + if (slot->dist < cur.dist) + { + struct rh_entry tmp = *slot; + *slot = cur; + cur = tmp; + } + idx = (idx + 1) & mask; + ++cur.dist; + } + } + idx = (idx + 1) & mask; + ++cur.dist; } } -/* This function searches for a hash table slot containing an entry - equal to the given element. To delete an entry, call this with - insert=NO_INSERT, then call htab_clear_slot on the slot returned - (possibly after doing some checks). To insert an entry, call this - with insert=INSERT, then write the value you want into the returned - slot. */ +/* The codepaths are entirely different for the NO_INSERT and for the + INSERT cases, and as such, they were made into separate functions + to mitigate unnecessary logic/branching. -static hash_entry_type * -htab_find_slot (htab_t *htabp, const hash_entry_type element, - enum insert_option insert) + We do presume that the caller knows the element to be within the map, + as this is the use case in libgomp thus far. Otherwise, this does not + maintain the RH invariant, so probing may continue longer than + one would anticipate. For other uses, consider htab_find. + + Unused attribute applied for target.c. */ +__attribute__ ((unused)) +static struct rh_entry * +htab_find_slot_no_insert (htab_t *htabp, const hash_entry_type elem) { - hash_entry_type *first_deleted_slot; - hashval_t index, hash2, hash = htab_hash (element); - size_t size; - hash_entry_type entry; htab_t htab = *htabp; + const size_t mask = htab->size - 1; + hashval_t idx = htab_hash (elem) & mask; - size = htab_size (htab); - if (insert == INSERT && size * 3 <= htab->n_elements * 4) + struct rh_entry *slot = &htab->entries[idx]; + while (slot->entry != RH_EMPTY) { - htab = *htabp = htab_expand (htab); - size = htab_size (htab); - } - - index = htab_mod (hash, htab); - - first_deleted_slot = NULL; - - entry = htab->entries[index]; - if (entry == HTAB_EMPTY_ENTRY) - goto empty_entry; - else if (entry == HTAB_DELETED_ENTRY) - first_deleted_slot = &htab->entries[index]; - else if (htab_eq (entry, element)) - return &htab->entries[index]; - - hash2 = htab_mod_m2 (hash, htab); - for (;;) - { - index += hash2; - if (index >= size) - index -= size; - - entry = htab->entries[index]; - if (entry == HTAB_EMPTY_ENTRY) - goto empty_entry; - else if (entry == HTAB_DELETED_ENTRY) - { - if (!first_deleted_slot) - first_deleted_slot = &htab->entries[index]; - } - else if (htab_eq (entry, element)) - return &htab->entries[index]; + if (htab_eq (slot->entry, elem)) + return slot; + idx = (idx + 1) & mask; + slot = &htab->entries[idx]; } + return NULL; +} - empty_entry: - if (insert == NO_INSERT) - return NULL; - if (first_deleted_slot) +/* Handles expansion/shrink logic + Allocates a new map of the appropriate size, then + populates it by iterating through the previous. + Shrink/growth policy are defined by doubling or + halving. Meant to be an internal helper. */ +static htab_t +htab_expand (htab_t htab) +{ + size_t osize = htab->size; + /* We only ever call to grow. */ + htab_t nhtab = htab_create (osize * 2); + for (size_t i = 0; i < osize; ++i) { - htab->n_deleted--; - *first_deleted_slot = HTAB_EMPTY_ENTRY; - return first_deleted_slot; + struct rh_entry *exp = &htab->entries[i]; + if (exp->entry) + htab_find_slot_insert (&nhtab, exp->entry); } - - htab->n_elements++; - return &htab->entries[index]; + htab_free (htab); + return nhtab; } -/* This function clears a specified slot in a hash table. It is - useful when you've already done the lookup and don't want to do it - again. */ - +/* Deletion logic. To be called on a rh_entry slot. + Performs deletion whilst maintaining the Robin + Hood invariant. */ static inline void -htab_clear_slot (htab_t htab, hash_entry_type *slot) +htab_clear_slot (htab_t htab, struct rh_entry *slot) { - if (slot < htab->entries || slot >= htab->entries + htab_size (htab) - || *slot == HTAB_EMPTY_ENTRY || *slot == HTAB_DELETED_ENTRY) + size_t idx = slot - htab->entries; + if (idx >= htab->size || slot->entry == RH_EMPTY) abort (); - - *slot = HTAB_DELETED_ENTRY; - htab->n_deleted++; + size_t mask = htab->size - 1; + while (1) + { + size_t next = (idx + 1) & mask; + struct rh_entry *nentry = &htab->entries[next]; + if (nentry->entry == RH_EMPTY || nentry->dist == 0) + { + htab->entries[idx].entry = RH_EMPTY; + htab->entries[idx].dist = 0; + break; + } + htab->entries[idx] = *nentry; + --(htab->entries[idx].dist); + idx = next; + } + --(htab->n_elements); } -/* Returns a hash code for pointer P. Simplified version of evahash */ - +/* Default pointer hash function. */ +/* FIXME: This default hash seems to work well enough, however, we do + see larger worst-case probe chain sizes with it than would be preferable. + Testing with several other methods (e.g. SplitMix64) yield good + statistical results, however, we only saw a meaningful reduction in + maximum probe chain length with several iterations of a LFSR followed + by a proportional rotate. Even with the special case noted by Brent of + Marsaglia's xorshift generator class, being the (7,9) pair with 64-bit + state rather than the usual triplet structure, the function was too + expensive to justify. */ static inline hashval_t hash_pointer (const void *p) { diff --git a/libgomp/plugin/build-target-indirect-htab.h b/libgomp/plugin/build-target-indirect-htab.h index a13f9717bf7..3a5bb77dda8 100644 --- a/libgomp/plugin/build-target-indirect-htab.h +++ b/libgomp/plugin/build-target-indirect-htab.h @@ -73,11 +73,9 @@ create_target_indirect_map (size_t *h_size, size_t count, for (int i = 0; i < count; i++) { SET_INDIRECT_ADDRS (element, host_addrs[i], device_addrs[i]); - hash_entry_type *slot = htab_find_slot (&indirect_htab, element, - INSERT); - *slot = element; + htab_find_slot_insert (&indirect_htab, element); } *h_size = (sizeof (struct htab) - + htab_size (indirect_htab) * sizeof (hash_entry_type)); + + htab_size (indirect_htab) * sizeof (struct rh_entry)); return (void*) indirect_htab; } diff --git a/libgomp/target.c b/libgomp/target.c index 78f390d593f..90ba8aa6430 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -513,8 +513,7 @@ gomp_increment_refcount (splay_tree_key k, htab_t *refcount_set) { if (htab_find (*refcount_set, refcount_ptr)) return; - uintptr_t **slot = htab_find_slot (refcount_set, refcount_ptr, INSERT); - *slot = refcount_ptr; + htab_find_slot_insert (refcount_set, refcount_ptr); } *refcount_ptr += 1; @@ -572,8 +571,7 @@ gomp_decrement_refcount (splay_tree_key k, htab_t *refcount_set, bool delete_p, goto end; } - uintptr_t **slot = htab_find_slot (refcount_set, refcount_ptr, INSERT); - *slot = refcount_ptr; + htab_find_slot_insert (refcount_set, refcount_ptr); new_encountered_refcount = true; } else diff --git a/libgomp/task.c b/libgomp/task.c index 89dafb87220..cb6b453e0e3 100644 --- a/libgomp/task.c +++ b/libgomp/task.c @@ -280,19 +280,20 @@ gomp_task_handle_depend (struct gomp_task *task, struct gomp_task *parent, task->depend[0].redundant_out = false; if (parent->depend_hash) { - /* Inlined htab_traverse + htab_clear. All newer siblings can - just depend on this task. Add dependencies on all previous + /* Inlined traversal of the hashmap + htab_clear. All newer siblings + can just depend on this task. Add dependencies on all previous sibling tasks with dependencies and make them redundant and clear the hash table. */ - hash_entry_type *slot = &parent->depend_hash->entries[0]; - hash_entry_type *end = slot + htab_size (parent->depend_hash); + struct rh_entry *slot = &parent->depend_hash->entries[0]; + struct rh_entry *end = slot + htab_size (parent->depend_hash); for (; slot != end; ++slot) { - if (*slot == HTAB_EMPTY_ENTRY) + /* If the slot is empty, just skip it. */ + if (slot->entry == HTAB_EMPTY_ENTRY) continue; - if (*slot != HTAB_DELETED_ENTRY) + else { - for (ent = *slot; ent; ent = ent->next) + for (ent = slot->entry; ent; ent = ent->next) { struct gomp_task *tsk = ent->task; @@ -327,6 +328,8 @@ gomp_task_handle_depend (struct gomp_task *task, struct gomp_task *parent, sizeof (struct gomp_dependers_vec) + (tsk->dependers->allocated * sizeof (struct gomp_task *))); + /* gomp_realloc will fatally error on fail, no need + to check for NULL. */ } tsk->dependers->elem[tsk->dependers->n_elem++] = task; task->num_dependees++; @@ -337,20 +340,24 @@ gomp_task_handle_depend (struct gomp_task *task, struct gomp_task *parent, ent = ent->next; } } - *slot = HTAB_EMPTY_ENTRY; + /* We can still keep the inline deletion, as we are traversing + the whole map, and clearing it unconditionally. So, + maintenance of the Robin Hood invariant is overhead in this + case. Similarly, we don't have to worry about the size + of the table changing and thus thrashing the references, + since we aren't modifying n_elements. */ + slot->entry = HTAB_EMPTY_ENTRY; + slot->dist = 0; } if (htab_size (parent->depend_hash) <= 32) - { - parent->depend_hash->n_elements = 0; - parent->depend_hash->n_deleted = 0; - } + parent->depend_hash->n_elements = 0; else { /* Shrink the hash table if it would be too large. We don't want to walk e.g. megabytes of empty hash table for every depend(inout: omp_all_memory). */ - free (parent->depend_hash); - parent->depend_hash = htab_create (12); + htab_free (parent->depend_hash); + parent->depend_hash = htab_create (31); } } parent->depend_all_memory = task; @@ -358,7 +365,7 @@ gomp_task_handle_depend (struct gomp_task *task, struct gomp_task *parent, } task->depend_count = ndepend; if (parent->depend_hash == NULL) - parent->depend_hash = htab_create (2 * ndepend > 12 ? 2 * ndepend : 12); + parent->depend_hash = htab_create (2 * ndepend > 32 ? 2 * ndepend : 31); for (i = 0; i < ndepend; i++) { task->depend[i].next = NULL; @@ -367,8 +374,23 @@ gomp_task_handle_depend (struct gomp_task *task, struct gomp_task *parent, task->depend[i].redundant = false; task->depend[i].redundant_out = false; - hash_entry_type *slot = htab_find_slot (&parent->depend_hash, - &task->depend[i], INSERT); + /* Below is a bit of a kludge in order to preserve semantics despite + API differences. Previously, using the insert option would not + actually guarantee insertion, trusting the caller to + manually assign the element afterwards. The new API automatically + performs the insertion. Since actual insertion was delayed + previously, we need to first check to see if the element exists, + and if it doesn't, then we insert it and set it to null, then keep + track of the entry reference and assign later. */ + struct rh_entry *entry = htab_find_slot_no_insert (&parent->depend_hash, + &task->depend[i]); + if (!entry) + { + entry = htab_find_slot_insert (&parent->depend_hash, + &task->depend[i]); + entry->entry = NULL; + } + hash_entry_type *slot = &entry->entry; hash_entry_type out = NULL, last = NULL; if (*slot) { @@ -1360,15 +1382,16 @@ gomp_task_run_post_handle_depend_hash (struct gomp_task *child_task) child_task->depend[i].prev->next = child_task->depend[i].next; else { - hash_entry_type *slot - = htab_find_slot (&parent->depend_hash, &child_task->depend[i], - NO_INSERT); + struct rh_entry *entry = htab_find_slot_no_insert ( + &parent->depend_hash, + &child_task->depend[i]); + hash_entry_type *slot = &entry->entry; if (*slot != &child_task->depend[i]) abort (); if (child_task->depend[i].next) *slot = child_task->depend[i].next; else - htab_clear_slot (parent->depend_hash, slot); + htab_clear_slot (parent->depend_hash, entry); } } } @@ -2020,15 +2043,15 @@ gomp_task_maybe_wait_for_dependencies (void **depend) size_t size = htab_size (task->depend_hash); if (htab_elements (task->depend_hash) * 8 < size && size > 32) htab_expand (task->depend_hash); - /* depend(inout: omp_all_memory) - depend on all previous sibling tasks that do have dependencies. Inlined htab_traverse. */ - hash_entry_type *slot = &task->depend_hash->entries[0]; - hash_entry_type *end = slot + htab_size (task->depend_hash); - for (; slot != end; ++slot) + struct rh_entry *entry = &task->depend_hash->entries[0]; + struct rh_entry *end = entry + htab_size (task->depend_hash); + for (; entry != end; ++entry) { - if (*slot == HTAB_EMPTY_ENTRY || *slot == HTAB_DELETED_ENTRY) + hash_entry_type *slot = &entry->entry; + if (*slot == HTAB_EMPTY_ENTRY) continue; for (ent = *slot; ent; ent = ent->next) { @@ -2463,16 +2486,27 @@ gomp_reduction_register (uintptr_t *data, uintptr_t *old, uintptr_t *orig, new_htab = htab_create (total_cnt); if (old_htab) { - /* Copy old hash table, like in htab_expand. */ - hash_entry_type *p, *olimit; + /* Copy old hash table. We intentionally copy over the number of + elements from the old table, such as to not need to repeatedly + expand by setting it to zero and reinserting. We manage the + increment done in 'htab_find_slot_insert' by decrementing before + insertion. Since we never shrink the map, only expand it, this + prevents growing it mid-copy in the case of being on the + boundary. */ + struct rh_entry *p, *olimit; new_htab->n_elements = htab_elements (old_htab); olimit = old_htab->entries + old_htab->size; p = old_htab->entries; do { - hash_entry_type x = *p; - if (x != HTAB_EMPTY_ENTRY && x != HTAB_DELETED_ENTRY) - *find_empty_slot_for_expand (new_htab, htab_hash (x)) = x; + struct rh_entry x = *p; + if (x.entry) + { + /* This will never wrap, since it is only triggered if an + entry exists. Thus, 'n_elements' is necessarily >= 1. */ + new_htab->n_elements = new_htab->n_elements - 1; + htab_find_slot_insert (&new_htab, x.entry); + } p++; } while (p < olimit); @@ -2491,7 +2525,7 @@ gomp_reduction_register (uintptr_t *data, uintptr_t *old, uintptr_t *orig, a pointer. Hide the cast from the compiler. */ hash_entry_type n; __asm ("" : "=g" (n) : "0" (p)); - *htab_find_slot (&new_htab, n, INSERT) = n; + htab_find_slot_insert (&new_htab, n); } if (d[4] == (uintptr_t) old) break; -- 2.54.0
