* Paul E. McKenney ([email protected]) wrote: > On Mon, Apr 23, 2012 at 03:45:32PM -0400, Mathieu Desnoyers wrote: > > Hi Paul, > > > > Here is the updated text I plan for the next update. Comments are > > welcome, thanks ! > > Looks much improved! The inevitable questions and comments interspersed. > > Thanx, Paul > > > Mathieu > > > > * Ordering Guarantees: > > * > > * To discuss these guarantees, we first define "read" operation as any > > * of the the basic cds_lfht_lookup, cds_lfht_next_duplicate, > > * cds_lfht_first, cds_lfht_next operation. > > * > > * We define "read traversal" operation as any of the following > > * group of operations > > * - cds_lfht_lookup followed by iteration with cds_lfht_next_duplicate > > * - cds_lfht_first followed iteration with cds_lfht_next > > Can't cds_lfht_next() and cds_lfht_next_duplicate() be intermixed? > Something like the following: > > rcu_read_lock(); > p = cds_lfht_lookup(...); > while (p != NULL) { > do_something(p); > if (p->dups_of_interest) > p = cds_lfht_next_duplicate(...); > else > p = cds_lfht_next(...); > } > rcu_read_unlock();
Yes, they can technically be intermixed, but I doubt that the "cds_lfht_next" after a lookup will be of any interest: it will get the nodes with keys following the current node's key (in split-ordered order). A use-case that would make more sense would be: - cds_lfht_first - iterate on the table with cds_lfht_next(), until we find an interesting key. - Then, iterate on all the duplicate of this key. So I will document that both can be intermixed. > > > * > > * We define "write" operations as any of cds_lfht_add, > > * cds_lfht_add_unique, cds_lfht_add_replace, cds_lfht_del. > > * > > * We define "prior" and "later" node as nodes observable by reads and > > * read traversals respectively before and after a write or sequence of > > * write operations. > > * > > * It is implicit from the requirement of the read, read traversal, and > > * write operations that RCU read-side locks need to be held across > > * traversals, and between a read or read traversal and invocation of a > > * write that receives a node as argument. > > This paragraph is implying a lot... How about the following? > > Hash-table operations are often chained, for example, the > pointer returned by a cds_lfht_lookup() might be passed to a > cds_lfht_next(), whose return value might in turn be passed to > another hash-table operation. This entire chain of operations > must be enclosed by a pair of matching rcu_read_lock() and > rcu_read_unlock() operations. looks good, will update. > > > * > > * The following ordering guarantees are offered by this hash table: > > * > > * A.1) "read" after "write": if there is ordering between a write and a > > * later read, then the read is guaranteed to see the write or some > > * later write. > > * A.2) "read traversal" after "write": given that there is dependency > > * ordering between reads in a "read traversal", if there is > > * ordering between a write and the first read of the traversal, > > * then the "read traversal" is guaranteed to see the write or > > * some later write. > > * B.1) "write" after "read": if there is ordering between a read and a > > * later write, then the read will never see the write. > > * B.2) "write" after "read traversal": given that there is dependency > > * ordering between reads in a "read traversal", if there is > > * ordering between the last read of the traversal and a later > > * write, then the "read traversal" will never see the write. > > * C) "write" while "read traversal": if a write occurs during a "read > > * traversal", the traversal may, or may not, see the write. > > * D) "write" vs "write": writes occur atomically between their > > * invocation and the moment they return. There is a full memory > > * barrier before and after the atomic "commit" point of each > > * write. > > This is a description of the implementation, while the others are guarantees. > Of course, that begs the question of what the guarantee is... Maybe the > following? > > D.1 "write" after "write": if there is ordering between a write and > a later write, then the later write is guaranteed to see the > effects of the first write. > D.2 Concurrent "write" pairs: The system will assign an arbitrary order > to any pair of concurrent conflicting writes. Non-conflicting > writes (for example, to different keys) are unordered. yep, looks good. I also documented more thoroughly the "cds_lfht_add_unique", which acts as a write (if it returns success) or read (if it returns failure). Here is the updated version: * Ordering Guarantees: * * To discuss these guarantees, we first define "read" operation as any * of the the basic cds_lfht_lookup, cds_lfht_next_duplicate, * cds_lfht_first, cds_lfht_next operation, as well as * cds_lfht_add_unique (failure). * * We define "read traversal" operation as any of the following * group of operations * - cds_lfht_lookup followed by iteration with cds_lfht_next_duplicate * (and/or cds_lfht_next, although less common). * - cds_lfht_add_unique (failure) followed by iteration with * cds_lfht_next_duplicate (and/or cds_lfht_next, although less * common). * - cds_lfht_first followed iteration with cds_lfht_next (and/or * cds_lfht_next_duplicate, although less common). * * We define "write" operations as any of cds_lfht_add, * cds_lfht_add_unique (success), cds_lfht_add_replace, cds_lfht_del. * * When cds_lfht_add_unique succeeds (returns the node passed as * parameter), it acts as a "write" operation. When cds_lfht_add_unique * fails (returns a node different from the one passed as parameter), it * acts as a "read" operation. A cds_lfht_add_unique failure is a * cds_lfht_lookup "read" operation, therefore, any ordering guarantee * referring to "lookup" imply any of "lookup" or cds_lfht_add_unique * (failure). * * We define "prior" and "later" node as nodes observable by reads and * read traversals respectively before and after a write or sequence of * write operations. * * Hash-table operations are often chained, for example, the pointer * returned by a cds_lfht_lookup() might be passed to a cds_lfht_next(), * whose return value might in turn be passed to another hash-table * operation. This entire chain of operations must be enclosed by a * pair of matching rcu_read_lock() and rcu_read_unlock() operations. * * The following ordering guarantees are offered by this hash table: * * A.1) "read" after "write": if there is ordering between a write and a * later read, then the read is guaranteed to see the write or some * later write. * A.2) "read traversal" after "write": given that there is dependency * ordering between reads in a "read traversal", if there is * ordering between a write and the first read of the traversal, * then the "read traversal" is guaranteed to see the write or * some later write. * B.1) "write" after "read": if there is ordering between a read and a * later write, then the read will never see the write. * B.2) "write" after "read traversal": given that there is dependency * ordering between reads in a "read traversal", if there is * ordering between the last read of the traversal and a later * write, then the "read traversal" will never see the write. * C) "write" while "read traversal": if a write occurs during a "read * traversal", the traversal may, or may not, see the write. * D.1) "write" after "write": if there is ordering between a write and * a later write, then the later write is guaranteed to see the * effects of the first write. * D.2) Concurrent "write" pairs: The system will assign an arbitrary * order to any pair of concurrent conflicting writes. * Non-conflicting writes (for example, to different keys) are * unordered. * E) If a grace period separates a "del" or "replace" operation * and a subsequent operation, then that subsequent operation is * guaranteed not to see the removed item. * F) Uniqueness guarantee: given a hash table that does not contain * duplicate items for a given key, there will only be one item in * the hash table after an arbitrary sequence of add_unique and/or * add_replace operations. Note, however, that a pair of * concurrent read operations might well access two different items * with that key. * G.1) If a pair of lookups for a given key are ordered (e.g. by a * memory barrier), then the second lookup will return the same * node as the previous lookup, or some later node. * G.2) A "read traversal" that starts after the end of a prior "read * traversal" (ordered by memory barriers) is guaranteed to see the * same nodes as the previous traversal, or some later nodes. * * Progress guarantees: * * * Reads are wait-free. These operations always move forward in the * hash table linked list, and this list has no loop. * * Writes are lock-free. Any retry loop performed by a write operation * is triggered by progress made within another update operation. * Thanks, Mathieu -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com _______________________________________________ lttng-dev mailing list [email protected] http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
