Hi everyone,
my GSoC project should replace the current global page
hash table (GPHT) and this email lists existing hash
tables (from now on just HT) that might do the job.
I am writing this email to better organize my thoughts
and for future reference.
Concurrent HTs are reviewed here as a data structure.
How to modify GPHT to utilize such a concurrent HT
is not addressed here.
Requirements
============
The current HT used by GPHT does not resize, does not
allow any concurrent access whatsoever. Therefore, the
new HT that will be used to back GPHT should:
- resize
- allow concurrent reads, ideally with low overhead
- preferably allow concurrent updates (insert, remove)
Terminology
===========
HT = hash table
GPHT = global page hash table
RCU = read-copy update
S-w-RCU = solvable with RCU
CAS = compare and swap
MB = memory barrier
update = insert() or remove()
collision chain = linked list of a single bucket of a HT
lock-free = a group of threads as a whole makes progress
each step (ie every time a thread executes an instruction).
Eg if thread X is competing with another thread Y to
change the same element and X does not succeed (eg due
to a failed CAS), we are guaranteed some other thread
made progress/succeeded (Y in this case). Lock-free
algorithms are resilient against preemption/interrupts
or even death of individual threads at any time.
Existing HTs
============
The following HTs should represent state of the art
in their respective domains (locking-based, lock-free
open addressing, lock-free w/ chaining, fully resizable).
A) Michael's HT [4]
---------------
Michael introduces the first fully functional lock-free
single linked list and then uses it in a simple non-
resizable HT. Readers incur a couple of MBs (eg one per
traversed node)
Pros:
+ lock-free reads and updates
Cons:
- nonresizable
- reader overhead (S-w-RCU)
B) Java ConcurrentHashMap [1]
-------------------------
The HT is partitioned into, say, 16 segments. Each segment
is protected by a lock that prevents concurrent updates
and/or resize() of the segment. Readers never block, but
traversing a collision chain requires at least one memory
barrier (volatile Java read) per node. Moreover, resize()
might have to copy certain elements of the table.
Pros:
+ concurrent readers
+ resizable
Cons:
- only grows, never shrinks
- updates in the same segment block
- resize blocks updates until done
- resize copies elements
- reader overhead (S-w-RCU)
- old elements must be garbage collected (S-w-RCU)
C) Click's HT [2,3]
-------------
A completely lock-free HT. It uses linear open addressing
to resolve collisions, so instead of searching/modifying
a linked list upon a collision it probes successive cells
of the table.
Pros:
+ lock-free reads, updates, resize()
Cons:
- remove() does *not* make the table any emptier (the
element is simply marked deleted)
- linear probing tends to build longer strides of taken
cells
- only grows
- reader overhead: multiple MBs (S-w-RCU)
D) Split-ordered HT [5]
-------------------
A growable completely lock-free HT (ie including resize).
All elements are placed in a single cleverly-sorted lock-
free linked list. Buckets are delimited by extra/dummy
nodes in the list. A directory table points to segment
tables (ie 2-level table) that store pointers to dummy
nodes, ie to the start of each bucket.
Pros:
+ completely lock-free: reads, updates, resize
+ incremental resize (lazily, one-bucket at a time)
Cons:
- only grows (only second level table could be made to
shrink, dummy nodes can never be freed)
- extra indirection when accessing elements: 2-level
table + dummy node at the beginning of a bucket
- space overhead: dummy nodes, extra tables
- reader overhead: multiple MB (and one MB/node of list)
(probably S-w-RCU)
E) LHlf [6]
-------
The paper is a little scarce on details, but the HT
combines Larson's good-old linear hashing and lock-
free linked lists into a lock-free table fully resizable
HT. Buckets are also accessed via a 2-level table (ie,
directory + segments). Unfortunately, they require a
CAS instruction that operates atomically on two adjacent
words. I doubt such an operation exists on 64bit machines.
It could be emulated [7] but performance will suffer.
Pros:
+ full resize (grows and shrinks)
+ lock-free: reads, updates, resize
Cons:
- not implementable on existing hw :-)
- reader overhead: multiple MBs (and one MB/node)
- slight indirection overhead due to a 2-level table
F) Relativistic HT [8]
------------------
It uses a single table to access buckets. Updates
and resize use standard striped locking [9]. Readers
are protected by RCU, so they never block. Resize
progressively unzips buckets until all cells of a new
table point to buckets that contain only the nodes/items
that belong there and have no items of the old/split
bucket.
Pros:
+ full resize (grows and shrinks)
+ no overhead for readers
Cons:
- updates block
- resize blocks out all updaters until done, but there
is some small room for concurrency during the last
stages of resize as I see it
G) other HTs
------------
Other concurrent hash tables [10, 11, 12, 13] proved
to be inferior to the ones listed above in one way or
another.
Selected HT
===========
I would like to implement a HT based on (F) because it:
- is fully resizable
- has zero reader overhead
- does not have excessive space overhead
Unfortunately, HT (F) has blocking updates. However,
I thought of how to make updates lock-free even in the
face of resize() using techniques of (A), (C), (D).
First, bucket linked lists would use (A)'s lock-free
linked lists where contention for a node is resolved
locally by competing for its immediate neighbors
(ie predecessor or successor). Second, nodes would be
sorted as in (D) so that a bucket can be split/joined
by changing a single link (instead of a multiple zip/
unzip passes + waiting of (F)). Third, during resize
bucket heads would be moved to the new table lock-freely
one bucket head at a time as in (C).
What remains is to update the single split/join link
between buckets in a lock-free manner. To do this we
will mark the boundary node or head with additional
states. I assume all nodes will reside on a word-sized
address, so we can make use of the lower 2 bits in eg
the "next" node pointers. Details will be provided
with the implementation unless I find a mistake in
my logic before that ;-).
The resulting HT would have these properties:
+ fully resizable
+ zero overhead readers when not resizing
+ minimal space overhead
+ lock-free updates even during resize (ie we can update
the table from exception handlers ever during resize)
- indirection while resizing (have to check bucket
heads in both the old and the new table)
- MB for readers of buckets that are being split/joined
- resize will not be lock-free, ie if a resizing thread
is killed (but it won't be), updaters will not be able
to complete resizing on behalf of the resizing thread.
The table would however continue to function even
after such a failure but without the performance
benefit of a resize.
End
===
I still have to write which RCU algorithm to implement
in HelenOS and how to interface GPTH with a concurrent
HT.
Adam
References
==========
[1] java.util.concurrent.ConcurrentHashMap,
Lea, 2012
http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/main/java/util/concurrent/ConcurrentHashMap.java?revision=1.118&view=markup
[2] A Lock-free Hash Table,
Click, 2007
http://www.azulsystems.com/events/javaone_2007/2007_LockFreeHash.pdf
[3] Towards a Scalable Non-blocking Coding Style,
Click, 2008
http://www.azulsystems.com/events/javaone_2008/2008_CodingNonBlock.pdf
[4] High performance dynamic lock-free hash tables and list-
based sets, Michael, 2002
http://www.research.ibm.com/people/m/michael/spaa-2002.pdf
[5] Split-ordered Lists: Lock-free Extensible Hash Tables,
Shavit, 2006
http://www.cs.ucf.edu/~dcm/Teaching/COT4810-Spring2011/Literature/SplitOrderedLists.pdf
[6] LHlf: lock-free linear hashing,
Larson, 2012
http://dl.acm.org/citation.cfm?id=2145868
[7] Practical Lock-freedom, chapter 3.2
Fraser, 2003
http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-579.pdf
.. this file is not the thesis cited, but seems to
contain everything relevant (also in section 3.2)
[8] Resizable, scalable, concurrent hash tables via relativistic programming,
Triplett, 2011
http://www.usenix.org/event/atc11/tech/final_files/Triplett.pdf
[9] The Art of Multiprocessor Programming, section 13.2.2
Herlihy, 2008
[10] The Art of Multiprocessor Programming, section 13.5
Herlihy, 2008
[11] Exploiting Deferred Destruction: An Analysis of
Read-Copy-Update Techniques, McKenney, 2004, section 6.7
http://www.rdrop.com/users/paulmck/RCU/RCUdissertation.2004.07.14e1.pdf
[12] Read-Copy-Update for OpenSolaris,
Podzimek, 2010
[13] DDDS,
Piggin, 2008
http://lwn.net/Articles/302132/
_______________________________________________
HelenOS-devel mailing list
[email protected]
http://lists.modry.cz/cgi-bin/listinfo/helenos-devel