On Sun, Jan 12, 2014 at 11:55 PM, Rajiv Kurian <[email protected]> wrote:
>
>
> On Sunday, January 12, 2014 2:49:37 AM UTC-8, Dmitry Vyukov wrote:
>>
>> On Fri, Jan 10, 2014 at 4:33 AM, Rajiv Kurian <[email protected]> wrote:
>> >
>> >
>> > On Thursday, January 9, 2014 7:02:36 AM UTC-8, Dmitry Vyukov wrote:
>> >>
>> >> On Tue, Jan 7, 2014 at 12:42 PM, Rajiv Kurian <[email protected]>
>> >> wrote:
>> >> > I am trying to design a lock-free or close to lock-free hash map
>> >> > of int keys and int values. I have a single writer thread that
>> >> > writes to the hash-map. My efforts to do better than a RW mutex
>> >> > around the map haven't yielded anything good. My gut feeling says
>> >> > I should be able to do better given it's a single writer. I have
>> >> > the following requirements:
>> >> >
>> >> > 1) A single-writer multiple-reader map of int keys and int values.
>> >> > 2) Keys are guaranteed to be monotonically increasing. These are
>> >> > just ids and are generated by incrementing an integer.
>> >> > 3) The operations I need are:
>> >> >
>> >> > append(key, value), where key is guaranteed to be higher than
>> >> > anything in the map so far - only called on the writer thread.
>> >> > delete(key) - only called on the writer thread.
>> >> > set(key, value), where key is required to be already in the map -
>> >> > only called on the writer thread.
>> >> > get(key) - could be called by writer/reader threads.
>> >> >
>> >> > 4) All the reader threads and the single writer thread are known
>> >> > during the creation of the map. Reader threads do not pop up or
>> >> > die. They stay constant for the lifetime of the application.
>> >> >
>> >> > I am struggling to use anything cheaper than RW locks (changed for
>> >> > a single writer) for the implementation, since the delete and
>> >> > append operations could require resizing of buckets or moving
>> >> > elements around, and the reader threads could observe the changes
>> >> > in between.
>> >> >
>> >> > Any pointers?
>> >>
>> >>
>> >> Hi!
>> >>
>> >> The most efficient solution is always tailored to your particular
>> >> situation. Lots of details matter. In particular in this case: 1. what
>> >> is the number of reader threads? 2. what is the relative frequencies
>> >> of different operations?
>> >
>> > Number of reader threads = num cores - 1, so all but one core should
>> > be reading. Writes are vastly outnumbered by reads; that's all I know
>> > for now. I am trying to build a framework, so the real ratio depends
>> > on the application and is kind of unpredictable.
>> >>
>> >>
>> >>
>> >> Also the fastest way to do something is to not do it all.
>> >> Can you replace integers with pointers? Even if you send them across
>> >> network, you can verify that the received pointer is valid before
>> >> using it, e.g:
>> >>
>> >> #define MAX_OBJECT_IN_FLIGHT (1<<20)
>> >>
>> >> object_t objects[MAX_OBJECT_IN_FLIGHT];
>> >>
>> >> #include <stdint.h>
>> >>
>> >> bool verify_pointer(object_t *o) {
>> >>   if (o < objects || o >= objects + MAX_OBJECT_IN_FLIGHT)
>> >>     return false;
>> >>   if (((uintptr_t)o - (uintptr_t)objects) % sizeof(object_t))
>> >>     return false;
>> >>   // OK, this is a valid pointer into objects.
>> >>   // A client can spoof it, but they can spoof the integer as well.
>> >>   return true;
>> >> }
>> >>
>> >> --
>> >
>> > Hmm, I am not sure what you mean. Are you saying that since my int is
>> > an id of an object I might as well deal with pointers to that object
>> > instead of having extraneous look-ups by id? For more context, I am
>> > trying to design an actor system like Erlang. IDs instead of pointers
>> > help because there are extraneous requirements like persisting actor
>> > state and ids across runs of the application. The hash map helps with
>> > scheduling actors to threads. There are (num cores - 1) dispatcher
>> > threads actually running the actor logic. These are the reader
>> > threads that I talked about. They are connected by a mesh of SPSC
>> > queues/ring-buffers. Each dispatcher thread (which is running actor
>> > logic) needs to check where an actor (id) is scheduled so that it can
>> > enqueue a msg on the right queue. There is a single writer thread
>> > which actually does the scheduling of actors to dispatcher threads.
>> > This is the writer thread that writes to the hash map. Having
>> > pointers won't save me from doing the look-ups, since I cannot use
>> > the pointer alone to execute a function; I need to know where an
>> > object/actor is scheduled to run. The operations I need in context
>> > are:
>> >
>> > 1) Append(actorId, dispatcher-id). This is called by the single
>> > writer thread when a new actor is created - hence the monotonically
>> > increasing appends.
>> > 2) Set(actorId, dispatcher-id). The single scheduler thread observes
>> > patterns like certain actor pairs communicating a lot, or dispatchers
>> > becoming imbalanced (queue length, queue drain rate, etc.). Based on
>> > these metrics it can choose to move one or more actors from one
>> > dispatcher to another. There are other steps involved here that I am
>> > omitting.
>> > 3) Delete(actorId). An actor has just died, hence it's deleted from
>> > the central directory.
>> >
>> > On your suggestion of specialization, something I thought of is that
>> > since I already have communication channels between the writer thread
>> > and the reader threads (ring-buffers that they poll on), I could use
>> > them as an opportunity to do updates such as resize/compaction etc. I
>> > am using a closed-addressing hash map implemented as 2^n buckets of
>> > key-value arrays. Since ids are monotonically increasing, an append
>> > will never displace another value. The only time the value of an
>> > existing cell changes is:
>> > i) There is a compaction (too many deletions).
>> > ii) A rebalance event occurs, resulting in Set(actorId, dispatcher-id)
>> > calls.
>> > iii) An append call causes a bucket to run out of space or causes the
>> > load factor of the table to be too high. This causes wholesale
>> > restructuring of the hash-map - allocation and freeing of one or more
>> > buckets, increasing the number of buckets, etc.
>> >
>> > If I size my map appropriately (massively oversized) then compaction
>> > and append-out-of-space events should be rare enough. Rebalancing in
>> > general should be rare enough and stabilize with time. Rebalancing
>> > also requires other coordination in any case - dispatcher threads
>> > need to stop processing messages of the transitioning actors, etc.
>> > So what I could do is: any time the writer thread needs to change the
>> > structure of the hash-map, I could wait for a safe point by
>> > signalling a map-about-to-change event to all the other threads and
>> > waiting for them to acknowledge. Once they acknowledge (which means
>> > they are just busy-waiting for the end-of-map-change event) I could
>> > make as many changes as I want to the internal structure of the map
>> > and then signal end-of-map-change. The reader threads will now be
>> > caught up with all the changes and can proceed. The nice thing is
>> > that all get calls can proceed without any locks. Append calls always
>> > make changes to previously unaccessed cells, so a normal load of the
>> > cell should be fine for get calls. All of this is based on the
>> > hypothesis that resize, compaction and set calls will be infrequent.
>>
>>
>>
>> For an actor system I would strongly consider using pointers as actor
>> identity. If you need to persist an actor, you can map it to an
>> int/string id only when you are persisting (this mapping is trivial --
>> you just need to read a field from the actor struct, a pointer to
>> which you already have). If you want to pin actors to dispatchers, you
>> can operate on the pointer here as well: just add the pointer to the
>> actor to the dispatcher queue. And so on.
>> If you operate on int ids, that will incur constant unnecessary
>> overheads on most hot paths.
>
> Right, makes sense. However, I'd still need some kind of look-up
> mechanism for different threads to stay up to date with which thread an
> actor (just a pointer) lives on, so that they can dispatch messages to
> the right queue. Do you think my solution for the look-up (with int ids
> replaced by pointers) makes sense?

I may be missing something, but this is just:
actor->dispatcher->enqueue(msg);


>> >> Then if you can limit the difference between the newest and the oldest
>> >> object, then you can use a ring buffer with direct indexing. If a
>> >> reader asks for a very old object (older than the threshold), then,
>> >> sorry, no luck.
>> >>
>> >> --
>> >>
>> >> If you cannot afford to reject requests for very old ids, but you
>> >> expect them to be rare, then you can have a fast path for recent ids
>> >> and a slow path for older ids. E.g.:
>> >>
>> >> // We do not expect to receive requests for ids older than this
>> >> #define N (1<<20)
>> >>
>> >> struct table_element {
>> >>   int key;
>> >>   int val;
>> >>   table_element_slow* slow;
>> >> };
>> >>
>> >> table_element table[N];
>> >>
>> >> int find(int key) {
>> >>   int idx = key % N;
>> >>   table_element* e = &table[idx];
>> >>   if (e->key == key)
>> >>     return e->val;
>> >>   // consult e->slow
>> >>   // this can be slow, but we do not expect it in the common case
>> >> }
>> >>
>> >> This needs special care wrt atomic accesses to key/val, and also some
>> >> protection for slow, and other details. But I hope the idea is clear.
>> >
>> > I think so.
>> >>
>> >>
>> >> --
>> >>
>> >> There can be other "special" solutions.
>> >> In all these cases, accesses are extremely fast in the common case.
>> >> If that does not work for you, then I would consider taking a normal
>> >> hashmap, partitioning it into lots of partitions and protecting each
>> >> partition with its own rw mutex. On top of that you can have a mutex
>> >> for each thread/partition; readers lock their own mutex, but the
>> >> writer needs to lock the mutexes of all threads. This will eliminate
>> >> contention between readers, but make the writer slower. Since you
>> >> know the number of readers ahead of time, this should be really
>> >> simple to implement.
>> >
>> > Right, I saw your RW mutex solution based on one cell per core.
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Dmitry Vyukov
>> >>
>> >> All about lockfree/waitfree algorithms, multicore, scalability,
>> >> parallel computing and related topics:
>> >> http://www.1024cores.net
>> >
>> > --
>> >
>> > ---
>> > You received this message because you are subscribed to the Google
>> > Groups
>> > "Scalable Synchronization Algorithms" group.
>> > To unsubscribe from this group and stop receiving emails from it, send
>> > an
>> > email to [email protected].
>> > To view this discussion on the web visit
>> >
>> > https://groups.google.com/d/msgid/lock-free/5fc8de5a-545e-41dd-bdb4-4c635c2b53c7%40googlegroups.com.
>> >
>> > For more options, visit https://groups.google.com/groups/opt_out.
>>
>>
>>
>> --
>> Dmitry Vyukov
>>
>> All about lockfree/waitfree algorithms, multicore, scalability,
>> parallel computing and related topics:
>> http://www.1024cores.net
>



-- 
Dmitry Vyukov

All about lockfree/waitfree algorithms, multicore, scalability,
parallel computing and related topics:
http://www.1024cores.net

