On 01/03/2013 11:34 AM, Dmitry Olshansky wrote:
03-Jan-2013 22:38, Charles Hixson wrote:
On 01/03/2013 08:40 AM, Dmitry Olshansky wrote:
02-Jan-2013 03:54, Charles Hixson wrote:
If I were to use the below as an asynchronous communication channel,
would it avoid deadlocks (presuming that only Cell called Msg) and that
when a thread activated Cell, the first thing it did was process its
mailbox?
Also, if only around 7 cells were created in the main thread, would RAM
usage remain reasonable? (I.e., when a thread was through with an MCell,
would only one copy continue to exist in RAM, or would there be a copy
for each thread that had ever referenced it?) Should MCell instances be
marked shared?

That's a lot of questions to ask, and currently I can hardly decipher the
whole logic from this and the code below alone; too much is behind the
scenes.

I'm willing to help out if you could post a more complete example or
more accurate questions as e.g. "created 7 cells in main thread" could
be done very differently.

Also, a nit: at this size of code (and for me it's unformatted) I'd
recommend linking to the code on some paste service, e.g. dpaste:
http://dpaste.dzfl.pl/.

[snip]

I'm trying to design a program (I'm not writing it yet) that requires
asynchronous communication between LOTS of separate "cells". You can
think of it as vaguely like a neural network, but it doesn't fit the
definition, that's just an analogy. I thought that D would be a good
language to do this in, but all the built-in approaches seem to be
"overly protective", locking a lot more than I want and yielding
deadlocks, or copying more than I want, and ensuring that I end up with
stale data. The approach that I've come up with to avoid this is to
attach a mailbox to each "cell". In an ideal world, each cell would be
a separate thread, but as there will be at least hundreds of thousands
of them, and likely millions, that's unreasonable. So I need the
threads to cycle through pools of "cells". Each cell may send messages
to any other cell (not really, but you don't know in advance what the
connections will be).

So a cell is in fact a task (the more common name for it, I think) with a
mailbox, and you want threads multiplexed across these tasks. The task is
running some code/function/callback/whatever that periodically polls a
mailbox & puts stuff in other tasks' mailboxes. So far so good?

Then definitely take a look at Fiber in druntime (it's core.Fiber AFAIK).

It's in core.Thread. A fiber would work well for the cell, but it's been pointed out to me that with this design an ordinary class instance would also work, so for that a fiber is probably overkill. Which leaves the mailbox unhandled. "Periodic" polling, as in time-based, is too expensive. Periodic polling as in "once in each cycle" is what is planned, and that part doesn't require the overhead of threads, etc., except to keep partial messages from being read.

But the documentation of Fiber *strongly* implies that the contents of the mailbox would be copied from its normal location to the active thread, if that's not the same thread within which the cell executes. So it looks to me as if the mailbox needs to be synchronized. But I'm not really sure what synchronized means in D. (N.B.: The cell is logically a task, but if it were implemented as one, the mailbox would be unnecessary. But the overhead would be unbearable.)


That skeleton of code was intended to show the idea of cells isolated
from outside attached to a mailbox, which has blocking access from the
outside. The cells should never block, but this is largely because they
are only directly accessed by the thread within which they run. The
mailboxes can receive messages from anyone, but their processes are so
short, that blocking will be extremely brief.
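In rough D terms, what I have in mind is something like this minimal sketch. The names (Msg, MBox, the fields) are placeholders, not settled design, and I've used an explicit mutex rather than a synchronized class just to make the locking visible:

```d
import core.sync.mutex : Mutex;

// Placeholder message payload; the real one isn't designed yet.
struct Msg
{
    size_t senderId; // who sent it
    int    action;   // requested action (an enum in the real design)
    double value;    // numeric payload
}

// All locking lives in the mailbox; cells themselves are never locked.
final class MBox
{
    private Mutex lock;
    private Msg[] queue;

    this() { lock = new Mutex; }

    // Callable from any thread; the lock is held only for one append.
    void post(Msg m)
    {
        lock.lock();
        scope (exit) lock.unlock();
        queue ~= m;
    }

    // Called only by the owning cell at the start of each activation;
    // drains the queue so the next cycle starts empty.
    Msg[] takeAll()
    {
        lock.lock();
        scope (exit) lock.unlock();
        auto msgs = queue;
        queue = null;
        return msgs;
    }
}
```

The point is that a sender only ever touches the target's MBox, never the target cell, so at most one lock is held per thread at a time and no lock cycle can form.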

I'm not so optimistic about locking. Even though it's brief, there
are many threads that may be putting stuff simultaneously into mailboxes,
thus contending on the lock and causing context switches.
Plus the lock/unlock of a mutex is not free. The lock-based message queue
is nice for a start (fewer _subtle_ bugs) but you sure got to look into
lock-free alternatives later on.
The number of cells is huge when compared with the number of available threads, On the rough order of 1,000,000 to 6. Because of this I don't expect contention to be a big problem. OTOH, the number of cores/processor is going *up*, so a lock-free system would be very desireable. But the only lock-free design I'm familiar with is CSMACD (ethernet). And to implement that, I think the mailboxes would need to be shared. And this requires some protocol such as CSMACD for each access. It's not clear to me that this would have less overhead than would synchronized (which locks, if I understand correctly, the class instance when someone else has write access to it). Certainly TDPL talks about them sequentially, and implies that there is a tight connection between shared variables and synchronized classes. It is as if a synchronized class is the analog of an atomic operation acting on a larger set of data. And when I look at implementing CSMACD it looks as if only one thread can detect that it's write was interrupted, and then only after it's complete. Unless I implement a checksum scheme, in which case I guess both could detect an improperly completed write. But then independent readers couldn't detect the problem. So readers would need to be locked out while writing was in process...and we're back to mutexes/spinlocks/etc. So it looks to me as if a synchronized mailbox class eliminates a multitude of problems a the lowest reasonable cost. If it works as I hope it does. But if the mailbox ends up getting copied from thread address space to thread address space, then the overhead would be much higher than a naive estimate, and I can't guess how high. It *could* scale so poorly that 90% of the computation consisted of mailboxes being copied from thread to thread.

I wanted to know if this proposed design would work, as in not getting
into deadlocks, not blocking excessively, and not giving me excessively
stale data.

The crucial part is missing - taking a message out of the mailbox ;)
The only entity permitted to take a message from the mailbox is the cell to which it is attached, and that happens at the start of each activation. But that's one reason the mailbox should be synchronized.

But anyway let's focus on the details. 2 classes and 2 functions. Cell's
send & MBox's receive.

Let's suppose we have 2 Cells A & B and their mboxes MA & MB.
From the code I see (that's full of typos? MCell & Cell are used
interchangeably), the chain of events for a full send from A --> B is:

1. A Cell's send locks cell A. (send is sync-ed)
Why does it lock the cell? But perhaps this is because the proffered cell class was synchronized.
2. It locks target cell B.
Why does it lock cell B? Cell A makes no access to cell B, only to the mailbox. That's why in that rough design the cell and the mailbox were separate (synchronized) classes at the same level. If it locks cell B, then the design won't work. (OTOH, the Cell function doesn't need to be synchronized anyway; that's a remnant of a prior design.)
3. It then locks its mailbox MB.
Check.
4. undoes all the locks backwards.
Check.

Then there is of course a deadlock bound to happen if B is sending
message in opposite direction, e.g. :
1. A locks A (making its first step)
2. B lock B (ditto)
3. A locks B & blocks
4. B locks A & blocks
If the cells get locked, then the design will not work. No argument. I was attempting to avoid that by making the mailbox a separate class, and having each cell only contact the other cell's mailbox. If that doesn't work, then the design is not going to work, and I'll need to create a new one.

That is, if for instance there is a step 2. So I guess you didn't mean to
have step 2.

If there is no lock of the cell except before sending then it looks
legit as there are 2 locks protecting separate entities:

- one is to protect message queue on putting message into it

- the other one is to protect ... what exactly? send is already
implicitly guarded by target queue's lock.
Yeah. Sorry, that was sloppy thinking on my part. The cell doesn't need to be synchronized. (But I don't understand why that would be destructive of anything except efficiency.)

So I'd say you only need to guard the message queue and that's about it.
The only other concern is properly scheduling the execution of tasks (or
your cells) so that one runs on no more than one thread at any given time.
The idea here is that each cell has an id#, and there is a pool of threads. Each cell is accessible by the thread whose sequence number equals the cell's id# mod the number of threads. Each thread loops through the cells accessible to it. When a cell is activated by the thread, it first checks its mailbox to update its state. Then it processes, possibly sending messages to the mailboxes of other cells in a task-independent way. (It won't know about threads, only about mailboxes.) Execution then passes on to the next cell in the thread.
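In sketch form (cellsFor is a hypothetical helper, and the worker loop is just comments, since the cell code itself isn't designed yet):

```d
import std.algorithm : equal, filter;
import std.array : array;
import std.range : iota;

// Which cell ids a given worker owns under the "id# mod #threads"
// partition described above. Each id belongs to exactly one worker,
// so no cell ever runs on two threads at once.
auto cellsFor(size_t threadIdx, size_t nThreads, size_t nCells)
{
    return iota(nCells).filter!(id => id % nThreads == threadIdx).array;
}

// Each worker's loop would then be roughly:
//   foreach (id; cellsFor(myIdx, nThreads, nCells)) {
//       auto msgs = cells[id].mbox.takeAll(); // 1. drain own mailbox
//       cells[id].update(msgs);               // 2. update cell state
//       cells[id].act();                      // 3. post to other mailboxes
//   }
```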

In simplest case just make a locked (or lock-free) queue of these and
let threads pick cells/tasks from it and put back when they are done.
A lock-free queue would do it, I think, but the only ones I know of are attached to the thread rather than to a data instance, and that WOULD lead to massive contention. And increasing contention as the number of cores (threads) increased.

Far better is a large bounded queue whose entries you never allocate or
free - it's just a big array of pointers/references to tasks. A thread
then just goes around it looking for valid/ready entries (they stay
allocated, so no nulls there) and executes them. That goes firmly into
lock-free territory to make it correctly synchronized, but with a bit
of care it should be doable.
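E.g. a sketch of the claim step with core.atomic - the flag values and names are just illustrative, and the real thing needs more states and memory-order care:

```d
import core.atomic : atomicStore, cas;

// One flag per pre-allocated task slot in the big fixed array.
enum int EMPTY = 0, READY = 1, RUNNING = 2;

shared int[8] slots;

// A worker scans the array and tries to claim ready entries.
// cas flips READY -> RUNNING for exactly one winning thread;
// everyone else sees the cas fail and moves on to the next slot.
bool tryClaim(size_t i)
{
    return cas(&slots[i], READY, RUNNING);
}
```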
It would be quite reasonable to make the queue bounded, and reject messages when the mailbox was full. But, again, if it's attached to the thread rather than to the object there is going to be a massive problem with contention.

The second option can also be done with locks. In this case a thread goes
through all of the tasks/cells and tries to lock them (that's where your
lock around the cell comes in, isn't it?). If it locks - cool, work on it;
if not - try the next one.
Good point. If the cell is initiated and can't read its mailbox (this action needs a write lock, as it removes messages), then going on to the next cell rather than blocking is better. (Again, if the mailbox is attached to the thread, this won't work. All mailboxes will be nearly constantly in contention.)
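If I follow you, the per-cell pass would look roughly like this sketch (tryProcess and work are made-up names, and the mailbox's mutex is assumed to be exposed somehow):

```d
import core.sync.mutex : Mutex;

// Skip-if-busy pass: tryLock instead of lock, so a worker never blocks
// on a contended mailbox - it just moves on and retries next cycle.
bool tryProcess(Mutex boxLock, void delegate() work)
{
    if (!boxLock.tryLock())
        return false;          // contended: skip this cell for now
    scope (exit) boxLock.unlock();
    work();                    // drain the mailbox, update the cell, etc.
    return true;
}
```

One wrinkle: D's Mutex is recursive, so tryLock from the same thread that already holds the lock still succeeds; contention here means contention between different worker threads.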


> More details aren't available, because I didn't want to
commit to this design if the basic design was wrong, so I haven't
written them. It has been suggested that since the cell will only be
accessed by the thread, it doesn't need to be synchronized.


I'm really nervous about how many copies of each cell will exist,
however. Since there are going to be so many of them, if I ended up
with a copy per thread, the system would bog down in unused storage. But
the mailbox needs to be globally accessible for the scheme to work.

Everything on the heap is accessible from other threads, provided they
have the pointer to the object in question.
Good. And since I'm working with classes, everything is really on the heap, which means that only pointers will get copied. (Well, references. I avoid using pointers in my code unless I really can't avoid them.)


N.B.: When a cell receives a message from another cell it's likely, but
not guaranteed, to send a response back. It may also send responses
onwards. And the number of cells isn't fixed, nor is the number of
their connections. (This is less important in D than in many other
languages, but even in D it affects serialization.)

FWIW, I've been told that an approximately similar design has worked in
Ada, though the design was written in Ada-83. (In Ada the mailbox was
protected, which I take to be approximately the same as synchronized.)

In general there are ways to make it fly. The tricks to use depend on
the use case and on what the bottleneck is (I/O or CPU time).
The pain points are faster mailboxes and better scheduling (as in fewer
context switches for nothing, faster turn-around time, etc.).

The bottleneck will be CPU time (given sufficient RAM). No way to avoid that. Stuffing things into a mailbox is going to be basically copying a struct. (That hasn't been designed yet, but it's going to include a ref to the sender, a requested action, a numeric value, and I'm not sure what else. The "requested action" will probably be represented as an enum. I'm probably going to avoid strings, as they don't appear to be valuable, even though a string is just a reference copy.) So say 128 bytes of parameters, or possibly 256. And receiving a message is copying the parameters into a queue.

Perhaps I could remove the synchronization from the class and just guard calculating the index position into the queue, since once the position in the queue was known there wouldn't be any contention over where the message would be stored. That should be a very fast design, but I worry about how much RAM it would require. With a pre-allocated queue, each mailbox would consume the maximal amount of space even if none of it were used. So if there were space for 20 messages at 256 bytes each, the mailbox would consume 5120 bytes plus overhead for each cell. Which means that with 500,000 cells (an extremely lowball figure), just the mailboxes would consume 2,560,000,000 bytes plus overhead. True, most of that would never be accessed, but enough accesses would be randomly distributed throughout the system that I would expect thrashing... or even failure due to inability to allocate memory. This would be compressed significantly if I used a thread-attached mailbox, at the cost of nearly guaranteed massive contention problems. And 20 messages is an unreasonably low upper limit, even though it's too high for a mean value. (I expect the mean value to be closer to 3-4.)

So I'd been planning on using variable-length arrays for the mailbox, which would be deallocated every time the cell accessed them. Messages could then be attached by simply doing an append.
This would place the message queue on the heap, with an initially quite low value for the maximal number of messages. I'll admit that this may only look better because I can't estimate the amount of RAM consumed.
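The guarded-index variant would be roughly this sketch (all names hypothetical; a real version would also have to reset the index when the owning cell drains the box, and handle the counter running past N):

```d
import core.atomic : atomicOp;

// Placeholder message payload, as elsewhere in this thread.
struct Msg { size_t senderId; int action; double value; }

// Pre-allocated mailbox where only the slot-index reservation is
// guarded: one atomic increment reserves a slot, and the copy itself
// needs no lock because no other writer can get the same index.
struct FixedBox(size_t N)
{
    Msg[N] slots;
    shared size_t next;

    // Returns false (rejects the message) when the box is full.
    bool post(Msg m)
    {
        auto i = atomicOp!"+="(next, 1) - 1;  // reserve slot i atomically
        if (i >= N)
            return false;
        slots[i] = m;
        return true;
    }
}
```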

Perhaps I need to use some sort of disk cache of a data file, and only have the most active cells in memory at any particular time. This would still result in a lot of thrashing, but in a more controlled manner. I've only around 8 GB of actual memory, and it looks like 7.8 GB total memory if one includes virtual memory. Perhaps this needs to be upgraded... which would probably mean I upgraded the number of cores available, meaning increased contention. But I could clearly upgrade the amount of virtual memory with just a repartitioning of the disk. It's not been needed so far, but then I've just been designing this system, not implementing it. OTOH, virtual RAM means thrashing, so it's not clear how much that would help over, say, a B-tree that rolled out relatively inactive cells, even though each cell would need to be RAM resident at least once per cycle.

That said, the application I'm designing would probably overstress any computer I could afford. I'm almost resigned to that. I just want to come as close to reasonable speed of execution as possible, and I clearly want to avoid deadlocks and data that doesn't get properly updated.

OTOH, rolling things out to a B+Tree means that I need to devise a way to access the mailbox based around the target's id# rather than around a reference to the item. A hash table is the obvious solution, but the class managing the hash table would need to roll the cell+mailbox in if it isn't RAM resident - not something that's reasonable to do while the mailbox access is locked. So the mailbox would need to queue the request for access to the cell's mailbox with a thread, stuff the message in a "to be acted upon" queue, and return. And where that queue should be stored isn't clear to me. Perhaps that's where the thread-attached non-blocking queue should come into play.

Also, hash tables have enough overhead themselves that they would limit the number of RAM-resident cells considerably compared to the prior estimates, even while increasing the total number that could be dealt with. Probably they would halve the number of RAM-resident cells (a rough estimate, admittedly), while expanding the total number of cells that could be handled to be limited only by available disk space. They would also impose a severe performance penalty. (In prior small-scale tests, hash table access has been very roughly 1/10 as fast as variable-length array access. Of course, this is still a lot faster than disk file access.)
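The id#-keyed lookup might be roughly this sketch (Registry is a hypothetical name, MBox here is just a stub standing in for the real mailbox class, and the roll-in machinery is elided):

```d
// Stand-in for the real mailbox class.
final class MBox {}

// Maps cell id# -> mailbox, so a sender addresses a cell by id even
// when the cell itself may be rolled out to disk.
final class Registry
{
    private MBox[size_t] live;   // RAM-resident mailboxes keyed by id#

    void add(size_t id, MBox mb) { live[id] = mb; }

    // Null means "not resident": a real version would queue a load
    // request here instead of doing disk I/O while any lock is held.
    MBox lookup(size_t id)
    {
        if (auto p = id in live)
            return *p;
        return null;
    }
}
```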

Still, it's sounding like the basic design will work, unless a cell calling the mailbox of another cell locks that cell, and I don't see any reason why it should...but I've been repeatedly surprised by the requirements imposed on concurrently executing threads of execution.
