Having ordered delivery as defined by Akka [1] and Erlang [2] is definitely useful. We've made several assumptions around it, despite knowing that it is not guaranteed, due to the subtleties around link-less sends and link breakage that you covered in (a) and (b).

The memory overhead of storing all sender addresses forever does indeed seem pretty unfortunate. Another approach that comes to mind, without this memory overhead, is to cut the problem off at the source by ensuring there is only a single active TCP socket per sender *on the receiving side*.

The sending libprocess currently only initiates a new connection if it is not already holding a connection to the receiving libprocess. As a result, the order in which sockets are accepted in the receiving libprocess matches the order in which sockets are connected in the sending libprocess. When the sending libprocess initiates a new connection, the receiving libprocess may still see an active connection to the sender. In this case, the receiving side currently accepts the second connection and will problematically race between reading a message on the first connection and reading one on the second.

There are two easy choices that give us the ordering guarantee: (1) when the receiving libprocess sees an additional connection from a sender, close the old connection and read from the new connection, or (2) instead of closing the old connection, close the new connection.

Beyond just keeping a single active connection, we could also avoid dropping messages: (3) keep both connections, but do not read messages from the new connection until the old connection closes. Of course, this extends beyond just two connections and seems complicated.

I do find the unnecessary dropping in (1) and (2) unfortunate, and I wonder whether Akka and Erlang have implementation warts like this. I say wart because technically these messages were delivered over the network but the runtime dropped them.

By the way, our semantics around PID and link are different from Erlang's, which has implications for this stuff. My understanding is that if you see an EXIT for a PID in Erlang, you cannot talk to that PID ever again, since PIDs are unique and EXIT means it is definitely terminated.
In our case, by contrast, you can talk to that PID again, since a socket disconnection will lead to an 'exited', not to mention that we don't have PID uniqueness. I would love to get the details on how Erlang works in this regard, since I'm not sure. :)

[1] http://doc.akka.io/docs/akka/snapshot/general/message-delivery-reliability.html#Discussion__Message_Ordering
[2] http://www.erlang.org/faq/academic.html#idp32841424

On Sun, Nov 15, 2015 at 4:20 AM, Neil Conway <[email protected]> wrote:

> Good point -- following what Erlang and Akka provide (ordering but not
> reliability) is probably a reasonable starting-point.
>
> I suggested earlier that this would require doing our own
> retransmission logic (which would be pretty unfortunate), but on
> reflection that is obviously not true. We can instead:
>
> * assign sender-side sequence numbers
> * at the receiver, remember the last sequence number we've delivered
>   from each sender
> * drop inbound messages whose sequence number is < the last delivered
>   message from that sender
>
> That requires each receiver to keep a sequence number for every
> sending PID it has ever seen a message from. For a very large cluster
> with a lot of PID churn, that might be a pretty sizable amount of
> state. We could improve that incrementally by keeping a single
> sequence number for each sending network::Address. I suppose that's
> acceptable...
>
> Neil
>
> On Sun, Nov 15, 2015 at 12:43 PM, haosdent <[email protected]> wrote:
> > As I know, Akka could guaranteed message ordering for per sender-receiver
> > pair, but not guaranteed message delivery. Erlang also similar to this. I
> > think if to implement own sequencing, acking, and retransmission logic,
> > this work nearly to reimplement a TCP stack. To implement a new TCP-like
> > stack in TCP looks strange. How about just make sure only have one
> > connection between per sender-receiver pair, so TCP could help us
> > guaranteed message order.
>
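
To make the sequence-number scheme in the quoted message concrete, here is a rough Python sketch of the receiver-side filter. It is only an illustration under assumptions, not libprocess code: `SequenceFilter`, `accept`, and the string sender keys are hypothetical names, and the sender key could equally be a PID or a network address, which is exactly the state trade-off discussed in the quote.

```python
from typing import Dict, Hashable


class SequenceFilter:
    """Hypothetical receiver-side filter keyed by sender."""

    def __init__(self) -> None:
        # Last sequence number delivered per sender. This map is the
        # per-sender state whose size is the concern raised in the thread.
        self.last_delivered: Dict[Hashable, int] = {}

    def accept(self, sender: Hashable, seq: int) -> bool:
        """Return True if the message should be delivered, False if dropped."""
        last = self.last_delivered.get(sender)
        # Follows the quoted rule literally: drop when seq < last delivered.
        # An equal sequence number (a duplicate) would still be accepted.
        if last is not None and seq < last:
            return False
        self.last_delivered[sender] = seq
        return True


# Messages from sender "a" arrive out of order across two connections.
f = SequenceFilter()
arrivals = [("a", 1), ("a", 3), ("a", 2), ("b", 1), ("a", 4)]
delivered = [(s, n) for (s, n) in arrivals if f.accept(s, n)]
```

Here the late message `("a", 2)` is dropped rather than reordered, which gives ordering without reliability, at the cost of one integer of state per sender key.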

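Options (1) and (2) from the reply at the top can be sketched in the same spirit. This is a minimal illustration, not how libprocess is structured: `Conn` and `SingleConnectionRegistry` are hypothetical names, and the sketch simply assumes the receiver can map an accepted socket to a sender identity, which is a real design question in its own right.

```python
from typing import Dict, Hashable


class Conn:
    """Hypothetical stand-in for an accepted socket."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.open = True

    def close(self) -> None:
        self.open = False


class SingleConnectionRegistry:
    """Keeps at most one active connection per sender on the receiving side."""

    def __init__(self, keep_newest: bool = True) -> None:
        # keep_newest=True is option (1); keep_newest=False is option (2).
        self.keep_newest = keep_newest
        self.active: Dict[Hashable, Conn] = {}

    def on_accept(self, sender: Hashable, conn: Conn) -> Conn:
        """Register a newly accepted connection; return the one to read from."""
        old = self.active.get(sender)
        if old is None or not old.open:
            self.active[sender] = conn
            return conn
        if self.keep_newest:
            # Option (1): close the old connection, read from the new one.
            old.close()
            self.active[sender] = conn
            return conn
        # Option (2): close the new connection, keep reading from the old one.
        conn.close()
        return old


registry = SingleConnectionRegistry(keep_newest=True)
first, second = Conn("first"), Conn("second")
registry.on_accept("sender-1", first)
chosen = registry.on_accept("sender-1", second)
```

Either way, any messages still buffered on the closed connection are lost, which is the unnecessary dropping noted above. Option (3) would replace the `close()` calls with a queue of pending connections per sender.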