Hi fellow hakkers, I have two questions regarding the Artery module (I am not considering the previous remoting as it will eventually get deprecated). I implemented a peer sampling service (HyParView) and am in the process of implementing a clustering service (Vicinity), both directly on top of Artery. Although it seems to work, I have some worries after delving deeper into Artery and Aeron.
*Connection control*

In my use case, a node may have contacted thousands of other nodes over its lifetime, while actively using only a handful (<10) in any given minute. The services I am implementing are (supposed to be) lightweight, but I see potential reasons why performance might deteriorate over time:

- Aeron claims <https://github.com/real-logic/aeron/wiki/Best-Practices-Guide#system-design> that its number of streams shouldn't be high (never over a thousand, but ideally much lower). It is not clear to me what the cost is for Aeron if a 'connection' is not used. The linked documentation might even refer only to the Publishers/Subscribers directly connected to the MediaDriver; I'm too much of a noob to tell from the docs.
- Artery registers an association for each contacted remote (perhaps even more state, I may have missed stuff). Users don't get to 'close the connection' (I can see reasons why), but Artery does not seem to come with a mechanism to clean up unused associations either.

Can you guys make an educated guess at the performance drop in my use case? And if it is significant, what would you advise as a counter-measure? I could see garbage collection of unused associations as a useful addition to Remoting, and I would be happy to help out if useful.

*Quarantining*

When remote watch fails for some remote actor system, that actor system gets quarantined. In my case that is a bit radical, as I don't necessarily have control over either of those ActorSystems. Without the ability to reboot either ActorSystem, these systems would continue treating each other as 'down' even though the partition may have long passed.

I could instantiate a failure detector explicitly instead of using context.watch, so that quarantining is not a consequence of failure detection. However, it feels like I am missing something with such a simple solution. Why is quarantining as persistent as it is, if skipping it has no downside?
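To make the "explicit failure detector" idea concrete, here is a minimal sketch of what I have in mind. It is plain Java with no Akka dependency; the class `PeerFailureDetector`, its methods, and the fixed `acceptablePauseMillis` deadline are all hypothetical names of my own, not Akka or Artery API. It assumes my own actors exchange heartbeat messages and call `heartbeat` whenever one arrives. The point is only that suspicion is derived from the last heartbeat time, so a peer becomes trusted again as soon as heartbeats resume.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/**
 * Hypothetical deadline-based failure detector (not part of Akka).
 * Suspicion is recomputed on every query, so it is never permanent.
 */
final class PeerFailureDetector {
    private final long acceptablePauseMillis;
    private final LongSupplier clockMillis; // injectable clock, eases testing
    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();

    PeerFailureDetector(long acceptablePauseMillis, LongSupplier clockMillis) {
        this.acceptablePauseMillis = acceptablePauseMillis;
        this.clockMillis = clockMillis;
    }

    /** Record that a heartbeat from the given peer has just arrived. */
    void heartbeat(String peer) {
        lastHeartbeat.put(peer, clockMillis.getAsLong());
    }

    /** True if the peer is known and has been silent longer than the deadline. */
    boolean isSuspected(String peer) {
        Long last = lastHeartbeat.get(peer);
        return last != null && clockMillis.getAsLong() - last > acceptablePauseMillis;
    }

    /** Stop tracking a peer that is no longer in the active view. */
    void forget(String peer) {
        lastHeartbeat.remove(peer);
    }
}
```

With this approach a peer that was suspected during a partition is simply un-suspected by the next `heartbeat` call, and `forget` keeps the tracked state limited to the handful of peers currently in use.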
What would you guys advise for the case where restarting actor systems is not an option, yet you would like to use failure detection?

Thanks in advance for your insights!

Kind regards,
Merlijn
