Hi fellow hakkers,

I have two questions regarding the Artery module (I am not considering the 
previous remoting as it will eventually get deprecated). I implemented a 
peer sampling service (HyParView) and am in the process of implementing a 
clustering service (Vicinity), both directly on top of Artery. Although it 
seems to work, I have some worries after delving deeper into Artery and 
Aeron.

*Connection control*:
In my use case, a node may contact thousands of other nodes over its 
lifetime, while actively using only a handful (<10) in any given minute. 
The services I am implementing are (supposed to be) lightweight, but I see 
two potential reasons why performance might deteriorate over time.

- Aeron's best-practices guide 
<https://github.com/real-logic/aeron/wiki/Best-Practices-Guide#system-design> 
says that the number of streams shouldn't be high (never over a thousand, 
but ideally much lower). It is not clear to me what an unused 'connection' 
costs Aeron. The linked documentation might even refer only to the 
Publishers/Subscribers directly connected to the MediaDriver; I'm too much 
of a noob to tell from the docs.
- Artery registers an association for each contacted remote (perhaps even 
more state, I may have missed some). Users don't get to 'close the 
connection' (I can see reasons why), but Artery does not seem to come with 
a mechanism to clean up unused associations either.

Can you guys make an educated guess about how much performance would drop 
in my use case? And if the drop is significant, what would you advise as a 
countermeasure? I could see garbage collection of unused associations as a 
useful addition to remoting, and I would be happy to help out.
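
To make the idea concrete, here is a minimal sketch of the bookkeeping I 
have in mind, in plain Java with no Akka or Aeron dependencies. The class 
IdleAssociationTracker and its methods are hypothetical names of mine, not 
Artery API. Actually releasing an association and its Aeron resources would 
have to happen inside Artery and is not shown.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Artery): remembers when each remote
// address was last used and reports the ones that have been idle too long.
class IdleAssociationTracker {
    private final Map<String, Long> lastUsedMillis = new HashMap<>();
    private final long idleTimeoutMillis;

    IdleAssociationTracker(long idleTimeoutMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    // Call whenever a message is sent to or received from the remote.
    void recordUse(String remoteAddress, long nowMillis) {
        lastUsedMillis.put(remoteAddress, nowMillis);
    }

    // Removes and returns every address idle for at least the timeout.
    List<String> evictIdle(long nowMillis) {
        List<String> evicted = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastUsedMillis.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= idleTimeoutMillis) {
                evicted.add(entry.getKey());
                it.remove();
            }
        }
        return evicted;
    }

    int size() {
        return lastUsedMillis.size();
    }
}
```

With thousands of contacted nodes but fewer than ten active per minute, a 
periodic evictIdle call would keep the tracked set close to the active set.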

*Quarantining*:
When remote watch fails for some remote actor system, that actor system 
gets quarantined. In my case, that is a bit radical, as I don't necessarily 
have control over either of those ActorSystems. Without the ability to 
restart either ActorSystem, these systems would continue treating each 
other as 'down' even though the partition may have healed long ago. I could 
instantiate a failure detector explicitly instead of using context.watch, 
so that quarantining is not a consequence of failure detection. 
However, it feels like I am missing something with such a simple solution. 
Why is quarantining as persistent as it is, if skipping it has no downside? 
What would you guys advise for the case where restarting actor systems is 
not an option, yet you would like to use failure detection?
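
For reference, this is roughly the kind of explicit failure detection I 
mean: a minimal deadline-based sketch in plain Java, driven by heartbeat 
messages that my own actors would exchange. HeartbeatMonitor and its 
methods are hypothetical names; the sketch does not use Akka's own failure 
detector classes or context.watch, and I am assuming the heartbeat 
messages themselves are sent at the application level.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical application-level failure detector: a peer counts as
// available only while its latest heartbeat is recent enough.
class HeartbeatMonitor {
    private final Map<String, Long> lastHeartbeatMillis = new HashMap<>();
    private final long acceptablePauseMillis;

    HeartbeatMonitor(long acceptablePauseMillis) {
        this.acceptablePauseMillis = acceptablePauseMillis;
    }

    // Call whenever a heartbeat message arrives from the peer.
    void heartbeat(String peer, long nowMillis) {
        lastHeartbeatMillis.put(peer, nowMillis);
    }

    // No sticky 'down' state: a later heartbeat makes the peer available again.
    boolean isAvailable(String peer, long nowMillis) {
        Long last = lastHeartbeatMillis.get(peer);
        return last != null && nowMillis - last <= acceptablePauseMillis;
    }
}
```

The point of the sketch is that unavailability is recomputed from the 
latest heartbeat, so a peer recovers on its own once a partition heals.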

Thanks in advance for your insights!

Kind regards,

Merlijn
