Apologies for the delay in providing feedback. Comments below:

Memory Usage

I thought we had discussed this in the past and decided it wasn’t worth the effort. Has that changed? From a technical perspective this seems a reasonable approach, particularly if it is already used elsewhere in the codebase, as the code snippets seem to imply. If implemented, one place it should be used is the routing node ranking calculation: i.e., a node's desirability as an RN for TC nodes should drop as less memory is available.
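To make the ranking suggestion concrete, here is a minimal sketch of a memory-aware RN desirability score. Everything in it is hypothetical: `RoutingNodeInfo`, `rn_desirability`, `pick_routing_node`, the `base_rank` field (a stand-in for whatever the existing ranking already computes) and the 10% cutoff are illustrative assumptions, not the actual ranking code. It only shows the shape of "desirability drops as free memory drops".

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical cutoff: an RN with less than 10% of its memory free is not
# offered to TC nodes at all. The real threshold (if any) is an open question.
MIN_FREE_FRACTION = 0.10


@dataclass
class RoutingNodeInfo:
    """Hypothetical summary of a candidate RN as seen by a TC node."""
    name: str
    base_rank: float       # stand-in for whatever the existing ranking computes
    free_mem_bytes: int
    total_mem_bytes: int


def rn_desirability(rn: RoutingNodeInfo) -> float:
    """Scale the existing rank down linearly as the RN's free memory shrinks."""
    if rn.total_mem_bytes <= 0:
        return 0.0
    free_fraction = rn.free_mem_bytes / rn.total_mem_bytes
    if free_fraction < MIN_FREE_FRACTION:
        return 0.0  # too little memory left to take on more TC traffic
    return rn.base_rank * free_fraction


def pick_routing_node(candidates: Iterable[RoutingNodeInfo]) -> Optional[RoutingNodeInfo]:
    """Return the most desirable RN, or None if every candidate scores zero."""
    best = max(candidates, key=rn_desirability, default=None)
    if best is None or rn_desirability(best) == 0.0:
        return None
    return best
```

With this shape, a node that would win on its base rank alone (say `base_rank=20.0` but only 5% free memory) loses to a lower-ranked node with plenty of headroom. Whether a linear scale plus a hard cutoff is the right curve is a tuning question the sketch does not answer.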
Refactoring

Completely agree. I suspect those are separate methods more for risk mitigation than anything else. My question is: is it worth the effort? I assume the main goal is simply maintainability, in that there would be one common code path rather than two (largely) duplicate ones?

Message Queues

A lot of work went into making sure that control messages could flow even in the presence of heavy traffic. I believe that was the reason for splitting the handling of data-plane and control-plane messaging. Todd, Sheshambika and, to a lesser extent, I spent a lot of time discussing system behavior and how to avoid “congestive collapse”, which plagued early Alliance releases; these changes were made to address those issues.

An example of congestive collapse was a slow reader (i.e., a node that dequeued messages very slowly). The slow reader would back traffic up. For multicast messages, that meant other nodes would also not get their traffic, and, even worse, that RN would itself stop sending and receiving traffic, becoming another slow reader. This would spread through the system until no traffic moved. Incidentally, this problem is also why things like the UDP keep-alive are built into ARDP rather than a higher layer.

In general, there is no specific proposal about what to change; the section mainly describes the current behavior and implies there is a problem. For example, it suggests reworking the removal mechanism for the PushXXXNode() messages, but it isn’t clear to me what the proposed change is. Before anything is changed here, I would like to understand exactly what problem is being solved by touching this code, and what the implications would be for a system under load (like the slow reader scenario above).
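The slow reader scenario can be illustrated with a small model of multicast fan-out. This is a sketch under stated assumptions, not how RemoteEndpoint is implemented: `Receiver`, `multicast`, the per-receiver `data_capacity` bound and the "drop for that receiver only" policy are all hypothetical (the real code might disconnect the peer, time it out, or do something else). What it shows is the two properties described above: a backed-up receiver never stalls delivery to the others, and control messages sit on their own queue and are drained ahead of data.

```python
from collections import deque


class Receiver:
    """Hypothetical per-peer endpoint with separate control and data queues."""

    def __init__(self, name, data_capacity):
        self.name = name
        self.control = deque()              # control plane (unbounded here, for simplicity)
        self.data = deque()                 # data plane
        self.data_capacity = data_capacity  # hypothetical per-receiver bound
        self.dropped = 0                    # data messages discarded because this reader fell behind

    def enqueue_data(self, msg):
        """Non-blocking: if this reader is backed up, drop the message for it alone."""
        if len(self.data) >= self.data_capacity:
            self.dropped += 1
            return False
        self.data.append(msg)
        return True

    def enqueue_control(self, msg):
        self.control.append(msg)

    def dequeue(self):
        """Control messages are always drained before data messages."""
        if self.control:
            return self.control.popleft()
        if self.data:
            return self.data.popleft()
        return None


def multicast(receivers, msg):
    """Fan a data message out; a full queue at one receiver never stalls the rest.

    A blocking enqueue here would instead park the whole loop on the slow
    reader, which is the propagation path that leads to congestive collapse.
    """
    delivered = 0
    for r in receivers:
        if r.enqueue_data(msg):
            delivered += 1
    return delivered
```

For example, with a fast receiver that drains every message and a slow one that never drains and has room for two, the third multicast still reaches the fast receiver while the slow one records a drop; a control message queued afterwards for the slow receiver is still delivered ahead of its stale data.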
This code is at the center of how the system works, and while I cannot dispute that things could be done better there, we need to be really careful about what those changes are and how they will impact the system.

Timeouts

Again, this is very touchy code, and before changing it we need to understand what problem is being solved. It is important to characterize the existing behavior and what the new behavior would be.

Sender vs Receiver

The problem being solved here is what we saw in practice: we had nodes that crashed, or were CPU-bound, and so were not draining their queues. The corresponding RN would back up traffic, which would then make it less responsive, and so on. Yes, we could add mechanisms to flow-control applications that tend to “spew” traffic onto the network. That, however, will not solve the problem of excessively slow readers, though it might help mitigate it. In short, one cannot assume that rate-limiting a sender solves the slow reader problem. We solved the slow reader problem, not the aggressive sender problem. The system would certainly be better if both were addressed, but we cannot replace the existing implementation with its complement.

From: [email protected] [mailto:[email protected]] On Behalf Of Lioy, Marcello
Sent: Thursday, July 14, 2016 4:09 PM
To: Andrey Krokhin <[email protected]>; Daniel Mihai <[email protected]>; Way Vadhanasin <[email protected]>; Josh Spain <[email protected]>; Arvind Padole <[email protected]>
Cc: '[email protected]' <[email protected]>
Subject: Re: [Allseen-core] ASACORE-2946 Notes and Scope of Changes + working group

I have uploaded the proposal to the wiki in the Technical Proposals section (https://wiki.allseenalliance.org/core/overview?&#technical_proposals). I will review and provide feedback before the end of the week.
From: Andrey Krokhin [mailto:[email protected]]
Sent: Monday, July 11, 2016 11:22 AM
To: Daniel Mihai <[email protected]>; Way Vadhanasin <[email protected]>; Josh Spain <[email protected]>; Lioy, Marcello <[email protected]>; Arvind Padole <[email protected]>
Subject: ASACORE-2946 Notes and Scope of Changes

Hello Marcello and Arvind,

Please take a look at the attached document, which outlines proposed changes to the RemoteEndpoint class. It originated from ASACORE-2946, but I think it needs to be split into several JIRA tickets due to the scope of the changes. Please provide your feedback on the document, and let me know if we need a separate conference call to discuss it.

Thanks,
Andrey

---------- Forwarded message ----------
From: Daniel Mihai <[email protected]>
Date: Fri, Jul 8, 2016 at 8:14 PM
Subject: RE: ASACORE-2946 Notes and Scope of Changes
To: Andrey Krokhin <[email protected]>
Cc: Way Vadhanasin <[email protected]>, Josh Spain <[email protected]>

Nice work Andrey! This doc looks like a good starting point for further discussions with Marcello and others. I think you should check with Josh about the next step. If this were my doc, I would:

1. Send out the doc to more people, including Marcello and Arvind, asking for feedback.
2. Ask the same folks whether we should have an “ad-hoc technical meeting” to discuss these proposals and gather feedback.

I expect that Marcello will have useful feedback, either in writing or in the ad-hoc meeting.

Thanks.

From: Andrey Krokhin [mailto:[email protected]]
Sent: Thursday, July 7, 2016 5:33 AM
To: Daniel Mihai <[email protected]>
Cc: Way Vadhanasin <[email protected]>; Josh Spain <[email protected]>
Subject: Re: ASACORE-2946 Notes and Scope of Changes

Updated document attached.
Let me know if corrections are needed.

- Andrey

On Wed, Jul 6, 2016 at 6:27 PM, Daniel Mihai <[email protected]> wrote:

I think the next step is to send out the latest doc, either just to us here or to Marcello and Arvind too; I have no preference. There is no template.

Thanks!

From: Andrey Krokhin [mailto:[email protected]]
Sent: Wednesday, July 6, 2016 4:24 PM
To: Daniel Mihai <[email protected]>
Cc: Way Vadhanasin <[email protected]>; Josh Spain <[email protected]>
Subject: Re: ASACORE-2946 Notes and Scope of Changes

I have prepared an updated document that incorporates most of the points we discussed and removes the alternatives we agreed not to implement. However, it doesn't include Dan's or Way's comments (the edit history). Should I send you the updated document? As I understand it, this eventually needs to be sent to Marcello; should I rewrite it according to a specific template? What are the next steps?

Andrey Krokhin, Software Engineer
Affinegy
1705 S. Capital of Texas Hwy, Ste. 310, Austin, TX, 78746
512.535.1700
[email protected]
http://affinegy.com
_______________________________________________
Allseen-core mailing list
[email protected]
https://lists.allseenalliance.org/mailman/listinfo/allseen-core
