Apologies for the delay in providing feedback. Comments below:
Memory Usage
I thought we had discussed this in the past and decided it wasn’t worth the
effort. Has that changed? From a technical perspective this seems a
reasonable approach, particularly if it is being used elsewhere in the
codebase, as the code snippets seem to imply. If it is implemented, one place
it should be used is in the routing node (RN) ranking calculations, i.e. a
node’s desirability as an RN for TC nodes should drop as less memory is
available.
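To illustrate, here is a minimal sketch in Python, assuming a hypothetical scoring function. The names rn_rank, base_score and min_free_fraction, and the linear scaling, are illustrative only and not taken from the existing ranking code. The node's base score is scaled by its fraction of free memory and drops to zero below a floor, so TC nodes would prefer other RNs.

```python
def rn_rank(base_score: float, free_bytes: int, total_bytes: int,
            min_free_fraction: float = 0.1) -> float:
    """Hypothetical RN desirability score that falls as free memory shrinks.

    Assumption: the real ranking has some base score; here it is simply
    multiplied by the fraction of memory still free.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Clamp to [0, 1] in case the memory figures are momentarily inconsistent.
    free_fraction = max(0.0, min(1.0, free_bytes / total_bytes))
    # Below the floor, advertise zero so TC nodes pick a different RN.
    if free_fraction < min_free_fraction:
        return 0.0
    return base_score * free_fraction
```

With a base score of 100, a node with half its memory free would score 50.0, and one with only 5% free would score 0.0. Linear scaling is just the simplest choice; a step function or an extra weighted term in the existing score would express the same idea.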

Refactoring
Completely agree. I suspect the reason those are separate methods was risk
mitigation more than anything else. My question on this is: is it worth the
effort? I assume the main goal there is simply maintainability, in that there
is a common code path rather than two (largely) duplicate code paths?

Message Queues
There was a lot of work done to try to make sure that control messages could
flow even in the presence of heavy traffic. I believe that was the reason for
splitting out the handling of data vs. control plane messaging. Todd,
Sheshambika and, to a lesser extent, I spent a lot of time discussing system
behavior and how to avoid “congestive collapse”, which plagued early Alliance
releases; these changes were made to try to address those issues. An example
of congestive collapse was a slow reader (i.e. a node that dequeued messages
very slowly) backing traffic up. Unfortunately, for multicast messages that
meant other nodes would also not get their traffic, and, even worse, the
corresponding RN would stop sending and receiving traffic, itself becoming
another slow reader. This would spread through the system until no traffic
moved. Incidentally, this problem is also why things like the UDP keep-alive
are built into ARDP rather than a higher layer.
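A simplified sketch of the control/data separation described above, assuming a hypothetical per-endpoint queue pair. EndpointQueues, push_control, push_data and max_data are illustrative names; this is not the RemoteEndpoint implementation. Control messages get their own queue and are dequeued first, and the bounded data queue refuses new messages instead of blocking, so one slow reader cannot stall delivery of a multicast message to every other endpoint.

```python
from collections import deque


class EndpointQueues:
    """Hypothetical transmit queues for one endpoint (illustration only)."""

    def __init__(self, max_data: int):
        self.control = deque()   # control-plane messages, never starved by data
        self.data = deque()      # data-plane messages, bounded
        self.max_data = max_data

    def push_control(self, msg) -> None:
        self.control.append(msg)

    def push_data(self, msg) -> bool:
        """Return False instead of blocking when this endpoint is backed up.

        Not blocking means the caller fanning out a multicast can move on to
        the other endpoints rather than waiting on a slow reader.
        """
        if len(self.data) >= self.max_data:
            return False
        self.data.append(msg)
        return True

    def pop(self):
        """Dequeue the next message, preferring control over data."""
        if self.control:
            return self.control.popleft()
        if self.data:
            return self.data.popleft()
        return None
```

What to do when push_data returns False (drop the message, drop the oldest entry, or disconnect the endpoint) is exactly the kind of behavior under load that would need to be characterized before changing the existing code.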

In general, the document makes no specific proposal about what to change; it
mainly describes the current behavior and implies there is a problem. For
example, it suggests that the removal mechanism for the PushXXXNode() messages
be reworked, but it isn’t clear to me what the proposed change is.

Before anything is changed here, I would like to understand exactly what
problem is being solved by touching this code, and to make sure we understand
what the implications would be for a system under load (like the slow reader
scenario above). This code is at the center of how the system works, and while
I cannot dispute that things could be done better there, we need to be really
careful about what those changes are and how they will impact the system.

Timeouts
Again, this is very touchy code, and before changing it we need to understand
what problem is being solved. It is important to characterize both the
existing behavior and what the new behavior would be.

Sender vs Receiver
The problem being solved here is what we saw in practice: we had nodes that
crashed or were CPU bound, and so were not draining their queues. The
corresponding RN would then back up traffic, which would cause it to become
less responsive, and so on. Yes, we could add some mechanisms to flow control
applications that tend to “spew” traffic onto the network. However, this will
not solve the problem of excessively slow readers, though it might help
mitigate it. In short, one cannot assume that rate limiting a sender solves
the problem of slow readers. We solved the slow reader problem, not the
aggressive sender problem. The system would certainly be better if both were
implemented, but we cannot replace the existing implementation with its
complement.
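A back-of-the-envelope sketch of why sender rate limiting alone does not fix slow readers, assuming a hypothetical per-tick model. backlog_after, send_rate and drain_rate are illustrative names, not existing code. If a rate-limited sender still sends faster than a slow reader drains, the backlog for that reader at the RN keeps growing.

```python
def backlog_after(ticks: int, send_rate: int, drain_rate: int) -> int:
    """Queue depth held at the RN for one reader after `ticks` time steps.

    send_rate:  messages per tick from the (already rate-limited) sender.
    drain_rate: messages per tick the reader actually dequeues.
    """
    depth = 0
    for _ in range(ticks):
        depth += send_rate               # sender output arrives at the RN
        depth -= min(depth, drain_rate)  # slow reader drains what it can
    return depth
```

For example, backlog_after(10, 5, 2) returns 30, while backlog_after(10, 5, 5) returns 0. In this model the limit only prevents the backlog if it is at or below the slowest reader's drain rate, and for multicast traffic that would mean throttling every receiver down to the slowest one.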

From: [email protected] 
[mailto:[email protected]] On Behalf Of Lioy, 
Marcello
Sent: Thursday, July 14, 2016 4:09 PM
To: Andrey Krokhin <[email protected]>; Daniel Mihai 
<[email protected]>; Way Vadhanasin 
<[email protected]>; Josh Spain <[email protected]>; Arvind 
Padole <[email protected]>
Cc: '[email protected]' 
<[email protected]>
Subject: Re: [Allseen-core] ASACORE-2946 Notes and Scope of Changes

+ working group

I have uploaded the proposal to the wiki in the Technical Proposals section
(https://wiki.allseenalliance.org/core/overview?&#technical_proposals). I will
review and provide feedback before the end of the week.

From: Andrey Krokhin [mailto:[email protected]]
Sent: Monday, July 11, 2016 11:22 AM
To: Daniel Mihai <[email protected]>; Way Vadhanasin 
<[email protected]>; Josh Spain <[email protected]>; Lioy, 
Marcello <[email protected]>; Arvind Padole 
<[email protected]>
Subject: ASACORE-2946 Notes and Scope of Changes

Hello Marcello and Arvind,

Please take a look at the attached document, containing an outline of proposed
changes to the RemoteEndpoint class. It originated from ASACORE-2946, but I
think it needs to be split into several JIRA tickets due to the scope of the
changes.

Please provide your feedback on the document, and let me know if we need to 
have a separate conference call to discuss it.

                       Thanks,
                       Andrey


---------- Forwarded message ----------
From: Daniel Mihai <[email protected]>
Date: Fri, Jul 8, 2016 at 8:14 PM
Subject: RE: ASACORE-2946 Notes and Scope of Changes
To: Andrey Krokhin <[email protected]>
Cc: Way Vadhanasin <[email protected]>, Josh Spain <[email protected]>

Nice work, Andrey! This doc looks like a good starting point for further
discussions with Marcello and others.

I think you should check with Josh about the next step.

If this was my doc, I would:

1. Send out the doc to more people, including Marcello and Arvind, asking for
   feedback

2. Ask the same folks if we should have an “ad-hoc technical meeting” to
   discuss these proposals and gather feedback

I expect that Marcello will have useful feedback – either in writing or in the 
ad-hoc meeting.

Thanks.

From: Andrey Krokhin [mailto:[email protected]]
Sent: Thursday, July 7, 2016 5:33 AM
To: Daniel Mihai <[email protected]>
Cc: Way Vadhanasin <[email protected]>; Josh Spain <[email protected]>
Subject: Re: ASACORE-2946 Notes and Scope of Changes

Updated document attached. Let me know if corrections are needed.
                  - Andrey

On Wed, Jul 6, 2016 at 6:27 PM, Daniel Mihai <[email protected]> wrote:
I think the next step is to send out the latest doc – either just to us here, 
or include Marcello & Arvind too – I have no preference.

There is no template.

Thanks!

From: Andrey Krokhin [mailto:[email protected]]
Sent: Wednesday, July 6, 2016 4:24 PM
To: Daniel Mihai <[email protected]>
Cc: Way Vadhanasin <[email protected]>; Josh Spain <[email protected]>
Subject: Re: ASACORE-2946 Notes and Scope of Changes

I have prepared an updated document that incorporates most of the points we
discussed and removes the alternatives that we agreed not to implement.
However, it doesn't include Dan's or Way's comments (edit history).
Should I send you the updated document? As I understand it, this eventually
needs to be sent to Marcello; should I rewrite it according to a specific
template?
What are the next steps?


Andrey Krokhin, Software Engineer

Affinegy

1705 S. Capital of Texas Hwy, Ste. 310, Austin, TX, 78746

512.535.1700

[email protected]
http://affinegy.com
_______________________________________________
Allseen-core mailing list
[email protected]
https://lists.allseenalliance.org/mailman/listinfo/allseen-core
