I think the priority is relatively high due to multiple problems:
1. Deadlocks and complexity associated with Max Alarms
a. If I remember correctly, Max Alarms was the initial blocker for HP's
recent work
b. I believe Todd & Sheshambika asked for Push changes (see the attached
email) before they felt comfortable removing Max Alarms. Todd wrote
LocalEndpoint, but I suspect he meant RemoteEndpoint.
2. I believe the current policy of disconnecting LNs is unfair, and it
interferes with our testing, if nothing else
a. An example is in https://jira.allseenalliance.org/browse/ASACORE-2946
b. There may be additional examples in the doc maintained by Andrey
3. There is no clear way to disable all of these limits in the Windows
RN. We would like to disable them.
4. The calculation “maxControlMessages = sendTimeout *
MAX_CONTROL_MSGS_PER_SECOND” seems bizarre.
5. Greater confidence in the RN's ability to protect itself from
misbehaving LNs might also let us avoid the deadlocks related to the Bus
Attachment Concurrency Limit
a. We decided to get rid of these deadlocks at the ASA summit many months
ago, but we still have this problem
b. Every AllJoyn app developer I have seen inside Microsoft sends an
email sooner or later saying: “is AllJoyn Core supposed to deadlock when I make
a method call from my Announce callback?”
c. Way, did HP get bitten by this one too?
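To illustrate point 4, here is a small worked sketch of how the quoted formula behaves. It assumes sendTimeout is expressed in seconds and uses an arbitrary placeholder value of 10 for MAX_CONTROL_MSGS_PER_SECOND; the real units and constant in the RN code may differ. What it shows is that the control-message cap grows linearly with the send timeout, which is a time setting rather than a measure of queue or memory capacity.

```python
# Placeholder value; the real MAX_CONTROL_MSGS_PER_SECOND in the RN may differ.
MAX_CONTROL_MSGS_PER_SECOND = 10


def max_control_messages(send_timeout_s: float) -> int:
    """Control-message cap as given by the quoted formula.

    Assumes send_timeout_s is in seconds (an assumption, not confirmed).
    """
    return int(send_timeout_s * MAX_CONTROL_MSGS_PER_SECOND)


# The cap follows the send timeout, not queue depth or available memory:
# doubling the timeout doubles the number of control messages allowed.
for timeout_s in (5, 10, 30):
    print(timeout_s, max_control_messages(timeout_s))
```

If sendTimeout were supplied in milliseconds instead, the same formula would produce a cap 1000 times larger.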
The risk is also relatively high, so I agree that we’ll need to consider all
our options carefully.
Marcello, I still don’t understand why Control Messages would be treated
differently than messages generated by apps, when it comes to enforcing memory
quotas.
Thanks.
From: Andrey Krokhin [mailto:[email protected]]
Sent: Tuesday, July 19, 2016 11:47 PM
To: Lioy, Marcello <[email protected]>
Cc: Daniel Mihai <[email protected]>; Way Vadhanasin
<[email protected]>; Josh Spain <[email protected]>; Arvind
Padole <[email protected]>; [email protected]
Subject: Re: ASACORE-2946 Notes and Scope of Changes
Marcello, thanks for the feedback. Regarding your questions about whether a
particular feature is "worth the effort": I don't know; this is just a
proposal, and you can assign different priorities to different tasks, or drop
them completely. I made changes to the document (it's now at version 3) based
on your and Josh's comments. I tried to make the proposed changes clearer
(based on our previous discussions and email threads). I should
note that:
- I don't know how to set up a test scenario for multicast messages, and I
don't believe that we discussed that with Dan and Way. I didn't know about the
"slow reader" problem.
- In "Sender vs Receiver", you say "In short one cannot assume that rate
limiting a sender solves the problems of slow readers. The system would
certainly be better if both were implemented, but we cannot replace the
existing implementation with its complement". If you look at the wording of
original ASACORE-2946, which originated this whole discussion, it says "I
believe that in this case the RN should punish/disconnect the noisy LN2 and
others, not the innocent but slowish LN1." So it looks like it is asking for
"rate limiting a sender". We might be able to preserve existing behavior
(disconnecting RNs) while also adding disconnect conditions for RN.
I had to mark unresolved questions as "TBD" (to be defined later) in the
document. This is partly due to my knowledge gaps. If you had previous
discussions and documents regarding these issues, you understand them much
better, and can replace the TBD sections with concrete proposals. Also, please
feel free to make changes where you feel I was not reflecting your comments.
Let me know if you would like me to place this updated document in the Wiki.
- Andrey
On Mon, Jul 18, 2016 at 12:36 PM, Lioy, Marcello
<[email protected]> wrote:
Apologies for the delay in providing feedback. Comments below:
Memory Usage
I thought we had discussed this in the past and decided it wasn't worth the
effort. Has that changed? From a technical perspective this seems a
reasonable approach, particularly if it is being used elsewhere in the codebase,
as the code snippets seem to imply. If implemented, this should be used in
the routing node ranking calculations, i.e. an RN's desirability to TC
nodes should drop as less memory is available.
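A minimal sketch of the memory-aware ranking idea, assuming a hypothetical integer base rank (higher is more desirable) that is simply scaled by the fraction of memory still free. The function name and the linear scaling are illustrative assumptions, not the existing ranking code.

```python
def memory_adjusted_rank(base_rank: int, free_bytes: int, total_bytes: int) -> int:
    """Scale a hypothetical RN rank by the fraction of memory still free.

    Higher rank = more desirable RN (assumed convention). An RN with half its
    memory free advertises half its base rank; one with none free advertises 0.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Clamp so bad or stale readings cannot push the rank out of range.
    free_fraction = max(0.0, min(1.0, free_bytes / total_bytes))
    return int(base_rank * free_fraction)


print(memory_adjusted_rank(100, 512, 1024))
```

A linear scale is only one option; a step function or a floor below which the node stops advertising itself as an RN would also satisfy "desirability drops as memory shrinks".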
Refactoring
Completely agree. I suspect the reason those are separate methods was risk
mitigation more than anything else. My question on this is: is it worth the
effort? I assume the main goal there is simply maintainability, in that there
is a common code path rather than two (largely) duplicate code paths?
Message Queues
There was a lot of work done to try to make sure that control messages could
flow even in the presence of heavy traffic. I believe that was the reason for
splitting out the handling of data vs. control plane messaging. Todd,
Sheshambika and, to a lesser extent, I spent a lot of time discussing system
behavior and how to avoid "congestive collapse", which plagued early
Alliance releases; these changes were made to try to address those
issues. An example of congestive collapse was a slow reader (i.e.
a node that dequeued messages very slowly) backing traffic up. Unfortunately,
for multicast messages that meant other nodes would also not get their
traffic, and, even worse, the affected RN would stop sending and
receiving traffic, becoming another slow reader. This would spread through the
system until no traffic moved. Incidentally, this problem is also why things
like the UDP keep-alive are built into ARDP rather than at a higher layer.
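The slow-reader effect on multicast can be seen in a toy model: one sender delivers each multicast message to every reader's bounded queue and blocks whenever any of those queues is full. Node, run_multicast and all the numbers below are hypothetical, and the model deliberately leaves out the second-order effect described above, where the blocked RN itself becomes a slow reader for its peers.

```python
from collections import deque


class Node:
    """Toy reader with a bounded receive queue (hypothetical model)."""

    def __init__(self, name: str, capacity: int, drain_per_tick: int):
        self.name = name
        self.queue: deque = deque()
        self.capacity = capacity
        self.drain_per_tick = drain_per_tick

    def has_room(self) -> bool:
        return len(self.queue) < self.capacity

    def drain(self) -> None:
        # Dequeue up to drain_per_tick messages, modelling the reader's speed.
        for _ in range(min(self.drain_per_tick, len(self.queue))):
            self.queue.popleft()


def run_multicast(readers: list, total_msgs: int, ticks: int) -> int:
    """Send a multicast stream to all readers; block while any queue is full.

    Returns the number of multicast messages delivered after `ticks` steps.
    """
    delivered = 0
    for _ in range(ticks):
        # Keep sending until some reader's queue fills up (sender blocks).
        while delivered < total_msgs and all(r.has_room() for r in readers):
            for r in readers:
                r.queue.append(delivered)
            delivered += 1
        for r in readers:
            r.drain()
    return delivered


fast_only = [Node("a", 4, 10), Node("b", 4, 10)]
with_slow = [Node("a", 4, 10), Node("b", 4, 10), Node("slow", 4, 1)]
print(run_multicast(fast_only, 100, 10), run_multicast(with_slow, 100, 10))
```

With two fast readers, 40 messages get through in 10 ticks; adding one reader that drains a single message per tick cuts delivery to 13 for everyone, including the fast readers.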
In general there is no specific proposal about what to change; it mainly
describes the current behavior and implies there is a problem, e.g. by suggesting
that the removal mechanism for the PushXXXNode() messages be reworked, but it
isn't clear to me what the proposed change is.
Before anything is changed here I would like to understand exactly what problem
is being solved by touching this code, and that we understand what the
implications would be for a system under load (like the slow reader scenario
above). This code is at the center of how the system works, and while I cannot
dispute that things could be done better there, we need to be really careful
about what those changes are and how they will impact the system.
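For readers less familiar with the control/data split mentioned above, here is a minimal sketch of the general idea, assuming separate queues with control traffic dequeued first so that a data backlog cannot starve it. SplitQueue and its methods are hypothetical, and this is not a description of how the RN code is actually structured; a real design would also need some cap on control traffic.

```python
from collections import deque
from typing import Optional


class SplitQueue:
    """Toy outbound queue keeping control and data traffic apart (hypothetical)."""

    def __init__(self, data_capacity: int):
        self.control: deque = deque()
        self.data: deque = deque()
        self.data_capacity = data_capacity

    def push_data(self, msg: str) -> bool:
        """Enqueue a data message; refuse it when the data queue is full."""
        if len(self.data) >= self.data_capacity:
            return False
        self.data.append(msg)
        return True

    def push_control(self, msg: str) -> None:
        """Control messages are not affected by the data backlog.

        A real design would still need a separate cap on this queue.
        """
        self.control.append(msg)

    def pop(self) -> Optional[str]:
        """Dequeue control traffic first, then data; None when both are empty."""
        if self.control:
            return self.control.popleft()
        if self.data:
            return self.data.popleft()
        return None


q = SplitQueue(data_capacity=2)
q.push_data("d1")
q.push_data("d2")
q.push_data("d3")        # refused: data queue is full
q.push_control("c1")     # still accepted despite the data backlog
print(q.pop(), q.pop())  # prints: c1 d1
```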
Timeouts
Again, this is very touchy code, and before changing it we need to understand
what problem is being solved. It is important to characterize the existing
behavior, and what the new behavior would be.
Sender vs Receiver
The problem being solved here is what we saw in practice: we had nodes that
crashed, or were CPU bound and so were not draining their queues, so the
corresponding RN would back up traffic, which would then cause it to become
less responsive, and so on. Yes, we could add some mechanisms to flow-control
applications that tend to "spew" traffic onto the network. This, however, will
not solve the problem of excessively slow readers, though it might help
mitigate the issue. In short, one cannot assume that rate limiting a sender
solves the problems of slow readers. We solved the slow reader problem, not
the aggressive sender problem. The system would certainly be better if both
were implemented, but we cannot replace the existing implementation with its
complement.
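The distinction between rate limiting a sender and coping with a slow reader can be shown with a standard token bucket. In this sketch (TokenBucket and backlog_after are hypothetical names, and the numbers are arbitrary), the sender is capped at 5 messages per tick. That is enough for a reader draining 10 per tick, but a reader draining only 2 per tick still accumulates a backlog, so sender-side limiting alone does not bound the queue.

```python
class TokenBucket:
    """Standard token-bucket rate limiter (illustrative, not AllJoyn code)."""

    def __init__(self, rate: int, burst: int):
        self.rate = rate      # tokens added per tick
        self.burst = burst    # maximum tokens that can accumulate
        self.tokens = burst

    def refill(self) -> None:
        self.tokens = min(self.burst, self.tokens + self.rate)

    def try_take(self) -> bool:
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def backlog_after(ticks: int, offered_per_tick: int, bucket: TokenBucket,
                  drain_per_tick: int) -> int:
    """Queue depth at one reader after `ticks` steps of a rate-limited sender."""
    backlog = 0
    for _tick in range(ticks):
        bucket.refill()
        for _msg in range(offered_per_tick):
            if bucket.try_take():
                backlog += 1  # message admitted by the limiter and queued
        backlog = max(0, backlog - drain_per_tick)  # reader dequeues
    return backlog


# Sender wants 20 msgs/tick but is limited to 5/tick by the bucket.
print(backlog_after(10, 20, TokenBucket(rate=5, burst=5), drain_per_tick=10))
print(backlog_after(10, 20, TokenBucket(rate=5, burst=5), drain_per_tick=2))
```

After 10 ticks the fast reader's backlog is 0, while the slow reader's backlog is 30 messages and still growing by 3 per tick.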
From:
[email protected]
[mailto:[email protected]]
On Behalf Of Lioy, Marcello
Sent: Thursday, July 14, 2016 4:09 PM
To: Andrey Krokhin <[email protected]>;
Daniel Mihai <[email protected]>;
Way Vadhanasin <[email protected]>;
Josh Spain <[email protected]>; Arvind Padole <[email protected]>
Cc: '[email protected]' <[email protected]>
Subject: Re: [Allseen-core] ASACORE-2946 Notes and Scope of Changes
+ working group
I have uploaded the proposal to the wiki in the Technical Proposals section
(https://wiki.allseenalliance.org/core/overview?&#technical_proposals). I will
review and provide feedback before the end of the week.
From: Andrey Krokhin
[mailto:[email protected]]
Sent: Monday, July 11, 2016 11:22 AM
To: Daniel Mihai <[email protected]>; Way Vadhanasin <[email protected]>;
Josh Spain <[email protected]>; Lioy, Marcello <[email protected]>;
Arvind Padole <[email protected]>
Subject: ASACORE-2946 Notes and Scope of Changes
Hello Marcello and Arvind,
Please take a look at the attached document, containing an outline of proposed
changes to the RemoteEndpoint class. It originated from ASACORE-2946, but I
think it needs to be split into several JIRA tickets due to the scope of the changes.
Please provide your feedback on the document, and let me know if we need to
have a separate conference call to discuss it.
Thanks,
Andrey
---------- Forwarded message ----------
From: Daniel Mihai <[email protected]>
Date: Fri, Jul 8, 2016 at 8:14 PM
Subject: RE: ASACORE-2946 Notes and Scope of Changes
To: Andrey Krokhin <[email protected]>
Cc: Way Vadhanasin <[email protected]>,
Josh Spain <[email protected]>
Nice work Andrey! This doc looks like a good starting point for further
discussions with Marcello and others.
I think you should check with Josh about the next step.
If this were my doc, I would:
1. Send out the doc to more people, including Marcello and Arvind, asking
for feedback
2. Ask the same folks if we should have an “ad-hoc technical meeting” to
discuss these proposals and gather feedback
I expect that Marcello will have useful feedback, either in writing or in the
ad-hoc meeting.
Thanks.
From: Andrey Krokhin
[mailto:[email protected]]
Sent: Thursday, July 7, 2016 5:33 AM
To: Daniel Mihai <[email protected]>
Cc: Way Vadhanasin <[email protected]>;
Josh Spain <[email protected]>
Subject: Re: ASACORE-2946 Notes and Scope of Changes
Updated document attached. Let me know if corrections are needed.
- Andrey
On Wed, Jul 6, 2016 at 6:27 PM, Daniel Mihai
<[email protected]> wrote:
I think the next step is to send out the latest doc, either just to us here
or to Marcello & Arvind too; I have no preference.
There is no template.
Thanks!
From: Andrey Krokhin
[mailto:[email protected]]
Sent: Wednesday, July 6, 2016 4:24 PM
To: Daniel Mihai <[email protected]>
Cc: Way Vadhanasin <[email protected]>;
Josh Spain <[email protected]>
Subject: Re: ASACORE-2946 Notes and Scope of Changes
I have prepared an updated document that incorporates most of the points we
discussed and removes the alternatives that we agreed not to implement.
However, it doesn't have Dan's or Way's comments inside the document (edit
history).
Should I send you the updated document? As I understand it, this eventually needs
to be sent to Marcello; should I rewrite it according to a specific template?
What are the next steps?
Andrey Krokhin, Software Engineer
Affinegy
1705 S. Capital of Texas Hwy, Ste. 310, Austin, TX, 78746
512.535.1700
[email protected]
http://affinegy.com
--- Begin Message ---
Outcome of the meeting, please add if I missed anything:
* Dan to put current patch on feature branch of his choosing
* Discuss putting some queue management functionality in local endpoint
PushMessage specifically, and push that onto feature branch
-Todd
-----Original Appointment-----
From: Daniel Mihai [mailto:[email protected]]
Sent: Wednesday, December 16, 2015 4:26 PM
To: Daniel Mihai; Venkateshwaran, Sheshambika; Malsbary, Todd
Cc: Lioy, Marcello; Way Vadhanasin
Subject: discuss proposal to remove the Timer maxAlarms feature (ASACORE-2650)
When: Friday, December 18, 2015 10:00 AM-11:00 AM (UTC-08:00) Pacific Time (US
& Canada).
Where: Skype Meeting
Moving to Friday, as we discussed.
--- End Message ---
_______________________________________________
Allseen-core mailing list
[email protected]
https://lists.allseenalliance.org/mailman/listinfo/allseen-core