Hi Gary, All,
Some comments inline.

Thanks,
~Sumit.

From: netstack-bounces+snaiksat=cisco....@lists.launchpad.net [mailto:netstack-bounces+snaiksat=cisco....@lists.launchpad.net] On Behalf Of Gary Kotton
Sent: Friday, May 11, 2012 1:29 AM
To: Maru Newby
Cc: Christopher Wright; netstack@lists.launchpad.net
Subject: Re: [Netstack] Scalable Agents(https://blueprints.launchpad.net/quantum/+spec/scalable-agent-comms)

Hi,

Thanks for the input and comments. This is really great. I would like to propose the following staged development (to enable us to stabilize, test, and then optimize):

1. Stage 1 - have the agent detect a change, initially by polling. When the agent detects an update, it will contact the plugin for a detailed update about the network.

<Sumit> Currently the agents work by first polling the Quantum DB to detect changes in the network/port state, and then reacting locally (checking for the presence of a tap device, etc.). As I understand it, you are proposing to reverse this sequence, i.e., the agent first detects a change in the local state (e.g., a new tap device has been created) and then communicates with the Quantum plugin to obtain more context for this change. If this is the thought, I believe it is reasonable and will eliminate the overhead of polling the DB. The premise here, of course, is that the agent is able to locally detect all the changes that it needs to react to. In the basic case of the Linux Bridge plugin, I don't think there is anything beyond the creation of tap devices, so this should work. If there are other state changes introduced by the Quantum plugin that the agent needs to react to, then the agent would not know about these in the absence of a notification mechanism.</Sumit>

2. Stage 2 - Event driven support. One option is to have the operating system notify the agent (as suggested by Darragh); another is to have the VIF driver notify the agent. I am in favour of the latter.
The VIF driver is essentially creating the new tap device or deleting the existing tap device. It seems logical that this would drive the update on the agent.

<Sumit> I would actually prefer the former approach. It's better to decouple the components to the extent possible so that they can be updated and reused independently. I doubt that we will get any significant performance advantage out of making the VIF driver aware of the agents and having it communicate with them explicitly. However, it does introduce a stronger coupling, and we should probably avoid it.</Sumit>

What I would like to do is a quick POC of the above and then write a detailed design of the flow so that we can all review it. If it "compiles and runs" on paper then it will speed up the development, testing and deployment. It will also enable us to document the design for future reference, and it will save time on review and the ping/pong with the -1's.

<Sumit> Great, thanks for doing this! </Sumit>

Have a good weekend and thanks for the inputs and comments. Hopefully next week I'll have an update on the progress.

Thanks
Gary

On 05/11/2012 12:30 AM, Maru Newby wrote:

Thanks Darragh! That should cover kvm. And apparently it's possible to be notified of vif changes from xen/xcp too; a more xen-savvy co-worker is tracking down details.

Gary, it sounds like it will be possible to have the agent notified directly of device changes. What are your thoughts on modifying your proposal to take this into account?
Cheers,
Maru

On 2012-05-10, at 2:06 PM, Darragh OReilly wrote:

Maybe udev rules/actions could be installed for add/remove tap device events:
http://www.reactivated.net/writing_udev_rules.html#external-run

________________________________
From: Maru Newby <mne...@internap.com>
To: gkot...@redhat.com
Cc: Christopher Wright <chr...@redhat.com>; netstack@lists.launchpad.net
Sent: Thursday, 10 May 2012, 18:38
Subject: Re: [Netstack] Scalable Agents(https://blueprints.launchpad.net/quantum/+spec/scalable-agent-comms)

Hi Gary,

I appreciate the effort you've put into condensing the options. I agree with your suggestion that option 1 is a good starting point.

How will the agent discover changes to tap devices? Can an agent register for events from linux/kvm or xen, or would the agent just poll? For all I know agents may do this already, so I apologize if this is a silly question.

Regarding option 2, I still see no reason to have the vif driver talk to the agent directly. Ensuring a single point of contact between quantum clients (of which the vif driver is one) and quantum, namely the rest interface, limits complexity and will be easier to maintain and test. If and when performance or other concerns require direct vif driver to agent communication, we can go down that road, but as of now it's answering a question that hasn't been asked. YAGNI.

I would also argue that even RPC communication between the plugin and agent is gold-plating. The problem at hand is that database polling doesn't scale well. The simple answer is for the plugin and agent to communicate directly rather than through a database intermediary. Adding RPC to the mix is an implementation detail, pure and simple, and is not cost-free. RPC introduces a queue dependency that can be problematic to debug and, as we've seen in nova, can cause performance issues all its own.
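To make the idea of direct plugin-to-agent communication concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the names Transport, InProcessTransport, Agent and example_plugin are hypothetical and are not part of Quantum, the in-process transport stands in for whatever wire protocol is eventually chosen, and the "vlan" value in the reply is made up. The agent remembers the tap devices it has seen and contacts the plugin only when that set changes.

```python
from abc import ABC, abstractmethod


class Transport(ABC):
    """Hypothetical agent-to-plugin transport; not an existing Quantum interface."""

    @abstractmethod
    def send(self, message: dict) -> dict:
        """Deliver a message to the plugin and return its reply."""


class InProcessTransport(Transport):
    """Stand-in for a real wire protocol: calls the plugin handler directly."""

    def __init__(self, handler):
        self._handler = handler

    def send(self, message: dict) -> dict:
        return self._handler(message)


class Agent:
    """Tracks tap devices and reports only the delta to the plugin."""

    def __init__(self, transport: Transport):
        self._transport = transport
        self._known = set()

    def sync(self, current_devices):
        current = set(current_devices)
        added = sorted(current - self._known)
        removed = sorted(self._known - current)
        self._known = current
        if not added and not removed:
            return None  # nothing changed, so the plugin is not contacted
        return self._transport.send({"added": added, "removed": removed})


def example_plugin(message: dict) -> dict:
    """Hypothetical plugin side: returns made-up network info per new device."""
    return {
        "networks": {dev: {"vlan": 100} for dev in message["added"]},
        "down": message["removed"],
    }


agent = Agent(InProcessTransport(example_plugin))
first = agent.sync(["tap0123456789a"])   # new device: plugin is asked for details
second = agent.sync(["tap0123456789a"])  # no change: no message is sent
third = agent.sync([])                   # device gone: plugin is told it is down
```

Because the agent depends only on the abstract send() call, a different transport could be substituted later without touching the delta logic.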
I'm all for leaving us open going forward to introduce an RPC dependency, but I think the most important thing is to create a clean communication interface between plugin and agent. The initial implementation can be something simple (and relatively dependency free) like secured http. The semantics for implementing and debugging http communication are well-known to all of us. If and when RPC becomes necessary, it will be straightforward to plug in a new transport driver.

Let's keep it simple - distributed computing is complicated enough!

Cheers,
Maru

On 2012-05-10, at 8:22 AM, Gary Kotton wrote:

Hi,

Below is a table that lists a number of options, with a short description, advantages and disadvantages for each. Hopefully this can give an idea of the scope and complexity.

Option 1. Agent driving data retrieval from plugin
  Description: The agent maintains a list of tap devices. If there is a new tap device then the agent will request the network information for this tap device from the quantum plugin. In the case of the open source ovs and lb (linuxbridge) plugins the device name is tap + 11 letters of the attachment id. The agent will send an RPC update about the delta to the plugin. The plugin will answer accordingly. For example, if one or more tap devices are detected then these are sent to the plugin. For each new tap device the plugin will send the network information (tags etc.) and set the database attachment as up. For deletion they will be removed (or set as down).
  Advantages: Simple. Self contained in Quantum.
  Disadvantages: If there is more than 1 attachment ID with the same prefix of 11 characters then this will not work (this currently is a bug). The agent will still have to poll the network interfaces.

Option 2. VIF driver driving retrieval from plugin
  Description: The VIF driver updates the plugin about a change, which in turn updates the relevant agent (this was described in the link https://docs.google.com/document/d/1MbcBA2Os4b98ybdgAw2qe_68R1NG6KMh8zd )
  Advantages: Event driven. No polling.
  Disadvantages: VIF driver and agents will need to share communication channels.

Option 3. Plugin broadcasting
  Description: When the plugin receives a change it broadcasts the change to all of the registered agents.
  Advantages: Relatively simple.
  Disadvantages: Lots of unnecessary messages to agents that do not need to deal with the traffic.

I think that option #1 is a good start. This can later be optimized to option #2.

Thanks
Gary

On 05/10/2012 10:05 AM, Gary Kotton wrote:

On 05/10/2012 12:55 AM, Sumit Naiksatam (snaiksat) wrote:

Hi Gary,

Thanks for initiating this. A couple of comments/questions -

1. Do we really need the VIF driver to communicate the agent's identity? I am referring to the agent ID being sent by the VIF driver in the message. In general, I am not sure if there is a need to have the VIF driver send messages/notifications in the first place, but perhaps it's being included as a capability in the framework?

At the moment the open source plugins are not aware of the agents. The agents poll the database for updates. The agent ID enables an agent to register with the plugin; this in turn enables the plugin to send an update to the specific agent. The update is initiated by the VIF driver. In my opinion this does the following:
1. updates the agents as soon as possible regarding a network change
2. limits traffic on the network
3. removes the database interface from the agents

2. One model I was thinking of (which is kind of in line with the existing agent implementations) is where the agents are smart, and they know what to do in response to changes in the state of the logical Quantum resources. In such cases, the Quantum plugin need not keep track of sending a message to a particular agent. Instead, can we have broadcast messages from the plugin to all the agents? If the plugin has to unicast messages to specific agents, then it needs to maintain a lot more state/topology information, which should not be mandated for this sole reason.

I too thought about this option.
In a sense the above proposal is an optimization of what you mention. This comes at the cost of complexity. The broadcast option is nice when the number of agents is small. When the number is large, there will be NUMBER_OF_AGENT messages sent for each network update. The advantage of what you mention is that the code is self contained in Quantum. It may be better to start with the broadcast and then deal with the optimizations afterwards.

Thanks
Gary

Thanks,
~Sumit.

-----Original Message-----
From: netstack-bounces+snaiksat=cisco....@lists.launchpad.net [mailto:netstack-bounces+snaiksat=cisco....@lists.launchpad.net] On Behalf Of Gary Kotton
Sent: Wednesday, May 09, 2012 4:27 AM
To: netstack@lists.launchpad.net
Subject: [Netstack] Scalable Agents(https://blueprints.launchpad.net/quantum/+spec/scalable-agent-comms)

Hi,

I have added a very high level description on how to address the issue. This can be seen at:
https://docs.google.com/document/d/1MbcBA2Os4b98ybdgAw2qe_68R1NG6KMh8zdZKgOlpvg/edit

Comments will be greatly appreciated.

Questions:
1. Do we want agents to be backward compatible (that is, still maintain the polling code)?
2. How should the Agent ID be generated?
3. Any other ideas or thoughts about the matter?

I'd like to go ahead with a POC and implement this.
Thanks
Gary

--
Mailing list: https://launchpad.net/~netstack
Post to     : netstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~netstack
More help   : https://help.launchpad.net/ListHelp