----- Original Message -----
> From: "Alon Bar-Lev" <[email protected]>
> To: "Barak Azulay" <[email protected]>
> Cc: "arch" <[email protected]>, "Simon Grinberg" <[email protected]>
> Sent: Sunday, May 12, 2013 11:25:45 AM
> Subject: Re: feature suggestion: initial generation of management network
>
> ----- Original Message -----
> > From: "Barak Azulay" <[email protected]>
> > To: "Livnat Peer" <[email protected]>
> > Cc: "Alon Bar-Lev" <[email protected]>, "arch" <[email protected]>,
> > "Simon Grinberg" <[email protected]>
> > Sent: Sunday, May 12, 2013 11:15:20 AM
> > Subject: Re: feature suggestion: initial generation of management network
> >
> > ----- Original Message -----
> > > From: "Livnat Peer" <[email protected]>
> > > To: "Moti Asayag" <[email protected]>
> > > Cc: "arch" <[email protected]>, "Alon Bar-Lev" <[email protected]>,
> > > "Barak Azulay" <[email protected]>, "Simon Grinberg" <[email protected]>
> > > Sent: Sunday, May 12, 2013 9:59:07 AM
> > > Subject: Re: feature suggestion: initial generation of management network
> > >
> > > Thread Summary -
> > >
> > > 1. We all agree the automatic reboot after host installation is no
> > > longer needed and can be removed.
> > >
> > > 2. There is broad agreement that we need to add a new VDSM verb for
> > > reboot.
> >
> > I disagree with the above.
> >
> > In addition to the fact that it will not work when VDSM is not responsive
> > (which is when this action will be needed the most)
>
> If vdsm is unresponsive because of a fault in vdsm, we can add a fail-safe
> mechanism for critical commands within vdsm.
> And we can always fall back to standard fencing in such cases.
>
> Can you please describe a scenario in which host-deploy succeeds but vdsm
> is unresponsive?
>
> Current sequence:
> 1. host-deploy + reboot - all via a single ssh session.
>
> New sequence:
> 1. host-deploy - via ssh.
> 2. network setup - via vdsm.
> 3. optional reboot - via vdsm.
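
To make the proposed sequence concrete, here is a minimal engine-side sketch
in Python. The names (deploy_over_ssh, VdsmClient) are hypothetical stand-ins
rather than the actual engine or VDSM API, and reboot() is the proposed new
verb under discussion, not an existing one:

    # Minimal sketch of the proposed sequence, engine side. deploy_over_ssh()
    # and VdsmClient are hypothetical stand-ins, not the real engine/VDSM API.
    def install_host(address, ssh_key, mgmt_net_def, reboot=False):
        # 1. host-deploy - via ssh; no network setup, no forced reboot.
        deploy_over_ssh(address, ssh_key)

        # 2. network setup - via vdsm, so vdsm must be up and responsive.
        vdsm = VdsmClient(address)
        vdsm.setupNetworks({'ovirtmgmt': mgmt_net_def})
        vdsm.setSafeNetworkConfig()   # persist (cf. CommitNetworkChanges)

        # 3. optional reboot - via vdsm (the proposed new verb).
        if reboot:
            vdsm.reboot()
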
I'd like to add that if step 2 fails, VDSM should roll back to the last
known network configuration; therefore it shouldn't remain non-responsive
in case the setup network command causes a communication loss.

> In the new sequence, vdsm must be responsive to accomplish (2), and if (2)
> succeeds, vdsm, again, must be responsive.
>
> Thanks!
>
> > > 3. There was a suggestion to add a checkbox when adding a host to
> > > reboot the host after installation; the default would be not to reboot
> > > (leaving the option to reboot to the administrator).
> > >
> > > If there is no objection we'll go with the above.
> > >
> > > Thanks, Livnat
> > >
> > > On 05/07/2013 02:22 PM, Moti Asayag wrote:
> > > > I stumbled upon a few issues with the current design while
> > > > implementing it:
> > > >
> > > > There seems to be a requirement to reboot the host after the
> > > > installation is completed, in order to ensure the host is recoverable.
> > > >
> > > > Therefore, the building blocks of the installation process in 3.3 are:
> > > > 1. Host deploy, which installs the host except for configuring its
> > > > management network.
> > > > 2. SetupNetwork (and CommitNetworkChanges) - for creating the
> > > > management network on the host and persisting the network
> > > > configuration.
> > > > 3. Reboot the host - this is a missing piece. (The engine has a
> > > > FenceVds command, but it requires power management to be configured
> > > > prior to the installation, and might be irrelevant for hosts
> > > > without PM.)
> > > >
> > > > So, there are a couple of issues here:
> > > > 1. How to reboot the host?
> > > > 1.1. By exposing a new RebootNode verb in VDSM and invoking it from
> > > > the engine.
> > > > 1.2. By opening an ssh dialog to the host in order to execute the
> > > > reboot.
> > > >
> > > > 2. When to perform the reboot?
> > > > 2.1. After host deploy, by utilizing host deploy to perform the
> > > > reboot. This requires the monitor to configure the network when the
> > > > host is detected by the engine, detached from the installation flow.
> > > > However, it is a step toward the yet-to-be-defined non-persistent
> > > > network feature.
> > > > 2.2. After setupNetwork is done and the network was configured and
> > > > persisted on the host. There is no special advantage from the
> > > > recoverability aspect, as setupNetwork is routinely used to persist
> > > > the network configuration (via the complementary
> > > > CommitNetworkChanges command). If the network configuration fails,
> > > > VDSM will revert to the last known-good configuration, so
> > > > connectivity with the engine should be restored. Design-wise, it
> > > > fits to configure the management network as part of the installation
> > > > sequence. If the network configuration fails in this context, the
> > > > host status will be set to "InstallFailed" rather than
> > > > "NonOperational", as might occur as a result of a failed
> > > > setupNetwork command.
> > > >
> > > > Your inputs are welcome.
> > > >
> > > > Thanks,
> > > > Moti
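
The revert-to-last-known-good behaviour described in 2.2 (and requested above
for step 2 failures) amounts to a connectivity-checked transaction on the
VDSM side. A rough sketch of the idea, where apply_config(),
engine_is_reachable() and restore_last_persisted() are hypothetical helpers
rather than VDSM's actual internals:

    # Rough sketch of connectivity-checked network setup with automatic
    # rollback. apply_config(), engine_is_reachable() and
    # restore_last_persisted() are hypothetical helpers, not VDSM internals.
    import time

    def setup_networks_with_rollback(new_config, timeout=60):
        apply_config(new_config)       # reconfigure, but do not persist yet
        for _ in range(timeout):
            if engine_is_reachable():  # engine re-established contact
                return True            # a later commit persists the change
            time.sleep(1)
        restore_last_persisted()       # revert so the host stays reachable
        return False
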
> > > > ----- Original Message -----
> > > >> From: "Dan Kenigsberg" <[email protected]>
> > > >> To: "Simon Grinberg" <[email protected]>, "Moti Asayag" <[email protected]>
> > > >> Cc: "arch" <[email protected]>
> > > >> Sent: Tuesday, January 1, 2013 2:47:57 PM
> > > >> Subject: Re: feature suggestion: initial generation of management
> > > >> network
> > > >>
> > > >> On Thu, Dec 27, 2012 at 07:36:40AM -0500, Simon Grinberg wrote:
> > > >>>
> > > >>> ----- Original Message -----
> > > >>>> From: "Dan Kenigsberg" <[email protected]>
> > > >>>> To: "Simon Grinberg" <[email protected]>
> > > >>>> Cc: "arch" <[email protected]>
> > > >>>> Sent: Thursday, December 27, 2012 2:14:06 PM
> > > >>>> Subject: Re: feature suggestion: initial generation of management
> > > >>>> network
> > > >>>>
> > > >>>> On Tue, Dec 25, 2012 at 09:29:26AM -0500, Simon Grinberg wrote:
> > > >>>>>
> > > >>>>> ----- Original Message -----
> > > >>>>>> From: "Dan Kenigsberg" <[email protected]>
> > > >>>>>> To: "arch" <[email protected]>
> > > >>>>>> Sent: Tuesday, December 25, 2012 2:27:22 PM
> > > >>>>>> Subject: feature suggestion: initial generation of management
> > > >>>>>> network
> > > >>>>>>
> > > >>>>>> Current condition:
> > > >>>>>> ==================
> > > >>>>>> The management network, named ovirtmgmt, is created during host
> > > >>>>>> bootstrap. It consists of a bridge device, connected to the
> > > >>>>>> network device that was used to communicate with Engine (nic,
> > > >>>>>> bonding or vlan). It inherits its ip settings from the latter
> > > >>>>>> device.
> > > >>>>>>
> > > >>>>>> Why Is the Management Network Needed?
> > > >>>>>> =====================================
> > > >>>>>> Understandably, some may ask why we need to have a management
> > > >>>>>> network - why having a host with IPv4 configured on it is not
> > > >>>>>> enough. The answer is twofold:
> > > >>>>>> 1. In oVirt, a network is an abstraction of the resources
> > > >>>>>> required for connectivity of a host for a specific usage. This
> > > >>>>>> is true for the management network just as it is for a VM
> > > >>>>>> network or a display network. The network entity is the key for
> > > >>>>>> adding/changing nics and IP addresses.
> > > >>>>>> 2. On many occasions (such as small setups) the management
> > > >>>>>> network is used as a VM/display network as well.
> > > >>>>>>
> > > >>>>>> Problems in current connectivity:
> > > >>>>>> ================================
> > > >>>>>> According to alonbl of ovirt-host-deploy fame, and with no
> > > >>>>>> conflict to my own experience, creating the management network
> > > >>>>>> is the most fragile, error-prone step of bootstrap.
> > > >>>>>
> > > >>>>> +1,
> > > >>>>> I've raised that repeatedly in the past: bootstrap should not
> > > >>>>> create the management network but pick up the existing
> > > >>>>> configuration, and let the engine override it later with its own
> > > >>>>> configuration if it differs. I'm glad that we finally get to that.
> > > >>>>>
> > > >>>>>> Currently it always creates a bridged network (even if the DC
> > > >>>>>> requires a non-bridged ovirtmgmt), it knows nothing about the
> > > >>>>>> defined MTU for ovirtmgmt, it uses ping to guess on top of which
> > > >>>>>> device to build (and thus requires Vdsm-to-Engine reverse
> > > >>>>>> connectivity), and it is the sole remaining user of the
> > > >>>>>> addNetwork/vdsm-store-net-conf scripts.
> > > >>>>>>
> > > >>>>>> Suggested feature:
> > > >>>>>> ==================
> > > >>>>>> Bootstrap would avoid creating a management network. Instead,
> > > >>>>>> after bootstrapping a host, Engine would send a getVdsCaps probe
> > > >>>>>> to the installed host, receiving a complete picture of the
> > > >>>>>> network configuration on the host. Among this picture is the
> > > >>>>>> device that holds the host's management IP address.
> > > >>>>>>
> > > >>>>>> Engine would send a setupNetworks command to generate ovirtmgmt
> > > >>>>>> with details devised from this picture, and according to the DC
> > > >>>>>> definition of ovirtmgmt. For example, if Vdsm reports:
> > > >>>>>>
> > > >>>>>> - vlan bond4.3000 has the host's IP, configured to use dhcp.
> > > >>>>>> - bond4 comprises eth2 and eth3
> > > >>>>>> - ovirtmgmt is defined as a VM network with MTU 9000
> > > >>>>>>
> > > >>>>>> then Engine sends the likes of:
> > > >>>>>> setupNetworks(ovirtmgmt: {bridged=True, vlan=3000, iface=bond4,
> > > >>>>>> bonding=bond4: {eth2,eth3}, MTU=9000)
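
For illustration, the derivation Dan describes could look like the sketch
below. The dict layout is simplified and is not the real getVdsCaps schema:

    # Illustrative sketch only: deriving the setupNetworks arguments for
    # ovirtmgmt from a getVdsCaps-like report. The dict layout here is
    # simplified, not the real getVdsCaps schema.
    caps = {
        'vlans': {'bond4.3000': {'addr': '192.0.2.10', 'bootproto': 'dhcp'}},
        'bondings': {'bond4': {'slaves': ['eth2', 'eth3']}},
    }
    dc_ovirtmgmt = {'bridged': True, 'mtu': 9000}  # DC-level definition
    mgmt_ip = '192.0.2.10'  # the address Engine used to reach the host

    def devise_ovirtmgmt(caps, dc_def, mgmt_ip):
        net = dict(dc_def)
        # Locate the device holding the management IP; here it is a vlan.
        for name, vlan in caps['vlans'].items():
            if vlan['addr'] == mgmt_ip:
                iface, tag = name.rsplit('.', 1)
                net.update(iface=iface, vlan=int(tag),
                           bootproto=vlan['bootproto'])
                break
        return net

    print(devise_ovirtmgmt(caps, dc_ovirtmgmt, mgmt_ip))
    # -> {'bridged': True, 'mtu': 9000, 'iface': 'bond4', 'vlan': 3000,
    #     'bootproto': 'dhcp'}
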
> > > >>>>>
> > > >>>>> Just one comment here.
> > > >>>>> In order to save time and confusion - if ovirtmgmt is defined
> > > >>>>> with default values, meaning the user did not bother to touch it,
> > > >>>>> let it pick up the VLAN configuration from the first host added
> > > >>>>> in the Data Center.
> > > >>>>>
> > > >>>>> Otherwise, you may override the host VLAN and lose connectivity.
> > > >>>>>
> > > >>>>> This will also solve the situation many users encounter today:
> > > >>>>> 1. The engine is on a host that actually has a VLAN defined.
> > > >>>>> 2. The ovirtmgmt network was not updated in the DC.
> > > >>>>> 3. A host with a VLAN already defined is added - everything
> > > >>>>> works fine.
> > > >>>>> 4. Any number of hosts are now added; again, everything seems to
> > > >>>>> work fine.
> > > >>>>>
> > > >>>>> But now try to use setupNetworks, and you'll find out that you
> > > >>>>> can't do much on the interface that contains ovirtmgmt, since the
> > > >>>>> definition does not match. You can't sync (since this would
> > > >>>>> remove the VLAN and cause connectivity loss), and you can't add
> > > >>>>> more networks on top, since it already has a non-VLAN network on
> > > >>>>> top according to the DC definition, etc.
> > > >>>>>
> > > >>>>> On the other hand, you can't update the ovirtmgmt definition on
> > > >>>>> the DC, since there are clusters in the DC that use the network.
> > > >>>>>
> > > >>>>> The only workaround not involving a DB hack to change the VLAN on
> > > >>>>> the network is to:
> > > >>>>> 1. Create a new DC.
> > > >>>>> 2. Do not use the wizard that pops up to create your cluster.
> > > >>>>> 3. Modify the ovirtmgmt network to have VLANs.
> > > >>>>> 4. Now create a cluster and add your hosts.
> > > >>>>>
> > > >>>>> If you insist on using the default DC and cluster, then before
> > > >>>>> adding the first host, create an additional DC and move the
> > > >>>>> Default cluster over there. You may then change the network on
> > > >>>>> the Default cluster and then move the Default cluster back.
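
The "definition does not match" state Simon describes is what the engine
flags as an out-of-sync network. A toy comparison, with illustrative
attribute names rather than the engine's actual data model:

    # Toy sketch of an out-of-sync check between the DC-level network
    # definition and what the host actually reports. Attribute names are
    # illustrative, not the engine's actual data model.
    def out_of_sync(dc_net, host_net):
        """Return the attributes on which host and DC definitions disagree."""
        return {attr: (dc_net.get(attr), host_net.get(attr))
                for attr in ('vlan', 'mtu', 'bridged')
                if dc_net.get(attr) != host_net.get(attr)}

    # Simon's scenario: the DC says "no VLAN", the host is actually tagged.
    dc_net = {'vlan': None, 'mtu': 1500, 'bridged': True}
    host_net = {'vlan': 3000, 'mtu': 1500, 'bridged': True}
    print(out_of_sync(dc_net, host_net))  # -> {'vlan': (None, 3000)}
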
> > > >>>>>
> > > >>>>> Both are ugly, and should be solved by the proposal above.
> > > >>>>>
> > > >>>>> We do something similar for the Default cluster CPU level, where
> > > >>>>> we set the initial level based on the first host added to the
> > > >>>>> cluster.
> > > >>>>
> > > >>>> I'm not sure what Engine has for the Default cluster CPU level.
> > > >>>> But I have reservations about the hysteresis in your proposal -
> > > >>>> after a host is added, the DC cannot forget ovirtmgmt's vlan.
> > > >>>>
> > > >>>> How about letting the admin edit ovirtmgmt's vlan at the DC level,
> > > >>>> thus rendering all hosts out-of-sync? Then the admin could
> > > >>>> manually, or through a script, or in the future through a
> > > >>>> distributed operation, sync all the hosts to the definition.
> > > >>>
> > > >>> Usually if you do that you will lose connectivity to the hosts.
> > > >>
> > > >> Yes, changing the management vlan id (or ip address) is never fun,
> > > >> and requires out-of-band intervention.
> > > >>
> > > >>> I'm not insisting on the automatic adjustment of the ovirtmgmt
> > > >>> network to match the hosts' (that is just a nice touch); we can
> > > >>> take the allow-edit approach.
> > > >>>
> > > >>> But while allowing a change of VLAN on the ovirtmgmt network will
> > > >>> indeed solve the issue I'm trying to solve, it creates another
> > > >>> issue: users will expect that we can re-tag the host from the
> > > >>> engine side, which is challenging to do.
> > > >>>
> > > >>> On the other hand, if we allow changing the VLAN as long as the
> > > >>> change matches the hosts' configuration, it will solve the issue
> > > >>> while not deluding the user into thinking that we can really solve
> > > >>> the chicken-and-egg issue of re-tagging the entire system.
> > > >>>
> > > >>> Now with the above ability you do get a flow to do the re-tag:
> > > >>> 1. Place all the hosts in maintenance.
> > > >>> 2. Re-tag the ovirtmgmt on all the hosts.
> > > >>> 3. Re-tag the host the engine is on.
> > > >>> 4. Activate the hosts - this should work well now since
> > > >>> connectivity exists.
> > > >>> 5. Change the tag on ovirtmgmt on the engine to match the hosts'.
> > > >>>
> > > >>> A simple and clear process.
> > > >>>
> > > >>> When the workaround of creating another DC was not possible, since
> > > >>> the system was already long in use and the need was a re-tag of
> > > >>> the network, the above is what I've recommended, except that steps
> > > >>> 4-5 were done as:
> > > >>> 4. Stop the engine.
> > > >>> 5. Change the tag in the DB.
> > > >>> 6. Start the engine.
> > > >>> 7. Activate the hosts.
> > > >>
> > > >> Sounds reasonable to me - but as far as I am aware this is not
> > > >> tightly related to the $Subject, which is the post-boot ovirtmgmt
> > > >> definition.
> > > >>
> > > >> I've added a few details to
> > > >> http://www.ovirt.org/Features/Normalized_ovirtmgmt_Initialization#Engine
> > > >> and I would appreciate a review from someone with intimate Engine
> > > >> know-how.
> > > >>
> > > >> Dan.
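
For what it's worth, the "through a script" option Dan mentions could look
roughly like the sketch below; list_hosts() and VdsmClient are hypothetical,
and a real version would have to sequence the hosts carefully so the engine
does not cut off its own connectivity:

    # Rough sketch of the "sync all hosts to the DC definition" script
    # mentioned above. list_hosts() and VdsmClient are hypothetical; a real
    # implementation would drive this through the engine, not directly.
    def sync_all_hosts(dc_ovirtmgmt):
        for host in list_hosts(status='maintenance'):
            vdsm = VdsmClient(host.address)
            vdsm.setupNetworks({'ovirtmgmt': dc_ovirtmgmt})
            vdsm.setSafeNetworkConfig()  # persist once applied cleanly
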
_______________________________________________
Arch mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/arch
