On 05/12/2013 11:25 AM, Alon Bar-Lev wrote:
>
>
> ----- Original Message -----
>> From: "Barak Azulay" <[email protected]>
>> To: "Livnat Peer" <[email protected]>
>> Cc: "Alon Bar-Lev" <[email protected]>, "arch" <[email protected]>, "Simon Grinberg" <[email protected]>
>> Sent: Sunday, May 12, 2013 11:15:20 AM
>> Subject: Re: feature suggestion: initial generation of management network
>>
>>
>> ----- Original Message -----
>>> From: "Livnat Peer" <[email protected]>
>>> To: "Moti Asayag" <[email protected]>
>>> Cc: "arch" <[email protected]>, "Alon Bar-Lev" <[email protected]>, "Barak Azulay" <[email protected]>, "Simon Grinberg" <[email protected]>
>>> Sent: Sunday, May 12, 2013 9:59:07 AM
>>> Subject: Re: feature suggestion: initial generation of management network
>>>
>>> Thread Summary -
>>>
>>> 1. We all agree the automatic reboot after host installation is no
>>> longer needed and can be removed.
>>>
>>> 2. There is broad agreement that we need to add a new VDSM verb for
>>> reboot.
>>
>> I disagree with the above.
>>
>> In addition to the fact that it will not work when VDSM is not responsive
>> (when this action will be needed the most)
>
> If vdsm is unresponsive because of a fault in vdsm, we can add a fail-safe
> mechanism for critical commands within vdsm.
> And we can always fall back to standard fencing in such cases.
>
> Can you please describe a scenario in which host-deploy succeeds and vdsm
> is unresponsive?
>
> Current sequence:
> 1. host-deploy + reboot - all via a single ssh session.
>
> New sequence:
> 1. host-deploy - via ssh.
> 2. network setup - via vdsm.
> 3. optional reboot - via vdsm.
>
> In the new sequence, vdsm must be responsive to accomplish (2), and if (2)
> succeeds, vdsm, again, must be responsive.
>

+1, fully agree with the above.
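
For concreteness, a minimal sketch of the new sequence on the engine side
could look like the one below. The helper objects and verb names (ssh.run,
vdsm.setupNetworks, vdsm.commitNetworkChanges, vdsm.reboot) are illustrative
assumptions, not the actual engine or VDSM API:

    def install_host(ssh, vdsm, mgmt_net_params, reboot_requested=False):
        # 1. host-deploy - via ssh; no reboot and no ovirtmgmt creation here.
        ssh.run("ovirt-host-deploy")

        # 2. network setup - via vdsm; this already requires vdsm to be up
        #    and responsive once host-deploy has finished.
        vdsm.setupNetworks({"ovirtmgmt": mgmt_net_params})
        vdsm.commitNetworkChanges()  # persist the configuration

        # 3. optional reboot - via vdsm; if (2) succeeded, vdsm was
        #    responsive, so this step adds no new availability requirement.
        if reboot_requested:
            vdsm.reboot()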

> Thanks!
>
>>
>>
>>>
>>> 3. There was a suggestion to add a checkbox when adding a host, to reboot
>>> the host after installation; the default would be not to reboot (leaving
>>> the option to reboot to the administrator).
>>>
>>>
>>> If there is no objection we'll go with the above.
>>>
>>> Thanks, Livnat
>>>
>>>
>>> On 05/07/2013 02:22 PM, Moti Asayag wrote:
>>>> I stumbled upon a few issues with the current design while implementing it:
>>>>
>>>> There seems to be a requirement to reboot the host after the installation
>>>> is completed, in order to ensure the host is recoverable.
>>>>
>>>> Therefore, the building blocks of the installation process for 3.3 are:
>>>> 1. Host deploy, which installs the host except for configuring its
>>>> management network.
>>>> 2. SetupNetwork (and CommitNetworkChanges) - for creating the management
>>>> network on the host and persisting the network configuration.
>>>> 3. Reboot the host - this is a missing piece. (The engine has a FenceVds
>>>> command, but it requires power management to be configured prior to the
>>>> installation and might be irrelevant for hosts without PM.)
>>>>
>>>> So, there are a couple of issues here:
>>>> 1. How to reboot the host?
>>>> 1.1. By exposing a new RebootNode verb in VDSM and invoking it from the
>>>> engine.
>>>> 1.2. By opening an ssh dialog to the host in order to execute the reboot.
>>>>
>>>> 2. When to perform the reboot?
>>>> 2.1. After host deploy, by utilizing host deploy to perform the reboot.
>>>> This requires the network to be configured by the monitor when the host
>>>> is detected by the engine, detached from the installation flow. However,
>>>> it is a step toward the non-persistent network feature yet to be defined.
>>>> 2.2. After setupNetwork is done and the network was configured and
>>>> persisted on the host. There is no special advantage from a recoverability
>>>> aspect, as setupNetwork is constantly used to persist the network
>>>> configuration (via the complementary CommitNetworkChanges command). In
>>>> case the network configuration fails, VDSM will revert to the last
>>>> well-known configuration, so connectivity with the engine should be
>>>> restored. Design-wise, it fits to configure the management network as
>>>> part of the installation sequence.
>>>> If the network configuration fails in this context, the host status will
>>>> be set to "InstallFailed" rather than "NonOperational", as might occur as
>>>> a result of a failed setupNetwork command.
>>>>
>>>>
>>>> Your inputs are welcome.
>>>>
>>>> Thanks,
>>>> Moti
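
As an aside, the two reboot options Moti lists under point 1 could be
sketched as follows, assuming a hypothetical vdsm client object and ssh
session; neither the method names nor the fallback policy are existing APIs
or agreed behaviour:

    def reboot_host(vdsm, ssh):
        try:
            # Option 1.1: a new RebootNode-style verb exposed by VDSM and
            # invoked from the engine.
            vdsm.reboot()
        except Exception:
            # Option 1.2: fall back to an ssh dialog with the host, which
            # also covers the case where vdsm is unresponsive.
            ssh.run("reboot")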
>>>> ----- Original Message -----
>>>>> From: "Dan Kenigsberg" <[email protected]>
>>>>> To: "Simon Grinberg" <[email protected]>, "Moti Asayag" <[email protected]>
>>>>> Cc: "arch" <[email protected]>
>>>>> Sent: Tuesday, January 1, 2013 2:47:57 PM
>>>>> Subject: Re: feature suggestion: initial generation of management network
>>>>>
>>>>> On Thu, Dec 27, 2012 at 07:36:40AM -0500, Simon Grinberg wrote:
>>>>>>
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> From: "Dan Kenigsberg" <[email protected]>
>>>>>>> To: "Simon Grinberg" <[email protected]>
>>>>>>> Cc: "arch" <[email protected]>
>>>>>>> Sent: Thursday, December 27, 2012 2:14:06 PM
>>>>>>> Subject: Re: feature suggestion: initial generation of management network
>>>>>>>
>>>>>>> On Tue, Dec 25, 2012 at 09:29:26AM -0500, Simon Grinberg wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>>> From: "Dan Kenigsberg" <[email protected]>
>>>>>>>>> To: "arch" <[email protected]>
>>>>>>>>> Sent: Tuesday, December 25, 2012 2:27:22 PM
>>>>>>>>> Subject: feature suggestion: initial generation of management network
>>>>>>>>>
>>>>>>>>> Current condition:
>>>>>>>>> ==================
>>>>>>>>> The management network, named ovirtmgmt, is created during host
>>>>>>>>> bootstrap. It consists of a bridge device, connected to the network
>>>>>>>>> device that was used to communicate with Engine (nic, bonding or vlan).
>>>>>>>>> It inherits its IP settings from the latter device.
>>>>>>>>>
>>>>>>>>> Why Is the Management Network Needed?
>>>>>>>>> =====================================
>>>>>>>>> Understandably, some may ask why we need to have a management
>>>>>>>>> network - why having a host with IPv4 configured on it is not enough.
>>>>>>>>> The answer is twofold:
>>>>>>>>> 1. In oVirt, a network is an abstraction of the resources required for
>>>>>>>>> connectivity of a host for a specific usage. This is true for the
>>>>>>>>> management network just as it is for a VM network or a display network.
>>>>>>>>> The network entity is the key for adding/changing nics and IP addresses.
>>>>>>>>> 2. On many occasions (such as small setups) the management network is
>>>>>>>>> used as a VM/display network as well.
>>>>>>>>>
>>>>>>>>> Problems in current connectivity:
>>>>>>>>> =================================
>>>>>>>>> According to alonbl of ovirt-host-deploy fame, and with no conflict to
>>>>>>>>> my own experience, creating the management network is the most fragile,
>>>>>>>>> error-prone step of bootstrap.
>>>>>>>>
>>>>>>>> +1,
>>>>>>>> I've raised that repeatedly in the past: bootstrap should not create
>>>>>>>> the management network, but should pick up the existing configuration
>>>>>>>> and let the engine override it later with its own configuration if it
>>>>>>>> differs. I'm glad that we are finally getting to that.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Currently it always creates a bridged network (even if the DC requires
>>>>>>>>> a non-bridged ovirtmgmt), it knows nothing about the MTU defined for
>>>>>>>>> ovirtmgmt, it uses ping to guess on top of which device to build (and
>>>>>>>>> thus requires Vdsm-to-Engine reverse connectivity), and it is the sole
>>>>>>>>> remaining user of the addNetwork/vdsm-store-net-conf scripts.
>>>>>>>>>
>>>>>>>>> Suggested feature:
>>>>>>>>> ==================
>>>>>>>>> Bootstrap would avoid creating a management network. Instead, after
>>>>>>>>> bootstrapping a host, Engine would send a getVdsCaps probe to the
>>>>>>>>> installed host, receiving a complete picture of the network
>>>>>>>>> configuration on the host. Among this picture is the device that holds
>>>>>>>>> the host's management IP address.
>>>>>>>>>
>>>>>>>>> Engine would send a setupNetworks command to generate ovirtmgmt with
>>>>>>>>> details derived from this picture, and according to the DC definition
>>>>>>>>> of ovirtmgmt. For example, if Vdsm reports:
>>>>>>>>>
>>>>>>>>> - vlan bond4.3000 has the host's IP, configured to use dhcp.
>>>>>>>>> - bond4 comprises eth2 and eth3
>>>>>>>>> - ovirtmgmt is defined as a VM network with MTU 9000
>>>>>>>>>
>>>>>>>>> then Engine sends the likes of:
>>>>>>>>> setupNetworks(ovirtmgmt: {bridged=True, vlan=3000, iface=bond4,
>>>>>>>>> bonding=bond4: {eth2,eth3}, MTU=9000})
>>>>>>>>
>>>>>>>> Just one comment here:
>>>>>>>> in order to save time and confusion - if ovirtmgmt is defined with
>>>>>>>> default values, meaning the user did not bother to touch it, let it
>>>>>>>> pick up the VLAN configuration from the first host added in the Data
>>>>>>>> Center.
>>>>>>>>
>>>>>>>> Otherwise, you may override the host's VLAN and lose connectivity.
>>>>>>>>
>>>>>>>> This will also solve a situation many users encounter today:
>>>>>>>> 1. The engine is on a host that actually has a VLAN defined.
>>>>>>>> 2. The ovirtmgmt network was not updated in the DC.
>>>>>>>> 3. A host with the VLAN already defined is added - everything works
>>>>>>>> fine.
>>>>>>>> 4. Any number of hosts are now added; again everything seems to work
>>>>>>>> fine.
>>>>>>>>
>>>>>>>> But now try to use setupNetworks, and you'll find out that you can't
>>>>>>>> do much on the interface that carries ovirtmgmt, since the definition
>>>>>>>> does not match. You can't sync (since this would remove the VLAN and
>>>>>>>> cause loss of connectivity), and you can't add more networks on top,
>>>>>>>> since it already has a non-VLAN network on top according to the DC
>>>>>>>> definition, etc.
>>>>>>>>
>>>>>>>> On the other hand, you can't update the ovirtmgmt definition on the DC
>>>>>>>> since there are clusters in the DC that use the network.
>>>>>>>>
>>>>>>>> The only workaround not involving a DB hack to change the VLAN on the
>>>>>>>> network is to:
>>>>>>>> 1. Create a new DC.
>>>>>>>> 2. Do not use the wizard that pops up to create your cluster.
>>>>>>>> 3. Modify the ovirtmgmt network to have VLANs.
>>>>>>>> 4. Now create a cluster and add your hosts.
>>>>>>>>
>>>>>>>> If you insist on using the default DC and cluster, then before adding
>>>>>>>> the first host, create an additional DC and move the Default cluster
>>>>>>>> over there. You may then change the network on the Default cluster and
>>>>>>>> then move the Default cluster back.
>>>>>>>>
>>>>>>>> Both are ugly, and should be solved by the proposal above.
>>>>>>>>
>>>>>>>> We do something similar for the Default cluster CPU level, where we
>>>>>>>> set the initial level based on the first host added to the cluster.
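
Putting Dan's example and the suggestion above together, the engine-side
derivation of the ovirtmgmt parameters from the getVdsCaps picture might be
sketched roughly as below; the dict layout and key names are assumptions for
illustration, not the real VDSM capabilities schema or engine code:

    def build_ovirtmgmt(caps, dc_net):
        # 'caps' stands for the getVdsCaps reply, 'dc_net' for the DC
        # definition of ovirtmgmt; both are plain dicts here for brevity.
        mgmt = caps["mgmt_device"]    # e.g. {"bonding": "bond4", "vlan": 3000,
                                      #       "bootproto": "dhcp"}
        vlan = dc_net.get("vlan")
        if dc_net.get("is_default") and mgmt.get("vlan") is not None:
            # An untouched DC definition picks up the VLAN of the first
            # host added to the Data Center.
            vlan = mgmt["vlan"]

        attrs = {
            "bridged": dc_net.get("vm_network", True),   # e.g. a VM network
            "bonding": mgmt.get("bonding"),              # e.g. bond4 (eth2 + eth3)
            "mtu": dc_net.get("mtu", 1500),              # e.g. 9000
            "bootproto": mgmt.get("bootproto", "dhcp"),  # keep the host's IP config
        }
        if vlan is not None:
            attrs["vlan"] = vlan                         # e.g. 3000
        return {"ovirtmgmt": attrs}

The engine would then pass the returned mapping to setupNetworks and commit
it, as in the example quoted above.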
>>>>>>>
>>>>>>> I'm not sure what Engine has for the Default cluster CPU level. But I
>>>>>>> have reservations about the hysteresis in your proposal - after a host
>>>>>>> is added, the DC cannot forget ovirtmgmt's vlan.
>>>>>>>
>>>>>>> How about letting the admin edit ovirtmgmt's vlan at the DC level, thus
>>>>>>> rendering all hosts out-of-sync? Then the admin could manually, or
>>>>>>> through a script, or in the future through a distributed operation,
>>>>>>> sync all the hosts to the definition.
>>>>>>
>>>>>> Usually if you do that you will lose connectivity to the hosts.
>>>>>
>>>>> Yes, changing the management vlan id (or ip address) is never fun, and
>>>>> requires out-of-band intervention.
>>>>>
>>>>>> I'm not insisting on the automatic adjustment of the ovirtmgmt network
>>>>>> to match the hosts' (that is just a nice touch); we can take the
>>>>>> allow-edit approach.
>>>>>>
>>>>>> But allowing the VLAN to be changed on the ovirtmgmt network will indeed
>>>>>> solve the issue I'm trying to solve, while creating another issue: users
>>>>>> expecting that we'll be able to re-tag the host from the engine side,
>>>>>> which is challenging to do.
>>>>>>
>>>>>> On the other hand, if we allow changing the VLAN as long as the change
>>>>>> matches the hosts' configuration, it will solve the issue while not
>>>>>> leading the user to think that we can really solve the chicken-and-egg
>>>>>> problem of re-tagging the entire system.
>>>>>>
>>>>>> Now, with the above ability you do get a flow to do the re-tag:
>>>>>> 1. Place all the hosts in maintenance.
>>>>>> 2. Re-tag ovirtmgmt on all the hosts.
>>>>>> 3. Re-tag the host on which the engine runs.
>>>>>> 4. Activate the hosts - this should work well now since connectivity
>>>>>> exists.
>>>>>> 5. Change the tag on ovirtmgmt on the engine to match the hosts'.
>>>>>>
>>>>>> A simple and clear process.
>>>>>>
>>>>>> When the workaround of creating another DC was not possible, since the
>>>>>> system was already long in use and the need was a re-tag of the network,
>>>>>> the above is what I've recommended, except that steps 4-5 were done as:
>>>>>> 4. Stop the engine.
>>>>>> 5. Change the tag in the DB.
>>>>>> 6. Start the engine.
>>>>>> 7. Activate the hosts.
>>>>>
>>>>> Sounds reasonable to me - but as far as I am aware this is not tightly
>>>>> related to the $Subject, which is the post-boot ovirtmgmt definition.
>>>>>
>>>>> I've added a few details to
>>>>> http://www.ovirt.org/Features/Normalized_ovirtmgmt_Initialization#Engine
>>>>> and I would appreciate a review from someone with intimate Engine
>>>>> know-how.
>>>>>
>>>>> Dan.
>>>>>

_______________________________________________
Arch mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/arch
