----- Original Message -----
> From: "Livnat Peer" <[email protected]>
> To: "Simon Grinberg" <[email protected]>
> Cc: "Dan Kenigsberg" <[email protected]>, [email protected], "Orit Wasserman" 
> <[email protected]>, "Yuval M"
> <[email protected]>, "Laine Stump" <[email protected]>, "Limor Gavish" 
> <[email protected]>
> Sent: Sunday, January 13, 2013 1:53:23 PM
> Subject: Re: feature suggestion: migration network
> 
> On 01/10/2013 02:54 PM, Simon Grinberg wrote:
> > 
> > 
> > ----- Original Message -----
> >> From: "Dan Kenigsberg" <[email protected]>
> >> To: "Doron Fediuck" <[email protected]>
> >> Cc: "Simon Grinberg" <[email protected]>, "Orit Wasserman"
> >> <[email protected]>, "Laine Stump" <[email protected]>,
> >> "Yuval M" <[email protected]>, "Limor Gavish" <[email protected]>,
> >> [email protected], "Mark Wu"
> >> <[email protected]>
> >> Sent: Thursday, January 10, 2013 1:46:08 PM
> >> Subject: Re: feature suggestion: migration network
> >>
> >> On Thu, Jan 10, 2013 at 04:43:45AM -0500, Doron Fediuck wrote:
> >>>
> >>>
> >>> ----- Original Message -----
> >>>> From: "Simon Grinberg" <[email protected]>
> >>>> To: "Mark Wu" <[email protected]>, "Doron Fediuck"
> >>>> <[email protected]>
> >>>> Cc: "Orit Wasserman" <[email protected]>, "Laine Stump"
> >>>> <[email protected]>, "Yuval M" <[email protected]>, "Limor
> >>>> Gavish" <[email protected]>, [email protected], "Dan Kenigsberg"
> >>>> <[email protected]>
> >>>> Sent: Thursday, January 10, 2013 10:38:56 AM
> >>>> Subject: Re: feature suggestion: migration network
> >>>>
> >>>>
> >>>>
> >>>> ----- Original Message -----
> >>>>> From: "Mark Wu" <[email protected]>
> >>>>> To: "Dan Kenigsberg" <[email protected]>
> >>>>> Cc: "Simon Grinberg" <[email protected]>, "Orit Wasserman"
> >>>>> <[email protected]>, "Laine Stump" <[email protected]>,
> >>>>> "Yuval M" <[email protected]>, "Limor Gavish"
> >>>>> <[email protected]>,
> >>>>> [email protected]
> >>>>> Sent: Thursday, January 10, 2013 5:13:23 AM
> >>>>> Subject: Re: feature suggestion: migration network
> >>>>>
> >>>>> On 01/09/2013 03:34 AM, Dan Kenigsberg wrote:
> >>>>>> On Tue, Jan 08, 2013 at 01:23:02PM -0500, Simon Grinberg
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> ----- Original Message -----
> >>>>>>>> From: "Yaniv Kaul" <[email protected]>
> >>>>>>>> To: "Dan Kenigsberg" <[email protected]>
> >>>>>>>> Cc: "Limor Gavish" <[email protected]>, "Yuval M"
> >>>>>>>> <[email protected]>, [email protected], "Simon Grinberg"
> >>>>>>>> <[email protected]>
> >>>>>>>> Sent: Tuesday, January 8, 2013 4:46:10 PM
> >>>>>>>> Subject: Re: feature suggestion: migration network
> >>>>>>>>
> >>>>>>>> On 08/01/13 15:04, Dan Kenigsberg wrote:
> >>>>>>>>> There's been talk about this for ages, so it's time to have a
> >>>>>>>>> proper discussion and a feature page about it: let us have a
> >>>>>>>>> "migration" network role, and use such networks to carry
> >>>>>>>>> migration data.
> >>>>>>>>>
> >>>>>>>>> When Engine requests to migrate a VM from one node to
> >>>>>>>>> another, the VM state (BIOS, IO devices, RAM) is transferred
> >>>>>>>>> over a TCP/IP connection that is opened from the source qemu
> >>>>>>>>> process to the destination qemu. Currently, the destination
> >>>>>>>>> qemu listens for the incoming connection on the management IP
> >>>>>>>>> address of the destination host. This has serious downsides:
> >>>>>>>>> a "migration storm" may choke the destination's management
> >>>>>>>>> interface; migration is plaintext, and ovirtmgmt includes
> >>>>>>>>> Engine, which may sit outside the node cluster.
> >>>>>>>>>
> >>>>>>>>> With this feature, a cluster administrator may grant the
> >>>>>>>>> "migration" role to one of the cluster networks. Engine would
> >>>>>>>>> use that network's IP address on the destination host when it
> >>>>>>>>> requests a migration of a VM. With proper network setup,
> >>>>>>>>> migration data would be separated onto that network.
> >>>>>>>>>
> >>>>>>>>> === Benefit to oVirt ===
> >>>>>>>>> * Users would be able to define and dedicate a separate
> >>>>>>>>>   network for migration. Users that need quick migration
> >>>>>>>>>   would use nics with high bandwidth. Users who want to cap
> >>>>>>>>>   the bandwidth consumed by migration could define a
> >>>>>>>>>   migration network over nics with bandwidth limitation.
> >>>>>>>>> * Migration data can be limited to a separate network that
> >>>>>>>>>   has no layer-2 access from Engine.
> >>>>>>>>>
> >>>>>>>>> === Vdsm ===
> >>>>>>>>> The "migrate" verb should be extended with an additional
> >>>>>>>>> parameter, specifying the address that the remote qemu
> >>>>>>>>> process should listen on. A new argument is to be added to
> >>>>>>>>> the currently-defined migration arguments:
> >>>>>>>>> * vmId: UUID
> >>>>>>>>> * dst: management address of destination host
> >>>>>>>>> * dstparams: hibernation volumes definition
> >>>>>>>>> * mode: migration/hibernation
> >>>>>>>>> * method: rotten legacy
> >>>>>>>>> * ''New'': migration uri, according to
> >>>>>>>>>   http://libvirt.org/html/libvirt-libvirt.html#virDomainMigrateToURI2
> >>>>>>>>>   such as tcp://<ip of migration network on remote node>
> >>>>>>>>>
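
A minimal sketch of the extended argument set, in Python (the parameter
names and example values are illustrative assumptions, not the final VDSM
API):

    # Hypothetical argument set for the extended "migrate" verb; only
    # the migration URI is new, the rest mirror the list above.
    migrate_params = {
        'vmId': '11111111-2222-3333-4444-555555555555',  # UUID of the VM
        'dst': 'dst-mgmt.example.com:54321',  # management address of
                                              # the destination host
        'dstparams': {},      # hibernation volumes definition, if any
        'mode': 'remote',     # migration (vs. hibernation)
        'method': 'online',   # the legacy method parameter
        # New: where the destination qemu should listen, per
        # virDomainMigrateToURI2 - the destination host's address on
        # the migration network.
        'miguri': 'tcp://192.0.2.7',
    }
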
> >>>>>>>>> === Engine ===
> >>>>>>>>> As usual, complexity lies here, and several changes are
> >>>>>>>>> required:
> >>>>>>>>>
> >>>>>>>>> 1. Network definition.
> >>>>>>>>> 1.1 A new network role - not unlike "display network" -
> >>>>>>>>>     should be added. Only one migration network should be
> >>>>>>>>>     defined on a cluster.
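
A minimal sketch of that engine-side constraint, in Python (all names
here are illustrative, not the actual Engine model):

    # Hypothetical network roles; "migration" would join existing roles
    # such as display.
    ROLES = {'management', 'display', 'migration'}

    def validate_migration_role(cluster_networks):
        # Enforce the proposed constraint: at most one migration
        # network per cluster.
        migration_nets = [n for n in cluster_networks
                          if 'migration' in n.roles]
        if len(migration_nets) > 1:
            raise ValueError('only one migration network may be '
                             'defined on a cluster')
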
> >>>>>>> We are considering multiple display networks already, then why
> >>>>>>> not the same for migration?
> >>>>>> What is the motivation for having multiple migration networks?
> >>>>>> Extending the bandwidth (and thus, any network can be taken
> >>>>>> when needed), or data separation (and thus, a migration network
> >>>>>> should be assigned to each VM in the cluster)? Or another
> >>>>>> motivation with its own consequences?
> >>>>> My suggestion is to make the migration network role determined
> >>>>> dynamically on each migration. If we only define one migration
> >>>>> network per cluster, a migration storm could hit that network
> >>>>> and have a bad impact on VM applications. So I think the engine
> >>>>> could choose the network which has the lower traffic load for
> >>>>> migration, or leave the choice to the user.
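
A minimal sketch of that selection, in Python (get_load() is a
hypothetical traffic-sampling helper, not an existing engine call):

    def pick_migration_network(allowed_migration_networks, get_load):
        """Pick the admin-approved migration network with the lowest
        current traffic load; alternatively, present the candidate
        list to the user and let them choose."""
        return min(allowed_migration_networks, key=get_load)
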
> >>>>
> >>>> Dynamic migration network selection is indeed desirable, but only
> >>>> from among migration networks - migration traffic is insecure, so
> >>>> it's undesirable to have it mixed with VM traffic unless the
> >>>> admin permits that by marking the network as a migration network.
> >>>>
> >>>> To clarify what I meant in the previous response to Livnat, when
> >>>> I said "...if the customer, due to the unsymmetrical nature of
> >>>> most bonding modes, prefers to use multiple networks for
> >>>> migration and will ask us to optimize migration across these..."
> >>>>
> >>>> But the dynamic selection should be based on SLA, of which the
> >>>> above is just one part:
> >>>> 1. Need to consider tenant traffic segregation rules = security
> >>>> 2. SLA contracts
> >>
> >> We could devise complex logic that assigns each VM a pool of
> >> applicable migration networks, where one of them is chosen by
> >> Engine upon migration startup.
> >>
> >> I am, however, not at all sure that extending the migration
> >> bandwidth by means of multiple migration networks is worth the
> >> design hassle and the GUI noise. A simpler solution would be to
> >> build a single migration network on top of a fat bond, tweaked by a
> >> fine-tuned SLA.
> > 
> > Except for mode 4 (802.3ad), most bonding modes are optimized for
> > either outbound or inbound traffic - not both. It's far from
> > optimal. And you are forgetting the other reasons I've raised, like
> > isolation of tenant traffic, and not just for SLA reasons.
> > 
> 
> Why do we need isolation of tenants' migration traffic if not for SLA
> reasons?


Security (migration is not encrypted), segregation of resources (a poor 
man's/simple SLA, or a stopgap until you have real SLA), and, as said before, 
better utilization of resources (bonds are asymmetric). SLA in our discussion 
is maintained via traffic shaping, which has its performance impact; the first 
three reasons do not.

Another reason would be use with external network providers like Cisco or 
Mellanox, which already have traffic control. There you may easily have 
dedicated networks per tenant, including a migration network (as part of a 
tenant's dedicated resources and resource segregation).


> 
> > Even for pure active-active redundancy you may want to have more
> > than one, or asymmetrical hosts.
> 
> That's again going back to SLA policies and not specific for the
> migration network.
> 
> > Example:
> > We have a host with 3 nics - you dedicate them to management,
> > migration, and storage, respectively. But if the migration network
> > fails, you want the management network to become your migration
> > network (automatically).
> > 
> 
> OR you may not want that.
> That's a policy for handling network roles, not related specifically
> to the migration network.

Right, but there is a chicken-and-egg thing here.
Unless you have multiple migration networks, you won't be able to implement 
the above. And if you implement the above without pre-defining multiple 
networks that are allowed to act as migration networks, the implementation may 
be more complex.

> 
> 
> > Another:
> > A large host with many nics and a smaller host with fewer - as long
> > as there is a route between the migration and management networks,
> > you could think of a scenario where on the larger host you have
> > separate networks for each role, while on the smaller one you have a
> > single network assuming both roles.
> > 
> 
> I'm not sure this is the main use case, or that we want to complicate
> the general flow because of exotic use cases.

What I'm trying to say here is:
Please do not look at each use case separately. I agree that weighing them one 
by one may lead you to say: this one is not worth it, and that one on its own 
is not worth it, and so on. But looking at everything put together, it 
accumulates.

> 
> Maybe what you are looking for is an override of network roles at the
> host level. Not sure how useful this is, though.

Maybe.
I've already suggested allowing an override on a per-migration basis.

> > Other examples can be found.
> > 
> 
> If you have some main use cases I would love to hear them; maybe they
> can make the requirement clearer.

Gave some above.
I think for the immediate term the most compelling is the external network 
provider use case, where you want to allow the external network management to 
route/shape the traffic per tenant - something that will be hard to do if 
everything is aggregated on the host.
 
But come to think of it, I like more and more the idea of having the migration 
network as part of the VM configuration. It's simple to do now, logic can be 
added on top later if required, and VDSM already supports it.

So:
1. Have a default migration network per cluster (the default is the management 
network, as before).
2. This is the default migration network for all VMs created in that cluster.
3. Allow overriding this in the VM properties (the tenant use case; also 
supports the external network manager use case).
4. Allow overriding per migration as well.

Simple, powerful, and flexible, while the logic stays uncomplicated since the 
engine has nothing to decide - everything is orchestrated by the admin, while 
the initial out-of-the-box setup is very simple (one migration network for 
all, which is by default the management network). A sketch of the resolution 
order follows below.
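
A minimal sketch of that resolution order, in Python (the attribute names
are illustrative):

    def effective_migration_network(cluster, vm, per_migration=None):
        # 4. An explicit per-migration override wins,
        if per_migration is not None:
            return per_migration
        # 3. then a per-VM override from the VM properties,
        if vm.migration_network is not None:
            return vm.migration_network
        # 1+2. else the cluster default, which out of the box is the
        # management network.
        return cluster.migration_network or cluster.management_network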

Later you may apply policies on top of this. 

Thoughts? 

> 
> > It's really not just one reason to support more than one migration
> > network, or display network, or storage, or any other 'facility'
> > network. Any facility network may warrant more than one per cluster.
> > 
> 
> I'm not sure display can be in the same bucket as migration,
> management, and storage.

I think it can in the tenant use case, but I would be happy to get a solution 
like the above (have a default network per cluster and allow overriding per VM).


> 
> > 
> >>
> >>>>
> >>>> If you keep #2, migration storm mitigation is granted. But you
> >>>> are right that another feature required for #2 above is to
> >>>> control the migration bandwidth (BW) per migration. We had a
> >>>> discussion in the past about VDSM doing a dynamic calculation,
> >>>> based on f(line speed, max migration BW, max allowed per VM, free
> >>>> BW, number of migrating machines), when starting a migration. (I
> >>>> actually wanted to do so years ago, but never got to it - one of
> >>>> those things you always postpone until you find the time.) We did
> >>>> not think that the engine should provide any of these, but come
> >>>> to think of it, you are right and it makes sense. For SLA, max
> >>>> per VM + min guaranteed should be provided by the engine to
> >>>> maintain SLA. And it's up to the engine to ensure that min
> >>>> guaranteed x the number of concurrent migrations will not exceed
> >>>> max migration BW.
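
A minimal sketch of such a calculation, in Python; the exact function f
was never specified, so this is just one plausible reading of the inputs
listed above:

    def per_vm_migration_bw(line_speed, max_migration_bw, max_per_vm,
                            free_bw, n_migrating):
        # Never exceed the line speed, the global migration cap, or
        # the currently free bandwidth...
        available = min(line_speed, max_migration_bw, free_bw)
        # ...and split what is available across the concurrent
        # migrations, capped per VM.
        return min(max_per_vm, available / max(n_migrating, 1))

    # The engine-side constraint above: with a guaranteed minimum per
    # VM, allow at most max_migration_bw // min_guaranteed concurrent
    # migrations, e.g. a 10 Gbps cap with a 2 Gbps minimum allows 5.
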
> >>>>
> >>>> Dan, this is way too much for an initial implementation, but
> >>>> don't you think we should at least add placeholders in the
> >>>> migration API?
> >>
> >> In my opinion this should wait for another feature. For each VM,
> >> I'd like to see a means to define the SLA of each of its vNICs.
> >> When we have that, we should similarly define how much bandwidth it
> >> has for migration.
> >>
> >>>> Maybe Doron can assist with the required verbs.
> >>>>
> >>>> (P.S. I don't want to alarm anyone, but we may need SLA
> >>>> parameters for setupNetworks as well :) unless we want these as a
> >>>> separate API, though that means more calls during setup.)
> >>
> >> Exactly - when we have a migration network concept, and when we
> >> have a general network SLA definition, we can easily apply the
> >> latter to the former.
> >>
> >>>>
> >>>
> >>> As with other resources, the bare minimum is usually a MIN
> >>> capacity plus a MAX, to avoid choking other tenants / VMs. In this
> >>> context we may need to consider other QoS elements (delays, etc.),
> >>> but indeed these can be additional limitations on top of the basic
> >>> ones.
> >>>
> >>
_______________________________________________
Arch mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/arch
