Re: Guest bridge setup variations

2009-12-16 Thread Arnd Bergmann
On Wednesday 16 December 2009, Leonid Grossman wrote:
   3. Doing the bridging in the NIC using macvlan in passthrough
   mode. This lowers the CPU utilization further compared to 2,
   at the expense of limiting throughput by the performance of
   the PCIe interconnect to the adapter. Whether or not this
   is a win is workload dependent. 
 
 This is certainly true today for pci-e 1.1 and 2.0 devices, but 
 as NICs move to pci-e 3.0 (while remaining almost exclusively dual port
 10GbE for a long while), 
 EVB internal bandwidth will significantly exceed external bandwidth.
 So, #3 can become a win for most inter-guest workloads.

Right, it's also hardware dependent, but it usually comes down
to whether it's cheaper to spend CPU cycles or to spend IO bandwidth.

I would be surprised if all future machines with PCIe 3.0 suddenly have
a huge surplus of bandwidth but no CPU to keep up with that.

   Access controls now happen
   in the NIC. Currently, this is not supported yet, due to lack of
   device drivers, but it will be an important scenario in the future
   according to some people.
 
 Actually, x3100 10GbE drivers support this today via sysfs interface to
 the host driver 
 that can choose to control VEB tables (and therefore MAC addresses, vlan
 memberships, etc. for all passthru interfaces behind the VEB).

Ok, I didn't know about that.

 Of course a more generic vendor-independent interface will be important
 in the future.

Right. I hope we can come up with something soon. I'll have a look at
what your driver does and see if that can be abstracted in some way.
I expect that if we can find an interface between the kernel and device
driver for two or three NIC implementations, it will be good enough
to adapt to everyone else as well.

Arnd 


RE: Guest bridge setup variations

2009-12-16 Thread Leonid Grossman


 -Original Message-
 From: Arnd Bergmann [mailto:a...@arndb.de]
 Sent: Wednesday, December 16, 2009 6:16 AM
 To: virtualization@lists.linux-foundation.org
 Cc: Leonid Grossman; qemu-de...@nongnu.org
 Subject: Re: Guest bridge setup variations
 
 On Wednesday 16 December 2009, Leonid Grossman wrote:
3. Doing the bridging in the NIC using macvlan in passthrough
mode. This lowers the CPU utilization further compared to 2,
at the expense of limiting throughput by the performance of
the PCIe interconnect to the adapter. Whether or not this
is a win is workload dependent.
 
  This is certainly true today for pci-e 1.1 and 2.0 devices, but
  as NICs move to pci-e 3.0 (while remaining almost exclusively dual port
  10GbE for a long while),
  EVB internal bandwidth will significantly exceed external bandwidth.
  So, #3 can become a win for most inter-guest workloads.
 
 Right, it's also hardware dependent, but it usually comes down
 to whether it's cheaper to spend CPU cycles or to spend IO bandwidth.
 
  I would be surprised if all future machines with PCIe 3.0 suddenly have
  a huge surplus of bandwidth but no CPU to keep up with that.
 
Access controls now happen
in the NIC. Currently, this is not supported yet, due to lack of
device drivers, but it will be an important scenario in the future
according to some people.
 
  Actually, x3100 10GbE drivers support this today via sysfs interface to
  the host driver
  that can choose to control VEB tables (and therefore MAC addresses, vlan
  memberships, etc. for all passthru interfaces behind the VEB).
 
 Ok, I didn't know about that.
 
  Of course a more generic vendor-independent interface will be important
  in the future.
 
 Right. I hope we can come up with something soon. I'll have a look at
 what your driver does and see if that can be abstracted in some way.

Sounds good; please let us know whether looking at the code/documentation
will suffice or whether you need a couple of cards to go along with the code.

  I expect that if we can find an interface between the kernel and device
  driver for two or three NIC implementations, it will be good enough
  to adapt to everyone else as well.

The interface will likely evolve along with EVB standards and other
developments, but the initial implementation can be pretty basic (and
vendor-independent).
Early IOV NIC deployments can benefit from an interface that sets a couple of
VF parameters missing from the legacy NIC interface - things like a bandwidth
limit and a list of MAC addresses (since setting a NIC to promiscuous mode
doesn't work well for a VEB, the VEB is currently forced to learn the
addresses it is configured for).
The interface can also include querying IOV NIC capabilities such as the
number of VFs and support for VEB and/or VEPA mode, as well as getting VF
stats and MAC/VLAN tables - all in all, it is not a long list.
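
For illustration, here is a rough sketch of how small that control surface
could be if it were exposed as per-VF sysfs attributes. Everything in it is
hypothetical - the sysfs root and the attribute names (num_vfs, evb_mode,
bw_limit_mbps, mac_list and so on) are invented for the example and are not
the x3100 interface or any existing driver API:

import os

# Hypothetical sysfs root for one IOV-capable NIC; the real location and
# attribute names would be defined by the vendor-independent interface.
SYSFS_ROOT = "/sys/class/net/eth2/device/sriov"   # placeholder path

def read_attr(name):
    # Read a single sysfs attribute and strip the trailing newline.
    with open(os.path.join(SYSFS_ROOT, name)) as f:
        return f.read().strip()

def write_attr(name, value):
    # Write a single value to a sysfs attribute.
    with open(os.path.join(SYSFS_ROOT, name), "w") as f:
        f.write(str(value))

# Query the NIC capabilities: number of VFs and whether the embedded
# bridge runs as a VEB or in VEPA mode.
num_vfs = int(read_attr("num_vfs"))
evb_mode = read_attr("evb_mode")            # e.g. "veb" or "vepa"

# Per-VF settings mentioned above: a bandwidth limit and the list of MAC
# addresses the VEB should forward to this VF (instead of relying on
# promiscuous mode, which a VEB cannot honour).
write_attr("vf0/bw_limit_mbps", 2000)
write_attr("vf0/mac_list", "52:54:00:12:34:56,52:54:00:12:34:57")

# Per-VF statistics and MAC/VLAN tables for monitoring.
print(read_attr("vf0/stats"))
print(read_attr("vf0/vlan_list"))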


 
   Arnd


RE: Guest bridge setup variations

2009-12-15 Thread Leonid Grossman
  -Original Message-
  From: virtualization-boun...@lists.linux-foundation.org
  [mailto:virtualization-boun...@lists.linux-foundation.org] On Behalf Of
  Arnd Bergmann
  Sent: Tuesday, December 08, 2009 8:08 AM
  To: virtualization@lists.linux-foundation.org
  Cc: qemu-de...@nongnu.org
  Subject: Guest bridge setup variations
 
  As promised, here is my small writeup on which setups I feel
  are important in the long run for server-type guests. This
  does not cover -net user, which is really for desktop kinds
  of applications where you do not want to connect into the
  guest from another IP address.
 
  I can see four separate setups that we may or may not want to
  support, the main difference being how the forwarding between
  guests happens:
 
  1. The current setup, with a bridge and tun/tap devices on ports
  of the bridge. This is what Gerhard's work on access controls is
  focused on and the only option where the hypervisor actually
  is in full control of the traffic between guests. CPU utilization should
  be highest this way, and network management can be a burden,
  because the controls are done through a Linux, libvirt and/or Director
  specific interface.
 
  2. Using macvlan as a bridging mechanism, replacing the bridge
  and tun/tap entirely. This should offer the best performance on
  inter-guest communication, both in terms of throughput and
  CPU utilization, but offer no access control for this traffic at all.
  Performance of guest-external traffic should be slightly better
  than bridge/tap.
 
  3. Doing the bridging in the NIC using macvlan in passthrough
  mode. This lowers the CPU utilization further compared to 2,
  at the expense of limiting throughput by the performance of
  the PCIe interconnect to the adapter. Whether or not this
  is a win is workload dependent. 

This is certainly true today for pci-e 1.1 and 2.0 devices, but 
as NICs move to pci-e 3.0 (while remaining almost exclusively dual port
10GbE for a long while), 
EVB internal bandwidth will significantly exceed external bandwidth.
So, #3 can become a win for most inter-guest workloads.

  Access controls now happen
  in the NIC. Currently, this is not supported yet, due to lack of
  device drivers, but it will be an important scenario in the future
  according to some people.

Actually, x3100 10GbE drivers support this today via sysfs interface to
the host driver 
that can choose to control VEB tables (and therefore MAC addresses, vlan
memberships, etc. for all passthru interfaces behind the VEB).
Of course a more generic vendor-independent interface will be important
in the future.
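
As a sketch of what that host-side control amounts to (the sysfs paths and
attribute names below are invented placeholders, not the actual x3100
layout), the host driver policy could look roughly like this:

# Hypothetical host-side policy for a VEB-capable NIC: the host decides
# which MAC addresses and VLAN memberships each passthru interface (VF)
# may use, then programs the VEB tables through placeholder sysfs files.
VEB_ROOT = "/sys/class/net/eth2/device/veb"     # invented path
BLOCKED_MACS = set()                            # host policy, e.g. from libvirt

def configure_vf(vf, mac, vlans):
    if mac in BLOCKED_MACS:
        raise PermissionError(f"MAC {mac} not allowed for VF {vf}")
    # Add a unicast forwarding entry for this VF to the VEB table.
    with open(f"{VEB_ROOT}/vf{vf}/mac_table", "w") as f:
        f.write(mac + "\n")
    # Restrict the VF to the listed VLANs.
    with open(f"{VEB_ROOT}/vf{vf}/vlan_membership", "w") as f:
        f.write(",".join(str(v) for v in vlans) + "\n")

configure_vf(0, "52:54:00:ab:cd:01", [100, 200])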

 
  4. Using macvlan for actual VEPA on the outbound interface.
  This is mostly interesting because it makes the network access
  controls visible in an external switch that is already managed.
  CPU utilization and guest-external throughput should be
  identical to 3, but inter-guest latency can only be worse because
  all frames go through the external switch.
 
  In case 2 through 4, we have the choice between macvtap and
  the raw packet interface for connecting macvlan to qemu.
  Raw sockets are better tested right now, while macvtap has
  better permission management (i.e. it does not require
  CAP_NET_ADMIN). Neither one is upstream though at the
  moment. The raw driver only requires qemu patches, while
  macvtap requires both a new kernel driver and a trivial change
  in qemu.
 
  In all four cases, vhost-net could be used to move the workload
  from user space into the kernel, which may be an advantage.
  The decision for or against vhost-net is entirely independent of
  the other decisions.
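
To make the macvlan-based setups (2 through 4) above concrete, here is a
small sketch driving today's iproute2 syntax from Python; the interface
names are arbitrary, and which link types and modes are available depends
on the kernel and iproute2 versions in use:

import subprocess

def ip(args):
    # Run an iproute2 command (requires root privileges).
    subprocess.run(["ip"] + args.split(), check=True)

# Setup 2: one macvlan port per guest, with inter-guest forwarding done
# by the macvlan driver itself.
ip("link add link eth0 name guest0 type macvlan mode bridge")

# Setup 4: the same, but in VEPA mode, so every frame goes out to the
# adjacent switch and access control lives in the external network.
ip("link add link eth0 name guest1 type macvlan mode vepa")

# With the macvtap driver (not yet merged at the time of this thread),
# the link type becomes macvtap, and qemu attaches to the resulting
# character device instead of a raw socket.
ip("link add link eth0 name guesttap0 type macvtap mode vepa")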
 
  Arnd


RE: Guest bridge setup variations

2009-12-10 Thread Fischer, Anna
 Subject: Guest bridge setup variations
 
 As promised, here is my small writeup on which setups I feel
 are important in the long run for server-type guests. This
 does not cover -net user, which is really for desktop kinds
 of applications where you do not want to connect into the
 guest from another IP address.
 
 I can see four separate setups that we may or may not want to
 support, the main difference being how the forwarding between
 guests happens:
 
 1. The current setup, with a bridge and tun/tap devices on ports
 of the bridge. This is what Gerhard's work on access controls is
 focused on and the only option where the hypervisor actually
 is in full control of the traffic between guests. CPU utilization should
 be highest this way, and network management can be a burden,
 because the controls are done through a Linux, libvirt and/or Director
 specific interface.
 
 2. Using macvlan as a bridging mechanism, replacing the bridge
 and tun/tap entirely. This should offer the best performance on
 inter-guest communication, both in terms of throughput and
 CPU utilization, but offer no access control for this traffic at all.
 Performance of guest-external traffic should be slightly better
 than bridge/tap.
 
 3. Doing the bridging in the NIC using macvlan in passthrough
 mode. This lowers the CPU utilization further compared to 2,
 at the expense of limiting throughput by the performance of
 the PCIe interconnect to the adapter. Whether or not this
 is a win is workload dependent. Access controls now happen
 in the NIC. Currently, this is not supported yet, due to lack of
 device drivers, but it will be an important scenario in the future
 according to some people.

Can you differentiate this option from typical PCI pass-through mode? It is not 
clear to me where macvlan sits in a setup where the NIC does bridging.

Typically, in a PCI pass-through configuration, all configuration goes through 
the physical function device driver (and all data goes directly to the NIC). 
Are you suggesting to use macvlan as a common configuration layer that then 
configures the underlying NIC? I could see some benefit in such a model, though 
I am not certain I understand you correctly.

Thanks,
Anna


Re: Guest bridge setup variations

2009-12-10 Thread Arnd Bergmann
On Thursday 10 December 2009, Fischer, Anna wrote:
  
  3. Doing the bridging in the NIC using macvlan in passthrough
  mode. This lowers the CPU utilization further compared to 2,
  at the expense of limiting throughput by the performance of
  the PCIe interconnect to the adapter. Whether or not this
  is a win is workload dependent. Access controls now happen
  in the NIC. Currently, this is not supported yet, due to lack of
  device drivers, but it will be an important scenario in the future
  according to some people.
 
 Can you differentiate this option from typical PCI pass-through mode?
 It is not clear to me where macvlan sits in a setup where the NIC does
 bridging.

In this setup (hypothetical so far, the code doesn't exist yet), we use
the configuration logic of macvlan, but not the forwarding. This also
doesn't do PCI pass-through but instead gives all the logical interfaces
to the host, using only the bridging and traffic separation capabilities
of the NIC, but not the PCI-separation.

Intel calls this mode VMDq, as opposed to SR-IOV, which implies
the assignment of the adapter to a guest.

It was confusing of me to call it passthrough above, sorry for that.

 Typically, in a PCI pass-through configuration, all configuration goes
 through the physical function device driver (and all data goes directly
 to the NIC). Are you suggesting to use macvlan as a common
 configuration layer that then configures the underlying NIC?
 I could see some benefit in such a model, though I am not certain I
 understand you correctly.

This is something I also have been thinking about, but it is not what
I was referring to above. I think it would be good to keep the three
cases (macvlan, VMDq, SR-IOV) as similar as possible from the user
perspective, so using macvlan as an infrastructure for all of them
sounds reasonable to me.

The difference between VMDq and SR-IOV in that case would be
that the former uses a virtio-net driver in the guest and a hardware
driver in the host, while the latter uses a hardware driver in the guest
only. The data flow on these two would be identical though, while
in the classic macvlan the data forwarding decisions are made in
the host kernel.

Arnd


Re: Guest bridge setup variations

2009-12-10 Thread Alexander Graf

On 10.12.2009, at 15:18, Arnd Bergmann wrote:

 On Thursday 10 December 2009, Fischer, Anna wrote:
 
 3. Doing the bridging in the NIC using macvlan in passthrough
 mode. This lowers the CPU utilization further compared to 2,
 at the expense of limiting throughput by the performance of
 the PCIe interconnect to the adapter. Whether or not this
 is a win is workload dependent. Access controls now happen
 in the NIC. Currently, this is not supported yet, due to lack of
 device drivers, but it will be an important scenario in the future
 according to some people.
 
 Can you differentiate this option from typical PCI pass-through mode?
 It is not clear to me where macvlan sits in a setup where the NIC does
 bridging.
 
 In this setup (hypothetical so far, the code doesn't exist yet), we use
 the configuration logic of macvlan, but not the forwarding. This also
 doesn't do PCI pass-through but instead gives all the logical interfaces
 to the host, using only the bridging and traffic separation capabilities
 of the NIC, but not the PCI-separation.
 
 Intel calls this mode VMDq, as opposed to SR-IOV, which implies
 the assignment of the adapter to a guest.
 
 It was confusing of me to call it passthrough above, sorry for that.
 
 Typically, in a PCI pass-through configuration, all configuration goes
 through the physical function device driver (and all data goes directly
 to the NIC). Are you suggesting to use macvlan as a common
 configuration layer that then configures the underlying NIC?
 I could see some benefit in such a model, though I am not certain I
 understand you correctly.
 
 This is something I also have been thinking about, but it is not what
 I was referring to above. I think it would be good to keep the three
 cases (macvlan, VMDq, SR-IOV) as similar as possible from the user
 perspective, so using macvlan as an infrastructure for all of them
 sounds reasonable to me.

Oh, so you'd basically do -net vt-d,if=eth0 and the rest would automatically 
work? That's a pretty slick idea!

Alex


Re: [Qemu-devel] Re: Guest bridge setup variations

2009-12-10 Thread Arnd Bergmann
On Thursday 10 December 2009 19:14:28 Alexander Graf wrote:
  This is something I also have been thinking about, but it is not what
  I was referring to above. I think it would be good to keep the three
  cases (macvlan, VMDq, SR-IOV) as similar as possible from the user
  perspective, so using macvlan as an infrastructure for all of them
  sounds reasonable to me.
 
 Oh, so you'd basically do -net vt-d,if=eth0 and the rest would
 automatically work? That's a pretty slick idea!

I was only referring to how they get set up under the covers, e.g.
creating the virtual device, configuring the MAC address, etc., not
the qemu side, but that would probably make sense as well.

Or even better, qemu should probably not even know the difference
between macvlan and VT-d. In both cases, it would open a macvtap
file, but for VT-d adapters, the macvlan infrastructure can
use hardware support, much in the way that VLAN tagging gets
offloaded automatically to the hardware.
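
To make that concrete: whichever variant does the forwarding underneath,
userspace reaches a macvtap port through a per-interface character device.
A minimal sketch of opening it, following the /dev/tapN naming used by the
macvtap driver (N being the interface index) and assuming the device node
has been created, e.g. by udev:

import os

def open_macvtap(ifname):
    # The interface index identifies the tap node: /dev/tap<ifindex>.
    with open(f"/sys/class/net/{ifname}/ifindex") as f:
        ifindex = int(f.read().strip())
    # qemu (or vhost-net) keeps this fd and reads/writes raw frames on it,
    # regardless of whether macvlan, VMDq or an SR-IOV VF sits behind it.
    return os.open(f"/dev/tap{ifindex}", os.O_RDWR)

fd = open_macvtap("macvtap0")   # example interface name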

Arnd 


Re: [Qemu-devel] Re: Guest bridge setup variations

2009-12-10 Thread Alexander Graf

On 10.12.2009, at 21:20, Arnd Bergmann wrote:

 On Thursday 10 December 2009 19:14:28 Alexander Graf wrote:
 This is something I also have been thinking about, but it is not what
 I was referring to above. I think it would be good to keep the three
 cases (macvlan, VMDq, SR-IOV) as similar as possible from the user
 perspective, so using macvlan as an infrastructure for all of them
 sounds reasonable to me.
 
 Oh, so you'd basically do -net vt-d,if=eth0 and the rest would
 automatically work? That's a pretty slick idea!
 
 I was only referring to how they get set up under the covers, e.g.
 creating the virtual device, configuring the MAC address etc, not
 the qemu side, but that would probably make sense as well.
 
 Or even better, qemu should probably not even know the difference
 between macvlan and VT-d. In both cases, it would open a macvtap
 file, but for VT-d adapters, the macvlan infrastructure can
 use hardware support, much in the way that VLAN tagging gets
 offloaded automatically to the hardware.

Well, VT-d means we use PCI passthrough. But it probably makes sense to have a
-net bridge,if=eth0 that automatically uses whatever is around (PCI
passthrough, macvtap, Anthony's bridge script, etc.). Of course we should
leverage VMDq for macvtap whenever available :-).

Alex